# Aggregate Transformation

### Overview

The *Aggregate* transformation object provides the functionality to create aggregations of your dataset, using aggregate functions such as Sum, Count, First, Last, Min, Max, Average, Var or Standard Deviation. The dataset can be split into groups so that the aggregate value(s) can be generated for the group instead of the whole dataset. For example, calculate product count by month of year, or get average sales price by region and year.

*Aggregate* Transformation can be applied to unsorted data or data sorted on group by values. When applied to an input stream that is sorted on group by fields, *Aggregate* Transformation performs substantially better and consumes very little memory. Alternatively, when applied to unsorted datasets, *Aggregate* Transformation may consume substantial memory resources for large data sets and may slow down the performance of the server.

### Video

{% embed url="<https://www.youtube.com/watch?t=95s&v=M9aILyEphFc>" %}

## Use Case

In this scenario, we have products data stored in a csv file. The source file contains information such as *ProductID*, *Supplier ID*, *UnitPrice* of the various products, *QuantityPerUnit* of products available etc. We want to derive the following information from our source data:

1. Number of products per category
2. Total price of all the products per category
3. Minimum price per category
4. Maximum price per category

We will use the *Aggregate* Transformation object to derive the required information.

## How to Work with Aggregate Transformation

1. From the sources section in the Toolbox, drag-and-drop a [*Delimited File Source*](https://documentation.astera.com/astera-data-stack-v10/dataflows/sources/delimited-file-source) object to the dataflow designer.
2. To work with the *Aggregate* Transformation, drag-and-drop the *Aggregate* Transformation object from *Toolbox > Transformations > Aggregate*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F9k5o0jVxArvyDibFsya9%2Fgif-drag-and-drop.gif?alt=media\&token=568de67b-3088-4a14-acee-604d812e4092)

3. Right-click on the transformation object and select *Properties*. The *Layout builder* window will now open.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F9RU25EGUDaVgm5AD032u%2F1%20v2.png?alt=media\&token=3336e97b-e0d7-424f-9606-89c28e02665e)

4. Here, you can write names of fields that you want to map to the transformation object in the *Name* column and specify the relevant *Aggregate Functions* for them.

For this case:

* *CategoryID*: We will select the *Group-By* option from the *Aggregate Function* drop-down list for this field as we want to group the records based on the available product categories.
* *ProductID*: For this field we will select the *Aggregate Function* *Count*, in order to calculate the number of products per category.
* *UnitPrice*: We will map this field thrice.

  * To calculate *TotalPricePerCategory,* select the function *Sum* function.
  * To calculate *MaxPricePerCategory,* select the *Max* function.
  * To calculate *MinPricePerCategory,* select the *Min* function.

  ![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F8N5ztDtONPyPvcnMgcqP%2F1.png?alt=media\&token=0ef3aa7a-1107-4d78-9684-0a4045948596)

5. Click on *Next*. The *Aggregate Transformation Properties* window will now open.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FFpowPGDKn4H45KpUnxwz%2F5.png?alt=media\&token=dfd2fb25-e5ba-40fd-a684-5f6130caa59d)

There are three sorting options in *Aggregate* transformation:

* *Incoming data is pre-sorted on group by fields*: This option *Aggregate* requires data to be sorted by the specified *Group-By* field.
* *Sort Incoming data before building aggregate*: This option will first sort the incoming data, then build its aggregate.
* *Build aggregate using unsorted data*: This option will build aggregate using the incoming data  whether it is sorted or not.

6. Click on *Next*. The *Config Parameters* window will now open, where you can further configure and define parameters for the *Aggregate* transformation.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2Fju1GhRddxQUVRPpvWbKd%2F6.png?alt=media\&token=bde41338-f5a8-4a74-955e-0d8dc4404c34)

7\. Click *Next*. This is the *General Options* window. Click *OK*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FwGPJWZIdYC6xS2C9JLH3%2F7.png?alt=media\&token=63621469-5e24-47ff-9882-160ada09f157)

*General Options* Window:

This window shares options common to most objects in the dataflow.

*Clear Incoming Record Messages*

When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.

*Do Not Process Records with Errors*

When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.

The *Comments* input allows you to enter comments associated with this object.

8. After you have configured the properties, click *OK*.
9. You will see the fields in the object that were added in the *Layout Builder* window.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FY0yDywVrVBCacWgQzw74%2F8.png?alt=media\&token=33e8e606-0683-4bb7-9922-fd11d4f02393)

10\. Map the data fields from the source object to the transformation object. You can auto-map the entire dataset from the source to the transformation object, or only map selected fields that you want to work with. In this case, we will map *CategoryID*, *ProductID* and *UnitPrice* as those are the fields we want to find aggregations for\*.\*

{% hint style="info" %}
**Note:** *UnitPrice* field has been mapped three times as these will determine *TotalPricePerCategory, MaximumPricePerCategory* and \**MinimumPriceperCategory*.
{% endhint %}

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FVB0VYtscnderQj2q7YHi%2F2.png?alt=media\&token=da4a551c-cea5-4ef7-86d2-464594768702)

11. Right-click on the *Aggregate* transformation object and click *Preview Output*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F0xhEmPBR8x9aS9M9nKjx%2Fgif-preview-output.gif?alt=media\&token=ffe72502-66bf-4d06-b884-e84dc820e5b6)

12. You will see that the specified *Aggregate Functions* have been applied.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FVkw8YTymMUt7hySetdrn%2F3.png?alt=media\&token=8902ad18-aeb8-49e4-9fad-f809affabac0)
