# Merge Transformation

Merge transformation in Astera is designed to merge data fragments from disparate sources, based on some predefined logic, and present it in a consolidated form to draw actionable insights.

{% embed url="<https://youtu.be/uEnurRRGlPU?list=PLmyM3TZtVUyJfz1bkpZ-CxJ33fAwHIbug>" %}

### Sample Use Case

Let’s assume that there is an organization that maintains customers’ data in two different departments – Marketing and Sales. Marketing stores information in a database table and the Sales department maintains an Excel sheet for storing customers' information. We want to merge the information from both sources so that we have consolidated data.

### Using Merge Transformation

1. Drag and drop the relevant source objects from the Toolbox to the designer. Click here to find out how to [set up sources](https://documentation.astera.com/astera-data-stack-v9/dataflows/sources/setting-up-sources).

{% hint style="info" %}
**Note:** In this case, the marketing department has the customer information stored in a database, whereas the sales department records customer information in an Excel file. Therefore, we will use a *Database Table Source* and an *Excel Workbook Source* as source objects.
{% endhint %}

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FSRehcmc46WeA0GdhUXoM%2F1.png?alt=media\&token=b9e3d3c0-8653-4e8a-92b1-688de7ccf5d2)

2. The *Merge* transformation object merges data from a single source only, we will first combine both the records using the *Union* transformation object. We will then map fields from the data sources to the *Union* transformation object and add a new field *DataSource* to keep track of which information is coming from which source.

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FSjOdVrZauCASxT6tWBKs%2F2.png?alt=media\&token=5ca94ad6-fb17-49aa-9d16-d9e5352ac239)

3. Drag the *Merge* transformation object from the transformations section in the Toolbox and drop it on the dataflow designer.

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FMVjHmZqCdouv9CmsVfHO%2F18.png?alt=media\&token=b517fb1f-995e-4280-b0d1-21f091ea1a26)

This is what a *Merge* transformation object looks like:

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FimV7QoZu7jpJ3kzTmFC9%2F4.png?alt=media\&token=087f60f6-56a1-408b-a467-25e3a89cd014)

4. Map the *Union* transformation object’s output to the *Merge* transformation object.

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FYuT8aimFyI4ZomSy5sfw%2F5.png?alt=media\&token=71e97f25-58d6-40a4-a6b7-af4ca6d7a28d)

5. Right-click on the *Merge* transformation object to set up transformation properties in the *Layout Builder* window. This is what the *Layout Builder* window looks like:

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FyJrRU6dbaPkJc0xeoJo3%2F6.png?alt=media\&token=fd877424-8895-4823-8e38-9659185b42c7)

6. In the *Layout Builder* window, specify the *Primary Key*. This is a common identifier that identifies similar records from various sources and merges the information against these records.

Since we are consolidating different customer records, we will set up *CustomerID* as the *Primary Key* in this case.

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FGUt2qBE3ryFG4YShu2QQ%2F7.png?alt=media\&token=dc28f573-ea9c-48a4-94a4-0e6f530f403d)

7. Next, you have to specify the field to be used as *Version*. If your data is coming from multiple sources, the *Version* field shows which source the data is coming from in the final merged output. In this case, we will use the *Data Source* field we added in the *Union* transformation object as the *Version* field.

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FSDq59ia3MPw8qWu9GYKf%2F8.png?alt=media\&token=104a518c-ee26-48a8-a58c-c32414758681)

8. Next, specify the *Survivor Type* for each field. *Survivor Type* allows you to choose the survivor values – the values you want to retain from your data sources – for each field. *Survivor Types* are set as *First* by default.&#x20;

However, depending on your case, you can choose the *Survivor Type* from the following options:

| ***Survivor Type***      | **Description**                                                                                                                                                                                                    |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| *First*                  | Returns data from the first data source for that field                                                                                                                                                             |
| *Last*                   | Returns data from the last data source for that field                                                                                                                                                              |
| *Maximum*                | Returns the maximum value from all available input data sources                                                                                                                                                    |
| *Minimum*                | Returns the minimum value from all available input data sources                                                                                                                                                    |
| *Count*                  | Returns the total count number of all values that exist in the field                                                                                                                                               |
| *Sum*                    | Aggregates the values that exist in that field in all the input sources and then returns the arithmetic sum of those values                                                                                        |
| *Comma Separated Values* | Separates the values that exist in that field in all the input sources with a comma and then return that representation. This option is only available when the output field is assigned the *'String' Data Type*. |

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2Fygefc7TZqVOWPIV67j67%2F9.png?alt=media\&token=644babcc-9a0f-4504-82d4-bf058a501d42)

Since *CustomerID*, *CompanyName*, and *ContactName* records are common in both the source files (*Customers\_Marketing* and *Customers\_Sales*), we will set the *Survivor Type* as *First* for these fields. For the other fields with missing records, the *Survivor Type* will be set as follows:

| **Field**      | ***Survivor Type***      |
| -------------- | ------------------------ |
| *ContactTitle* | *First*                  |
| *Address*      | *First*                  |
| *City*         | *First*                  |
| *Region*       | *Last*                   |
| *PostalCode*   | *First*                  |
| *Country*      | *First*                  |
| *Phone*        | *Last*                   |
| *Fax*          | *Last*                   |
| *DataSource*   | *Comma Separated Values* |

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FPD1bTqsBVt8xnRQx2epl%2F10.png?alt=media\&token=c637440f-4b14-458a-a557-2fe2c737c533)

9. Once you have set the *Survivor Type*, specify *Precedence* for each field. *Precedence* is the order in which you want the source data to be assessed. For instance, we have common data fields in both the sources, but different and missing records. We can set appropriate *Precedence* values to bring data from the desired data source.

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2F8gknygRdGSCuiNdNeZ8y%2F11.png?alt=media\&token=e0a01bdb-a0d1-49c7-ab24-285d8564eae8)

10. Next, you can set a specific *Condition*, and the *Merge* transformation will process records based on the criteria specified for a particular field.

In this case, we have specified *‘IsNotNull’* for the *Address* and *Region* fields since we want none of these fields to be empty or have missing records.

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FPxhgCYzBGTtNmfQCfE1r%2F12.png?alt=media\&token=c9640613-2a5c-405b-8ce0-7e8c4e5a63f1)

11. Depending on the requirements of the business case, you can add a logical expression in the *Expression* field to process the incoming data value and transform it into the output according to the logic defined. The *Expression* field can be used for mathematical and financial calculations, date and time manipulations, comparisons, and conversion functions.

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2F6autI8uD49llRqbOvJIX%2F13.png?alt=media\&token=c8b749e2-b976-40f6-8712-8387a58cd203)

12. Click *Next* to proceed to the *Merge Transformation Properties* window. Here, you will see the following three checkboxes:
    * *Case Sensitive* – Check if data is to be assessed on a case-sensitive basis
    * *Sort Input* – Check if the incoming data is not already sorted
    * *Version Order Descending* – Check if you want the data to be sorted in a descending version order

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FvfHIdKhQzLwXtvKgWM50%2F14.png?alt=media\&token=829d2e43-2c30-4234-a1c6-04ce602ae3d6)

13. Click *Next* to proceed to the *General Options* window. Here, you can add *Comments*, instructions, or any relevant information about the transformation. This will not change or alter your transformation action in any way.

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FPGiuqiSd6cZks0w4A7Cq%2F15.png?alt=media\&token=44a59158-258f-4650-a259-86502990c640)

You may also skip this step by clicking *OK* in the previous step (on the *Merge Transformation* window) to close the *Transformation Properties* window.

14. To get the output, right-click on the *Merge* transformation object, and click on *Preview Output*. You will get the merged records based on your specified transformation properties.

**Data Preview before applying Merge transformation**:

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FUDfihWEXWhm2PrrxjzVD%2F16.png?alt=media\&token=5f9c8643-b44d-4a35-a6c4-a45082d7f092)

**Data Preview after applying Merge transformation**:

![](https://3891069449-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F28dJIkGHacXhBlux3efx%2Fuploads%2FGmhu1znknCLEVziZdm00%2F17.png?alt=media\&token=96797104-0fa7-44a6-888b-375b802481a3)

### Usage and Benefits

*Merge* transformations can be applied in cases where data is sorted into different records. Astera Data Stack makes it extremely convenient for users to get consolidated data that is stored in different sources, while also allowing them the flexibility to choose how the output should appear, through the various transformation properties.
