# Distinct Transformation

### Overview

The *Distinct* transformation object in Astera removes duplicate records from the incoming dataset. You can use all fields in the layout to identify duplicate records, or specify a subset of fields, also called key fields, whose combination of values will be used to filter out duplicates.
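To illustrate the difference between comparing all fields and comparing only key fields, here is a minimal Python sketch of the same idea. It is not Astera's implementation or API. The function name `distinct`, the sample rows, and the choice to keep the *first* record seen for each key are all illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Sequence

Record = Dict[str, Hashable]

def distinct(records: Iterable[Record],
             key_fields: Optional[Sequence[str]] = None) -> List[Record]:
    """Keep the first record seen for each key and drop later repeats.

    key_fields=None compares every field (the "all fields" case);
    otherwise only the listed key fields are compared.
    Keeping the first occurrence is an assumption of this sketch.
    """
    seen = set()
    result: List[Record] = []
    for rec in records:
        if key_fields is None:
            # Sort by field name so the key does not depend on field order.
            key = tuple(sorted(rec.items()))
        else:
            key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            result.append(rec)
    return result

# Hypothetical sample rows; "Qty" is an illustrative extra field.
rows = [
    {"Name": "Bolt", "Type": "M4", "Qty": 10},
    {"Name": "Bolt", "Type": "M4", "Qty": 5},
    {"Name": "Nut", "Type": "M4", "Qty": 10},
]

print(len(distinct(rows)))                    # 3: no two rows match on every field
print(len(distinct(rows, ["Name", "Type"])))  # 2: the second Bolt/M4 row is dropped
```

Comparing all fields keeps both Bolt rows because their `Qty` values differ; restricting the comparison to `Name` and `Type` treats them as duplicates.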

### Video

{% embed url="<https://www.youtube.com/watch?v=Ihe6dECy_Uo>" %}

### Use Case

Consider a scenario where data is coming in from an *Excel Workbook Source*, and the dataset contains duplicate records. We want to filter out the duplicates and create a new dataset that contains only the distinct records from our source data. We can do this with the *Distinct* transformation object in Astera Data Stack by specifying the fields that identify duplicate records as *Key Fields*.

To send duplicate records to a separate output node of the *Distinct* transformation object, we will check the *Add Duplicates Output* option. Then we will map the distinct and duplicate outputs to separate *Delimited File Destination* objects.

Let’s see how to do that.

### How to work with Distinct Transformation

1. Since our source data is stored in an Excel file, drag and drop an *Excel Workbook Source* object from the Toolbox onto the dataflow.
2. To apply the *Distinct* transformation to your source data, go to *Toolbox > Transformations > Distinct* and drag and drop the *Distinct* transformation object onto the dataflow. Then map the fields from the source object by dragging the top node of *ExcelSource* to the top node of the *Distinct* transformation object.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/RUstoup2x0dt04a8rBU4/gif-drag-and-drop.gif)

3. Now, right-click the *Distinct* transformation object and select *Properties*. This opens the *Layout Builder* window, where you can add or remove fields and modify the object layout.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/mY078MfcHXH9shM7jm7s/2.png)

4. Click *Next*. The *Distinct Transformation Properties* window will now open.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/l7SeSGD1ORkzHe7lx0iM/4.4-1578470398330.png)

*Data Ordering*:

* *Data is Presorted on Key Fields:* Select this option if the incoming data is already sorted based on defined key fields.
* *Sort Incoming Data:* Select this option if your source data is unsorted and you want the *Distinct* transformation object to sort it.
* *Work with Unsorted Data:* Select this option to process the incoming data in its existing order, without sorting it first.
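The three ordering options reflect a common trade-off in duplicate removal. The Python sketch below shows that trade-off in general terms; it is not how Astera implements these options internally, and the function names and sample rows are hypothetical. When data is sorted on the key fields, duplicates sit next to each other, so only the current run of equal keys has to be tracked. Without that guarantee, every key seen so far has to be remembered.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Sequence

Record = Dict[str, str]

def distinct_presorted(records: Iterable[Record],
                       key_fields: Sequence[str]) -> Iterator[Record]:
    """Assumes input is already sorted on key_fields, so equal keys are
    adjacent: yield the first record of each run. In this sketch,
    non-adjacent duplicates are NOT detected."""
    for _, run in groupby(records, key=itemgetter(*key_fields)):
        yield next(run)

def distinct_sort_first(records: List[Record],
                        key_fields: Sequence[str]) -> List[Record]:
    """Sort on the key fields (a stable sort), then dedupe adjacent runs.
    Output comes back in key order, not input order."""
    ordered = sorted(records, key=itemgetter(*key_fields))
    return list(distinct_presorted(ordered, key_fields))

def distinct_unsorted(records: Iterable[Record],
                      key_fields: Sequence[str]) -> List[Record]:
    """No ordering assumption: remember every key seen in a set.
    Preserves input order but holds all distinct keys in memory."""
    seen = set()
    result = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            result.append(rec)
    return result

# Hypothetical, unsorted sample rows.
rows = [
    {"Name": "Nut", "Type": "M4"},
    {"Name": "Bolt", "Type": "M4"},
    {"Name": "Nut", "Type": "M4"},
]
keys = ["Name", "Type"]
print(len(list(distinct_presorted(rows, keys))))  # 3: the two Nut rows are not adjacent
print(len(distinct_sort_first(rows, keys)))       # 2
print(len(distinct_unsorted(rows, keys)))         # 2
```

In this sketch, treating unsorted data as presorted lets the repeated Nut row through. That is why a "presorted" setting is only safe when the input really is ordered on the key fields.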

5. In this window, specify the fields used to identify duplicate records by adding them under *Key Fields*. Records that share the same combination of values in all key fields are treated as duplicates.

{% hint style="info" %}
**Note:** In this case, we will specify the *Name* and *Type* fields as *Key Fields*.
{% endhint %}

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/L4E63ZM0GfauhIdIg3lb/unchecked-1578465864521.png)

You can now write the output of the *Distinct* transformation object to a destination object. In this case, we will write it to a [*Delimited File Destination*](https://docs.astera.com/projects/centerprise/en/8/destinations/delimited-file-destination.html) object.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/7kItEKIE519TnjcVnASi/new1.png)

6. Right-click the *Delimited File Destination* object and select *Preview Output*.

Your output will look like this:

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/dfCHudel50jqEngLty1O/7-1578470602652.png)

#### To output duplicate records separately

1. To route duplicate records to a separate output, check the *Add Duplicates Output* option in the *Distinct Transformation Properties* window.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/iDVhuSSwQC3utm2e8dHY/1.1.png)

2. When you check this option, the *Distinct* transformation object will contain three nodes:

* *Input*
* *Output\_Distinct*
* *Output\_Duplicate*

{% hint style="info" %}
**Note:** When you check the *Add Duplicates Output* option, the existing mappings from the source object to the *Distinct* transformation object are removed.
{% endhint %}

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/gmVecZij5vZF1zHOrRBm/image-20200108112347728.png)
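Conceptually, the two output nodes split the input into two streams. The Python sketch below models that split. The function `split_distinct` and the sample rows are hypothetical, and it is an assumption of the sketch that the first record with a given key goes to *Output\_Distinct* while every later record with that key goes to *Output\_Duplicate*. Verify this against your own preview output before relying on it.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Record = Dict[str, Hashable]

def split_distinct(records: Iterable[Record],
                   key_fields: Sequence[str]) -> Tuple[List[Record], List[Record]]:
    """Route each record to a distinct or a duplicate list.

    Assumption: the first record with a given key is "distinct";
    every later record with the same key is a "duplicate".
    """
    seen = set()
    output_distinct: List[Record] = []
    output_duplicate: List[Record] = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key in seen:
            output_duplicate.append(rec)
        else:
            seen.add(key)
            output_distinct.append(rec)
    return output_distinct, output_duplicate

# Hypothetical sample rows keyed on Name and Type.
rows = [
    {"Name": "Bolt", "Type": "M4", "Qty": 10},
    {"Name": "Nut", "Type": "M4", "Qty": 3},
    {"Name": "Bolt", "Type": "M4", "Qty": 5},
    {"Name": "Bolt", "Type": "M6", "Qty": 2},
]
distinct_rows, duplicate_rows = split_distinct(rows, ["Name", "Type"])
print(len(distinct_rows), len(duplicate_rows))  # 3 1
```

Every input record lands in exactly one of the two lists, so together the two outputs account for the whole source dataset.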

3. Now, remap the source by dragging the top node of the *ExcelSource* object to the *Input* node of the *Distinct* transformation object.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/1aJ84QYIuK8oPPtI6OvT/image-20200108112817089.png)

4. You can now map the *Output\_Distinct* and *Output\_Duplicate* nodes to two separate destination objects. In this case, we will write each output to its own [*Delimited File Destination*](https://documentation.astera.com/dataflows/destinations/delimited-file-destination) object.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/uL3He87if4k9iPSrB9Nc/6-1578470581631.png)
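For readers who want to see what two delimited destinations produce, here is a rough Python equivalent using the standard `csv` module. This is not Astera's destination API. The helper `write_delimited`, the file names, and the sample rows are hypothetical, and the `delimiter` parameter only stands in for a configurable delimiter.

```python
import csv
import os
import tempfile
from typing import Dict, List, Sequence

def write_delimited(path: str, rows: List[Dict[str, object]],
                    fieldnames: Sequence[str], delimiter: str = ",") -> None:
    """Write rows to a delimited text file, header row first."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical contents of the two output nodes and hypothetical file names.
distinct_rows = [{"Name": "Bolt", "Type": "M4"}, {"Name": "Nut", "Type": "M4"}]
duplicate_rows = [{"Name": "Bolt", "Type": "M4"}]

out_dir = tempfile.mkdtemp()
distinct_path = os.path.join(out_dir, "distinct_output.csv")
duplicate_path = os.path.join(out_dir, "duplicate_output.csv")
write_delimited(distinct_path, distinct_rows, ["Name", "Type"])
write_delimited(duplicate_path, duplicate_rows, ["Name", "Type"])
```

Each output gets its own file with the same header, which mirrors mapping the two nodes to two separate destination objects.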

Distinct output:

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/uPKCdYb3dkldzBZ15mKx/7.png)

Duplicate output:

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/xACsjJkFuifSMgb2R42e/8.png)

As the previews show, the duplicate records have been separated from the distinct records in the source data.
