# Distinct Transformation

### Overview

The *Distinct* transformation object in Astera removes duplicate records from the incoming dataset. You can use all fields in the layout to identify duplicate records, or specify a subset of fields, also called key fields, whose combination of values will be used to filter out duplicates.
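
To make the difference between using all fields and using key fields concrete, here is a minimal conceptual sketch in Python. It is not how Astera implements the transformation. The function name `distinct`, the sample records, and the choice to keep the first record seen for each key are all assumptions made for illustration.

```python
def distinct(records, key_fields=None):
    """Return records with duplicates removed.

    Assumption for this sketch: the first record seen for a key is kept.
    If key_fields is None, every field in the record forms the key.
    """
    seen = set()
    result = []
    for record in records:
        fields = key_fields if key_fields is not None else sorted(record)
        key = tuple(record[f] for f in fields)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


# Hypothetical sample data, not taken from the Astera example file.
rows = [
    {"Name": "Apple", "Type": "Fruit", "Price": 1.0},
    {"Name": "Apple", "Type": "Fruit", "Price": 1.2},
    {"Name": "Carrot", "Type": "Vegetable", "Price": 0.5},
    {"Name": "Apple", "Type": "Fruit", "Price": 1.0},
]

all_fields = distinct(rows)                         # whole record is the key
by_key = distinct(rows, key_fields=["Name", "Type"])  # only Name + Type
```

With all fields as the key, only the exact repeat of the first row is dropped, so three records remain. With *Name* and *Type* as key fields, the two *Apple* rows with different prices also count as duplicates, so two records remain.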

### Video

{% embed url="<https://www.youtube.com/watch?v=Ihe6dECy_Uo>" %}

### Use Case

Consider a scenario where data arrives from an *Excel Workbook Source* and contains duplicate records. We want to filter out the duplicates and create a new dataset that contains only distinct records. We can do this with the *Distinct* transformation object in Astera Data Stack by specifying the fields that identify a duplicate as *Key Fields*.

To add a separate output node for duplicate records to the *Distinct* transformation object, we will check the *Add Duplicates Output* option. Then we will map both the distinct and duplicate outputs to a *Delimited File Destination*.

Let’s see how to do that.

### How to work with Distinct Transformation

1. Drag and drop an *Excel Workbook Source* object from the Toolbox onto the dataflow, since our source data is stored in an Excel file.
2. To apply the *Distinct* transformation to your source data, drag and drop the *Distinct* transformation object from the Toolbox onto the dataflow. You can find it under *Toolbox > Transformations > Distinct*. Then map the fields from the source by dragging the top node of the *ExcelSource* object to the top node of the *Distinct* transformation object.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FqN1lHzXROpYrIHR0utoO%2Fgif-drag-and-drop.gif?alt=media\&token=caa5323d-d500-47fd-8cea-f2865cb7b3bc)

3. Now, right-click on the *Distinct* transformation object and select *Properties*. This will open the *Layout Builder* window, where you can add or remove fields and modify the object layout.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FDnkMjqTl93dePryK2r84%2F2.png?alt=media\&token=485f78ab-a762-4030-b093-69e439bb3586)

4. Click *Next*. The *Distinct Transformation Properties* window will now open.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FhCDGdJnQjttsFxDbeKcT%2F4.4-1578470398330.png?alt=media\&token=9bfda3cb-5d78-463e-95c1-666625380f4b)

*Data Ordering*:

* *Data is Presorted on Key Fields:* Select this option if the incoming data is already sorted based on defined key fields.
* *Sort Incoming Data:* Select this option if your source data is unsorted and you want to sort it.
* *Work with Unsorted Data:* Select this option if your source data is unsorted and you want the *Distinct* transformation object to process it without sorting it first.
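
The three options can be pictured as three ways of finding repeated keys. The sketch below is conceptual only and does not describe Astera's internals. The function names and sample records are hypothetical, and each function assumes at least one key field.

```python
from itertools import groupby
from operator import itemgetter


def distinct_presorted(records, key_fields):
    """Input already sorted on the key: equal keys are adjacent,
    so comparing neighbours is enough. Keeps the first of each run."""
    key = itemgetter(*key_fields)
    return [next(group) for _, group in groupby(records, key=key)]


def distinct_with_sort(records, key_fields):
    """Sort on the key first, then reuse the adjacent-run approach."""
    key = itemgetter(*key_fields)
    return distinct_presorted(sorted(records, key=key), key_fields)


def distinct_unsorted(records, key_fields):
    """No sorting: remember every key seen so far in a set."""
    key = itemgetter(*key_fields)
    seen = set()
    result = []
    for record in records:
        k = key(record)
        if k not in seen:
            seen.add(k)
            result.append(record)
    return result


# Hypothetical unsorted sample data.
rows = [
    {"Name": "Pear", "Type": "Fruit"},
    {"Name": "Apple", "Type": "Fruit"},
    {"Name": "Pear", "Type": "Fruit"},
]
keys = ["Name", "Type"]

sorted_first = distinct_with_sort(rows, keys)    # output ordered by key
input_order = distinct_unsorted(rows, keys)      # output keeps input order
missed = distinct_presorted(rows, keys)          # rows are NOT presorted here
```

In this sketch, treating unsorted rows as presorted misses the second *Pear* record because the two copies are not adjacent, which is why the presorted option should only be chosen when the data really is sorted on the key fields.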

5. In this window, apply the distinct function to the fields that contain duplicate values by adding them under *Key Field*.

{% hint style="info" %}
**Note:** In this case, we will specify the *Name* and *Type* fields as *Key Fields*.
{% endhint %}

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FvjxVqfW3Srudd9LbZUTN%2Funchecked-1578465864521.png?alt=media\&token=9d816b1c-a356-413e-8d2e-3bb1a5cee9b8)

You can now write the *Distinct* output to a destination object. In this case, we will write our output into a [*Delimited destination*](https://docs.astera.com/projects/centerprise/en/8/destinations/delimited-file-destination.html) object.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FdXUsgvakWu0xDbQfkg2g%2Fnew1.png?alt=media\&token=0c3cd627-2a59-4157-99b3-1528d1f24a31)

6. Right-click on the *Delimited Destination* object and click *Preview Output*.

Your output will look like this:

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FoO98eUicihFxlNRlJyMZ%2F7-1578470602652.png?alt=media\&token=d5368a1b-4d0d-46c4-a1ca-21d6bf558084)

#### To add duplicate records

1. To send duplicate records to a separate output, check the *Add Duplicates Output* option in the *Distinct Transformation Properties* window.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2Fmzjt0V1FY1h4WBgJUEjR%2F1.1.png?alt=media\&token=4655fa13-7db3-462c-a001-63deb9951fbe)

2. When you check this option, three nodes are added to the *Distinct* transformation object:

* *Input*
* *Output\_Distinct*
* *Output\_Duplicate*

{% hint style="info" %}
**Note:** When you check the *Add Duplicates Output* option, mappings from the source object to the *Distinct* transformation object will be removed.
{% endhint %}

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F1roK0LMuxOheQyBMd7CZ%2Fimage-20200108112347728.png?alt=media\&token=7072c581-104d-494d-915d-a8c6a5d7b2bd)

3. Now, map the objects by dragging the top node of *ExcelSource* object to the *Input* node of the *Distinct* transformation object.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FHd9x1D53XoFWWrSdgpuJ%2Fimage-20200108112817089.png?alt=media\&token=b0472f52-538c-4464-b003-9e41b1ae097a)

4. You can now write the *Output\_Distinct* and *Output\_Duplicate* nodes to two different destination objects. In this case, we will write each output to its own [*Delimited destination*](https://documentation.astera.com/astera-data-stack-v10/dataflows/destinations/delimited-file-destination) object.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FiU9FzygHSn5RZC8qYiOD%2F6-1578470581631.png?alt=media\&token=e92d2f60-a2c4-44ed-909e-9f91310d19ee)

Distinct output:

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FiNZbw9U9TByphrvRSxl3%2F7.png?alt=media\&token=8b738de1-85a7-486d-9e46-63707612d414)

Duplicate output:

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FhhLWrqVc3HKx8gRj7ng5%2F8.png?alt=media\&token=1dc38e7e-7c62-40c3-ba8f-3db09ce27719)

As evident, the duplicate records have been successfully separated from your source data.
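
The split into distinct and duplicate outputs can be sketched outside Astera as follows. This is an illustration under stated assumptions, not the product's implementation: the function names and sample records are hypothetical, the first record seen for a key is assumed to go to the distinct output, and later repeats are assumed to go to the duplicate output. The standard `csv` module stands in for the delimited destinations.

```python
import csv
import io


def split_distinct_duplicates(records, key_fields):
    """Route the first record for each key to `distinct`
    and every later repeat to `duplicates` (assumed behaviour)."""
    seen = set()
    distinct, duplicates = [], []
    for record in records:
        key = tuple(record[f] for f in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            distinct.append(record)
    return distinct, duplicates


def to_delimited(records, fieldnames):
    """Render records as comma-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample data.
rows = [
    {"Name": "Apple", "Type": "Fruit"},
    {"Name": "Carrot", "Type": "Vegetable"},
    {"Name": "Apple", "Type": "Fruit"},
]

distinct_rows, duplicate_rows = split_distinct_duplicates(rows, ["Name", "Type"])
distinct_csv = to_delimited(distinct_rows, ["Name", "Type"])
duplicate_csv = to_delimited(duplicate_rows, ["Name", "Type"])
```

Here `distinct_csv` plays the role of the destination mapped from *Output\_Distinct*, and `duplicate_csv` plays the role of the destination mapped from *Output\_Duplicate*.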
