# Distinct Transformation

### Overview

The *Distinct* transformation object in Astera removes duplicate records from the incoming dataset. You can use all fields in the layout to identify duplicate records, or specify a subset of fields, also called key fields, whose combination of values will be used to filter out duplicates.
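To make the difference between using all fields and using key fields concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Astera's implementation: the `distinct` function, the `City` field, and the sample rows are hypothetical, and it assumes the first occurrence of each record is the one that is kept.

```python
def distinct(records, key_fields=None):
    """Remove duplicates, keeping the first occurrence (an assumption).

    If key_fields is None, every field in the record is compared.
    """
    seen = set()
    result = []
    for record in records:
        # Fall back to all field names when no key fields are given.
        fields = key_fields if key_fields is not None else sorted(record)
        key = tuple(record[f] for f in fields)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


# Hypothetical sample data for illustration.
rows = [
    {"Name": "Alice", "Type": "A", "City": "Austin"},
    {"Name": "Alice", "Type": "A", "City": "Boston"},
    {"Name": "Bob", "Type": "B", "City": "Chicago"},
    {"Name": "Bob", "Type": "B", "City": "Chicago"},
]

all_fields = distinct(rows)                          # compares Name, Type, and City
by_key = distinct(rows, key_fields=["Name", "Type"])  # compares Name and Type only
```

With all fields, only the exact repeat of the last row is dropped. With `Name` and `Type` as key fields, the second `Alice` row is also treated as a duplicate even though its `City` differs.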

### Video

{% embed url="<https://www.youtube.com/watch?v=Ihe6dECy_Uo>" %}

### Use Case

Consider a scenario where data coming in from an *Excel Workbook Source* contains duplicate records. We want to filter out the duplicates and create a new dataset that contains only distinct records. We can do this with the *Distinct* transformation object in Astera Data Stack by specifying the fields that identify duplicate records as *Key Fields*.

To add a separate node for duplicate records inside the *Distinct* transformation object, we will check the *Add Duplicates Output* option. Then we will map both the distinct and duplicate outputs to a *Delimited File Destination*.

Let’s see how to do that.

### How to work with Distinct Transformation

1. Drag-and-drop an *Excel Workbook Source* from the Toolbox onto the dataflow, as our source data is stored in an Excel file.
2. To apply the *Distinct* transformation to your source data, drag-and-drop the *Distinct* transformation object from *Toolbox > Transformations > Distinct* onto the dataflow. Then map the fields from the source object by dragging the top node of the *ExcelSource* object to the top node of the *Distinct* transformation object.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/RUstoup2x0dt04a8rBU4/gif-drag-and-drop.gif)

3. Now, right-click on the *Distinct* transformation object and select *Properties*. This will open the *Layout Builder* window, where you can add or remove fields and modify the object layout.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/mY078MfcHXH9shM7jm7s/2.png)

4. Click *Next*. The *Distinct Transformation Properties* window will now open.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/l7SeSGD1ORkzHe7lx0iM/4.4-1578470398330.png)

*Data Ordering*:

* *Data is Presorted on Key Fields:* Select this option if the incoming data is already sorted based on defined key fields.
* *Sort Incoming Data:* Select this option if your source data is unsorted and you want to sort it.
* *Work with Unsorted Data:* Select this option to have the *Distinct* transformation object work with the incoming data as it is, without sorting it.
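To illustrate why the *Data Ordering* choice can matter, here is a minimal Python sketch of three deduplication strategies that loosely mirror the options above. It is a conceptual illustration only, not a description of how Astera processes data internally; the function names and sample records are hypothetical.

```python
_NO_KEY = object()  # sentinel that never equals a real key


def distinct_presorted(records, key_fields):
    """Compare each record with the previous one only.

    Correct only when equal keys are adjacent, i.e. the data is
    already sorted on the key fields.
    """
    previous_key = _NO_KEY
    for record in records:
        key = tuple(record[f] for f in key_fields)
        if key != previous_key:
            yield record
        previous_key = key


def distinct_sort_first(records, key_fields):
    """Sort on the key fields, then reuse the adjacent comparison."""
    ordered = sorted(records, key=lambda r: tuple(r[f] for f in key_fields))
    return list(distinct_presorted(ordered, key_fields))


def distinct_unsorted(records, key_fields):
    """Remember every key seen so far; input order is preserved."""
    seen = set()
    result = []
    for record in records:
        key = tuple(record[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


# Hypothetical unsorted input: the two "Bob" rows are not adjacent.
rows = [
    {"Name": "Bob", "Type": "B"},
    {"Name": "Alice", "Type": "A"},
    {"Name": "Bob", "Type": "B"},
]
keys = ["Name", "Type"]

missed = list(distinct_presorted(rows, keys))   # adjacent check misses the repeat
sorted_result = distinct_sort_first(rows, keys)
unsorted_result = distinct_unsorted(rows, keys)
```

In this sketch, applying the adjacent comparison to unsorted input leaves the repeated `Bob` row in place, which is why a presorted option should only be chosen when the data really is sorted on the key fields.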

5. In this window, apply the distinct function to the fields containing duplicate records by adding them under *Key Field*.

{% hint style="info" %}
**Note:** In this case, we will specify the *Name* and *Type* fields as *Key Fields*.
{% endhint %}

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/L4E63ZM0GfauhIdIg3lb/unchecked-1578465864521.png)

You can now write the *Distinct* output to a destination object. In this case, we will write our output into a [*Delimited destination*](https://docs.astera.com/projects/centerprise/en/8/destinations/delimited-file-destination.html) object.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/7kItEKIE519TnjcVnASi/new1.png)

6. Right-click on the *Delimited Destination* object and click *Preview Output*.

Your output will look like this:

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/dfCHudel50jqEngLty1O/7-1578470602652.png)

#### To add duplicate records

1. To add a duplicate records output, check the *Add Duplicates Output* option in the *Distinct Transformation Properties* window.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/iDVhuSSwQC3utm2e8dHY/1.1.png)

2. When you check this option, three nodes will be added to the *Distinct* transformation object:

* *Input*
* *Output\_Distinct*
* *Output\_Duplicate*

{% hint style="info" %}
**Note:** When you check the *Add Duplicates Output* option, mappings from the source object to the *Distinct* transformation object will be removed.
{% endhint %}

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/gmVecZij5vZF1zHOrRBm/image-20200108112347728.png)

3. Now, map the objects by dragging the top node of the *ExcelSource* object to the *Input* node of the *Distinct* transformation object.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/1aJ84QYIuK8oPPtI6OvT/image-20200108112817089.png)

4. You can now write the *Output\_Distinct* and *Output\_Duplicate* nodes to two different destination objects. In this case, we will write each output to its own [*Delimited destination*](https://documentation.astera.com/dataflows/destinations/delimited-file-destination) object.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/uL3He87if4k9iPSrB9Nc/6-1578470581631.png)

Distinct output:

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/uPKCdYb3dkldzBZ15mKx/7.png)

Duplicate output:

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/xACsjJkFuifSMgb2R42e/8.png)

As the previews show, the duplicate records have been successfully separated from your source data.
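The split into *Output\_Distinct* and *Output\_Duplicate* can be pictured with the following Python sketch. It is a hedged illustration, not Astera's implementation: the `split_distinct` function and the sample rows are hypothetical, and it assumes the first occurrence of each key goes to the distinct output while later occurrences go to the duplicate output.

```python
def split_distinct(records, key_fields):
    """Route records into (distinct, duplicates) based on the key fields.

    Assumption: the first record seen for a key is "distinct";
    any later record with the same key is a "duplicate".
    """
    seen = set()
    distinct, duplicates = [], []
    for record in records:
        key = tuple(record[f] for f in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            distinct.append(record)
    return distinct, duplicates


# Hypothetical sample data using the Name and Type key fields.
rows = [
    {"Name": "Alice", "Type": "A"},
    {"Name": "Bob", "Type": "B"},
    {"Name": "Alice", "Type": "A"},
    {"Name": "Alice", "Type": "A"},
]

output_distinct, output_duplicate = split_distinct(rows, ["Name", "Type"])
```

Every input record ends up in exactly one of the two lists, so the two outputs together account for all source rows.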


