# Distinct Transformation

### Overview

The *Distinct* transformation object in Astera Data Stack removes duplicate records from the incoming dataset. You can use all fields in the layout to identify duplicate records or specify a subset of fields, also called key fields, whose combination of values will be used to filter out duplicates.
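The difference between using all fields and using a subset of key fields can be illustrated outside the product. The following is a minimal Python sketch, not Astera's implementation: the `distinct` function, the sample rows, and the choice to keep the first occurrence of each duplicate are all assumptions made for illustration.

```python
def distinct(records, key_fields=None):
    """Remove duplicates, keeping the first record seen for each key.

    If key_fields is None, every field in the record is compared
    (assumption: this mirrors "use all fields in the layout").
    """
    seen = set()
    result = []
    for record in records:
        fields = key_fields if key_fields is not None else sorted(record)
        key = tuple(record[f] for f in fields)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


# Hypothetical sample data, not taken from the documentation's screenshots.
rows = [
    {"Name": "Widget", "Type": "A", "Price": 10},
    {"Name": "Widget", "Type": "A", "Price": 12},
    {"Name": "Widget", "Type": "B", "Price": 10},
    {"Name": "Widget", "Type": "A", "Price": 10},
]

all_fields = distinct(rows)                          # compares Name, Type, Price
by_key = distinct(rows, key_fields=["Name", "Type"])  # compares Name and Type only
```

With all fields, only the fourth row is dropped because it matches the first row exactly. With `Name` and `Type` as key fields, the second row is dropped as well, since its key combination has already been seen.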

### Use Case

Consider a scenario where data comes in from an *Excel Workbook Source* and the dataset contains duplicate records. We want to filter out the duplicates and create a new dataset that contains only distinct records. We can do this with the *Distinct* transformation object in Astera by specifying the fields whose combined values identify a duplicate as *Key Fields*.

To add a separate node for duplicate records inside the *Distinct* transformation object, we will check the *Add Duplicates Output* option. Then we will map both the distinct and duplicate outputs to *Delimited File Destination* objects.

### Using the Distinct Transformation

1. Drag and drop an [*Excel Workbook Source*](/astera-data-stack-v9/dataflows/sources/excel-workbook-source.md) from the *Toolbox* to the dataflow, since our source data is stored in an Excel file.
2. To apply the *Distinct* transformation to your source data, go to *Toolbox > Transformations > Distinct* and drag and drop the *Distinct* transformation object onto the dataflow. Then map the fields from the source object by dragging the top node of the *ExcelSource* object to the top node of the *Distinct* transformation object.

![](/files/IkW6Zb4bj5B3HxJkpGNf)

3. Now, right-click on the *Distinct* transformation object and select *Properties*. This will open the *Layout Builder* window, where you can add or remove fields and modify the object layout.

![](/files/Np4g6CBVMMfSFOjtjQq7)

4. Click *Next*. The *Distinct Transformation Properties* window will now open.

![](/files/jnHt3kFLhVlBX3NheMOs)

*Data Ordering*:

* *Data is Presorted on Key Fields:* Select this option if the incoming data is already sorted based on defined key fields.
* *Sort Incoming Data:* Select this option if your source data is unsorted and you want to sort it.
* *Work with Unsorted Data:* Select this option if you want the *Distinct* transformation object to process the incoming data as-is, without sorting it.
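The documentation does not describe how these options are implemented. As a rough mental model only, the sketch below shows three common deduplication strategies that such options often correspond to. All function names and the sample data are hypothetical, and the mapping to Astera's options is an assumption.

```python
from itertools import groupby
from operator import itemgetter


def distinct_presorted(records, key_fields):
    """Assumes equal keys are adjacent; compares only neighbouring records."""
    key = itemgetter(*key_fields)
    for _, group in groupby(records, key=key):
        yield next(group)  # keep the first record of each run


def distinct_sort_first(records, key_fields):
    """Sorts on the key fields, then reuses the presorted strategy."""
    key = itemgetter(*key_fields)
    return distinct_presorted(sorted(records, key=key), key_fields)


def distinct_unsorted(records, key_fields):
    """Remembers every key seen so far; input order is preserved."""
    key = itemgetter(*key_fields)
    seen = set()
    for record in records:
        k = key(record)
        if k not in seen:
            seen.add(k)
            yield record


# Hypothetical unsorted input with one repeated key combination.
rows = [
    {"Name": "b", "Type": 1},
    {"Name": "a", "Type": 1},
    {"Name": "b", "Type": 1},
]
keys = ["Name", "Type"]

unsorted_result = list(distinct_unsorted(rows, keys))
sorted_result = list(distinct_sort_first(rows, keys))
wrong_result = list(distinct_presorted(rows, keys))  # input is NOT actually sorted
```

In this sketch, treating unsorted input as presorted misses the non-adjacent duplicate, which is why the selected ordering option should match the actual state of the incoming data.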

5. In this window, add the fields that identify duplicate records under *Key Field*. Records are treated as duplicates when their combination of values in these fields matches.

{% hint style="info" %}
**Note:** In this case, we will specify the *Name* and *Type* fields as *Key Fields*.
{% endhint %}

![](/files/blDjH9JfHiak1niHaSt9)

You can now write the *Distinct* output to a destination object. In this case, we will write our output into a [*Delimited File Destination*](/astera-data-stack-v9/dataflows/destinations/delimited-file-destination.md) object.

![](/files/4VXQ5JVUvFvb7VrRUbhZ)

6. Right-click on the *Delimited File Destination* object and click *Preview Output*.

Your output will look like this:

![](/files/IppVcBAS8UUc36rBa3zB)

#### Adding Duplicate Records

1. To send duplicate records to a separate output, check the *Add Duplicates Output* option in the *Distinct Transformation Properties* window.

![](/files/vysgqH5ZnzU3yZapwjwK)

2. When you check *Add Duplicates Output*, three nodes will be added to the *Distinct* transformation object:
   1. *Input*
   2. *Output\_Distinct*
   3. *Output\_Duplicate*

{% hint style="info" %}
**Note:** When you check the *Add Duplicates Output* option, mappings from the source object to the *Distinct* transformation object will be removed.
{% endhint %}

![](/files/mI2b5UkL56wO9wRsoHOa)

3. Now, map the objects by dragging the top node of the *ExcelSource* object to the *Input* node of the *Distinct* transformation object.

![](/files/5XgoY5kDyfLuJvote7KS)

4. You can now write the *Output\_Distinct* and *Output\_Duplicate* nodes to two different destination objects. In this case, we will write each output to its own [*Delimited File Destination*](/astera-data-stack-v9/dataflows/destinations/delimited-file-destination.md) object.

![](/files/8OBG1JbnLkmNu1XYyY57)

Distinct output:

![](/files/XWIEfewkFFOvDFYFwF4r)

Duplicate output:

![](/files/9vazSeAllc3rsQfBNN8d)

As the previews show, the duplicate records have been successfully separated from your source data.
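For readers who want to reason about the two outputs in code, the following Python sketch splits a dataset into distinct and duplicate records and writes each to delimited text. It is an illustration under assumptions, not Astera's implementation: `split_distinct`, `to_delimited`, and the sample rows are hypothetical, and the sketch assumes the first occurrence of each key goes to the distinct output while later occurrences go to the duplicate output.

```python
import csv
import io


def split_distinct(records, key_fields):
    """Return (distinct, duplicates), keyed on the given fields."""
    seen = set()
    distinct, duplicates = [], []
    for record in records:
        key = tuple(record[f] for f in key_fields)
        if key in seen:
            duplicates.append(record)  # key already seen: duplicate output
        else:
            seen.add(key)
            distinct.append(record)    # first occurrence: distinct output
    return distinct, duplicates


def to_delimited(records, fieldnames):
    """Serialize records as comma-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample data, not taken from the documentation's screenshots.
rows = [
    {"Name": "Widget", "Type": "A"},
    {"Name": "Gadget", "Type": "B"},
    {"Name": "Widget", "Type": "A"},
]

distinct_rows, duplicate_rows = split_distinct(rows, ["Name", "Type"])
distinct_text = to_delimited(distinct_rows, ["Name", "Type"])
duplicate_text = to_delimited(duplicate_rows, ["Name", "Type"])
```

Every input record ends up in exactly one of the two lists, so the distinct and duplicate outputs together account for the full source dataset.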