# Creating a Data Profile

The *Data Profile* feature provides field-level statistics, at either a basic or a detailed level, including the data type, minimum/maximum values, data count, and error count. The statistics are collected for each of the selected fields when the dataflow runs.
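To make these statistics concrete, here is a minimal Python sketch of what a basic profile of a single field might compute. It is not Astera's implementation: the function name `profile_field`, the separate `null_count`, and the rule that a value counts as an error when it cannot be converted to the expected type are all assumptions made for illustration.

```python
from typing import Any, Iterable


def profile_field(values: Iterable[Any], expected_type: type) -> dict:
    """Collect basic statistics for one field (hypothetical helper, illustrative only)."""
    count = 0        # every value seen, valid or not
    null_count = 0   # missing values (assumption: tracked separately from errors)
    error_count = 0  # values that cannot be converted to expected_type
    valid = []       # successfully converted values, used for min/max

    for value in values:
        count += 1
        if value is None:
            null_count += 1
            continue
        try:
            valid.append(expected_type(value))
        except (TypeError, ValueError):
            error_count += 1

    return {
        "data_type": expected_type.__name__,
        "count": count,
        "null_count": null_count,
        "error_count": error_count,
        "min": min(valid) if valid else None,
        "max": max(valid) if valid else None,
    }


# One missing value and one unparseable value in an integer field.
stats = profile_field([42, "17", None, "n/a", 8], int)
print(stats)
```

In the product, these numbers are gathered automatically for every mapped field while the dataflow runs; the sketch only shows the kind of bookkeeping involved.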

{% embed url="<https://youtu.be/9_Dj3Qxp27Y>" %}

## **Using Data Profile**

In this case, we will use data from a *Customers* [*Database Table Source*](https://documentation.astera.com/v/astera-data-stack-v7/dataflows/sources/database-table-source).

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2FjSbbog1c8ZfDBf6QxsQM%2F0.png?alt=media)

We want to collect statistics on the fields in this table. For this purpose, we will use Astera's *Data Profile* feature.

1. To get the *Data Profile* object from the Toolbox, go to *Toolbox > Data Profiling > Data Profile*. If you are unable to see the Toolbox, go to *View > Toolbox* or press Ctrl + Alt + X.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2FTimNDKdiKi9yle4mmvY1%2F1.png?alt=media)

2. Drag and drop the *Data Profile* object onto the dataflow designer.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2F3MTx65OinWpCkY9xTkL5%2F2.png?alt=media)

You can see that the *Data Profile* object is empty right now. This is because we have not mapped any fields to it yet.

3. Auto-map the fields from the source object onto the profile object.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2F9SvR4m4qbTbmCNJwFaMl%2F3.png?alt=media)

{% hint style="info" %}
**Note:** A *Data Profile* object is designed to capture statistics for an entire field layout. For this reason, it should be linked to the main *Output* port of the object whose field statistics you wish to collect.
{% endhint %}

## **Configuring the Data Profile Object**

1. To configure the *Data Profile* object, right-click on its header and select *Properties* from the context menu.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2F261h8hkoeojTghSto12U%2F4.png?alt=media)

A configuration window will open. The first screen you will see is the *Layout Builder*. This is where we can create or delete fields and change field names and data types.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2FT9mwnWTYbgnjeAWHljOW%2F5.png?alt=media)

2. Click *Next* to go to the *Properties* window.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2FE2jm7NUOvwmTrb9nU1ea%2F6.png?alt=media)

3. Here, provide the *Profile File* path to specify where the profile should be stored.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2FV8SSeBn1IJpkFrsGIXP4%2F7.png?alt=media)

4. Specify the type of *Field Statistics* to be collected.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2F8iiE7scUJquSurDzsBce%2F8.png?alt=media)

The *Field Statistics* dropdown lets you choose how detailed the collected statistics should be. Select one of the following detail levels:

* *Basic Statistics*: This is the default mode. It captures the most common statistical measures for the field’s data type.
* *No Statistics*: No statistics are captured by the *Data Profile*.
* *Detailed Statistics – Case Sensitive Comparison*: Additional statistical measures, such as Mean, Mode, and Median, are captured by the *Data Profile*, using case-sensitive comparison for strings.
* *Detailed Statistics – Case-Insensitive Comparison*: Additional statistics are captured by the *Data Profile*, using case-insensitive comparison for strings.
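The difference between the two detailed modes can be illustrated with a short Python sketch. It shows how the comparison mode could change a string statistic such as the mode or the number of distinct values. The helper `string_mode`, the sample data, and the use of `casefold()` to ignore case are assumptions for illustration and do not describe how Astera computes its statistics internally.

```python
from collections import Counter


def string_mode(values, case_sensitive=True):
    """Return (most frequent value, frequency) for a list of strings.

    Hypothetical helper: with case_sensitive=False, values are folded to a
    common case before counting, so "Berlin" and "BERLIN" count as one value.
    """
    keys = values if case_sensitive else [v.casefold() for v in values]
    return Counter(keys).most_common(1)[0]


def distinct_count(values, case_sensitive=True):
    """Count distinct strings under the chosen comparison mode."""
    keys = values if case_sensitive else [v.casefold() for v in values]
    return len(set(keys))


cities = ["Berlin", "berlin", "BERLIN", "Paris", "Paris"]

print(string_mode(cities, case_sensitive=True))      # ('Paris', 2)
print(string_mode(cities, case_sensitive=False))     # ('berlin', 3)
print(distinct_count(cities, case_sensitive=True))   # 4
print(distinct_count(cities, case_sensitive=False))  # 2
```

If a field contains the same value in inconsistent casing, the case-insensitive option is likely to give a more meaningful mode and distinct count, while the case-sensitive option exposes the inconsistency itself.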

In this case, we select *Detailed Statistics – Case Sensitive Comparison*.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2F6ViIAaTpsVA0XKyxe2MK%2F9.png?alt=media)

Click *OK*.

## **Executing the Task**

1. After configuring the settings for the *Data Profile* object, click the *Start Dataflow* icon in the toolbar at the top of the window.

A *Job Progress* window will open and show you the trace of the job.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2FJDgnbBOB7WiqvdtU6yCJ%2F10.png?alt=media)

2. Click the *Profile* link in the *Job Progress* window to open the profile in Astera. Expand the *Profile* node to see each field inside the object, then click a field to see its collected statistical values.

![](https://627607815-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F6xzBT0roYJkfVS5klkLl%2Fuploads%2FzRsIyEtpAxkeOQENNuLY%2F11.png?alt=media)
