# Creating Data Profile

The *Data Profile* feature provides field-level statistics, at either a basic or a detailed level, including the data type, minimum and maximum values, data count, and error count. The statistics are collected for each of the selected fields when the dataflow runs.
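To make these statistics concrete, here is a minimal Python sketch of what a per-field profile could contain. It is an illustration only and not how Astera computes its profile: the function `profile_field`, the `null_count` entry, and the rule that an unparseable value counts as an error are all assumptions made for this example.

```python
def profile_field(values, parse=int):
    """Collect simple statistics for one field (illustrative only).

    `values` holds the raw strings read from the source.
    `parse` converts a raw string to the field's data type.
    Assumption: empty values count as nulls, and values that
    fail to parse count as errors.
    """
    parsed, nulls, errors = [], 0, 0
    for raw in values:
        if raw is None or raw == "":
            nulls += 1
            continue
        try:
            parsed.append(parse(raw))
        except ValueError:
            errors += 1
    return {
        "data_type": parse.__name__,
        "count": len(values),
        "null_count": nulls,
        "error_count": errors,
        "min": min(parsed) if parsed else None,
        "max": max(parsed) if parsed else None,
    }


# Hypothetical sample values for a numeric field.
ages = ["34", "27", "", "abc", "45"]
stats = profile_field(ages)
print(stats)
```

In this sketch, the empty string is counted as a null and `"abc"` as an error, so the minimum and maximum are taken over the three values that parse successfully.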

In this document, we will learn how to create a *Data Profile* in Astera.

### Video

{% embed url="<https://www.youtube.com/watch?v=qgd0HycKobo>" %}

### Using Data Profile

In this case, we will use data from a *Customers* [*Database Table Source*](https://documentation.astera.com/astera-data-stack-v10/dataflows/sources/database-table-source).

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FftnkHBygjV4ow0q7yStM%2F1_data.PNG?alt=media\&token=d27fc71b-9670-4bee-a30a-93713ea4848c)

We want to collect statistics on these fields. For this purpose, we will use Astera’s *Data Profile* feature.

1. To get the *Data Profile* object from the Toolbox, go to *Toolbox > Data Profiling > Data Profile*. If you are unable to see the Toolbox, go to *View > Toolbox* or press Ctrl + Alt + X.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FT4YlrmkC5URi7QHiAUj3%2F2_toolbox.png?alt=media\&token=7194889d-6864-496a-bc07-6ea8925706b0)

2. Drag and drop the *Data Profile* object onto the dataflow designer.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FcSnIL8GfRRWADjeZYeLk%2F3_object.png?alt=media\&token=2a3de451-1181-4ff6-b7f5-1edf94000a41)

You can see that the *Data Profile* object is empty right now. This is because we have not mapped any fields to it yet.

3. Auto-map the fields from the source object onto the profile object.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2Fu48ypzrSFVl1qfWmwxYN%2F4_mapping.PNG?alt=media\&token=248da425-9c0c-4380-8617-1fd028b02788)

{% hint style="info" %}
**Note:** A Data Profile object is designed to capture statistics for an entire field layout. For this reason, it should be linked to the main Output port of the object whose field statistics you wish to collect.
{% endhint %}

#### Configuring the Data Profile Object

1. To configure the *Data Profile* object, right-click on its header and select *Properties* from the context menu.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2Fp6HwSZHTj8F8RHDVqth5%2F5_properties.png?alt=media\&token=f41aef4c-4095-44d2-9c91-776a0b1f4b82)

A configuration window will open. The first screen you will see is the *Layout Builder*. This is where we can create or delete fields and change field names and data types.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FIX8pgBnwIt8KBoXg2obJ%2F6_layout_builder.PNG?alt=media\&token=b59dfa10-ad16-499f-b544-f27cafd7d552)

2. Click *Next* to go to the *Properties* window.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FSRf41oohlvR5l5xghG8a%2F7_config_window.PNG?alt=media\&token=ba0430d0-0de8-4f5b-92fe-77a38a07b9e5)

3. Here we will provide the *Profile File* path to specify where the profile should be stored.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FgR87cTUtBouZzyDORUo2%2F8_file_path.png?alt=media\&token=2fca788f-1e51-44ee-ba1d-957efb411465)

4. Specify the type of *Field Statistics* to be collected.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FjiJ6kpOoOfvrGBTOwQ5k%2F9_field_statistics.PNG?alt=media\&token=747e3f95-7cf8-4f2f-9cdd-6ab9d75a0172)

The *Field Statistics* dropdown lets you choose the level of detail of the statistics to collect. Select one of the following levels:

* *Basic Statistics*: This is the default mode. It captures the most common statistical measures for the field’s data type.
* *No Statistics*: No statistics are captured by the *Data Profile*.
* *Detailed Statistics – Case Sensitive Comparison*: Additional statistical measures, such as mean, mode, and median, are captured by the *Data Profile*, using case-sensitive comparison for strings.
* *Detailed Statistics – Case Insensitive Comparison*: The same additional statistical measures are captured by the *Data Profile*, using case-insensitive comparison for strings.
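The difference between the two detailed modes can be illustrated with a short Python sketch. It is not Astera's implementation: the function `string_stats` and the sample values are hypothetical, and the sketch assumes that case-insensitive comparison means values are compared after folding their case.

```python
from collections import Counter


def string_stats(values, case_sensitive=True):
    """Return the distinct count and the mode of a string field.

    Assumption: case-insensitive comparison is modeled by
    case-folding every value before counting.
    """
    keys = values if case_sensitive else [v.casefold() for v in values]
    counts = Counter(keys)
    mode, freq = counts.most_common(1)[0]
    return {"distinct": len(counts), "mode": mode, "mode_count": freq}


# Hypothetical sample values for a string field.
cities = ["Berlin", "berlin", "BERLIN", "Paris", "Paris"]

sensitive = string_stats(cities, case_sensitive=True)
insensitive = string_stats(cities, case_sensitive=False)

print(sensitive)
print(insensitive)
```

With case-sensitive comparison, the three spellings of Berlin are treated as different values, so the most frequent value is `Paris`. With case-insensitive comparison, they collapse into a single value that becomes the mode.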

In this case, we will select *Detailed Statistics – Case Sensitive Comparison*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F59F8guFZlNKDPkJefgq8%2F10.png?alt=media\&token=e0eea059-cb22-4326-918a-72fa6dd81b42)

Click *OK*.

#### Executing the Task

1. After configuring the settings for the *Data Profile* object, click on the *Start Dataflow* icon <img src="https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FbQprSmIbgFT72CrFZ0sA%2F11_%20icon.PNG?alt=media&#x26;token=483daf2c-6698-408c-9aad-54738363ad58" alt="" data-size="line"> from the toolbar located at the top of the window.

A *Job Progress* window will open and show you the trace of the job.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FWaeCIkoTStbpScOFVB98%2F12_job_progress.png?alt=media\&token=ce724b60-857e-4606-bc81-b73e4802bf98)

2. Click on the *Profile* link provided in the *Job Progress* window to open the profile in Astera. Expand the *Profile* node to see each field inside the object, then click on a field to see its collected statistical values.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F8p7f4SC0XH2FxFKGptR7%2F13_profile.PNG?alt=media\&token=d8f86bf1-f78f-47df-8880-56b5c18a2c15)
