# Data Cleanse Transformation

The *Data Cleanse* Transformation object is a new addition to Astera's library of transformations. It makes it all the more convenient for business users to cleanse raw data and present it in a more refined, standardized, and enterprise-ready format.

Using the *Data Cleanse* Transformation object, users can clean up data from null values and redundant text and characters, and prepare raw data for transformation, validation, profiling, and record-matching functions.

{% embed url="<https://youtu.be/qaJQEttqHdA>" %}

## **Using the Data Cleanse Transformation**

1. Retrieve the data you want to cleanse using the relevant [*Source* ](https://documentation.astera.com/v/astera-data-stack-v8/dataflows/sources/setting-up-sources)object.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2Fl9rS1X0aaI5FitvIzuMg%2Fimage.png?alt=media&#x26;token=46e2ee2c-cec2-46f8-a74d-ee1192d0b403" alt=""><figcaption></figcaption></figure>

1. Now drag the *Data Cleanse* Transformation object from the *Transformations* section in the *Toolbox* and drop it onto the designer.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2Fa5UK9iVU50Oe6i7pdDwX%2Fimage.png?alt=media&#x26;token=e4f69a38-c7e8-433e-8da5-29c493c6cf35" alt=""><figcaption></figcaption></figure>

1. This is what a *Data Cleanse* transformation object looks like.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FGNeoQcNutkALdAh943dS%2Fimage.png?alt=media&#x26;token=a9716f4e-1a35-4b23-bc0f-d7fb35f3eeb1" alt=""><figcaption></figcaption></figure>

1. Map data from the source object to the *Data Cleanse* transformation object. You can either auto-map the entire data set or map a few fields manually.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2F9ESRYYFXuwHswFEsz9KT%2Fimage.png?alt=media&#x26;token=cffccb28-419b-435a-865c-0013ec5406ba" alt=""><figcaption></figcaption></figure>

1. Now you have to specify the criteria for data cleansing. Right-click on the *Data Cleanse* transformation object and select *Properties* from the context menu.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FwU5BwpW95uIeAgOsM0u0%2Fimage.png?alt=media&#x26;token=7242158f-5a8d-4611-905e-3cc549ceb8ad" alt=""><figcaption></figcaption></figure>

1. This will open a new window where you have to set up the properties for data cleansing. The first window is the *Layout Builder* window. Here you can customize the layout of your dataset by adding, removing, or renaming fields. Once you have created the layout, click *Next* to proceed to the next window.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2Fv9TocYqrILh3kDkmM831%2Fimage.png?alt=media&#x26;token=618984ed-4e36-4d6b-99cf-bf431ffbd3aa" alt=""><figcaption></figcaption></figure>

1. This is where you set up the data cleanse criteria for your source data.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2F1oGgIPHUy2l5r2E94YUM%2Fimage.png?alt=media&#x26;token=7453513c-f943-429c-9ba3-d00df81ab914" alt=""><figcaption></figcaption></figure>

You can find various data cleanse options arranged in different sections. Let’s explore them one by one.

### **Remove**

The options provided within this category allow you to remove values, spaces, tabs, and line breaks from your data. You can find the following options within this category:

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FSEP0z3fd9lmmXTRMh8fC%2Fimage.png?alt=media&#x26;token=8e3f85d9-b232-4597-83b3-b711c7aaaafe" alt=""><figcaption></figcaption></figure>

* *All whitespaces* – Removes all whitespaces from the data
* *Leading and trailing whitespaces* – Removes whitespaces preceding and succeeding the values
* *Tabs and line breaks* – Removes tabs and line breaks within source values
* *Duplicate whitespaces* – Removes double spaces from the data
* *Letters* – Removes all alphabets from the data
* *Digits* – Removes all digits from the data
* *Punctuation* – Removes all punctuation from the data
* *Specified Character* – Removes any specific character from the data

### **Replace Nulls**

As the name suggests, the options within this category allow you to replace null values inside a string or numeric field with a corresponding value – blank in case of a string, and zero, in case of a numeric field.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2Fc013NOLI9vEhaodkXRWx%2Fimage.png?alt=media&#x26;token=e8c05420-cdc6-42a0-901c-7490aab3a829" alt=""><figcaption></figcaption></figure>

* *Null strings with blanks:* Replace all null strings with blanks
* *Null numerics with zeros*: Replace all null numeric values with zeros

## **Find and Replace**

The *Find and Replace* options enable users to replace a value in the source dataset with another value.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FIuasqatkwfbHsFOSxfAW%2Fimage.png?alt=media&#x26;token=8f6c425d-77f4-4f55-8d06-af371b02feec" alt=""><figcaption></figcaption></figure>

It also provides users the option to choose whether the find and replace function is to be performed on a case-sensitive basis. You can select a search mode from three options:

1. *Normal* – Will perform a normal find and replace function

As in this example, we want to change the status from ‘Planned’ to ‘Scheduled.’

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FzOaOSwknYGiB8wkOw7mM%2Fimage.png?alt=media&#x26;token=8a7d10ed-52b7-44d0-915a-19c517a95ae4" alt=""><figcaption></figcaption></figure>

So, we will type in ‘Planned’ in the *Find* field and ‘Scheduled’ in the *Replace* field.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FH65nEIaFDHBfcTHkZvah%2Fimage.png?alt=media&#x26;token=7d99c4c9-4eaf-4d7e-b613-a9d302b4ac0c" alt=""><figcaption></figcaption></figure>

Now, if we look at the output, we can see that the *Data Cleanse* transformation object has found and replaced the status values from ‘Planned’ to ‘Scheduled.’

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2F2DgxlzbU7t6KU0f3i3HD%2Fimage.png?alt=media&#x26;token=d758b7fc-c5fb-4e74-a1cb-69918945aa53" alt=""><figcaption></figcaption></figure>

2. *Extended –* Allows you to search for tabs(\t), newline(\r\n), or a character by its value (\o, \x, \b, \d, \t, \n, \r and \\) and replace with the desired value

In the example below, we want to replace whitespaces within our source values with a hyphen (-).

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2Fkb5gJqyopIlTyaLCOLke%2Fimage.png?alt=media&#x26;token=9c8e2a1c-3a40-49dd-8085-76cfae098e5b" alt=""><figcaption></figcaption></figure>

So, we will type ‘\s’ in the *Find* field and ‘-’ in the *Replace* field.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2F0NzWdDRk6Hyiz0x5VHoj%2Fimage.png?alt=media&#x26;token=29518715-b817-477c-b4ba-ebd438efe69f" alt=""><figcaption></figcaption></figure>

Now, if we look at the output, we can see that the *Data Cleanse* transformation object has found and replaced whitespaces from within the values with a hyphen.

Preview before applying the ‘’*Extended*’’ *Find and Replace* function

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2Fnbw3cVDafHZLARoUR64a%2Fimage.png?alt=media&#x26;token=23f73fcd-5886-46c4-b6bd-701a818e3ecf" alt=""><figcaption></figcaption></figure>

Preview after applying the *“Extended Find and Replace”* function

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2Fecq7h0lIsb2ydIInnTLi%2Fimage.png?alt=media&#x26;token=93a65424-c9e4-41cc-8168-a47cdf70dd4d" alt=""><figcaption></figcaption></figure>

3. *Regular Expressions* – Allows you to find and replace a value based on a regular expression

In the example below, we want to replace the “ALFKI” value(s) in the *CustomerID* field with “A1234”.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FqsRATZf5HnTD3A8MfYyq%2Fimage.png?alt=media&#x26;token=4cb64570-295f-4669-af30-a8344f74124a" alt=""><figcaption></figcaption></figure>

For this, we will write a regex in the Find field and the desired value in the Replace field.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FPs4YvSXMRZSrXimQ9Qxh%2Fimage.png?alt=media&#x26;token=3aa8356b-77fb-44f2-9e42-f49bbd095bd7" alt=""><figcaption></figcaption></figure>

Now, if we look at the preview, we can see that Astera has replaced values in the source data with the desired values.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2F8nVB27hwmqG5DEtGBxsf%2Fimage.png?alt=media&#x26;token=a5c5e2e9-a4b2-4203-a209-f0d442b21cc7" alt=""><figcaption></figcaption></figure>

### **Case**

*Case* options allow users to convert the letter case of source data to *Upper*, *Lower*, or *Title* case.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2F1Pq5rXytQRkbrfemSXmq%2Fimage.png?alt=media&#x26;token=ffc92b85-1f43-4e1e-bd86-5dd8b28c5429" alt=""><figcaption></figcaption></figure>

You can choose from the following options:

* *None* – Keeps the letter case as is

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2F97zMlLtTaoqpIYxOHFw9%2Fimage.png?alt=media&#x26;token=a13e62e0-11f0-4c10-bbbe-97033df58bb9" alt=""><figcaption></figcaption></figure>

* *Upper* – Changes all letters to upper-case

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2Fn9OqYLZ66f3TCsHv38Nv%2Fimage.png?alt=media&#x26;token=9e0a00d0-4acf-4da4-9aa4-ce85c185d447" alt=""><figcaption></figcaption></figure>

* *Lower* – Changes all letters to lower-case

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2Ftrl9uPH13udfG4hXIFXY%2Fimage.png?alt=media&#x26;token=556cc147-fe83-4203-9647-57b60ba5057c" alt=""><figcaption></figcaption></figure>

* *Title* – Changes all letters to title case

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FrYOisPfhsMIc137Rw8xE%2Fimage.png?alt=media&#x26;token=110e3a7c-e496-46d0-87c6-c2a016473233" alt=""><figcaption></figcaption></figure>

### **Modify Data**

The *Modify Data* option provides you the flexibility and convenience of applying an expression to all fields in your data. Check the *Run expression on all fields* option to activate this feature.

The *Run Expression on all fields* feature was previously called *ApplyToAll* and offered as a standalone transformation in Astera 7.5 and previous releases. It had a limited functionality though, compared to the all-new *Data Cleanse* Transformation object, which is why it has been replaced altogether with the *Data Cleanse* Transformation object in Astera Data Stack 7.6 and now in Astera 8.0.

The *Run Expression on all fields* feature is enabled by default for any existing flows created prior to Astera 7.6. This means that existing flows created on Astera 7.5, or a prior release will continue to work seamlessly on the 8.0 upgrade and won’t require any modification at all.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2Fp3KibKc6ldUSp5Pb937s%2Fimage.png?alt=media&#x26;token=f9851e92-9d39-4af4-97a5-78c51375d3c7" alt=""><figcaption></figcaption></figure>

Now click on this button to open the *Expression Builder*.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FoNcDrNeQgSuyidUYbI3a%2Fimage.png?alt=media&#x26;token=e5495e60-e6e9-4cd3-83b9-e06f537aca4f" alt=""><figcaption></figcaption></figure>

Here, you can choose from the extensive library of built-in expressions and apply it to all mapped fields by adding it to a “$FieldValue” parameter.

As in this example, we have mapped a regular expression to the “$FieldValue” parameter.

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FGV4JKwiWUFDILptuIgZU%2Fimage.png?alt=media&#x26;token=671a1bdb-2852-42ee-80e8-c2978d89af77" alt=""><figcaption></figcaption></figure>

Now if we look at the preview, you can see that Astera Data Stack has applied the regular expression to all fields and removed whitespaces from the values.

Preview before running the expression on all fields:

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FHv7bNUTawgqiGwwDW2PF%2Fimage.png?alt=media&#x26;token=23ceb5f1-3625-4862-bc59-f921561b8a71" alt=""><figcaption></figcaption></figure>

Preview after running the expression on all fields:

<figure><img src="https://750977703-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqHxyVNGb7tSdIWecl6Ru%2Fuploads%2FoEzmadqwGlFLaUPx07Jt%2Fimage.png?alt=media&#x26;token=839f3353-b42f-4aaa-836f-46a9b0cf2e96" alt=""><figcaption></figcaption></figure>

This function was previously performed using the [*ApplyToAll* transformation](https://documentation.astera.com/astera-data-stack-v8/dataflows/transformations/broken-reference) in Astera 7.5 and previous releases. However, in Astera 7.6 and Astera 8.0, you can perform this and other data cleanse tasks using the *Data Cleanse* Transformation object.
