Constant Value Transformation returns a single, predefined value for all records in a dataset.
In this example, we have an Excel worksheet containing Employees’ data. The data is being written to a database table that contains an additional field for storing department information. We want to pass a constant value to the Department field. To do this, we will use the Constant Value transformation object in Astera, mapping it to the Department field to append the department name to the final output.
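Conceptually, the Constant Value transformation simply stamps one predefined value onto every record. Here is a minimal pandas sketch of that behavior (the file name below is hypothetical):

```python
import pandas as pd

# Load the Employees worksheet (hypothetical file name)
employees = pd.read_excel("Employees.xlsx")

# Equivalent of mapping a Constant Value object to the Department field:
# every record receives the same predefined value
employees["Department"] = "Marketing"
```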
Drag-and-drop an Excel Workbook source from the Sources section in the Toolbox.
Select a destination object from the Destinations section in the Toolbox. This is where the transformed data will be written and stored. We will use a Database Table Destination.
Map the Employees dataset from the source object to the destination table, Employees_Database.
Note: There is an additional field (Department) in the destination object, but there is no such field in the incoming dataset. To append the Department field to the final output, we will use the Constant Value transformation object.
Now, drag-and-drop the Constant Value transformation object from Toolbox > Transformations > Constant Value.
Right-click on the Constant Value transformation object and select Properties.
The Constant Value Map Properties window will now open. Here you will see a Constant Value section where you can write any value to be appended to your output dataset.
In this case, the value will be ‘Marketing’, to specify the department of the employees in the source dataset.
Click Next. A General Options window will now open. Click OK.
General Options window:
This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages
When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors
When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
Your transformation object will now look like this:
Now map the Value field from the Constant Value object to the Department field in the Employees_Database destination object.
Right-click on the Employees_Database destination object and click Preview Output.
Your final output will look like this:
The Department field has been successfully appended to the final output, with the specified constant value written for all records, through the use of a Constant Value transformation object.
You can create many-to-one mappings with the help of a Denormalize transformation object in Astera. Denormalizing, also known as pivoting, allows you to combine a number of records into a single record (simply put, it brings data from rows to columns). It is useful for reducing the number of tables in the schema, which simplifies querying and can improve read performance.
The TaxInfo source data contains information about TaxType (City Tax, County Tax, State Tax, and Federal Tax), Tax Amount, and SSN (Social Security Number) of taxpayers.
We want to reduce the number of rows and create separate fields for City tax, County tax, State tax, and Federal tax.
Let’s see how we can use the Denormalize transformation object to achieve this.
First, we will use the Sort object to sort our source data based on the key field, SSN in our case.
Drag-and-drop the Denormalize transformation object from the Transformations section in the Toolbox.
Right-click on the Denormalize transformation object and select Properties from the context menu.
Following are the properties available for the Denormalize transformation object:
Layout Builder Window:
The Layout Builder window is used to add and/or remove fields, as well as to select their data type. The fields added in the Layout Builder will show in the Output node inside the object, as well as in all Input nodes corresponding to the number of mapping groups created (see below), with the exception of the key field(s).
Denormalize (Many-to-One) Transformation Properties Window:
Select Keys: Using the Select Keys dropdown, select the field or fields that uniquely identify the record. These keys will be used to match records between the normalized source and the denormalized destination.
Sort Input: Check this option only if values in the matching field (or fields) are not already sorted.
Driver Field Value: Enter the pivot values for your Denormalize transformation object. Using the example below, the pivot values would be City, State, Federal, and County.
Note: Entering Driver Field Values is required prior to mapping the Denormalize object. For each entry in the Driver Field Value grid, a new input mapping group is created in the object box.
General Options Window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
The Comments input allows you to enter comments associated with this object.
After you have configured the properties, click OK.
An Input mapping node will be created for each value previously specified in the Driver Field Value grid.
Map the fields and preview the output to view the denormalized data.
Transformations are used to perform a variety of operations on data as it moves through the dataflow pipeline. Astera Data Stack provides a full complement of built-in transformations, enabling you to build sophisticated data maps. Astera Data Stack transformations are divided into two types: record-level and set-level.
Record-level transformations are used to create derived values by applying a lookup, function, or expression to fields from a single record. Examples of record-level transformations include lookup, expression, and function transformations.
Set-level transformations, on the other hand, operate on a group of records and may result in joining, reordering, elimination, or aggregation of records. Set transformations include join, sort, route, distinct, etc. Data sources and destinations are also considered set transformations.
In Astera, data records flow between set transformations. Record level transformations are used to transform or augment individual fields during these movements.
A record level transformation can be connected to only one set transformation. For instance, a lookup or expression cannot receive input from two different set transformations.
Other than transformations that enable combining multiple data sources—such as join, merge, and union—transformations can receive input from only one set transformation. Transformations, however, can receive input from any number of record level transformations as long as all these record level transformations receive input from the same transformation.
The Distinct transformation object in Astera removes duplicate records from the incoming dataset. You can use all fields in the layout to identify duplicate records, or specify a subset of fields, also called key fields, whose combination of values will be used to filter out duplicates.
Consider a scenario where we have data coming in from an Excel Workbook Source and the dataset contains duplicate records. We want to filter out all the duplicate records from our source data and create a new dataset with distinct records from our source data. We can do this by using the Distinct transformation object in Astera Data Stack. To achieve this, we will specify data fields with duplicate records as Key Values.
In order to add a separate node for duplicate records inside the Distinct transformation object, we will check the option: Add Duplicate Records. Then we will map both distinct and duplicate outputs to a Delimited File Destination.
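As a point of reference, the distinct/duplicate split we are about to configure is equivalent to this short pandas sketch, assuming Name and Type as the key fields (the sample rows are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "Name":  ["Chai", "Chai", "Chang"],
    "Type":  ["Beverage", "Beverage", "Beverage"],
    "Price": [18, 18, 19],
})

keys = ["Name", "Type"]                               # Key Fields
output_distinct  = df.drop_duplicates(subset=keys)    # Output_Distinct node
output_duplicate = df[df.duplicated(subset=keys)]     # Output_Duplicate node
```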
Let’s see how to do that.
Drag-and-drop an Excel Workbook Source from the Toolbox to the dataflow as our source data is stored in an Excel file.
To apply the Distinct transformation to your source data, drag-and-drop the Distinct transformation object from Toolbox > Transformations > Distinct. Then, map the fields from the source object by dragging the top node of the ExcelSource object to the top node of the Distinct transformation object.
Now, right-click on the Distinct transformation object and select Properties. This will open the Layout Builder window where you can modify fields (add or remove fields) and the object layout.
Click Next. The Distinct Transformation Properties window will now open.
Data Ordering:
Data is Presorted on Key Fields: Select this option if the incoming data is already sorted based on defined key fields.
Sort Incoming Data: Select this option if your source data is unsorted and you want to sort it.
Work with Unsorted Data: When this option is selected, the Distinct transformation object will work with unsorted data.
On this window, the distinct function can be applied on the fields containing duplicate records by adding them under Key Field.
Note: In this case, we will specify the Name and Type fields as Key Fields.
You can now write the Distinct output to a destination object. In this case, we will write our output into a Delimited destination object.
Right-click on Delimited Destination object and click Preview Output.
Your output will look like this:
To add duplicate records to your dataset, check the Add Duplicate Records option in the Distinct Transformation Properties window.
When you check this option, three nodes will appear in the Distinct transformation object:
Input
Output_Distinct
Output_Duplicate
Note: When you check the Add Duplicate Records option, mappings from the source object to the Distinct transformation object will be removed.
Now, map the objects by dragging the top node of ExcelSource object to the Input node of the Distinct transformation object.
You can now write the Output_Distinct and Output_Duplicate nodes to two different destination objects. In this case, we will write our outputs to Delimited File Destination objects.
Distinct output:
Duplicate output:
As shown, the duplicate records have been successfully separated from your source data.
The Join transformation object joins records from two record sets. The join functionality is similar to standard SQL joins, but the distinguishing advantage of Astera's implementation is that you can join records from any two sources and not just two database tables.
This article covers how you can use Join transformation in Astera.
Suppose we have two database tables - Customers and Orders, as shown in the screenshot below, and we want to join these two tables.
Let’s see how we can join the two tables using the Join transformation object in Astera:
Drag-and-drop the Join transformation object from the Transformations section in the Toolbox. To open the Toolbox, go to View > Toolbox.
Map the fields from the source objects to the Join transformation object.
Note: To quickly add fields to the layout, drag-and-drop the bold node’s output port of the object whose layout you wish to replicate to the bold Join node of the Join object.
To set the properties for the Join transformation, double-click on the object or right-click and select Properties.
The first window is the Layout Builder window. You can manage the layout for your transformation (add or remove fields) from this window. Click Next to go to the next window.
The next window is the Relation Join Transformation Properties window. Select the Join Type from the drop-down menu. Astera supports four types of joins, sketched in code after the list below:
Inner Join – Joins records from two record sets based on matching values in key fields. Any unmatched records are discarded.
Left Outer Join – Similar to Inner Join, but unmatched records from the left record set (also called ‘first record set’) are preserved, and null values are written for the unmatched record in the right record set (also called ‘second record set’).
Right Outer Join – Similar to Inner Join, but unmatched records from the right record set (also called ‘second record set’) are preserved, and null values are written for the unmatched record in the left record set (also called ‘first record set’).
Full Outer Join - similar to Inner Join, but unmatched records from either record set are preserved, and null values are written for the unmatched record in the other record set.
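A minimal pandas sketch of the four join semantics above, using hypothetical Customers and Orders tables keyed on CustomerID:

```python
import pandas as pd

customers = pd.DataFrame({"CustomerID": [1, 2, 3], "ContactName": ["Ana", "Ben", "Cara"]})
orders    = pd.DataFrame({"CustomerID": [2, 3, 4], "OrderID": [10, 11, 12]})

inner = customers.merge(orders, on="CustomerID", how="inner")  # unmatched records discarded
left  = customers.merge(orders, on="CustomerID", how="left")   # unmatched left records kept, nulls on the right
right = customers.merge(orders, on="CustomerID", how="right")  # unmatched right records kept, nulls on the left
full  = customers.merge(orders, on="CustomerID", how="outer")  # unmatched records from both sides kept
```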
Other options in this window:
Join in Database: Check this option if you want to join the tables in the database.
Case Sensitivity: Check this option if you want a case sensitive match of the values in the key fields.
Sort (Left/Right) Input: Specify whether the left input, the right input, or both, need to be sorted.
Select the key fields from the Left Field and Right Field drop-down lists. Click Next, then OK.
Note: You can add multiple fields to create a composite key field.
You can now preview the output and see the consolidated data.
General Options window: This window consists of options common to most objects in a dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be outputted by the object. When this option is unchecked, records with errors will be outputted by the object, and a record message will be attached to the record. This record message can then feed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect statistics.
The Comments input allows you to enter comments associated with this object.
The Expression transformation object in Astera defines an expression or logic to process an incoming value (or values) and return a new set of values. Expressions can also be modified and used as variables for various other calculations.
The Expression transformation object uses an expression as the logic to transform data. You can write an expression of your own or use functions and operations from Astera’s extensive library of built-in functions, such as string manipulation, data conversion, and date and time manipulation. You can also perform various operations, such as mathematical calculations and comparisons, using the Expression transformation object.
In this example, we have a sample dataset, Customers, stored in an Excel file. The address information in this source is split into multiple fields, such as Address, Region, Country, and PostalCode. We want to concatenate the information in these fields and return it as a full address in a new data field. For this, we will use the Expression transformation object.
Retrieve your data in the dataflow by using one of the source objects from the Sources section in the Toolbox. In this example, we will work with an Excel Workbook Source.
Drag-and-drop the Expression transformation object from Toolbox>Transformations>Expression onto the designer.
Map the fields to be concatenated from the source object to the Expression transformation object. In this example, we have mapped the Address, Region, City, Country and PostalCode fields.
Now right-click on the Expression transformation object and select Properties from the context menu.
This will open the Layout Builder window where you can add or remove fields and modify your layout.
The following options are available in the Layout Builder window:
Name: This is where the field name is specified. You can change the name of existing fields if required.
Data Type: Specifies data type of the mapped fields.
Input: When checked, the field will be mapped as an input, with an input mapping port, to take data input from a source.
Output: When checked, the field will be mapped as an output. If an expression is present, the expression will be applied to this output.
Variable: Turns the field into a variable which can then be applied to other fields. These expressions are calculated first and then assigned to other expressions using it. Once a field turns into a Variable, it cannot be assigned for input or output mapping.
Expression: This is where the expression used for modifying the field or group of fields is specified.
Since we want to write the address details from multiple fields into a single field, let’s create a new field named Full_Address, and specify the Data Type as String and check the Output option.
To access the Expression Builder, click the “…” option next to the Expression box in the Layout Builder; the Expression Builder window will open. You will find the following options in the Expression Builder window:
Functions: An extensive library of built-in functions from where you can select any function according to your requirement.
Expressions: Here, you can write an expression rule or choose one from the built-in functions in Astera.
Objects: In this panel, you can find all the fields in your layout listed under the Expression node. You can double click on any field name to map it to your expression.
In this example, we can either use a concatenate function from the built-in functions or write an expression of our own to return the complete address information in a single field.
Address + ' ' + Region + ' ' + City + ' ' + Country + ' ' + PostalCode
Note: Once you’ve built the expression, click on the Compile button to check whether or not the expression was compiled successfully. If the Compile Status is ‘Successful’, click OK.
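For comparison, the same concatenation expressed in pandas; the file name is illustrative, and the fillna("") call guards against missing address parts:

```python
import pandas as pd

customers = pd.read_excel("Customers.xlsx")  # hypothetical file name

# Equivalent of the expression above: join the address parts with single spaces
parts = ["Address", "Region", "City", "Country", "PostalCode"]
customers["Full_Address"] = (
    customers[parts].fillna("").astype(str).agg(" ".join, axis=1)
)
```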
You can now see your expression appear in the Expression field. Click Next.
A General Options screen will now open where you can add Comments and specify other General Options. Once you’re through with these general settings, click OK.
General Options screen: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example, a destination file that will capture record messages or a log that will capture messages and collect statistics as well.
In the Comments input section, you can write comments related to the object.
To preview the output, right-click on the Expression transformation object and select Preview Output from the context menu.
Here’s a preview of the concatenated output:
You can now map your output to a destination or some other transformation in the dataflow. In this example, we will be writing our output to a Delimited File Destination.
You may rename your Destination object from the context menu options for this object. Here, we will rename it as Full_Address.
This concludes using the Expression transformation object in Astera.
The Filter transformation object in Astera is used to filter out records based on a pre-defined rule. The records that match the specified rule pass through and can be mapped further in the dataflow, whereas the records that do not satisfy the condition are omitted. The rule or logic used to filter records can either be chosen from an extensive library of built-in functions or written yourself.
The Filter transformation is quite similar to Data Quality Rules in functionality. However, unlike Data Quality Rules, which return an error or warning when a rule condition fails while still passing the record downstream, the Filter transformation completely removes such records. The filtered-out records, as well as their status, will not be accessible to any downstream object on the dataflow, including any type of log.
In this case, we have Customers data for a fictitious organization stored in a Delimited file source. We want to filter out the records in which:
Country = Germany
Contact title = Owner
To filter out these records from our source data, we will use the Filter transformation object and write the relevant expression in the Expression Builder to achieve our desired output. We will then write our filtered output to a Fixed-Length File Destination.
First, retrieve the source data using a Delimited File Source object. Next, drag-and-drop the Filter transformation object from the Transformations section of the Toolbox to the designer and map fields from the source object to the Filter transformation object.
Right-click on the Filter transformation object and select Properties.
A Layout builder window will now open. Here you can modify fields and the object layout.
Click Next. This will take you to the Filter Transformation Properties window. Here, you can see the following three sections:
Functions: An extensive library of built-in functions organized in various categories. From here, you can select functions according to your requirement.
Expression: The filter expression will appear in this Expression box. You can either write your own expression or choose from the built-in functions library.
Objects: Contains the object layout. You can double click on any element in the layout to write it in the Expression field.
In this example, we want to filter out records of customers with the ContactTitle, ‘Owner’, and Country, ‘Germany’. For this, we will write the following expression in the Expression box:
Country = "Germany" and ContactTitle = "Owner"
After writing the expression, click on the Compile button to check if the expression is correct. If the Compile Status is Successful, the expression is correct. If not, then you need to check and rectify your expression before proceeding to the next window.
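The same filter logic in a pandas sketch (the file name is illustrative); note that non-matching records are dropped entirely rather than flagged:

```python
import pandas as pd

customers = pd.read_csv("Customers.csv")  # delimited source; hypothetical file name

# Equivalent of: Country = "Germany" and ContactTitle = "Owner"
german_owners = customers[
    (customers["Country"] == "Germany") & (customers["ContactTitle"] == "Owner")
]
```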
Click Next. This will take you to the Config Parameters window where you can further configure and define parameters for the Filter transformation object.
Click Next to proceed to the General Options window. This window consists of options common to most objects in a dataflow, such as:
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be outputted by the object. When this option is unchecked, records with errors will be outputted by the object, and a record message will be attached to the record. This record message can then feed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect statistics.
The Comments input allows you to enter comments associated with this object.
The Filter transformation object has been configured. Click OK.
To preview filtered records, right-click on the Filter transformation object and select Preview Output.
The output would look like this:
You can now write your output to a destination or further transform it by applying some transformation.
Rename your destination object by double-clicking its header. Here, we will rename it as German_Customers.
This concludes using the Filter transformation object in Astera.
Merge transformation in Astera is designed to merge data fragments from disparate sources, based on some predefined logic, and present it in a consolidated form to draw actionable insights.
Let’s assume that an organization maintains customer data in two different departments, Marketing and Sales. The Marketing department stores information in a database table, and the Sales department maintains an Excel sheet for storing customer information. We want to merge the information from both sources so that we have consolidated data.
Note: In this case, the marketing department has the customer information stored in a database, whereas the sales department records customer information in an Excel file. Therefore, we will use a Database Table Source and an Excel Workbook Source as source objects.
Since the Merge transformation object accepts data from a single source only, we will first combine both record sets using the Union transformation object. We will then map fields from the data sources to the Union transformation object and add a new field, DataSource, to keep track of which information is coming from which source.
Drag the Merge transformation object from the Transformations section in the Toolbox and drop it on the dataflow designer.
This is what a Merge transformation object looks like:
Map the Union transformation object’s output to the Merge transformation object.
Right-click on the Merge transformation object to set up transformation properties in the Layout Builder window. This is what the Layout Builder window looks like:
In the Layout builder window, specify the Primary Key. This is a common identifier that identifies similar records from various sources and merges the information against these records.
(Since we are consolidating different customer records, we will set up CustomerID as the Primary Key in this case.)
Next, you have to specify the field to be used as Version. If your data is coming from multiple sources, the Version field shows which source the data is coming from in the final merged output. In this case, we will use the Data Source field we added in the Union transformation object as the Version field.
Next, specify the Survivor Type for each field. Survivor Type allows you to choose the survivor values – the values you want to retain from your data sources – for each field. Survivor Types are set as First by default. However, depending on your case, you can choose the Survivor Type from the following options:
Since CustomerID, CompanyName, and ContactName records are common in both the source files (Customers_Marketing and Customers_Sales), we will set the Survivor Type as First for these fields. For the other fields with missing records, the Survivor Type will be set as follows:
Once you have set the Survivor Type, specify Precedence for each field. Precedence is the order in which you want the source data to be assessed. For instance, we have common data fields in both the sources, but different and missing records. We can set appropriate Precedence values to bring data from the desired data source.
Next, you can set a specific Condition, and the Merge transformation will process records based on the criteria specified for a particular field.
(In this case, we have specified ‘IsNotNull’ for Address and Region fields since we want none of these fields to be empty or have missing records.)
Depending on the requirements of the business case, you can add a logical expression in the Expression field to process the incoming data value and transform it into the output according to the logic defined. The Expression field can be used for mathematical and financial calculations, date and time manipulations, comparisons and conversion functions.
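To make the survivorship rules concrete, here is a tiny sketch of the "First If Not Null" pattern for one field, with the candidate values ordered by Precedence (the values are hypothetical):

```python
def first_not_null(values):
    """Return the first non-null value, scanning sources in precedence order."""
    for value in values:
        if value is not None:
            return value
    return None

# Marketing (precedence 1) has no Address, so the Sales value survives
print(first_not_null([None, "12 Main St"]))  # -> 12 Main St
```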
Click Next to proceed to the Merge Transformation Properties window. Here, you will see the following three checkboxes:
Case Sensitive – Check if data is to be assessed on a case-sensitive basis
Sort Input – Check if the incoming data is not already sorted
Version Order Descending – Check if you want the data to be sorted in a descending version order
Click Next to proceed to the General Options window. Here, you can add Comments, instructions, or any relevant information about the transformation. This will not change or alter your transformation action in any way.
You may also skip this step by clicking OK in the previous step (on the Merge Transformation window) to close the Transformation Properties window.
To get the output, right-click on the Merge transformation object, and click on Preview Output. You will get the merged records based on your specified transformation properties.
Data Preview before applying Merge transformation:
Data Preview after applying Merge transformation:
Merge transformations can be applied in cases where data is spread across different records. Astera makes it extremely convenient for users to consolidate data that is stored in different sources, while also allowing them the flexibility to choose how the output should appear, through the various transformation properties.
The List Lookup transformation object is a type of lookup that stores information in the metadata, which means that your lookup data is stored in the dataflow itself. List Lookup uses a list of values for both the input and output fields. You can use it to look up certain values in your source data and replace them with other desired information. Alternatively, you can define a list of values in the lookup grid in the object’s properties, and each value is then looked up in the grid when you run your dataflow.
Let’s see how this object functions in Astera.
In this example, we are working with a Customers Fixed-Length File Source that contains customer information for a fictitious organization. The Customers data contains information about customers belonging to different countries. We want to convert the country names in this data into CountryCodes by using the List Lookup transformation object.
To preview the incoming data, right-click on the source object’s header and select Preview Output.
To start working, drag-and-drop the List Lookup object from Toolbox>Transformations>List Lookup.
This is what a List Lookup object looks like:
Map the field from the source dataset you want to look up values for, to the Value field in the List Lookup object.
Now, right-click on the List Lookup object and select Properties from the context menu. The List Lookup Map Properties window will now open.
Here, the first option is the Case Sensitive Lookup checkbox, which is checked by default. When this option is checked, the List Lookup will look up values on a case-sensitive basis. If you do not want to perform a case-sensitive lookup, you can uncheck this option.
Next, there is a table where we can specify the Source Value and the Destination Value. Source Values are the values from your source data, and Destination Values are the values that will replace them in the output.
For example, if we write the Destination Value ‘DE’ against the Source Value ‘Germany’, Astera will write ‘DE’ in place of ‘Germany’ in the output.
This is one way of specifying the lookup values. However, there can be a lot of source values, and typing them manually can be a tedious task. There is a more efficient way of doing this in Astera.
If you right-click on the List Lookup object, you can see that there is an option called Fill Lookup List with Unique Input Values.
Selecting this option prepopulates the source values in the Source Value column with unique source values.
Now, all you have to do is type in the Destination Values, that is, the codes corresponding to each country name.
Once you have populated the lookup list, click Next to proceed to the Lookup Options window.
In case the lookup field does not return any value for a given source value, one of the following options should be selected:
No Message – Will not mark the unmatched source value as an error or warning
Add Error – The List Lookup table will trigger an error for the records that found no match in the lookup field
Add Warning – The List Lookup will generate a warning and return a null value for records from the source that do not have any matches in the lookup table
Additionally, when the value is not found in the lookup list, you can choose from the following options to assign it a value:
Assign Source Value – Will return the original value from the source.
Assign Null – Will return a null value for each unmatched source record.
This Value – You can type in a specific value in the given field, and the List Lookup will return the same value for each unmatched source value.
In this example, we will add an error and return the source value if the lookup value is not found, so we will select the Add Error and Assign Source Value options. You can choose your preferred options and click OK.
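In code, the lookup grid behaves like a simple dictionary; this pandas sketch mirrors the Assign Source Value choice by falling back to the original value for unmatched records (the file name and code list are illustrative):

```python
import pandas as pd

customers = pd.read_fwf("Customers.txt")  # fixed-length source; hypothetical file name

# Source Value -> Destination Value pairs from the lookup grid (partial, hypothetical)
country_codes = {"Germany": "DE", "France": "FR", "Mexico": "MX"}

# map() yields NaN for unmatched values; filling with the original column
# reproduces the "Assign Source Value" behavior
customers["Country"] = customers["Country"].map(country_codes).fillna(customers["Country"])
```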
Now, if we preview the output, you can see that for each country name from the source table, the List Lookup has returned a corresponding code value.
These CountryCodes will flow through the annotated output port if you want to write your data to a destination.
This is how we can map the lookup values to a target or a transformation in the dataflow using the output port.
This concludes using the List Lookup transformation object in Astera.
The Aggregate transformation object provides the functionality to create aggregations of your dataset, using aggregate functions such as Sum, Count, First, Last, Min, Max, Average, Var or Standard Deviation. The dataset can be split into groups so that the aggregate value(s) can be generated for the group instead of the whole dataset. For example, calculate product count by month of year, or get average sales price by region and year.
Aggregate Transformation can be applied to unsorted data or data sorted on group by values. When applied to an input stream that is sorted on group by fields, Aggregate Transformation performs substantially better and consumes very little memory. Alternatively, when applied to unsorted datasets, Aggregate Transformation may consume substantial memory resources for large data sets and may slow down the performance of the server.
In this scenario, we have products data stored in a CSV file. The source file contains information such as the ProductID, SupplierID, and UnitPrice of the various products, the QuantityPerUnit available, and so on. We want to derive the following information from our source data:
Number of products per category
Total price of all the products per category
Minimum price per category
Maximum price per category
We will use the Aggregate Transformation object to derive the required information.
To work with the Aggregate Transformation, drag-and-drop the Aggregate Transformation object from Toolbox > Transformations > Aggregate.
Right-click on the transformation object and select Properties. The Layout builder window will now open.
Here, you can write names of fields that you want to map to the transformation object in the Name column and specify the relevant Aggregate Functions for them.
For this case:
CategoryID: We will select the Group-By option from the Aggregate Function drop-down list for this field as we want to group the records based on the available product categories.
ProductID: For this field we will select the Aggregate Function Count, in order to calculate the number of products per category.
UnitPrice: We will map this field thrice (the equivalent aggregation is sketched after this list):
To calculate TotalPricePerCategory, select the Sum function.
To calculate MaxPricePerCategory, select the Max function.
To calculate MinPricePerCategory, select the Min function.
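The layout above corresponds to the following pandas aggregation (the file name is illustrative):

```python
import pandas as pd

products = pd.read_csv("Products.csv")  # hypothetical file name

# Group by CategoryID and apply the aggregate functions chosen above
summary = products.groupby("CategoryID").agg(
    ProductCount=("ProductID", "count"),
    TotalPricePerCategory=("UnitPrice", "sum"),
    MaxPricePerCategory=("UnitPrice", "max"),
    MinPricePerCategory=("UnitPrice", "min"),
).reset_index()
```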
Click on Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: Select this option if the incoming data is already sorted on the specified Group-By fields.
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
Click on Next. The Config Parameters window will now open, where you can further configure and define parameters for the Aggregate transformation.
Click Next. This is the General Options window. Click OK.
General Options Window:
This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages
When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors
When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
The Comments input allows you to enter comments associated with this object.
After you have configured the properties, click OK.
You will see the fields in the object that were added in the Layout Builder window.
Map the data fields from the source object to the transformation object. You can auto-map the entire dataset from the source to the transformation object, or only map selected fields that you want to work with. In this case, we will map CategoryID, ProductID, and UnitPrice, as those are the fields we want to find aggregations for.
Note: The UnitPrice field has been mapped three times, as these mappings will determine TotalPricePerCategory, MaxPricePerCategory, and MinPricePerCategory.
Right-click on the Aggregate transformation object and click Preview Output.
You will see that the specified Aggregate Functions have been applied.
| Survivor Type | Description |
| --- | --- |
| First | Returns data from the first data source for that field |
| Last | Returns data from the last data source for that field |
| Maximum | Returns the maximum value from all available input data sources |
| Minimum | Returns the minimum value from all available input data sources |
| Count | Returns the total count of all values that exist in the field |
| Sum | Aggregates the values that exist in that field across all input sources and returns their arithmetic sum |
| Comma Separated Values | Separates the values that exist in that field in all the input sources with commas and returns that representation. This option is only available when the output field is assigned the 'String' Data Type. |
| Field | Survivor Type |
| --- | --- |
| ContactTitle | First |
| Address | First |
| City | First |
| Region | Last |
| PostalCode | First |
| Country | First |
| Phone | Last |
| Fax | Last |
| DataSource | Comma Separated Values |
The Reconcile Transformation object in Astera enables users to identify and reconcile new, updated, or deleted information entries within the existing data source. It can be applied in a wide variety of business scenarios that require a user to identify changes in multiple data records and capture them efficiently to drive critical business decisions.
Consider an example where we have a sample data of complaints filed by customers regarding the products and services provided by a company. Assume that source file 1 contains details and status of complaints on January 1st, 2018, and source file 2 contains details and status of complaints on February 1st, 2018. We want to track the progress of the resolved complaints during that one month.
To do so, we will reconcile the information contained in the source data files and capture changes using the Reconcile Transformation object.
Drag-and-drop the appropriate source objects and point them towards the files that you want to reconcile. In this example, we will be working with an Excel Workbook Source.
Drag-and-drop the Reconcile Transformation object from Toolbox> Transformations> Reconcile on the data flow designer.
This is what a Reconcile Transformation object looks like:
You can see the transformation object contains three child nodes (Output, Input_1, and Input_2) under the parent node, Reconcile.
Expand the input nodes to map fields from the source files.
Map the data fields from the source objects that you want to reconcile to the respective input node in the Reconcile Transformation object.
Right click on the Reconcile Transformation object’s header and select Properties.
This will open the Reconcile Transformation Properties window where you will see the following options:
Case Sensitive – Check this option if you want to derive a case-sensitive output
Sort Input 1 – Check this option if the incoming data from source 1 is not sorted
Sort Input 2 – Check this option if the incoming data from source 2 is not sorted
You can choose the Reconcile Output Type from the following options:
Side By Side Element With Change Flag – If you want to get values from both sources presented side by side, with a separate column presenting the reconciled output as a flag: true in case of an update, and false if the record remains unchanged.
Original Layout – If you want to get the reconciled output for each record and corresponding information in the reconciled field.
Original Layout With Changed Element Collection – Applies when working with hierarchical data, to reconcile the information contained in child nodes.
Once you have selected the preferred Output Type, you can specify the records to be shown in the output by applying the Record Filter and Inner Node Filter. You may choose one, multiple, or all of the available options by checking the corresponding boxes.
Click Next to proceed to the Layout Builder window. Here you will have to specify a Key. This will be the common identifier in both the source files that will identify and reconcile records. In this case, we want to reconcile the progress on complaints made against each complaint_ID; therefore, we will select Complaint_ID as our Key.
Now go to the Survivor Value drop-down list to set the Survivor Value for each data field. Survivor Values are the values from your source datasets which you want to retain in the output.
You may select from the following Survivor Value options:
Second – If you want to derive the output value from the second source
First – If you want to derive the output value from the first source
First If Not Null, Otherwise Second – If you want to output a value from the first source if the record is not null, otherwise from the second source.
Second If not Null, Otherwise First – If you want to output a value from the second source if it is not null, otherwise from the first source.
Higher – If the input values are integers, and you want to choose the higher value
Lower – If the input values are integers, and you want to select the lower value
Expression – If you want to derive the output value based on a formula expression
Note: You will only need to specify the Survivor Value if you want to get the Original Layout or Original Layout With Changed Element Collection as output. The Survivor Value option does not apply if you want to get Side by Side Element with Change Flag as your output, since both of the source values are retained when this option is selected.
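As a rough illustration of the Side By Side Element With Change Flag output, this pandas sketch joins the two snapshots on the key and flags updated values (field names follow the example; the data is hypothetical):

```python
import pandas as pd

jan = pd.DataFrame({"Complaint_ID": [101, 102], "Status": ["Open", "Open"]})
feb = pd.DataFrame({"Complaint_ID": [101, 102], "Status": ["Resolved", "Open"]})

side_by_side = jan.merge(feb, on="Complaint_ID", suffixes=("_1", "_2"))
# Change flag: True where the value was updated between the snapshots
side_by_side["Status_Changed"] = side_by_side["Status_1"] != side_by_side["Status_2"]
```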
Click Next to proceed to the General Options window, then click OK.
General Options window - This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages - When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors - When this option is checked, records with errors will not be outputted by the object. When this option is unchecked, records with errors will be outputted by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect their statistics.
Now, right-click on the Reconcile Transformation object’s header and select Preview Output to get the reconciled output.
You will get one of the following outputs according to the output type selected in the Reconcile Transformation Properties window.
Side by Side Element with Change Flag
Original Layout
Original Layout With Changed Element Collection
Reconcile Transformation objects can be applied in a variety of business cases, particularly those where monitoring the changes in assorted data records is crucial in driving critical business decisions. Here are some of the benefits and uses of the Reconcile Transformation object:
Reconciles data by deriving old and new values for specific fields in the source data
Allows users to choose from various layout options to reconcile changes in the most appropriate way
Works effectively with structured and unstructured (hierarchical) data formats
Offers the flexibility to select the information to be retained through different survivor value options
The Subflow transformation object is used to call a Subflow that will run as part of your Dataflow. The Subflow acts like a wrapper for the objects it contains. Subflows can be seen as ‘black boxes’ inside your Dataflow: they simplify and streamline the Dataflow design, increase reusability, make the Dataflow easier to understand, and can eliminate the need to know what is going on inside the Subflow so that you can focus on the output it creates. Over time, if the logic inside your Subflow changes, you can modify the Subflow, and the modified version will be used by any Dataflow that calls it.
Subflows can be nested, meaning that a Subflow can call other Subflows. The output of the Subflow can be fed into downstream objects on your Dataflow, just like the output of any Dataflow object.
To add a Subflow transformation object, drag the Subflow object from the Transformations section in the Toolbox and drop it on to the dataflow designer.
An example of what a Subflow object might look like is shown below.
To configure the properties of a Subflow object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Subflow Properties window:
Enter the file path of your subflow in the Path input. Using UNC paths is recommended to allow for remote execution of your dataflow on a server.
General Options Window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object, and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be outputted by the object. When unchecked, records with errors will be outputted by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect their statistics.
The Comments input allows you to enter comments associated with this object.
Creating a Subflow is similar to creating a regular Dataflow, because a Subflow is essentially a Dataflow or a sub-Dataflow. The difference between the two, however, is that a Subflow may optionally have an input and output.
The Subflow input makes it possible to feed data into the Subflow from an upstream object on the Dataflow that calls the Subflow. The Subflow’s output is used to send data to the downstream Dataflow object connected to the Subflow.
To create a new Subflow, go to File > New > Subflow on the main menu.
Designing a Subflow is similar to designing a Dataflow. For more information on working with Dataflows, see the Creating Dataflow chapter.
When a Subflow tab is active, the flow Toolbox has an additional group labeled Subflow. This group has two objects that control the input and output properties of your subflow.
Subflow Input object is a connector controlling the input layout of your Subflow. Any data feeding into the Subflow will pass through the Subflow Input when the Subflow is called by a Dataflow or another Subflow.
To add the Subflow Input, drag the Subflow Input object from the Subflow group in the Toolbox and drop it on to the Subflow designer.
To configure the properties of a Subflow Input object after it is added to the Subflow, right-click on it and select Properties from the context menu. The following properties are available:
Layout Builder screen:
The Meta Object Builder screen allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Meta Object Builder will show in the SubflowInput1 node inside the object box.
General Options screen:
This screen shares the options common to most objects on the dataflow.
Subflow Output object is a connector controlling the output layout of your subflow. Any data leaving the subflow will pass through the Subflow Output when the subflow is called by a dataflow or another subflow.
To add the subflow output, drag the Subflow Output object from the Subflow group in the Flow toolbox and drop it on to the subflow.
To configure the properties of a Subflow Output object after it was added to the Subflow, right-click on it and select Properties from the context menu. The following properties are available:
Layout Builder window:
The Meta Object Builder window allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Meta Object Builder will show in the SubflowOutput1 node inside the object box.
General Options window:
This screen shares the options common to most objects on the Dataflow.
Some examples of using Subflows are shown below:
The Passthru Transformation object creates a new dataset based on the elements that were passed to the transformation. This is useful for organizing datasets for better readability and grouping of values that are otherwise calculated over and over again (e.g. a Sequence Generator Transformation).
In this document, we will learn how to use the Passthru Transformation object in Astera.
In this case, we are using an XML/JSON File Source.
The source file contains customers’ information in the parent node and their order and shipping details in the collection/child node.
Preview the data by right-clicking on the source object’s header and selecting Preview Output.
A Data Preview window will open, showing you the preview of the hierarchical data.
Now, we want to create a field in the parent node that contains the count of orders that arrived late for each customer and write these records to a destination file. This new field in the parent node will depend on two fields, RequiredDate and ShippedDate, that are already present in the collection/child node.
For this purpose, we will use the Passthru Transformation object.
To get a Passthru Transformation object from the Toolbox, go to Toolbox > Transformations > Passthru and drag-and-drop the Passthru object onto the designer.
You can see that the dragged transformation object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map source fields to the transformation object by dragging-and-dropping the top node of the source object, SampleData to the top node of transformation object, Passthru.
Now the mapping is done. Let’s configure the Passthru Transformation object.
Right-click on the transformation object’s header and select Properties from the context menu. A configuration window for the Passthru Transformation object will open. The first window is the Layout Builder window. This is where we can create or delete fields, change their name or data types, mark any field as a variable, or attach expressions to any field.
Create a new field under the parent node. Let’s call it LateOrders and mark it as an Output field.
Click Next, and you will be directed to a Layout Modifications window.
Select the LateOrders field. Here you can modify any field by applying expressions to it.
Write the expression that counts the number of late-arriving orders. Click the “…” button located next to the expression box to open the Expression Builder window, where you can make use of Astera’s extensive library of built-in functions and expressions.
Click on the Orders node and a panel (Options for Collection) will appear. These options are only available for collection nodes. This is where we will specify the rule for routing late arriving orders. Select Route Based on Rules and a new section for adding rules will appear on the screen.
Add a new rule by clicking on the Add switch condition icon.
Now, write an expression to route late-arriving orders and name this rule “LateArrivals”.
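On flat data, the routing rule and the LateOrders count boil down to the following pandas sketch (column names follow the example; extraction of the hierarchical source is omitted, and the sample rows are hypothetical):

```python
import pandas as pd

orders = pd.DataFrame({
    "CustomerID":   ["ALFKI", "ALFKI", "ANATR"],
    "RequiredDate": pd.to_datetime(["2024-01-05", "2024-01-10", "2024-01-07"]),
    "ShippedDate":  pd.to_datetime(["2024-01-08", "2024-01-09", "2024-01-06"]),
})

# Rule "LateArrivals": the order shipped after its required date
late_arrivals = orders[orders["ShippedDate"] > orders["RequiredDate"]]

# LateOrders field on the parent node: count of late orders per customer
late_orders = late_arrivals.groupby("CustomerID").size().rename("LateOrders")
```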
Click OK. Now observe that a new collection node, Orders_LateArrivals, has been added to the Passthru Transformation object.
To preview data, right-click on the header of the transformation object and select Preview Output from the context menu.
A Data Preview window will open. On expanding records, you will get corresponding order details and appended details of late arriving orders.
To store the output for late arriving orders, write it to a destination file.
Right-click on the collection node, Orders_LateArrivals, and go to Write to > Excel Workbook Destination. An Excel Workbook Destination object will be added to the dataflow designer with auto-mapped fields from the collection field.
Configure settings for Excel Workbook Destination object.
Click on the Start Dataflow icon located in the toolbar at the top of the designer window to create the destination file.
Upon clicking this icon, an Excel file will be created. You can find the link to this file in the Job Progress window.
The Switch Transformation object matches source data against criteria specified by the user and, wherever the criteria are met, replaces the information in the particular field with the desired output (also specified in the layout). This gives users more control over their data and helps them manage it better.
There are two modes in Switch transformation:
Basic Mode
Enhanced Mode
In Basic Mode, the Switch transformation object matches specific values in the source data and replaces them with the desired output. Enhanced Mode enables users to set lookup criteria by writing expressions, which makes the feature more flexible.
Select your source by dragging the relevant object from the Sources section in the Toolbox onto the dataflow designer and configure the connection with the relevant details.
Note: In this example, we are working with an Excel Workbook Source that contains employee information for a fictitious organization, but you can select the source type from a wide range of options provided in Astera.
After setting up the connection and configuring the source file, drag the Switch transformation object from the Toolbox. If the Toolbox is hidden, go to View > Toolbox > Transformation > Switch.
Map the required fields from the source to the Switch transformation object.
Either double-click on the Switch Transformation object to open the Properties window or right-click on the object and go to Properties from the list.
The first window is the Layout Builder window. Here you can manage the fields (add and/or remove the fields) to make your Switch field layout.
Note: Switch Transformation only allows one output field. If you check the output box next to multiple fields, Astera will show a verification error.
After specifying the layout and selecting the relevant output field, click Next. This will take you to the Switch Map Properties window. At this stage, you can select the mode of the Switch transformation and assign the rules in the Case Value and Output Value sections.
Astera will look for the values specified in the Case Value column in the source file and replace them with the corresponding values in the Output Value column.
Note: Switch transformation only allows one default value. If you select multiple default values, Astera will give a verification error.
In this example, the source table stores department information as numbers. We will use the Switch transformation object in Basic Mode to switch the stored numeric information with descriptive information, as shown in the sketch after the previews below.
Data Preview (Before Switch)
Data Preview (After Switch)
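Conceptually, the Basic Mode lookup shown above is a simple value map with a single default. Below is a minimal Python sketch of this logic; the department codes and names are illustrative assumptions, not the actual sample data.

```python
# Conceptual sketch of Switch (Basic Mode): replace coded values
# with descriptive ones, falling back to a single default value.
# The department codes and names below are illustrative.
CASE_MAP = {1: "Marketing", 2: "Sales", 3: "Finance", 4: "Human Resources"}
DEFAULT = "Unknown"

def switch_basic(records, field="Department"):
    for record in records:
        record[field] = CASE_MAP.get(record[field], DEFAULT)
        yield record

employees = [{"Name": "Jane", "Department": 2}, {"Name": "Ali", "Department": 9}]
print(list(switch_basic(employees)))
# [{'Name': 'Jane', 'Department': 'Sales'}, {'Name': 'Ali', 'Department': 'Unknown'}]
```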
Steps 1-5 are going to remain the same even when working with the Enhanced Mode in Astera.
After you have created the layout in the Layout Builder window in the object’s properties, click Next and go to the Switch Map Properties window and select Enhanced Mode.
Note: Switch transformation is not field-specific in Enhanced Mode; therefore, the option for selecting the Switch Field is disabled.
An organization stores information about employees’ salaries. It has set criteria for issuing credit cards, which depend on an individual’s salary. In this scenario, to see which individual is eligible for which perk, define the salary range in the Case Expression field and specify the corresponding output in the Output Expression section (see the screenshot above). To store the information in a separate field, we created a new field (CreditCard) in the Layout Builder and selected it as the Output.
Data Preview (Before Switch)
Data Preview (After Switch)
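Under the hood, Enhanced Mode amounts to evaluating ordered case expressions and returning the first matching output. Here is a minimal Python sketch; the salary thresholds and card names are hypothetical assumptions for illustration.

```python
# Conceptual sketch of Switch (Enhanced Mode): evaluate ordered case
# expressions and return the matching output expression's value.
# The salary thresholds and card names are illustrative assumptions.
RULES = [
    (lambda r: r["Salary"] >= 100_000, "Platinum"),
    (lambda r: r["Salary"] >= 50_000, "Gold"),
    (lambda r: r["Salary"] >= 25_000, "Silver"),
]

def assign_credit_card(record, default="Not Eligible"):
    for case_expr, output in RULES:
        if case_expr(record):  # first matching rule wins
            return output
    return default

print(assign_credit_card({"Name": "Jane", "Salary": 60_000}))  # Gold
```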
The Tree Transform object in Astera enables users to transform data in a hierarchical structure. Users can create new fields in the parent node based on the data in the collection/child node. Tree Transform supports rule-based routing, filtering, merging and sorting of data while maintaining its hierarchy.
In this document, we will learn to use the Tree Transform object in Astera.
In this case, we are using the XML/JSON File Source to extract our source data. You can download this sample data from here.
The source file contains customers’ information in the parent node and their orders and shipping details in the collection/child node.
You can preview this data by right-clicking on the source object’s header > Preview Output.
A Data Preview window will open, displaying a preview of the hierarchical source data.
Now, we want to create a new field in the parent node that contains the count of orders that arrived late for each customer and route these records to a destination file while maintaining the hierarchical format of the dataset.
This new field in the parent node will depend on two fields: RequiredDate and ShippedDate, that are already present in the collection/child node.
In other words, we are trying to transform hierarchical data without flattening its structure. For this purpose, we will use the Tree Transform object.
To get a Tree Transform object from the Toolbox, go to Toolbox > Transformations > Tree Transform and drag-and-drop the Tree Transform object onto the designer.
The transformation object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map source fields onto the transformation by dragging and dropping the root node of the source object CustomerOrders onto the root node of the transformation object – TreeXform.
Now that the mapping is done, let’s configure the TreeXform object.
Right-click on the object’s header and select Properties from the context menu. A configuration window for Tree Transform will open. The first window is the Layout Builder window. This is where we can create or delete fields, change field names or data types, mark any field as a variable, or attach expressions to fields.
Create a new field under the parent node. Let’s name it LateOrders.
Click Next, and you will be directed to a Layout Modifications window.
Select the LateOrders field. Here you can modify any field by applying expressions to it.
Write the expression that counts the number of late arriving orders.
Click “…” next to the expression box to open the Expression Builder where you can make use of the extensive library of built-in functions and expressions in Astera.
This creates the field values for LateOrders based on the field values of RequiredDate and ShippedDate fields.
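As a conceptual illustration, the expression boils down to counting child orders whose ShippedDate falls after their RequiredDate. Here is a minimal Python sketch with an illustrative record; the values are assumptions, not the actual sample data.

```python
from datetime import date

# Conceptual sketch of the LateOrders calculation: for each customer
# (parent), count child orders whose ShippedDate falls after the
# RequiredDate. The sample record below is illustrative.
def late_orders(customer):
    return sum(
        1
        for order in customer["Orders"]
        if order["ShippedDate"] > order["RequiredDate"]
    )

customer = {
    "CustomerID": "ALFKI",
    "Orders": [
        {"RequiredDate": date(2024, 1, 10), "ShippedDate": date(2024, 1, 12)},
        {"RequiredDate": date(2024, 2, 1), "ShippedDate": date(2024, 1, 30)},
    ],
}
customer["LateOrders"] = late_orders(customer)
print(customer["LateOrders"])  # 1
```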
Click on the Orders node and a panel (Options for Collection) will appear. These options are only available for collection nodes.
Options for Collection:
Show – Shows the collection node in the Tree Transform object.
Flatten With Item Count – Flattens data based on the record count in the collection node against each item in the parent node.
Flatten Based on Rules – Flattens a part of hierarchical data based on predefined rules.
Route Based on Rules – Routes and creates subsets of data based on predefined rules.
Merge With Item Count – Merges data based on the record count in the collection node against each item in the parent node.
Hide – Hides the collection node from the Tree Transform object.
Calculation Formula – An expression box used for writing rules to route or flatten hierarchical data.
Sort Based On Keys – Sorts hierarchical data based on the field in the collection node. Only available for Show and Flatten With Item Count options.
This is where we will specify the rule for routing late arriving orders. Select Route Based on Rules. A new section for adding rules will appear in the window.
Now, write an expression to route late arriving orders and name this rule “LateArrivals”.
Click OK. Now observe that a new collection node Orders_LateArrivals has been added to the Tree Transform object.
To preview data, right-click on the header of the transformation object and select Preview Output from the context menu.
A Data Preview window will open. On expanding the records, you will get corresponding order details and appended details of late arriving orders.
To store the output for late arriving orders, you can write it to a destination file or use that data further in the dataflow.
This concludes using the Tree Transform transformation object in Astera.
The Sort Transformation object in Astera is used to sort an incoming data stream. It also provides the option to remove duplicate values from the input.
It is a blocking transformation, which means that input records are accumulated until the end of input. Blocking transformations affect the performance of the overall dataflow because subsequent steps cannot be executed until all the records have been received and processed by the blocking transformation.
The Sort Transformation uses storage on the server for temporary data during sorting. The server must have enough capacity to store the entire data set and index.
We have retrieved the OrderDetails data from a database table. The dataset contains fields such as OrderID, ProductID, UnitPrice, Quantity, and Discount. This data is unsorted, and we want to sort it in ascending order of UnitPrice.
Drag the Sort Transformation object from the Transformations section in the Toolbox and drop it on the dataflow designer.
Map fields from the source object to the Sort Transformation object.
To configure the properties of the Sort Transformation object, right-click on its header and select Properties from the context menu.
A Layout Builder window will appear.
In this window you can either:
Add Member Objects or Collection Objects to the layout.
Edit the elements of the Sort object. The Layout Builder allows you to add or remove fields in the layout, as well as select their data type. The fields added in the Layout Builder will be added to the Input node inside the object box. Once you’re done making changes to the layout, click Next.
The next window is the Sort Transformation Properties window.
Here, you can specify the sorting criteria. You will see the following options on this screen:
Return Distinct Values Only: Check this option if you want to remove duplicate values from the output.
Treat Null as the Lowest Value: Check this option if you want a null value to be returned first in the ascending sort order, and conversely, have the null value returned last in the descending sort order.
Case Sensitive: Check this option if you require case sensitive comparison for strings.
On the same screen, you need to select the sorting Field from the drop-down list and set the Sort Order as Ascending or Descending.
Note: In this case, the sorting Field is UnitPrice and the Sort Order is Ascending.
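For reference, these sorting options map onto familiar programming constructs. Below is a minimal Python sketch of the Sort logic, assuming records are dictionaries and using the field names from this example.

```python
# Conceptual sketch of the Sort options: ascending sort on UnitPrice,
# optionally de-duplicating and treating nulls (None) as the lowest value.
def sort_records(records, key="UnitPrice", descending=False,
                 distinct=False, null_lowest=True):
    if distinct:
        # Return Distinct Values Only: keep the first occurrence of each record
        seen, unique = set(), []
        for r in records:
            t = tuple(sorted(r.items()))
            if t not in seen:
                seen.add(t)
                unique.append(r)
        records = unique

    def sort_key(r):
        v = r[key]
        # (False, ...) sorts before (True, ...), so None comes first
        # in ascending order when null_lowest is set
        return (v is not None, v) if null_lowest else (v is None, v)

    return sorted(records, key=sort_key, reverse=descending)

rows = [{"OrderID": 1, "UnitPrice": 14.0}, {"OrderID": 2, "UnitPrice": None},
        {"OrderID": 3, "UnitPrice": 9.5}]
print(sort_records(rows))
```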
The last window is the General Options window. Here you can add Comments or specify some General Options. Once done, click OK and the window will close.
You can now map the Sort Transformation object to a destination and preview the output.
Note: In this case we will write the data to an Excel Workbook Destination.
The output now shows the entire source data sorted in the ascending order of UnitPrice.
This is how the Sort Transformation can be used in Astera.
The Normalize transformation object in Astera Data Stack is used to create one-to-many mappings. It allows users to create multiple records from a single record by transposing the columns in a dataset into rows. In other words, you can take a dataset that has many columns and turn it into one that has many rows.
In this use case, we have a sample Taxpayers Excel dataset that contains information on the types and amounts of taxes paid by taxpayers. This includes taxpayers’ Social Security Number (SSN) and the different types of taxes that they have paid. These types are divided into different fields, such as City, County, State, and Federal, with each column containing the amount paid by each customer for a particular tax type. Our goal is to reduce the number of fields and increase the number of records by specifying the tax type in one consolidated field. To do this, we will use the Normalize object in Astera.
Drag the relevant source object from the Toolbox and drop it onto the designer. In this case, we will select the Excel Workbook Source object from Toolbox > Sources > Excel Workbook Source and configure it so that it reads data from the Taxpayers’ dataset.
To preview the data, right-click on the object header and select Preview Output from the context menu. Here is a look at the dataset:
To learn how you can configure an Excel Workbook Source object, click here.
Drag the Normalize object from Toolbox > Transformations > Normalize and drop it onto the designer.
You will notice that the object header contains one Output node and two Input nodes by default.
Any field mapped as a new member to one Input node will appear in all of the input nodes as well as the Output node. In this case, we will map the SSN field from the source object to an Input node.
Right-click on the header of the Normalize object and select Properties from the context menu.
A dialogue box will appear.
This dialogue box is used to configure the Normalize object.
In the Layout Builder window, create the layout of your normalized dataset by providing field names. In this case, we have already mapped SSN from the source and will create two new fields, one for the TaxAmount and the other for the TaxType.
Once you have created the layout, click Next.
In the Normalize (One to Many) Transformation Properties window, make appropriate selections for the following options:
Number of Mapping Groups: Here, you can specify the number of mapping groups that are required. Increasing this number from 2 will also increase the number of Input nodes in the object. In this case, there are four tax types. Hence, we will increase the number to 4.
Omit Record If this Element is Null: From this drop-down menu, you can select a field from your layout. If an element in this field is null, the entire record containing that null element will be omitted from the dataset. In this case, we will keep the default option, which denotes that this option will not apply to any field.
Once you have made the required selections, click Next.
On the last window, which is the General Options window, you will be provided with an empty text box for Comments. Moreover, you can also select a few options that are common to most objects in Astera.
Clear Incoming Record Messages: When this option is checked, any messages coming in from the preceding object will be cleared.
Do Not Overwrite Default Values with Nulls: When this option is checked, actual values are not overwritten with null values in the output.
In this case, we will leave the options unchecked. Once you are done, click OK.
Now that you have configured the Normalize object, you will notice that new input nodes have been added to the object based on our selection for the Number of Mapping Groups option. Each node contains the layout we specified in the Layout Builder window.
The next step is to make the required mappings from the source object to the Normalize object. These are the mappings needed for this particular use case:
Map SSN from the Excel Workbook Source object to SSN in all four input nodes of the Normalize object.
Map City to TaxAmount in the first input node, County to TaxAmount in the second input node, State to TaxAmount in the third input node, and Federal to TaxAmount in the fourth input node.
Map the City Field Name to TaxType in the first input node, the County Field Name to TaxType in the second input node, the State Field Name to TaxType in the third input node, and the Federal Field Name to TaxType in the fourth input node. To map field names, right-click on the mapping link, hover over Change Map Type, and select Field Name.
Here is what the final dataflow should look like:
Preview the output to have a look at the normalized dataset.
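Conceptually, the Normalize object unpivots the four tax columns into rows, one record per mapping group. Here is a minimal Python sketch of this one-to-many logic; the values are illustrative.

```python
# Conceptual sketch of Normalize (one-to-many): unpivot the four tax
# columns into (TaxType, TaxAmount) rows, one record per mapping group.
TAX_COLUMNS = ["City", "County", "State", "Federal"]

def normalize(record):
    for column in TAX_COLUMNS:
        yield {"SSN": record["SSN"],
               "TaxType": column,            # the mapped Field Name
               "TaxAmount": record[column]}  # the mapped value

row = {"SSN": "123-45-6789", "City": 120.0, "County": 80.0,
       "State": 300.0, "Federal": 950.0}
for out in normalize(row):
    print(out)
```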
You can map these fields further to other objects in the dataflow using the output node of the Normalize object.
This concludes using the Normalize object in Astera.
The Sequence Generator Transformation object in Astera is used to add sequences of integer values to a dataflow. The sequences can start with any number and have any step, for example, 50, 55, 60, 65, and so on.
Astera Data Stack can either create a sequence in memory at the dataflow’s run-time (called an in-memory sequence), or it can read sequence control data from a database table as your dataflow is executed.
In the case of in-memory sequences, a sequence always starts at the Start Value provided in the SeqGenerator: Context Information Properties window. In the case of database sequences, the last value used is recorded in the control database, and a new start value is used every time the sequence is invoked.
This makes it possible to generate ever-increasing values for the sequence each time the dataflow runs. In effect, such a sequence is a chain of sequences with non-overlapping values.
Here, we have retrieved data from an Orders table using a Database Table Source object. We will use the Sequence Generator Transformation object to generate a sequence for the OrderNo field in our source data. Let’s see how this works.
Drag the Sequence Generator Transformation object from the Transformations section in the Toolbox and drop it on to the dataflow designer.
Map the required fields from the source object to a destination object.
Note: We have the Orders table as our source from a Database Table Source. We will map the fields OrderDate, RequiredDate, ShippedDate, ShipVia and Freight to an Excel Workbook Destination object.
To configure the properties of the Sequence Generator Transformation object, right-click on its header and select Properties from the context menu.
This will open the Context Information Properties window.
In this window, you can choose between three different types of sequence generations and specify the Sequence Details.
A description of these three methods is given below:
In Memory: The sequence will be created in memory at the dataflow run-time. The sequence always starts at the specified Start Value in the sequence properties.
Sequence Details:
Start Value – The initial value for the sequence
Step – The increment value
Database Table: The sequence control information for the database table can be managed within Astera through the Manage Sequences option.
Connection: Specify the connection to the database where the sequences will be stored
Sequence: Select the sequence from the list of available sequences in the database.
Note: To manage database sequences, go to Menu > Tools > Sequences.
Batch Size: Specifies the minimum number of values to be allocated to the sequence.
Use Memory Sequence during preview: Prevents the user from breaking the sequence cycle during a data preview by substituting an in-memory sequence, which does not affect (i.e. increase) the database sequence’s current value.
Sequence Object: The sequence control information is read from a special database table in a SQL Server or Oracle database.
Connection: Specify the connection to the database that stores your sequences.
Sequence: Select the sequence from the list of available sequences.
Use Memory Sequence during previews: Prevents the user from breaking the sequence cycle during a data preview by substituting with an in-memory sequence.
Note: In this case we will use the In-Memory sequence generator option.
Let’s specify the Sequence Details as follows:
Start Value: 0
Step: 1
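Conceptually, an in-memory sequence is just a counter that restarts at the Start Value on every run, whereas a database-backed sequence persists its last value between runs. Here is a minimal Python sketch with illustrative records.

```python
import itertools

# Conceptual sketch of an in-memory sequence: every run restarts at the
# Start Value; a database-backed sequence would instead persist the last
# value used and resume from there on the next run.
def with_sequence(records, field="OrderNo", start=0, step=1):
    counter = itertools.count(start, step)  # plays the NextVal role
    for record in records:
        record[field] = next(counter)
        yield record

orders = [{"Freight": 32.4}, {"Freight": 11.6}, {"Freight": 65.8}]
print(list(with_sequence(orders)))  # OrderNo: 0, 1, 2
```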
In the destination object, a new field will be created where the sequence generator value will be mapped.
Note: In this case, the OrderNo field has been created in the Excel Workbook Destination object.
The NextVal field will be mapped to the OrderNo field in the destination object.
You can see the output of the Excel destination object in the Data Preview window.
The sequence has been generated in the new field, OrderNo.
This is how the Sequence Generator Transformation is used in Astera.
The Tree Join Transformation object in Astera enables users to create complex, hierarchical data structures such as EDI or XML documents with ease. Unlike the standard relational join which combines left and right elements to create a new record, the Tree Join Transformation object allows users to create collection and member nodes. It also enables users to join datasets in parent-child hierarchies using a key field. It is a set level transformation that operates on a group of records.
In this document, we will learn to use the Tree Join Transformation object in Astera.
In this use case, we have two different source datasets. The first dataset contains information about Customers and has fields such as CustomerName, CustomerID, Address, etc.
The second dataset contains details of Orders placed by customers. It includes fields such as OrderID, CustomerID, and order details such as RequiredDate, ShippedDate and other shipping details.
We will join these two datasets using the Tree Join Transformation object and create a hierarchical dataset in which all orders placed by a customer along with the order details are represented in a parent-child hierarchy.
Contrary to the regular Join that joins two datasets in a flat layout, the Tree Join Transformation joins data from two different datasets into a hierarchical data structure.
In this use case, each record from the first dataset that contains Customer details will be a parent node, and under the parent node, the details of Orders placed by that customer will be returned in a child node.
1. To get the Tree Join Transformation object from the Toolbox, go to Toolbox > Transformations > Tree Join and drag-and-drop the Tree Join Transformation object onto the designer.
2. Now map fields from the Customer source dataset to the TreeJoin object.
3. Right-click on the Tree Join object header and go to Properties from the context menu. In the Tree Join Layout Builder window, you can see the fields from the Customer dataset listed under the root node.
4. Next, click on the TreeJoin node; you will see that the small icons or buttons at the top of the screen become active. If you click on the icon, you will get two options:
Add Member Object – To add a new member node to your layout
Add a Collection Object – To add a new collection node under the parent node. It will return all corresponding records as a collection under the parent node.
In this case we will Add a Member Object to create a separate record for each order placed by a customer, under a separate customer record node.
5. Add a Member Object to this root node. A dialogue box will open to name your member object.
In this case, let’s name it ‘Orders’ and click OK. A member object has been added to our parent node.
6. Click OK to close the properties window. Now map the Orders dataset to the member node that we created in the previous step to complete the layout.
7. Go to the properties of the TreeJoin object again. We have already created the layout, so we will proceed to the next window.
8. In the TreeJoin Transformation Properties window, we must specify the Join Key.
The join key is a common field or identifier in both datasets that identifies and joins records in a tree-like structure. The parent and child fields are the same field, common to both source datasets, and serve as the key identifier to join records.
Parent Field – Join field from the first dataset.
Child Field – Same field as the parent field, selected from the second dataset.
In this case, the CustomerID field is common in both the datasets, so we will use it as the join Key.
9. Click on the Parent field dropdown button. Expand the TreeJoin node and select the CustomerID field.
10. Click on the Child field column and expand the TreeJoin root node. Scroll down to your member node, expand this node and select the CustomerID field from the second dataset.
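Conceptually, this join groups child records by the key and nests them under the matching parent. Below is a minimal Python sketch of the collection-style variant with illustrative records; the member-object variant used in this example would instead emit a separate parent record per order.

```python
from collections import defaultdict

# Conceptual sketch of the Tree Join: nest each customer's orders under
# the customer record using CustomerID as the join key.
def tree_join(customers, orders, key="CustomerID", member="Orders"):
    by_key = defaultdict(list)
    for order in orders:
        by_key[order[key]].append(order)  # group child records by key
    for customer in customers:
        yield {**customer, member: by_key.get(customer[key], [])}

customers = [{"CustomerID": "ALFKI", "CustomerName": "Alfreds Futterkiste"}]
orders = [{"OrderID": 10643, "CustomerID": "ALFKI"},
          {"OrderID": 10692, "CustomerID": "ALFKI"}]
for record in tree_join(customers, orders):
    print(record)
```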
Let’s discuss the other options on the properties window:
Join In Database – Lets you join the tables in the database itself rather than in-memory. However, it applies only when both the tables are sourced from the same database.
Case Sensitive – To process and join records on a case sensitive basis.
11. We have our layout and the TreeJoin properties ready. Click OK.
12. Right-click on the TreeJoin object and select Preview Output.
The TreeJoin object has returned the customer records in parent nodes. Upon expanding the node, you can see the order placed by the customer listed as its member unit under the parent node.
If we choose to Add a Collection Object in the Layout Builder, all the records for orders placed by a customer will be returned in a collection under a single parent node for each customer.
13. The joined dataset can now be written to a desired destination. In this case we will write it to an XML File Destination object.
To know more about writing to an XML File Destination object, click here.
This concludes using the Tree Join Transformation object in Astera.
You can download the file for this use case from the following link:
Astera provides an array of source options to read and extract data from. Different source objects can be found in Toolbox > Sources.
Look for the supported sources and data providers here.
For a detailed overview of different source objects in Astera's dataflows, see Setting Up Sources.
Transformations in dataflows are used to perform a variety of operations on data as it moves through the dataflow pipeline. Astera provides an extensive library of built-in transformations enabling you to cleanse, convert, and transform data as per your business needs. Transformations can be found in Toolbox > Transformations. For a detailed review on transformations, see Introducing Transformations.
In this article, we will discuss:
How various sources in dataflows can be used as a transformation.
Some common scenarios where you could use a source as a transformation.
While the basic function of source objects in dataflows is to extract data and bring it to the designer for further integration, a source object can also be used as a transformation function.
To use a source object as a transformation, you will need to:
Select the relevant source object from Toolbox > Sources and drag-and-drop it onto the designer.
Right-click on the source header and select Transformation from the context menu.
As soon as the Transformation option is selected from the context menu, the header color of the source object will change from green to purple. This is because, by default, Source objects in Astera's dataflows are indicated by a green header and Transformation objects are indicated by a purple header. Hence, the change in color.
Listed below are the source objects that can be used as a transformation:
Note: Some sources in Astera cannot be used as transformations. These sources are: ADO.Net Metadata Collections, COBOL Source, SQL Query Source, Multi-table Query Source, and FTP List Directory Contents.
Generally, source objects are used as transformations when the source file path is dynamic.
In the next section of the article, we will discuss how to use a Delimited File Source object as a transformation.
A Delimited File Source object can be used as a transformation when it is taking a dynamic file path; therefore, it will have multiple files of the same layout processing in a single dataflow or workflow.
Drag-and-drop the Delimited File Source object onto the designer.
Go to the object’s properties and provide the File Path for the delimited source file.
Once you have provided the File Path and configured the properties of the source object, click OK. Now, right-click on the header and select Transformation from the context menu, to change it to a Transformation object.
The header of the Delimited Source object will change to purple indicating that the source object is now converted into a Transformation object.
The transformed DelimitedSourceTrans object will now have two nodes:
Input node: To map the file path of the folder that contains delimited files that are to be processed.
Output node: On expanding the Output node, you will see the data fields in the delimited source file. Map these fields to other objects in a dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Delimited source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, point the path to the directory and folder where the delimited files are located.
Map the FullPath field from FileSystem to the DelimitedSource object’s input node (FilePath).
Now our Delimited Source Transformation object is ready. To preview the data, right-click on the DelimitedSourceTrans object and select Preview Output.
Once you select Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed preview format. Click on the + icon right next to the root node of the DelimitedSourceTran object to expand the node and preview your data.
You now have an expanded version of your data:
Root Node: Object Path – DelimitedSourceTran
Sub Node:
Input: Displays the path of the file that is being used as the input for this data.
Output: Displays the fields in the source data.
This is how you use a Delimited File Source as a transformation.
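Conceptually, this pattern turns a directory listing into a stream of file paths, each of which is read with the same layout. Here is a minimal Python sketch under that assumption; the folder name is illustrative.

```python
import csv
from pathlib import Path

# Conceptual sketch of a source used as a transformation: a file-system
# listing feeds file paths into a reader that applies one layout to
# every file. The folder name "incoming_files" is an illustrative assumption.
def file_system_items(folder, pattern="*.csv"):
    yield from Path(folder).glob(pattern)  # plays the FileSystem role

def read_delimited(path):
    with open(path, newline="") as f:      # plays the DelimitedSourceTrans role
        yield from csv.DictReader(f)       # same layout for every file

for path in file_system_items("incoming_files"):
    for record in read_delimited(path):
        print(path.name, record)
```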
Next, we will see how to use the Excel Workbook Source object as a transformation.
The Excel Workbook Source can be used as a transformation when you have multiple Excel files with the same layout, and want to process them together in a dataflow or workflow.
Drag-and-drop the Excel Workbook Source object onto the designer.
Go to the object’s properties and provide the File Path for the excel source file.
Once you have provided the file path and configured the properties of the excel source object, click OK. Now right-click on the header and select Transformation from the context menu, to change it into a transformation object.
The header of the ExcelSource object will change to purple indicating that the ExcelSource object is now converted to a transformation object.
The transformed ExcelSource object will now have two nodes:
Input node:
FilePath: To map the path of the folder that contains excel files that are to be processed.
Worksheet: This option can be used when you have more than one worksheet in an excel source file and want to use any particular worksheet in the dataflow/workflow. This can be done by specifying the worksheet name using a Constant Transformation object which you can find in Toolbox > Transformation > Constant Value.
Output node: On expanding this node, you will be able to see the data fields in the excel source file. Map these fields to other objects in the dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Excel source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the path of the directory and folder where the excel files are located.
Map the FullPath field from FileSystem to the ExcelSource object’s Input node (FilePath).
Map the Value field from ConstantValue to the ExcelSource object’s Input node (Worksheet).
Now our Excel Source transformation object is ready. To preview the data, right-click on the ExcelSourceTrans object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed preview format. Click on the + icon right next to the root node ExcelSourceTran to expand the node and preview your data.
You will see the following nodes:
Root Node: Object Path – ExcelSourceTran
Sub Node:
Input: Gives the file path of the file that is being used as the input for this data.
Output: Displays the fields in the source data.
This is how you use an Excel Workbook Source as a transformation.
Now we will discuss how to use an XML/JSON File Source as a transformation in Astera.
The XmlJson File Source object can be used as a transformation when you have multiple XML or JSON files with the same layout, and want to process them in a dataflow or a workflow.
Drag-and-drop the XML/JSON File Source object onto the designer.
Go to the object’s properties and provide the File Path for the XML source file and its schema.
Once you’ve provided both paths and configured the XML/JSON source object, click OK. Now, right-click on the header and select Transformation from the context menu to change it into a transformation object.
The header of the XmlJsonSource object will change to purple indicating the conversion from a source object to a transformation object.
The transformed XmlJsonSource object will now have two nodes:
Input node: To map the file path of the folder that contains XmlJson files that are to be processed.
Output node: Once expanded, you will be able to see the data fields that are in the XmlJson source file. You can map these fields to other objects in a dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the XmlJson source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the Path of the directory and folder where the XML/JSON files are located.
Map the FullPath field from FileSystem to XmlJsonSource object’s Input node (FilePath).
Now our XmlJson source transformation object is ready. To preview the data, right-click on the XmlJsonSourceTrans object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed form. To expand the data and preview your output, click on the + icon right next to the root node – XmlJsonSourceTran.
You now have an expanded version of your data:
Root Node: Object Path – XmlJsonSourceTran
Sub Node:
Input: Gives the file path of the file that is used as the input for this data.
Output: Displays the fields in the source data.
This is how you use an XmlJson File Source as a transformation.
In the next section of the article, we will discuss how to use Report Source as a transformation in dataflows.
The Report Source object can be used as a transformation when you have multiple report models with the same layout and want to process them together in a dataflow or a workflow.
Drag-and-drop the Report Source object onto the designer.
Go to the properties and provide the File Path for the report source and its report model.
Once you’ve provided both the paths and configured the properties of the report source object, click OK. Now right-click on the header and select Transformation from the context menu, to change it to a transformation object.
The header of the ReportSource object will change to purple, indicating the conversion from a source object to a transformation object.
The transformed ReportSource object will now have two nodes:
Input node: Map the file path of the folder that contains report files that are to be processed.
Output node: When expanded, you will be able to see the data fields that are in the report source file. You can map these fields to other objects in the dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Report source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the path of the directory folder where report files are located.
Map the FullPath field from FileSystem to the ReportModel object’s Input node (FilePath).
Now our Report Source Transformation object is ready. To preview the data, right-click on the report source object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed form. To expand the data and preview your output, click on the + icon right next to the root node – ReportModelTran. Then, to further expand the data, click on the + icon right next to the sub-node – Output.
You now have an expanded version of your data:
Root Node: Object Path – ReportModelTrans
Sub Node:
Input: Gives the file path of the file that is being used as the input for this data.
Output: On further expansion, it shows the fields and data contained in the report model.
This is how you use the Report Source object as a transformation object.
The Union Transformation object in Astera is used to combine incoming data from two or more inputs into a single output. Its functionality is similar to a UNION operator in a SQL query. It has multiple input nodes and a single output node. It puts together two sets of data irrespective of any repetition that might occur between the datasets. To perform this transformation on two datasets, their cardinality must be the same.
Note: An input node in a union transformation cannot receive data from two different set transformations.
To work with a Union Transformation object, drag-and-drop the Union Transformation object from Toolbox > Transformations > Union onto the dataflow designer.
Map the Customers_Marketing data to Input_1 and Customers_Sales data to Input_2 in the Union Transformation object.
Now, right-click on the Union Transformation object’s header and select Properties.
The first window is the Layout Builder window, where you can customize your layout or modify your fields. You can also provide a default value to be used in place of null values.
Add a new field, name it Category and specify its Data Type as String.
Click Next.
Next is the Union Transformation Properties window, where two input nodes are defined as Input_1 and Input_2 by default. You can rename them if you want, and you can define any number of input nodes based on the number of datasets you want to combine using the Union Transformation object.
Click OK.
Now, map the categories of respective departments from the Variables resource object to the Category field in the Union Transformation object. This is done to identify which department a particular record is coming from.
Now, we have successfully configured the Union Transformation object.
Right-click on the Union Transformation object’s header and select Preview Output.
You can see that the Union Transformation has successfully combined the two datasets into a single, unified dataset.
Note: A Union Transformation will show the combined fields from both sources as a resultant, regardless of whether they are present in one or both datasets. In the final output, the dataset which does not have one or more fields will be assigned a null value in that field.
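Conceptually, the Union combines records from all inputs into one layout, tags each record with its origin, and assigns null to fields an input does not have. Here is a minimal Python sketch with illustrative records.

```python
# Conceptual sketch of the Union transformation: combine two datasets,
# tag each record with its origin, and fill missing fields with None.
def union(inputs, layout):
    for category, records in inputs.items():
        for record in records:
            combined = {field: record.get(field) for field in layout}
            combined["Category"] = category  # which department it came from
            yield combined

inputs = {
    "Marketing": [{"CustomerID": 1, "Region": "West"}],
    "Sales": [{"CustomerID": 2}],  # no Region field -> null in the output
}
print(list(union(inputs, layout=["CustomerID", "Region"])))
```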
You can now further transform your dataset or write it to a destination.
This concludes working with the Union Transformation object in Astera.
The Data Cleanse Transformation object is a new addition to Astera's library of transformations. It makes it convenient for business users to cleanse raw data and present it in a more refined, standardized, and enterprise-ready format. Using the Data Cleanse Transformation object, users can clear data of null values and redundant text and characters, and prepare raw data for transformation, validation, profiling, and record matching functions.
Now drag the Data Cleanse Transformation object from the Transformations section in the Toolbox and drop it onto the designer.
This is what a Data Cleanse transformation object looks like.
Map data from the source object to the Data Cleanse Transformation object. You can either auto-map the entire data set or map a few fields manually.
Now you have to specify the criteria for data cleansing. Right-click on the Data Cleanse Transformation object and select Properties from the context menu.
This will open a new window where you have to set up the properties for data cleansing. The first window is the Layout Builder window. Here you can customize the layout of your dataset by adding, removing or renaming fields. Once you have created the layout, click Next to proceed to the next window.
This is where you set up the data cleanse criteria for your source data.
You can find various data cleanse options arranged in different sections. Let’s explore them one by one.
The options provided within this category allow you to remove values, spaces, tabs, and line breaks from your data. You can find the following options within this category:
All whitespaces – Removes all whitespace from the data
Leading and trailing whitespaces – Removes whitespace preceding and succeeding the values
Tabs and line breaks – Removes tabs and line breaks within source values
Duplicate whitespaces – Removes double spaces from the data
Letters – Removes all letters from the data
Digits – Removes all digits from the data
Punctuation – Removes all punctuation from the data
Specified Character – Removes any specific character from the data
As the name suggests, the options within this category allow you to replace null values inside a string or numeric field with a corresponding value – blank in case of a string, and zero in case of a numeric field.
Null strings with blanks: Replaces all null strings with blanks
Null numerics with zeros: Replaces all null numeric values with zeros
The Find and Replace options enable users to replace a value in the source dataset with another value.
It also gives users the option to choose whether the find and replace function is performed on a case-sensitive basis. You can select a search mode from three options:
Normal – Will perform a normal find and replace function
As in this example, we want to change the status from ‘Planned’ to ‘Scheduled.’
So, we will type in ‘Planned’ in the Find field and ‘Scheduled’ in the Replace field.
Now, if we look at the output, we can see that the Data Cleanse Transformation object has found the status value ‘Planned’ and replaced it with ‘Scheduled.’
Extended – Allows you to search for tabs (\t), newlines (\r\n), or a character by its value (\o, \x, \b, \d, \t, \n, \r and \) and replace it with the desired value
In the example below, we want to replace whitespaces within our source values with a hyphen (-).
So, we will type ‘\s’ in the Find field and ‘-’ in the Replace field.
Now, if we look at the output, we can see that the Data Cleanse Transformation object has found and replaced whitespaces from within the values with a hyphen.
Preview before applying the “Extended Find and Replace” function.
Preview after applying the “Extended Find and Replace” function.
Regular Expressions – Allows you to find and replace a value based on a regular expression.
In the example below, we want to replace the “ALFKI” value(s) in the CustomerID field with “A1234”.
For this, we will write a regex in the Find field and the desired value in the Replace field.
Now, if we look at the preview, you can see that Astera has replaced values in the source data with the desired values.
Case options allow users to convert the letter case of source data to Upper, Lower, or Title case.
You can choose from the following options:
None – Keeps the letter case as is.
Upper – Changes all letters to upper case.
Lower – Changes all letters to lower case.
Title – Changes all letters to title case.
The Modify Data option provides you the flexibility and convenience of applying an expression to all fields in your data. Check the Run expression on all fields option to activate this feature.
The Run Expression on all fields feature was previously called ApplyToAll and was offered as a standalone transformation in Astera 7.5 and earlier releases. However, it had limited functionality compared to the all-new Data Cleanse Transformation object, which is why it has been replaced altogether by the Data Cleanse Transformation object in Astera 7.6 and, now, in Astera 8.0.
The Run Expression on all fields feature is enabled by default for any existing flows created prior to Astera 7.6. This means that existing flows created on Astera 7.5 or an earlier release will continue to work seamlessly on 8.0 or later upgrades and won’t require any modification.
Here, you can choose from the extensive library of built-in expressions and apply one to all mapped fields through the “$FieldValue” parameter.
As in this example, we have mapped a regular expression to the “$FieldValue” parameter.
Now if we look at the preview, you can see that Astera has applied the regular expression to all fields and removed whitespaces from the values.
Preview before running the expression on all fields:
Preview after running the expression on all fields:
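Conceptually, this feature applies one expression to every mapped field, with $FieldValue standing in for the current field's value. Here is a minimal Python sketch, using a whitespace-stripping expression as in this example.

```python
import re

# Conceptual sketch of "Run expression on all fields": one expression is
# applied to every mapped field, with $FieldValue standing in for the
# current field's value. Here the expression strips all whitespace.
def run_on_all_fields(record, expression):
    return {field: expression(value) for field, value in record.items()}

strip_whitespace = lambda v: re.sub(r"\s+", "", v) if isinstance(v, str) else v

row = {"CustomerID": " ALFKI ", "City": "Ber lin"}
print(run_on_all_fields(row, strip_whitespace))
# {'CustomerID': 'ALFKI', 'City': 'Berlin'}
```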
This function was previously performed using the ApplyToAll transformation in Astera 7.5 and previous releases. However, now you can perform this and other data cleanse tasks using the Data Cleanse Transformation object.
A Route transformation object invokes one or more paths in the dataflow, in accordance with some decision logic expressed as a set of Rules. Using the Route transformation object, you can create some custom logic involving multiple paths and adapt it to suit your data processing scenario.
For example, a record passing some rule will be written to Destination1, a record passing another rule will be written to Destination2, and a record which fails to pass any rules can still be written to a Destination, and be fed to a downstream transformation object.
To add a Route transformation object, drag the Route object from the Transformations section in the Toolbox, and drop it on the dataflow designer.
An example of what a Route object might look like is shown below:
To configure the properties of a Route object after it has been added to the dataflow, right-click on the object’s header and select Properties from the context menu. The following properties are available:
Layout Builder window:
The Layout Builder window allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Layout Builder will be added to the Input node inside the object box, as well as in all Rule nodes corresponding to the number of rules created (see below).
Route Transformation Properties window:
The Route Transformation Properties window provides the interface to manage Route rules.
Type a descriptive name for the rule in the Description field.
Click Compile to check for any syntax errors in your rule. The Compile Status should read “Successful” for a successful compilation.
To activate the rule, check the Active checkbox.
Note: Each Route rule you add here will have its own Rule node in the Route object box once you close the Properties window. You can then map it to downstream objects in the dataflow as needed.
Note: All Route transformation objects have a Default node. This node handles records not passing any rules defined in your Route transformation object.
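Conceptually, each record is tested against the active rules and flows down the corresponding path, with unmatched records going to the Default node. Below is a minimal Python sketch that routes each record to its first matching rule; the rule names and conditions are illustrative, and a Route object can also invoke more than one path.

```python
# Conceptual sketch of the Route transformation: each record is tested
# against named rules and sent down the first matching path; records
# matching no rule go to the Default node. Rule names are illustrative.
RULES = [
    ("HighValue", lambda r: r["Amount"] >= 1000),
    ("Domestic", lambda r: r["Country"] == "US"),
]

def route(records):
    for record in records:
        target = next((name for name, rule in RULES if rule(record)), "Default")
        yield target, record

orders = [{"Amount": 1500, "Country": "DE"},
          {"Amount": 200, "Country": "US"},
          {"Amount": 50, "Country": "FR"}]
for target, record in route(orders):
    print(target, record)  # HighValue / Domestic / Default
```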
General Options window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When this option is unchecked, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect their statistics.
The Comments input allows you to enter comments associated with this object.
An example of using the Route transformation object is shown below.
The File Lookup Transformation object in Astera is used to look up values coming from a source. It uses an Excel or delimited file that contains the lookup values as well as the output values to perform the lookup. A file lookup can be performed based on a single lookup field or a combination of fields.
Similarly, a File Lookup Transformation object can return a single output field from a lookup table or a combination of fields. In either case, the output field or fields are returned from the records in which the lookup values match the incoming values.
In this scenario, we have a Customers dataset from a fictitious organization stored in a database source. It contains information about customers from different countries. We want to replace the country names with country codes in our database by switching them with the lookup values (country codes) stored in an Excel file. To achieve this, we will use the File Lookup Transformation object.
Select the relevant source object from the Sources section in the Toolbox. In this example, we will use Customers data stored in a database table.
Right-click on the source object’s header and select Preview Output. You can see the country names in the Country field which we want to convert into country codes.
Drag-and-drop the File Lookup Transformation object from Toolbox > Transformations > File Lookup onto the dataflow designer.
Now, let’s configure the Transformation object.
Right-click on the header of the File Lookup Transformation object and select Properties.
A File Lookup Map Properties window will open where you can see an option for Source File Type.
Source File Type: Here, you need to specify the type of your lookup file.
You can perform the task using an Excel or Delimited lookup file.
Select the Source File Type from the dropdown menu. In this example, our country codes are stored in an Excel file so we will specify the Source File Type as Excel.
Click Next to proceed to the File Lookup Details window. You will see two options:
File Location: Here, you need to specify the File Path to the lookup source file.
Options:
First Row Contains Header: You can check this option if your lookup file contains a header in the first row. Otherwise, you can leave it unchecked.
Worksheet: If your lookup file contains multiple worksheets, you can select the worksheet you want to use to perform the lookup.
Click Next to proceed to the Layout Builder. Here, you can make changes to the object’s layout by modifying the existing fields or creating new fields.
Once you are done, click Next.
On the next window, you will see various Lookup Options.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value if more than one match is found for a single value in your lookup file. The option expands into a drop-down list where you can select one of the following three options:
Return First: Will return the first matched value found.
Return Last: Will return the last value among all the matched values.
Return All: Will return all the values in the lookup file that match a source value.
If Value Is Not Found In The Lookup List:
In case no lookup value is found against a source value, you can choose one of the following three options to be appended with your output.
No Message: There will be no message and the output will be the same as the input value.
Add Error: An error message will appear with the output.
Add Warning: A warning will appear with the output.
If Value Is Not Found In The Lookup List, Assign Value:
If no lookup value is found against a source value, you can assign an output value of your choice.
Assign Source Value: Will return the source value in the output.
Assign Null: Will assign null to your output value.
This Value: You can select this option and assign any value of your choice.
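Taken together, these options describe a lookup with explicit policies for multiple matches and for misses. Here is a minimal Python sketch of that behavior; the country codes are illustrative assumptions.

```python
# Conceptual sketch of the File Lookup options: match a source value
# against a lookup table and apply the multiple-match and not-found
# policies described above. The country codes are illustrative.
LOOKUP = {"Germany": ["DE"], "Mexico": ["MX"], "UK": ["GB", "UK"]}

def file_lookup(value, multiple="first", not_found="source"):
    matches = LOOKUP.get(value)
    if not matches:
        # If Value Is Not Found: assign the source value, null, or a constant
        return value if not_found == "source" else None
    if multiple == "first":
        return matches[0]   # Return First
    if multiple == "last":
        return matches[-1]  # Return Last
    return matches          # Return All

print(file_lookup("UK"))      # 'GB'
print(file_lookup("Narnia"))  # falls back to the source value
```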
Click Next. This will take you to the Config Parameters window, where you can further configure and define parameters for the File Lookup Transformation object.
Once you have configured the File Lookup Transformation object, click OK.
Map the Country field from the source object to the Country field in the File Lookup Transformation object. Then, map the Code field from the transformation object to the Country field in the Database Table Destination object.
This is what your dataflow will look like:
Map the remaining fields from the source object to the destination object.
Right-click on the destination object’s header and select Preview Output.
You can see that the country names in the database table have been successfully converted into country codes.
This concludes using the File Lookup Transformation object in Astera.
Astera Data Stack introduces an AI Matching feature that leverages the power of Artificial Intelligence to perform intelligent matching. This feature works based on semantic similarity, ensuring more accurate and comprehensive matching results.
In Astera Data Stack, the AI Match object can be found in the Toolbox and can be used within the scope of the Dataflow.
For our use case, we have a Customers dataset from the sales department as shown below:
We want to replace the values in the Country column of the sales dataset by semantically matching them with Country values from the Customers dataset provided by the marketing team, ensuring both departments follow a unified naming standard.
To get started, let’s drag-and-drop an Excel Workbook Source object and configure it with the customers dataset provided by the sales department.
Next, drag-and-drop the AI Match object from the Toolbox onto the Dataflow and auto-map the fields from the Excel Workbook Source onto the AI Match object.
Once all the fields have been mapped, right-click on the AI Match object and select Properties from the context menu.
This will open the Layout Builder screen, which shows the layout of the incoming dataset. Click Next.
The AIMatch Transformation Properties screen will open. Let’s configure these properties.
File Path: This is where we provide the path of the file on the basis of which we want to perform our semantic matching.
Worksheet: This is where we can define which Excel sheet data to use if there are multiple sheets.
Lookup Field: This is the field based on which we are performing the lookup.
Incoming Field: This lets us define the lookup field from the incoming dataset.
For our use case, let’s select the Country field for both.
Once done, click OK and right-click on the AI Match object to preview its output.
As you can see below, the values in the Country field have been semantically matched and replaced from the file using AI. We can also see that, since the country Pakistan did not have a matching value in the marketing dataset, it hasn’t been replaced.
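As a rough conceptual stand-in only (Astera's actual matching is semantic and AI-based, not plain string similarity), the behavior resembles replacing each value with its closest counterpart from a reference list while leaving unmatched values untouched:

```python
import difflib

# Rough stand-in for the AI Match behavior: replace each incoming value
# with its closest counterpart from the reference list, leaving values
# without a sufficiently similar match untouched. The reference values
# and cutoff are illustrative assumptions.
REFERENCE_COUNTRIES = ["United States", "United Kingdom", "Germany"]

def ai_match(value, cutoff=0.6):
    matches = difflib.get_close_matches(value, REFERENCE_COUNTRIES,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else value  # no match -> keep original

print(ai_match("Germny"))    # 'Germany'
print(ai_match("Pakistan"))  # no close match -> stays 'Pakistan'
```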
Now, let’s drag-and-drop a Database Table Destination object and map the matched data onto it.
Running this Dataflow will write the data to the destination table.
This concludes the working of the AI Match object in Astera Data Stack.
The Database Lookup object in Astera is used to look up values from a source. It uses a database table that contains the lookup values as well as a set of corresponding output values. When the lookup is performed, the object returns either a single output field or multiple output fields, depending on the nature of the lookup table. Similarly, the lookup can be performed based on one lookup field or multiple lookup fields. In each case, the output field or fields are returned from the records in which the lookup values match the incoming values.
In this use case, we have a sample Customers dataset that is stored in a database table. Within this dataset, there is a field that contains the country of residence for each customer. We have another database table that contains all of these countries and their corresponding codes. Our goal is to replace the full country names with codes while writing the customer dataset to an Excel file. To do this, we will use a Database Lookup object.
Drag the relevant source object from the Toolbox and drop it onto the designer. In this case, we will select the Database Table Source object from Toolbox > Sources > Database Table Source and configure it so that it reads data from the Customers dataset.
To preview the data, right-click on the object header and select Preview Output from the context menu. Here, you can see that there is a field that contains each customer’s country of residence.
Drag the Database Lookup object from Toolbox > Transformations > Database Lookup and drop it onto the designer.
Right-click on the header of the Database Lookup object and select Properties from the context menu.
This will open a new window on your screen.
Here, you are required to configure the properties for the Database Lookup object.
On the Database Connection window, enter the details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Click Test Connection to ensure that you have successfully connected to the database. A new window will open, showing whether your test was successful or ended in an error. Once the connection has been successfully established, close the window by clicking OK, and then click Next.
The next window is the Database Lookup Map Properties window. Here, you can pick a table from the database that you have connected to.
In this case, we will select the table named CountryCodeLookup. This table contains the code for each country and will serve as the lookup table in our use case.
In the text box provided under the Pick Table option, you can enter a WHERE clause to filter the lookup query. In this case, we will leave it empty.
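If you did want to filter the lookup, a clause along the following lines could be entered here (a hypothetical example; whether you type the WHERE keyword itself depends on the text box, so treat this as a sketch):

```sql
-- Hypothetical filter: only use lookup rows that actually have
-- a code, so countries without one fall through to the
-- "value not found" handling configured on a later window.
Code IS NOT NULL
```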
Once you have chosen a table, click Next.
On the next window, you can choose a Lookup Cache Type from the following options:
No Caching: No data will be stored in a cache. This option is selected by default.
Static: The lookup values are stored in a cache. Once the cache is created, the transformation object will always query the cache instead of the lookup table. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables another sub-option:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Dynamic: The lookup values are stored in a temporary cache file, which is deleted once the dataflow has been executed. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables other sub-options:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Cache Key Column: Defines a matching key field to check whether a record already exists in the cache.
Persistent: Saves the lookup values in a cache file that can be reused for future lookups. When you choose this option, the following sub-options are enabled:
Rebuild Persistent Cache on Next Run: Checking this option allows the cache file to be rebuilt on the next run, refreshing its contents.
Cache File Name: Here, you can enter a name for your cache file.
In this case, we will select the No Caching option. Once you are done, click Next.
On the Lookup Options window, you can choose between multiple lookup options.
This page provides a set of options for different scenarios that could be faced during a lookup.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value when more than one match is found for a single value in the lookup table. You can select one of the three options in the drop-down list:
Return First: Returns the first matched value.
Return Last: Returns the last value among all matched values.
Return All: Returns all matched values.
If Value Is Not Found In the Lookup List
If no lookup value is found for a source value, you can choose which of the following messages, if any, is appended to the output:
No Message: The output value will be the same as the input value and no message will appear with it.
Add Error: An error message will appear with the output.
Add Warning: A warning message will appear with the output.
If Value Is Not Found in the Lookup List, Assign Value
If no lookup value is found for a source value, you can assign an output value of your choice.
Assign Source Value: Returns the source value in the output.
Assign Null: Returns null in the output.
This Value: Allows you to enter any value that will be returned in the output.
In this case, there is only one code for each country. Therefore, we will choose Return First from the drop-down list under Multiple Matches Found Option. We will leave the other options at their defaults, i.e., No Message under If Value Is Not Found in the Lookup List, and Assign Null under If Value Is Not Found in the Lookup List, Assign Value.
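In SQL terms, the difference between Return First and Return All is roughly the following (a sketch only; TOP is SQL Server syntax, other providers use LIMIT, and 'Germany' is just a sample value):

```sql
-- Return First: a single code per incoming country.
SELECT TOP 1 Code
FROM   CountryCodeLookup
WHERE  Country = 'Germany';

-- Return All: every matching code. With a one-to-one lookup
-- table like CountryCodeLookup, both options return one row.
SELECT Code
FROM   CountryCodeLookup
WHERE  Country = 'Germany';
```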
Once you are done choosing the options, click Next.
On the Config Parameters window, you can define certain parameters for the Database Lookup object.
These parameters make flows easier to deploy by avoiding hardcoded values and providing a more convenient method of configuration. If left blank, they assume the default values that were initially assigned to them.
In this case, we will leave them blank. Click Next.
On the last window, which is the General Options window, you will be provided with a text box to add Comments. The General Options in this window have been disabled.
You are now done configuring the Database Lookup object. Click OK to close the configuration window.
Expand the Database Lookup object to view the layout of the lookup table. In this case, it contains two fields, Country and Code. The former contains the full name of each country and the latter contains each country’s code.
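For reference, a lookup table of this shape could be defined as follows (a hypothetical sketch; the column types are assumptions, and only the field names come from this example):

```sql
-- Hypothetical definition of the lookup table used in this example.
CREATE TABLE CountryCodeLookup (
    Country VARCHAR(100),  -- full country name (lookup field)
    Code    VARCHAR(3)     -- country code (output field)
);
```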
Map the Country field from the Database Table Source object to its counterpart in the Database Lookup object.
Drag an Excel Workbook Destination object from Toolbox > Destinations > Excel Workbook Destination and drop it onto the designer. Configure the object by providing a name and the path to the directory and folder where you want to save your destination file.
Auto-map the source dataset to the destination object.
Delete the mapping link between the Country fields in the source and destination. To do this, right-click on the mapping link and select Delete from the context menu.
Map the Code field from the Database Lookup object to the Country field in the destination object. This is what the final dataflow should look like:
Right-click on the destination object’s header and select Preview Output from the context menu.
In the Data Preview window, you will see that each country name has been replaced by its corresponding code.
This concludes using the Database Lookup Transformation object in Astera.
The SQL Statement Lookup object in Astera is used to look up certain values that are mapped to it from a source object. It uses an SQL statement to access a table that contains the lookup values and their corresponding output values. Once the lookup is performed, the SQL Statement Lookup object returns either a single or multiple output fields, depending on the nature of the lookup table. Similarly, the lookup can be performed based on one lookup field or multiple lookup fields. When the incoming values match the lookup values, the output field or fields for those particular records are returned by the SQL Statement Lookup object.
In this use case, we will read data from the Customers table in the Northwind database using a Database Table Source object. This table contains customer information from a fictitious organization and will serve as the source table. Our purpose is to use an SQL Statement Lookup object to find some information about the orders placed by customers. This data is stored in a separate table called Orders, which will serve as the lookup table.
Drag-and-drop the Database Table Source object from Toolbox > Sources > Database Table Source onto the dataflow designer. Configure the object so that it reads data from the Customers table.
Now, drag-and-drop the SQL Statement Lookup Transformation object from Toolbox > Transformations > SQL Statement Lookup onto the dataflow designer, next to the source object.
Right-click on the header of the SQL Statement Lookup object and select Properties from the context menu.
This will open a new window.
Here, we need to configure the properties of the SQL Statement Lookup object.
In the Database Connection window, enter details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Click Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether the test succeeded. Once the connection has been established, close this window by clicking OK, and then click Next.
The next window will present a blank space for you to write an SQL statement. Here, you can enter any valid SELECT statement or stored procedure to read any table from the database that was specified earlier. This table will serve as the lookup table.
In this case, we will be reading data from the Orders table.
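Any valid SELECT will do here; a minimal statement for this step might simply pull everything from the lookup table (a sketch, not a prescribed query):

```sql
-- Minimal example: select every field from the Orders table so
-- that all of them become available in the lookup object's layout.
SELECT * FROM Orders
```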
Enter the SQL statement and click OK. This will take you back to the dataflow designer.
As you can see, the SQL Statement Lookup object has been populated with all the fields present in the Orders table.
The next step is to choose an incoming field or multiple incoming fields from the source object, based on which the lookup action will be performed. This field needs to be mapped to the transformation object.
In this case, we can clearly see that CustomerID is a common element between the two tables. Hence, this field will be used to perform the lookup. It will be mapped from the Database Table Source object to the SQL Statement Lookup object as a new member.
Right-click on the transformation object’s header and select Properties to open the Properties window. Keep clicking Next until you reach the Layout Builder window. Here, you can customize the layout by modifying the existing fields or creating new fields.
Once you are done, click Next.
On the next window, you can define one or more lookup conditions. These conditions will determine what values are returned when the lookup is complete.
You will have to make appropriate selections from three drop-down lists:
Database Element Name: This list contains all the elements present in the SQL Lookup object. Select the element that you wish to use as a lookup field. In this case, it is CustomerID.
Operator: This list contains a set of operators that are used to define the condition. In this case, we will be using the ‘equals to’ operator because the lookup value is supposed to match the incoming value.
Input Element: This list contains the elements that have been mapped to the lookup object. In this case, the only input element available is CustomerID from the Customers table.
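Conceptually, the condition defined above makes the lookup behave like the parameterized query sketched below (the parameter name is illustrative, not what Astera generates internally):

```sql
-- For each incoming Customers record, fetch the Orders rows whose
-- CustomerID equals the incoming CustomerID value.
SELECT *
FROM   Orders
WHERE  CustomerID = @IncomingCustomerID;  -- illustrative parameter name
```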
Once you are done defining the condition, click Next.
The next window will allow you to choose a Lookup Caching Type. The following options are available:
No Caching: No data will be stored in a cache. This option is selected by default.
Static: The lookup values are stored in a cache. Once the cache is created, the lookup object will always query the cache instead of the lookup table. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables another sub-option:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Persistent: Saves the lookup values in a cache file that can be reused for future lookups. When you choose this option, the following sub-options are enabled:
Rebuild Persistent Cache on Next Run: Checking this option allows the cache file to be rebuilt on the next run, refreshing its contents.
Cache File Name: Here, you can enter a name for your cache file.
In this case, we will select the No Caching option. Once you are done, click Next.
On the next window, you will see multiple lookup options.
The page provides a set of options for different scenarios that could be faced during a lookup.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value when more than one match is found for a single value in the lookup table. You can select one of the three options in the drop-down list:
Return First: Returns the first matched value.
Return Last: Returns the last value among all matched values.
Return All: Returns all the matched values.
If Value Is Not Found In the Lookup List
If no lookup value is found for a source value, you can choose which of the following messages, if any, is appended to the output:
No Message: The output value will be the same as the input value and no message will appear with it.
Add Error: An error message will appear with the output.
Add Warning: A warning message will appear with the output.
If Value Is Not Found in the Lookup List, Assign Value
If no lookup value is found for a source value, you can assign an output value of your choice.
Assign Source Value: Returns the source value in the output.
Assign Null: Returns null in the output.
This Value: Allows you to enter any value that will be returned in the output.
In this case, we want to look up the details for all of the orders placed by every customer. Hence, we will select Return All from the drop-down list in the Multiple Matches Found Option. This will automatically disable the rest of the options available on the screen.
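With Return All, a customer who has placed several orders will fan out into several output records, much like the inner join sketched below (a rough analogy; OrderID and OrderDate are standard Northwind fields used here purely for illustration):

```sql
-- Rough equivalent of the lookup with Return All: one output row
-- per matching order for every customer.
SELECT c.CustomerID, o.OrderID, o.OrderDate
FROM   Customers c
       INNER JOIN Orders o
               ON o.CustomerID = c.CustomerID;
```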
Once you are done choosing the option, click Next.
On the next window, you can define certain parameters for the SQL Statement Lookup object.
These parameters make flows easier to deploy by avoiding hardcoded values and providing a more convenient method of configuration. If left blank, they assume the default values that were initially assigned to them.
In this case, we will leave them blank. Click Next.
On the last window, you will be provided with a text box to add Comments. The General Options in this window have been disabled.
You are now done configuring the SQL Statement Lookup object. Click OK.
Right-click on the SQL Statement Lookup object’s header and select Preview Output.
You will be able to see the following results:
Scroll down the Data Preview window to see the rest of the results.
The SQL Statement Lookup object has successfully returned the details for the orders placed by every customer in the Customers table (Source table) by comparing the CustomerID to its counterpart in the Orders table (lookup table).
This concludes using the SQL Statement Lookup Transformation object in Astera.
Alternatively, you can expand the drop-down located in the main toolbar and select Subflow.
Add a new rule by clicking the Add switch condition icon.
In this example, we have customer data from two different departments, Sales and Marketing, stored in two separate files. We want to combine this data into a single dataset using a Union Transformation object. To keep track of the records coming in from each department, we will also add a new field, Category, to the layout of the Union Transformation object and pass its value using a Variables object.
Retrieve the data you want to cleanse using the relevant Source object. (Click here to learn more about setting up Sources.)
Now, click this button to open the Expression Builder.
Click the icon to create a new rule.
In the Expression input, enter an expression for the rule, for example, LTV > 60 and LTV <= 80, or any other rule or condition you want to apply to your data. Alternatively, you can click the button to open the Expression Builder window, a tool that allows you to visually build your rule using the Record tree and IntelliSense.
Add other Route rules if necessary. To delete an existing Route rule, select it and click the icon.
To learn how you can configure a Database Table Source object, click here.
To learn how you can configure an Excel Workbook Destination object, click here.