Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
The Parquet File Destination object allows the user to fetch and map data from various kinds of sources that the Parquet format supports. It is used to write data to Parquet files since it can efficiently store large datasets. It can also be used with various transformations.
The Parquet File Destination object in Astera offers compression methods to reduce file size and control memory consumption.
Drag and drop the Parquet File Destination object from the Destinations section of the Toolbox.
Note: We already have an Excel Workbook Source object configured for our use case.
Right-click on the Parquet File Destination object and select Properties from the context menu.
This will open the Properties screen.
Now, let’s look at the options present on this screen.
File Location
File Path: This is where the file path to the destination file is to be defined. It will be created once the dataflow is executed.
Options
Compression Method - You can select a compression method from this drop-down menu.
Snappy: This method offers high speed and reasonable compression.
Gzip: This method allows the reduction of data size at a fast rate.
Append to File (If Exists): This option will append data to the destination if there is a previously existing file present with data.
Write Numeric Nulls As Zero: Checking this box will write all null values as zero.
Write Null Booleans As False: Checking this box will write all Null Boolean values as false.
Once done, click Next and you will be led to the Layout Builder screen.
Here, the layout is going to be mapped for the destination. It can be built from the incoming data source or can be altered by the user.
We will be using our pre-configured Excel Workbook Source to map the incoming data to the Parquet File Destination object.
Open the Layout Builder again and it will be populated.
Click Next and you will be taken to the Config Parameters screen.
Parameters allow the deployment of flows by eliminating hardcoded values and provide a dynamic way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next and you will be led to the General Options screen.
Here, you can add any comments that you wish to add.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be processed by the object.
Do Not Overwrite Default Values with Nulls: Selecting this option will make sure that values are not overwritten with null values in the output.
Click OK and the Parquet File Destination object will be configured.
This concludes the configuration of the Parquet File Destination object in Astera.
Each destination on the dataflow is represented as a destination object. You can have any number of destinations of the dataflow. Each destination can only receive data from a single source. To feed multiple sources into a destination you need to connect them through a transformation object, for example Merge or Union. For more information on Transformations, see the Creating Transformations article.
The following destination types are supported by the dataflow engine:
Flat File Destinations:
Tree File Destinations:
Database Destinations:
All destinations can be added to the dataflow by grabbing a destination type from the Toolbox and dropping it on the dataflow. File destinations can also be added by dragging-and-dropping a file from an Explorer window while pressing the ‘Shift’ key. Database destinations can be dragged-and-dropped from the Data Source Browser while holding down the ‘Shift’ Key. For more details on adding destinations, see Introducing Dataflows article.
Adding a Delimited File Destination object allows you to write to a delimited file. An example of what a Delimited File Destination object looks like is shown below.
To configure the properties of a Delimited File Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a Fixed-Length File Destination object allows you to write to a fixed-length file. An example of what a Fixed-Length File Destination object looks like is shown below.
To configure the properties of a Fixed-Length File Destination object after it was added to the dataflow, right-click on it’s header and select Properties from the context menu.
Adding an Excel Workbook Destination object allows you to write to an Excel file. An example of what an Excel Workbook Destination object looks like is shown below.
To configure the properties of an Excel Workbook Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an XML/JSON File Destination object allows you to write to an XML file. An example of what an XML/JSON File Destination object looks like is shown below.
To configure the properties of an XML/JSON File Destination Object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
General Properties window:
File Path – Specifies the location of the destination XML file. Using UNC paths is recommended if running the dataflow on a server.
Note: To open an existing destination file for editing in a new tab, click XXX icon next to the File Path input, and select Edit File.
File Options:
Using the Encoding dropdown, select the appropriate encoding scheme for your destination file.
Check the Format XML Output checkbox to have line breaks inserted into the destination XML file for improved readability.
Schema Options:
Read From Schema File Specified Below – Specifies the location of the XSD file controlling the layout of the XML destination file.
Note: You can generate the schema based on the content of the destination XML file if the file already exists. The data types will be assigned based on the destination file’s content. Note that the existing destination file will be overwritten when the dataflow runs.
To generate the schema, click the icon next to the Schema File input, and select Generate.
To edit an existing schema, click the icon next to the Schema File input, and select Edit File. The schema will open for editing in a new tab.
Using the Root Element dropdown, select the node that should be the root of your destination schema. Any nodes up the tree will be excluded.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the destination. Using UNC paths is recommended.
Adding a Database Table Destination object allows you to write to a database table. An example of what a Database Table Destination object looks like is shown below.
Destination Connection screen – Allows you to enter the connection information for your destination, such as Server Name, Database and Schema, as well as credentials for connecting to the selected destination.
Pick Table window:
Database Transaction Management: Enable Transaction Management if you want to wrap your transfer inside a transaction. Depending on your database settings, this can give you performance improvements during the transfer. When Transaction Management is enabled, you should choose between always committing transaction at the end of the transfer, or only committing it if there were no errors. Any errors would result in the entire transaction being rolled back.
Preserve System Generated Key Values: This option is only available if you have assigned at least one field in your destination layout as a System Generated field. If enabled, Astera will pass the incoming value from the source to the system generated field. Otherwise, the incoming source value will be ignored, and the system will write auto-increasing values to the destination System Generated field.
Data Load Options: specify the type of insert of your records into a destination database. The available types are Use Single Record Insert, Bulk Insert with Batch Size, and Bulk Insert with All Records in One Batch.
These types allow you to customize your transfer to balance performance vs. logging needs. Bulk inserts typically result in a better performance (faster transfer for a given number of records), but they also come with less logging, and less ability to undo unwanted inserts should you need to.
Use Single Record Insert: Records are inserted into a destination table one-by-one. Performance is the slowest among the three insert types. However, any errors or warnings during the transfer are displayed to you immediately as the transfer progresses.
Bulk Insert with All Records in One Batch: Typically a quick method of transferring large amounts of data. But keep in mind that, should there be any database specific errors in your transfer, they will not show until the end of the transfer at which time the entire batch were to be written to the destination database.
Note: Not all database providers support this type of insert.
Bulk Insert with Batch Size: A good tradeoff between performance and logging needs. Records are inserted in batches of the specified size. Typically, larger batch sizes result in better transfer speeds; however performance gains may be less with relatively large batch sizes.
Note: Not all database providers support this type of insert.
Note: Bulk insert may not be available if there are certain data types in a destination table. In this case the transfer will proceed as “single insert”.
SQL Statement Destination object offers extra flexibility over database destination objects in applying a custom INSERT or UPDATE SQL code that controls what will be written into the destination table. An example of what an SQL Statement Destination object looks like is shown below.
To configure the properties of an SQL Statement Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Database Connection window – Allows you to enter the connection information for your SQL Statement, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected database.
SQL Query window: In the SQL Query window, you can enter an SQL expression controlling which fields and records should be written to the destination. The SQL expression should follow standard SQL syntax conventions for the chosen database provider.
For example,
Insert into Orders values (@OrderId, “@OrderName”, “@CreatedDtTm”)
Notice the @ symbol in front of a field name. This makes the field appear in the field list inside the object box so that the field can be mapped. The fields not showing @ symbol in front of them will not show in the list of fields, but they can still receive values according to the logic of the SQL statement itself.
For example,
Insert into Orders (OrderId, OrderName, CreatedDtTm) values (@OrderId, “@OrderName”, “2010/01/01”)
Note: You can optionally use $ parameters inside your SQL expression.
Astera’s Delimited File Destination provides the functionality to write data to a delimited file. Delimited File Destination gives you the ability to control the structure and content of file including numeric, date, and Boolean formats, encodings, text qualifiers (quotes), and character sets. You can choose to create a new file or append data to an existing file.
To get a Delimited File Destination object from the Toolbox, go to Toolbox > Destinations > Delimited File Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Delimited File Destination object onto the designer.
The dragged destination object is empty right now. This is because the object has not been configured yet.
For a Delimited Destination object to work, data fields should be mapped to the object so that the mapped data can be written to the destination.
Configure the source object and place it onto the designer next to the Delimited File Destination object.
Note: In this case a Customers sample table has been used from a Database Table Source to write to the Delimited File Destination.
Now map the source object to the destination object. The following ways can be used for mapping:
By dragging and dropping the parent nodes onto each other for automatic mapping.
By mapping the source parent node by dragging it to the destination parent node manually.
By directly writing the source layout to a Delimited File Destination through the source context menu of its parent node.
The fields are now mapped.
To configure the Delimited File Destination object, right-click on the header, select Properties from the context menu and a dialog box will open.
Select the Properties option from the context menu and a dialog box will open.
Provide the File Path. This is where the delimited destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
If the source File Contains Header, check the box to read headers from the source file.
Field Delimiter - Allows you to select a delimiter from the drop-down list for the fields.
Record Delimiter - Allows you to select the delimiter for the records in the fields. The choices available are carriage-return and line-feed combination, carriage-return and line-feed. You can also type the record delimiter of your choice instead of choosing the available options.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
A Text Qualifier is a symbol that identifies where text begins and ends. It is used specifically when importing data.
Apply Text Qualifier to all Fields will add the specified qualifier to all the fields that have been mapped
Say you need to import a text file that is comma delimited (commas separate the different fields that will be placed in adjacent cells).
Use Null Text to specify a certain value that you do not want in your data and want it to be replaced by a null value.
Choose Append to File (If Exists) to append to an existing file or create a new file. Creating a new file will overwrite any existing file data.
Check on Hierarchical Destination when the data in the source file needs to be sorted into hierarchies in the destination file.
Check on Write to Multiple Files for the data to be saved to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
Once the data reading options have been specified in this window, click Next.
The next window is the Layout Builder. On this window, the layout of the delimited destination file can be modified.
To add a new field to the layout, go to the last row of the layout (Name column), which will be blank and double-click on it, and a blinking text cursor will appear. Type in the name of the field to be added and select the subsequent properties for it. A new field will be added to the source layout.
Note: Adding a new field (Email) to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Deleting the Fax field from the layout.
Right-click on the highlighted line, a context menu will appear in which will have the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use Move up/Move down keys.
Note: Find the Move up/Move down icons on the top left of the builder.
For example: To move the Country field right below the Region field, select the row and use the Move up key from the 9th row to the 8th row.
Once the object layout is configured, click Next. A new window will appear, Config Parameters, which allows you to further configure and define parameters for the delimited destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: If you want to specify parametrized values for Field/Record Delimiter, make sure to follow the same syntax of delimiters as in the dropdown menus of these options on the Properties window.
Parameters left blank will use their default values assigned on the properties page.
Next a General Options window will appear. In this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The DelimitedDest object is now configured according to the changes that were made in the properties window.
Note: The changes that were made in this case are:
Added a new field Email in the layout.
Moved the Country field below the field Region field.
The Delimited File Destination object is successfully configured, and the destination file can now be created by running the dataflow.
The Database Table Destination object in Astera provides the functionality to write data to a database table. This destination option provides a great deal of control over how data is written to a database table with its extended data loading options.
Astera supports a wide range of on-premise and cloud-based databases including SQL Server, Oracle, DB2, Sybase, MySQL, Salesforce, Microsoft Dynamics CRM, and more. Astera delivers highly-optimized implementations for these database connectors including high-performance bulk insert, set-based updates and transaction management. This, combined with Astera’s parallel-processing architecture, delivers industrial-strength performance and scalability.
To add a Database Table Destination object to your dataflow, go to Toolbox > Destinations > Database Table Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Database Table Destination object onto the designer.
If you look at it, the Database Table Destination object is empty right now, as in, it does not have any fields or mappings. This is because the object has not been configured yet. There are two empty sub-nodes for Input and Output under the DatabaseDest root node.
To configure the properties of the Database Table Destination object, right-click on the header and select Properties from the context menu.
This will open a new window, Database Connection, in Astera.
First, you will need to select the relevant data provider from the Data Provider drop-down list.
This is where you select the specific database provider you want to connect to. For instance, if you want to write you data to a SQL database, select SQL Server from the list. The connection details will vary according to the data provider selected.
Test Connection to make sure that your database connection is successful and click Next.
Now, you need to provide details to configure a connection with the destination database.
Enter your User ID and Password. You can also use the Recently Used drop-down list to connect to a recently-connected database.
The next window is the Pick Table window. Here, you can choose from the following options:
Pick Table: To append data into an existing table.
Create/Replace: To write data to a new table or replace an existing table.
Truncate Table: To overwrite data in an existing table.
Choose the option based on your requirement. In this case we will select the Create/Replace Table option and create a new table in the database.
Note: We will be creating a new table CustomerDetails.
For a database destination object to work, data fields should be mapped to the object so that the mapped data can be written to the destination.
Configure the source object and place it onto the designer next to the Database Table Destination object.
Note: In this case a Customers sample data, coming in from an Excel Workbook Source, will be written to a Database Table Destination.
Map the source object to the destination object. Data mapping from source to the destination can be done in the following ways:
By dragging and dropping the parent node from the source object onto the destination object.
By mapping the output port of the source object onto the input port of the destination object.
By right-clicking on the parent node inside the source object and selecting Write to > Database Table Destination from the context menu.
The fields are now mapped.
The Pick Table window has some other configuration options.
Define Input Ports for Mapping
Single Port: Works only for the records that have been updated and won’t treat records individually.
Individual Ports for Actions: Works for all the records individually as per the selected action. The actions that are provided are: Insert, Delete, Update and Upsert.
Database Options
Use constraint based write: When the layout has certain restrictions that you want to keep while writing.
Preserve system generated key values: To generate unique values for the selected primary key in the dataset. This option is only available if you assign at least one field in your destination layout as the System Generated field.
Use transaction
Always commit transaction on completion: When you want the whole transaction to be processed regardless of errors.
Rollback if there are any errors: When you don’t want the dataset to process in case of errors and roll back completely.
Check field lengths for the incoming dataset with the lengths defined in the destination layout.
Check for null values from the incoming dataset.
Write null string as zero length strings where the string values are null, their length will be defined as zero.
Write null numeric values as zeros for the numeric data types, the null values will be defined as zeros.
Disable indexes during load for lengthy processing.
Data Load Options
Bulk insert with batch size when you want the whole dataset to be loaded in batches for the specified size. Typically, larger batch sizes result in better transfer speeds; however performance gains may be less with relatively large batch sizes.
Bulk insert with all records in one batch when you want all the records to be loaded into a table in one batch. In this case, any database specific error in your transfer won’t show until the end of the transfer.
Use single record insert when you want records to be loaded individually. Records are inserted into a destination table one-by-one. This loading option renders the slowest performance among the three insert types. However, any errors or warnings during the transfer are displayed immediately as the transfer progresses.
Bulk Copy Options
Use Internal Transaction: When specified, each batch of the bulk-copy operation will occur within a transaction.
Fire Triggers: When specified, it will cause the server to fire the insert triggers for rows being inserted into the database.
Keep Nulls: Preserves null values in the destination table regardless of the settings for default values. When not specified, null values are replaced by default values where applicable.
Table Lock: Obtain a bulk update lock for the duration of the bulk copy operation. When not specified, row locks are used.
Check Constraints: Check constraints while data is being inserted. By default, constraints are not checked.
Keep Identity: Preserve source identity values. When not specified, identity values are assigned by the destination.
Default: Use the default values for all options.
Parallel Writing is used when you want to expedite the data loading process by increasing the number of writers for that dataset.
Once you have specified your options on this screen, click Next.
The next window you will see is the Layout Builder. Here, the layout of the database destination file can be modified.
To add a new field to the layout, go to the last row of the layout (Name column), which will be blank and double-click on it, and a blinking text cursor will appear. Type in the name of the field to be added, and select the properties for it. A new field will be added to the destination table’s layout.
Note: In this example, we will add a new field AccountType to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Here, we are deleting the Fax field from the layout.
Right-click on the highlighted line, a context menu will appear which will have the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use Move up/Move down buttons.
Note: Find the Move up/Move down icons on the top left of the builder.
For example: To move the Country field right below the Region field, select the row and click the Move up button in the toolbar at the top, to move the field up from the 9th to the 8th position.
Once the object layout is configured, click Next. This will take you to the Config Parameters window where you can further configure and define parameters for the database destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change during the runtime.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next. A General Options window will appear. Here you have the following options:
Comments can be added.
General Options are given, which relate to the processing of records in the destination file.
Clear Incoming Record Messages for any messages coming in from objects preceding the current object to be cleared.
Do Not Process Records With Errors will not let erroneous records to process further for the output.
Do Not Overwrite Default Values with Nulls makes sure that values are not overwritten with null values in the output.
Click OK.
The DatabaseDest object is now configured according to the settings made in the properties window.
Note: The changes that were made in this case are:
Moved the Country field below the field Region field.
The Database Table Destination object is now successfully configured, and the destination file can now be created by running the dataflow.
The job can be traced through the Job Progress window once the job starts running.
Astera’s Fixed Length File Destination provides the functionality to write data to a Fixed Length File.
To get a Fixed Length File Destination object from the Toolbox, go to Toolbox > Destinations > Fixed Length File Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Fixed Length File Destination object onto the designer.
The dragged destination object is empty right now. This is because the object has not been configured yet.
For the Fixed Length File Destination object to work, it needs to be provided with a data source.
Configure the source object and place it onto the designer next to the Fixed Length File Destination object.
Note: In this case we will be using a Customers sample table from the Database Table Source to write to the Fixed Length File Destination.
Now map the source object to the destination object. The following ways can be used for mapping:
i. By dragging and dropping the parent nodes onto each other for automatic mapping.
ii. By mapping the source parent node by dragging it to the destination parent node manually.
iii. By directly writing the source layout to a Fixed Length File Destination through the source context menu of its parent node.
The fields are now mapped.
To configure the Fixed Length File Destination object, right-click on the header, select Properties from the context menu, and a dialogue box will open.
Provide the File Path. This is where the fixed length destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
If the First Row Contains Header, check the box to read headers from the source file.
Record Delimiter - Allows you to select the delimiter for the records in the fields. The choices available are carriage-return line-feed combination, carriage-return and line-feed. The record delimiter can also be of choice, instead of choosing from the available options.
In case the records don’t have a delimiter, the Record Length field is used to specify the character length for a single record.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
Choose Append to File (If Exists) to append to an existing file or create a new file. Creating a new file will overwrite any existing file.
Check the Write to Multiple Files option for the data to be saved to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
To define hierarchical file layout and process the data file as a hierarchical file check the This is a Hierarchical File option. Astera IDE provides extensive user interface capabilities for processing hierarchical structures.
Once the data reading options have been specified on this window, click Next.
The next window is the Layout Builder. On this window, the layout of the fixed length destination file can be modified.
To add a new field to the layout, go to the last row of the layout (Name column), which will be blank and double-click on it, and a blinking text cursor will appear. Type in the name of the field to be added and select the subsequent properties for it. A new field will be added to the source layout.
Note: Adding a new field (Email) to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Deleting the Fax field from the layout.
Right-click on the highlighted line, a context menu will appear which will have the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use Move up/Move down keys.
Note: Find the Move up/Move down icons on the top left of the builder.
For example: To move the Country field right below the Region field, select the row and use the Move up key to from the 9th row to the 8th row.
The row is now moved from the 9th position to the 8th position.
Once the object layout is configured, click Next. A new window will appear, Config Parameters, which allows to further configure and define parameters for the fixed length destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
A General Options window will appear. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The FixedDest object is now configured according to the changes that were made in the properties window.
Note: The changes that were made in this case are:
Added a new field, Email in the layout.
Moved the Country field below the Region field.
The Fixed Length File Destination object is successfully configured, and the destination file can now be created by running the dataflow.
Added a new field AccountType in the layout and mapped it’s value from the object.
The Excel Workbook Destination object in Astera provides the functionality to write data to Microsoft Excel workbooks. An important thing to note here is that, it is not necessary to have Microsoft Excel installed on the machine for the excel destination object in Astera to work. The feature gives you the option to specify the worksheet and the starting cell where the data write begins.
To get the object from the Toolbox, go to Toolbox > Destinations > Excel Workbook Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Destination object onto the designer.
The dragged destination object is empty right now. This is because the data fields are not mapped to it yet. In this case, we will use a simple source to excel destination mapping scenario as an example.
Configure the source object and place it onto the designer next to the Excel Workbook Destination object.
Note: We are using a sample table containing Customers data from an SQL database.
Now map the data fields from the source object to the destination object. Mapping can be done in the following ways:
i. By dragging and dropping the parent node of the source object onto the parent node of the destination object for auto-mapping the layout.
ii. By creating a map from the source parent node to the destination parent node.
iii. By directly writing the fields in the source layout to an Excel Destination through the source context menu of its parent node.
The fields are now mapped.
To configure the Excel Workbook Destination object, right-click on the header, select Properties from the context menu and a dialog box will open.
Provide the File Path. This is where the excel destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
If the First Row Contains Header, check this option to read headers from the source file.
The Worksheet field can be used to specify the name of a worksheet for either overwriting the data in an already existing worksheet or adding a new worksheet.
Choose Append to File (If Exists) to append to an existing file or create a new file. Creating a new file will overwrite any existing file.
Check on Write to Multiple Files for the data to be saved to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
Once the data reading options have been specified on this screen, click Next.
The next window is the Layout Builder. On this window, the layout of the excel destination file can be modified.
To add a new field to the layout, go to the last row of the layout (Name column), which will be blank and double-click on it, and a blinking text cursor will appear. Type in the name of the field to be added and select the subsequent properties for it. A new field will be added to the source layout.
Note: Adding a new field (Email) to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Deleting the Fax field from the layout.
Right-click on the highlighted line, a context menu will appear which will have the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use Move up/Move down keys.
Note: Find the Move up/Move down icons on the top left of the builder.
For example: To move the Country field right below the Region field, select the row and use the Move up key to from the 9th row to the 8th row.
The row is now moved from the 9th position to the 8th position.
Once the object layout is configured, click Next. A new window will appear, Config Parameters, which allows to further configure and define parameters for the excel destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
A General Options window will appear. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The ExcelDest object is now configured according to the changes that were made in the properties window.
Note: The changes that were made in this case are:
Added a new field, Email in the layout.
Moved the Country field below the Region field.
The Excel Workbook Destination object is successfully configured, and the destination file can now be created by running the dataflow.
The SQL Statement Destination in Astera offers extra flexibility over other destination objects by providing the option to apply a custom INSERT, UPDATE, or DELETE SQL statements to control what will be written to the destination table. The object can also be used to call stored procedures. Moreover, you can parameterize your SQL statement using the Parameterize Replacement functionality.
In this article, we will be looking at how you can configure and use the SQL Statement Destination object in Astera.
Before moving on to the actual configuration, we will have to get an SQL Statement Destination object from the Toolbox. To do so, go to Toolbox > Sources > SQL Statement Destination. In case you are unable to view the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the SQL Statement Destination object onto the designer.
The destination object is currently empty because we are yet to map any data fields to it.
To configure the SQL Statement Destination object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click the header of the destination object to go to its Properties.
A new window will open when you click on Properties from the context menu.
Here, you need to configure the properties for the SQL Statement Destination object.
On the Database Connection window, enter the details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test is successful. Close it by clicking OK, and then click Next.
The next window will present a blank page for you to enter an appropriate SQL statement for the required outcome. This can consist of an INSERT, UPDATE, or DELETE statement that manipulates the data being written to the database.
The curly brackets on the right side of the window indicate that the use of parameters is supported, which implies that you can replace a regular value with a parameterized value that can be changed during runtime.
In this use-case, we will be inserting new records into an existing table, named TESTTABLE, that has three columns: OrderID, CustomerID, and EmployeeID.
Notice the @ symbol in front of a field name. This makes the field appear in the field list inside the object box so that the field can be mapped. The fields that do not have a @ symbol in front of them will not show in the list of fields, but they can still receive values according to the logic of the SQL statement itself. String fields need to be surrounded by single quotes, whereas Integer fields do not. In this case, CustomerID is a String field, while OrderID and EmployeeID are Integer fields.
The Database Options given at the bottom of the window provide support for transaction management. Checking the Use Transaction option will enable two other sub-options:
Always commit transaction on completion: Ensures that the job is completed regardless of any erroneous records.
Rollback if there are any errors: Aborts the job in case of one or more erroneous records.
Once you have entered an SQL statement and chosen your desired option, click Next.
On the new Config Parameters window, you can define certain parameters for the SQL Statement Destination object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
At the end, a General Options window will appear. Here, you are provided with:
A text box to add Comments.
A set of General Options related to the processing of records.
To conclude the configuration, click OK.
For a destination object to work, data fields must be mapped to it from a source. In this case, we will be using an SQL Query Source object to get data from the Orders table in the Northwind database.
Configure the source object and place it next to the SQL Statement Destination object.
Map the required data fields from the source object to the destination object. This can be done in the following ways:
By dragging and dropping the parent node of the source object onto that of the destination object.
By individually dragging and dropping the required fields from the source object onto their respective nodes in the destination object.
To preview the output, right-click on the destination object’s header and select Preview Output from the context menu. In this case, you will see the following result:
You can now write data to the destination table by running the dataflow.
This is how we use the SQL Statement Destination object in Astera.
Astera’s XML/JSON File Destination object provides the functionality to write data to an XML or JSON file when the data is in hierarchical format.
In order to understand how to write to an XML/JSON File Destination object, we will take help of a use-case where we will convert flat datasets to an hierarchical set, and then write the transformed data to an XML file.
Customers and Orders data from database tables will be used as source objects. We will then join them using the TreeJoin Transformation to create a hierarchical dataset.
Learn more about how the Tree join Transformation works in Astera.
To get an XML/JSON File Destination object from the Toolbox, go to Toolbox > Destinations > XML/JSON File Destination and drag-and-drop it onto the designer. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
The dragged destination object is empty right now. This is because the object has not been configured yet.
Note: In this case we will be using the use case discussed earlier, taking the Treejoin as the source for data to write to the XML/JSON File Destination.
A schema file is needed to write data to an XML/JSON File Destination. To create the schema file, right click on the source object (Treejoin transformation in this case), a context-menu will appear, select the option Generate XML Schema for layout.
A new dialog box will open where you will be asked to save the XML schema file. Give the File Name and file path for the schema file and click Save.
The schema file has been created in the specified location. To view this file, go to the file location and open the file in Astera.
The opened file would look like the one below:
To configure the XML/JSON File Destination object, right-click on its header and select Properties from the context-menu.
A window, Destination XML File, will open. Here we will specify the file locations; File Path and Schema File for the XmlJsonDest object.
The destination file will now be saved in the provided file location. Click OK, and map the destination object to the source object before further configuration.
The XmlJsonDest object will now have the layout of the source object (Treejoin Transformation in this case).
To map the source object to the destination object, the following ways of mapping can be used:
By dragging and dropping the parent node of the source object (TreeJoin node in the TreeJoin table) onto the child node of the destination object (TreeJoin node in the XmlJsonDest object) for automatic mapping.
By manually mapping the source parent node (TreeJoin in the TreeJoin table) by dragging it to the respective destination child node (TreeJoin in the XmlJsonDest object).
The fields are now mapped.
Once the file locations have been specified and the mappings have been done, further properties can be defined.
XML Layout
The next window after the Destination XML File window is the XML Layout window.
This window shows the XML layout for the XmlJsonDest object. The collection nodes for the object can be seen in this window with their fields.
Config Parameters
Click Next, and a window, Config Parameters, will open, which will allow us to further configure and define parameters for the XML/JSON Destination object.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
General Options
Click Next, and a new window, General Options, will open. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The XmlJsonDest object has been successfully configured and the destination file can now be created by running the dataflow.
MongoDB is a document-oriented database in which one collection holds different documents. The MongoDB Destination object in Astera Data Stack, provides functionality to write data onto it. This component provides functionality to control how data should be written in collections.
While writing data, the number of fields, content, and size of the document can differ from one document to another. This can be easily catered to by including configurations on write concerns that describe level of acknowledgment from MongoDB for write operations.
MongoDB is mainly used for Big Data.
The MongoDB Destination object can be used to map incoming data from a source to the MongoDB server. MongoDB makes it easier for users to store structured and unstructured data both.
For our use case, we already have an XML/JSON Source object configured in a Dataflow.
To start, drag-and-drop the MongoDB Destination object from the Destinations section of the Toolbox onto the Dataflow.
Right-click on the MongoDB Destination object and select Properties from the context menu.
This will open a new window.
User Name: This is where we enter the user name of the MongoDB or Local Server
Password: The password of the MongoDB or Local Server is used here
Primary Server Name: The name of the primary cluster is used here
Database: This is where we select the database on which we wish to write the data.
Authentication Database: This is the database used for authentication.
Port: The port is used to handle incoming and outgoing requests to the server
Enable Set of Replica: Selecting this checkbox allows the use of a secondary cluster
Secondary Server Name: This name of the secondary cluster is used here
Use TLS: Select this option if the server requires TLS security.
Once your credentials have been filled, test the connection, and click Next.
For our use case, we have input the credentials to use the MongoDB Destination for our local server.
We will now be taken to the MongoDB Pick Collection screen.
For our use case, we will select Create/Replace and add a new Collection.
Database Operations – These operations are used when we are picking an already existing collection.
Insert: To insert a new record into the collection.
Update: To update an existing record in the collection.
Delete: To delete a record from the collection.
Upsert: To insert and update a record in the collection.
Select Fields for matching database records: Selecting from this drop-down menu lets the user select fields based on which to match the records for the selected database operation.
Write Concern Options – Selecting from these options lets the server provide an acknowledgment to the user based on how the process was carried out.
ACKNOWLEDGED: This will return an acknowledgment in the Job trace window if the process stops after getting an error or if the process successfully completes.
UNACKNOWLEDGED: This option will not return an acknowledgment, no matter how the data write is carried out.
MAJORITY: If there are multiple primary and secondary servers, this option will return when the majority of the servers have been processed.
W1: Selecting this option will return an acknowledgment when the primary server has been processed.
W2: Selecting this option will return an acknowledgment when the primary server and one secondary server have been processed.
W3: Selecting this option will return an acknowledgment when the primary server and two secondary servers have been processed.
Data Load Options – These options let the user define how the data is going to be loaded into the database.
Bulk insert with batch size: This will insert all records divided into batch sizes that the user has defined.
Bulk insert with all records in one batch: This will insert all records in a single batch.
Use single record insert: This option will treat every record individually and insert them one by one.
Select Type of Bulk Insert: Selecting from this drop-down menu lets the user define whether the Bulk Insert will be Ordered or UnOrdered.
In the case of Ordered, data writing will be stopped if an error is encountered between record insertions.
In the case of UnOrdered, data writing will continue despite any errors being encountered.
Click Next and you will be led to the MongoDB Layout screen.
Currently, our layout is empty since we have not mapped any fields to it.
We will map the incoming fields from the XML/JSON Source object to the MongoDB Destination object.
We will then reopen the MongoDB Layout screen.
As you can see below, the entire layout has now been defined.
Click OK and the MongoDB Destination object will be configured.
Select the Start Dataflow option in the main toolbar and the data will be written to the destination.
As you can see in the Job Progress window, the data has been successfully written to the destination.
Note: You can see the acknowledgment in the window since we selected the respective option.
This concludes the configuration of the MongoDB Destination object in Astera.
The Excel Workbook Report object in Astera is designed to tabulate information from selected fields and present the results in a one- or two-dimensional matrix. This feature enables deeper analysis of data by organizing it in a way that facilitates the identification of trends, patterns, and insights.
To get the object from the Toolbox, go to Toolbox > Destinations > Excel Workbook Report. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Report object onto the designer.
The dragged report object is empty right now. This is because the data fields are not mapped to it yet. While any source can be used, for this particular use case, we will demonstrate using a Report Source that is extracting data from a PDF source file.
Configure the source object and place it onto the designer next to the Excel Workbook Report object.
Note: We are using the data that we extracted from a customer invoice with the help of Astera's report modeling tool.
Now map the data fields from the source object to the report object.
To configure the Excel Workbook Report object, right-click on the header, select Properties from the context menu and a dialog box will open.
Provide the File Path. This is where the excel report file will be saved.
The dialog box has some other configuration options such as worksheet and start address, which work just like that of Excel Workbook Destination.
Once The File Path and Data reading options have been specified on this screen, click Next.
The next window is the Layout Builder. On this window, the layout of the excel report file can be modified.
Here, you can write names of fields as you want them to appear in your destination in the Header column and specify the relevant Aggregate Functions for them.
Aggregate Functions define how the data will be summarized in the report:
Group By: Groups records based on unique values in the specified field.
Sum: Calculates the total sum of the specified field.
Count: Counts the number of records.
Average: Calculates the average value of the specified field.
Max: Finds the maximum value in the specified field.
Min: Finds the minimum value in the specified field.
First: Returns the first record in a sorted list based on the specified field.
Last: Returns the last record in a sorted list based on the specified field.
Variance: Calculates the variance of the specified field.
Standard Deviation: Calculates the standard deviation of the specified field.
None: Includes the field in the report without applying any aggregation. This is useful when you want certain field values in the data lines but don’t want to apply any aggregation on them.
For this case:
AccountID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual accounts.
OrderID: We will select the Group By option from the Aggregate Function drop-down list for this field want to see orders within each account.
TOTAL: For this field we will select the Aggregate Function Sum, to calculate the total amount per order.
QUANTITY: For this field we will select the Aggregate Function Sum, to calculate the total quantity per order.
ITEM: Since we want to show item names in the data lines but do not want to apply any aggregates on them, we will select Aggregate Function None.
The same approach will be applied to the ITEM CODE, DESCRIPTION, and PRICE fields. We will select Aggregate Function None for each of these fields to ensure that their specific values are displayed in separate data lines without any aggregation.
Click Next. The Report Options window will now open.
Report Type: You can select from three report types. Summary, Cross Tab or Time Series.
Note: For this use case we have chosen Summary as the Report Type.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for Subtotal field.
Grand Total Text: You can specify the name for Grand total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis. For example, if you have customer names like "john" and "John," enabling this option will treat them as distinct groups rather than combining them into a single group.
Style
You can also modify the style of your report.
Show Data Lines: If you want to see the subtotals and grand totals along with the actual records of data then you check this option.
Insert Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Write Grand Total: Adds the grand total to the report. If unchecked, the grand total won't be included.
Insert Blank Line Before Subtotal: Inserts a blank line before each subtotal in the report.
Insert Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option Aggregate requires data to be sorted by the specified Group-By field.
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears.
Note: If you want to see only the subtotals and the grand total you can uncheck the Show Data Lines option in the Report Options wizard. This will display only the summarized totals without individual records.
A Crosstab Summary displays summarized information about two fields in a two-dimensional matrix. The values for one field are displayed down the left-most column of the matrix and the values for the other key field are displayed across the top row as columns. This two-dimensional arrangement displays only a single measure at a time.
Let’s see how we can make a Cross Tab Summary using Excel Report Source.
Since we want to use information from two tables (order and order details) we have joined them and to calculate the total we used an expression object. We can then map data fields we needed from both to the Excel Report Object.
For this case:
CustomerID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual customers.
ProductID: We will select the None option from the Aggregate Function drop-down list for this field as we want to spread the Product Values to spread across the top row(columns).
Total: We will select the Sum option from the Aggregate Function drop-down list for this field as we want totals for each type of product and the totals for each customer.
Note: For this use case we have chosen Cross Tab as the Report Type.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for Subtotal field.
Row Total Text: You can specify the name for Row total field.
Grand Total Text: You can specify the name for Grand total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis.
Crosstab Options
Column Field: You can select which field or attribute you want to use as the column headers in the resulting cross-tabulation report.
Note: The selected column field should have Aggregate Function None selected in the layout builder.
Row Totals – Check this Option if you want to add each rows total to your report.
Style
You can also modify the style of your report.
Add Grand Total: Inserts the grand total in the report.
Add Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Add Row Total: Inserts the row totals in the report.
Add Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click on Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option Aggregate requires data to be sorted by the specified Group-By field.
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
After defining the options, Click Ok.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears. The summary in the table shows the sales data for different products purchased by various customers, identified by their CustomerID.
Time Series summary displays summarized information about two key fields in a two - dimensional matrix. The values for one field are displayed down the left most column of the matrix and the time intervals (such as days, months, quarters, or years) are displayed across the top row as columns.
Let’s see how we can make a Time Series Summary using Excel Report Source.
Since we want to use information from two tables (order and order details) we have joined them and to calculate the total we used an expression object. We can then map data fields we needed from both to the Excel Report Object.
For this case:
CustomerID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual customers.
OrderDate: We will select the None option from the Aggregate Function drop-down list for this field as we want to use this date field across the top row(columns).
Total: We will select the Sum option from the Aggregate Function drop-down list for this field as we want the totals for each customer.
Note: For this use case we have chosen Time Series as the Report Type.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for Subtotal field.
Row Total Text: You can specify the name for Row total field.
Grand Total Text: You can specify the name for Grand total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis.
Timeseries Report Options
Time Unit Drop-down: You can specify the time interval for the time series analysis. Available options include:
Year: Analyze data on a yearly basis.
Month: Analyze data on a monthly basis.
Day: Analyze data on a daily basis.
Week: Analyze data on a weekly basis.
Quarter: Analyze data on a quarterly basis.
Start Date: You can specify the start date for the time series analysis. This defines the beginning of the time period for which data will be analyzed.
End Date: You can specify the end date for the time series analysis. This defines the end of the time period for which data will be analyzed.
Date Field: Field from the dataset that contains the date or timestamp information. The selected date field will be used to create the time series.
Style
You can also modify the style of your report.
Add Grand Total: Inserts the grand total in the report.
Add Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Add Row Total: Inserts the row totals in the report.
Add Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click on Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option Aggregate requires data to be sorted by the specified Group-By field.
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
After defining the options, Click Ok.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears. This summary table shows the number of sales across different years for customers, identified by their CustomerID.