The Database Table Source object provides the functionality to retrieve data from a database table. It also provides change data capture functionality to perform incremental reads, and supports multi-way partitioning, which partitions a database table into multiple chunks and reads these chunks in parallel. This feature brings about major performance benefits for database reads.
The object also enables you to specify a WHERE clause and sort order to control the result set.
In this article, we will be discussing how to:
Get a Database Table Source object on the dataflow designer.
Configure the Database Table Source object according to the required layout and settings.
We will also be discussing some best practices for using a Database Table Source object.
To get a Database Table Source from the Toolbox, go to Toolbox > Sources > Database Table Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Database Table Source object onto the designer.
The dragged source object appears empty at this point because it has not been configured yet.
To configure the Database Table Source object, right-click on its header and select Properties from the context menu.
A dialog box will open.
This is where you can configure the properties for the Database Table Source object.
The first step is to specify the Database Connection for the source object.
Provide the required credentials. You can also use the Recently Used drop-down menu to connect to a recently connected database.
You will find a drop-down list next to the Data Provider.
This is where you select the specific database provider to connect to. The connection credentials will vary according to the provider selected.
Click Test Connection to make sure that your database connection is successful, then click Next.
Next, you will see a Pick Source Table and Reading Options window. On this window, you will select the table from the database that you previously connected to and configure the table from the given options.
From the Pick Table field, choose the table that you want to read the data from.
Note: We will be using the Customers table in this case.
Once you pick a table, an icon will show up beside the Pick Table field.
View Data: You can view data in a separate window in Astera.
View Schema: You can view the schema of your database table from here.
View in Database Browser: You can see the selected table in the Database Source Browser in Astera.
Table Partition Options
This feature substantially improves the performance of large data movement jobs. Partitioning is done by selecting a field and defining value ranges for each partition. At runtime, Astera generates and runs multiple queries against the source table and processes the result set in parallel.
Check the Partition Table for Reading option if you want your table to be read in partitions.
You can specify the Number of Partitions.
The Pick Key for the Partition drop-down will let you choose the key field for partitioning the table.
If you have specific key values based on which you want to partition the table, you can use the Specify Key Values (Separated by comma) option.
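Conceptually, key-based partitioning splits the key range into contiguous chunks and issues one query per chunk, which are then executed in parallel. The sketch below is illustrative only; Astera generates its queries internally, and the table and key names here are hypothetical.

```python
def partition_ranges(min_key, max_key, num_partitions):
    """Split [min_key, max_key] into contiguous ranges, one per partition."""
    step = (max_key - min_key + 1) // num_partitions
    ranges = []
    lo = min_key
    for i in range(num_partitions):
        # The last partition absorbs any remainder so the full range is covered.
        hi = max_key if i == num_partitions - 1 else lo + step - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

def partition_queries(table, key, ranges):
    """Build one SELECT per partition range (illustrative SQL)."""
    return [
        f"SELECT * FROM {table} WHERE {key} BETWEEN {lo} AND {hi}"
        for lo, hi in ranges
    ]
```

For example, partitioning a key range of 1-100 into four partitions yields the ranges (1, 25), (26, 50), (51, 75), and (76, 100), each read by its own query.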
The Favor Centerprise Layout option is useful when your source database table's layout has changed over time while the layout built in Astera has remained static, and you want to continue using your existing dataflows with the updated table. Check this option and Astera will favor its own layout over the database layout.
Incremental Read Options
The Database Table Source object provides incremental read functionality based on the concept of audit fields. Incremental read is one of the three change data capture approaches supported by Astera. Audit fields are fields that are updated when a record is created or modified. Examples of audit fields include created date time, modified date time, and version number.
Incremental read works by keeping track of the highest value of the specified audit field. On the next run, only the records with a value higher than the saved value are retrieved. This feature is useful in situations where two applications need to be kept in sync and the source table maintains audit field values for rows.
Select Full Load if you want to read the entire table.
Select Incremental Load Based on Audit Fields to perform an incremental read. Astera will start reading records from where the last read left off.
Checking the Perform full load on next run option overrides the incremental load setting on the next run and performs a full load instead.
Use the Audit Field drop-down to select the field that Astera compares to determine where the last read ended.
In File Path, specify the path to the file that will store the incremental transfer information.
The next window is the Layout Builder. In this window you can modify the layout of your database table.
Note: By default, Astera reads the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted line; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Note: Modifying the layout (adding or deleting fields) from the Layout Builder in Astera will not make any changes to the actual database table. The layout is only specific to Astera.
To move a field above or below another field in the layout, select its row and use the Move up/Move down keys.
Note: You will find the Move up/Move down icons on the top left of the builder.
For example: We want to move the Country field right below the Region field. We will select the row and use the Move up key to move the field from the 9th row to the 8th.
After you are done customizing the Layout Builder, click Next. You will be taken to a new window, Where Clause. Here, you can provide a WHERE clause, which will filter the records from your database table.
Note: If the WHERE clause is left blank, Astera will read the entire table.
For instance, you might add a WHERE clause that selects all the customers from the country “Mexico” in the Customers table.
The output will then be filtered, and only the records that satisfy the WHERE condition will be read by Astera.
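The effect of the WHERE clause can be demonstrated with an in-memory SQLite stand-in for the Customers table. The customer IDs below mirror the familiar Northwind sample data but are used here purely as illustration.

```python
import sqlite3

# In-memory stand-in for the Customers table (illustrative sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("ALFKI", "Germany"), ("ANATR", "Mexico"), ("ANTON", "Mexico")],
)

# The WHERE clause restricts the result set to matching records only.
rows = conn.execute(
    "SELECT CustomerID FROM Customers WHERE Country = 'Mexico'"
).fetchall()
```

Only the two Mexican customers are returned; the German record is never read.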
Once you have configured the Database Table Source object, click Next.
A new window, Config Parameters will open. Here, you can define parameters for the Database Table Source object.
Parameters simplify the deployment of flows by eliminating hardcoded values, and make it easy to change multiple configurations with a single value change.
Note: Parameters can be changed in the Config Parameters wizard page. Parameters left blank will use their default values assigned on the properties page.
Click OK.
You have successfully configured your Database Table Source object. The fields from the source object can now be mapped to other objects in a dataflow.
To get the Database Table Source object from the Data Source Browser, go to View > Data Source Browser or press Ctrl + Alt + D.
A new window will open. You can see that the pane is empty right now. This is because we are not connected to any database source yet.
To connect the browser to a database source, click on the first icon located at the top left corner of the pane.
A Database Connection box will open.
This is where you can connect to your database from the browser.
You can either connect to a Recently Used database or create a new connection.
Note: In this case we will use one of our recent connections.
To create a new connection, select your Data Provider from the drop-down list.
Note: We will be using the SQL Server in this case.
The next step is to fill in the required credentials. Also, to ensure that the connection is successfully made, select Test Connection.
Once you test your connection, a dialog box will indicate whether the test was successful or not.
Click OK.
Once you have connected the browser, your Data Source Browser will now have the databases that you have on your server.
Select the database that you want to work with and then choose the table you want to use.
Note: In this case we will be using the Northwind database and Customers table.
Drag-and-drop Customers table onto the designer in Astera.
If you expand the dropped object, you will see that the layout for the source table is already built. You can even preview the output at this stage.
Right-clicking on the Database Table Source object will also display options for the database table.
Show in DB Browser - Will show where the table resides in the database in the Database Browser.
View Table Data - Builds a query and displays all the data from the table.
View Table Schema - Displays the schema of the database table.
Create Table - Creates a table on a database based on the schema.
The Email Source object in Astera enables users to retrieve data from emails and process the incoming email attachments.
In this section, we will cover how to get the Email Source object onto the dataflow designer from the Toolbox.
To get an Email Source object from the Toolbox, go to Toolbox > Sources > Email Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Email Source object onto the designer.
You can see some built-in fields and an Attachments node.
Double-click on the header of the Email Source object to go to the Properties window.
A configuration window for the Email Source object will open. The Email Connection window is where you will specify the connection details.
Url: The address of the mail server on which the connection will be configured.
Login Name: The login name (usually the email address) of the user account.
Password: Password of the user.
Port: The port of the mail server on which to connect. For example, 587 is the standard SMTP submission port, and 993 is the standard port for IMAP over SSL.
Connection Logging: Connection logging records the messages and events exchanged between the client and the server. These logs can be reviewed for debugging or error diagnosis.
Astera supports 4 types of Connection Logging methods:
Verbose: Captures everything.
Debug: Captures only the content that can be used in debugging.
Info: Captures information and general messages.
Error: Captures only the errors.
If you have configured email settings before, you can access the configured settings from the drop-down list next to the Recent option. Otherwise, provide server settings for the mailing platform that you want to use. In this case, we are using an Outlook server.
Test your connection by clicking Test Connection; this will give you the option to send a test email to the login address.
Click Next. This is the Email Source Properties window. There are two important parts in this window:
Download attachment options
Email reading options
Check the Download Attachments option if you want to download the contents of your email. Specify the directory where you want to save the email attachments, in the provided field next to Directory.
The second part of the Email Source Properties window has the email reading options that you can work with to configure various settings.
Read Unread Only – Check this option if you only want to process unread emails.
Mark Email as Read – Check this option if you want to mark processed emails as read.
Folder – From the drop-down list next to Folder, you can select the specific folder to check, for example, Inbox, Outbox, Sent Items etc.
Filters - You can apply various filters to only process specific emails in the folder.
From Filter: Filters out emails based on the sender’s email address.
Subject Filter: Filters out emails based on the text of the subject line.
Body Filter: Filters out emails based on the body text.
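The three filters combine conjunctively: an email is processed only if it passes every filter you have configured. A minimal sketch of that logic, using a plain dictionary as a stand-in for an email message (the field names are assumptions, not Astera's internal representation):

```python
def matches_filters(email, from_filter=None, subject_filter=None, body_filter=None):
    """Return True if an email passes all configured filters (None = filter off)."""
    if from_filter and from_filter not in email["from"]:
        return False
    if subject_filter and subject_filter not in email["subject"]:
        return False
    if body_filter and body_filter not in email["body"]:
        return False
    return True
```

For example, setting only a From Filter passes every email from that sender regardless of subject or body.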
Click OK.
Right-click on the Email Source object’s header and select Preview Output from the context menu.
A Data Preview window will open and will show you the preview of the extracted data.
Notice that the output only contains emails from the email address specified in the Filter section.
Astera Data Stack can read data from a wide range of file sources and database providers. In this article, we have compiled a list of file formats, data providers, and web-applications that are supported for use in Astera Data Stack.
Amazon Aurora
Azure SQL Server
Amazon Aurora Postgres
Amazon RDS
Amazon Redshift
DB2
Google BigQuery
Google Cloud SQL
MariaDB
Microsoft Azure
Microsoft Dynamics CRM
MongoDB (as a Source)
MS Access
MySQL
Netezza
Oracle
PostgreSQL
PowerBI
Salesforce (Legacy)
Salesforce Rest
SAP Hana
Snowflake
SQL Server
Sybase
Tableau
Teradata
Vertica
In addition, Astera features an ODBC connector that uses the Open Database Connectivity (ODBC) interface by Microsoft to access data in database management systems using SQL as a standard.
COBOL
Delimited files
Fixed length files
XML/JSON
Excel workbooks
PDFs
Report sources
Text files
Microsoft Message Queue
EDI formats (including X12, EDIFACT, HL7)
Microsoft Dynamics CRM
Microsoft Azure Blob Storage
Microsoft SharePoint
Amazon S3 Bucket Storage
Azure Data Lake Gen 2
PowerBI
Salesforce
SAP
Tableau
AS2
FTP (File Transfer Protocol)
HDFS (Hadoop Distributed File System)
SCP (Secure Copy Protocol)
SFTP (Secure File Transfer Protocol)
SOAP (Simple Object Access Protocol)
REST (REpresentational State Transfer)
Using the SOAP and REST web services connectors, you can easily connect to any data source that uses the SOAP protocol or can be exposed via a REST API.
Here are some applications that you can connect to using the API Client object in Astera Data Stack:
FinancialForce
Force.com Applications
Google Analytics
Google Cloud
Google Drive
Hubspot
IBM DB2 Warehouse
Microsoft Azure
OneDrive
Oracle Cloud
Oracle Eloqua
Oracle Sales Cloud
Oracle Service Cloud
Salesforce Lightning
ServiceMAX
SugarCRM
Veeva CRM
The list is non-exhaustive.
You can also build a custom transformation or connector from the ground up quickly and easily using the Microsoft .NET APIs, and retrieve data from various other sources.
Each source on the dataflow is represented as a source object. You can have any number of sources in the dataflow, and they can feed into zero or more destinations.
The following source types are supported by the dataflow engine:
Flat File Sources:
Tree File Sources:
Database Sources:
Data Model
All sources can be added to the dataflow by picking a source type on the Toolbox and dropping it on the dataflow. File sources can also be added by dragging-and-dropping a file from an Explorer window. Database sources can be dragged-and-dropped from the Data Source Browser. For more details on adding sources to the dataflow, see Introducing Dataflows.
Adding a Delimited File Source object allows you to transfer data from a delimited file. An example of what a delimited file source object looks like is shown below.
To configure the properties of a Delimited File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a Fixed-Length File Source object allows you to transfer data from a fixed-length file. An example of what a Fixed-Length File Source object looks like is shown below.
To configure the properties of a Fixed-Length File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an Excel Workbook Source object allows you to transfer data from an Excel file. An example of what an Excel Workbook Source object looks like is shown below.
To configure the properties of an Excel Workbook Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a COBOL File Source object allows you to transfer data from a COBOL file. An example of what a COBOL File Source object looks like is shown below.
To configure the properties of a COBOL File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an XML/JSON File Source object allows you to transfer data from an XML file. An example of what an XML/JSON File Source object looks like is shown below.
To configure the properties of an XML/JSON File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
General Properties window:
File Path – Specifies the location of the source XML file. Using UNC paths is recommended if running the dataflow on a server.
Schema File Path – Specifies the location of the XSD file controlling the layout of the XML source file.
Note: Astera can generate a schema based on the content of the source XML file. The data types will be assigned based on the source file’s content.
Optional Record Filter Expression – Allows you to enter an expression to selectively filter incoming records according to your criteria. You can use the Expression Builder to help you create your filter expression. For more information on using Expression Builder, see Expression Builder.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
Adding a Database Table Source object allows you to transfer data from a database table. An example of what a Database Table Source object looks like is shown below.
To configure the properties of a Database Table Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your source, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected source.
Pick Source Table window:
Select a source table using the Pick Table dropdown.
Select Full Load if you want to read the entire table.
Select Incremental Load Based on Audit Fields to perform an incremental read starting at a record where the previous read left off.
Incremental load based on audit fields is built around the concept of Change Data Capture (CDC), a set of reading and writing patterns designed to optimize large-scale data transfers by moving only the data that has changed. Astera implements CDC using the Audit Fields pattern, which uses a create time or last update time field to determine which records have been inserted or updated since the last transfer, and transfers only those records.
Advantages
The most efficient of the CDC patterns. Only records that were modified since the last transfer are retrieved by the query, putting little stress on the source database and network bandwidth.
Disadvantages
Requires update date time and/or create date time fields to be present and correctly populated
Does not capture deletes
Requires index on the audit field(s) for efficient performance
To use the Audit Fields strategy, select the Audit Field and an optional Alternate Audit Field from the appropriate dropdown menus. Also, specify the path to the file that will store incremental transfer information.
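The query issued under the Audit Fields strategy can be sketched as follows. The exact SQL Astera generates is not documented here; this hypothetical builder just shows the shape of the incremental condition, including the optional alternate audit field.

```python
def cdc_query(table, audit_field, last_value, alternate_field=None):
    """Build the incremental SELECT issued after the first full load (illustrative)."""
    if last_value is None:
        # No watermark yet: the first run is a full load.
        return f"SELECT * FROM {table}"
    cond = f"{audit_field} > '{last_value}'"
    if alternate_field:
        # Either audit field advancing past the watermark marks the row as changed.
        cond = f"({cond} OR {alternate_field} > '{last_value}')"
    return f"SELECT * FROM {table} WHERE {cond}"
```

For instance, with a stored watermark of 2001/01/05 on ModifiedDtTm, the generated filter retrieves only rows modified after that date.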
Where Clause window:
You can enter an optional SQL expression serving as a filter for the incoming records. The expression should start with the keyword WHERE, followed by the filter you wish to apply.
For example, WHERE CreatedDtTm >= '2001/01/05'
General Options window:
The Comments input allows you to enter comments associated with this object.
Adding a SQL Query Source object allows you to transfer data returned by the SQL query. An example of what an SQL Query Source object looks like is shown below.
To configure the properties of a SQL Query Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your SQL Query, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected database.
SQL Query Source window:
Enter the SQL expression controlling which records should be returned by this source. The expression should follow SQL syntax conventions for the chosen database provider.
For example, select OrderId, OrderName, CreatedDtTm from Orders.
Source or Destination is a Delimited File
If your source or destination is a Delimited File, you can set the following properties
First Row Contains Header - Check this option if you want the first row of your file to contain the column headers. In the case of a source file, this indicates whether the source contains headers.
Field Delimiter - Allows you to select the delimiter for the fields from a list of common choices, such as the comma (,) or the tab character. You can also type a delimiter of your choice instead of choosing from the available options.
Record Delimiter - Allows you to select the delimiter for the records. The choices available are the carriage-return line-feed combination (&lt;CR&gt;&lt;LF&gt;), carriage-return (&lt;CR&gt;), and line-feed (&lt;LF&gt;). You can also type a record delimiter of your choice instead of choosing from the available options. For more information on record delimiters, please refer to the Glossary.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
Quote Char - Allows you to select the type of quote character to be used in the delimited file. This quote character tells the system to overlook any special characters inside the specified quotation marks. The options available are the double quote (") and the single quote (').
You can also use the Build fields from an existing file feature to build the destination fields based on an existing file instead of manually typing the layout.
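The interplay of these settings (header row, field delimiter, quote character) can be seen with Python's standard csv module. The sample data below is invented for illustration; note how the quote character keeps the embedded comma in "Smith, John" from being treated as a field delimiter.

```python
import csv
import io

# First row contains the header; fields delimited by commas; double-quote quote char.
sample = 'Name,Country\n"Smith, John",USA\nMaria,Mexico\n'

reader = csv.DictReader(io.StringIO(sample), delimiter=",", quotechar='"')
rows = list(reader)
```

The first row is consumed as the header, so each subsequent record is keyed by column name.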
Source or Destination is a Microsoft Excel Worksheet
If the Source and/or the Destination chosen is a Microsoft Excel Worksheet, you can set the following properties:
First Row Contains Header - Check this option if you want the first row of your file to contain the column headers. In the case of a source file, this indicates whether the source contains headers.
Worksheet - Allows you to select a specific worksheet from the selected Microsoft Excel file.
You can also use the Build fields from an existing file feature to build the destination fields based on an existing file instead of manually typing the layout.
Source or Destination is a Fixed Length File
If the Source and/or the Destination chosen is a Fixed Length File, you can set the following properties:
First Row Contains Header - Check this option if you want the first row of your file to contain the column headers. In the case of a source file, this indicates whether the source contains headers.
Record Delimiter - Allows you to select the delimiter for the records. The choices available are the carriage-return line-feed combination (&lt;CR&gt;&lt;LF&gt;), carriage-return (&lt;CR&gt;), and line-feed (&lt;LF&gt;). You can also type a record delimiter of your choice instead of choosing from the available options. For more information on record delimiters, please refer to the Glossary.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
You can also use the Build fields from an existing file feature to build the destination fields based on an existing file instead of manually typing the layout.
Using the Length Markers window, you can create the layout of your fixed-length file. The Length Markers window has a ruled marker placed at the top of the window. To insert a field length marker, click in the window at a particular point. For example, if you want a field that starts at position five to contain five characters, you need to click at marker position nine.
In case the records don’t have a delimiter and you rely on knowing the size of a record, the number in the RecordLength box is used to specify the character length for a single record.
You can delete a field length marker by clicking the marker.
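Conceptually, the length markers are cut positions: each record is sliced at those character offsets and trailing padding is trimmed. A minimal sketch, with an invented sample record:

```python
def parse_fixed_length(record, markers):
    """Split a fixed-length record at the given cut positions (length markers)."""
    fields = []
    start = 0
    for cut in markers:
        fields.append(record[start:cut].rstrip())  # trim the space padding
        start = cut
    fields.append(record[start:].rstrip())         # everything after the last marker
    return fields

# Markers at 6 and 16: field one is chars 0-5, field two 6-15, the rest is field three.
line = "ALFKI Maria     Germany"
```

Calling `parse_fixed_length(line, [6, 16])` yields the three trimmed fields.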
Source or Destination is an XML file
If the source is an XML file, you can set the following options:
Source File Path specifies the file path of the source XML file.
Schema File Path specifies the file path of the XML schema (XSD file) that applies to the selected source XML file.
Record Filter Expression allows you to optionally specify an expression used as a filter for incoming source records from the selected source XML file. The filter can refer to a field or fields inside any node inside the XML hierarchy.
The following options are available for destination XML files.
Destination File Path specifies the file path of the destination XML file.
Encoding - Allows you to choose the encoding scheme for the XML file from a list of choices. The default value is Unicode (UTF-8).
Format XML Output instructs Astera to add line breaks to the destination XML file for improved readability.
Read From Schema File specifies the file path of the XML schema (XSD file) that will be used to generate the destination XML file.
Root Element specifies the root element from the list of the available elements in the selected schema file.
Generate Destination XML Schema Based on Source Layout creates the destination XML layout to mirror the layout of the source.
Root Element specifies the name of the root element for the destination XML file.
Generate Fields as XML Attributes specifies that fields will be written as XML attributes (as opposed to XML elements) in the destination XML file.
Record Node specifies the name of the node that will contain each record transferred.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
When importing from a fixed-width, delimited, or Excel file, you can specify the following advanced reading options:
Header Spans x Rows - If your source file has a header that spans more than 1 row, select the number of rows for the header using this control.
Skip Initial Records - Sets the number of records which you want skipped at the beginning of the file. This option can be set whether or not your source file has a header. If your source file has a header, the first record after the specified number of rows to skip will be used as the header row.
Raw Text Filter - Only records starting with the filter string will be imported. The rest of the records will be filtered out.
You can optionally use regular expressions to specify your filter. For example, the regular expression ^[12][4] will only include records starting with 1 or 2, and whose second character is 4.
Note: Astera supports Regular Expressions implemented with the Microsoft .NET Framework and uses the Microsoft version of named captures for regular expressions.
Raw Text Filter setting is not available for Excel source files.
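The regular expression from the example above behaves as follows; the record strings are invented for illustration.

```python
import re

records = ["14 widget", "24 gadget", "34 gizmo", "1x other"]

# ^[12][4]: record must start with 1 or 2, and its second character must be 4.
pattern = re.compile(r"^[12][4]")
kept = [r for r in records if pattern.match(r)]
```

Only "14 widget" and "24 gadget" survive the filter; "34 gizmo" fails on its first character and "1x other" on its second.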
If your source is a fixed-length file, delimited file, or Excel spreadsheet, it may contain an optional header row. A header row is the first record in the file that specifies field names and, in the case of a fixed-length file, the positioning of fields in the record.
If your source file has a header row, you can specify how you want the system to handle the differences between your actual source file, and the source layout specified in the setting. Differences may arise due to the fact that the source file has a different field order from what is specified in the source layout, or it may have extra fields compared to the source layout. Conversely, the source file may have fewer fields than what is defined in the source layout, and the field names may also differ, or may have changed since the time the layout was created.
By selecting from the available options, you can have Astera handle those differences exactly as required by your situation. These options are described in more detail below:
Enforce exact header match – Lets Astera Data Stack proceed with the transfer only if the source file’s layout matches the source layout defined in the setting exactly. This includes checking for the same number and order of fields and field names.
Columns order in file may be different from the layout – Lets Astera Data Stack ignore the sequence of fields in the source file, and match them to the source layout using the field names.
Column headers in file may be different from the layout – This mode is used by default whenever the source file does not have a header row. You can also enable it manually if you want to match the first field in the layout with the first field in the source file, the second field in the layout with the second field in the source file, and so on. This option will match the fields using their order as described above even if the field names are not matched successfully. We recommend that you use this mode only if you are sure that the source file has the same field sequence as what is defined in the source layout.
The Field Layout window is available in the properties of most objects on the dataflow to help you specify the fields making up the object. The table below explains the attributes you can set in the Field Layout window.
The table below provides a list of all the attributes available for a particular layout type.
Astera supports a variety of formats for each data type. For example, for Dates, you can specify the date as “April 12” or “12-Apr-08”. Data Formats can be configured independently for source and for destination, giving you the flexibility to correctly read source data and change its format as it is transferred to destination.
If you are transferring from a flat file (for example, Delimited or Fixed-Width), you can specify the format of a field so that the system can correctly read the data from that field.
If you do not specify a data format, the system will try to guess the correct format for the field. For example, Astera is able to correctly interpret any of the following as a Date:
April 12
12-Apr-08
04-12-2008
Saturday, 12 April 2008
and so on
Astera comes with a variety of pre-configured formats for each supported data type. These formats are listed in the Sample Formats section below. You can also create and save your own data formats.
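The sample date strings above correspond to familiar format patterns. The sketch below uses Python's strptime codes as a stand-in; the patterns are illustrative and are not Astera's own format strings.

```python
from datetime import datetime

# Each sample value maps to a format pattern (strptime codes, illustrative).
formats = {
    "12-Apr-08": "%d-%b-%y",
    "04-12-2008": "%m-%d-%Y",
    "Saturday, 12 April 2008": "%A, %d %B %Y",
}

# All three strings resolve to the same date once the format is known.
parsed = {s: datetime.strptime(s, f) for s, f in formats.items()}
```

This is why specifying the format matters: "04-12-2008" is only unambiguous once you know whether it is month-day or day-month.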
To select a data format for a source field, go to Source Fields and expand the Format dropdown menu next to the appropriate field.
Sample Formats
Dates:
Booleans:
Integers:
Real Numbers:
Numeric Format Specifiers:
The COBOL File Source object provides the ability to fetch data from a COBOL source file, provided the user has the data file available. The data in this file can then be processed further in the dataflow and written to a destination of your choice.
Expand the Sources section of the Toolbox and select the COBOL Source object.
Drag-and-drop the COBOL Source object onto the dataflow. It will appear like this:
By default, the COBOL Source object is empty.
To configure it according to your requirements, right-click on the object and select Properties from the context menu.
Alternatively, you can open the properties window by double-clicking on the COBOL Source object header.
The following is the properties tab of the COBOL Source object.
File Path: Clicking on this option allows you to define a path to the data file of a COBOL File.
Note: File Path registers files with extensions of .dat and .txt (Additionally, it can also register files with an .EBC extension)
For our use case, we will be using a sample file with an .EBC extension.
Encoding: This drop-down option allows us to select the encoding from multiple options.
In this case, we will be using the IBM EBCDIC (US-Canada) encoding.
Record Delimiter: This allows you to select the kind of delimiter from the drop-down menu.
(Carriage Return): Moves the cursor to the beginning of the line without advancing to the next line.
(Line Feed): Moves the cursor down to the next line without returning to the beginning of the line.
<CR><LF>: Does both, moving the cursor to the beginning of the next line.
For our use case, we have selected the following.
Copybook: This option allows us to define a path to the schema file of a COBOL File.
Note: Copybook registers files with the extensions of .txt and .cpy
For our use case, we are using a file with the .cpy extension.
Next, three checkboxes can be configured according to your application. There is also a Record Filter Expression field below these checkboxes.
Ignore Line Numbers at Start of Lines: Check this option when the data file contains line numbers at the start of each line; these line numbers will be ignored when reading the data.
Zone Decimal Sign Explicit: Controls whether there is an extra character for the minus sign of a negative integer.
Fields with COMP Usage Store Data in a Nibble: Check this box if fields declared with COMP usage store their data in nibbles (half-bytes).
COMP usage formats range from COMP-1 to COMP-6 in COBOL files.
Record Filter Expression: Here, we can add a filter expression that we wish to apply to the records in the COBOL File.
On previewing output, the result will be filtered according to the expression.
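The Zone Decimal Sign Explicit option above concerns how a sign is stored in zoned-decimal fields. As an illustrative sketch (not Astera's code): in EBCDIC zoned-decimal data, each byte's low nibble holds a digit, and the last byte's zone nibble carries the sign.

```python
# Illustrative decoder for EBCDIC zoned-decimal values: the low nibble
# of each byte is a digit; a zone nibble of 0xD on the last byte marks
# a negative value, 0xC (or 0xF) a positive one.
def decode_zoned(data):
    value = 0
    for b in data:
        value = value * 10 + (b & 0x0F)
    sign = -1 if (data[-1] & 0xF0) == 0xD0 else 1
    return sign * value

print(decode_zoned(bytes([0xF1, 0xF2, 0xD3])))  # -123
```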
Once done with this configuration, click Next, and you will be taken to the next part of the properties tab.
The COBOL Source Layout window lets you check the values that have been read as input.
Expand the Source node to check each of the values and records that have been selected as input.
Expanding the nodes further shows the data definitions and field details.
Once these values have been checked, click Next.
The Config Parameters window will now open. Here, you can further configure and define parameters for the COBOL Source Object.
Parameters provide easier deployment of flows by eliminating hardcoded values and offer an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
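The blank-parameter fallback behaves like the following sketch. The parameter names here are invented for illustration; a parameter left blank resolves to the default from the properties page.

```python
# Hypothetical parameter resolution: blank overrides fall back to the
# defaults defined on the properties page.
defaults = {"FilePath": "C:/data/source.ebc", "Encoding": "cp037"}
overrides = {"FilePath": "C:/data/run42.ebc", "Encoding": ""}  # blank Encoding

config = {key: overrides.get(key) or default for key, default in defaults.items()}
print(config)  # {'FilePath': 'C:/data/run42.ebc', 'Encoding': 'cp037'}
```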
Click Next.
Now, a new window, General Options, will appear.
Here, you can add any Comments that you wish to add. The rest of the options in this window have been disabled for this object.
Once done, click OK.
The COBOL Source object has now been configured. The extracted data can now be transformed and written to various destinations.
This concludes our discussion on the COBOL Source Object and its configuration in Astera Data Stack.
The Excel File Source object in Astera supports all formats of Excel. In this article, we will be discussing:
Various ways to get the Excel Workbook Source object on the dataflow designer.
Configuring the Excel Workbook Source object according to our required layout and settings.
In this section, we will cover the various ways to get an Excel Workbook Source object on the dataflow designer.
To get an Excel File Source object from the Toolbox, go to Toolbox > Sources > Excel Workbook Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet. We will discuss the configuration properties for the Excel Workbook Source object in the next section.
If you already have a project defined and Excel source files are part of that project, you can directly drag-and-drop the Excel file sources from the project tree onto the dataflow designer. In this case, the Excel File Source objects will already be configured, as Astera Data Stack detects the connectivity and layout information from the source file itself.
Note: In this case, we are using an Excel file with Customers data. The file is part of an existing project folder.
To get an Excel File Source object from the Project Explorer, go to the Project Explorer window and expand the project tree.
Select the Excel file you want to bring in as the source and drag-and-drop it onto the designer. In this case, we are working with the Customers -Excel Source.xls file, so we will drag-and-drop it onto the designer.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
To get an Excel Workbook Source directly from the file location, open the folder containing the Excel file.
Drag-and-drop the Excel file from the folder onto the designer in Astera.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
To configure the Excel Workbook Source object, right-click on its header and select Properties from the context menu.
As soon as you have selected the Properties option from the context menu, a dialog box will open.
This is where you can configure your properties for the Excel Workbook Source object.
The first step is to provide the File Path for the Excel source. By providing the file path, you are building the connectivity to the source dataset.
Note: In this case, we are going to be using an Excel file with sample Customers data.
The dialog box has some other configuration options:
If your source file contains headers and you want your Astera source layout to read headers from the source file, check the File Contains Header box.
If you have blank rows in your file, you can use the Consecutive Blank Rows to Indicate End of File option to specify the number of blank rows that will indicate the end of the file.
Use the Worksheet option to specify the worksheet in your Excel file that you want to read data from.
In the Start Address option, you can indicate the cell value from where you want Astera to start reading the data.
Advanced File Options
In the Header spans over option, specify the number of rows that your header occupies. Use this option when your header spans multiple rows.
Check the Enforce exact header match option if you want the header to be read as it is.
Check the Column order in file may be different from the layout option if the field order in your source file is different from the field order in the Astera layout.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
Check the Use SmartMatch with Synonym Dictionary option when the header values vary in the source layout and Astera layout. You can create a Synonym Dictionary file to store the values for alternate headers. You can also use Synonym Dictionary file to facilitate automapping between objects that use alternate names in field layouts.
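Conceptually, synonym-based header matching works like the following Python sketch. The `SYNONYMS` entries and the `resolve` helper are invented for illustration; Astera stores these mappings in a Synonym Dictionary file.

```python
# A hedged sketch of header matching with a synonym dictionary: exact
# layout matches win; otherwise an alternate name is looked up.
SYNONYMS = {"Cust_Name": "CustomerName", "Cust ID": "CustomerID"}

def resolve(header, layout_fields):
    if header in layout_fields:
        return header                  # exact match with the layout
    return SYNONYMS.get(header)        # alternate name, or None

layout = {"CustomerName", "CustomerID", "Country"}
print([resolve(h, layout) for h in ["Cust_Name", "Country", "Fax"]])
# ['CustomerName', 'Country', None]
```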
String Processing
String processing options come in use when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want those to be treated as null values in the database destination that you are writing to.
Check the Trim strings option when you want to omit any extra spaces in the field value.
Once you have specified the data reading options on this screen, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of your Excel source file.
If you want to add a new field to your layout, go to the last row of your layout (Name column) and double-click on it. A blinking text cursor will appear. Type in the name of the field you want to add and select subsequent properties for it. A new field will be added to the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted row and select Delete from the context menu.
This will delete the entire row from the layout.
Note: Modifying the layout (adding or deleting fields) in the Layout Builder window in Astera will not make any changes to the actual source file. The layout is specific to Astera only.
If you want to change the position of any field and move it above or below another field in the layout, select the row and use the Move up/Move down icons.
Note: You will find the Move up/Move down icons at the top left of the Layout Builder.
For example, to move the Country field right below the Region field, select the row and use the Move up icon to move the field from the 9th row to the 8th.
Other options that the Layout Builder provides:
After you are done customizing the layout in the Layout Builder, click Next. You will be taken to a new window, Config Parameters. Here, you can further configure and define parameters for the Excel source.
Parameters provide easier deployment of flows by eliminating hardcoded values and offer an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Once you have been through all the configuration options, click OK.
The Excel Workbook Source object is now configured according to the changes made.
You have successfully configured your Excel Workbook Source object. The fields from the source object can now be mapped to other objects in the dataflow.
Delimited files are one of the most commonly used data sources and are used in a variety of situations. The Delimited File Source object in Astera provides the functionality to read data from a delimited file.
In this article, we will cover how to use a Delimited File Source object.
To get a Delimited File Source object from the Toolbox, go to Toolbox > Sources > Delimited File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Delimited File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Delimited File Source object, right-click on its header and select Properties from the context menu.
As soon as you have selected the Properties option from the context menu, a dialog box will open.
This is where you can configure the properties for the Delimited File Source object.
The first step is to provide the File Path for the delimited source file. By providing the file path, you are building the connectivity to the source dataset.
Note: In this case, we are going to be using a delimited file with sample Orders data. This file works with the following options:
File Contains Headers
Record Delimiter is specified as CR/LF:
The dialog box has some other configuration options:
If the source file contains headers, and you want Astera to read headers from the source file, check the File Contains Header option.
If you want your file to be read in portions, upon selecting the Partition File for Reading option, Astera will read your file according to the specified Partition Count. For instance, if a file with 1000 rows has a Partition Count of 2 specified, the file will be read in two partitions of 500 each. This is a back-end process that makes data reading more efficient and helps in processing data faster. This will not have any effect on your output.
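The 1000-row, two-partition example above can be sketched in Python. This is a conceptual illustration of parallel partitioned reading, not Astera's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# 1000 records split into two partitions of 500, processed in parallel.
records = [f"row-{i}" for i in range(1000)]
partitions = [records[:500], records[500:]]

def read_partition(part):
    return len(part)  # stand-in for parsing the records in one chunk

with ThreadPoolExecutor(max_workers=2) as pool:
    counts = list(pool.map(read_partition, partitions))
print(counts)  # [500, 500]
```

As the text notes, partitioning changes how the file is read, not what the output contains.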
The Record Delimiter field allows you to select the delimiter for the records in the file. The available choices are the carriage-return line-feed combination (<CR/LF>), carriage return (<CR>), and line feed (<LF>). You can also type a record delimiter of your choice instead of choosing from the available options.
If the records do not have a delimiter and you instead rely on knowing the size of each record, you can use the Record Length field to specify the character length of a single record.
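The difference between delimiter-based and length-based record reading can be sketched as follows (sample data is invented):

```python
# Delimiter-based reading: split the stream on the record delimiter.
raw = "ord1,10\r\nord2,20\r\n"
print(raw.split("\r\n")[:-1])  # ['ord1,10', 'ord2,20']

# Length-based reading: with no delimiter, a fixed record length of 4
# slices the stream into records instead.
fixed = "AAAABBBBCCCC"
print([fixed[i:i + 4] for i in range(0, len(fixed), 4)])  # ['AAAA', 'BBBB', 'CCCC']
```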
The Encoding field allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
A Text Qualifier is a symbol that identifies where text begins and ends. It is needed when importing data that contains the delimiter character inside a field value. For example, when importing a comma-delimited text file, a comma inside a field value must be enclosed in text qualifiers so that it is not mistaken for a field separator.
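Python's csv module demonstrates the same idea, where the text qualifier is the quotechar (a double quote by default): the comma inside the quoted value stays part of the field instead of splitting it into two fields.

```python
import csv, io

# The comma inside "12 Main St, Berlin" is protected by the qualifier.
raw = 'CustomerID,Address\nC1,"12 Main St, Berlin"\n'
rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])  # ['C1', '12 Main St, Berlin']
```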
To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File option. Astera IDE provides extensive user interface capabilities for processing hierarchical structures.
Use the Null Text option to specify a certain value that you do not want in your data, and instead want it to be replaced by a null value.
Check the Allow Record Delimiter Inside a Field Text option when you have the record delimiter as text inside your data and want that to be read as it is.
Advanced File Options
In the Header spans over field, specify the number of rows that your header takes. Refer to this option when your header spans over multiple rows.
Check the Enforce exact header match option if you want the header to be read as it is.
Check the Column order in file may be different from the layout option, if the field order in your source layout is different from the field order in Astera’s layout.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
Check the Use SmartMatch with Synonym Dictionary option when the header values vary in the source layout and Astera’s layout. You can create a Synonym Dictionary file to store values for alternate headers. You can also use the Synonym Dictionary file to facilitate automapping between objects on the flow diagram that use alternate names in field layouts.
To skip any unwanted rows at the beginning of your file, you can specify the number of records that you want to omit through the Skip initial records option.
Raw text filter
If you do not want to apply any filter and process all records, check No filter. Process all records.
If you only want to process records that begin with a specific value, check the Process if begins with option and provide that value in the given field.
If you only want to process records that match a pattern, check the Process if matches this regular expression option and provide the regular expression in the given field.
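The two raw-text filter modes can be sketched as simple predicates in Python (the sample lines and patterns are invented for illustration):

```python
import re

# "Process if begins with" vs. "Process if matches this regular expression".
lines = ["ORD-1001,Berlin", "HDR,meta", "ORD-1002,Madrid"]
begins_with = [ln for ln in lines if ln.startswith("ORD-")]
matches = [ln for ln in lines if re.match(r"ORD-\d+,", ln)]
print(begins_with)  # ['ORD-1001,Berlin', 'ORD-1002,Madrid']
print(matches)      # the same lines satisfy the regular expression
```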
String Processing
String Processing options come in use when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want those to be treated as null values in the database destination that you are writing to.
Check the Trim strings option when you want to omit any extra spaces in the field value.
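A minimal sketch of the two String Processing options applied together (the `process` helper is illustrative, not Astera's code):

```python
# Trim strings, then treat an empty result as a null value.
def process(value, treat_empty_as_null=True, trim=True):
    if trim:
        value = value.strip()          # Trim strings
    if treat_empty_as_null and value == "":
        return None                    # Treat empty string as null value
    return value

print([process(v) for v in ["  Bonn ", "", "Berlin"]])  # ['Bonn', None, 'Berlin']
```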
Once you have specified the data reading options on this window, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of the delimited source file.
If you want to add a new field to your layout, go to the last row of your layout (Name column), which will be blank, and double-click on it. A blinking text cursor will appear. Type in the name of the field you want to add and select subsequent properties for it. A new field will be added to the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted row and select Delete from the context menu.
Selecting this option will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Note: Modifying the layout (adding or deleting fields) from the Layout Builder in Astera will not make any changes to the actual source file. The layout is specific to Astera only.
After you are done customizing the layout, click Next. You will be directed to a new window, Config Parameters. Here, you can define parameters for the Delimited File Source object.
Parameters provide easier deployment of flows by eliminating hardcoded values and offer an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Once you have configured the source object, click OK.
The Delimited File Source object is now configured according to the changes made.
The Delimited File Source object has now been modified from its previous configuration. The new object has all the modifications that were made in the builder.
In this case, the modifications that were made are:
Added the CustomerName column.
Deleted the ShipCountry column.
You have successfully configured your Delimited File Source object. The fields from the source object can now be mapped to other objects in the dataflow.
Note: To open the source file for editing in a new tab, click the icon next to the File Path input and select Edit File.
To generate the schema, click the icon next to the Schema File Path input and select Generate.
To edit an existing schema, click the icon next to the Schema File Path input and select Edit File. The schema will open for editing in a new tab.
Note: Astera makes it possible to generate an XSD file from the layout of the selected source XML file. This feature is useful when you don’t have the XSD file available. Note that all fields are assigned the data type of String in the generated schema. To use this feature, expand the control and select Generate.
Layout Builder options:

Column Name | Description
---|---
Alternate Header | Assigns an alternate header value to the field.
Data Type | Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean.
Allows Null | Controls whether the field allows blank or NULL values in it.
Output | Allows you to choose whether or not data from a particular field flows through further in the dataflow pipeline.
Calculation | Defines functions through expressions for any field in your data.

Field attributes:

Attribute | Description
---|---
Name | The system pre-fills this item for you based on the field header. Field names do not allow spaces. Field names are used to refer to the fields in the Expression Builder or tools where a field is used in a calculation formula.
Header | Represents the field name specified in the header row of the file. Field headers may contain spaces.
Data Type | Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean.
Format | Specifies the format of the values stored in that field, depending on the field’s data type. For example, for dates you can choose between DD-MM-YY, YYYY-MM-DD, or other available formats.
Start Position | Specifies the position of the field’s first character relative to the beginning of the record. Note: This option is only available for the fixed length layout type.
Length | Specifies the maximum number of characters allotted for a value in the field. The actual value may be shorter than what is allowed by the Length attribute. Note: This option is only available for fixed length and database layout types.
Column Name | Specifies the column name of the database table. Note: This option is only available in database layouts.
DB Type | Specifies the database-specific data type that the system assigns to the field based on the field's data. Each database (Oracle, SQL Server, Sybase, etc.) has its own DB types. For example, Long is only available in Oracle for the String data type. Note: This option is only available in database layouts.
Decimal Places | Specifies the number of decimal places for a data type specified as Real. Note: This option is only available in database layouts.
Allows Null | Controls whether the field allows blank or NULL values in it.
Default Value | Specifies the value that is assigned to the field in any one of the following cases: the source field does not have a value; the field is not found in the source layout; the destination field is not mapped to a source field. Note: This option is only available in destination layouts.
Sequence | Represents the column order in the source file. You can change the column order of the data being imported by simply changing the number in the Sequence field. The other fields in the layout will then be reordered accordingly.
Description | Contains information about the field to help you remember its purpose.
Alignment | Specifies the positioning of the field’s value relative to the start position of the field. Available alignment modes are LEFT, CENTER, and RIGHT. Note: This option is only available for the fixed length layout type.
Primary Key | Denotes the primary key field (or part of a composite primary key) for the table. Note: This option is only available in database layouts.
System Generated | Indicates that the field will be automatically assigned an increasing Integer number during the transfer. Note: This option is only available in database layouts.

Attributes available by layout type:

Layout Type | Attributes Available
---|---
Source Delimited file and Excel worksheet | Name, Header, Data type, Format
Source Fixed Length file | Name, Header, Data type, Format, Start position, Length
Source Database Table and SQL query | Column name, Name, Data type, DB type, Length, Decimal places, Allows null
Destination Delimited file and Excel worksheet | Name, Header, Data type, Format, Allows null, Default value
Destination Fixed Length file | Sequence, Name, Header, Description, Data type, Format, Start position, Length, Allows null, Default value, Alignment
Destination Database Table | Column name, Name, Data type, DB type, Length, Decimal places, Allows null, Primary key, System generated

To open the Data Formats window, click the icon located in the Toolbar at the top of the designer.
Dates:

Format | Sample Value
---|---
dd-MMM-yyyy | 12-Apr-2008
yyyy-MM-dd | 2008-04-12
dd-MM-yy | 12-04-08
MM-dd-yyyy | 04-12-2008
MM/dd/yyyy | 04/12/2008
MM/dd/yy | 04/12/08
dd-MMM-yy | 12-Apr-08
M | April 12
D | 12 April 2008
MM-dd-yyyy hh:mm:ss tt | 04-12-2008 11:04:53 PM
M/d/yyyy hh:mm:ss tt | 4/12/2008 11:04:53 PM
Booleans:

Format | Sample Value
---|---
Y/N | Y/N
1/0 | 1/0
T/F | T/F
True/False | True/False
Integers:

Format | Sample Value
---|---
###### | 123456
#### | 1234
####;0;(####) | -1234
.##%;0;(.##%) | 123456789000%
.##%;(.##%) | 1234567800%
$###,###,###,### | $1,234,567,890,000
$###,###,###,##0 | $1,234,567,890,000
Real Numbers:

Format | Sample Value
---|---
###,### | 123450
#,# | 1,000
##.00 | 35
###,###.## | 12,345.67
##.## | 12.34
$###,###,###,### | $1,234,567,890,000
$###,###,###,##0 | $1,234,567,890,000
.##%;(.##%); | .1234567800%
.##%;0;(.##%) | .12345678900%
Numeric Format Specifiers:

Format specifier | Name | Description
---|---|---
0 | Zero placeholder | If the value being formatted has a digit in the position where the '0' appears in the format string, that digit is copied to the result string; otherwise, a '0' appears in the result string. The position of the leftmost '0' before the decimal point and the rightmost '0' after the decimal point determines the range of digits that are always present in the result string. The "00" specifier causes the value to be rounded to the nearest digit preceding the decimal, where rounding away from zero is always used. For example, formatting 34.5 with "00" results in the value 35.
# | Digit placeholder | If the value being formatted has a digit in the position where the '#' appears in the format string, that digit is copied to the result string; otherwise, nothing is stored in that position. This specifier never displays the '0' character if it is not a significant digit, even if '0' is the only digit in the string. It will display the '0' character if it is a significant digit in the number being displayed. The "##" format string causes the value to be rounded to the nearest digit preceding the decimal, where rounding away from zero is always used. For example, formatting 34.5 with "##" results in the value 35.
. | Decimal point | The first '.' character in the format string determines the location of the decimal separator in the formatted value; any additional '.' characters are ignored.
, | Thousand separator and number scaling | The ',' character serves as both a thousand separator specifier and a number scaling specifier. Thousand separator: if one or more ',' characters is specified between two digit placeholders (0 or #) that format the integral digits of a number, a group separator character is inserted between each number group in the integral part of the output. Number scaling: if one or more ',' characters is specified immediately to the left of the explicit or implicit decimal point, the number to be formatted is divided by 1000 each time a number scaling specifier occurs. For example, if the string "0,," is used to format the number 100 million, the output is "100".
% | Percentage placeholder | The presence of a '%' character in a format string causes a number to be multiplied by 100 before it is formatted. The percent symbol is inserted in the number at the location where the '%' appears in the format string.
E0, E+0, E-0, e0, e+0, e-0 | Scientific notation | If any of the strings "E", "E+", "E-", "e", "e+", or "e-" is present in the format string and is followed immediately by at least one '0' character, the number is formatted using scientific notation with an 'E' or 'e' inserted between the number and the exponent. The number of '0' characters following the scientific notation indicator determines the minimum number of digits to output for the exponent. The "E+" and "e+" formats indicate that a sign character (plus or minus) should always precede the exponent. The "E", "E-", "e", or "e-" formats indicate that a sign character should only precede negative exponents.
'ABC', "ABC" | Literal string | Characters enclosed in single or double quotes are copied to the result string and do not affect formatting.
; | Section separator | The ';' character is used to separate sections for positive, negative, and zero numbers in the format string. If there are two sections in the custom format string, the leftmost section defines the formatting of positive and zero numbers, while the rightmost section defines the formatting of negative numbers. If there are three sections, the leftmost section defines the formatting of positive numbers, the middle section defines the formatting of zero numbers, and the rightmost section defines the formatting of negative numbers.
Other | All other characters | Any other character is copied to the result string and does not affect formatting.
The File System Items Source in Astera Data Stack is used to provide metadata information to a task in a dataflow or workflow. In a dataflow, it can be used in conjunction with a source object, especially in cases where you want to process multiple files through the transformation and loading process.
In a workflow, the File System Items Source object can be used to provide input paths to a subsequent object such as a RunDataflow task.
Let’s see how it works in a dataflow.
Here, we have a dataflow that we want to run on multiple source files containing Customer_Data from a fictitious organization. We are going to use the source object as a transformation and provide the location of the source files using a File System Items Source object. The File System Items Source will provide the path to the location where our source files reside, and the source object will pick the source files from that location, one by one, and pass them on for further processing in the dataflow.
Here, we want to sort the data, filter out records of customers from Germany and write the filtered records into a database table. The source data is stored in delimited (.csv) files.
First, change the source object into a Transformation object. This is because the data is stored in multiple delimited files and we want to process all of them in the dataflow. To do this, right-click on the source object’s header and select Transformation from the context menu.
You can see that the color of the source object has changed from green to purple which indicates that the source object has been changed into a transformation object.
Notice that the source object now has two nodes: Input and Output. The Input node has an input mapping port which means that it can take the path to the source file from another object.
Now we will use a File System Items Source object to provide a path to Customer_Data Transformation object. Go to the Sources section in the Toolbox and drag-and-drop the File System Items Source object onto the designer.
If you look at the File System Items Source object, you can see that the layout is pre-populated with fields such as FileName, FileNameWithoutExtension, Extension, FullPath, Directory, ReadOnly, Size, and other attributes of the files.
To configure the properties of the File System Items Source object, right-click on the File System Items Source object’s header and go to Properties.
This will open the File System Properties window.
The first thing you need to do is point the Path to the directory or folder where your source files reside.
You can see a couple of other options on this screen:
Filter: If your specified source location contains multiple files in different formats, you can use this option to filter and read files in the specified format only. For instance, our source folder contains multiple .pdf, .txt, .doc, .xls, and .csv files, so we will write “*.csv” in the Filter field to read delimited files only.
Include items in subdirectories: Check this option if you want to process files present in the subdirectories.
Include Entries for Directories: Check this option if you want entries for the directories themselves to be included in the output.
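Conceptually, the object behaves like the following Python sketch, which lists files matching a *.csv filter and reports per-file metadata such as the file name, extension, and size. The directory and file names below are invented for illustration.

```python
import glob, os, tempfile

# Build a throwaway directory with a few sample files, then apply the
# "*.csv" filter and emit metadata for each match.
root = tempfile.mkdtemp()
for name in ["Customers1.csv", "Customers2.csv", "notes.txt"]:
    with open(os.path.join(root, name), "w") as f:
        f.write("data")

matched = sorted(glob.glob(os.path.join(root, "*.csv")))
for path in matched:
    name = os.path.basename(path)
    print(name, os.path.splitext(name)[1], os.path.getsize(path))
```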
Once you have specified the Path and other options, click OK.
Now right-click on the File System Items Source object’s header and select Preview Output.
You can see that the File System Items Source object has filtered out delimited files from the specified location and has returned the metadata in the output. You can see the FileName, FileNameWithoutExtension, Extension, FullPath, Directory, and other attributes such as whether the file is ReadOnly, FileSize, LastAccessed, and other details in the output.
Now let’s start mapping. Map the FullPath field from the File System Items Source object to the FullPath field under the Input node in the Customer_Data Transformation object.
Once mapped, when we run the dataflow, the File System Items Source will pass the path to the source files, one by one, to the Customer_Data Transformation object. The Customer_Data Transformation object will read the data from the source file and pass it to the subsequent transformation object to be processed further in the dataflow.
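The one-by-one hand-off described above can be sketched as a simple loop. `run_dataflow` is a hypothetical stand-in for the downstream transformation and loading steps, and the file names are invented.

```python
# Each path emitted by the File System Items Source is handed to the
# downstream object one at a time.
paths = ["Customers1.csv", "Customers2.csv", "Customers3.csv"]

def run_dataflow(path):
    # Hypothetical stand-in for reading, transforming, and loading one file.
    return f"processed {path}"

results = [run_dataflow(p) for p in paths]
print(results)
```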
In a workflow, the File System Items Source object can be used to provide input paths to a subsequent task such as a RunDataflow task. Let’s see how this works.
We want to design a workflow to orchestrate the process of extracting customer data stored in delimited files, sorting that data, filtering out records of customers from Germany and loading the filtered records in a database table.
We have already designed a dataflow for the process and have called this dataflow in our workflow using the RunDataflow task object.
We have multiple source files that we want to process in this dataflow. So, we will use a File System Items Source object to provide the path to our source files to the RunDataFlow task. For this, go to the Sources section in the Toolbox and drag-and-drop the File System Items Source onto the designer.
If you look at the File System Items Source, you can see that the layout is pre-populated with fields such as FileName, FileNameWithoutExtension, Extension, FullPath, Directory, ReadOnly, Size, and other attributes of the files. Also, notice the small blue icon with the letter ‘s’ on the header; it indicates that the object is set to run in Singleton mode.
By default, all objects in a workflow are set to execute in Singleton mode. However, since we have multiple files to process in the dataflow, we will set the File System Items Source object to run in loop. For this, right-click on the File System Items Source and click Loop in the context menu.
You can see that the color of the object has changed to purple, and it now has a purple icon over the header denoting the loop function.
It also has two mapping ports on the header for mapping the File System Items Source object to the subsequent action in the workflow. Let’s map it to the RunDataflow task.
To configure the properties of the File System Items Source, right-click on the File System Items Source object’s header and go to Properties.
This will open the File System Items Source Properties window.
The first thing you need to do is point the Path to the directory or folder where your source files reside.
You can see a couple of other options on this window:
Filter: If your specified source location contains multiple files in different formats, you can use this option to filter and read files in the specified format. For instance, our source folder contains multiple .pdf, .txt, .doc, .xls, and .csv files, so we will write “*.csv” in the Filter field to filter and read delimited files only.
Include items in subdirectories: Check this option if you want to process files present in the sub-directories.
Include Entries for Directories: Check this option if you want the output to also include entries for the directories in the specified location, in addition to the files they contain.
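The wildcard Filter works like standard file-name globbing. As a rough illustration outside Astera, Python’s fnmatch applies the same kind of pattern (the file names here are made up):

```python
from fnmatch import fnmatch

# Hypothetical mix of files in the source folder
files = ["orders.csv", "report.pdf", "notes.txt", "customers.csv", "sheet.xls"]

# Keep only delimited (.csv) files, mirroring the "*.csv" filter
csv_files = [f for f in files if fnmatch(f, "*.csv")]
print(csv_files)  # ['orders.csv', 'customers.csv']
```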
Once you have specified the Path and other options, click OK.
Now right-click on the File System Items Source object’s header and click Preview Output.
You can see that the File System Items Source object has filtered out delimited files from the specified location and has returned the metadata in the output. You can see the FileName, FileNameWithoutExtension, Extension, FullPath, Directory, and other attributes such as whether the file is ReadOnly, FileSize, LastAccessed, and other details in the output.
Now let’s start mapping. Map the FullPath field from the File System Items Source object to the FilePath variable in the RunDataflow task.
Once mapped, upon running the dataflow, the File System Items Source object will pass the path to the source files, one by one, to the RunDataflow task. In other words, the File System Items Source acts as a driver to provide source files to the RunDataflow tasks, which will then process them in the dataflow.
When the File System Items Source is set to run in a loop, the dataflow will run ‘n’ times, where ‘n’ is the number of files passed by the File System Items Source to the RunDataflow task. For instance, you can see that we have six source files in the specified folder. The File System Items Source object will pass these six files one by one to the RunDataflow task to be processed in the dataflow.
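Conceptually, the loop behaves like iterating over the matched file paths and running the dataflow once per path. The sketch below is a plain-Python analogy, not an Astera API; run_dataflow and the paths are hypothetical stand-ins:

```python
def run_dataflow(file_path):
    # Stand-in for the RunDataflow task: process one source file
    return f"processed {file_path}"

source_files = ["C:/data/jan.csv", "C:/data/feb.csv", "C:/data/mar.csv"]

# Loop mode: the dataflow runs once per file path passed in
results = [run_dataflow(path) for path in source_files]
print(len(results))  # one run per file
```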
This concludes using the File System Items Source object in Astera Data Stack.
The SQL Query Source object enables you to retrieve data from a database using an SQL query or a stored procedure. You can specify any valid SELECT statement or a stored procedure call as a query. In addition, you can parameterize your queries dynamically, thereby allowing you to change their values at runtime.
In this article, we will be looking at how you can configure the SQL Query Source object and use it to retrieve data in Astera Data Stack.
Before moving on to the actual configuration, we will have to get the SQL Query Source object from the Toolbox onto the dataflow designer. To do so, go to Toolbox > Sources > SQL Query Source. In case you are unable to view the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the SQL Query Source object onto the designer.
The source object is currently empty as we have not configured it yet.
To configure the SQL Query Source object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click the header of the source object.
A new window will pop up when you click on Properties in the context menu.
In this window, we will configure properties for the SQL Query Source object.
On this Database Connection window, enter information for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you want to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test was successful. Close this window by clicking OK, and then, click Next.
The next window will present a blank page for you to enter your required SQL query. Here, you can enter any valid SELECT statement or stored procedure to read data from the database you connected to in the previous step.
The curly brackets located on the right side of the window indicate that the use of parameters is supported, which implies that you can replace a regular value with one that is parameterized and can be changed during runtime.
In this example, we will be reading the Orders table from the Northwind database.
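To illustrate what such a query does, here is a minimal sketch using Python’s built-in sqlite3 with a made-up, abbreviated Orders table; the placeholder plays the role of a runtime parameter:

```python
import sqlite3

# In-memory stand-in for the Northwind Orders table (schema abbreviated)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID TEXT, ShipCountry TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(1, "ALFKI", "Germany"), (2, "ANATR", "Mexico"), (3, "ANTON", "Mexico")])

# A parameterized SELECT: the "?" placeholder is filled in at runtime
country = "Mexico"
rows = conn.execute(
    "SELECT OrderID, CustomerID FROM Orders WHERE ShipCountry = ? ORDER BY OrderID",
    (country,)
).fetchall()
print(rows)  # [(2, 'ANATR'), (3, 'ANTON')]
```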
Once you have entered the SQL query, click Next.
The following window will allow you to check or uncheck certain options that may be utilized while processing the dataset, if needed.
When checked, the Trim Trailing Spaces option will refine the dataset by removing extra whitespace after the last character in each line. This option is checked by default.
The Dynamic Layout option is unchecked by default. When checked, it will automatically enable two other sub-options.
Delete Field In Subsequent Objects: When checked, this option will delete all fields that are present in subsequent objects.
Add Fields In Subsequent Objects: When checked, this option will add fields that are present in the source object to subsequent objects.
Choose your desired options and click Next.
The next window is the Layout Builder. Here, you can modify the layout of the table that is being read from the database. However, these modifications will only persist within Astera and will not apply to the actual database table.
To delete a certain field, right-click on its serial column and select Delete from the context menu. In this example, we have deleted the OrderDate field.
To change the position of a field, click its serial column and use the Move up/Move down icons located in the toolbar of the window. In this example, we have moved up the EmployeeID field using the Move up icon, thus shifting the CustomerID field to the third row. You can move other fields up or down in a similar manner, allowing you to modify the entire order of the fields present in the table.
Once you are done customizing your layout, click Next.
In the Config Parameters window, you can define certain parameters for the SQL Query Source object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
Enter your desired values for these parameters, if any, and click Next.
Finally, a General Options window will appear. Here, you are provided with:
A text box to add Comments.
A set of General Options that have been disabled.
To conclude the configuration, click OK.
You have successfully configured the SQL Query Source object. The fields are now visible and can be mapped to other objects in the dataflow.
A Report Model extracts data from an unstructured file into a structured file format using extraction logic. It can be used through the Report Source object inside dataflows to leverage the advanced transformation features in Astera Data Stack.
In this section, we will cover how to get the Report Source object onto the dataflow designer from the Toolbox.
To get a Report Source object from the Toolbox, go to Toolbox > Sources > Report Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Report Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Report Source object, right-click on its header and select Properties from the context menu.
A configuration window for Report Source will open.
First, provide the File Path of the unstructured file (your report) for which you have created a Report Model.
Then, specify the File Path for the associated Report Model.
Click OK, and the fields added in the extraction model will appear inside the Report Source object with a sub-node, Items_Info, in our case.
Right-click on the Report Source object’s header and select Preview Output from the context menu.
A Data Preview window will open, showing you the data extracted through the Report Model.
Adding an XML/JSON File Source object to a dataflow allows you to read and transfer data from an XML or a JSON file.
In this section, we will cover how to get an XML/JSON File Source object on the dataflow designer from the Toolbox.
To get an XML/JSON File Source from the Toolbox, go to Toolbox > Sources > XML/JSON File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the XML/JSON File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the XML/JSON File Source object, right-click on the header and select Properties from the context menu.
When you select the Properties option from the context menu, a dialog box will open.
This is where you can configure properties for the XML/JSON File Source object.
The first step is to provide the File Path and Schema Location for the XML/JSON Source object. By providing the file path and schema, you are building the connectivity to the source dataset.
Check the JSON Format checkbox if your source file is in JSON format.
Check the Provide Inner XML checkbox to get the XML markup representing only the child nodes of the parent node.
Note: In this case we are going to be using an XML/JSON file with Orders sample data in the parent node and Order Details sample data in the child node.
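To picture the parent/child structure being read, here is a small Python sketch that parses a made-up JSON document shaped like the Orders / Order Details layout described above:

```python
import json

# Hypothetical JSON mirroring the Orders (parent) / Order Details (child) layout
raw = '''{"Orders": [
  {"OrderID": 1, "CustomerID": "ALFKI",
   "OrderDetails": [{"ProductID": 10, "Quantity": 2},
                    {"ProductID": 11, "Quantity": 5}]}
]}'''
data = json.loads(raw)

# Walk the parent node, then flatten each child record with its parent key
flat = [(o["OrderID"], d["ProductID"], d["Quantity"])
        for o in data["Orders"] for d in o["OrderDetails"]]
print(flat)  # [(1, 10, 2), (1, 11, 5)]
```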
Once you have specified the data reading options in this window, click Next.
On the XML Layout window, you can view the layout of your XML/JSON source file.
After you are done viewing the layout, click Next. You will be taken to a new window, Config Parameters. Here, you can define the parameters for the XML/JSON File Source.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
After you have configured the source object, click OK.
You have successfully configured your XML/JSON File Source object. The fields from the source object can now be mapped to other objects in a dataflow.
The Fixed-Length File Source object in Astera provides a high-speed reader for files containing fixed length records. It supports files with record delimiters as well as files without record delimiters.
In this section, we will cover how to get a Fixed Length File Source object onto the dataflow designer from the Toolbox.
To get a Fixed Length File Source object from the Toolbox, go to Toolbox > Sources > Fixed Length File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Fixed Length File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Fixed Length File Source object, right-click on its header and select Properties from the context menu.
When you select the Properties option from the context menu, a dialog box will open.
This is where you configure the properties for the Fixed Length File Source object.
The first step is to provide the File Path for the Fixed Length File Source object. By providing the File Path you are building the connectivity to the source dataset.
Note: In this case we are going to be using a fixed length file that contains Orders sample data. This file works with the following options:
File Contains Headers
Record Delimiter specified as the carriage-return line-feed combination (CRLF)
The dialog box has some other configuration options:
Check the File Contains Header option if the source file contains a header and you want Astera to read field names from it.
Check the Partition File for Reading option if you want the file to be read in portions. For instance, if your file has 1000 rows and you specify a Partition Count of 2, Astera will read the file in two partitions of 500 rows each. This is a back-end process that makes data reading more efficient and helps in processing data faster; it will not have any effect on your output.
The Record Delimiter field allows you to select the delimiter for the records in the source file. The available choices are the carriage-return line-feed combination (CRLF), carriage-return (CR), and line-feed (LF). You can also type a record delimiter of your choice instead of choosing from the available options.
If the records do not have a delimiter and you rely on knowing the size of a record, the number in the Record Length field specifies the character length of a single record.
The Encoding field allows you to choose the encoding scheme for the file from a list of choices. The default value is Unicode (UTF-8).
Check the This is a COBOL data file option if you are working with COBOL files but do not have COBOL copybooks. You can still import this data by visually marking fields in the layout builder and specifying field data types. For more advanced parsing of COBOL files, you can use Astera’s COBOL File Source.
Check the This is a Hierarchical File option to define a hierarchical file layout and process the data file as a hierarchical file. Astera provides extensive user interface capabilities for processing hierarchical structures.
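To see why the Record Length matters for delimiter-free files, consider this minimal Python sketch that splits a stream into fixed-size records (the data is made up):

```python
# A delimiter-free stream of fixed-length records (record length = 10 here)
stream = "0001ALFKI 0002ANATR 0003ANTON "
RECORD_LENGTH = 10

# Slice the stream into records of exactly RECORD_LENGTH characters each
records = [stream[i:i + RECORD_LENGTH] for i in range(0, len(stream), RECORD_LENGTH)]
print(records)  # ['0001ALFKI ', '0002ANATR ', '0003ANTON ']
```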
Advanced File Options
In the Header spans over field, enter the number of rows that your header occupies. Use this option when your header spans multiple rows.
Check the Enforce exact header match option if you want the header to be read as it is.
Check the Column order in file may be different from the layout option, if the field order in your source layout is different from the field order in Astera’s layout.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
Check the Use SmartMatch with Synonym Dictionary option when the header values vary in the source layout and Astera’s layout. You can create a Synonym Dictionary file to store the values for alternate headers. You can also use the Synonym Dictionary file to facilitate automapping between objects on the flow diagram that use alternate names in field layouts.
To skip any unwanted rows at the beginning of your file, you can specify the number of records that you want to omit through the Skip initial records option.
Raw text filter
If you do not want to apply any filter and process all records, check the No filter. Process all records option.
If you want to process only records that begin with a specific value, check the Process if begins with option and enter that value in the provided field.
If you want to process only records that match a specific pattern, check the Process if matches this regular expression option and enter the regular expression in the provided field.
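Both raw text filters can be pictured as simple line predicates. A Python sketch with made-up record lines (the “D|” detail-line convention is hypothetical):

```python
import re

lines = ["H|header row", "D|10248|VINET", "D|10249|TOMSP", "T|trailer"]

# "Process if begins with": keep only lines starting with "D|"
begins_with = [ln for ln in lines if ln.startswith("D|")]

# "Process if matches this regular expression": the same filter as a regex
pattern = re.compile(r"^D\|\d+\|")
matches = [ln for ln in lines if pattern.match(ln)]
print(begins_with == matches)  # True
```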
String Processing
String processing options are useful when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want those to be treated as null values in the database destination that you are writing to; otherwise, they will be passed along as empty strings in the output.
Check the Trim strings option when you want to omit any extra spaces in the field value.
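The two string processing options can be sketched as a small per-value routine; this is an illustration of the idea, not Astera’s implementation:

```python
def process_value(value, empty_as_null=True, trim=True):
    # Trim strings: drop surrounding whitespace
    if trim:
        value = value.strip()
    # Treat empty string as null value: map "" to None (SQL NULL)
    if empty_as_null and value == "":
        return None
    return value

row = ["  ALFKI  ", "", " Germany"]
processed = [process_value(v) for v in row]
print(processed)  # ['ALFKI', None, 'Germany']
```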
Once you have specified the data reading options on this window, click Next.
The next window is the Length Markers window. Here, you can place markers to specify the columns in your data.
Using the Length Markers window, you can create the layout of your fixed-length file. To insert a field length marker, click anywhere in the window. For example, if a field starts at position five and is five characters long, you would click at marker position nine.
Note: In this case we are using a fixed length file with Orders sample data.
If you point your cursor to where the data starts (in this case, next to OrderID) and double-click, Astera will automatically detect the columns and place markers in your data. Blue lines will appear as markers on the detected columns.
You can modify the markers manually. To delete a marker, double-click on the column which has been marked.
In this case we removed the second marker and instead added a marker after CustomerID and EmployeeID.
In this way, you can add as many markers as there are columns/fields in the dataset.
You can also use the Build from Specs feature to help you build destination fields based on an existing file instead of manually specifying the layout.
After you have built the layout by inserting the field markers, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of your fixed length source file.
If you want to add a new field to your layout, go to the last row of your layout (the Name column), which will be blank, and double-click on it; a blinking text cursor will appear. Type the name of the field you want to add and select its properties. A new field will be added to the source layout.
Note: Make sure to specify the length of the field that you have added in the properties of the field.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted row, and a context menu will open with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Note: Modifying the layout (adding or deleting fields) from the Layout Builder in Astera will not make any changes to the actual source file. The layout is specific to Astera only.
The Layout Builder also provides other options for each field, such as Data Type, Start Position, Length, Alignment, Allows Null, and Expressions.
After you are done customizing the layout in the Layout Builder window, click Next. You will be taken to a new window, Config Parameters. Here, you can define parameters for the Fixed Length File Source.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Once you have been through all configuration options, click OK.
The Fixed Length File Source object is now configured.
The Fixed Length File Source object has now been modified from its previous configuration. The new object has all the modifications that we specified in the Layout Builder.
In this case, the modifications that we made were:
Separated the EmployeeID column from the OrderDate column.
Added the CustomerName column.
You have successfully configured your Fixed Length File Source object. The fields from the source object can now be mapped to other objects in a dataflow.
Astera Data Stack gives the user the ability to use a MongoDB Source as part of the ETL pipeline. MongoDB is a fully cloud-based application data platform.
It is also a NoSQL platform that provides a mechanism for storing and retrieving data modeled in ways other than the tabular relations used in relational databases.
In Astera, it can be configured in accordance with the user’s application.
To start, drag and drop the MongoDB Source object from the Sources section of the toolbox onto the dataflow.
To configure the MongoDB Source object, right-click on it and select Properties from the context menu.
This will open the Properties screen.
User Name: The name of the user connecting to the MongoDB cluster.
Password: The password of the user connecting to the MongoDB cluster.
Primary Server Name: The address of the primary server cluster for connection.
Database: The database to be selected from the MongoDB server.
Authentication Database: The database used for authentication.
Enable set of replica: Allows the server to access the secondary cluster if the primary server is unavailable.
Use TLS: Check this option if you are using TLS authentication.
Secondary Server Name: The address of the secondary server cluster for connection.
Read Preference: This drop-down menu allows the user to select which server is given preference when reading the data.
Primary: Data will only be read from the primary server.
Secondary: Data will only be fetched from the secondary server.
Primary Preferred: Preference is given to the primary server, but if it is unavailable, data is fetched from the secondary server.
Secondary Preferred: Preference is given to the secondary server, but if it is unavailable, data is fetched from the primary server.
Nearest: Preference is given to the server closest to the connection in terms of region and IP.
Note: You can also select some advanced connection info through the option next to Test Connection.
Once the credentials have been filled in, you can test the connection by clicking Test Connection.
Once done, click Next and you will be led to the MongoDB Collection screen.
Here, you can pick a collection that you wish to fetch the data from using the Pick Collection drop-down menu.
Once the collection is selected, the layout will be built.
There are three ways to generate the layout:
Astera auto-generates the layout based on the first 100 records by default.
The user can provide a JSON schema and Astera will generate the layout.
The user can manually create the layout.
Once the layout has been built, click Next and you will be led to the MongoDB Filter screen.
Here, you can provide a query to filter out your records based on some criteria.
Note: Functions cannot be applied to MongoDB fields on the filter screen, and records cannot be filtered based on a criterion that depends on array-type fields.
Click Next and you will be taken to the MongoDB SortBy screen.
Here, you can set a limit to fetch a specified number of records, or provide a number to skip the first ‘n’ records.
You can also sort your data based on a single field or a collection of fields.
Note: We cannot sort the data based on array-type fields.
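To illustrate the combined effect of the filter, sort, skip, and limit settings, here is a plain-Python sketch that mimics the MongoDB semantics on an in-memory list (field names and values are made up; a real dataflow evaluates these on the server):

```python
docs = [
    {"name": "Ana", "age": 34},
    {"name": "Ben", "age": 29},
    {"name": "Cleo", "age": 41},
    {"name": "Dan", "age": 29},
]

# Filter: the equivalent of {"age": {"$gte": 30}} keeps ages 30 and above
filtered = [d for d in docs if d["age"] >= 30]

# SortBy: sort on a single field, then skip the first record and limit to 1
result = sorted(filtered, key=lambda d: d["age"])[1:][:1]
print(result)  # [{'name': 'Cleo', 'age': 41}]
```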
Click Next and you will be taken to the Config Parameters screen.
Parameters can provide easier deployment of flows by eliminating hardcoded values and can also provide a dynamic way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next and you will be taken to the General Options screen.
Here, you can add any comments that you wish to add.
Click OK and the MongoDB Source object will be configured.
The source data can now be further used in an ETL pipeline with transformation or destination objects.
This concludes our discussion on the MongoDB Source object in Astera Data Stack.
A PDF Form Source object provides users with the functionality of extracting data from a fillable PDF document. A fillable PDF document comprises certain data points or digital fields that are editable by a user in any modern PDF viewer. These forms are often used in place of official documents on the web. The PDF Form Source object detects those points, extracts the written data, and creates corresponding fields for them.
In this article, we will explore how to make use of the PDF Form Source object in Astera to retrieve data.
Note: This is a Scholarship Application Form with fillable data fields for Personal Information, Contact Details, and Education Qualifications.
Select the PDF Form Source object from the Toolbox and drag-and-drop it onto the dataflow designer.
Right-click on the PDF Form Source object’s header and select the Properties option from the context menu.
A configuration window will open, as shown below.
Provide the File Path for the fillable PDF document.
Owner Password: If the file is protected, enter the password configured by the owner of the fillable PDF document. If the file is not protected, this option can be left blank.
Use UTF-8 Encoding: Check this option if the file is UTF-8 (Unicode Transformation Format, 8-bit) encoded.
Click Next.
This is the Layout Builder window, where you can see the data fields extracted from the fillable PDF document. Click Next.
This is the Config Parameters window. Click Next.
This is the General Options window. Click OK.
Right-click on the PDF Form Source object’s header and select Preview Output from the context menu.
View the data through the Data Preview window.
The data is now available for mapping. For simplicity, we will delete the non-required data fields and store the output in a separate file. To store the data, we must write it to a destination file.
We are using a Delimited Destination object. Drag-and-drop the Delimited Destination object onto the dataflow designer and map the fields from the PDF Form Source object to the destination object.
Right-click on the fields that you do not want to store and select the Remove Element option.
Note:
Do not delete the data fields from the PDF Form Source object, as it will disturb the layout that has been generated for the detected data fields.
Simply double-click or right-click on the Delimited Destination object’s header and select the Properties option from the context menu. Specify the File Path where you want to store the destination file. Click OK.
To preview the data, right-click on the destination object’s header and select Preview Output from the context menu.
Here, you can see data of the selected fields.
This is how a PDF Form Source object is used in Astera Data Stack to extract data points/digital fields from fillable PDF documents.
The Data Model Query object in Astera Data Stack allows you to extract multiple tables from a deployed data model. This is especially useful when you’re writing data to a fact table via the Fact Loader object, since the fact table contains attributes from multiple source tables.
In this article, we’ll be looking at how you can configure the Data Model Query object and use it to extract data from a source model.
Let’s assume that we have the following source model.
In this example, we’ll extract all of these tables as a source via the Data Model Query object.
To get the Data Model Query object from the toolbox, go to Toolbox > Sources > Data Model Query.
Drag and drop the Data Model Query object onto the dataflow designer.
The object is currently empty because we are yet to configure it.
To configure the Data Model Query object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click on the object header.
A configuration window will pop up.
Using this window, you can configure the properties of the Data Model Query object.
On the Database Connection screen, you’ll notice that the Data Provider dropdown menu is limited to just one option: Astera Data Model. This option represents the data models that are deployed on the server and are available for usage.
Once you’ve provided your Astera Data Stack credentials and a server connection, you can select a deployed model from the Database dropdown menu.
Note: The default username is admin, and the default password is Admin123.
In this case, we’ll select DW_Source, which represents the source model that was shown earlier.
Once you’ve chosen a deployed model, click Next.
The Query Source Layout screen will appear.
On the Query Source Layout screen, you can select a root entity from a list of entities present in the source model, via the Root Entity dropdown menu.
The root entity serves as a starting point for a tree layout that includes all of the entities that you need to extract data from.
In this case, the root entity in the source data model is InvoiceLines.
Once you’ve chosen a root entity, a tree/hierarchical layout starting from the root entity will appear on the left side of the screen. You can expand the inner nodes to reveal the fields present in other entities of the source model.
Checking the Allow Collections option will enable collection nodes in the tree layout.
In the Where Clause textbox, you can add an optional SQL statement that will serve as a filter for incoming records.
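The Where Clause behaves like the filter of an ordinary SQL SELECT. As a rough illustration with Python’s built-in sqlite3 and a made-up, abbreviated InvoiceLines table:

```python
import sqlite3

# In-memory stand-in for a hypothetical InvoiceLines entity
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InvoiceLines (InvoiceLineID INTEGER, Quantity INTEGER)")
conn.executemany("INSERT INTO InvoiceLines VALUES (?, ?)",
                 [(1, 5), (2, 12), (3, 8)])

# The Where Clause filters incoming records before they enter the flow
where_clause = "Quantity > 6"
rows = conn.execute(
    f"SELECT InvoiceLineID FROM InvoiceLines WHERE {where_clause} ORDER BY InvoiceLineID"
).fetchall()
print(rows)  # [(2,), (3,)]
```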
Click OK once you’ve chosen a root entity. You’ve now configures the Data Model Query object. The tree layout, starting from the root entity, will be visible in the object.
The fields present in this layout can now be mapped further to other objects in the dataflow.
This concludes our discussion on the Data Model Query object.
Apache Parquet is a columnar storage file format used by Hadoop systems such as Pig, Spark, and Hive. The file format is language-independent and has a binary representation. Parquet is used to store large datasets efficiently and uses the .parquet extension.
The key features of Parquet with respect to Astera Data Stack are:
It offers the option of compression, resulting in a smaller size post-compression.
It encodes the data.
It stores data in a column layout.
In Astera Data Stack, you can use a Parquet file in which the cardinality of the data is maintained, i.e., all columns must have the same number of values.
Note: There should only be one row for each data field.
Drag and drop the Parquet File Source from the Sources section of the Toolbox onto the dataflow designer.
Right-click on the Parquet File Source object and select Properties from the context menu.
This will open a new window.
Let’s have a look at the options present here.
File Location
File Path: This is where you will provide the path to the .parquet file.
Data Load Option
If you wish to control memory consumption and improve read time, the Data Load option can be used.
Batch Size: This is where the size of each batch is defined.
Advanced File Processing: String Processing
Treat empty string as null value: Checking this option will treat every empty string as a null value.
Trim strings: Checking this box will trim the strings.
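The Batch Size option above bounds how many records are held in memory at a time. The general idea of batched reading, sketched in plain Python (not Astera’s reader):

```python
def read_in_batches(records, batch_size):
    # Yield successive batches of at most batch_size records
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

records = list(range(10))
batches = list(read_in_batches(records, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```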
Once done, click Next and you will be led to the Layout Builder screen.
The layout will be automatically built. Otherwise, you can build it using the Build Layout from layout spec option at the top of the screen.
Once done, click Next and you will be taken to the Config Parameters screen.
This allows you to further configure and define dynamic parameters for the Parquet source file.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next and you will be taken to the General Options screen.
Here, you can add any comments that you wish to add.
Click OK and the Parquet File Source object will be configured.
You can now map these fields to other objects as part of the dataflow.
The Parquet File Source object supports the following data types:
Integer
Time/Timestamp
Date
String
Float
Real
Decimal
Double
Byte Array
Guid
Base64
Integer96
Image
Hierarchy is not supported.
This concludes our discussion on the definition and configuration of the Parquet File Source object in Astera Data Stack.
You can also delete the data fields in the destination file by using the Layout Builder, or map only the relevant fields onto the nodes of the destination object. You can refer to the Delimited Destination article to learn more about the Delimited Destination object.
The Layout Builder grid provides the following columns for each field:
Data Type: Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean.
Start Position: Specifies the position from where that column/field starts.
Length: Defines the length of a column/field.
Alignment: Specifies the alignment of the values in a column/field. The options provided are right, left, and center.
Allows Null: Controls whether the field allows blank or NULL values in it.
Expressions: Defines functions through expressions for any field in your data.