Setting Up Sources
Each source on the dataflow is represented as a source object. You can have any number of sources in the dataflow, and they can feed into zero or more destinations.
The following source types are supported by the dataflow engine:
Flat File Sources:
Tree File Sources:
Database Sources:
Data Model
All sources can be added to the dataflow by picking a source type on the Toolbox and dropping it on the dataflow. File sources can also be added by dragging-and-dropping a file from an Explorer window. Database sources can be drag-and-dropped from the Data Source Browser. For more details on adding sources to the dataflow, see Introducing Dataflows.
Flat File Sources
Delimited File
Adding a Delimited File Source object allows you to transfer data from a delimited file. An example of what a delimited file source object looks like is shown below.
To configure the properties of a Delimited File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Fixed-Length File
Adding a Fixed-Length File Source object allows you to transfer data from a fixed-length file. An example of what a Fixed-Length File Source object looks like is shown below.
To configure the properties of a Fixed-Length File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Excel File
Adding an Excel Workbook Source object allows you to transfer data from an Excel file. An example of what an Excel Workbook Source object looks like is shown below.
To configure the properties of an Excel Workbook Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Tree File Sources
COBOL File
Adding a COBOL File Source object allows you to transfer data from a COBOL file. An example of what a COBOL File Source object looks like is shown below.
To configure the properties of a COBOL File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
XML/JSON File
Adding an XML/JSON File Source object allows you to transfer data from an XML file. An example of what an XML/JSON File Source object looks like is shown below.
To configure the properties of an XML/JSON File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
General Properties window:
File Path – Specifies the location of the source XML file. Using UNC paths is recommended if running the dataflow on a server.
Schema File Path – Specifies the location of the XSD file controlling the layout of the XML source file.
Note: Astera can generate a schema based on the content of the source XML file. The data types will be assigned based on the source file’s content.
Optional Record Filter Expression – Allows you to enter an expression to selectively filter incoming records according to your criteria. You can use the Expression Builder to help you create your filter expression. For more information on using Expression Builder, see Expression Builder.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
Database Sources
Database Table
Adding a Database Table Source object allows you to transfer data from a database table. An example of what a Database Table Source object looks like is shown below.
To configure the properties of a Database Table Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your source, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected source.
Pick Source Table window:
Select a source table using the Pick Table dropdown.
Select Full Load if you want to read the entire table.
Select Incremental Load Based on Audit Fields to perform an incremental read starting at a record where the previous read left off.
Incremental load based on Audit Fields is based around the concept of Change Data Capture (CDC), which is a set of reading and writing patterns designed to optimize large-scale data transfers by minimizing database writing in order to improve performance. CDC is implemented in Astera using Audit Fields pattern. The Audit Fields pattern uses create time or last update time to determine the records that have been inserted or updated since the last transfer and transfers only those records.
Advantages
Most efficient of CDC patterns. Only records that were modified since the last transfer are retrieved by the query thereby putting little stress on the source database and network bandwidth
Disadvantages
Requires update date time and/or create date time fields to be present and correctly populated
Does not capture deletes
Requires index on the audit field(s) for efficient performance
To use the Audit Fields strategy, select the Audit Field and an optional Alternate Audit Field from the appropriate dropdown menus. Also, specify the path to the file that will store incremental transfer information.
Where Clause window:
You can enter an optional SQL expression serving as a filter for the incoming records. The expression should start with the WHERE word followed by the filter you wish to apply.
For example, WHERE CreatedDtTm >= ‘2001/01/05’
General Options window:
The Comments input allows you to enter comments associated with this object.
SQL Query
Adding a SQL Query Source object allows you to transfer data returned by the SQL query. An example of what an SQL Query Source object looks like is shown below.
To configure the properties of a SQL Query Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your SQL Query, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected database.
SQL Query Source window:
Enter the SQL expression controlling which records should be returned by this source. The expression should follow SQL syntax conventions for the chosen database provider.
For example, select OrderId, OrderName, CreatedDtTm from Orders.
Source/Destination File Options
Source or Destination is a Delimited File
If your source or destination is a Delimited File, you can set the following properties
First Row Contains Header - Check this option if you want the first row of your file to display the column headers. In the case of Source file, this indicates if the source contains headers.
Field Delimiter - Allows you to select the delimiter for the fields. The available choices are , and . You can also type the delimiter of your choice instead of choosing the available options.
Record Delimiter - Allows you to select the delimiter for the records in the fields. The choices available are carriage-return line-feed combination , carriage-return and line-feed . You can also type the record delimiter of your choice instead of choosing the available options. For more information on Record Delimiters, please refer to the Glossary.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
Quote Char - Allows you to select the type of quote character to be used in the delimited file. This quote character tells the system to overlook any special characters inside the specified quotation marks. The options available are ” and ’.
You can also use the Build fields from an existing file feature to help you build a destination fields based on an existing file instead of manually typing the layout.
Source or Destination is a Microsoft Excel Worksheet
If the Source and/or the Destination chosen is a Microsoft Excel Worksheet, you can set the following properties:
First Row Contains Header - Check this option if you want the first row of your file to display the column headers. In the case of Source file, this indicates if the source contains headers.
Worksheet - Allows you to select a specific worksheet from the selected Microsoft Excel file.
You can also use the Build fields from an existing file feature to help you build a destination fields based on an existing file instead of manually typing the layout.
Source or Destination is a Fixed Length File
If the Source and/or the Destination chosen is a Fixed Length File, you can set the following properties:
First Row Contains Header - Check this option if you want the first row of your file to display the column headers. In the case of Source file, this indicates if the source contains headers.
Record Delimiter - Allows you to select the delimiter for the records in the fields. The choices available are carriage-return line-feed combination , carriage-return and line-feed . You can also type the record delimiter of your choice instead of choosing the available options. For more information on Record Delimiters, please refer to the Glossary.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
You can also use the Build fields from an existing file feature to help you build a destination fields based on an existing file instead of manually typing the layout.
Using the Length Markers window, you can create the layout of your fixed-length file, The Length Markers window has a ruled marker placed at the top of the window. To insert a field length marker, you can click in the window at a particular point. For example, if you want to set the length of a field to contain five characters and the field starts at five, then you need to click at the marker position nine.
In case the records don’t have a delimiter and you rely on knowing the size of a record, the number in the RecordLength box is used to specify the character length for a single record.
You can delete a field length marker by clicking the marker.
Source or Destination is an XML file
If the source is an XML file, you can set the following options:
Source File Path specifies the file path of the source XML file.
Schema File Path specifies the file path of the XML schema (XSD file) that applies to the selected source XML file.
Record Filter Expression allows you to optionally specify an expression used as a filter for incoming source records from the selected source XML file. The filter can refer to a field or fields inside any node inside the XML hierarchy.
The following options are available for destination XML files.
Destination File Path specifies the file path of the destination XML file.
Encoding - Allows you to choose the encoding scheme for the XML file from a list of choices. The default value is Unicode (UTF-8).
Format XML Output instructs Astera to add line breaks to the destination XML file for improved readability.
Read From Schema File specifies the file path of the XML schema (XSD file) that will be used to generate the destination XML file.
Root Element specifies the root element from the list of the available elements in the selected schema file.
Generate Destination XML Schema Based on Source Layout creates the destination XML layout to mirror the layout of the source.
Root Element specifies the name of the root element for the destination XML file.
Generate Fields as XML Attributes specifies that fields will be written as XML attributes (as opposed to XML elements) in the destination XML file.
Record Node specifies the name of the node that will contain each record transferred.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
Advanced Flat-File Reading Options
When importing from a fixed-width, delimited, or Excel file, you can specify the following advanced reading options:
Header Spans x Rows - If your source file has a header that spans more than 1 row, select the number of rows for the header using this control.
Skip Initial Records - Sets the number of records which you want skipped at the beginning of the file. This option can be set whether or not your source file has a header. If your source file has a header, the first record after the specified number of rows to skip will be used as the header row.
Raw Text Filter - Only records starting with the filter string will be imported. The rest of the records will be filtered.
You can optionally use regular expressions to specify your filter. For example, the regular expression ^[12][4] will only include records starting with 1 or 2, and whose second character is 4.
Note: Astera supports Regular Expressions implemented with the Microsoft .NET Framework and uses the Microsoft version of named captures for regular expressions.
Raw Text Filter setting is not available for Excel source files.
Managing Differences between Source Layout and Source File
If your source is a fixed-length file, delimited file, or Excel spreadsheet, it may contain an optional header row. A header row is the first record in the file that specifies field names and, in the case of a fixed-length file, the positioning of fields in the record.
If your source file has a header row, you can specify how you want the system to handle the differences between your actual source file, and the source layout specified in the setting. Differences may arise due to the fact that the source file has a different field order from what is specified in the source layout, or it may have extra fields compared to the source layout. Conversely, the source file may have fewer fields than what is defined in the source layout, and the field names may also differ, or may have changed since the time the layout was created.
By selecting from the available options, you can have Astera handle those differences exactly as required by your situation. These options are described in more detail below:
Enforce exact header match – Lets Astera Data Stack proceed with the transfer only if the source file’s layout matches the source layout defined in the setting exactly. This includes checking for the same number and order of fields and field names.
Columns order in file may be different from the layout – Lets Astera Data Stack ignore the sequence of fields in the source file, and match them to the source layout using the field names.
Column headers in file may be different from the layout – This mode is used by default whenever the source file does not have a header row. You can also enable it manually if you want to match the first field in the layout with the first field in the source file, the second field in the layout with the second field in the source file, and so on. This option will match the fields using their order as described above even if the field names are not matched successfully. We recommend that you use this mode only if you are sure that the source file has the same field sequence as what is defined in the source layout.
Creating Field Layout
The Field Layout window is available in the properties of most objects on the dataflow to help you specify the fields making up the object. The table below explains the attributes you can set in the Field Layout window.
The table below provides a list of all the attributes available for a particular layout type.
Using Data Formats
Astera supports a variety of formats for each data type. For example, for Dates, you can specify the date as “April 12” or “12-Apr-08”. Data Formats can be configured independently for source and for destination, giving you the flexibility to correctly read source data and change its format as it is transferred to destination.
If you are transferring from a flat file (for example, Delimited or Fixed-Width), you can specify the format of a field so that the system can correctly read the data from that field.
If you do not specify a data format, the system will try to guess the correct format for the field. For example, Astera is able to correctly interpret any of the following as a Date:
April 12
12-Apr-08
04-12-2008
Saturday, 12 April 2008
and so on
Astera comes with a variety of pre-configured formats for each supported data type. These formats are listed in the Sample Formats section below. You can also create and save your own data formats.
To select a data format for a source field, go to Source Fields and expand the Format dropdown menu next to the appropriate field.
Sample Formats
Dates:
Booleans:
Integers:
Real Numbers:
Numeric Format Specifiers:
Last updated