Client Application Processor
Dual Core or greater (recommended); 2.0 GHz or greater
Server Application Processor
8 Cores or greater (recommended)
Repository Database
MS SQL Server 2008R2 or newer, or PostgreSQL v15 or newer for hosting repository database
Operating System - Client
Windows 10 (recommended)
Operating System - Server
Windows: Windows 10 or Windows Server 2012 or newer
Memory
Client: 4GB or greater (recommended)
Server: 8GB or greater (recommended)
32GB or greater for large data processing
32GB or greater for AI processing
Hard Disk Space
Client: 300 MB (if .NET Framework is pre-installed)
Server: 700 MB (if .NET Framework is pre-installed)
Additional 300 MB if .NET Framework is not installed
AI Subscription Requirements
OpenAI API (provided as part of the package)
LLAMA API
Together AI
Postgres Db (if knowledgebase is needed)
Other
Requires ASP.NET Core 8.0.x and Windows Desktop Runtime 8.0.x for the client, and .NET Core Runtime 8.0.x for the server
For advanced features such as AGL, OCR, and Text Converter to work in Astera, additional packages such as Python and Java must be installed. To avoid the tedious process of installing these separately, Astera provides a built-in Install Manager within the tool.
There are two types of packages which are required to be installed:
Prerequisites for Python Server: This package needs to be installed on the server machine only.
Prerequisites for ReportMiner: This package needs to be installed on both the client and server machines.
The packages being installed for AGL are listed as follows:
The packages being installed for OCR are listed as follows:
Python packages
The packages being installed for Python Server are listed as follows:
Python Server executable is installed (comes with all packages necessary for Python Server)
In the following documents, we will look at how to use the Install Manager to install these packages on client and server machines.
Open Astera as an administrator.
Once Astera is open, go to the Tools > Run Install Manager.
The Install Manager welcome window will appear. Click on Next.
If the prerequisite packages are already installed, the Install Manager will inform you about them and give you the option to uninstall or update them as needed.
If the prerequisite packages are not installed, the Install Manager will present you with the option to install them. Check the box next to the prerequisite package, and then click Install.
During the installation, the Install Manager window will display a progress bar showing the installation progress.
You can also cancel the installation at any point if necessary.
Once the installation is complete, the Install Manager will prompt you. Click on Close to exit out of the Install Manager.
The packages for AGL and OCR usage are now installed, and the features are ready to use.
In case the Integration Server is installed on a separate machine, we will need to install the packages for AGL and OCR there as well.
To access the Install Manager on the server machine, open the Start menu and search for “Install Manager for Integration Server”.
Run this Install Manager as an administrator.
The Install Manager welcome window will appear. Click on Next.
If the prerequisite packages are already installed, the Install Manager will inform you about them and give you the option to uninstall or update them as needed.
If the prerequisite packages are not installed, the Install Manager will present you with the option to install them. Check the box next to the prerequisite package, and then click Install.
During the installation, the Install Manager window will display a progress bar showing the installation progress.
You can also cancel the installation at any point if necessary.
Once the installation is complete, the Install Manager will prompt you. Click on Close to exit out of the Install Manager.
This concludes our discussion on how to use the install manager for Astera.
After you have successfully installed Astera client and server applications, open the client application and you will see the Server Connection screen as pictured below.
Enter the Server URI and Port Number to establish the connection.
The server URI will be the IP address of the machine where Astera Integration server is installed.
Server URI: (HTTPS://IP_address)
The default port for the secure connection between the client and the Astera Integration server is 9264.
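For example, if the Astera Integration Server is installed on a machine with the hypothetical IP address 192.168.1.25, the Server URI would be https://192.168.1.25 and the Port Number would be 9264.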
If you have connected to any server recently, you can automatically connect to that server by selecting that server from the Recently Used drop-down list.
Click Connect after you have filled out the information required.
The client will now connect to the selected server. You should be able to see the server listed in the Server Explorer tree when the client application opens.
To open Server Explorer go to Server > Server Explorer or use the keyboard shortcut Ctrl + Alt + E.
Before you can start working with the Astera client, you will have to create a repository and configure the server.
You can connect to different servers right from the Server Explorer window in the Client. Go to the Server Explorer window and click on the Connect to Server icon.
A prompt will appear that will confirm if you want to disconnect from the current Server and establish connection to a different server. Click Yes to proceed.
You will be directed to the Server Connection screen. Enter the required server information (Server URI and Port Number) to connect to the server and click Connect.
If the connection is successfully established, you should be able to see the connected server in the Server Explorer window.
Before you start using the Astera server, a repository must be set up. Astera supports SQL Server and PostgreSQL for building cluster databases, which can then be used for maintaining the repository. The repository is where job logs, job queues, and schedules are kept.
To see these options, go to Server > Configure > Step 1: Build repository database and configure server.
The first step is to point to the SQL Server or PostgreSQL instance where you want to build the repository and provide the credentials to establish the connection.
Go to Server > Configure > Step 1: Build repository database and configure server.
Select SQL Server from the Data Provider drop-down list and provide the credentials for establishing the connection.
From the drop-down list next to the Database option, select the database on the SQL instance where you want to host the repository.
Click Test Connection to test whether the connection is successfully established or not. You should be able to see the following message if the connection is successfully established.
Click OK to exit the test connection window, then click OK again. The following message will appear; select Yes to proceed.
The repository is now set up and configured with the server to be used.
The next step is to log in using your credentials.
Go to Server > Configure > Step 1: Build repository database and configure server.
Select PostgreSQL from the Data Provider drop-down list and provide the credentials for establishing the connection.
From the drop-down list next to the Database option, select the database on the PostgreSQL instance where you want to host the repository.
Click Test Connection to test whether the connection is successfully established or not. You should be able to see the following message if the connection is successfully established.
Click OK and the following message will appear. Select Yes to proceed.
The repository is now set up and configured with the server to be used.
The next step is to log in using your credentials.
In this section we will discuss how to install and configure Astera Server and Client applications.
Run ‘IntegrationServer.exe’ from the installation package to start the server installation setup.
The Astera Software License Agreement window will appear; check the I agree to the license terms and conditions checkbox, then click Install.
Run the ‘ReportMiner’ application from the installation package to start the client installation setup.
The Astera Software License Agreement window will appear; check the I agree to the license terms and conditions checkbox, then click Install.
When the installation is successfully completed, click Close.
Your server installation is now complete. If you want to use advanced features such as OCR, Text Converter, etc., click on Install Python Server and follow the steps.
The license key provided to you contains information about how many Astera clients can connect to a single server as well as the functionality available to the connected clients.
After you have configured the server, and logged in with the admin credentials, the last step is to insert your license key.
Go to Server > Configure > Step 4: Enter License Key.
On the License Management window, click on Unlock using a key.
Enter the details to unlock Astera – Name, Organization, and Product Key and select Unlock.
You’ll be shown the message that your license has been successfully activated.
Your client is now activated. To check your license status, you can go to Tools > Manage Server License.
This opens a window containing information about your license.
This concludes unlocking Astera client and server applications using a single licensing key.
ReportMiner 11.1 marks our step into the realm of AI, empowering users to harness the full potential of data extraction and processing with newly added AI capabilities. From user interface improvements to advanced AI-powered features, this release sets a new standard for template-less data extraction. Template-less data extraction uses AI and machine learning to pull data from documents without the need to use pre-configured templates, making it easier to handle different and unorganized document layouts.
ReportMiner 11.1 sets a new benchmark in efficiency and usability, seamlessly integrating AI into your data workflows.
Elevate your data journey with ReportMiner 11.1 – where performance, visibility, and efficiency converge effortlessly, all within an intuitive drag-and-drop interface.
LLM Generate is a core component of Astera’s AI capabilities, enabling the creation of AI-powered solutions when combined with other objects on the flow, including sources, transformations, and destinations. It retrieves an output from a Large Language Model based on a user-defined input prompt, with support for various LLM providers such as OpenAI, Llama, and custom models.
The object features:
Input Port: Maps fields to be included in the prompt for the LLM model.
Output Port: Contains the generated result from the LLM model.
LLM Generate’s flexibility in processing input and generating output through natural language instructions makes it a versatile and powerful tool in your data pipeline. Numerous use cases are made possible thanks to this new object, which will be explored in the product documentation.
The Text Converter object is another addition to the dataflow’s toolbox, enabling users to extract text from various file formats, including documents, images, and scanned files. It enhances text extraction performance using Optical Character Recognition (OCR) technology. Currently, the Text Converter supports the Google OCR, PaddleOCR (Beta), TesseractOCR (Beta), and TextractOCR (Beta) platforms.
Key conversion features include:
Document to Text: Extract text from PDFs, Doc/DocX, and TXT files.
Image to Text: Use OCR to extract text from image formats such as JPG, PNG, and JPEG.
HTML to Text: Extract text from HTML, HTM, and XHTML files.
Markdown to Text: Extract text from MD, MARKDOWN, MKD, MKDN, MDWN, and MDOWN files.
Excel to Text: Extract text from XLS, XLSX, and CSV files.
Astera ReportMiner's Auto-Generate Layout (AGL) feature uses AI to automatically identify data regions and fields in your source document, making it easier to create layouts in different document types and extract data.
Normally, creating a template can take over 10 minutes, but with the AGL feature, it can take as little as 5 seconds. This feature also checks the extracted data for accuracy, identifying any errors or issues that may require the user’s review.
This helps speed up data extraction with less manual work involved, allowing the user to focus on other tasks.
To enable advanced extraction features, such as the Text Converter, the Python server is required. As part of the installation, the Python server is embedded within the integration server in v11.1, and it runs seamlessly as a component within the server. Python server activation is required to utilize the Text Converter object within the tool.
The Install Manager now includes the Python Server installation as well for features like Text Converter and more. The Python Server must be installed on the same machine as the Integration Server, and the Install Manager will launch automatically if selected during the installation. If the user doesn't open the install manager upon server installation (via Wix installer), they can launch it from the Start menu later on.
With the WiX installer, users can customize their installation by choosing the installation directory and modifying the service port for the server installer, offering greater flexibility during setup compared to the previous versions. The v11.1 installation is designed to run alongside any pre-11 versions, so you can test your existing flows on either version side-by-side while transitioning to the new version. While we recommend setting up a new repository database for your new 11.1 installation, upgrading an existing repository is also supported.
This concludes ReportMiner 11.1 release notes.
In some cases, it may be necessary to supply a license key without prompting the end user to do so. For example, in a scenario where the end user does not have access to install software, a systems administrator may do this as part of a script.
One possible solution is to place the license key in a text file. This way, the administrator can easily license each machine without having to go through the licensing prompt for each user.
Here’s a step-by-step guide to supplying a license key without prompting the user:
To get started, create a new text document that will hold the license key required to access the application.
In the text document, enter a valid license key. The key must be the only thing in the document, and it must be on the very first line. Make sure there are no unnecessary leading or trailing spaces, lines, or any characters other than those of the license key.
Name the text document “Serial” and save it in the Integration Server Folder of the application located in Program Files on your PC. For instance, if the application is Astera, save the Text Document in the “Astera Integration Server 10” folder. This folder contains the files and settings for the server application. The directory path would be as follows:
C:\Program Files\Astera Software\Astera Integration Server 10.
Finally, restart the Astera Integration Server 10 service to complete the process. This step ensures that, from now on, when the user launches Astera or any other application by Astera, they will not be prompted to enter a license key.
Also, please keep in mind that all license restrictions are still in effect, and this process only bypasses the user prompt for the key.
In conclusion, by following these simple steps, system administrators can easily supply a license key without prompting the end user. This approach is particularly useful when installing software remotely or when licensing multiple machines.
The Python server is embedded in the Astera server and is required to use the Text Converter object in the tool. It is disabled by default, and this document will guide us through the process of enabling it.
Launch the client and navigate to Server > Manage > Server Properties.
In the Server Properties, check the Start Python Server checkbox and press Ctrl + S to save your changes.
Now open the Start Menu > Services and restart the service of Astera Integration Server 11.1.
After restarting the service, wait for a few minutes and run cmd as administrator.
Type the following command in the Command Prompt to check whether your Python server is running:
netstat -ano | findstr :5001
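If the Python server is listening on port 5001, the output should contain a line similar to the one below (the process ID at the end will differ on your machine):
TCP    0.0.0.0:5001    0.0.0.0:0    LISTENING    4372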
We’ve successfully enabled the Python server. You can now close this window and use any Python server-dependent features in the Client.
Astera Data Stack is built on a client-server architecture. The client is the part of the application which a user can run locally on their machine, whereas the server performs processing and querying requested by the client. In simple words, the client sends a request to the server, and the server, in turn, responds to the request. Therefore, database drivers are installed only on the Astera Data Stack server. This enables horizontal scaling by adding multiple clients to an existing cluster of servers and eliminating the need to install drivers on every machine.
The Astera client and server applications communicate using REST architecture. REST-compliant systems, often called RESTful systems, are characterized by statelessness and a separation of concerns between the client and server, which means that both can be implemented independently as long as each side knows what format of messages to send to the other. The server communicates with the client using HTTPS commands, which are encrypted using a certified key/certificate signed by an authority. This protects the data from being intercepted by an attacker, as the plaintext is encrypted into a random string of characters.
Once you have created the repository and configured the server, the next step is to log in using your Astera account credentials.
You will not be able to design any dataflows or workflows on the client if you haven’t logged in to your Astera account. The options will be disabled.
Go to Server > Configure > Step 2: Login as admin.
This will direct you to a login screen where you can provide your user credentials.
If you are using Astera 10 for the first time, you can log in using the default credentials as follows:
Username: admin Password: Admin123
After you log in, you will see that the options in the Astera Client are enabled.
You can use these options while your trial period is active. To fully activate the options and the product, you’ll have to enter your license.
If you don’t want Astera to show you the server connection screen every time you run the client application, you can skip that by modifying the settings.
To do that go to Tools > Options > Client Startup and select the Auto Connect to Server option. On enabling the option, Astera will store the server details you entered previously and will use those details to automatically reconnect to the server every time you run the application.
The next step after logging in is to unlock Astera using the License key.
Note: You cannot use your existing set of keys (from version 6 or 7). If you are planning to migrate from version 7 (or earlier) to version 8, 9 or 10, please contact , as you will need a new license key.
This article introduces the role-based access control mechanism in Astera. This means that administrators can grant or restrict access to various users within the organization, based on their role in the entire data management cycle.
In this article, we will look at the user lists and role management features in detail.
Username: admin
Password: Admin123
Once you have logged in, you have the option to create new users, and we recommend doing this as a first step.
To create/register a new user, right-click on the DEFAULT server node in the Server Explorer window and select User List from the context menu.
This will open the Server Browser panel.
Under the Security node, right-click on the User node and select Register User from the context menu.
This will open a new window. You can see quite a few fields required to be filled here to register a new user.
Once the fields have been filled, click Register and a new user will be registered.
Now that a new user is registered, the next step is to assign roles to the user.
Select the user you want to assign the role(s) to and right-click on it. From the context menu, select Edit User Roles.
A new window will open where you can see all the roles available in Astera, whether default or custom created. Since we haven’t created any custom roles, we’ll see the three default roles: Developer, Operator, and Root.
Select the role that you want to assign to the user and click on the arrows in the middle section of the screen. You’ll see that the selected role will get transferred from the All Roles section to the User Roles section.
After you have assigned the roles, click OK and the specific role(s) will be assigned to the user.
Astera lets the admin manage the resources available to any user; the admin can either grant or restrict access to resources.
To edit role resources, right-click on any of the roles and select Edit Role Resources from the context menu.
This will open a new window. Here, you can see four nodes on the left under which resources can be assigned.
The admin can grant a role resources from the Url node and the Cmd node, access to deployments from the REST node, and access to Catalog artifacts from the Catalog node.
Expanding the Url node shows us the following resources:
Expanding the Cmd node will give us the following checkboxes as resources.
If we expand the REST node, we can see a list of available API resources, including endpoints you might have deployed.
Upon expanding the Catalog node, we can see the artifacts that have been added to the Catalog, along with the endpoint permissions that can be granted.
This concludes User Roles and Access Control in Astera Data Stack.
Existing Astera customers can upgrade to the latest version of Astera Data Stack by executing an .exe script that automates the repository update to the latest release. This streamlined approach enhances the efficiency and effectiveness of the upgrade process, ensuring a smoother transition for users.
To start, download and run the latest server and client installers to upgrade the build.
Run the Repository Upgrade Utility to upgrade the repository.
Once it runs, you will see the following prompt.
Click OK and the repository will be upgraded.
Once done, you will be able to view all jobs, schedules, and deployments that you previously worked with in the Job Monitor, Scheduler, and Deployment windows.
This concludes the working of the Repository Upgrade Utility in Astera Data Stack.
To activate Astera on your machine, you need to enter the license key provided with the product. When the license key is entered, the client sends a request to the licensing server to grant the permission to connect. This action can only be performed when the client machine is connected to the internet.
However, Astera provides an alternative method to activate the license offline by providing an activation code to users who request it. Follow the steps given below for offline activation:
1. Go to the menu bar and click on Server > Configure > Step 4: Enter License Key
2. Click on Unlock using a key.
3. Type your Name, Organization and paste the Key provided to you. Then, click Unlock. Do the same if you are changing (instead of activating) your license offline.
4. Another pop-up window will show an error that the licensing server is unavailable, since you cannot connect to it while offline. Click OK.
5. Click on Activate using a key button.
6. Now, copy the Key and the Machine Hash and email them to the support staff. The Machine Hash is unique for every machine. Make sure you send the correct Key and Machine Hash, as they are essential for generating the correct Activation Code.
7. You will receive an activation code from the support staff via e-mail. Paste this code into the Activation Code textbox and click on Activate.
8. You have successfully activated Astera on your machine offline using the activation code. Click OK.
9. A pop-up window will notify you that the client needs to restart for the new license to take effect. Click OK and restart the client.
You have successfully completed the offline activation of Astera.
Once you have logged into the Astera client, you can set up an admin email to access the Astera server. This will also allow you to be able to use the “Forgot Password” option at the time of log in.
In this document, we will discuss how to verify admin email in Astera.
1. Once logged in, we will now proceed to enter an email address to associate with the admin user by verifying the email address.
Go to Server > Configure > Step 3: Verify Admin Email
2. Unless you have already set up an email address in the Mail Setup section of Cluster settings, the following dialogue box will pop up asking you to configure your email settings.
Click on Yes to open your cluster settings.
Click on the Mail Setup tab.
3. Enter your email server settings.
4. Now, right-click on the Cluster Settings active tab and click on Save & Close in order to save the mail setup.
5. Re-visit the Verify Admin Email step by going to Server > Configure > Step 3: Verify Admin Email.
This time, the Configure Email dialogue box will open.
6. Enter the email address you previously set up and click on Send OTP.
7. Use the OTP from the email you received and enter it in the Configure Email dialogue and proceed.
On correct entry of the OTP, an email successfully configured dialogue will appear.
8. Click OK to exit it. We can confirm our email configuration by going to the User List.
Right-click on DEFAULT under Server Connections in the Server Explorer and go to User List.
9. This opens the User List where you can confirm that the email address has been configured with the admin user.
The feature is now configured and can be utilized when needed by clicking on Forgot Password in the log in window.
This opens the Password Reset window, where you can enter the OTP sent to the specified e-mail for the user and proceed to reset your password.
This concludes our discussion on verifying admin email in Astera.
The Text Converter object enables users to extract text from various file formats, including documents, images, and scanned files. It supports Optical Character Recognition (OCR) for enhanced performance in text extraction.
The Text Converter object provides conversion for:
Document to Text: Extract text from PDFs, Doc/Docx and TXT files.
Image to Text: Use Optical Character Recognition (OCR) to extract text from images in JPG, PNG, and JPEG formats.
HTML to Text: Extract text from HTML, HTM, and XHTML files.
Markdown to Text: Extract text from MD, MARKDOWN, MKD, MKDN, MDWN, and MDOWN files.
Excel to Text: Extract text from XLS, XLSX, and CSV files.
In this guide, we will cover how to:
Convert a PDF document to text.
Extract text from PNG images using OCR.
Use the Text Converter object as a transformation.
To get a Text Converter object, go to Toolbox > Sources > Text Converter. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Text Converter object onto the designer.
Configure the object by right-clicking on its header and selecting Properties from the context menu.
A dialog box will open.
This is where you can configure the properties for the Text Converter object.
1. The first step is to specify the File Path to the PDF file that needs to be converted.
Next, define an Output Directory where the converted text file will be stored (optional).
Configure the PDF Converter Options.
PDF Password: Provide the password if the PDF file is password-protected.
Pages To Read: Specify which pages need to be read. Leaving this empty means all pages will be read.
Text Converter Model: Select from the available models:
Google OCR
TesseractOCR (Beta)
PaddleOCR (Beta)
TextractOCR (Beta)
Force OCR: This option applies OCR to both digital and scanned files, regardless of their format. When unchecked (the default setting), the system first determines whether an incoming file is scanned or an image. OCR is applied only if the file is detected as scanned or image based.
Split Output: Check this box to split the text for each page into a separate output record.
Configure the Excel Converter Options.
Work Sheet Name: Specify the name of your worksheet that you want to read data from.
Space Between Excel Columns: Specify the space between the Excel Columns.
Blank Lines Before End of File: Specify the number of blank lines at which the file ends.
Tab Size: Specify the tab spacing to be used in the extracted text.
Once you have configured the Text Converter object, click OK.
Right-click on the Text Converter object’s header and select Preview Output from the context menu.
A Data Preview window will open and will show you the preview of the extracted text.
Extract Text from PNG Images using OCR
The first step is to specify the File Path to the PNG image that needs to be processed for text extraction.
Next, define an Output Directory where the extracted text will be stored. (optional)
Configure the Text Converter Options.
Text Converter Models: Select from the available models:
Google OCR
TesseractOCR
PaddleOCR
TextractOCR
Once you have configured the Text Converter object, click OK.
Right-click on the Text Converter object’s header and select Preview Output from the context menu.
A Data Preview window will open, displaying the extracted text from the PNG image.
We can also use the Text Converter object as a transformation. To do so, right-click on the header of the object and select Transformation.
You’ll see the color of the header change from green to purple, depicting its transition from a source to a transformation. You can also notice an input node being added along with the output node.
You can now provide the input from any source object to the Text Converter object in your dataflow directly.
This concludes working with the Text Converter object in Astera Data Stack.
Resolution: there are four resolution options to choose from: Auto, High, Low, and Medium. To learn more about which resolution would suit your needs best, click .
Astera Data Stack can read data from a wide range of file sources and database providers. In this article, we have compiled a list of file formats, data providers, and web-applications that are supported for use in Astera Data Stack.
Amazon Aurora
Azure SQL Server
MySQL
Amazon Aurora Postgres
Amazon RDS
Amazon Redshift
DB2
Google BigQuery
Google Cloud SQL
MariaDB
Microsoft Azure
Microsoft Dynamics CRM
MongoDB (as a Source)
MS Access
MySQL
Netezza
Oracle
Oracle ODP .Net
Oracle ODP .Net Managed
PostgreSQL
PowerBI
Salesforce (Legacy)
Salesforce Rest
SAP SQL Anywhere
SAP Hana
Snowflake
SQL Server
SQLite
Sybase
Tableau
Teradata
Vertica
In addition, Astera features an ODBC connector that uses the Open Database Connectivity (ODBC) interface by Microsoft to access data in database management systems using SQL as a standard.
COBOL
Delimited files
Fixed length files
XML/JSON
Excel workbooks
PDFs
Report sources
Text files
Microsoft Message Queue
EDI formats (including X12, EDIFACT, HL7)
Microsoft Dynamics CRM
Microsoft Azure Blob Storage
Microsoft SharePoint
Amazon S3 Bucket Storage
Amazon Aurora MySQL
Azure Data Lake Gen 2
PowerBI
Salesforce
SAP
Tableau
AS2
FTP (File Transfer Protocol)
HDFS (Hadoop Distributed File System)
SCP (Secure Copy Protocol)
SFTP (Secure File Transfer Protocol)
SOAP (Simple Object Access Protocol)
REST (REpresentational State Transfer)
Using the SOAP and REST web services connector, you can easily connect to any data source that uses SOAP protocol or can be exposed via REST API.
Here are some applications that you can connect to using the API Client object in Astera Data Stack:
FinancialForce
Force.com Applications
Google Analytics
Google Cloud
Google Drive
Hubspot
IBM DB2 Warehouse
Microsoft Azure
OneDrive
Oracle Cloud
Oracle Eloqua
Oracle Sales Cloud
Oracle Service Cloud
Salesforce Lightning
ServiceMAX
SugarCRM
Veeva CRM
The list is non-exhaustive.
You can also build a custom transformation or connector from the ground up quickly and easily using the Microsoft .NET APIs, and retrieve data from various other sources.
The Database Table Source object provides the functionality to retrieve data from a database table. It also provides change data capture functionality to perform incremental reads, and supports multi-way partitioning, which partitions a database table into multiple chunks and reads these chunks in parallel. This feature brings about major performance benefits for database reads.
The object also enables you to specify a WHERE clause and sort order to control the result set.
In this article, we will be discussing how to:
Get a Database Table Source object on the dataflow designer.
Configure the Database Table Source object according to the required layout and settings.
We will also be discussing some best practices for using a Database Table Source object.
To get a Database Table Source from the Toolbox, go to Toolbox > Sources > Database Table Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Database Table Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Database Table Source object, right-click on its header and select Properties from the context menu.
A dialog box will open.
This is where you can configure the properties for the Database Table Source object.
The first step is to specify the Database Connection for the source object.
Provide the required credentials. You can also use the Recently Used drop-down menu to connect to a recently connected database.
You will find a drop-down list next to the Data Provider.
This is where you select the specific database provider to connect to. The connection credentials will vary according to the provider selected.
Click Test Connection to make sure that your database connection is successful, then click Next.
Next, you will see a Pick Source Table and Reading Options window. On this window, you will select the table from the database that you previously connected to and configure the table from the given options.
From the Pick Table field, choose the table that you want to read the data from.
Once you pick a table, an icon will show up beside the Pick Table field.
View Data: You can view data in a separate window in Astera.
View Schema: You can view the schema of your database table from here.
View in Database Browser: You can see the selected table in the Database Source Browser in Astera.
Table Partition Options
This feature substantially improves the performance of large data movement jobs. Partitioning is done by selecting a field and defining value ranges for each partition. At runtime, Astera generates and runs multiple queries against the source table and processes the result set in parallel.
Check the Partition Table for Reading option if you want your table to be read in partitions.
You can specify the Number of Partitions.
The Pick Key for the Partition drop-down will let you choose the key field for partitioning the table.
If you have specific key values based on which you want to partition the table, you can use the Specify Key Values (separated by comma) option.
Next, two checkboxes can be configured according to the user application.
The Favor Centerprise Layout option is useful when your source database table layout has changed over time but the layout built in Astera is static, and you want to continue using your dataflows with the updated source table layout. Check this option and Astera will favor its own layout over the database layout.
Check the Trim Trailing Spaces option if you want to remove the trailing whitespaces.
Dynamic Layout
Checking the Dynamic Layout option enables the two following options, Add Fields in Subsequent Objects, and Delete Fields in Subsequent Objects. These options can also be unchecked by users.
Add Fields in Subsequent Objects: Checking this option ensures that a field is added to subsequent objects in the flow whenever additional fields are added to the source database table.
Delete Fields in Subsequent Objects: Checking this option ensures that a field is deleted from subsequent objects in the flow whenever it is deleted from the source database table.
Incremental Read Options
The Database Table Source object provides incremental read functionality based on the concept of audit fields. Incremental read is one of the three change data capture approaches supported by Astera. Audit fields are fields that are updated when a record is created or modified. Examples of audit fields include created date time, modified date time, and version number.
Incremental read works by keeping track of the highest value for the specified audit field. On the next run, only the records with a value higher than the saved value are retrieved. This feature is useful in situations where two applications need to be kept in sync and the source table maintains audit field values for rows.
Select Full Load if you want to read the entire table.
Select Incremental Load Based on Audit Fields to perform an incremental read. Astera will start reading the records from the last read.
Checking the Perform full load on next run option will override the incremental load on the next run and perform a full load instead.
Use Audit Field to compare when the last read was performed on the dataset.
In File Path, specify the path to the file that will store the incremental transfer information.
The next window is the Layout Builder. In this window you can modify the layout of your database table.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted line; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
If you want to change the position of any field and want to move it below or above another field in the layout, you can do this by selecting the row and using the Move up/Move down keys.
For example: We want to move the Country field right below the Region field. We will select the row and use the Move up key to move the field from the 9th row to the 8th.
After you are done customizing the Layout Builder, click Next. You will be taken to a new window, Where Clause. Here, you can provide a WHERE clause, which will filter the records from your database table.
For instance, you can add a WHERE clause such as Country = 'Mexico' to select all the customers from Mexico in the Customers table.
Your output will be filtered, and only the records that satisfy the WHERE condition will be read by Astera.
Once you have configured the Database Table Source object, click Next.
A new window, Config Parameters will open. Here, you can define parameters for the Database Table Source object.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Click OK.
You have successfully configured your Database Table Source object. The fields from the source object can now be mapped to other objects in a dataflow.
To get the Database Table Source object from the Data Source Browser, go to View > Data Source > Data Source Browser or press Ctrl + Alt + U.
A new window will open. You can see that the pane is empty right now. This is because we are not connected to any database source yet.
To connect the browser to a database source, go to the Add Data Source icon located at the top left corner of the pane and click on Add Database Connection.
A Database Connection box will open.
This is where you can connect to your database from the browser.
You can either connect to a Recently Used database or create a new connection.
To create a new connection, select your Data Provider from the drop-down list.
The next step is to fill in the required credentials. Also, to ensure that the connection is successfully made, select Test Connection.
Once you test your connection, a dialog box will indicate whether the test was successful or not.
Click OK.
Once you have connected the browser, your Data Source Browser will now have the databases that you have on your server.
Select the database that you want to work with and then choose the table you want to use.
Drag-and-drop Customers table onto the designer in Astera.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
Right-clicking on the Database Table Source object will also display options for the database table.
Show in Data Source Browser - Will show where the table resides in the database in the Database Browser.
View Table Data - Builds a query and displays all the data from the table.
View Table Schema - Displays the schema of the database table.
Create Table - Creates a table on a database based on the schema.
The ETL and ELT functionality of Astera Data Stack is represented by Dataflows. When you open a new Dataflow, you’re provided with an empty canvas known as the dataflow designer. This is accompanied by a Toolbox that contains an extensive variety of objects, including Sources, Destinations, Transformations, and more.
Using the Toolbox objects and the user-friendly drag-and-drop interface, you can design ETL pipelines from scratch on the Dataflow designer.
The Dataflow Toolbar also consists of various options.
These include:
Undo/Redo: The Dataflow designer supports unlimited Undo and Redo capability. You can quickly Undo/Redo the last action done, or Undo/Redo several actions at once.
Auto Layout Diagram: The Auto Layout feature allows you to arrange objects on the designer, improving its visual representation.
Zoom (%): The Zoom feature helps you adjust the display size of the designer. Additionally, you can select a custom zoom percentage by clicking on the Zoom % input box and typing in your desired value.
Auto-Size All: The Auto-Size All feature resizes all the objects so that all fields of the expanded nodes are visible and the empty area inside each object is cropped out.
Expand All: The Expand All feature expands or enlarges the objects on the designer, improving the visual representation.
Collapse All: The Collapse All feature closes or collapses the objects on the designer, improving the visual representation and reducing clutter.
Use Orthogonal Links: The Use Orthogonal Links feature replaces the links between objects with orthogonal curves instead of straight lines.
Data Quality Mode: Data Quality Mode in Astera enhances Dataflows with advanced profiling and debugging by adding a Messages node to objects. This node captures statistical information such as TotalCount, ErrorCount, and WarningCount.
Safe Mode: The Safe Mode option allows you to study and debug your Dataflows in cases when access to source files or databases is not available. You can open a Dataflow/Subflow and then proceed to debug or understand it after activating Safe Mode.
Show Diagram Overview: This feature opens a Diagram Overview panel, allowing you to get an overview of the whole Dataflow designer.
Link Actions to Create Maps Using AI: The AI Auto-mapper semantically maps fields between different data layouts, automatically linking related fields, for example, "Country" to "Nation."
In the next sections, we will go over the object-wise documentation for the various Sources, Destinations, Transformations, and other objects in the Dataflow Toolbox.
Silent installation refers to the installation of software or applications on a computer system without requiring any user interaction or input. In a silent installation, the installation process occurs in the background, without displaying any user interfaces, prompts, or dialog boxes that would normally require the user to make choices or provide information. This type of installation is particularly useful in scenarios where an administrator or IT professional needs to deploy software across multiple computers or systems efficiently and consistently.
Obtain the installer file you want to install silently. This could be an executable (.exe), Microsoft Installer (MSI), or any other installer format.
For example, in this article we will be using the ReportMiner.exe file to perform the silent installation.
To initiate the silent installation, you'll need to use a command-line interface. Open the Command Prompt as an administrator.
To achieve this, first search for “Command Prompt” in the Windows search bar then right-click the Command Prompt app, and select Run as administrator from the context menu. This will launch the Command Prompt with administrative privileges.
Locate the installation file and open its location in Windows Explorer. Once you have located the file in Windows Explorer, the full path will be displayed in the address bar at the top of the window. The address bar shows the complete path from the drive letter to the file's location.
For example, this file is located at "C:\Users\muhammad.hasham\Desktop\Silent Installation Files" as evident with the full path displayed in the address bar.
Alternatively, you can also right-click the file and select Properties from the context menu. In the Properties Window, you'll find a Location field that displays the full path to the file.
To silently install the file, change your current directory to the folder containing the installer using the Command Prompt. To do so, enter the following command in the Command Prompt:
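The quoted path below is a placeholder for the installer folder identified in the previous step.
cd "<full path to the folder containing the installer>"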
For example:
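cd "C:\Users\muhammad.hasham\Desktop\Silent Installation Files"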
Use the appropriate command to run the installer in silent mode. This command might involve specifying command-line switches that suppress dialogs and prompts.
General File:
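The exact switch depends on the installer technology; as a minimal sketch, assuming the WiX-based installer mentioned in the release notes, the standard quiet switch can be used:
<InstallerName>.exe /quiet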
Example:
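ReportMiner.exe /quiet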
General File:
Example:
To run it in this manner:
The silent installation might take some time. Wait for the installation process to finish. Depending on the software, you might receive an output indicating the progress and success of the installation.
After the installation is complete, verify that the software is installed as expected. You might want to check the installation directory, program shortcuts, or any other relevant indicators.
Use the provided command in the Command Prompt to remove the silently installed file.
General File:
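Again as a sketch assuming a WiX-based installer, the uninstall and quiet switches can be combined:
<InstallerName>.exe /uninstall /quiet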
Example:
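ReportMiner.exe /uninstall /quiet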
This concludes our discussion on Silent Installations.
LLM Generate is the primary object of Astera’s AI offerings. When used in a logical combination with other objects, we can use it to create AI-powered solutions.
LLM Generate allows the user to retrieve an output from an LLM model based on the input prompt. The user can select from a choice of LLM providers, including OpenAI, Llama, and others. The user also has the option to use custom LLM models.
It has an input port and an output port.
Input port allows us to map fields that we want to include in the input prompt to the LLM model to generate the result.
Output port is populated with the result of LLM Generate.
LLM Generate can be used in countless use cases to generate unique applications. Here, we will cover a basic use case, where LLM Generate will be used to create an invoice extraction solution.
The source file is a PDF invoice. In the output, we want the extracted data from the invoice in a structured JSON file. We want to create a flexible extraction solution that can take invoices of various unpredictable layouts and generate the JSON output in a fixed format.
Create a new dataflow. Here we will design our invoice extraction pipeline.
In the output port of the source object, we have the entire content of the PDF file as a single string.
This string can now be mapped to the LLM Generate object as input, along with our instructions in the prompt to generate the output.
To do this, we will drag-and-drop the LLM Generate object from the AI section of the Toolbox onto the dataflow designer.
To use an LLM Generate object, we need to map input field/s and define a prompt. In the Output node we get the response of the LLM model, which we can map to downstream objects in our data pipeline. Other configurations of the LLM Generate are set as default but may be adjusted if required by the use case.
As the first step, we will map our input fields to the LLM Generate object’s input node. We can map any number of input fields as required by our use case. For our use case, we will map a single input field, the invoice text from the Text Converter. This field will have the invoice content as a string. We can rename the input fields, if needed, inside the LLM Generate object.
The next step is to write the prompt that will act as a set of instructions to the LLM for the response that we would like in the output. Go into the properties of the LLM Generate object, right-click on the Prompts node, and select ‘Add Prompt’. You can also use the ‘Add Prompt’ button at the top of the layout window.
A Prompt node will appear containing the Properties and Text fields.
Prompt Properties
Properties are set by default. Clicking the Properties field opens the Prompt options. The default settings are as shown in the image below:
Run Strategy Type: Defines the execution of the object based on the input.
Once Per Item means that the object will run once per input record. This option is used in cases where the input has multiple records and LLM Generate is to be executed for each record. The output of LLM Generate will have the same number of records as the input.
Chain means that the object will use the output of one prompt and feed it as input to the subsequent prompt within the LLM Generate object. To use the output of the last prompt within the current prompt, use the syntax {LLM.LastPrompt.Result}. To use the text of the last prompt within the current prompt, use the syntax {LLM.LastPrompt.Text}.
Conditional Expression: Here, you can provide the condition that must be satisfied for this prompt to be used in the LLM Generate execution. It works in conjunction with multiple prompts, in cases where one of the multiple prompts are to be used based on some criteria.
For our use case, we have used the default settings of the Prompt Properties.
Prompt Text
Prompt text allows us to write the prompt that is sent to the LLM model to get the response in the output.
In the prompt, we can include the contents of the input fields using the syntax:
{Input.field}
In the above syntax, we can provide the input field name in place of field.
We can also use functions to customize our prompt by clicking the functions icon.
For instance, the following syntax will resolve to the first 1000 characters of the input field value in the prompt:
{Left(Input.field,1000)}
For our use case, we will write a prompt that instructs the LLM to extract data from the provided invoice and generate the output in the JSON structure we have provided in the prompt.
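For illustration, a prompt along the following lines could be used (the JSON field names are hypothetical placeholders and should match the layout your downstream JSON Parser expects; the example assumes the mapped input field is named InvoiceText):
Extract the data from the following invoice text and return only valid JSON in this exact structure: {"InvoiceNumber": "", "InvoiceDate": "", "VendorName": "", "Total": ""}. Do not include any text outside the JSON. Invoice text: {Input.InvoiceText}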
Click Next to move to the next screen. This is the LLM Generate Properties screen.
General Options
We can select the AI provider and model for our object. Additionally, we can use LLM models not provided in the list or a custom LLM model by configuring their API connection as part of a shared connection file inside the project. However, custom fine-tuned models are only supported when using the "Open AI" AI provider.
AI SDK Options
AI SDK Options allow fine-tuning the output or behavior of the model.
Evaluation Metrics: Enabling this option introduces three additional fields in the output: OutputTokens, LogProbs, and PerplexityScore.
OutputTokens: The total number of tokens in the result generated by LLM Generate. It can help understand the volume/length of the generated content.
LogProbs: Log-probabilities associated with each generated token. These values represent the likelihood (in logarithmic scale) of a specific token being generated based on the model's understanding of the input and context. It depicts the model’s confidence in generating each token.
To understand Log-Probs better, we have a second flow here that identifies the document type from the available options we have provided in the prompt.
We also want the confidence score of the AI model’s result. For this, we can parse the LogProbs value and calculate its exponential. This linear probability calculation is only possible when there is a single logprob value, which means the output must be set to generate a single token. It is useful for classification cases such as this one, or where a boolean response is expected.
Applying the exponential to the logprob converts it into a linear probability. In the output, we will have the result and the linear probability, or confidence score. A value closer to 1 means higher confidence in the result.
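For example, if the model returns a LogProbs value of -0.05 for its single output token, the linear probability is exp(-0.05) ≈ 0.95, which corresponds to roughly 95% confidence in the classification.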
PerplexityScore: It measures how well a language model predicts a sequence of words. Lower perplexity (closer to 1) means better predictions, while higher perplexity indicates greater uncertainty in predicting the next word.
Max Tokens: Limits the output tokens. Limit is specific to the model. Each model has its max token limit, and we need to set our limit within that threshold.
Temperature: It controls randomness in model predictions. At temperature 0 (default), the model's output is deterministic and consistent, always choosing the most likely token. This also means the model will always produce the same output for the same input. Higher temperatures increase randomness and creativity in the output.
Top P: Also called nucleus sampling, it controls the diversity of the generated output by selecting tokens from a subset of the most likely options. A Top P of 0.1 (the default) means the model will only consider the smallest set of tokens whose cumulative probability is at least 10%. This significantly narrows down the possible token choices, making the output more focused and less random. Increasing the Top P results in less constrained and more creative responses.
For our primary use case of invoice extraction, we are using OpenAI GPT-4 and the default configurations for the other options on the Properties screen to generate the result.
Now, we’ll click OK to complete the configuration of the LLM Generate object. We can preview the output to confirm that we are getting the desired response.
Now we want to write this text output to a JSON file. We will first drag and drop a JSON Parser onto the designer. We will map the output field of the LLM Generate object to the input field of the JSON Parser object.
Open the Properties of the JSON Parser. On the layout screen, we can create our preferred layout or provide a JSON sample to generate the layout automatically. We have copied the same layout we provided in the prompt and pasted it into the ‘Generate Layout by Providing Sample Text’ option in the JSON layout window.
Once the JSON Parser object is configured, drag and drop a JSON Destination object, configure its file path, and map all the fields from the JSON Parser output.
Our dataflow is now configured, and we can run it to create the JSON file for our invoice.
To automate the process for extracting multiple invoices, we will create a workflow. To parameterize the source and destination file paths, we will add and configure a Variables object in our dataflow.
In our workflow, we’ll configure three objects:
Once our workflow is configured, we can run it to extract data and write to JSON files for all of our invoices.
The flexibility of LLM Generate to take any input and manipulate it through natural language instructions to generate the output makes it a dynamic, universal transformation object in a data pipeline. There can be countless use cases for LLM Generate; we will cover some of these in the next documents.
The File System Items Source in Astera Data Stack is used to provide metadata information to a task in a dataflow or workflow. In a dataflow, it can be used in conjunction with a source object, especially in cases where you want to process multiple files through the transformation and loading process.
In a workflow, the File System Items Source object can be used to provide input paths to a subsequent object such as a RunDataflow task.
Let’s see how it works in a dataflow.
Here we have a dataflow that we want to run on multiple source files containing Customer_Data from a fictitious organization. We are going to use the source object as a transformation and provide the location of the source files using a File System Items Source object. The File System Items Source will provide the path to the location where our source files reside, and the source object will pick the source files from that location, one by one, and pass them on for further processing in the dataflow.
Here, we want to sort the data, filter out records of customers from Germany and write the filtered records into a database table. The source data is stored in delimited (.csv) files.
First, change the source object into a Transformation object. This is because the data is stored in multiple delimited files and we want to process all of them in the dataflow. For this, right-click on the source object’s header and click Transformation in the context menu.
You can see that the color of the source object has changed from green to purple, which indicates that the source object has been changed into a transformation object.
Notice that the source object now has two nodes: Input and Output. The Input node has an input mapping port which means that it can take the path to the source file from another object.
Now we will use a File System Items Source object to provide a path to Customer_Data Transformation object. Go to the Sources section in the Toolbox and drag-and-drop the File System Items Source object onto the designer.
If you look at the File System Items Source object, you can see that the layout is pre-populated with fields such as FileName, FileNameWithoutExtension, Extension, FullPath, Directory, ReadOnly, Size, and other attributes of the files.
To configure the properties of the File System Items Source object, right-click on the File System Items Source object’s header and go to Properties.
This will open the File System Properties window.
The first thing you need to do is point the Path to the directory or folder where your source files reside.
You can see a couple of other options on this screen:
Filter: If your specified source location contains multiple files in different formats, you can use this option to filter and read files in the specified format only. For instance, our source folder contains multiple PDF, .txt, .doc, .xls, and .csv files, so we will write “*.csv” in the Filter field to filter and read delimited files only (a short sketch of the equivalent file matching follows this list).
Include items in subdirectories: Check this option if you want to process files present in the sub-directories.
Include Entries for Directories: Check this option if you want to include all items in the specified directory
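For readers who prefer to see the behavior in code, the following Python sketch approximates what the Filter and Include items in subdirectories options do together. The folder path is hypothetical, and Astera performs this matching internally; the sketch only illustrates the idea.

```python
from pathlib import Path

source_dir = Path(r"C:\Data\Customers")  # hypothetical source folder

# Filter = "*.csv" with "Include items in subdirectories" checked:
# rglob searches the folder and all sub-folders; plain glob would search the top level only.
for f in sorted(source_dir.rglob("*.csv")):
    print(f.name, f.suffix, f.stat().st_size)  # FileName, Extension, Size metadata
```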
Once you have specified the Path and other options, click OK.
Now right-click on the File System Items Source object’s header and select Preview Output.
You can see that the File System Items Source object has filtered out delimited files from the specified location and has returned the metadata in the output. You can see the FileName, FileNameWithoutExtension, Extension, FullPath, Directory, and other attributes such as whether the file is ReadOnly, FileSize, LastAccessed, and other details in the output.
Now let’s start mapping. Map the FullPath field from the File System Items Source object to the FullPath field under the Input node in the Customer_Data Transformation object.
Once mapped, when we run the dataflow, the File System Items Source will pass the path to the source files, one by one, to the Customer_Data Transformation object. The Customer_Data Transformation object will read the data from the source file and pass it to the subsequent transformation object to be processed further in the dataflow.
In a workflow, the File System Items Source object can be used to provide input paths to a subsequent task such as a RunDataflow task. Let’s see how this works.
We want to design a workflow to orchestrate the process of extracting customer data stored in delimited files, sorting that data, filtering out records of customers from Germany and loading the filtered records in a database table.
We have already designed a dataflow for the process and have called this dataflow in our workflow using the RunDataflow task object.
We have multiple source files that we want to process in this dataflow. So, we will use a File System Items Source object to provide the path to our source files to the RunDataFlow task. For this, go to the Sources section in the Toolbox and drag-and-drop the File System Items Source onto the designer.
If you look at the File System Items Source, you can see that the layout is pre-populated with fields such as FileName, FileNameWithoutExtension, Extension, FullPath, Directory, ReadOnly, Size, and other attributes of the files. Also, notice the small blue icon with the letter ‘s’ on the header; it indicates that the object is set to run in Singleton mode.
By default, all objects in a workflow are set to execute in Singleton mode. However, since we have multiple files to process in the dataflow, we will set the File System Items Source object to run in loop. For this, right-click on the File System Items Source and click Loop in the context menu.
You can see that the color of the object has changed to purple, and it now has this purple icon over the header which denotes the loop function.
It also has these two mapping ports on the header to map the File System Items Source object to the subsequent action in the workflow. Let’s map it to the RunDataflowTask.
To configure the properties of the File System Items Source, right-click on the File System Items Source object’s header and go to Properties.
This will open the File System Items Source Properties window.
The first thing you need to do is point the Path to the directory or folder where your source files reside.
You can see a couple of other options on this window:
Filter: If your specified source location contains multiple files in different formats, you can use this option to filter and read files in the specified format only. For instance, our source folder contains multiple PDF, .txt, .doc, .xls, and .csv files, so we will write “*.csv” in the Filter field to filter and read delimited files only.
Include items in subdirectories: Check this option if you want to process files present in the sub-directories.
Include Entries for Directories: Check this option if you want to include all items in the specified directory.
Once you have specified the Path and other options, click OK.
Now right-click on the File System Items Source object’s header and click Preview Output.
You can see that the File System Items Source object has filtered out delimited files from the specified location and has returned the metadata in the output. You can see the FileName, FileNameWithoutExtension, Extension, FullPath, Directory, and other attributes such as whether the file is ReadOnly, FileSize, LastAccessed, and other details in the output.
Now let’s start mapping. Map the FullPath field from the File System Items Source object to the FilePath variable in the RunDataflow task.
Once mapped, upon running the dataflow, the File System Items Source object will pass the path to the source files, one by one, to the RunDataflow task. In other words, the File System Items Source acts as a driver to provide source files to the RunDataflow tasks, which will then process them in the dataflow.
When the File System Items Source is set to run in a loop, the dataflow will run ‘n’ number of times, where ‘n’ is the number of files passed by the File System Items Source to the RunDataflow task. For instance, you can see that we have six source files in the specified folder. The File System Items Source object will pass these six files, one by one, to the RunDataflow task to be processed in the dataflow.
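Conceptually, loop mode behaves like the sketch below. The run_dataflow function is a hypothetical stand-in for the RunDataflow task, shown only to illustrate the one-run-per-file behavior; it is not an Astera API.

```python
# Hypothetical stand-in for the RunDataflow task.
def run_dataflow(dataflow_path, parameters):
    print(f"Running {dataflow_path} with {parameters}")

# Paths that the File System Items Source would pass, one per iteration.
source_files = [
    r"C:\Data\Customers_1.csv",
    r"C:\Data\Customers_2.csv",
    r"C:\Data\Customers_3.csv",
]

for path in source_files:  # loop mode: n files -> n dataflow runs
    run_dataflow("Filter_Germany.Df", parameters={"FilePath": path})
```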
This concludes using the File System Items Source object in Astera Data Stack.
The Fixed-Length File Source object in Astera provides a high-speed reader for files containing fixed length records. It supports files with record delimiters as well as files without record delimiters.
In this section, we will cover how to get Fixed Length File Source object on the dataflow designer from the Toolbox.
To get a Fixed Length File Source object from the Toolbox, go to Toolbox > Sources > Fixed Length File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Fixed Length File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Fixed Length File Source object, right-click on its header and select Properties from the context menu.
When you select the Properties option from the context menu, a dialog box will open.
This is where you configure the properties for Fixed Length File Source object.
The first step is to provide the File Path for the Fixed Length File Source object. By providing the File Path you are building the connectivity to the source dataset.
File Contains Headers
Record Delimiter is specified as
The dialog box has some other configuration options:
If the source file contains a header and you want the Astera source layout to read headers from the source file, check the File Contains Header option.
If you want the file to be read in portions, check the Partition File for Reading option, and Astera will read your file according to the specified Partition Count. For example, a file with 1000 rows and a Partition Count of 2 will be read in two partitions of 500 rows each. This is a back-end process that makes data reading more efficient and helps in processing data faster. It will not have any effect on your output.
The Record Delimiter field allows you to select the delimiter for the records in the source file. The choices available are the carriage-return line-feed combination <CR/LF>, carriage-return <CR>, and line-feed <LF>. You can also type the record delimiter of your choice instead of choosing from the available options.
In case the records do not have a delimiter and you rely on knowing the size of a record, the number in the Record Length field is used to specify the character length for a single record.
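To illustrate the idea, the following sketch splits an undelimited file into records purely by record length; the file name and the 80-character length are assumptions for illustration only.

```python
RECORD_LENGTH = 80  # assumed fixed record length, in characters

with open("orders.dat", "r", encoding="utf-8") as f:
    data = f.read()

# With no record delimiter, records are cut every RECORD_LENGTH characters.
records = [data[i:i + RECORD_LENGTH] for i in range(0, len(data), RECORD_LENGTH)]
print(f"{len(records)} records read")
```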
The Encoding field allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
To define a hierarchical file layout and process the data file as a hierarchical file check the This is a Hierarchical File option. Astera IDE provides extensive user interface capabilities for processing hierarchical structures.
Advanced File Options
In the Header spans over field, give the number of rows that your header takes. Refer to this option when your header spans over multiple rows.
Check the Enforce exact header match option if you want the header to be read as it is.
Check the Column order in file may be different from the layout option, if the field order in your source layout is different from the field order in Astera’s layout.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
To skip any unwanted rows at the beginning of your file, you can specify the number of records that you want to omit through the Skip initial records option.
Raw text filter
If you do not want to apply any filter and process all records, check the No filter. Process all records option.
If there is a specific value which you want to filter out, you can check the Process if begins with option and specify the value that you want Astera to read from the data, in the provided field.
If there is a specific expression which you want to filter out, you can check the Process if matches this regular expression option and give the expression that you want Astera to read from the data, in the provided field.
String Processing
String processing options come into play when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want them to be treated as null values in the database destination that you are writing to; otherwise, Astera will omit them accordingly in the output.
Check the Trim strings option when you want to omit any extra spaces in the field value.
Once you have specified the data reading options on this window, click Next.
The next window is the Length Markers window, where you can place markers to specify the column boundaries in your data.
Using the Length Markers window, you can create the layout of your fixed-length file. To insert a field length marker, you can click in the window at any point. For example, if you want to set the length of a field to contain five characters and the field starts at five, then you need to click at the marker position nine.
If you point your cursor to where the data starts (in this case, next to OrderID) and double-click, Astera will automatically detect the columns and place markers in your data. Blue lines will appear as markers on the detected columns.
You can modify the markers manually. To delete a marker, double-click on the column which has been marked.
In this case we removed the second marker and instead added a marker after CustomerID and EmployeeID.
In this way you can add as many markers as the number of columns/fields there are in the data set.
You can also use the Build from Specs feature to help you build destination fields based on an existing file instead of manually specifying the layout.
After you have built the layout by inserting the field markers, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of your fixed length source file.
If you want to add a new field to your layout, go to the last row of your layout (Name column), which will be blank and double-click on it, and a blinking text cursor will appear. Type in the name of the field you want to add and select subsequent properties for it. A new field will be added to the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted line, a context menu will open where you will have the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Other options that the Layout Builder provides are:
After you are done customizing the layout in the Object Builder window, click Next. You will be taken to a new window, Config Parameters. Here, you can define parameters for the Fixed Length File Source.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Once you have been through all configuration options, click OK.
The FixedLengthFileSource object is now configured.
The Fixed Length File Source object has now been modified from its previous configuration. The new object has all the modifications that we specified in the Layout Builder.
In this case, the modifications that we made were:
Separated the EmployeeID column from the OrderDate column.
Added the CustomerName column.
You have successfully configured your Fixed Length File Source object. The fields from the source object can now be mapped to other objects in a dataflow.
The COBOL File Source object lets you fetch data from a COBOL source file if you have the copybook file available. The data present in this file can then be processed further in the dataflow and written to a destination of your choice.
Expand the Sources section of the Toolbox and select the COBOL Source object.
Drag-and-drop the COBOL Source object onto the dataflow. It will appear like this:
By default, the COBOL Source object is empty.
To configure it according to your requirements, right-click on the object and select Properties from the context menu.
Alternatively, you can open the properties window by double-clicking on the COBOL Source object header.
The following is the properties tab of the COBOL Source object.
File Path: Clicking on this option allows you to define a path to the data file of a COBOL File.
For our use case, we will be using a sample file with an .EBC extension.
Encoding: This drop-down option allows us to select the encoding from multiple options.
In this case, we will be using the IBM EBCDIC (US-Canada) encoding.
Record Delimiter: This allows you to select the kind of delimiter from the drop-down menu.
<CR> (Carriage Return): Moves the cursor to the beginning of the line without advancing to the next line.
<LF> (Line Feed): Moves the cursor down to the next line without returning to the beginning of the line.
<CR><LF>: Does both.
For our use case, we have selected the following.
Copybook: This option allows us to define a path to the schema file of a COBOL File.
For our use case, we are using a file with the .cpy extension.
Next, three checkboxes can be configured according to the user application. There is also a Record Filter Expression field given under these checkboxes.
Ignore Line Numbers at Start of Lines: Check this option when the data file has incremental line numbers at the start of each line; they will be ignored.
Zone Decimal Sign Explicit: Controls whether there is an extra character for the minus sign of a negative integer.
Fields with COMP Usage Store Data in a Nibble: Checking this box will ignore the COMP storage formats in which the data is stored.
COMP formats range from 1 to 6 in COBOL files.
Record Filter Expression: Here, we can add a filter expression that we wish to apply to the records in the COBOL File.
On previewing output, the result will be filtered according to the expression.
Once done with this configuration, click Next, and you will be taken to the next part of the properties tab.
The COBOL Source Layout window lets the user check values which have been read as an input.
Expand the Source node, and you will be able to check each of the values and records that have been selected as an input.
Expanding the nodes further shows the data definition and field details.
Once these values have been checked, click Next.
The Config Parameters window will now open. Here, you can further configure and define parameters for the COBOL Source Object.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Click Next.
Now, a new window, General Options, will appear.
Here, you can add any Comments that you wish to add. The rest of the options in this window have been disabled for this object.
Once done, click OK.
The COBOL Source object has now been configured. The extracted data can now be transformed and written to various destinations.
This concludes our discussion on the COBOL Source Object and its configuration in Astera Data Stack.
Each source on the dataflow is represented as a source object. You can have any number of sources in the dataflow, and they can feed into zero or more destinations.
The following source types are supported by the dataflow engine:
Flat File Sources:
Tree File Sources:
Database Sources:
Data Model
All sources can be added to the dataflow by picking a source type on the Toolbox and dropping it on the dataflow. File sources can also be added by dragging-and-dropping a file from an Explorer window. Database sources can be drag-and-dropped from the Data Source Browser. For more details on adding sources to the dataflow, see Introducing Dataflows.
Adding a Delimited File Source object allows you to transfer data from a delimited file. An example of what a delimited file source object looks like is shown below.
To configure the properties of a Delimited File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a Fixed-Length File Source object allows you to transfer data from a fixed-length file. An example of what a Fixed-Length File Source object looks like is shown below.
To configure the properties of a Fixed-Length File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an Excel Workbook Source object allows you to transfer data from an Excel file. An example of what an Excel Workbook Source object looks like is shown below.
To configure the properties of an Excel Workbook Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a COBOL File Source object allows you to transfer data from a COBOL file. An example of what a COBOL File Source object looks like is shown below.
To configure the properties of a COBOL File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an XML/JSON File Source object allows you to transfer data from an XML file. An example of what an XML/JSON File Source object looks like is shown below.
To configure the properties of an XML/JSON File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
General Properties window:
File Path – Specifies the location of the source XML file. Using UNC paths is recommended if running the dataflow on a server.
Schema File Path – Specifies the location of the XSD file controlling the layout of the XML source file.
Optional Record Filter Expression – Allows you to enter an expression to selectively filter incoming records according to your criteria. You can use the Expression Builder to help you create your filter expression. For more information on using Expression Builder, see Expression Builder.
Adding a Database Table Source object allows you to transfer data from a database table. An example of what a Database Table Source object looks like is shown below.
To configure the properties of a Database Table Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your source, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected source.
Pick Source Table window:
Select a source table using the Pick Table dropdown.
Select Full Load if you want to read the entire table.
Select Incremental Load Based on Audit Fields to perform an incremental read starting at a record where the previous read left off.
Incremental load based on Audit Fields is built around the concept of Change Data Capture (CDC), a set of reading and writing patterns designed to optimize large-scale data transfers by minimizing database writing in order to improve performance. CDC is implemented in Astera using the Audit Fields pattern, which uses the create time or last update time to determine the records that have been inserted or updated since the last transfer and transfers only those records.
Advantages
Most efficient of the CDC patterns. Only records that were modified since the last transfer are retrieved by the query, thereby putting little stress on the source database and network bandwidth.
Disadvantages
Requires update date time and/or create date time fields to be present and correctly populated
Does not capture deletes
Requires index on the audit field(s) for efficient performance
To use the Audit Fields strategy, select the Audit Field and an optional Alternate Audit Field from the appropriate dropdown menus. Also, specify the path to the file that will store incremental transfer information.
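The Audit Fields pattern itself is simple to reason about. The Python sketch below shows the general idea of persisting a high-water mark between runs; it is not Astera's implementation, and the table, column, and file names are hypothetical.

```python
import json
from pathlib import Path

STATE_FILE = Path("incremental_state.json")  # hypothetical file storing the last transfer point

# Load the high-water mark saved by the previous run (None on the first run).
last_value = json.loads(STATE_FILE.read_text())["last_modified"] if STATE_FILE.exists() else None

# Only rows modified since the last transfer are selected (hypothetical table and audit field).
query = "SELECT * FROM Orders"
if last_value:
    query += f" WHERE ModifiedDtTm > '{last_value}'"
print(query)

# ... run the query and load the records, then persist the new high-water mark,
# e.g. the maximum ModifiedDtTm among the transferred rows.
new_value = "2024-01-31 23:59:59"
STATE_FILE.write_text(json.dumps({"last_modified": new_value}))
```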
Where Clause window:
You can enter an optional SQL expression serving as a filter for the incoming records. The expression should start with the WHERE keyword, followed by the filter you wish to apply.
For example, WHERE CreatedDtTm >= ‘2001/01/05’
General Options window:
The Comments input allows you to enter comments associated with this object.
Adding a SQL Query Source object allows you to transfer data returned by the SQL query. An example of what an SQL Query Source object looks like is shown below.
To configure the properties of a SQL Query Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your SQL Query, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected database.
SQL Query Source window:
Enter the SQL expression controlling which records should be returned by this source. The expression should follow SQL syntax conventions for the chosen database provider.
For example, select OrderId, OrderName, CreatedDtTm from Orders.
Source or Destination is a Delimited File
If your source or destination is a Delimited File, you can set the following properties
First Row Contains Header - Check this option if you want the first row of your file to display the column headers. In the case of Source file, this indicates if the source contains headers.
Field Delimiter - Allows you to select the delimiter for the fields from the available choices. You can also type the delimiter of your choice instead of choosing from the available options.
Record Delimiter - Allows you to select the delimiter for the records. The choices available are the carriage-return line-feed combination <CR/LF>, carriage-return <CR>, and line-feed <LF>. You can also type the record delimiter of your choice instead of choosing from the available options. For more information on Record Delimiters, please refer to the Glossary.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
Quote Char - Allows you to select the type of quote character to be used in the delimited file. This quote character tells the system to overlook any special characters inside the specified quotation marks. The options available are " and '.
You can also use the Build fields from an existing file feature to help you build destination fields based on an existing file instead of manually typing the layout.
Source or Destination is a Microsoft Excel Worksheet
If the Source and/or the Destination chosen is a Microsoft Excel Worksheet, you can set the following properties:
First Row Contains Header - Check this option if you want the first row of your file to display the column headers. In the case of Source file, this indicates if the source contains headers.
Worksheet - Allows you to select a specific worksheet from the selected Microsoft Excel file.
You can also use the Build fields from an existing file feature to help you build destination fields based on an existing file instead of manually typing the layout.
Source or Destination is a Fixed Length File
If the Source and/or the Destination chosen is a Fixed Length File, you can set the following properties:
First Row Contains Header - Check this option if you want the first row of your file to display the column headers. In the case of Source file, this indicates if the source contains headers.
Record Delimiter - Allows you to select the delimiter for the records. The choices available are the carriage-return line-feed combination <CR/LF>, carriage-return <CR>, and line-feed <LF>. You can also type the record delimiter of your choice instead of choosing from the available options. For more information on Record Delimiters, please refer to the Glossary.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
You can also use the Build fields from an existing file feature to help you build destination fields based on an existing file instead of manually typing the layout.
Using the Length Markers window, you can create the layout of your fixed-length file. The Length Markers window has a ruled marker placed at the top of the window. To insert a field length marker, you can click in the window at a particular point. For example, if you want to set the length of a field to contain five characters and the field starts at five, then you need to click at the marker position nine.
In case the records don’t have a delimiter and you rely on knowing the size of a record, the number in the RecordLength box is used to specify the character length for a single record.
You can delete a field length marker by clicking the marker.
Source or Destination is an XML file
If the source is an XML file, you can set the following options:
Source File Path specifies the file path of the source XML file.
Schema File Path specifies the file path of the XML schema (XSD file) that applies to the selected source XML file.
Record Filter Expression allows you to optionally specify an expression used as a filter for incoming source records from the selected source XML file. The filter can refer to a field or fields inside any node inside the XML hierarchy.
The following options are available for destination XML files.
Destination File Path specifies the file path of the destination XML file.
Encoding - Allows you to choose the encoding scheme for the XML file from a list of choices. The default value is Unicode (UTF-8).
Format XML Output instructs Astera to add line breaks to the destination XML file for improved readability.
Read From Schema File specifies the file path of the XML schema (XSD file) that will be used to generate the destination XML file.
Root Element specifies the root element from the list of the available elements in the selected schema file.
Generate Destination XML Schema Based on Source Layout creates the destination XML layout to mirror the layout of the source.
Root Element specifies the name of the root element for the destination XML file.
Generate Fields as XML Attributes specifies that fields will be written as XML attributes (as opposed to XML elements) in the destination XML file.
Record Node specifies the name of the node that will contain each record transferred.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
When importing from a fixed-width, delimited, or Excel file, you can specify the following advanced reading options:
Header Spans x Rows - If your source file has a header that spans more than 1 row, select the number of rows for the header using this control.
Skip Initial Records - Sets the number of records which you want skipped at the beginning of the file. This option can be set whether or not your source file has a header. If your source file has a header, the first record after the specified number of rows to skip will be used as the header row.
Raw Text Filter - Only records starting with the filter string will be imported. The rest of the records will be filtered out.
You can optionally use regular expressions to specify your filter. For example, the regular expression ^[12][4] will only include records starting with 1 or 2, and whose second character is 4.
Raw Text Filter setting is not available for Excel source files.
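As a quick illustration of the regular-expression filter described above, the sketch below applies the same pattern to a few made-up records.

```python
import re

# Keep records whose first character is 1 or 2 and whose second character is 4.
raw_filter = re.compile(r"^[12][4]")

records = ["14023,Smith", "24987,Jones", "34011,Brown", "15500,Lee"]
kept = [r for r in records if raw_filter.match(r)]
print(kept)  # ['14023,Smith', '24987,Jones']
```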
If your source is a fixed-length file, delimited file, or Excel spreadsheet, it may contain an optional header row. A header row is the first record in the file that specifies field names and, in the case of a fixed-length file, the positioning of fields in the record.
If your source file has a header row, you can specify how you want the system to handle differences between your actual source file and the source layout specified in the setting. Differences may arise because the source file has a different field order than the source layout, or it has extra fields compared to the source layout. Conversely, the source file may have fewer fields than what is defined in the source layout, and the field names may also differ, or may have changed since the time the layout was created.
By selecting from the available options, you can have Astera handle those differences exactly as required by your situation. These options are described in more detail below:
Enforce exact header match – Lets Astera Data Stack proceed with the transfer only if the source file’s layout matches the source layout defined in the setting exactly. This includes checking for the same number and order of fields and field names.
Columns order in file may be different from the layout – Lets Astera Data Stack ignore the sequence of fields in the source file, and match them to the source layout using the field names.
Column headers in file may be different from the layout – This mode is used by default whenever the source file does not have a header row. You can also enable it manually if you want to match the first field in the layout with the first field in the source file, the second field in the layout with the second field in the source file, and so on. This option will match the fields using their order as described above even if the field names are not matched successfully. We recommend that you use this mode only if you are sure that the source file has the same field sequence as what is defined in the source layout.
The Field Layout window is available in the properties of most objects on the dataflow to help you specify the fields making up the object. The table below explains the attributes you can set in the Field Layout window.
The table below provides a list of all the attributes available for a particular layout type.
Astera supports a variety of formats for each data type. For example, for Dates, you can specify the date as “April 12” or “12-Apr-08”. Data Formats can be configured independently for source and for destination, giving you the flexibility to correctly read source data and change its format as it is transferred to destination.
If you are transferring from a flat file (for example, Delimited or Fixed-Width), you can specify the format of a field so that the system can correctly read the data from that field.
If you do not specify a data format, the system will try to guess the correct format for the field. For example, Astera is able to correctly interpret any of the following as a Date:
April 12
12-Apr-08
04-12-2008
Saturday, 12 April 2008
and so on
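For illustration only, the sketch below parses a few of these representations using Python's datetime module; the strptime codes shown are Python's, not Astera's format strings.

```python
from datetime import datetime

# The same calendar date written in several formats, with a matching parse pattern for each.
samples = {
    "12-Apr-08": "%d-%b-%y",
    "04-12-2008": "%m-%d-%Y",
    "2008-04-12": "%Y-%m-%d",
}
for text, fmt in samples.items():
    print(text, "->", datetime.strptime(text, fmt).date())  # all resolve to 2008-04-12
```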
Astera comes with a variety of pre-configured formats for each supported data type. These formats are listed in the Sample Formats section below. You can also create and save your own data formats.
To select a data format for a source field, go to Source Fields and expand the Format dropdown menu next to the appropriate field.
Sample Formats
Dates:
Booleans:
Integers:
Real Numbers:
Numeric Format Specifiers:
Some possible layouts of input invoice:
To read unstructured invoice as a source in our pipeline, we can drag and drop the object from the Sources section in the toolbox. Configure it by providing the path of your source file.
To understand these options better click .
Provide the folder path where all of our invoices are stored.
To generate an output JSON file path for each invoice using the source file name.
Provide the file path for the pre-configured dataflow.
Check the This is a COBOL data file option if you are working with COBOL files and do not have COBOL copybooks; you can still import this data by visually marking fields in the layout builder and specifying field data types. For more advanced parsing of COBOL files, you can use Astera’s .
Check the Use SmartMatch with Synonym Dictionary option when the header values vary in the source layout and Astera’s layout. You can create a file to store the values for alternate headers. You can also use the Synonym Dictionary file to facilitate automapping between objects on the flow diagram that use alternate names in field layouts.
Note: To open the source file for editing in a new tab, click the icon next to the File Path input, and select Edit File.
To generate the schema, click the icon next to the Schema File Path input, and select Generate.
To edit an existing schema, click the icon next to the Schema File Path input, and select Edit File. The schema will open for editing in a new tab.
Note: Astera makes it possible to generate an XSD file from the layout of the selected source XML file. This feature is useful when you don’t have the XSD file available. Note that all fields are assigned the data type of String in the generated schema. To use this feature, expand the control and select Generate.
To open the Data Formats window, click the icon located in the Toolbar at the top of the designer.
Column Name - Description
Data Type - Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean.
Start Position - Specifies the position from where that column/field starts.
Length - Defines the length of a column/field.
Alignment - Specifies the alignment of the values in a column/field. The options provided are right, left, and center.
Allows Null - Controls whether the field allows blank or NULL values in it.
Expressions - Defines functions through expressions for any field in your data.
Name - The system pre-fills this item for you based on the field header. Field names do not allow spaces. Field names are used to refer to the fields in the Expression Builder or tools where a field is used in a calculation formula.
Header - Represents the field name specified in the header row of the file. Field headers may contain spaces.
Data Type - Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean.
Format - Specifies the format of the values stored in that field, depending on the field’s data type. For example, for dates you can choose between DD-MM-YY, YYYY-MM-DD, or other available formats.
Start Position - Specifies the position of the field’s first character relative to the beginning of the record. Note: This option is only available for the fixed length layout type.
Length - Specifies the maximum number of characters allotted for a value in the field. The actual value may be shorter than what is allowed by the Length attribute. Note: This option is only available for fixed length and database layout types.
Column Name - Specifies the column name of the database table. Note: This option is only available in database layout.
DB Type - Specifies the database-specific data type that the system assigns to the field based on the field's data. Each database (Oracle, SQL, Sybase, etc.) has its own DB types. For example, Long is only available in Oracle for data type string. Note: This option is only available in database layout.
Decimal Places - Specifies the number of decimal places for a data type specified as real. Note: This option is only available in database layout.
Allows Null - Controls whether the field allows blank or NULL values in it.
Default Value - Specifies the value that is assigned to the field in any one of the following cases: the source field does not have a value, the field is not found in the source layout, or the destination field is not mapped to a source field. Note: This option is only available in destination layout.
Sequence - Represents the column order in the source file. You can change the column order of the data being imported by simply changing the number in the sequence field. The other fields in the layout will then be reordered accordingly.
Description - Contains information about the field to help you remember its purpose.
Alignment - Specifies the positioning of the field’s value relative to the start position of the field. Available alignment modes are LEFT, CENTER, and RIGHT. Note: This option is only available for the fixed length layout type.
Primary Key - Denotes the primary key field (or part of a composite primary key) for the table. Note: This option is only available in database layout.
System Generated - Indicates that the field will be automatically assigned an increasing Integer number during the transfer. Note: This option is only available in database layout.
Source Delimited file and Excel worksheet - Name, Header, Data type, Format
Source Fixed Length file - Name, Header, Data type, Format, Start position, Length
Source Database Table and SQL query - Column name, Name, Data type, DB type, Length, Decimal places, Allows null
Destination Delimited file and Excel worksheet - Name, Header, Data type, Format, Allows null, Default value
Destination Fixed Length file - Sequence, Name, Header, Description, Data type, Format, Start position, Length, Allows null, Default value, Alignment
Destination Database Table - Column name, Name, Data type, DB type, Length, Decimal places, Allows null, Primary key, System generated
Format → Sample
dd-MMM-yyyy → 12-Apr-2008
yyyy-MM-dd → 2008-04-12
dd-MM-yy → 12-04-08
MM-dd-yyyy → 04-12-2008
MM/dd/yyyy → 04/12/2008
MM/dd/yy → 04/12/08
dd-MMM-yy → 12-Apr-08
M → April 12
D → 12 April 2008
mm-dd-yyyy hh:mm:ss tt → 04-12-2008 11:04:53 PM
M/d/yyyy hh:mm:ss tt → 4/12/2008 11:04:53 PM
Y/N → Y/N
1/0 → 1/0
T/F → T/F
True/False → True/False
###### → 123456
#### → 1234
####;0;(####) → -1234
.##%;0;(.##%) → 123456789000%
.##%;(.##%) → 1234567800%
$###,###,###,### → $1,234,567,890,000
$###,###,###,##0 → $1,234,567,890,000
###,### → 123450
#,# → 1,000
##.00 → 35
###,###.## → 12,345.67
##.## → 12.34
$###,###,###,### → $1,234,567,890,000
$###,###,###,##0 → $1,234,567,890,000
.##%;(.##%); → .1234567800%
.##%;0;(.##%) → .12345678900%
0 (Zero placeholder) - If the value being formatted has a digit in the position where the '0' appears in the format string, then that digit is copied to the result string; otherwise, a '0' appears in the result string. The position of the leftmost '0' before the decimal point and the rightmost '0' after the decimal point determines the range of digits that are always present in the result string. The "00" specifier causes the value to be rounded to the nearest digit preceding the decimal, where rounding away from zero is always used. For example, formatting 34.5 with "00" would result in the value 35.
# (Digit placeholder) - If the value being formatted has a digit in the position where the '#' appears in the format string, then that digit is copied to the result string. Otherwise, nothing is stored in that position in the result string. Note that this specifier never displays the '0' character if it is not a significant digit, even if '0' is the only digit in the string. It will display the '0' character if it is a significant digit in the number being displayed. The "##" format string causes the value to be rounded to the nearest digit preceding the decimal, where rounding away from zero is always used. For example, formatting 34.5 with "##" would result in the value 35.
. (Decimal point) - The first '.' character in the format string determines the location of the decimal separator in the formatted value; any additional '.' characters are ignored.
, (Thousand separator and number scaling) - The ',' character serves as both a thousand separator specifier and a number scaling specifier. Thousand separator specifier: if one or more ',' characters is specified between two digit placeholders (0 or #) that format the integral digits of a number, a group separator character is inserted between each number group in the integral part of the output. Number scaling specifier: if one or more ',' characters is specified immediately to the left of the explicit or implicit decimal point, the number to be formatted is divided by 1000 each time a number scaling specifier occurs. For example, if the string "0,," is used to format the number 100 million, the output is "100".
% (Percentage placeholder) - The presence of a '%' character in a format string causes a number to be multiplied by 100 before it is formatted. The appropriate symbol is inserted in the number itself at the location where the '%' appears in the format string.
E0, E+0, E-0, e0, e+0, e-0 (Scientific notation) - If any of the strings "E", "E+", "E-", "e", "e+", or "e-" are present in the format string and are followed immediately by at least one '0' character, then the number is formatted using scientific notation with an 'E' or 'e' inserted between the number and the exponent. The number of '0' characters following the scientific notation indicator determines the minimum number of digits to output for the exponent. The "E+" and "e+" formats indicate that a sign character (plus or minus) should always precede the exponent. The "E", "E-", "e", or "e-" formats indicate that a sign character should only precede negative exponents.
'ABC', "ABC" (Literal string) - Characters enclosed in single or double quotes are copied to the result string, and do not affect formatting.
; (Section separator) - The ';' character is used to separate sections for positive, negative, and zero numbers in the format string. If there are two sections in the custom format string, the leftmost section defines the formatting of positive and zero numbers, while the rightmost section defines the formatting of negative numbers. If there are three sections, the leftmost section defines the formatting of positive numbers, the middle section defines the formatting of zero numbers, and the rightmost section defines the formatting of negative numbers.
Other (All other characters) - Any other character is copied to the result string, and does not affect formatting.
Transformations are used to perform a variety of operations on data as it moves through the dataflow pipeline. Astera Data Stack provides a full complement of built-in transformations, enabling you to build sophisticated data maps. Astera Data Stack transformations are divided into two types: record level and set level.
Record level transformations are used to create derived values by applying a lookup, function, or expression to fields from a single record. Examples of record level transformations include lookup, expression, and function transformations.
Set level transformations, on the other hand, operate on a group of records and may result in joining, reordering, elimination, or aggregation of records. Set transformations include join, sort, route, distinct, etc. Data sources and destinations are also considered set transformations.
In Astera, data records flow between set transformations. Record level transformations are used to transform or augment individual fields during these movements.
A record level transformation can be connected to only one set transformation. For instance, a lookup or expression cannot receive input from two different set transformations.
Other than transformations that enable combining multiple data sources—such as join, merge, and union—transformations can receive input from only one set transformation. Transformations, however, can receive input from any number of record level transformations as long as all these record level transformations receive input from the same transformation.
Delimited files are one of the most commonly used data sources and are used in a variety of situations. The Delimited File Source object in Astera provides the functionality to read data from a delimited file.
In this article, we will cover how to use a Delimited File Source object.
To get a Delimited File Source object from the Toolbox, go to Toolbox > Sources > Delimited File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Delimited File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Delimited File Source object, right-click on its header and select Properties from the context menu.
As soon as you have selected the Properties option from the context menu, a dialog box will open.
This is where you can configure the properties for the Delimited File Source object.
The first step is to provide the File Path for the delimited source file. By providing the file path, you are building the connectivity to the source dataset.
File Contains Headers
Record Delimiter is specified as CR/LF:
The dialog box has some other configuration options:
If the source file contains headers, and you want Astera to read headers from the source file, check the File Contains Header option.
If you want your file to be read in portions, upon selecting the Partition File for Reading option, Astera will read your file according to the specified Partition Count. For instance, if a file with 1000 rows has a Partition Count of 2 specified, the file will be read in two partitions of 500 each. This is a back-end process that makes data reading more efficient and helps in processing data faster. This will not have any effect on your output.
The Record Delimiter field allows you to select the delimiter for the records in the fields. The choices available are carriage-return line-feed combination <CR/LF>, carriage-return - CR and line-feed - LF. You can also type the record delimiter of your choice instead of choosing from the available options.
In case the records do not have a delimiter and you rely on knowing the size of a record, the number in the Record Length field can be used to specify the character length for a single record.
The Encoding field allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
A Text Qualifier is a symbol that identifies where text begins and ends. It is used specifically when importing data. For example, when importing a comma-delimited text file (where commas separate the fields that will be placed in adjacent cells), the text qualifier ensures that commas appearing inside a qualified value are not treated as field separators.
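The following Python sketch shows the effect of a double-quote text qualifier on comma-delimited data; the sample record is made up.

```python
import csv
import io

# The second field contains a comma; the double-quote qualifier keeps it as one value.
sample = 'ALFKI,"Alfreds Futterkiste, Ltd.",Germany\n'

reader = csv.reader(io.StringIO(sample), delimiter=",", quotechar='"')
print(next(reader))  # ['ALFKI', 'Alfreds Futterkiste, Ltd.', 'Germany']
```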
To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File option. Astera IDE provides extensive user interface capabilities for processing hierarchical structures.
Use the Null Text option to specify a certain value that you do not want in your data, and instead want it to be replaced by a null value.
Check the Allow Record Delimiter Inside a Field Text option when you have the record delimiter as text inside your data and want that to be read as it is.
Check the Make All Fields As String In Build Layout option when you want to set all the fields in the layout builder as String.
Dynamic Layout
Checking the Dynamic Layout option enables the two following options, Delete Fields in Subsequent Objects, and Add Fields in Subsequent Objects. These options can also be unchecked by users.
Add Fields in Subsequent Objects: Checking this option ensures that a field is definitely added in subsequent objects in a flow in case additional fields are manually added in the source database by the user or need to be added into the source database.
Delete Fields in Subsequent Objects: Checking this option ensures that a field is definitely deleted from subsequent objects in a flow in case of deletion from the source database.
Advanced File Options
In the Header spans over field, specify the number of rows that your header takes. Refer to this option when your header spans over multiple rows.
Check the Enforce exact header match option if you want the header to be read as it is.
Check the Column order in file may be different from the layout option, if the field order in your source layout is different from the field order in Astera’s layout.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
To skip any unwanted rows at the beginning of your file, you can specify the number of records that you want to omit through the Skip initial records option.
Raw text filter
If you do not want to apply any filter and process all records, check No filter. Process all records.
If there is a specific value which you want to filter out, you can check the Process if begins with option and give the value that you want Astera to read from the data, in the provided field.
If there is a specific expression which you want to filter out, you can check the Process if matches this regular expression option and give the expression that you want Astera to read from the data, in the provided field.
String Processing
String Processing options come in use when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want those to be treated as null objects in the database destination that you are writing to, otherwise Astera will omit those accordingly in the output.
Check the Trim strings option when you want to omit any extra spaces in the field value.
Once you have specified the data reading options on this window, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of the delimited source file.
If you want to add a new field to your layout, go to the last row of your layout (Name column), which will be blank and double-click on it, and a blinking text cursor will appear. Type in the name of the field you want to add and select subsequent properties for it. A new field will be added to the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted line, a context menu will appear where you will have the option to Delete.
Selecting this option will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
After you are done customizing the layout, click Next. You will be directed to a new window, Config Parameters. Here, you can define parameters for the Delimited File Source object.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Once you have configured the source object, click OK.
The Delimited File Source object is now configured according to the changes made.
The Delimited File Source object has now been modified from its previous configuration. The new object has all the modifications that were made in the builder.
In this case, the modifications that were made are:
Added the CustomerName column.
Deleted the ShipCountry column.
You have successfully configured your Delimited File Source object. The fields from the source object can now be mapped to other objects in the dataflow.
The Excel File Source object in Astera supports all formats of Excel. In this article, we will be discussing:
Various ways to get the Excel Workbook Source object on the dataflow designer.
Configuring the Excel Workbook Source object according to our required layout and settings.
In this section, we will cover the various ways to get an Excel Workbook Source object on the dataflow designer.
To get an Excel File Source object from the Toolbox, go to Toolbox > Sources > Excel Workbook Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet. We will discuss the configuration properties for the Excel Workbook Source object in the next section.
If you already have a project defined and Excel source files are a part of that project, you can directly drag-and-drop the Excel file sources from the project tree onto the dataflow designer. The Excel File Source objects in this case will already be configured. Astera Data Stack detects the connectivity and layout information from the source file itself.
To get an Excel File Source object from the Project Explorer, go to the Project Explorer window and expand the project tree.
Select the Excel file you want to bring in as the source and drag-and-drop it on the designer. In this case, we are working with Customers -Excel Source.xls file so we will drag-and-drop it onto the designer.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
To get an Excel Workbook Source directly from the file location, open the folder containing the Excel file.
Drag-and-drop the Excel file from the folder onto the designer in Astera.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
To configure the Excel Workbook Source object, right-click on its header and select Properties from the context menu.
As soon as you have selected the Properties option from the context menu, a dialog box will open.
This is where you can configure your properties for the Excel Workbook Source object.
The first step is to provide the File Path for the Excel source. By providing the file path, you are building the connectivity to the source dataset.
The dialog box has some other configuration options:
If your source file contains headers and you want your Astera source layout to read headers from the source file, check the First Row Contains Header box.
If you have blank rows in your file, you can use the Consecutive Blank Rows to Indicate End of File option to specify the number of blank rows that will indicate the end of the file.
Use the Worksheet option to specify if you want to read data from a specific worksheet in your Excel file.
In the Start Address option, you can indicate the cell value from where you want Astera to start reading the data.
Check the Make All Fields As String In Build Layout option when you want to set all the fields in the layout builder as String.
Dynamic Layout
Checking the Dynamic Layout option enables the following two options: Delete Fields in Subsequent Objects and Add Fields in Subsequent Objects. Either option can be unchecked individually.
Add Fields in Subsequent Objects: Checking this option ensures that any fields added to the source are also added to subsequent objects in the flow.
Delete Fields in Subsequent Objects: Checking this option ensures that any fields deleted from the source are also deleted from subsequent objects in the flow.
Advanced File Options
In the Header spans over option, specify the number of rows that your header occupies. Use this option when your header spans multiple rows.
Check the Enforce exact header match option if you want the header to be read as it is.
Check the Column order in file may be different from the layout option if the field order in your source file differs from the field order in the Astera layout.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
String Processing
String processing options are useful when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want them treated as null values in the database destination you are writing to; otherwise, empty cells are written as empty strings in the output.
Check the Trim strings option when you want to omit any extra spaces in the field value.
Once you have specified the data reading options on this screen, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of your Excel source file.
If you want to add a new field to your layout, go to the last row of your layout (Name column) and double-click on it. A blinking text cursor will appear. Type in the name of the field you want to add and select subsequent properties for it. A new field will be added to the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted line and select Delete from the context menu.
This will delete the entire row from the layout.
If you want to change the position of any field and move it above or below another field in the layout, you can do so by selecting the row and using the Move up/Move down icons.
For example: to move the Country field right below the Region field, we will select the row and use the Move up icon to move this field from the 9th row to the 8th.
Other options that the Layout Builder provides:
Column Name
Description
Alternate Header
Assigns an alternate header value to the field.
Data Type
Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean.
Allows Null
Controls whether the field allows blank or NULL values in it.
Output
The output checkbox allows you to choose whether or not you want to enable data from a particular field to flow through further in the dataflow pipeline.
Calculation
Defines functions through expressions for any field in your data.
After you are done customizing the Object Builder, click Next. You will be taken to a new window, Config Parameters. Here, you can further configure and define parameters for the Excel source.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Once you have been through all the configuration options, click OK.
The ExcelSource object is now configured according to the changes made.
You have successfully configured your Excel Workbook Source object. The fields from the source object can now be mapped to other objects in the dataflow.
The SQL Query Source object enables you to retrieve data from a database using an SQL query or a stored procedure. You can specify any valid SELECT statement or a stored procedure call as a query. In addition, you can parameterize your queries dynamically, thereby allowing you to change their values at runtime.
In this article, we will be looking at how you can configure the SQL Query Source object and use it to retrieve data in Astera Data Stack.
Before moving on to the actual configuration, we will have to get the SQL Query Source object from the Toolbox onto the dataflow designer. To do so, go to Toolbox > Sources > SQL Query Source. In case you are unable to view the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the SQL Query Source object onto the designer.
The source object is currently empty as we have not configured it yet.
To configure the SQL Query Source object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click the header of the source object.
A new window will pop up when you click on Properties in the context menu.
In this window, we will configure properties for the SQL Query Source object.
On this Database Connection window, enter information for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you want to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Click Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test was successful. Close this window by clicking OK, and then click Next.
The next window will present a blank page for you to enter your required SQL query. Here, you can enter any valid SELECT statement or stored procedure to read data from the database you connected to in the previous step.
The curly brackets located on the right side of the window indicate that the use of parameters is supported, which implies that you can replace a regular value with one that is parameterized and can be changed during runtime.
In this example, we will be reading the Orders table from the Northwind database.
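For illustration, a minimal query of this sort could be entered here (Orders and the columns below come from the standard Northwind sample database; any valid SELECT statement or stored procedure call works equally well):

SELECT OrderID, CustomerID, EmployeeID, OrderDate, ShipCountry
FROM Orders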
Once you have entered the SQL query, click Next.
The following window will allow you to check or uncheck certain options that may be utilized while processing the dataset, if needed.
When checked, the Trim Trailing Spaces option will refine the dataset by removing any extra whitespace after the last character on a line. This option is checked by default.
The Dynamic Layout option is unchecked by default. When checked, it will automatically enable two other sub-options.
Delete Field In Subsequent Objects: When checked, fields deleted from the source object will also be deleted from subsequent objects.
Add Fields In Subsequent Objects: When checked, fields added to the source object will also be added to subsequent objects.
Choose your desired options and click Next.
The next window is the Layout Builder. Here, you can modify the layout of the table that is being read from the database. However, these modifications will only persist within Astera and will not apply to the actual database table.
To delete a certain field, right-click on its serial column and select Delete from the context menu. In this example, we have deleted the OrderDate field.
To change the position of a field, click its serial column and use the Move up/Move down icons located in the toolbar of the window. In this example, we have moved up the EmployeeID field using the Move up icon, thus shifting the CustomerID field to the third row. You can move other fields up or down in a similar manner, allowing you to modify the entire order of the fields present in the table.
Once you are done customizing your layout, click Next.
In the Config Parameters window, you can define certain parameters for the SQL Query Source object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
Enter your desired values for these parameters, if any, and click Next.
Finally, a General Options window will appear. Here, you are provided with:
A text box to add Comments.
A set of General Options that have been disabled.
To conclude the configuration, click OK.
You have successfully configured the SQL Query Source object. The fields are now visible and can be mapped to other objects in the dataflow.
Adding an XML/JSON File Source object to a dataflow allows you to read and transfer data from an XML or a JSON file.
In this section, we will cover how to get an XML/JSON File Source object on the dataflow designer from the Toolbox.
To get an XML/JSON File Source from the Toolbox, go to Toolbox > Sources > XML/JSON File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the XML/JSON File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the XML/JSON File Source object, right-click on the header and select Properties from the context menu.
When you select the Properties option from the context menu, a dialog box will open.
This is where you can configure properties for the XML/JSON File Source object.
The first step is to provide the File Path and Schema Location for the XML/JSON Source object. By providing the file path and schema, you are building the connectivity to the source dataset.
Check the JSON Format checkbox if your source file is a JSON.
Check the Provide Inner XML checkbox to get the XML markup representing only the child nodes of the parent node.
Note: In this case we are going to be using an XML/JSON file with Orders sample data in the parent node and Order Details sample data in the child node.
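To make the note above concrete, a JSON source of roughly the following shape would fit this scenario (a hypothetical sketch of sample data, not a required schema):

{
  "Orders": [
    {
      "OrderID": 10248,
      "CustomerID": "VINET",
      "OrderDetails": [
        { "ProductID": 11, "Quantity": 12, "UnitPrice": 14.0 }
      ]
    }
  ]
}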
Once you have specified the data reading options in this window, click Next.
On the XML Layout window, you can view the layout of your XML/JSON source file.
After you are done viewing the layout, click Next. You will be taken to a new window, Config Parameters. Here, you can define the parameters for the XML/JSON File Source.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
After you have configured the source object, click OK.
You have successfully configured your XML/JSON File Source object. The fields from the source object can now be mapped to other objects in a dataflow.
A PDF Form Source object provides users with the functionality of extracting data from a fillable PDF document. A fillable PDF document consists of certain data points, or digital fields, that a user can edit in any modern PDF viewer. Such forms are often used in place of official paper documents on the web. The PDF Form Source object detects these fields, extracts the written data, and creates corresponding fields for them.
In this article, we will explore how to make use of the PDF Form Source object in Astera to retrieve data.
Select the PDF Form Source object from the Toolbox and drag-and-drop it onto the dataflow designer.
Right-click on the PDF Form Source object’s header and select the Properties option from the context menu.
A configuration window will open, as shown below.
Provide the File Path for the fillable PDF document.
Owner Password: If the file is protected, enter the password configured by the owner of the fillable PDF document. If the file is not protected, this option can be left blank.
Use UTF-8 Encoding: Check this option if the file is encoded in UTF-8 (Unicode Transformation Format, 8-bit).
Click Next.
This is the Layout Builder window, where you can see the data fields extracted from the fillable PDF document. Click Next.
This is the Config Parameters window. Click Next.
This is the General Options window. Click OK.
Right-click on the PDF Form Source object’s header and select Preview Output from the context menu.
View the data through the Data Preview window.
The data is now available for mapping. For simplicity, we will delete the non-required data fields and store the output in a separate file. To store the data, we must write it to a destination file.
We are using a Delimited Destination object. Drag-and-drop the Delimited Destination object onto the dataflow designer and map the fields from the PDF Form Source object to the destination object.
Right-click on the fields that you do not want to store and select the Remove Element option.
Simply double-click or right-click on the Delimited Destination object’s header and select the Properties option from the context menu. Specify the File Path where you want to store the destination file. Click OK.
To preview the data, right-click on the destination object’s header and select Preview Output from the context menu.
Here, you can see data of the selected fields.
This is how a PDF Form Source object is used in Astera Data Stack to extract data points/digital fields from fillable PDF documents.
Report Model extracts data from an unstructured file into a structured file format using an extraction logic. It can be used through the Report Source object inside dataflows in order to leverage the advanced transformation features in Astera Data Stack.
In this section, we will cover how to get the Report Source object onto the dataflow designer from the Toolbox.
To get a Report Source object from the Toolbox, go to Toolbox > Sources > Report Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Report Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Report Source object, right-click on its header and select Properties from the context menu.
A configuration window for Report Source will open.
First, provide the File Path of the unstructured file (your report) for which you have created a Report Model.
Then, specify the File Path for the associated Report Model.
Click OK, and the fields added in the extraction model will appear inside the Report Source object with a sub-node, Items_Info, in our case.
Right-click on the Report Source object’s header and select Preview Output from the context menu.
A Data Preview window will open, showing you the data extracted through the Report Model.
The Email Source object in Astera enables users to retrieve data from emails and process the incoming email attachments.
In this section, we will cover how to get the Email Source object onto the dataflow designer from the Toolbox.
To get an Email Source object from the Toolbox, go to Toolbox > Sources > Email Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Email Source object onto the designer.
You can see some built-in fields and an Attachments node.
Double-click on the header of the Email Source object to go to the Properties window.
A configuration window for the Email Source object will open. The Email Connection window is where you will specify the connection details.
Url: The address of the mail server on which the connection will be configured.
Login Name: The login name of the user account connecting to the mail server.
Password: Password of the user.
Port: The port of the mail server to connect on. Some examples of SMTP provider ports are 587 for Outlook and 25 for Google.
Connection Logging: Connection logging records different types of messages or events exchanged between the client and the server, which the user can review for errors or debugging.
Astera supports four types of Connection Logging methods:
Verbose: Captures everything.
Debug: Captures only the content that can be used in debugging.
Info: Captures information and general messages.
Error: Captures only the errors.
If you have configured email settings before, you can access the configured settings from the drop-down list next to the Recent option. Otherwise, provide server settings for the mailing platform that you want to use. In this case, we are using an Outlook server.
Test your connection by clicking Test Connection; this will give you the option to send a test mail to the login email.
Click Next. This is the Email Source Properties window. There are two important parts in this window:
Download attachment options
Email reading options
Check the Download Attachments option if you want to download the contents of your email. Specify the directory where you want to save the email attachments in the field next to Directory.
The second part of the Email Source Properties window has the email reading options that you can work with to configure various settings.
Read Unread Only – Check this option if you only want to process unread emails.
Mark Email as Read – Check this option if you want to mark processed emails as read.
Folder – From the drop-down list next to Folder, you can select the specific folder to check, for example, Inbox, Outbox, Sent Items etc.
Filters - You can apply various filters to only process specific emails in the folder.
From Filter: Filters out emails based on the sender’s email address.
Subject Filter: Filters out emails based on the text of the subject line.
Body Filter: Filters out emails based on the body text.
Click OK.
Right-click on the Email Source object’s header and select Preview Output from the context menu.
A Data Preview window will open and will show you the preview of the extracted data.
Notice that the output only contains emails from the email address specified in the Filter section.
Constant Value Transformation returns a single, predefined value for all records in a dataset.
In this example, we have an Excel worksheet containing Employees' data. The data is being written to a database table, which contains an additional field that stores department information. We want to pass a constant value to the Department field. To do this, we will use the Constant Value transformation object in Astera. The Constant Value transformation will be mapped to the Department field to append the name of the department to the final output.
Map the Employees dataset from the source object to the destination table, Employees_Database.
Now, drag-and-drop the Constant Value transformation object from Toolbox > Transformations > Constant Value.
Right-click on the Constant Value transformation object and select Properties.
The Constant Value Map Properties window will now open. Here you will see a Constant Value section where you can write any value to be appended to your output dataset.
In this case, the value will be ‘Marketing’, to specify the department of the employees in the source dataset.
Click Next. A General Options window will now open. Click OK.
General Options window:
This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages
When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors
When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
Your transformation object will now look like this:
Now map the Value field from the Constant Value object to the Department field in the Employees_Database destination object.
Right-click on the Employees_Database destination object and click Preview Output.
Your final output will look like this:
The Department field in the destination has been successfully populated with the specified value for all records in the final output, through the use of a Constant Value transformation object.
Apache Parquet is a columnar storage file format used by Hadoop systems such as Pig, Spark, and Hive. The file format is language-independent and has a binary representation. Parquet is used to efficiently store large datasets and uses the .parquet extension.
The key features of Parquet with respect to Astera Data Stack are:
It offers the option of compression, resulting in a smaller file size after compression.
It encodes the data.
It stores data in a column layout.
In Astera Data Stack, you can use a Parquet file in which the cardinality of the data is maintained, i.e., all columns must contain the same number of values.
Drag and drop the Parquet File Source from the Sources section of the Toolbox onto the dataflow designer.
Right-click on the Parquet File Source object and select Properties from the context menu.
This will open a new window.
Let’s have a look at the options present here.
File Location
File Path: This is where you will provide the path to the .parquet file.
Data Load Option
If you wish to control memory consumption (at the cost of increased read time), the Data Load option can be used to read the file in batches.
Batch Size: This is where the size of each batch is defined.
Advanced File Processing: String Processing
Treat empty string as null value: Checking this option treats every empty string as a null value.
Trim strings: Checking this box trims extra spaces from string values.
Once done, click Next and you will be led to the Layout Builder screen.
The layout will be automatically built. Otherwise, you can build it using the Build Layout from layout spec option at the top of the screen.
Once done, click Next and you will be taken to the Config Parameters screen.
This allows you to further configure and define dynamic parameters for the Parquet source file.
Click Next and you will be taken to the General Options screen.
Here, you can add any comments that you wish to add.
Click OK and the Parquet File Source object will be configured.
You can now map these fields to other objects as part of the dataflow.
The Parquet File Source object supports the following data types:
Integer
Time/Timestamp
Date
String
Float
Real
Decimal
Double
Byte Array
Guid
Base64
Integer96
Image
Hierarchy is not supported.
This concludes our discussion on the definition and configuration of the Parquet File Source object in Astera Data Stack.
The Filter transformation object in Astera is used to filter out records based on some pre-defined rule. The records that match the specified rule are filtered out and can be mapped further in the dataflow whereas the records that do not satisfy the specified condition are omitted. The rule or logic to filter out records from data can either be chosen from an extensive library of built-in functions or you can write one of your own.
Filter transformation is quite similar to Data Quality Rules in its functionality. However, unlike Data Quality Rules, which return an error or warning when the rule condition fails while still passing the record downstream, the Filter transformation will completely filter out any such records. The filtered records, as well as their status will not be accessible to any downstream object on the dataflow, including any type of log.
In this case, we have Customers data for a fictitious organization stored in a Delimited file source. We want to filter out the records in which:
Country = Germany
Contact title = Owner
To filter out these records from our source data, we will use the Filter transformation object and write the relevant expression in the Expression Builder to achieve our desired output. We will then write our filtered output to a Fixed Length destination.
Next, drag and drop the Filter transformation object from the Transformations section of the Toolbox to the designer and map fields from the source object to the Filter transformation object.
Right-click on the Filter transformation object and select Properties.
A Layout Builder window will now open. Here, you can modify fields and the object layout.
Click Next. This will take you to the Filter Transformation Properties window. Here, you can see the following three sections:
Functions: An extensive library of built-in functions organized in various categories. From here, you can select functions according to your requirement.
Expression: The filter expression will appear in this Expression box. You can either write your own expression or choose from the built-in functions library.
Objects: Contains the object layout. You can double click on any element in the layout to write it in the Expression field.
In this example, we want to filter out records of customers with the ContactTitle, ‘Owner’, and Country, ‘Germany’. For this, we will write the following expression in the Expression box:
Country = "Germany" and ContactTitle = "Owner"
After writing the expression, click on the Compile button to check if the expression is correct. If the Compile Status is Successful, the expression is correct. If not, then you need to check and rectify your expression before proceeding to the next window.
Click Next. This will take you to the Config Parameters window where you can further configure and define parameters for the Filter transformation object.
Click Next to proceed to the General Options window. This window consists of options common to most objects in a dataflow, such as:
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be outputted by the object. When this option is unchecked, records with errors will be outputted by the object, and a record message will be attached to the record. This record message can then feed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect statistics.
The Comments input allows you to enter comments associated with this object.
The Filter transformation object has been configured. Click OK.
To preview filtered records, right-click on the Filter transformation object and select Preview Output.
The output would look like this:
You can now write your output to a destination or further transform it by applying some transformation.
Rename your destination object by double-clicking its header. Here, we will rename it as German_Customers.
This concludes using the Filter transformation object in Astera.
The Join transformation object joins records from two record sets. The join functionality is similar to standard SQL joins, but the distinguishing advantage of Astera's implementation is that you can join records from any two sources and not just two database tables.
This article covers how you can use Join transformation in Astera.
Suppose we have two database tables - Customers and Orders, as shown in the screenshot below, and we want to join these two tables.
Let’s see how we can join the two tables using the Join transformation object in Astera:
Drag-and-drop the Join transformation object from the Transformations section in the Toolbox. To open the Toolbox, go to View > Toolbox.
Map the fields from the source objects to the Join transformation object.
To set the properties for the Join transformation, double-click on the object or right-click and select Properties.
The first window is the Layout Builder window. You can manage the layout for your transformation (add or remove fields) from this window. Click Next to go to the next window.
The next window is the Relation Join Transformation Properties window. Select the Join Type from the drop-down menu. Astera supports four types of joins:
Inner Join – Joins records from two record sets based on matching values in key fields. Any unmatched records are discarded.
Left Outer Join – Similar to Inner Join, but unmatched records from the left record set (also called ‘first record set’) are preserved, and null values are written for the unmatched record in the right record set (also called ‘second record set’).
Right Outer Join – Similar to Inner Join, but unmatched records from the right record set (also called ‘second record set’) are preserved, and null values are written for the unmatched record in the left record set (also called ‘first record set’).
Full Outer Join – Similar to Inner Join, but unmatched records from either record set are preserved, and null values are written for the unmatched record in the other record set.
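As an analogy only (Astera can join any two record sets, not just database tables), a Left Outer Join of the Customers and Orders tables from this example corresponds roughly to the following SQL, assuming CustomerID is the key field in both record sets:

SELECT c.CustomerID, c.CompanyName, o.OrderID, o.OrderDate
FROM Customers c
LEFT OUTER JOIN Orders o ON c.CustomerID = o.CustomerID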
Other options in this window:
Join in Database: Check this option if you want to join the tables in the database.
Case Sensitivity: Check this option if you want a case sensitive match of the values in the key fields.
Sort (Left/Right) Input: Specify whether the left input, the right input, or both, need to be sorted.
Select the key fields from the Left Field and Right Field drop-down lists. Click Next, then OK.
You can now preview the output and see the consolidated data.
This window consists of options common to most objects in a dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be outputted by the object. When this option is unchecked, records with errors will be outputted by the object, and a record message will be attached to the record. This record message can then feed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect statistics.
The Comments input allows you to enter comments associated with this object.
The List Lookup transformation object is a type of lookup that stores information in the metadata, which means that your lookup data is stored in the dataflow itself. List Lookup uses a list of values for both the input and output fields. You can use it to look up certain values in your source data and replace them with other desired information. Alternatively, you can define a list of values in the lookup grid in the properties, and the value is then looked up in the grid when you run your dataflow.
Let’s see how this object functions in Astera.
In this example, we are working with a Customers Fixed-Length File Source that contains customer information for a fictitious organization. The Customers data contains information about customers belonging to different countries. We want to convert the country names in this data into CountryCodes by using the List Lookup transformation object.
To preview the incoming data, right-click on the source object’s header and select Preview Output.
To start working, drag-and-drop the List Lookup object from Toolbox>Transformations>List Lookup.
This is what a List Lookup object looks like:
Map the field from the source dataset you want to look up values for, to the Value field in the List Lookup object.
Now, right-click on the List Lookup object and select Properties from the context menu. The List Lookup Map Properties window will now open.
Here, the first option is the Case Sensitive Lookup checkbox, which is checked by default. When this option is checked, the List Lookup will look up values on a case-sensitive basis. If you do not want to perform a case-sensitive lookup, you can uncheck this option.
Next, you can see that there is a table where we can specify the Source Value and the Destination Value. Source Values are the values from your source data, and Destination Values are the values you want to replace the source values with.
For example, if we write the Destination Value as 'DE' against the Source Value 'Germany', Astera will write 'DE' in place of 'Germany' in the output.
This is one way of specifying the lookup values. However, there can be a lot of source values, and typing them in manually can be a tedious task. There is a more efficient way of doing this in Astera.
If you right-click on the List Lookup object, you can see that there is an option called Fill Lookup List with Unique Input Values.
Selecting this option prepopulates the source values in the Source Value column with unique source values.
Now, all you have to do is type in the Destination Values, that is, the codes corresponding to each country name.
Once you have populated the lookup list, click Next to proceed to the Lookup Options window.
In case the lookup field does not return any value for a given source value, one of the following options should be selected:
No Message – Will not mark the unmatched source value as an error or warning
Add Error – The List Lookup table will trigger an error for the records that found no match in the lookup field
Add Warning – The List Lookup will generate a warning and return a null value for records from the source that do not have any matches in the lookup table
Additionally, when the value is not found in the lookup list, you can choose from the following options to assign it a value:
Assign Source Value – Will return the original value from the source.
Assign Null – Will return a null value for each unmatched source record.
This Value – You can type in a specific value in the given field, and the List Lookup will return the same value for each unmatched source value.
In this example, we will add an error and return the source value if the lookup value is not found. We will select the Add Error and Assign Source Value options. You can choose your preferred option and click OK.
Now, if we preview the output, you can see that for each country name from the source table, the List Lookup has returned a corresponding code value.
These CountryCodes will flow through the annotated output port if you want to write your data to a destination.
This is how we can map the lookup values to a target or a transformation in the dataflow using the output port.
This concludes using the List Lookup transformation object in Astera.
Astera Data Stack gives the user the ability to use a MongoDB Source as part of the ETL pipeline. MongoDB is a fully cloud-based application data platform.
It is also a NoSQL platform that provides a mechanism of storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
It can be configured in accordance with the user application in Astera.
To start, drag and drop the MongoDB Source object from the Sources section of the toolbox onto the dataflow.
To configure the MongoDB Source object, right-click on it and select Properties from the context menu.
This will open the Properties screen.
User Name: The name of the user connecting to the MongoDB cluster.
Password: The password of the user connecting to the MongoDB cluster.
Primary Server Name: The address of the primary server cluster for connection.
Database: The database to be selected from the MongoDB server.
Authentication Database: The database used for authentication.
Enable set of replica: Allows the server to access the secondary cluster in case the primary server is unavailable.
Use TLS: Check this option if you are using TLS authentication.
Secondary Server Name: The address of the secondary server cluster for connection.
Read Preference: This drop-down menu allows the user to select which server is given preference when reading the data.
Primary: Means that data will only be read from the primary server.
Secondary: Means that data will only be fetched from the secondary server.
Primary Preferred: Means that preference will be given to the primary server, but in case of its unavailability, data will be fetched from the secondary server.
Secondary Preferred: Means that preference will be given to the secondary server, but in case of its unavailability, data will be fetched from the primary server.
Nearest: Means that the preference will be given to the server closest to the connection in region and IP.
Once the credentials have been filled in, you can test the connection by selecting Test Connection.
Once done, click Next and you will be led to the MongoDB Collection screen.
Here, you can pick a collection that you wish to fetch the data from using the Pick Collection drop-down menu.
Once the collection is selected, the layout will be built.
There are three ways to generate the layout:
Astera auto-generates the layout based on the first 100 records by default.
The user can provide a JSON schema and Astera will generate the layout.
The user can manually create the layout.
Once the layout has been built, click Next and you will be led to the MongoDB Filter screen.
Here, you can provide a query to filter out your records based on some criteria.
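For example, a filter of the following form could be provided here; this is a standard MongoDB query document, and the field name and value are purely hypothetical:

{ "Country": "Germany" }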
Click Next and you will be taken to the MongoDB SortBy screen.
Here, you can set a limit to fetch a specified number of records, or provide a number to skip the first 'n' records.
You can also sort your data based on a single field or a collection of fields.
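The sort specification also follows standard MongoDB syntax; as a hypothetical example, the document below sorts by OrderDate in descending order and then by CustomerID in ascending order (1 for ascending, -1 for descending):

{ "OrderDate": -1, "CustomerID": 1 }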
Click Next and you will be taken to the Config Parameters screen.
Parameters can provide easier deployment of flows by eliminating hardcoded values and can also provide a dynamic way of changing multiple configurations with a simple value change.
Click Next and you will be taken to the General Options screen.
Here, you can add any comments that you wish to add.
Click OK and the MongoDB Source object will be configured.
The source data can now be further used in an ETL pipeline with transformation or destination objects.
This concludes our discussion on the MongoDB Source object in Astera Data Stack.
You can create many-to-one mappings with the help of a Denormalize transformation object in Astera. Denormalizing, also known as pivoting, allows you to combine a number of records into a single record (simply put, it brings data from rows to columns). It is useful for reducing the number of tables in the schema, which simplifies querying and can improve read performance.
The TaxInfo source data contains information about TaxType (City Tax, County Tax, State Tax, and Federal Tax), Tax Amount, and SSN (Social Security Number) of taxpayers.
We want to reduce the number of rows and create separate fields for City tax, County tax, State tax, and Federal tax.
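To illustrate with hypothetical sample values (the actual field names depend on your layout), the pivot reshapes rows like these:

Before (one row per tax type):
SSN          TaxType      TaxAmount
111-22-3333  City Tax     120
111-22-3333  County Tax   80
111-22-3333  State Tax    450
111-22-3333  Federal Tax  900

After (one row per SSN):
SSN          CityTax  CountyTax  StateTax  FederalTax
111-22-3333  120      80         450       900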
Let’s see how we can use the Denormalize transformation object to achieve this.
First, we will use the Sort object to sort our source data based on the key field, SSN in our case.
Drag-and-drop the Denormalize transformation object from the Transformations section in the Toolbox.
Right-click on the Denormalize transformation object and select Properties from the context menu.
Following are the properties available for the Denormalize transformation object:
Layout Builder Window:
The Layout Builder window is used to add and/or remove fields, as well as to select their data type. The fields added in the Layout Builder will show in the Output node inside the object, as well as in all Input nodes corresponding to the number of mapping groups created (see below), with the exception of the key field(s).
Denormalize (Many-to-One) Transformation Properties Window:
Select Keys: Using the Select Keys dropdown, select the field or fields that uniquely identify the record. These keys will be used to match records between the normalized source and the denormalized destination.
Sort Input: Check this option only if values in the matching field (or fields) are not already sorted.
Driver Field Value: Enter the pivot values for your Denormalize transformation object. Using the example below, the pivot values would be City, State, Federal, and County.
General Options Window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
The Comments input allows you to enter comments associated with this object.
After you have configured the properties, click OK.
An Input mapping node will be created for each value previously specified in the Driver Field Value grid.
Map the fields and preview the output to view the denormalized data.
The Distinct transformation object in Astera removes duplicate records from the incoming dataset. You can use all fields in the layout to identify duplicate records, or specify a subset of fields, also called key fields, whose combination of values will be used to filter out duplicates.
Consider a scenario where we have data coming in from an Excel Workbook Source and the dataset contains duplicate records. We want to filter out all the duplicate records from our source data and create a new dataset with distinct records from our source data. We can do this by using the Distinct transformation object in Astera Data Stack. To achieve this, we will specify data fields with duplicate records as Key Values.
In order to add a separate node for duplicate records inside the Distinct transformation object, we will check the option: Add Duplicate Records. Then we will map both distinct and duplicate outputs to a Delimited File Destination.
Let’s see how to do that.
Drag-and-drop an Excel Workbook Source from the Toolbox to the dataflow as our source data is stored in an Excel file.
To apply the Distinct transformation to your source data, drag-and-drop the Distinct transformation object from the Transformations section in the Toolbox (Toolbox > Transformations > Distinct). Map the fields from the source object by dragging the top node of the ExcelSource object to the top node of the Distinct transformation object.
Now, right-click on the Distinct transformation object and select Properties. This will open the Layout Builder window where you can modify fields (add or remove fields) and the object layout.
Click Next. The Distinct Transformation Properties window will now open.
Data Ordering:
Data is Presorted on Key Fields: Select this option if the incoming data is already sorted based on defined key fields.
Sort Incoming Data: Select this option if your source data is unsorted and you want to sort it.
Work with Unsorted Data: When this option is selected, the Distinct transformation object will work with unsorted data.
On this window, the distinct function can be applied on the fields containing duplicate records by adding them under Key Field.
Right-click on Delimited Destination object and click Preview Output.
Your output will look like this:
To add duplicate records in your dataset check the Add Duplicates Output option in the Distinct Transformation Properties window.
When you check this option, three nodes will be added to the Distinct transformation object:
Input
Output_Distinct
Output_Duplicate
Now, map the objects by dragging the top node of ExcelSource object to the Input node of the Distinct transformation object.
Distinct output:
Duplicate output:
As evident, the duplicate records have been successfully separated from your source data.
The Data Model Query object in Astera Data Stack allows you to extract multiple tables from a deployed data model. This is especially useful when you’re writing data to a fact table via the Fact Loader object, since the fact table contains attributes from multiple source tables.
In this article, we’ll be looking at how you can configure the Data Model Query object and use it to extract data from a source model.
Let’s assume that we have the following source model.
In this example, we’ll extract all of these tables as a source via the Data Model Query object.
To get the Data Model Query object from the toolbox, go to Toolbox > Sources > Data Model Query.
Drag and drop the Data Model Query object onto the dataflow designer.
The object is currently empty because we are yet to configure it.
To configure the Data Model Query object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click on the object header.
A configuration window will pop up.
Using this window, you can configure the properties of the Data Model Query object.
On the Database Connection screen, you’ll notice that the Data Provider dropdown menu is limited to just one option: Astera Data Model. This option represents the data models that are deployed on the server and are available for usage.
Once you’ve provided your Astera Data Stack credentials and a server connection, you can select a deployed model from the Database dropdown menu.
In this case, we’ll select DW_Source, which represents the source model that was shown earlier.
Once you’ve chosen a deployed model, click Next.
The Query Source Layout screen will appear.
On the Query Source Layout screen, you can select a root entity from a list of entities present in the source model, via the Root Entity dropdown menu.
The root entity serves as a starting point for a tree layout that includes all of the entities that you need to extract data from.
In this case, the root entity in the source data model is InvoiceLines.
Once you’ve chosen a root entity, a tree/hierarchical layout starting from the root entity will appear on the left side of the screen. You can expand the inner nodes to reveal the fields present in other entities of the source model.
Checking the Allow Collections option will enable collection nodes in the tree layout.
In the Where Clause textbox, you can add an optional SQL statement that will serve as a filter for incoming records.
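As a rough example, assuming the entities in this model expose a numeric Quantity column, a condition such as the one below could be entered (shown here as a bare condition; adjust it to whatever form the textbox expects):

Quantity > 10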
Click OK once you've chosen a root entity. You've now configured the Data Model Query object. The tree layout, starting from the root entity, will be visible in the object.
The fields present in this layout can now be mapped further to other objects in the dataflow.
This concludes our discussion on the Data Model Query object.
The Aggregate transformation object provides the functionality to create aggregations of your dataset, using aggregate functions such as Sum, Count, First, Last, Min, Max, Average, Var or Standard Deviation. The dataset can be split into groups so that the aggregate value(s) can be generated for the group instead of the whole dataset. For example, calculate product count by month of year, or get average sales price by region and year.
The Aggregate Transformation can be applied to unsorted data or to data sorted on group-by values. When applied to an input stream that is sorted on group-by fields, the Aggregate Transformation performs substantially better and consumes very little memory. Conversely, when applied to unsorted datasets, the Aggregate Transformation may consume substantial memory for large datasets and may slow down the performance of the server.
In this scenario, we have product data stored in a CSV file. The source file contains information such as the ProductID, SupplierID, and UnitPrice of the various products, the QuantityPerUnit of products available, etc. We want to derive the following information from our source data:
Number of products per category
Total price of all the products per category
Minimum price per category
Maximum price per category
We will use the Aggregate Transformation object to derive the required information.
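For readers who think in SQL terms, the aggregation we are about to configure is roughly equivalent to the query below (an analogy only; Products stands in for the source CSV file, and the column names come from the example):

SELECT CategoryID,
       COUNT(ProductID) AS ProductsPerCategory,
       SUM(UnitPrice)   AS TotalPricePerCategory,
       MIN(UnitPrice)   AS MinPricePerCategory,
       MAX(UnitPrice)   AS MaxPricePerCategory
FROM Products
GROUP BY CategoryID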
To work with the Aggregate Transformation, drag-and-drop the Aggregate Transformation object from Toolbox > Transformations > Aggregate.
Right-click on the transformation object and select Properties. The Layout builder window will now open.
Here, you can write names of fields that you want to map to the transformation object in the Name column and specify the relevant Aggregate Functions for them.
For this case:
CategoryID: We will select the Group-By option from the Aggregate Function drop-down list for this field as we want to group the records based on the available product categories.
ProductID: For this field we will select the Aggregate Function Count, in order to calculate the number of products per category.
UnitPrice: We will map this field thrice.
To calculate TotalPricePerCategory, select the Sum function.
To calculate MaxPricePerCategory, select the Max function.
To calculate MinPricePerCategory, select the Min function.
Click on Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted on the specified Group-By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
Click on Next. The Config Parameters window will now open, where you can further configure and define parameters for the Aggregate transformation.
Click Next. This is the General Options window. Click OK.
General Options Window:
This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages
When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors
When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
The Comments input allows you to enter comments associated with this object.
After you have configured the properties, click OK.
You will see the fields in the object that were added in the Layout Builder window.
Map the data fields from the source object to the transformation object. You can auto-map the entire dataset from the source to the transformation object, or only map the selected fields that you want to work with. In this case, we will map CategoryID, ProductID, and UnitPrice, as those are the fields we want to find aggregations for.
Right-click on the Aggregate transformation object and click Preview Output.
You will see that the specified Aggregate Functions have been applied.
The Expression transformation object in Astera defines an expression or logic to process an incoming value (or values). As a result, it may return a new set of values which do not depend on any user-provided input data. Expressions can also be modified and used as variables for various other calculations.
The Expression Transformation object uses an expression as a logic to transform data. You can write an expression of your own or use different functions or operations from Astera’s extensive library of built-in functions, such as string manipulations, data conversion, date and time manipulation, etc. You can also perform various operations such as mathematical calculations and comparisons, etc. using the Expression transformation object.
In this example, we have a sample dataset, Customers, stored in an Excel file. The address information in this source is split into multiple fields such as Address, Region, Country, and PostalCode. We want to concatenate the information in these fields and return it as full address in a new data field. For this, we will use Expression transformation object.
Drag-and-drop the Expression transformation object from Toolbox>Transformations>Expression onto the designer.
Map the fields to be concatenated from the source object to the Expression transformation object. In this example, we have mapped the Address, Region, City, Country and PostalCode fields.
Now right-click on the Expression transformation object and select Properties from the context menu.
This will open the Layout Builder window where you can add or remove fields and modify your layout.
These are the following options on the Layout Builder window:
Name: This is where the field name is specified. You can change the name of existing fields if required.
Data Type: Specifies data type of the mapped fields.
Input: When checked, the field will be mapped as an input, with an input mapping port, to take data input from a source.
Output: When checked, the field will be mapped as an output. If an expression is present, the expression will be applied to this output.
Variable: Turns the field into a variable which can then be applied to other fields. These expressions are calculated first and then assigned to other expressions using it. Once a field turns into a Variable, it cannot be assigned for input or output mapping.
Expression: This is where the expression used for modifying the field or group of fields is specified.
Since we want to write the address details from multiple fields into a single field, let’s create a new field named Full_Address, and specify the Data Type as String and check the Output option.
You will find the following options in the Expression Builder window:
Functions: An extensive library of built-in functions from where you can select any function according to your requirement.
Expressions: Here, you can write an expression rule or choose one from the built-in functions in Astera.
Objects: In this panel, you can find all the fields in your layout listed under the Expression node. You can double click on any field name to map it to your expression.
In this example, we can either use a concatenate function from the built-in functions or write an expression of our own to return the complete address information in a single field.
Address + ' ' + Region + ' ' + City + ' ' + Country + ' ' + PostalCode
You can now see your expression appear in the Expression field. Click Next.
A General Options screen will now open where you can add Comments and specify other General Options. Once you’re through with these general settings, click OK.
General Options screen: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example, a destination file that will capture record messages or a log that will capture messages and collect statistics as well.
In the Comments input section, you can write comments related to the object.
To preview the output, right-click on the Expression transformation object and select Preview Output from the context menu.
Here’s a preview of the concatenated output:
You may rename your Destination object from the context menu options for this object. Here, we will rename it as Full_Address.
This concludes using the Expression transformation object in Astera.
The Reconcile Transformation object in Astera enables users to identify and reconcile new, updated, or deleted entries across two versions of a data source. It can be applied in a wide variety of business scenarios that require a user to identify changes in multiple data records and capture them efficiently to drive critical business decisions.
Consider an example where we have sample data of complaints filed by customers regarding the products and services provided by a company. Assume that source file 1 contains the details and status of complaints on January 1st, 2018, and source file 2 contains the details and status of complaints on February 1st, 2018. We want to track the progress of the resolved complaints during that one month.
To do so, we will reconcile the information contained in the source data files and capture changes using the Reconcile Transformation object.
Drag-and-drop the Reconcile Transformation object from Toolbox > Transformations > Reconcile onto the dataflow designer.
This is what a Reconcile Transformation object looks like:
You can see the transformation object contains three child nodes (Output, Input_1, and Input_2) under the parent node, Reconcile.
Expand the input nodes to map fields from the source files.
Map the data fields from the source objects that you want to reconcile to the respective input node in the Reconcile Transformation object.
Right click on the Reconcile Transformation object’s header and select Properties.
This will open the Reconcile Transformation Properties window where you will see the following options:
Case Sensitive – Check this option if you want to derive a case-sensitive output
Sort Input 1 – Check this option if the incoming data from source 1 is not sorted
Sort Input 2 – Check this option if the incoming data from source 2 is not sorted
You can choose the Reconcile Output Type from the following options (a brief sketch of the change-flag logic follows this list):
Side By Side Element With Change Flag – If you want to get values from both sources presented side by side, with a separate column that flags the reconciled output: true if the record was updated, false if it remained unchanged.
Original Layout – If you want to get the reconciled output for each record and corresponding information in the reconciled field.
Original Layout With Changed Element Collection – Applies when working with hierarchical data, to reconcile the information contained in child nodes.
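As a rough illustration of the change-flag idea (a sketch of the comparison logic only, not Astera’s implementation), assuming both inputs are keyed on Complaint_ID:
# Hypothetical snapshot data: Complaint_ID -> Status on each date
jan = {101: "Open", 102: "Open", 103: "Resolved"}
feb = {101: "Resolved", 102: "Open", 103: "Resolved"}

# Side-by-side output with a change flag: True if the status was updated, False otherwise
for complaint_id in sorted(jan.keys() & feb.keys()):
    print(complaint_id, jan[complaint_id], feb[complaint_id], jan[complaint_id] != feb[complaint_id])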
Once you have selected the preferred Output Type, you can specify the records to be shown in the output by applying the Record Filter and Inner Node Filter. You may choose one, multiple, or all of the following options by checking the corresponding boxes.
Click Next to proceed to the Layout Builder window. Here, you will have to specify a Key. This is the common identifier in both source files that will be used to identify and reconcile records. In this case, we want to reconcile the progress on complaints made against each Complaint_ID; therefore, we will select Complaint_ID as our Key.
Now go to the Survivor Value drop-down list to set the Survivor Value for each data field. Survivor Values are the values from your source datasets which you want to retain in the output.
You may select from the following Survivor Value options:
Second – If you want to derive the output value from the second source
First – If you want to derive the output value from the first source
First If Not Null, Otherwise Second – If you want to output a value from the first source if the record is not null, otherwise from the second source.
Second If not Null, Otherwise First – If you want to output a value from the second source if it is not null, otherwise from the first source.
Higher – If the input values are integers, and you want to choose the higher value
Lower – If the input values are integers, and you want to select the lower value
Expression – If you want to derive the output value based on a formula expression
Click Next to proceed to the General Options window, then click OK.
General Options window - This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages - When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors - When this option is checked, records with errors will not be output by the object. When this option is unchecked, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect their statistics.
Now, right-click on the Reconcile Transformation object’s header and select Preview Output to get the reconciled output.
You will get one of the following outputs according to the output type selected in the Reconcile Transformation Properties window.
Side by Side Element with Change Flag
Original Layout
Original Layout With Changed Element Collection
Reconcile Transformation objects can be applied in a variety of business cases, particularly those where monitoring the changes in assorted data records is crucial in driving critical business decisions. Here are some of the benefits and uses of the Reconcile Transformation object:
Reconciles data by deriving old and new values for specific fields in the source data
Allows users to choose from various layout options to reconcile changes in the most appropriate way
Works effectively with structured and unstructured (hierarchical) data formats
Offers the flexibility to select the information to be retained through different survivor value options
The Merge transformation object in Astera is designed to merge data fragments from disparate sources, based on some predefined logic, and present them in a consolidated form to draw actionable insights.
Let’s assume that there is an organization that maintains customer data in two different departments – Marketing and Sales. The Marketing department stores its information in a database table, while the Sales department maintains an Excel sheet of customer information. We want to merge the information from both sources so that we have consolidated data.
Since the Merge transformation object merges data from a single source only, we will first combine both sets of records using the Union transformation object. We will then map fields from the data sources to the Union transformation object and add a new field, DataSource, to keep track of which information is coming from which source.
Drag the Merge transformation object from the transformations section in the Toolbox and drop it on the dataflow designer.
This is what a Merge transformation object looks like:
Map the Union transformation object’s output to the Merge transformation object.
Right-click on the Merge transformation object and select Properties to set up the transformation properties in the Layout Builder window. This is what the Layout Builder window looks like:
In the Layout Builder window, specify the Primary Key. This is a common identifier that identifies similar records from the various sources and merges the information against these records.
(Since we are consolidating different customer records, we will set up CustomerID as the Primary Key in this case.)
Next, you have to specify the field to be used as Version. If your data is coming from multiple sources, the Version field shows which source the data is coming from in the final merged output. In this case, we will use the Data Source field we added in the Union transformation object as the Version field.
Next, specify the Survivor Type for each field. Survivor Type allows you to choose the survivor values – the values you want to retain from your data sources – for each field. Survivor Types are set to First by default. However, depending on your case, you can choose the Survivor Type from the following options (a brief sketch of this logic follows the list):
First – Returns data from the first data source for that field
Last – Returns data from the last data source for that field
Maximum – Returns the maximum value from all available input data sources
Minimum – Returns the minimum value from all available input data sources
Count – Returns the total count of all values that exist in the field
Sum – Aggregates the values that exist in that field across all the input sources and returns their arithmetic sum
Comma Separated Values – Separates the values that exist in that field across all the input sources with a comma and returns that representation. This option is only available when the output field is assigned the 'String' data type.
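As a rough sketch of how a Survivor Type resolves two candidate values for a single field (an illustration only, not Astera’s implementation; Count is omitted for brevity):
# Hypothetical sketch: choose the surviving value for one field from two sources
def survive(first, last, survivor_type):
    if survivor_type == "First":
        return first
    if survivor_type == "Last":
        return last
    if survivor_type == "Maximum":
        return max(first, last)
    if survivor_type == "Minimum":
        return min(first, last)
    if survivor_type == "Sum":
        return first + last
    if survivor_type == "Comma Separated Values":
        return f"{first},{last}"  # only meaningful for String fields
    raise ValueError(f"Unsupported survivor type: {survivor_type}")

print(survive("Marketing", "Sales", "Comma Separated Values"))  # Marketing,Sales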
Since CustomerID, CompanyName, and ContactName records are common in both the source files (Customers_Marketing and Customers_Sales), we will set the Survivor Type as First for these fields. For the other fields with missing records, the Survivor Type will be set as follows:
Field – Survivor Type
ContactTitle – First
Address – First
City – First
Region – Last
PostalCode – First
Country – First
Phone – Last
Fax – Last
DataSource – Comma Separated Values
Once you have set the Survivor Type, specify Precedence for each field. Precedence is the order in which you want the source data to be assessed. For instance, we have common data fields in both the sources, but different and missing records. We can set appropriate Precedence values to bring data from the desired data source.
Next, you can set a specific Condition, and the Merge transformation will process records based on the criteria specified for a particular field.
(In this case, we have specified ‘IsNotNull’ for Address and Region fields since we want none of these fields to be empty or have missing records.)
Depending on the requirements of the business case, you can add a logical expression in the Expression field to process the incoming data value and transform it into the output according to the logic defined. The Expression field can be used for mathematical and financial calculations, date and time manipulations, comparisons and conversion functions.
Click Next to proceed to the Merge Transformation Properties window. Here, you will see the following three checkboxes:
Case Sensitive – Check if data is to be assessed on a case-sensitive basis
Sort Input – Check if the incoming data is not already sorted
Version Order Descending – Check if you want the data to be sorted in a descending version order
Click Next to proceed to the General Options window. Here, you can add Comments, instructions, or any relevant information about the transformation. This will not change or alter your transformation action in any way.
You may also skip this step by clicking OK in the previous step (on the Merge Transformation Properties window) to close the properties window.
To get the output, right-click on the Merge transformation object, and click on Preview Output. You will get the merged records based on your specified transformation properties.
Data Preview before applying Merge transformation:
Data Preview after applying Merge transformation:
Merge transformations can be applied in cases where related data is scattered across different records and sources. Astera makes it convenient for users to consolidate data stored in different sources, while also giving them the flexibility to choose how the output should appear through the various transformation properties.
The Sequence Generator Transformation object in Astera is used to add sequences of integer values to a dataflow. The sequences can start with any number and have any step, for example, 50, 55, 60, 65 etc.
The Astera Data Stack can either create a sequence instantly at the dataflow’s run-time (this is called an in-memory sequence), or it can read sequence control data from a database table as your dataflow is executed.
In the case of in-memory sequences, a sequence always starts at the Start Value provided in the SeqGenerator: Context Information Properties window. In the case of database sequences, the last value used is recorded in the control database, and a new start value is used every time the sequence is invoked.
This makes it possible to generate ever-increasing values for the sequence each time the dataflow runs. In effect, such a sequence is a chain of sequences with non-overlapping values.
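Conceptually, an in-memory sequence behaves like a simple counter that restarts at the Start Value on every run; a minimal Python sketch (illustration only):
import itertools

# In-memory sequence: starts at the Start Value and advances by Step
def make_sequence(start, step):
    return itertools.count(start, step)

seq = make_sequence(start=50, step=5)
print(next(seq), next(seq), next(seq))  # 50 55 60
# A database-backed sequence would instead persist the last value used between runs.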
Here, we have retrieved data from an Orders table using a Database Table Source object. We will use the Sequence Generator Transformation object to generate a sequence for the OrderNo field in our source data. Let’s see how this works.
Drag the Sequence Generator Transformation object from the Transformations section in the Toolbox and drop it on to the dataflow designer.
Map the required fields from the source object to a destination object.
To configure the properties of the Sequence Generator Transformation object, right-click on its header and select Properties from the context menu.
This will open the Context Information Properties window.
In this window, you can choose between three different types of sequence generations and specify the Sequence Details.
A description of these three methods is given below:
In Memory: The sequence will be created in memory at the dataflow run-time. The sequence always starts at the specified Start Value in the sequence properties.
Sequence Details:
Start Value – The initial value for the sequence
Step – The increment value
Database Table: The sequence control information for the database table can be managed within Astera through the Manage Sequences option.
Connection: Specify the connection to the database where the sequences will be stored
Sequence: Select the sequence from the list of available sequences in the database.
Note: To manage database sequences, go to Menu > Tools > Sequences.
Batch Size: Specifies the minimum number of values to be allocated to the sequence.
Use Memory Sequence during preview: Prevents the user from breaking the sequence cycle during a data preview by substituting with an in-memory sequence, which does not affect (i.e. increase) the database sequence’s current value.
Sequence Object - The sequence control information is read from a sequence object in a SQL Server or Oracle database.
Connection: Specify the connection to the database that stores your sequences.
Sequence: Select the sequence from the list of available sequences.
Use Memory Sequence during previews: Prevents the user from breaking the sequence cycle during a data preview by substituting with an in-memory sequence.
Let’s specify the Sequence Details as follows:
Start Value: 0
Step: 1
In the destination object, a new field will be created where the sequence generator value will be mapped.
The NextVal field will be mapped to the OrderNo field in the destination object.
You can see the output of the Excel destination object in the Data Preview window.
The sequence has been generated in the new field, OrderNo.
This is how the Sequence Generator Transformation is used in Astera.
A Route transformation object invokes one or more paths in the dataflow, in accordance with some decision logic expressed as a set of Rules. Using the Route transformation object, you can create some custom logic involving multiple paths and adapt it to suit your data processing scenario.
For example, a record passing some rule will be written to Destination1, a record passing another rule will be written to Destination2, and a record which fails to pass any rules can still be written to a Destination, and be fed to a downstream transformation object.
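The routing idea can be sketched in Python as follows (the rule names, field, and first-match behaviour below are assumptions made for illustration, not Astera’s implementation):
# Hypothetical sketch: send each record down the first path whose rule it satisfies
records = [{"Amount": 1200}, {"Amount": 80}, {"Amount": -5}]

rules = [
    ("HighValue", lambda r: r["Amount"] >= 1000),      # e.g. mapped to Destination1
    ("LowValue",  lambda r: 0 <= r["Amount"] < 1000),  # e.g. mapped to Destination2
]

routes = {name: [] for name, _ in rules}
routes["Default"] = []  # records that fail every rule can still be written somewhere

for record in records:
    for name, rule in rules:
        if rule(record):
            routes[name].append(record)
            break
    else:
        routes["Default"].append(record)

print(routes)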
To add a Route transformation object, drag the Route object from the Transformations section in the Toolbox, and drop it on the dataflow designer.
An example of what a Route object might look like is shown below:
To configure the properties of a Route object after it was added to the dataflow, right-click on the object’s header, and select Properties from the context menu. The following properties are available:
Layout Builder window:
The Layout Builder window allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Layout Builder will be added to the Input node inside the object box, as well as in all Rule nodes corresponding to the number of rules created (see below).
Route Transformation Properties window:
The Route Transformation Properties window provides the interface for managing Route rules.
Type a descriptive name for the rule in the Description field.
Click Compile to check for any syntax errors in your rule. The Compile Status should read “Successful” for a successful compilation.
To activate the rule, check the Active checkbox.
General Options window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When this option is unchecked, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect their statistics.
The Comments input allows you to enter comments associated with this object.
An example of using the Route transformation object is shown below.
The Passthru Transformation object creates a new dataset based on the elements that were passed to the transformation. This is useful for organizing datasets for better readability and grouping of values that are otherwise calculated over and over again (e.g. a Sequence Generator Transformation).
In this document, we will learn how to use the Passthru Transformation object in Astera.
The source file contains customers’ information in the parent node and their order and shipping details in the collection/child node.
Preview the data by right-clicking on the source object’s header and selecting Preview Output.
A Data Preview window will open, showing you the preview of the hierarchical data.
Now, we want to create a field in the parent node that contains the count of orders that arrived late for each customer and write these records to a destination file. This new field in the parent node will depend on two fields, RequiredDate and ShippedDate, that are already present in the collection/child node.
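Conceptually, the value we want in the new parent field can be sketched in Python like this (the dates are made up; this is not Astera syntax):
from datetime import date

# One customer's Orders collection (child records)
orders = [
    {"RequiredDate": date(2018, 1, 10), "ShippedDate": date(2018, 1, 12)},  # shipped late
    {"RequiredDate": date(2018, 1, 15), "ShippedDate": date(2018, 1, 14)},  # on time
]

# LateOrders: count of orders shipped after their required date
late_orders = sum(1 for o in orders if o["ShippedDate"] > o["RequiredDate"])
print(late_orders)  # 1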
For this purpose, we will use the Passthru Transformation object.
To get a Passthru Transformation object from the Toolbox, go to Toolbox > Transformations > Passthru and drag-and-drop the Passthru object onto the designer.
You can see that the dragged transformation object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map source fields to the transformation object by dragging-and-dropping the top node of the source object, SampleData to the top node of transformation object, Passthru.
Now the mapping is done. Let’s configure the Passthru Transformation object.
Right-click on the transformation object’s header and select Properties from the context menu. A configuration window for the Passthru Transformation object will open. The first window is the Layout Builder window. This is where we can create or delete fields, change their name or data types, mark any field as a variable, or attach expressions to any field.
Create a new field under the parent node. Let’s call it LateOrders and mark it as an Output field.
Click Next, and you will be directed to a Layout Modifications window.
Select the LateOrders field. Here you can modify any field by applying expressions to it.
Write the expression that counts the number of late arriving orders. Click “…” , located next to the expression box to open the Expression Builder window where you can make use of Astera’s extensive library of built-in functions and expressions.
Click on the Orders node and a panel (Options for Collection) will appear. These options are only available for collection nodes. This is where we will specify the rule for routing late arriving orders. Select Route Based on Rules and a new section for adding rules will appear on the screen.
Add a new rule by clicking on the Add switch condition icon
Now, write an expression to route late arriving orders and name this rule as “LateArrivals”.
Click OK. Now observe that a new collection node, Orders_LateArrivals, has been added to the Passthru Transformation object.
To preview data, right-click on the header of the transformation object and select Preview Output from the context menu.
A Data Preview window will open. On expanding records, you will get corresponding order details and appended details of late arriving orders.
To store the output for late arriving orders, write it to a destination file.
Configure settings for Excel Workbook Destination object.
Click on the Start Dataflow icon located in the toolbar at the top of the designer window to create the destination file.
Upon clicking this icon, an Excel file will be created. You can find the link to this file in the Job progress window.
The Normalize transformation object in Astera Data Stack is used to create one-to-many mappings. It allows users to create multiple records from a single record by transposing the columns in a dataset into rows. In other words, you can take a dataset that has many columns and turn it into one that has many rows.
In this use case, we have a sample Taxpayers Excel dataset that contains information on the types and amounts of taxes paid by taxpayers. This includes taxpayers’ Social Security Number (SSN) and the different types of taxes that they have paid. These types are divided into different fields, such as City, County, State, and Federal, with each column containing the amount paid by each customer for a particular tax type. Our goal is to reduce the number of fields and increase the number of records by specifying the tax type in one consolidated field. To do this, we will use the Normalize object in Astera.
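The wide-to-long reshaping the Normalize object performs can be sketched in Python (the amounts below are made up; this is an illustration, not Astera’s implementation):
# Hypothetical sketch: turn one wide taxpayer record into several narrow records
record = {"SSN": "123-45-6789", "City": 120.0, "County": 80.0, "State": 300.0, "Federal": 950.0}

normalized = [
    {"SSN": record["SSN"], "TaxType": tax_type, "TaxAmount": record[tax_type]}
    for tax_type in ("City", "County", "State", "Federal")
]

for row in normalized:
    print(row)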
Drag the relevant source object from the Toolbox and drop it onto the designer. In this case, we will select the Excel Workbook Source object from Toolbox > Sources > Excel Workbook Source and configure it so that it reads data from the Taxpayers’ dataset.
To preview the data, right-click on the object header and select Preview Output from the context menu. Here is a look at the dataset:
Drag the Normalize object from Toolbox > Transformations > Normalize and drop it onto the designer.
You will notice that the object header contains one Output node and two Input nodes by default.
Any field mapped as a new member to one Input node will appear in all of the input nodes as well as the Output node. In this case, we will map the SSN field from the source object to an Input node.
Right-click on the header of the Normalize object and select Properties from the context menu.
A dialogue box will appear.
This dialogue box is used to configure the Normalize object.
In the Layout Builder window, create the layout of your normalized dataset by providing field names. In this case, we have already mapped SSN from the source and will create two new fields, one for the TaxAmount and the other for the TaxType.
Once you have created the layout, click Next.
In the Normalize (One to Many) Transformation Properties window, make appropriate selections for the following options:
Number of Mapping Groups: Here, you can specify the number of mapping groups that are required. Increasing this number from 2 will also increase the number of Input nodes in the object. In this case, there are four tax types. Hence, we will increase the number to 4.
Omit Record If this Element is Null: From this drop-down menu, you can select a field from your layout. If an element in this field is null, the entire record containing that null element will be omitted from the dataset. In this case, we will keep the default option, which denotes that this setting will not apply to any field.
Once you have made the required selections, click Next.
On the last window, which is the General Options window, you will be provided with an empty text box for Comments. Moreover, you can also select a few options that are common to most objects in Astera.
Clear Incoming Record Messages: When this option is checked, any messages coming in from the preceding object will be cleared.
Do Not Overwrite Default Values with Nulls: When this option is checked, actual values are not overwritten with null values in the output.
In this case, we will leave the options unchecked. Once you are done, click OK.
Now that you have configured the Normalize object, you will notice that new input nodes have been added to the object based on our selection for the Number of Mapping Groups option. Each node contains the layout we specified in the Layout Builder window.
The next step is to make the required mappings from the source object to the Normalize object. These are the mappings needed for this particular use case:
Map SSN from the Excel Workbook Source object to SSN in all four input nodes of the Normalize object.
Map City to TaxAmount in the first input node, County to TaxAmount in the second input node, State to TaxAmount in the third input node, and Federal to TaxAmount in the fourth input node.
Map the City Field Name to TaxType in the first input node, the County Field Name to TaxType in the second input node, the State Field Name to TaxType in the third input node, and the Federal Field Name to TaxType in the fourth input node. To map field names, right-click on the mapping link, hover over Change Map Type, and select Field Name.
Here is what the final dataflow should look like:
Preview the output to have a look at the normalized dataset.
You can map these fields further to other objects in the dataflow using the output node of the Normalize object.
This concludes using the Normalize object in Astera.
The Tree Transform object in Astera enables users to transform data in a hierarchical structure. Users can create new fields in the parent node based on the data in the collection/child node. Tree Transform supports rule-based routing, filtering, merging and sorting of data while maintaining its hierarchy.
In this document, we will learn to use the Tree Transform object in Astera.
The source file contains customers’ information in the parent node and their orders and shipping details in the collection/child node.
You can preview this data by right-clicking on source object’s header > Preview Output.
A Data Preview window will open, displaying a preview of the hierarchical source data.
Now, we want to create a new field in the parent node that contains the count of orders that arrived late for each customer and route these records to a destination file while maintaining the hierarchical format of the dataset.
This new field in the parent node will depend on two fields: RequiredDate and ShippedDate, that are already present in the collection/child node.
In other words, we are trying to transform hierarchical data without flattening its structure. For this purpose, we will use the Tree Transform object.
To get a Tree Transform object from the Toolbox, go to Toolbox > Transformations > Tree Transform and drag-and-drop the Tree Transform object onto the designer.
The transformation object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map source fields onto the transformation by dragging and dropping the root node of the source object CustomerOrders onto the root node of the transformation object – TreeXform.
Now that the mapping is done, let’s configure the TreeXform object.
Right-click on the object’s header and select Properties from the context menu. A configuration window for Tree Transform will open. The first window is the Layout Builder window. This is where we can create or delete fields, change their name or data types, mark any field as a variable or attach expressions to fields.
Create a new field under the parent node. Let’s name it LateOrders.
Click Next, and you will be directed to a Layout Modifications window.
Select the LateOrders field. Here you can modify any field by applying expressions to it.
Write the expression that counts the number of late arriving orders.
Click “…” next to the expression box to open the Expression Builder where you can make use of the extensive library of built-in functions and expressions in Astera.
This creates the field values for LateOrders based on the field values of RequiredDate and ShippedDate fields.
Click on Orders node and a panel (Options for Collection) will appear. These options are only available for collection nodes.
Options for Collection:
Show – Shows the collection node in the Tree Transform object.
Flatten With Item Count – Flattens data based on the record count in the collection node against each item in the parent node.
Flatten Based on Rules – Flattens a part of hierarchical data based on predefined rules.
Route Based on Rules – Routes and creates subsets of data based on predefined rules.
Merge With Item Count – Merges data based on the record count in the collection node against each item in the parent node.
Hide – Hides collection node from the Tree Transform object.
Calculation Formula – An expression box used for writing rules to route or flatten hierarchical data.
Sort Based On Keys – Sorts hierarchical data based on the field in the collection node. Only available for Show and Flatten With Item Count options.
This is where we will specify the rule for routing late arriving orders. Select Route Based on Rules. A new section for adding rules will appear in the window.
Now, write an expression to route late arriving orders and name this rule “LateArrivals”.
Click OK. Now observe that a new collection node Orders_LateArrivals has been added to the Tree Transform object.
To preview data, right-click on the header of the transformation object and select Preview Output from the context menu.
A Data Preview window will open. On expanding the records, you will get corresponding order details and appended details of late arriving orders.
To store the output for late arriving orders, you can write it to a destination file or use that data further in the dataflow.
This concludes using the Tree Transform transformation object in Astera.
The Sort Transformation object in Astera is used to sort an incoming data stream. It also provides the option to remove duplicate values from the input.
It is a blocking transformation, which means that input records are accumulated until the end of the input. Blocking transformations affect the performance of the overall dataflow because subsequent steps cannot be executed until all the records have been received and processed by the blocking transformation.
The Sort Transformation uses storage on the server for temporary data during sorting. The server must have enough capacity to store the entire data set and index.
We have retrieved the OrderDetails data from a database table. The dataset contains fields such as OrderID, ProductID, UnitPrice, Quantity, and Discount. This data is unsorted and we want to sort it in the ascending order of UnitPrice.
Drag the Sort Transformation object from the Transformations section in the Toolbox and drop it on the dataflow designer.
Map fields from the source object to the Sort Transformation object.
To configure the properties of the Sort Transformation object, right-click on its header and select Properties from the context menu.
A Layout Builder window will appear.
In this window you can either:
Add Member Objects or Collection Objects to the layout.
Edit the elements of the Sort object. The Layout Builder allows you to add or remove fields in the layout, as well as select their data type. The fields added in the Layout Builder will be added to the Input node inside the object box. Once you’re done making changes to the layout, click Next.
The next window is the Sort Transformation Properties window.
Here, you can specify the sorting criteria. You will see the following options on this screen:
Return Distinct Values Only: Check this option if you want to remove duplicate values from the output.
Treat Null as the Lowest Value: Check this option if you want a null value to be returned first in the ascending sort order, and conversely, have the null value returned last in the descending sort order.
Case Sensitive: Check this option if you require case sensitive comparison for strings.
On the same screen, you need to select the sorting Field from the drop-down list and set the Sort Order as Ascending or Descending.
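For reference, an ascending sort that treats null as the lowest value (the Treat Null as the Lowest Value option above) can be sketched in Python with made-up values:
# Hypothetical sketch: None (null) sorts first in ascending order when treated as the lowest value
unit_prices = [14.0, None, 9.8, 31.0, None]

sorted_prices = sorted(unit_prices, key=lambda v: (v is not None, v if v is not None else 0))
print(sorted_prices)  # [None, None, 9.8, 14.0, 31.0]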
The last window is the General Options window. Here you can add Comments or specify some General Options. Once done, click OK and the window will close.
You can now map the Sort Transformation object to a destination and preview the output.
The output now shows the entire source data sorted in the ascending order of UnitPrice.
This is how the Sort Transformation can be used in Astera.
The Switch Transformation object matches source data against criteria specified by the user and, wherever the criteria are met, replaces the information in the particular field with the desired output (also specified in the layout). This gives users more control over their data and helps them manage it better.
There are two modes in Switch transformation:
Basic Mode
Enhanced Mode
The Basic Mode in the Switch transformation object matches specific values in the source data and replaces them with the desired output. Enhanced Mode enables users to set lookup criteria by writing expressions, which makes the feature more flexible.
Select your source by dragging the relevant object from the Sources section in the Toolbox onto the dataflow designer, and configure the connection by entering the relevant details.
After setting up the connection and configuring the source file, drag the Switch transformation object from the Toolbox. If the Toolbox is hidden, go to View > Toolbox > Transformations > Switch.
Map the required fields from the source to the Switch transformation object.
Either double-click on the Switch Transformation object to open the Properties window or right-click on the object and go to Properties from the list.
The first window is the Layout Builder window. Here you can manage the fields (add and/or remove the fields) to make your Switch field layout.
After specifying the layout and selecting the relevant output field, click Next. This will take you to the Switch Map Properties window. At this stage, you can select the mode of the Switch transformation and assign the rules in the Case Value and Output Value sections.
Astera will look for the values specified in the Case Value column in the source file and replace them with the corresponding values in the Output Value column.
In this example, the source table contains information about departments in numbers. We will use the Switch transformation object in basic mode, to switch the stored numeric information with the descriptive information.
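In Python terms, Basic Mode behaves like a simple value lookup; the department codes and names below are made up, and the pass-through behaviour for unmatched values is an assumption:
# Hypothetical sketch: replace numeric department codes with descriptive names
case_map = {1: "Sales", 2: "Marketing", 3: "Finance"}  # Case Value -> Output Value

records = [{"Name": "A. Khan", "Department": 1}, {"Name": "B. Lee", "Department": 3}]
for r in records:
    r["Department"] = case_map.get(r["Department"], r["Department"])  # unmatched values left as-is

print(records)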
Data Preview (Before Switch)
Data Preview (After Switch)
Steps 1-5 are going to remain the same even when working with the Enhanced Mode in Astera.
After you have created the layout in the Layout Builder window in the object’s properties, click Next and go to the Switch Map Properties window and select Enhanced Mode.
An organization stores information about employees’ salaries. It has set criteria for issuing credit cards, which depend on an individual’s salary. In this scenario, to see which individual is eligible for which perk, define the salary range in the Case Expression field and specify the corresponding output in the Output Expression section (see the screenshot above). To store the information in a separate field, we created a new field (CreditCard) in the Layout Builder and selected it as the Output.
Data Preview (Before Switch)
Data Preview (After Switch)
The Subflow transformation object is used to call a Subflow that will run as part of your Dataflow. The Subflow acts like a wrapper for the objects it contains. Subflows can be seen as ‘black boxes’ inside your Dataflow, simplifying and streamlining the Dataflow design, increasing reusability, achieving an easier-to-understand view of your Dataflow, and possibly eliminating the need to know what is going on inside the Subflow so that you can focus on the output it creates. Over time, if the logic inside your Subflow changes, you can modify the Subflow, and the modified Subflow can now be used by the Dataflow calling your Subflow.
Subflows can be nested, meaning that a Subflow can call other Subflows. The output of the Subflow can be fed into downstream objects on your Dataflow, just like the output of any Dataflow object.
To add a Subflow transformation object, drag the Subflow object from the Transformations section in the Toolbox and drop it on to the dataflow designer.
An example of what a Subflow object might look like is shown below.
To configure the properties of a Subflow object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Subflow Properties window:
Enter the file path of your subflow in the Path input. Using UNC paths is recommended to allow for remote execution of your dataflow on a server.
General Options Window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object, and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When unchecked, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect their statistics.
The Comments input allows you to enter comments associated with this object.
Creating a Subflow is similar to creating a regular Dataflow, because a Subflow is essentially a Dataflow or a sub-Dataflow. The difference between the two, however, is that a Subflow may optionally have an input and output.
The Subflow input makes it possible to feed data into the Subflow from an upstream object on the Dataflow that calls the Subflow. The Subflow’s output is used to send data to the downstream Dataflow object connected to the Subflow.
To create a new Subflow, go to File > New > Subflow on the main menu.
Designing a Subflow is similar to designing a Dataflow. For more information on working with Dataflows, see the Creating Dataflow chapter.
When a Subflow tab is active, the flow Toolbox has an additional group labeled Subflow. This group has two objects that control the input and output properties of your subflow.
Subflow Input object is a connector controlling the input layout of your Subflow. Any data feeding into the Subflow will pass through the Subflow Input when the Subflow is called by a Dataflow or another Subflow.
To add the Subflow Input, drag the Subflow Input object from the Subflow group in the Toolbox and drop it on to the Subflow designer.
To configure the properties of a Subflow Input object after it is added to the Subflow, right-click on it and select Properties from the context menu. The following properties are available:
Layout Builder screen:
The Meta Object Builder screen allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Meta Object Builder will show in the SubflowInput1 node inside the object box.
General Options screen:
This screen shares the options common to most objects on the dataflow.
Subflow Output object is a connector controlling the output layout of your subflow. Any data leaving the subflow will pass through the Subflow Output when the subflow is called by a dataflow or another subflow.
To add the subflow output, drag the Subflow Output object from the Subflow group in the Flow toolbox and drop it on to the subflow.
To configure the properties of a Subflow Output object after it was added to the Subflow, right-click on it and select Properties from the context menu. The following properties are available:
Layout Builder window:
The Meta Object Builder window allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Meta Object Builder will show in the SubflowOutput1 node inside the object box.
General Options window:
This screen shares the options common to most objects on the Dataflow.
Some examples of using Subflows are shown below:
The Tree Join Transformation object in Astera enables users to create complex, hierarchical data structures such as EDI or XML documents with ease. Unlike the standard relational join which combines left and right elements to create a new record, the Tree Join Transformation object allows users to create collection and member nodes. It also enables users to join datasets in parent-child hierarchies using a key field. It is a set level transformation that operates on a group of records.
In this document, we will learn to use the Tree Join Transformation object in Astera.
In this use case, we have two different source datasets. The first dataset contains information about Customers and has fields such as CustomerName, CustomerID, Address, etc.
The second dataset contains details of Orders placed by customers. It includes fields such as OrderID, CustomerID, and order details such as RequiredDate, ShippedDate and other shipping details.
We will join these two datasets using the Tree Join Transformation object and create a hierarchical dataset in which all orders placed by a customer along with the order details are represented in a parent-child hierarchy.
Contrary to the regular Join that joins two datasets in a flat layout, the Tree Join Transformation joins data from two different datasets into a hierarchical data structure.
In this use case, each record from the first dataset that contains Customer details will be a parent node, and under the parent node, the details of Orders placed by that customer will be returned in a child node.
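The parent-child shape being built can be sketched in Python as a grouping on CustomerID (the sample records below are made up; this is an illustration, not Astera’s implementation):
# Hypothetical sketch: nest each customer's orders under the customer record
customers = [{"CustomerID": "C001", "CustomerName": "Example Traders"}]
orders = [
    {"OrderID": 1001, "CustomerID": "C001", "ShippedDate": "2018-01-12"},
    {"OrderID": 1002, "CustomerID": "C001", "ShippedDate": "2018-02-03"},
]

tree = []
for customer in customers:
    children = [o for o in orders if o["CustomerID"] == customer["CustomerID"]]
    tree.append({**customer, "Orders": children})  # Orders becomes the child collection

print(tree)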
1. To get the Tree Join Transformation object from the Toolbox, go to Toolbox > Transformations > Tree Join and drag-and-drop the Tree Join Transformation object onto the designer.
2. Now map fields from the Customer source dataset to the TreeJoin object.
3. Right-click on the Tree Join object header and go to Properties from the context menu. In the Tree Join Layout Builder window, you can see the fields from the Customer dataset listed under the root node.
4. Next, click on the TreeJoin node, you will see that the small icons or buttons at the top of the screen will become active. If you click on the icon, you will get two options:
Add Member Object – To add a new member node to your layout
Add a Collection Object – To add a new collection node under the parent node. It will return all corresponding records as a collection under the parent node.
In this case we will Add a Member Object to create a separate record for each order placed by a customer, under a separate customer record node.
5. Add a Member Object to this root node. A dialogue box will open to name your member object.
In this case, let’s name it ‘Orders’ and click OK. A member object has been added to our parent node.
6. Click OK, to close the properties window. Now map the Orders dataset to the member node that we created in the previous step to complete the layout.
7. Go to the properties of the TreeJoin object again. We have already created the layout, so we will proceed to the next window.
8. In the TreeJoin Transformation Properties window, we must specify the Join Key.
The join key is a common field or identifier in both datasets that is used to identify and join records in a tree-like structure. The parent and child fields are the same field, present in both source datasets, and serve as the key identifier to join records.
Parent Field – Join field from the first dataset.
Child Field – Same field as the parent field, selected from the second dataset.
In this case, the CustomerID field is common in both the datasets, so we will use it as the join Key.
9. Click on the Parent field dropdown button. Expand the TreeJoin node and select the CustomerID field.
10. Click on the Child field column and expand the TreeJoin root node. Scroll down to your member node, expand this node and select the CustomerID field from the second dataset.
Let’s discuss the other options on the properties window:
Join In Database – Lets you join the tables in the database itself rather than in-memory. However, it applies only when both the tables are sourced from the same database.
Case Sensitive – To process and join records on a case sensitive basis.
11. Now that the layout and the TreeJoin properties are ready, click OK.
12. Right-click on the TreeJoin object and select Preview Output.
The TreeJoin object has returned the customer records in parent nodes. Upon expanding the node, you can see the order placed by the customer listed as its member unit under the parent node.
If we choose to Add a Collection Object in the Layout Builder, all the records for orders placed by a customer will be returned in a collection under a single parent node for each customer.
13. The joined dataset can now be written to a desired destination. In this case we will write it to an XML File Destination object.
This concludes using the Tree Join Transformation object in Astera.
You can download the file for this use case from the following link:
The File Lookup Transformation object in Astera is used to look up values coming from a source. It uses an Excel or delimited file, which contains the lookup values as well as the output values, to perform the lookup. A file lookup can be performed based on a single lookup field as well as a combination of fields.
Similarly, a File Lookup Transformation object can return a single output field from a lookup table or a combination of fields. In either case, the output field or fields are returned from the records in which the lookup values match the incoming values.
In this scenario, we have a Customers dataset from a fictitious organization stored in a database source. It contains information about customers from different countries. We want to replace the country names in our database with country codes, by switching them with the lookup values (country codes) stored in an Excel file. To achieve this, we will use the File Lookup Transformation object.
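The lookup idea can be sketched in Python using a small delimited lookup table (the file content, column names, and values below are assumptions made for illustration):
import csv
import io

# Hypothetical lookup file: Country -> Code
lookup_file = io.StringIO("Country,Code\nGermany,DE\nMexico,MX\nUK,GB\n")
lookup = {row["Country"]: row["Code"] for row in csv.DictReader(lookup_file)}

records = [{"CustomerID": "ALFKI", "Country": "Germany"}, {"CustomerID": "ANATR", "Country": "Mexico"}]
for r in records:
    r["Country"] = lookup.get(r["Country"], r["Country"])  # keep the source value if no match is found

print(records)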
Select the relevant source object from the Sources section in the Toolbox. In this example, we will use Customers data stored in a database table.
Right-click on the source object’s header and select Preview Output. You can see the country names in the Country field which we want to convert into country codes.
Drag-and-drop the File Lookup Transformation object from Toolbox > Transformations > File Lookup onto the dataflow designer.
Now, let’s configure the Transformation object.
Right-click on the header of File Lookup Transformation object and select Properties.
A File Lookup Map Properties window will open where you can see an option for Source File Type.
Source File Type: Here, you need to specify the type of your lookup file.
You can perform the task using an Excel or Delimited lookup file.
Select the Source File Type from the dropdown menu. In this example, our country codes are stored in an Excel file so we will specify the Source File Type as Excel.
Click Next to proceed to the File Lookup Details window. You will see two options:
File Location: Here, you need to specify the File Path to the lookup source file.
Options:
First Row Contains Header: You can check this option if your lookup file contains a header in the first row. Otherwise, you can leave it unchecked.
Worksheet: If your lookup file contains multiple worksheets, you can select the worksheet you want to use to perform the lookup.
Click Next to proceed to the Layout Builder. Here, you can make changes to the object’s layout by modifying the existing fields or creating new fields.
Once you are done, click Next.
On the next window, you will see various Lookup Options.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value if more than one match is found for a single value in your lookup file. The option expands into a drop-down list where you can select one of the following three options:
Return First: Will return the first matched value found.
Return Last: Will return the last value among all the matched values.
Return All: Will return all the values in the lookup file that match a source value.
If Value Is Not Found In The Lookup List:
In case no lookup value is found against a source value, you can choose one of the following three options to be appended to your output.
No Message: There will be no message and the output will be the same as the input value.
Add Error: An error message will appear with the output.
Add Warning: A warning will appear with the output.
If Value Is Not Found In The Lookup List, Assign Value:
If no lookup value is found against a source value, you can assign an output value of your choice.
Assign Source Value: Will return the source value in the output.
Assign Null: Will assign null to your output value.
This Value: You can select this option and assign any value of your choice.
Click Next. This will take you to the Config Parameters window, where you can further configure and define parameters for the File Lookup Transformation object.
Once you have configured the File Lookup Transformation object, click OK.
Map the Country field from source object to the Country field in the File Lookup Transformation object. Now map the Code field from the transformation object to the Country field in the Database Table Destination object.
This is what your dataflow will look like:
Map the remaining fields from the source object to the destination object.
Right-click on the destination object’s header and select Preview Output.
You can see that the country names in the database table have been successfully converted into country codes.
This concludes using the File Lookup Transformation object in Astera.
The Union Transformation object in Astera is used to combine incoming data from two or more inputs into a single output. Its functionality is similar to the UNION operator in a SQL query. It has multiple input nodes and a single output node. It puts together two sets of data irrespective of any repetition that might occur in the datasets. To perform this transformation on two datasets, their cardinality must be the same.
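The combining behaviour can be sketched in Python (field names follow the example in this section; this is an illustration, not Astera’s implementation):
# Hypothetical sketch: append two customer lists into one output, tagging each record's source
marketing = [{"CustomerID": 1, "CompanyName": "Acme"}]
sales = [{"CustomerID": 2, "CompanyName": "Globex"}]

union = (
    [{**r, "Category": "Marketing"} for r in marketing]
    + [{**r, "Category": "Sales"} for r in sales]
)  # duplicate records, if any, are kept

print(union)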
To work with a Union Transformation object, drag-and-drop the Union Transformation object from Toolbox > Transformations > Union onto the dataflow designer.
Map the Customers_Marketing data to Input_1 and Customers_Sales data to Input_2 in the Union Transformation object.
Now, right-click on the Union Transformation object’s header and select Properties.
The first window is the Layout Builder window, where you can customize your layout or modify your fields. You can also provide a default value to be used in place of null values.
Add a new field, name it Category and specify its Data Type as String.
Click Next.
Next is the Union Transformation Properties window, where two input nodes, Input_1 and Input_2, are defined by default. You can rename them if you want, and you can define any number of input nodes based on the number of datasets you want to combine using the Union Transformation object.
Click OK.
Now, map the categories of respective departments from the Variables resource object to the Category field in the Union Transformation object. This is done to identify which department a particular record is coming from.
Now, we have successfully configured the Union Transformation object.
Right-click on the Union Transformation object’s header and select Preview Output.
You can see that the Union Transformation has successfully combined the two datasets into a single, unified dataset.
You can now further transform your dataset or write it to a destination.
This concludes working with the Union Transformation object in Astera.
The Data Cleanse Transformation object is a new addition to Astera's library of transformations. It makes it convenient for business users to cleanse raw data and present it in a more refined, standardized, and enterprise-ready format. Using the Data Cleanse Transformation object, users can strip data of null values and of redundant text and characters, and prepare raw data for transformation, validation, profiling, and record-matching functions.
Now drag the Data Cleanse Transformation object from the Transformations section in the Toolbox and drop it onto the designer.
This is what a Data Cleanse transformation object looks like.
Map data from the source object to the Data Cleanse Transformation object. You can either auto-map the entire data set or map a few fields manually.
Now you have to specify the criteria for data cleansing. Right-click on the Data Cleanse Transformation object and select Properties from the context menu.
This will open a new window where you have to set up the properties for data cleansing. The first window is the Layout Builder window. Here you can customize the layout of your dataset by adding, removing or renaming fields. Once you have created the layout, click Next to proceed to the next window.
This is where you set up the data cleanse criteria for your source data.
You can find various data cleanse options arranged in different sections. Let’s explore them one by one.
The options provided within this category allow you to remove values, spaces, tabs, and line breaks from your data. You can find the following options within this category:
All whitespaces – Removes all whitespaces from the data
Leading and trailing whitespaces – Removes whitespaces preceding and succeeding the values
Tabs and line breaks – Removes tabs and line breaks within source values
Duplicate whitespaces – Removes double spaces from the data
Letters – Removes all letters from the data
Digits – Removes all digits from the data
Punctuation – Removes all punctuation from the data
Specified Character – Removes any specific character from the data
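As a rough, non-Astera illustration of these removal options, the following Python snippet (standard re module; the sample value is made up) applies a few of them to a single string:

```python
import re

value = "  John\tDoe,  42 Main St.\r\n"

no_tabs_breaks   = re.sub(r"[\t\r\n]", "", value)          # Tabs and line breaks
no_dup_spaces    = re.sub(r" {2,}", " ", no_tabs_breaks)   # Duplicate whitespaces
trimmed          = no_dup_spaces.strip()                   # Leading and trailing whitespaces
no_digits        = re.sub(r"\d", "", trimmed)              # Digits
no_punctuation   = re.sub(r"[^\w\s]", "", no_digits)       # Punctuation
no_specific_char = no_punctuation.replace("J", "")         # Specified Character

print(no_specific_char)
```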
As the name suggests, the options within this category allow you to replace null values inside a string or numeric field with a corresponding value – a blank in the case of a string and a zero in the case of a numeric field.
Null strings with blanks: Replaces all null strings with blanks
Null numerics with zeros: Replaces all null numeric values with zeros
The Find and Replace options enable users to replace a value in the source dataset with another value.
It also provides users the option to choose whether the find and replace function should be performed on a case-sensitive basis. You can select a search mode from three options:
Normal – Will perform a normal find and replace function
As in this example, we want to change the status from ‘Planned’ to ‘Scheduled.’
So, we will type in ‘Planned’ in the Find field and ‘Scheduled’ in the Replace field.
Now, if we look at the output, we can see that the Data Cleanse Transformation Object has found and replaced the status values from ‘Planned’ to ‘Scheduled.’
Extended – Allows you to search for tabs (\t), newlines (\r\n), or a character by its value (\o, \x, \b, \d, \t, \n, \r and \) and replace it with the desired value
In the example below, we want to replace whitespaces within our source values with a hyphen (-).
So, we will type ‘\s’ in the Find field and ‘-’ in the Replace field.
Now, if we look at the output, we can see that the Data Cleanse Transformation object has found and replaced whitespaces from within the values with a hyphen.
Preview before applying the “Extended” Find and Replace function.
Preview after applying the “Extended” Find and Replace function.
Regular Expressions – Allows you to find and replace a value based on a regular expression.
In the example below, we want to replace the “ALFKI” value(s) in the CustomerID field with “A1234”.
For this, we will write a regex in the Find field and the desired value in the Replace field.
Now, if we look at the preview, you can see that Astera has replaced values in the source data with the desired values.
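For reference, the three search modes above behave roughly like the following plain Python/regex operations (illustration only; the sample values mirror the examples above):

```python
import re

status = "Planned"
address = "42 Main Street"
customer_id = "ALFKI"

# Normal: literal find and replace.
print(status.replace("Planned", "Scheduled"))      # -> Scheduled

# Extended: escape sequences such as \s for whitespace, replaced with a hyphen.
print(re.sub(r"\s", "-", address))                  # -> 42-Main-Street

# Regular Expressions: match a pattern and replace it with the desired value.
print(re.sub(r"^ALFKI$", "A1234", customer_id))     # -> A1234
```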
Case options allow users to convert the letter case of source data to Upper, Lower, or Title case.
You can choose from the following options:
None – Keeps the letter case as is.
Upper – Changes all letters to upper case.
Lower – Changes all letters to lower case.
Title – Changes all letters to title case.
The Modify Data option provides you the flexibility and convenience of applying an expression to all fields in your data. Check the Run expression on all fields option to activate this feature.
The Run Expression on all fields feature was previously called ApplyToAll and was offered as a standalone transformation in Astera 7.5 and earlier releases. However, its functionality was limited compared to the all-new Data Cleanse Transformation object, which is why it was replaced altogether by the Data Cleanse Transformation object in Astera 7.6 and, now, in Astera 8.0.
The Run Expression on all fields feature is enabled by default for any existing flows created prior to Astera 7.6. This means that existing flows created in Astera 7.5 or an earlier release will continue to work seamlessly on 8.0 or later upgrades and won't require any modification at all.
Here, you can choose from the extensive library of built-in expressions and apply it to all mapped fields by adding it to a “$FieldValue” parameter.
As in this example, we have mapped a regular expression to the “$FieldValue” parameter.
Now if we look at the preview, you can see that Astera has applied the regular expression to all fields and removed whitespaces from the values.
Preview before running the expression on all fields:
Preview after running the expression on all fields:
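Conceptually, running one expression on all fields is similar to the following Python sketch (pandas assumed; field names and values are hypothetical), where the same whitespace-removing expression is applied to every mapped column, much like the $FieldValue parameter:

```python
import pandas as pd

df = pd.DataFrame({
    "Name":    [" Ana  Trujillo ", " Antonio  Moreno "],
    "Country": [" Mexico ",        " Mexico "],
})

# The "expression" removes all whitespace from a field value, mirroring the
# regular expression used in the example above.
expression = lambda column: column.str.replace(r"\s+", "", regex=True)

# Apply the same expression to every field (column) in the dataset.
cleansed = df.apply(expression)
print(cleansed)
```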
This function was previously performed using the ApplyToAll transformation in Astera 7.5 and previous releases. However, now you can perform this and other data cleanse tasks using the Data Cleanse Transformation object.
Astera provides an array of source options to read and extract data from. Different source objects can be found in Toolbox > Sources.
In this article, we will discuss:
How various sources in dataflows can be used as a transformation.
Some common scenarios where you could use a source as a transformation.
While the basic function of source objects in dataflows is to extract data and bring it to the designer for further integration, a source object can also be used as a transformation function.
To use a source object as a transformation, you will need to:
Select the relevant source object from Toolbox > Sources and drag-and-drop it onto the designer.
Right-click on the source header and select Transformation from the context menu.
As soon as the Transformation option is selected from the context menu, the header color of the source object will change from green to purple. This is because, by default, Source objects in Astera's dataflows are indicated by a green header and Transformation objects are indicated by a purple header. Hence, the change in color.
Listed below are the source objects that can be used as a transformation:
Generally, source objects are used as transformations when the source file path is dynamic.
In the next section of the article, we will discuss how to use a Delimited File Source object as a transformation.
A Delimited File Source object can be used as a transformation when it takes a dynamic file path, that is, when multiple files with the same layout are processed in a single dataflow or workflow.
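Conceptually, this is what the source-as-transformation pattern achieves; the sketch below (plain Python, with a hypothetical folder path and field name) reads every delimited file in a folder with the same layout and processes each one in turn:

```python
import csv
import glob

# The file path is dynamic: each file found in the folder is fed to the same reader.
for file_path in glob.glob(r"C:\Data\Orders\*.csv"):
    with open(file_path, newline="") as f:
        reader = csv.DictReader(f)                 # same delimited layout for every file
        for record in reader:
            print(file_path, record["OrderID"])    # downstream mapping/processing
```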
Drag-and-drop the Delimited File Source object onto the designer.
Go to the object’s properties and provide the File Path for the delimited source file.
Once you have provided the File Path and configured the properties of the source object, click OK. Now, right-click on the header and select Transformation from the context menu, to change it to a Transformation object.
The header of the Delimited Source object will change to purple indicating that the source object is now converted into a Transformation object.
The transformed DelimitedSourceTrans object will now have two nodes:
Input node: To map the file path of the folder that contains delimited files that are to be processed.
Output node: On expanding the Output node, you will see the data fields in the delimited source file. Map these fields to other objects in a dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Delimited source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, point the path to the directory and folder where the delimited files are located.
Map the FullPath field from FileSystem to the DelimitedSource object’s input node (FilePath).
Now our Delimited Source Transformation object is ready. To preview the data, right-click on the DelimitedSourceTrans object and select Preview Output.
Once you select Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed preview format. Click on the + icon right next to the root node of the DelimitedSourceTran object to expand the node and preview your data.
You now have an expanded version of your data:
Root Node: Object Path – DelimitedSourceTran
Sub Node:
Input: Displays the path of the file that is being used as the input for this data.
Output: Displays the fields in the source data.
This is how you use a Delimited File Source as a transformation.
Next, we will see how to use the Excel Workbook Source object as a transformation.
The Excel Workbook Source can be used as a transformation when you have multiple Excel files with the same layout, and want to process them together in a dataflow or workflow.
Drag-and-drop the Excel Workbook Source object onto the designer.
Go to the object’s properties and provide the File Path for the Excel source file.
The header of the ExcelSource object will change to purple indicating that the ExcelSource object is now converted to a transformation object.
The transformed ExcelSource object will now have two nodes:
Input node:
FilePath: To map the path of the folder that contains the Excel files that are to be processed.
Output node: On expanding this node, you will be able to see the data fields in the Excel source file. Map these fields to other objects in the dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Excel source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the path of the directory and folder where the Excel files are located.
Map the FullPath field from FileSystem to the ExcelSource object’s Input node (FilePath).
Map the Value field from ConstantValue to the ExcelSource object’s Input node (Worksheet).
Now our Excel Source transformation object is ready. To preview the data, right-click on the ExcelSourceTrans object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed preview format. Click on the + icon right next to the root node, ExcelSourceTran, to expand the node and preview your data.
You will see the following nodes:
Root Node: Object Path – ExcelSourceTran
Sub Node:
Input: Gives the file path of the file that is being used as the input for this data.
Output: Displays the fields in the source data.
This is how you use an Excel Workbook Source as a transformation.
Now we will discuss how to use an XML/JSON File Source as a transformation in Astera.
The XmlJson File Source object can be used as a transformation when you have multiple XML or JSON files with the same layout, and want to process them in a dataflow or a workflow.
Drag-and-drop the XML/JSON File Source object onto the designer.
Go to the object’s properties and provide the File Path for the XML source file and its schema.
The header of the XmlJsonSource object will change to purple indicating the conversion from a source object to a transformation object.
The transformed XmlJsonSource object will now have two nodes:
Input node: To map the file path of the folder that contains XmlJson files that are to be processed.
Output node: Once expanded, you will be able to see the data fields that are in the XmlJson source file. You can map these fields to other objects in a dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the XmlJson source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the Path of the directory and folder where the XML/JSON files are located.
Map the FullPath field from FileSystem to XmlJsonSource object’s Input node (FilePath).
Now our XmlJson source transformation object is ready. To preview the data, right-click on the XmlJsonSourceTrans object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed form. To expand the data and preview your output, click on the + icon right next to the root node – XmlJsonSourceTran.
You now have an expanded version of your data:
Root Node: Object Path – XmlJsonSourceTran
Sub Node:
Input: Gives the file path of the file that is used as the input for this data.
Output: Displays the fields in the source data.
This is how you use an XmlJson File Source as a transformation.
In the next section of the article, we will discuss how to use Report Source as a transformation in dataflows.
The Report Source object can be used as a transformation when you have multiple report files with the same report model layout and want to process them in a dataflow or a workflow.
Drag-and-drop the Report Source object onto the designer.
Go to the properties and provide the File Path for the report source and its report model.
The header of the ReportSource object will change to purple, indicating the conversion from a source object to a transformation object.
The transformed ReportSource object will now have two nodes:
Input node: Map the file path of the folder that contains report files that are to be processed.
Output node: When expanded, you will be able to see the data fields that are in the report source file. You can map these fields to other objects in the dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Report source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the path of the directory folder where report files are located.
Map the FullPath field from FileSystem to the ReportModel object’s Input node (FilePath).
Now our Report Source Transformation object is ready. To preview the data, right-click on the report source object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed form. To expand the data and preview your output, click on the + icon right next to the root node – ReportModelTran. To expand the data further, click on the + icon right next to the sub node – Output.
You now have an expanded version of your data:
Root Node: Object Path – ReportModelTrans
Sub Node:
Input: Gives the file path of the file that is being used as the input for this data.
Output: On further expansion it will show the fields/data that is there in the report model.
This is how you use the Report Source object as a transformation object.
Drag-and-drop the appropriate source objects and point them towards the files that you want to reconcile. In this example, we will be working with an
Drag-and-drop the relevant source objects from the Toolbox to the designer. (Click here to find how to .)
Note: We have the Orders table as our source from a . We will map the fields OrderDate, RequiredDate, ShippedDate, ShipVia and Freight to an object.
Click the icon to create a new rule.
In the Expression input, enter an expression for the rule. For example, LTV > 60 and LTV <= 80, or any rule or condition you want to apply to your data. Alternatively, you can click on the button to open the Expression Builder window - a tool that allows you to visually build your rule using Record tree and IntelliSense.
Add other Route rules if necessary. To delete an existing Route rule, select it and click the icon.
In this case, we are using an .
Right-click on the collection node, Orders_LateArrivals, and go to Write to > Excel Workbook Destination. An object will be added to the dataflow designer with auto-mapped fields from the collection field.
To learn how you can configure an Excel Workbook Source object, click
In this case, we are using the to extract our source data. You can download this sample data from here.
Add a new rule by clicking on the Add switch condition icon .
Note: In this case we will write the data to an .
Or you can expand the dropdown located in the main toolbar and select Subflow as shown below.
To know more about writing to an XML File Destination object, click .
In this example, we have a customers data from two different departments: Sales and Marketing, stored in two separate files. We want to combine this data into a single dataset using a Union Transformation object. To keep track of records coming in from each department, we will also add a new field, Category, in the layout of the Union Transformation object and pass the value using a Variables object.
Retrieve the data you want to cleanse using the relevant Source object. (Click to learn more about setting up Sources.)
Now click on this button to open the Expression Builder.
Look for the supported sources and data providers
For a detailed overview of different source objects in Astera's dataflows, see
Transformations in dataflows are used to perform a variety of operations on data as it moves through the dataflow pipeline. Astera provides an extensive library of built-in transformations enabling you to cleanse, convert, and transform data as per your business needs. Transformations can be found in Toolbox > Transformations. For a detailed review on transformations, see .
Once you have provided the file path and configured the properties of the object, click OK. Now right-click on the header and select Transformation from the context menu, to change it into a transformation object.
Worksheet: This option can be used when you have more than one worksheet in an Excel source file and want to use a particular worksheet in the dataflow/workflow. This can be done by specifying the worksheet name using a Constant Value object, which you can find in Toolbox > Transformations > Constant Value.
Once you’ve provided both paths and configured the source object, click OK. Now, right-click on the header and select Transformation from the context menu to change it into a transformation object.
Once you’ve provided both the paths and configured the properties of the object, click OK. Now right-click on the header and select Transformation from the context menu, to change it to a transformation object.
The SQL Statement Lookup object in Astera is used to look up certain values that are mapped to it from a source object. It uses an SQL statement to access a table that contains the lookup values and their corresponding output values. Once the lookup is performed, the SQL Statement Lookup object returns either a single or multiple output fields, depending on the nature of the lookup table. Similarly, the lookup can be performed based on one lookup field or multiple lookup fields. When the incoming values match the lookup values, the output field or fields for those particular records are returned by the SQL Statement Lookup object.
In this use case, we will read data from the Customers table in the Northwind database using a Database Table Source object. This table contains customer information from a fictitious organization and will serve as the source table. Our purpose is to use an SQL Statement Lookup object to find some information about the orders placed by customers. This data is stored in a separate table called Orders, which will serve as the lookup table.
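Before walking through the steps, the sketch below shows, outside of Astera, roughly what this lookup produces (pandas assumed; the order values are sample data): Customers records joined with all matching Orders records on CustomerID.

```python
import pandas as pd

customers = pd.DataFrame({"CustomerID": ["ALFKI", "ANATR"],
                          "CompanyName": ["Alfreds Futterkiste", "Ana Trujillo"]})
orders = pd.DataFrame({"OrderID": [10643, 10692, 10308],
                       "CustomerID": ["ALFKI", "ALFKI", "ANATR"]})

# Left join on the lookup field; every matching order is appended to its
# customer, similar to the "Return All" option chosen later in this use case.
looked_up = customers.merge(orders, on="CustomerID", how="left")
print(looked_up)
```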
Drag-and-drop the Database Table Source object from Toolbox > Sources > Database Table Source onto the dataflow designer. Configure the object so that it reads data from the Customers table.
Now, drag-and-drop the SQL Statement Lookup Transformation object from Toolbox > Transformations > SQL Statement Lookup onto the dataflow designer, next to the source object.
Right-click on the header of the SQL Statement Lookup object and select Properties from the context menu.
This will open a new window.
Here, we need to configure the properties of the SQL Statement Lookup object.
In the Database Connection window, enter details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test is successful. When the connection has been successfully established, close it by clicking OK, and then click Next.
The next window will present a blank space for you to write an SQL statement. Here, you can enter any valid SELECT statement or stored procedure to read any table from the database that was specified earlier. This table will serve as the lookup table.
In this case, we will be reading data from the Orders table.
Enter the SQL statement and click OK. This will take you back to the dataflow designer.
As you can see, the SQL Statement Lookup object has been populated with all the fields present in the Orders table.
The next step is to choose an incoming field or multiple incoming fields from the source object, based on which the lookup action will be performed. This field needs to be mapped to the transformation object.
In this case, we can clearly see that CustomerID is a common element between the two tables. Hence, this field will be used to perform the lookup. It will be mapped from the Database Table Source object to the SQL Statement Lookup object as a new member.
Right-click on the transformation object’s header and select Properties to open the Properties window. Keep clicking Next until you reach the Layout Builder window. Here, you can customize the layout by modifying the existing fields or creating new fields.
Once you are done, click Next.
On the next window, you can define one or more lookup conditions. These conditions will determine what values are returned when the lookup is complete.
You will have to make appropriate selections from three drop-down lists:
Database Element Name: This list contains all the elements present in the SQL Lookup object. Select the element that you wish to use as a lookup field. In this case, it is CustomerID.
Operator: This list contains a set of operators that are used to define the condition. In this case, we will be using the ‘equals to’ operator because the lookup value is supposed to match the incoming value.
Input Element: This list contains the elements that have been mapped to the lookup object. In this case, the only input element available is CustomerID from the Customers table.
Once you are done defining the condition, click Next.
The next window will allow you to choose a Lookup Caching Type. The following options are available:
No Caching: No data will be stored in cache. This option is selected by default.
Static: The lookup values are stored in a cache. Once the cache is created, the lookup object will always query the cache instead of the lookup table. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables another sub-option:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Persistent: Saves the lookup values in a cache file that can be reused for future lookups. When you choose this option, the following sub-options are enabled:
Rebuild Persistent Cache on Next Run: Checking this option will allow the contents of the cache file to be modified after every run.
Cache File Name: Here, you can enter a name for your cache file.
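As a minimal sketch of the "cache after first use" idea (plain Python; query_database is a hypothetical callable standing in for the lookup table query), the first lookup for a key queries the database and later lookups for the same key are served from the cache:

```python
cache = {}

def lookup(customer_id, query_database):
    if customer_id not in cache:                  # first use: query the lookup table
        cache[customer_id] = query_database(customer_id)
    return cache[customer_id]                     # subsequent uses: served from the cache
```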
In this case, we will select the No Caching option. Once you are done, click Next.
On the next window, you will see multiple lookup options.
The page provides a set of options for different scenarios that could be faced during a lookup.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value if more than one match is found for a single value in the lookup table. You can select one of the three options that appear in the drop-down list:
Return First: Returns the first matched value.
Return Last: Returns the last value among all matched values.
Return All: Returns all the matched values.
If Value Is Not Found In the Lookup List
If no lookup values are found for a source value, you can choose from the following options to be appended with the output:
No Message: The output value will be the same as the input value and no message will appear with it.
Add Error: An error message will appear with the output.
Add Warning: A warning message will appear with the output.
If Value Is Not Found in the Lookup List, Assign Value
If no lookup value is found for a source value, you can assign an output value of your choice.
Assign Source Value: Returns the source value in the output.
Assign Null: Returns null in the output.
This Value: Allows you to enter any value that will be returned in the output.
In this case, we want to look up the details for all of the orders placed by every customer. Hence, we will select Return All from the drop-down list in the Multiple Matches Found Option. This will automatically disable the rest of the options available on the screen.
Once you are done choosing the option, click Next.
On the next window, you can define certain parameters for the SQL Statement Lookup object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
In this case, we will be leaving them blank. Click Next.
On the last window, you will be provided with a text box to add Comments. The General Options in this window have been disabled.
You are now done configuring the SQL Statement Lookup object. Click OK.
Right-click on the SQL Lookup object’s header and select Preview Output.
You will be able to see the following results:
Scroll down the Data Preview window to see the rest of the results.
The SQL Statement Lookup object has successfully returned the details for the orders placed by every customer in the Customers table (Source table) by comparing the CustomerID to its counterpart in the Orders table (lookup table).
This concludes using the SQL Statement Lookup Transformation object in Astera.
Astera Data Stack introduces an innovative AI Matching feature which leverages the power of Artificial Intelligence to perform intelligent matching. This feature works based on semantic similarity, ensuring more accurate and comprehensive matching results.
In Astera Data Stack, the AI Match object can be found in the Toolbox and can be used within the scope of the Dataflow.
For our use case, we have a Customers dataset from the sales department as shown below:
We want to replace the values in the Country column of the sales dataset by semantically matching them with Country values from the Customers dataset provided by the marketing team, ensuring both departments follow a unified naming standard.
To get started, let’s drag-and-drop an Excel Workbook Source object and configure it with the customers dataset provided by the sales department.
Next, drag-and-drop the AI Match object from the Toolbox onto the Dataflow and auto-map the fields from the Excel Workbook Source onto the AI Match object.
Once all the fields have been mapped, right-click on the AI Match object and select Properties from the context menu.
This will open the Layout Builder screen, which shows the layout of the incoming dataset. Click Next.
The AIMatch Transformation Properties screen will open. Let’s configure these properties.
File Path: This is where we provide the path of the file on the basis of which we want to perform our semantic matching.
Worksheet: This is where we can define which Excel sheet data to use if there are multiple sheets.
Lookup Field: This is the field based on which we are performing the lookup.
Incoming Field: This lets us define the lookup field from the incoming dataset.
For our use case, let’s select the Country field for both.
Once done, click OK and right-click on the AI Match object to preview its output.
As you can see below, the values in the Country field have been semantically matched and replaced from the file using AI. We can also see that, since the country Pakistan did not have a matching value in the marketing dataset, it hasn’t been replaced.
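As a loose illustration of semantic matching in general (this is not Astera's actual AI pipeline), the sketch below compares a source value against the lookup values using vector embeddings and cosine similarity; embed() is a hypothetical function that returns an embedding vector for a string, and the threshold is arbitrary:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_match(value, lookup_values, embed, threshold=0.8):
    value_vec = embed(value)
    scored = [(lv, cosine(value_vec, embed(lv))) for lv in lookup_values]
    best, score = max(scored, key=lambda pair: pair[1])
    # If nothing is similar enough (e.g. "Pakistan" with no counterpart in the
    # lookup file), keep the original value, mirroring the behaviour shown above.
    return best if score >= threshold else value
```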
Now, let’s drag-and-drop a Database Table Destination object and map the matched data onto it.
Running this Dataflow will write the data to the destination table.
This concludes the working of the AI Match object in Astera Data Stack.
Astera’s Delimited File Destination provides the functionality to write data to a delimited file. The Delimited File Destination gives you the ability to control the structure and content of the file, including numeric, date, and Boolean formats, encodings, text qualifiers (quotes), and character sets. You can choose to create a new file or append data to an existing file.
To get a Delimited File Destination object from the Toolbox, go to Toolbox > Destinations > Delimited File Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Delimited File Destination object onto the designer.
The dragged destination object is empty right now. This is because the object has not been configured yet.
For a Delimited Destination object to work, data fields should be mapped to the object so that the mapped data can be written to the destination.
Configure the source object and place it onto the designer next to the Delimited File Destination object.
Note: In this case a Customers sample table has been used from a Database Table Source to write to the Delimited File Destination.
Now map the source object to the destination object by dragging the source parent node onto the destination parent node.
You can also directly write the source layout to a Delimited File Destination through the source context menu of its parent node.
The fields are now mapped.
To configure the Delimited File Destination object, right-click on the header and select Properties from the context menu. A dialog box will open.
Provide the File Path. This is where the delimited destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
Check the File Contains Header box if you want the destination file to include a header row.
Field Delimiter - Allows you to select a delimiter from the drop-down list for the fields.
Record Delimiter - Allows you to select the delimiter for the records. The available choices are a carriage-return and line-feed combination, carriage-return, or line-feed. You can also type a record delimiter of your choice instead of choosing one of the available options.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
A Text Qualifier is a symbol that identifies where text begins and ends. It is used specifically when importing data.
Apply Text Qualifier to all Fields will add the specified qualifier to all the fields that have been mapped
Say the file will later be imported as a comma-delimited text file (commas separate the different fields that will be placed in adjacent cells). If a field value itself contains a comma, the text qualifier marks where that value begins and ends so that the embedded comma is not treated as a delimiter.
Use Null Text to specify a certain value that you do not want in your data; it will be replaced by a null value.
Choose Append to File (If Exists) to append to an existing file or create a new file. Creating a new file will overwrite any existing file data.
Check on Hierarchical Destination when the data in the source file needs to be sorted into hierarchies in the destination file.
Check on Write to Multiple Files for the data to be saved to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
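For reference, the sketch below shows the same ideas (field delimiter, header row, a text qualifier applied to all fields, and append versus overwrite) using Python's standard csv module; the path and records are hypothetical:

```python
import csv

records = [{"CustomerID": "ALFKI", "City": "Berlin"},
           {"CustomerID": "ANATR", "City": "Mexico D.F."}]

append = False                                    # Append to File (If Exists)
mode = "a" if append else "w"                     # "w" overwrites any existing file

with open(r"C:\Data\Customers.csv", mode, newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["CustomerID", "City"],
        delimiter=",",                            # Field Delimiter
        quoting=csv.QUOTE_ALL,                    # apply text qualifier to all fields
        quotechar='"',
    )
    if not append:
        writer.writeheader()                      # write the header row
    writer.writerows(records)
```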
Once the data reading options have been specified in this window, click Next.
The next window is the Layout Builder. On this window, the layout of the delimited destination file can be modified.
To add a new field to the layout, go to the last, blank row of the layout (Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select its properties. A new field will be added to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Right-click on the highlighted line; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use the Move up/Move down buttons.
For example: To move the Country field right below the Region field, select the row and use the Move up button to move it from the 9th row to the 8th row.
Once the object layout is configured, click Next. A new window will appear, Config Parameters, which allows you to further configure and define parameters for the delimited destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Parameters left blank will use their default values assigned on the properties page.
Next a General Options window will appear. In this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The DelimitedDest object is now configured according to the changes that were made in the properties window.
The Delimited File Destination object is successfully configured, and the destination file can now be created by running the dataflow.
The Database Table Destination object in Astera provides the functionality to write data to a database table. This destination option provides a great deal of control over how data is written to a database table with its extended data loading options.
Astera supports a wide range of on-premise and cloud-based databases including SQL Server, Oracle, DB2, Sybase, MySQL, Salesforce, Microsoft Dynamics CRM, and more. Astera delivers highly-optimized implementations for these database connectors including high-performance bulk insert, set-based updates and transaction management. This, combined with Astera’s parallel-processing architecture, delivers industrial-strength performance and scalability.
To add a Database Table Destination object to your dataflow, go to Toolbox > Destinations > Database Table Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Database Table Destination object onto the designer.
The Database Table Destination object is empty right now, meaning it does not have any fields or mappings. This is because the object has not been configured yet. There are two empty sub-nodes, Input and Output, under the DatabaseDest root node.
To configure the properties of the Database Table Destination object, right-click on the header and select Properties from the context menu.
This will open a new window, Database Connection, in Astera.
First, you will need to select the relevant data provider from the Data Provider drop-down list.
This is where you select the specific database provider you want to connect to. For instance, if you want to write your data to an SQL Server database, select SQL Server from the list. The connection details will vary according to the data provider selected.
Test Connection to make sure that your database connection is successful and click Next.
Now, you need to provide details to configure a connection with the destination database.
Enter your User ID and Password. You can also use the Recently Used drop-down list to connect to a recently-connected database.
The next window is the Pick Table window. Here, you can choose from the following options:
Pick Table: To append data into an existing table.
Create/Replace: To write data to a new table or replace an existing table.
Truncate Table: To overwrite data in an existing table.
Choose the option based on your requirement. In this case we will select the Create/Replace Table option and create a new table in the database.
For a database destination object to work, data fields should be mapped to the object so that the mapped data can be written to the destination.
Configure the source object and place it onto the designer next to the Database Table Destination object.
Map the source object to the destination object. Data mapping from source to the destination can be done in the following ways:
By dragging and dropping the parent node from the source object onto the destination object.
By mapping the output port of the source object onto the input port of the destination object.
By right-clicking on the parent node inside the source object and selecting Write to > Database Table Destination from the context menu.
The fields are now mapped.
The Pick Table window has some other configuration options.
Define Input Ports for Mapping
Single Port: Works only for the records that have been updated and won’t treat records individually.
Individual Ports for Actions: Works for all the records individually as per the selected action. The actions that are provided are: Insert, Delete, Update and Upsert.
Database Options
Use constraint-based write: Use this when the layout has certain constraints that you want to enforce while writing.
Preserve system generated key values: To generate unique values for the selected primary key in the dataset. This option is only available if you assign at least one field in your destination layout as the System Generated field.
Use transaction
Always commit transaction on completion: When you want the whole transaction to be processed regardless of errors.
Rollback if there are any errors: When you don’t want the dataset to process in case of errors and roll back completely.
Check field lengths: Compares the lengths of incoming values with the lengths defined in the destination layout.
Check for null values: Checks the incoming dataset for null values.
Write null strings as zero length strings: Where string values are null, they are written as zero-length (empty) strings.
Write null numeric values as zeros: For numeric data types, null values are written as zeros.
Disable indexes during load: Disables indexes while loading; intended for lengthy processing.
Data Load Options
Bulk insert with batch size when you want the whole dataset to be loaded in batches of the specified size. Typically, larger batch sizes result in better transfer speeds; however, performance gains may be smaller with very large batch sizes.
Bulk insert with all records in one batch when you want all the records to be loaded into a table in one batch. In this case, any database-specific error in your transfer won’t show until the end of the transfer.
Use single record insert when you want records to be loaded individually. Records are inserted into a destination table one-by-one. This loading option renders the slowest performance among the three insert types. However, any errors or warnings during the transfer are displayed immediately as the transfer progresses.
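The difference between these loading strategies can be sketched as follows (sqlite3 is used purely for illustration; the table and rows are hypothetical):

```python
import sqlite3

rows = [("ALFKI", "Berlin"), ("ANATR", "Mexico D.F."), ("ANTON", "Mexico D.F.")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT, City TEXT)")

batch_size = 2
# Bulk insert with batch size: load the dataset in fixed-size chunks.
for start in range(0, len(rows), batch_size):
    conn.executemany("INSERT INTO Customers VALUES (?, ?)", rows[start:start + batch_size])
    conn.commit()

# Bulk insert with all records in one batch: errors surface only at the end.
# conn.executemany("INSERT INTO Customers VALUES (?, ?)", rows); conn.commit()

# Single record insert: slowest option, but errors are reported row by row.
# for row in rows:
#     conn.execute("INSERT INTO Customers VALUES (?, ?)", row); conn.commit()
```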
Bulk Copy Options
Use Internal Transaction: When specified, each batch of the bulk-copy operation will occur within a transaction.
Fire Triggers: When specified, it will cause the server to fire the insert triggers for rows being inserted into the database.
Keep Nulls: Preserves null values in the destination table regardless of the settings for default values. When not specified, null values are replaced by default values where applicable.
Table Lock: Obtain a bulk update lock for the duration of the bulk copy operation. When not specified, row locks are used.
Check Constraints: Check constraints while data is being inserted. By default, constraints are not checked.
Keep Identity: Preserve source identity values. When not specified, identity values are assigned by the destination.
Default: Use the default values for all options.
Parallel Writing is used when you want to expedite the data loading process by increasing the number of writers for that dataset.
Once you have specified your options on this screen, click Next.
The next window you will see is the Layout Builder. Here, the layout of the database destination file can be modified.
To add a new field to the layout, go to the last, blank row of the layout (Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select its properties. A new field will be added to the destination table’s layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Right-click on the highlighted line, a context menu will appear which will have the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use Move up/Move down buttons.
For example: To move the Country field right below the Region field, select the row and click the Move up button in the toolbar at the top, to move the field up from the 9th to the 8th position.
Once the object layout is configured, click Next. This will take you to the Config Parameters window where you can further configure and define parameters for the database destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change during the runtime.
Click Next. A General Options window will appear. Here you have the following options:
Comments can be added.
General Options are given, which relate to the processing of records in the destination file.
Clear Incoming Record Messages: Clears any messages coming in from objects preceding the current object.
Do Not Process Records With Errors: Prevents erroneous records from being processed further for the output.
Do Not Overwrite Default Values with Nulls: Ensures that default values are not overwritten with null values in the output.
Click OK.
The DatabaseDest object is now configured according to the settings made in the properties window.
The Database Table Destination object is now successfully configured, and the destination file can now be created by running the dataflow.
The job can be traced through the Job Progress window once the job starts running.
The Excel Workbook Destination object in Astera provides the functionality to write data to Microsoft Excel workbooks. Note that Microsoft Excel does not need to be installed on the machine for the Excel destination object in Astera to work. This feature gives you the option to specify the worksheet and the starting cell where the data write begins.
To get the object from the Toolbox, go to Toolbox > Destinations > Excel Workbook Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Destination object onto the designer.
The dragged destination object is empty right now. This is because the data fields are not mapped to it yet. In this case, we will use a simple source to excel destination mapping scenario as an example.
Configure the source object and place it onto the designer next to the Excel Workbook Destination object.
Note: We are using a sample table containing Customers data from an SQL database.
Now map the data fields from the source object to the destination object. Mapping can be done in the following ways:
i. By dragging and dropping the parent node of the source object onto the parent node of the destination object for auto-mapping the layout.
ii. By creating a map from the source parent node to the destination parent node.
iii. By directly writing the fields in the source layout to an Excel Destination through the source context menu of its parent node.
The fields are now mapped.
To configure the Excel Workbook Destination object, right-click on the header, select Properties from the context menu and a dialog box will open.
Provide the File Path. This is where the Excel destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
Check the First Row Contains Header option if you want the first row of the destination file to contain the field headers.
The Worksheet field can be used to specify the name of a worksheet for either overwriting the data in an already existing worksheet or adding a new worksheet.
Choose Append to File (If Exists) to append to an existing file or create a new file. Creating a new file will overwrite any existing file.
Check on Write to Multiple Files for the data to be saved to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
Once the data reading options have been specified on this screen, click Next.
The next window is the Layout Builder. On this window, the layout of the Excel destination file can be modified.
To add a new field to the layout, go to the last, blank row of the layout (Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select its properties. A new field will be added to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Right-click on the highlighted line, a context menu will appear which will have the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use the Move up/Move down buttons.
For example: To move the Country field right below the Region field, select the row and use the Move up button to move it from the 9th row to the 8th row.
The row is now moved from the 9th position to the 8th position.
Once the object layout is configured, click Next. A new window will appear, Config Parameters, which allows you to further configure and define parameters for the Excel destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
A General Options window will appear. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The ExcelDest object is now configured according to the changes that were made in the properties window.
The Excel Workbook Destination object is successfully configured, and the destination file can now be created by running the dataflow.
The Database Lookup object in Astera is used to look up values from a source. It uses a database table that contains the lookup values as well as a set of corresponding output values. When the lookup is performed, the object returns either a single output field or multiple output fields, depending on the nature of the lookup table. Similarly, the lookup can be performed based on one lookup field or multiple lookup fields. In each case, the output field or fields are returned from the records in which the lookup values match the incoming values.
In this use case, we have a sample Customers dataset that is stored in a database table. Within this dataset, there is a field that contains the country of residence for each customer. We have another database table that contains all of these countries and their corresponding codes. Our goal is to replace the full country names with codes while writing the customer dataset to an Excel file. To do this, we will use a Database Lookup object.
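As a quick, non-Astera illustration of the end result (pandas assumed; the codes shown are sample values), the lookup replaces each country name with its code and leaves unmatched names as null, which matches the Assign Null option chosen later in this walkthrough:

```python
import pandas as pd

customers = pd.DataFrame({"ContactName": ["Maria Anders", "Ana Trujillo"],
                          "Country": ["Germany", "Mexico"]})
country_codes = {"Germany": "DE", "Mexico": "MX"}     # stand-in for CountryCodeLookup

# Replace full country names with codes; unmatched names become null (NaN).
customers["Country"] = customers["Country"].map(country_codes)
# customers.to_excel(r"C:\Data\Customers.xlsx", index=False)   # Excel destination (requires openpyxl)
print(customers)
```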
Drag the relevant source object from the Toolbox and drop it onto the designer. In this case, we will select the Database Table Source object from Toolbox > Sources > Database Table Source and configure it so that it reads data from the Customers dataset.
To preview the data, right-click on the object header and select Preview Output from the context menu. Here, you can see that there is a field that contains each customer’s country of residence.
Drag the Database Lookup object from Toolbox > Transformations > Database Lookup and drop it onto the designer.
Right-click on the header of the Database Lookup object and select Properties from the context menu.
This will open a new window on your screen.
Here, you are required to configure the properties for the Database Lookup object.
On the Database Connection window, enter the details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A new window will open, showing whether your test is successful or has ended in an error. When the connection has been successfully established, close it by clicking OK, and then click Next.
The next window is the Database Lookup Map Properties window. Here, you can pick a table from the database that you have connected to.
In this case, we will select the table named CountryCodeLookup. This table contains the code for each country and will serve as the lookup table in our use case.
In the text box provided under the Pick Table option, you can enter a where clause to modify the lookup query. In this case, we will leave it empty.
Once you have chosen a table, click Next.
On the next window, you can choose a Lookup Cache Type from the following options:
No Caching: No data will be stored in a cache. This option is selected by default.
Static: The lookup values are stored in a cache. Once the cache is created, the transformation object will always query the cache instead of the lookup table. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables another sub-option:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Dynamic: The lookup values are stored in a temporary cache file, which is deleted once the dataflow has been executed. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables other sub-options:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Cache Key Column: Defines a matching key field to check whether a record already exists in the cache.
Persistent: Saves the lookup values in a cache file that can be reused for future lookups. When you choose this option, the following sub-options are enabled:
Rebuild Persistent Cache on Next Run: Checking this option will allow the contents of the cache file to be modified after every run.
Cache File Name: Here, you can enter a name for your cache file.
In this case, we will select the No Caching option. Once you are done, click Next.
On the Lookup Options window, you can choose between multiple lookup options.
This page provides a set of options for different scenarios that could be faced during a lookup.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value if more than one match is found for a single value in the lookup table. You can select one of the three options that appear in the drop-down list:
Return First: Returns the first matched value.
Return Last: Returns the last value among all matched values.
Return All: Returns all matched values.
If Value Is Not Found In the Lookup List
If no lookup values are found for a source value, you can choose from the following options to be appended with the output:
No Message: The output value will be the same as the input value and no message will appear with it.
Add Error: An error message will appear with the output.
Add Warning: A warning message will appear with the output.
If Value Is Not Found in the Lookup List, Assign Value
If no lookup value is found for a source value, you can assign an output value of your choice.
Assign Source Value: Returns the source value in the output.
Assign Null: Returns null in the output.
This Value: Allows you to enter any value that will be returned in the output.
In this case, there is only one code for each country. Therefore, we will choose Return First from the drop-down list in the Multiple Matches Found Option. Moreover, we will leave the other options at their default selection i.e. No Message under If Value Is Not Found in the Lookup List, and Assign Null under If Value Is Not Found, Assign Value.
Once you are done choosing the options, click Next.
On the Config Parameters window, you can define certain parameters for the Database Lookup object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
In this case, we will leave them blank. Click Next.
On the last window, which is the General Options window, you will be provided with a text box to add Comments. The General Options in this window have been disabled.
You are now done configuring the Database Lookup object. Click OK to close the configuration window.
Expand the Database Lookup object to view the layout of the lookup table. In this case, it contains two fields, Country and Code. The former contains the full name of each country and the latter contains each country’s code.
Map the Country field from the Database Table Source object to its counterpart in the Database Lookup object.
Drag an Excel Workbook Destination object from Toolbox > Destinations > Excel Workbook Destination and drop it onto the designer. Configure the object by providing a name and the path to the directory and folder where you want to save your destination file.
Auto-map the source dataset to the destination object.
Delete the mapping link between the Country fields in the source and destination. To do this, right-click on the mapping link and select Delete from the context menu.
Map the Code field from the Database Lookup object to the Country field in the destination object. This is what the final dataflow should look like:
Right-click on the destination object’s header and select Preview Output from the context menu.
In the Data Preview window, you will see that each country name has been replaced by its corresponding code.
This concludes using the Database Lookup Transformation object in Astera.
Each destination on the dataflow is represented as a destination object. You can have any number of destinations on the dataflow. Each destination can only receive data from a single source. To feed multiple sources into a destination, you need to connect them through a transformation object, for example, Merge or Union. For more information on Transformations, see the Creating Transformations article.
The following destination types are supported by the dataflow engine:
Flat File Destinations:
Tree File Destinations:
Database Destinations:
All destinations can be added to the dataflow by grabbing a destination type from the Toolbox and dropping it on the dataflow. File destinations can also be added by dragging-and-dropping a file from an Explorer window while pressing the ‘Shift’ key. Database destinations can be dragged-and-dropped from the Data Source Browser while holding down the ‘Shift’ key. For more details on adding destinations, see the Introducing Dataflows article.
Adding a Delimited File Destination object allows you to write to a delimited file. An example of what a Delimited File Destination object looks like is shown below.
To configure the properties of a Delimited File Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a Fixed-Length File Destination object allows you to write to a fixed-length file. An example of what a Fixed-Length File Destination object looks like is shown below.
To configure the properties of a Fixed-Length File Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an Excel Workbook Destination object allows you to write to an Excel file. An example of what an Excel Workbook Destination object looks like is shown below.
To configure the properties of an Excel Workbook Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an XML/JSON File Destination object allows you to write to an XML file. An example of what an XML/JSON File Destination object looks like is shown below.
To configure the properties of an XML/JSON File Destination Object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
General Properties window:
File Path – Specifies the location of the destination XML file. Using UNC paths is recommended if running the dataflow on a server.
File Options:
Using the Encoding dropdown, select the appropriate encoding scheme for your destination file.
Check the Format XML Output checkbox to have line breaks inserted into the destination XML file for improved readability.
Schema Options:
Read From Schema File Specified Below – Specifies the location of the XSD file controlling the layout of the XML destination file.
To generate the schema, click the icon next to the Schema File input, and select Generate.
To edit an existing schema, click the icon next to the Schema File input, and select Edit File. The schema will open for editing in a new tab.
Using the Root Element dropdown, select the node that should be the root of your destination schema. Any nodes up the tree will be excluded.
Adding a Database Table Destination object allows you to write to a database table. An example of what a Database Table Destination object looks like is shown below.
Destination Connection screen – Allows you to enter the connection information for your destination, such as Server Name, Database and Schema, as well as credentials for connecting to the selected destination.
Pick Table window:
Database Transaction Management: Enable Transaction Management if you want to wrap your transfer inside a transaction. Depending on your database settings, this can give you performance improvements during the transfer. When Transaction Management is enabled, you should choose between always committing the transaction at the end of the transfer, or only committing it if there were no errors. Any errors would result in the entire transaction being rolled back.
Preserve System Generated Key Values: This option is only available if you have assigned at least one field in your destination layout as a System Generated field. If enabled, Astera will pass the incoming value from the source to the system generated field. Otherwise, the incoming source value will be ignored, and the system will write auto-increasing values to the destination System Generated field.
Data Load Options: Specifies how your records are inserted into the destination database. The available types are Use Single Record Insert, Bulk Insert with Batch Size, and Bulk Insert with All Records in One Batch.
These types allow you to customize your transfer to balance performance vs. logging needs. Bulk inserts typically result in a better performance (faster transfer for a given number of records), but they also come with less logging, and less ability to undo unwanted inserts should you need to.
Use Single Record Insert: Records are inserted into a destination table one-by-one. Performance is the slowest among the three insert types. However, any errors or warnings during the transfer are displayed to you immediately as the transfer progresses.
Bulk Insert with All Records in One Batch: Typically a quick method of transferring large amounts of data. Keep in mind, however, that any database-specific errors in your transfer will not show until the end of the transfer, when the entire batch is written to the destination database.
Bulk Insert with Batch Size: A good tradeoff between performance and logging needs. Records are inserted in batches of the specified size. Typically, larger batch sizes result in better transfer speeds; however, performance gains diminish as batch sizes become very large.
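To make the tradeoff concrete, here is a minimal Python sketch using sqlite3 (the table and columns are hypothetical) contrasting single-record inserts with batched bulk inserts:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER, OrderName TEXT)")
records = [(i, f"Order {i}") for i in range(10000)]

# Use Single Record Insert: one statement per record; slow, but errors surface per row
for rec in records:
    conn.execute("INSERT INTO Orders VALUES (?, ?)", rec)

# Bulk Insert with Batch Size: insert in chunks of 1000, balancing speed and logging
batch_size = 1000
for i in range(0, len(records), batch_size):
    conn.executemany("INSERT INTO Orders VALUES (?, ?)", records[i:i + batch_size])
conn.commit()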
The SQL Statement Destination object offers extra flexibility over database destination objects by letting you apply custom INSERT or UPDATE SQL code that controls what is written to the destination table. An example of what an SQL Statement Destination object looks like is shown below.
To configure the properties of an SQL Statement Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Database Connection window – Allows you to enter the connection information for your SQL Statement, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected database.
SQL Query window: In the SQL Query window, you can enter an SQL expression controlling which fields and records should be written to the destination. The SQL expression should follow standard SQL syntax conventions for the chosen database provider.
For example,
Insert into Orders values (@OrderId, '@OrderName', '@CreatedDtTm')
Notice the @ symbol in front of a field name. This makes the field appear in the field list inside the object box so that the field can be mapped. The fields without the @ symbol in front of them will not appear in the list of fields, but they can still receive values according to the logic of the SQL statement itself.
For example,
Insert into Orders (OrderId, OrderName, CreatedDtTm) values (@OrderId, '@OrderName', '2010/01/01')
The Excel Workbook Report object in Astera is designed to tabulate information from selected fields and present the results in a one- or two-dimensional matrix. This feature enables deeper analysis of data by organizing it in a way that facilitates the identification of trends, patterns, and insights.
To get the object from the Toolbox, go to Toolbox > Destinations > Excel Workbook Report. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Report object onto the designer.
The dragged report object is empty right now. This is because the data fields are not mapped to it yet. While any source can be used, for this particular use case, we will demonstrate using a Report Source that is extracting data from a PDF source file.
Configure the source object and place it onto the designer next to the Excel Workbook Report object.
Now map the data fields from the source object to the report object.
To configure the Excel Workbook Report object, right-click on the header, select Properties from the context menu and a dialog box will open.
Provide the File Path. This is where the Excel report file will be saved.
Once the File Path and data reading options have been specified on this screen, click Next.
The next window is the Layout Builder. On this window, the layout of the Excel report file can be modified.
Here, you can write names of fields as you want them to appear in your destination in the Header column and specify the relevant Aggregate Functions for them.
Aggregate Functions define how the data will be summarized in the report:
Group By: Groups records based on unique values in the specified field.
Sum: Calculates the total sum of the specified field.
Count: Counts the number of records.
Average: Calculates the average value of the specified field.
Max: Finds the maximum value in the specified field.
Min: Finds the minimum value in the specified field.
First: Returns the first record in a sorted list based on the specified field.
Last: Returns the last record in a sorted list based on the specified field.
Variance: Calculates the variance of the specified field.
Standard Deviation: Calculates the standard deviation of the specified field.
None: Includes the field in the report without applying any aggregation. This is useful when you want certain field values in the data lines but don’t want to apply any aggregation on them.
For this case:
AccountID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual accounts.
OrderID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to see orders within each account.
TOTAL: For this field we will select the Aggregate Function Sum, to calculate the total amount per order.
QUANTITY: For this field we will select the Aggregate Function Sum, to calculate the total quantity per order.
ITEM: Since we want to show item names in the data lines but do not want to apply any aggregates on them, we will select Aggregate Function None.
The same approach will be applied to the ITEM CODE, DESCRIPTION, and PRICE fields. We will select Aggregate Function None for each of these fields to ensure that their specific values are displayed in separate data lines without any aggregation.
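For reference, the grouping and aggregation configured above correspond roughly to the following pandas sketch with made-up values (Astera builds the full report, including data lines and totals, internally when the dataflow runs):

import pandas as pd

df = pd.DataFrame({
    "AccountID": ["A1", "A1", "A2"],
    "OrderID":   [101, 101, 202],
    "ITEM":      ["Pen", "Pad", "Ink"],
    "QUANTITY":  [3, 2, 5],
    "TOTAL":     [6.0, 8.0, 25.0],
})

# Subtotals per account and order: Group By AccountID and OrderID, Sum TOTAL and QUANTITY
summary = df.groupby(["AccountID", "OrderID"], as_index=False)[["TOTAL", "QUANTITY"]].sum()
print(summary)
# ITEM (Aggregate Function None) would appear only in the detail data lines, which this sketch omits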
Click Next. The Report Options window will now open.
Report Type: You can select from three report types: Summary, Cross Tab, or Time Series.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for the Subtotal field.
Grand Total Text: You can specify the name for the Grand Total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis. For example, if you have customer names like "john" and "John," enabling this option will treat them as distinct groups rather than combining them into a single group.
Style
You can also modify the style of your report.
Show Data Lines: Check this option if you want to see the actual data records along with the subtotals and grand totals.
Insert Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Write Grand Total: Adds the grand total to the report. If unchecked, the grand total won't be included.
Insert Blank Line Before Subtotal: Inserts a blank line before each subtotal in the report.
Insert Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted by the specified Group-By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears.
A Crosstab Summary displays summarized information about two fields in a two-dimensional matrix. The values for one field are displayed down the left-most column of the matrix and the values for the other key field are displayed across the top row as columns. This two-dimensional arrangement displays only a single measure at a time.
Let’s see how we can make a Cross Tab Summary using Excel Report Source.
Since we want to use information from two tables (Orders and Order Details), we have joined them and used an Expression object to calculate the total. We can then map the required data fields from both tables to the Excel Workbook Report object.
For this case:
CustomerID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual customers.
ProductID: We will select the None option from the Aggregate Function drop-down list for this field as we want the product values to spread across the top row (columns).
Total: We will select the Sum option from the Aggregate Function drop-down list for this field as we want totals for each type of product and the totals for each customer.
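Conceptually, the resulting matrix resembles a pandas pivot table, sketched here with made-up values: CustomerID down the rows, ProductID across the columns, and summed totals in the cells (Astera produces the equivalent crosstab internally):

import pandas as pd

df = pd.DataFrame({
    "CustomerID": ["ALFKI", "ALFKI", "ANATR"],
    "ProductID":  [11, 42, 11],
    "Total":      [140.0, 60.0, 95.0],
})

crosstab = df.pivot_table(index="CustomerID", columns="ProductID",
                          values="Total", aggfunc="sum",
                          margins=True, margins_name="Grand Total")
print(crosstab)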
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for the Subtotal field.
Row Total Text: You can specify the name for the Row Total field.
Grand Total Text: You can specify the name for the Grand Total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis.
Crosstab Options
Column Field: You can select which field or attribute you want to use as the column headers in the resulting cross-tabulation report.
Row Totals – Check this option if you want to add each row's total to your report.
Style
You can also modify the style of your report.
Add Grand Total: Inserts the grand total in the report.
Add Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Add Row Total: Inserts the row totals in the report.
Add Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click on Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted by the specified Group-By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
After defining the options, click OK.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears. The summary in the table shows the sales data for different products purchased by various customers, identified by their CustomerID.
A Time Series Summary displays summarized information about two key fields in a two-dimensional matrix. The values for one field are displayed down the left-most column of the matrix and the time intervals (such as days, months, quarters, or years) are displayed across the top row as columns.
Let’s see how we can make a Time Series Summary using Excel Report Source.
Since we want to use information from two tables (Orders and Order Details), we have joined them and used an Expression object to calculate the total. We can then map the required data fields from both tables to the Excel Workbook Report object.
For this case:
CustomerID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual customers.
OrderDate: We will select the None option from the Aggregate Function drop-down list for this field as we want to use this date field across the top row (columns).
Total: We will select the Sum option from the Aggregate Function drop-down list for this field as we want the totals for each customer.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for the Subtotal field.
Row Total Text: You can specify the name for the Row Total field.
Grand Total Text: You can specify the name for the Grand Total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis.
Timeseries Report Options
Time Unit Drop-down: You can specify the time interval for the time series analysis. Available options include:
Year: Analyze data on a yearly basis.
Month: Analyze data on a monthly basis.
Day: Analyze data on a daily basis.
Week: Analyze data on a weekly basis.
Quarter: Analyze data on a quarterly basis.
Start Date: You can specify the start date for the time series analysis. This defines the beginning of the time period for which data will be analyzed.
End Date: You can specify the end date for the time series analysis. This defines the end of the time period for which data will be analyzed.
Date Field: Field from the dataset that contains the date or timestamp information. The selected date field will be used to create the time series.
Style
You can also modify the style of your report.
Add Grand Total: Inserts the grand total in the report.
Add Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Add Row Total: Inserts the row totals in the report.
Add Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click on Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted by the specified Group-By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
After defining the options, click OK.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears. This summary table shows the number of sales across different years for customers, identified by their CustomerID.
MongoDB is a document-oriented database in which one collection holds different documents. The MongoDB Destination object in Astera Data Stack provides the functionality to write data to it, along with options to control how data is written to collections.
While writing data, the number of fields, content, and size of the document can differ from one document to another. This can be easily catered to by configuring write concerns, which describe the level of acknowledgment MongoDB returns for write operations.
MongoDB is mainly used for Big Data.
The MongoDB Destination object can be used to map incoming data from a source to the MongoDB server. MongoDB makes it easier for users to store both structured and unstructured data.
For our use case, we already have an XML/JSON Source object configured in a Dataflow.
To start, drag-and-drop the MongoDB Destination object from the Destinations section of the Toolbox onto the Dataflow.
Right-click on the MongoDB Destination object and select Properties from the context menu.
This will open a new window.
User Name: This is where we enter the user name for the MongoDB or local server.
Password: The password for the MongoDB or local server is entered here.
Primary Server Name: The name of the primary cluster is entered here.
Database: This is where we select the database to which we wish to write the data.
Authentication Database: This is the database used for authentication.
Port: The port is used to handle incoming and outgoing requests to the server.
Enable Set of Replica: Selecting this checkbox allows the use of a secondary cluster.
Secondary Server Name: The name of the secondary cluster is entered here.
Use TLS: Select this option if the server requires TLS security.
Once your credentials have been filled, test the connection, and click Next.
For our use case, we have input the credentials to use the MongoDB Destination for our local server.
We will now be taken to the MongoDB Pick Collection screen.
For our use case, we will select Create/Replace and add a new Collection.
Database Operations – These operations are used when we are picking an already existing collection.
Insert: To insert a new record into the collection.
Update: To update an existing record in the collection.
Delete: To delete a record from the collection.
Upsert: To update a record if it already exists in the collection, or insert it if it does not.
Select Fields for matching database records: Selecting from this drop-down menu lets the user select fields based on which to match the records for the selected database operation.
Write Concern Options – Selecting from these options lets the server provide an acknowledgment to the user based on how the process was carried out.
ACKNOWLEDGED: This will return an acknowledgment in the Job trace window if the process stops after getting an error or if the process successfully completes.
UNACKNOWLEDGED: This option will not return an acknowledgment, no matter how the data write is carried out.
MAJORITY: If there are multiple primary and secondary servers, this option will return an acknowledgment when the majority of the servers have been processed.
W1: Selecting this option will return an acknowledgment when the primary server has been processed.
W2: Selecting this option will return an acknowledgment when the primary server and one secondary server have been processed.
W3: Selecting this option will return an acknowledgment when the primary server and two secondary servers have been processed.
Data Load Options – These options let the user define how the data is going to be loaded into the database.
Bulk insert with batch size: This will insert all records, divided into batches of the size that the user has defined.
Bulk insert with all records in one batch: This will insert all records in a single batch.
Use single record insert: This option will treat every record individually and insert them one by one.
Select Type of Bulk Insert: Selecting from this drop-down menu lets the user define whether the Bulk Insert will be Ordered or UnOrdered.
In the case of Ordered, data writing will be stopped if an error is encountered between record insertions.
In the case of UnOrdered, data writing will continue despite any errors being encountered.
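For comparison, the same concepts expressed directly against MongoDB with the pymongo driver look roughly like this; the connection string, database, and collection names are placeholders:

from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://localhost:27017")
db = client["SalesDb"]

# Write concern controls how many servers must acknowledge the write (e.g. W1, MAJORITY)
collection = db.get_collection("Orders", write_concern=WriteConcern(w="majority"))

docs = [{"OrderId": 1, "Item": "Pen"}, {"OrderId": 2, "Item": "Pad"}]

# Ordered bulk insert stops at the first error; ordered=False keeps writing past errors
collection.insert_many(docs, ordered=True)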
Click Next and you will be led to the MongoDB Layout screen.
Currently, our layout is empty since we have not mapped any fields to it.
We will map the incoming fields from the XML/JSON Source object to the MongoDB Destination object.
We will then reopen the MongoDB Layout screen.
As you can see below, the entire layout has now been defined.
Click OK and the MongoDB Destination object will be configured.
Select the Start Dataflow option in the main toolbar and the data will be written to the destination.
As you can see in the Job Progress window, the data has been successfully written to the destination.
This concludes the configuration of the MongoDB Destination object in Astera.
The Parquet File Destination object allows the user to write data mapped from various kinds of sources to Parquet files, which can efficiently store large datasets. It can also be used with various transformations.
The Parquet File Destination object in Astera offers compression methods to reduce file size and control memory consumption.
Drag and drop the Parquet File Destination object from the Destinations section of the Toolbox.
Right-click on the Parquet File Destination object and select Properties from the context menu.
This will open the Properties screen.
Now, let’s look at the options present on this screen.
File Location
File Path: This is where the file path to the destination file is to be defined. It will be created once the dataflow is executed.
Options
Compression Method - You can select a compression method from this drop-down menu.
Snappy: This method offers high speed and reasonable compression.
Gzip: This method allows the reduction of data size at a fast rate.
Append to File (If Exists): This option will append data to the destination if there is a previously existing file present with data.
Write Numeric Nulls As Zero: Checking this box will write all null values as zero.
Write Null Booleans As False: Checking this box will write all Null Boolean values as false.
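Outside of Astera, the effect of the compression choice can be illustrated with a short pyarrow sketch (the file names and data are placeholders):

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"OrderID": [1, 2, 3], "Total": [6.0, 8.0, 25.0]})

# Snappy: fast with reasonable compression; Gzip: smaller files at some CPU cost
pq.write_table(table, "orders_snappy.parquet", compression="snappy")
pq.write_table(table, "orders_gzip.parquet", compression="gzip")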
Once done, click Next and you will be led to the Layout Builder screen.
Here, the layout is going to be mapped for the destination. It can be built from the incoming data source or can be altered by the user.
We will be using our pre-configured Excel Workbook Source to map the incoming data to the Parquet File Destination object.
Open the Layout Builder again and it will be populated.
Click Next and you will be taken to the Config Parameters screen.
Parameters allow the deployment of flows by eliminating hardcoded values and provide a dynamic way of changing multiple configurations with a simple value change.
Click Next and you will be led to the General Options screen.
Here, you can add any comments that you wish to add.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be processed by the object.
Do Not Overwrite Default Values with Nulls: Selecting this option will make sure that values are not overwritten with null values in the output.
Click OK and the Parquet File Destination object will be configured.
This concludes the configuration of the Parquet File Destination object in Astera.
The SQL Statement Destination in Astera offers extra flexibility over other destination objects by providing the option to apply custom INSERT, UPDATE, or DELETE SQL statements to control what will be written to the destination table. The object can also be used to call stored procedures. Moreover, you can parameterize your SQL statement using the Parameterize Replacement functionality.
In this article, we will be looking at how you can configure and use the SQL Statement Destination object in Astera.
Before moving on to the actual configuration, we will have to get an SQL Statement Destination object from the Toolbox. To do so, go to Toolbox > Destinations > SQL Statement Destination. In case you are unable to view the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the SQL Statement Destination object onto the designer.
The destination object is currently empty because we are yet to map any data fields to it.
To configure the SQL Statement Destination object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click the header of the destination object to go to its Properties.
A new window will open when you click on Properties from the context menu.
Here, you need to configure the properties for the SQL Statement Destination object.
On the Database Connection window, enter the details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test is successful. Close it by clicking OK, and then click Next.
The next window will present a blank page for you to enter an appropriate SQL statement for the required outcome. This can consist of an INSERT, UPDATE, or DELETE statement that manipulates the data being written to the database.
The curly brackets on the right side of the window indicate that the use of parameters is supported, which implies that you can replace a regular value with a parameterized value that can be changed during runtime.
In this use-case, we will be inserting new records into an existing table, named TESTTABLE, that has three columns: OrderID, CustomerID, and EmployeeID.
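An example statement for this use case, following the quoting conventions described below, might be:
Insert into TESTTABLE (OrderID, CustomerID, EmployeeID) values (@OrderID, '@CustomerID', @EmployeeID)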
Notice the @ symbol in front of a field name. This makes the field appear in the field list inside the object box so that the field can be mapped. The fields that do not have a @ symbol in front of them will not show in the list of fields, but they can still receive values according to the logic of the SQL statement itself. String fields need to be surrounded by single quotes, whereas Integer fields do not. In this case, CustomerID is a String field, while OrderID and EmployeeID are Integer fields.
The Database Options given at the bottom of the window provide support for transaction management. Checking the Use Transaction option will enable two other sub-options:
Always commit transaction on completion: Ensures that the job is completed regardless of any erroneous records.
Rollback if there are any errors: Aborts the job in case of one or more erroneous records.
Once you have entered an SQL statement and chosen your desired option, click Next.
On the new Config Parameters window, you can define certain parameters for the SQL Statement Destination object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
At the end, a General Options window will appear. Here, you are provided with:
A text box to add Comments.
A set of General Options related to the processing of records.
To conclude the configuration, click OK.
For a destination object to work, data fields must be mapped to it from a source. In this case, we will be using an SQL Query Source object to get data from the Orders table in the Northwind database.
Configure the source object and place it next to the SQL Statement Destination object.
Map the required data fields from the source object to the destination object. This can be done in the following ways:
By dragging and dropping the parent node of the source object onto that of the destination object.
By individually dragging and dropping the required fields from the source object onto their respective nodes in the destination object.
To preview the output, right-click on the destination object’s header and select Preview Output from the context menu. In this case, you will see the following result:
You can now write data to the destination table by running the dataflow.
This is how we use the SQL Statement Destination object in Astera.
Astera’s Fixed Length File Destination provides the functionality to write data to a Fixed Length File.
To get a Fixed Length File Destination object from the Toolbox, go to Toolbox > Destinations > Fixed Length File Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Fixed Length File Destination object onto the designer.
The dragged destination object is empty right now. This is because the object has not been configured yet.
For the Fixed Length File Destination object to work, it needs to be provided with a data source.
Configure the source object and place it onto the designer next to the Fixed Length File Destination object.
Now map the source object to the destination object. The following ways can be used for mapping:
i. By dragging and dropping the parent nodes onto each other for automatic mapping.
ii. By mapping the source parent node by dragging it to the destination parent node manually.
iii. By directly writing the source layout to a Fixed Length File Destination through the source context menu of its parent node.
The fields are now mapped.
To configure the Fixed Length File Destination object, right-click on the header, select Properties from the context menu, and a dialogue box will open.
Provide the File Path. This is where the fixed length destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
Check the First Row Contains Header box if you want a header row to be written to the destination file.
Record Delimiter - Allows you to select the delimiter for the records. The choices available are the carriage-return line-feed combination, carriage-return, and line-feed. You can also type a custom delimiter instead of choosing from the available options.
In case the records don’t have a delimiter, the Record Length field is used to specify the character length for a single record.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
Choose Append to File (If Exists) to append to an existing file or create a new file. Creating a new file will overwrite any existing file.
Check the Write to Multiple Files option for the data to be saved to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File option. The Astera IDE provides extensive user interface capabilities for processing hierarchical structures.
Once the data reading options have been specified on this window, click Next.
The next window is the Layout Builder. On this window, the layout of the fixed length destination file can be modified.
To add a new field to the layout, go to the last row of the layout (Name column), which will be blank, and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select the subsequent properties for it. A new field will be added to the source layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Right-click on the highlighted line, a context menu will appear which will have the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use Move up/Move down keys.
For example: To move the Country field right below the Region field, select the row and use the Move up key to move it from the 9th row to the 8th row.
The row is now moved from the 9th position to the 8th position.
Once the object layout is configured, click Next. A new window will appear, Config Parameters, which allows you to further configure and define parameters for the fixed length destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
A General Options window will appear. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The FixedDest object is now configured according to the changes that were made in the properties window.
The Fixed Length File Destination object is successfully configured, and the destination file can now be created by running the dataflow.
Astera’s XML/JSON File Destination object provides the functionality to write data to an XML or JSON file when the data is in hierarchical format.
To understand how to write to an XML/JSON File Destination object, we will work through a use case in which we convert flat datasets to a hierarchical dataset and then write the transformed data to an XML file.
Customers and Orders data from database tables will be used as source objects. We will then join them using the TreeJoin Transformation to create a hierarchical dataset.
To get an XML/JSON File Destination object from the Toolbox, go to Toolbox > Destinations > XML/JSON File Destination and drag-and-drop it onto the designer. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
The dragged destination object is empty right now. This is because the object has not been configured yet.
A schema file is needed to write data to an XML/JSON File Destination. To create the schema file, right-click on the source object (the TreeJoin transformation in this case) and select Generate XML Schema for Layout from the context menu.
A new dialog box will open where you will be asked to save the XML schema file. Give the File Name and file path for the schema file and click Save.
The schema file has been created in the specified location. To view this file, go to the file location and open the file in Astera.
The opened file would look like the one below:
To configure the XML/JSON File Destination object, right-click on its header and select Properties from the context-menu.
A window, Destination XML File, will open. Here, we will specify the file locations: the File Path and the Schema File for the XmlJsonDest object.
The destination file will now be saved in the provided file location. Click OK, and map the destination object to the source object before further configuration.
The XmlJsonDest object will now have the layout of the source object (Treejoin Transformation in this case).
To map the source object to the destination object, the following ways of mapping can be used:
By dragging and dropping the parent node of the source object (TreeJoin node in the TreeJoin table) onto the child node of the destination object (TreeJoin node in the XmlJsonDest object) for automatic mapping.
By manually mapping the source parent node (TreeJoin in the TreeJoin table) by dragging it to the respective destination child node (TreeJoin in the XmlJsonDest object).
The fields are now mapped.
Once the file locations have been specified and the mappings have been done, further properties can be defined.
XML Layout
The next window after the Destination XML File window is the XML Layout window.
This window shows the XML layout for the XmlJsonDest object. The collection nodes for the object can be seen in this window with their fields.
Config Parameters
Click Next, and a window, Config Parameters, will open, which will allow us to further configure and define parameters for the XML/JSON Destination object.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
General Options
Click Next, and a new window, General Options, will open. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The XmlJsonDest object has been successfully configured and the destination file can now be created by running the dataflow.
The Quick Profile option in Astera gives users the ability to preview field statistics of any set-level object in the dataflow at design time. It provides information such as the data type, minimum/maximum values, data count, error count, etc., which can be used to identify and correct data quality issues while designing flows.
The Quick Profile window can be viewed for an entire flow by clicking View > Data Pipeline > Quick Profile or using the shortcut key Ctrl+Alt+A.
To view field statistics at a particular object in the dataflow, right-click on the object’s header and select Quick Profile.
A window like this will slide up from the bottom of the screen:
Quick Profile provides an overview of the content and quality of all the fields, allowing us to determine whether the data is suitable for further transformation. When creating the flows, we may use this functionality at any point to identify any erroneous data that might be affecting the final results.
Enter a valid file name and click Save.
This concludes the use of Quick Profile feature in Astera.
The Field Profile feature captures statistics for selected fields from one or several objects. Field Profile is essentially a transformation object as it provides Input and Output ports similar to other transformations. These output ports make it possible to feed the statistics collected to another object on the dataflow.
In this document, we will learn how to create a Field Profile in Astera.
We want to collect detailed statistics from some of these fields of data and write it to a Delimited File Destination. For this purpose, we will use Astera's Field Profile feature.
To get a Field Profile object from the Toolbox, go to Toolbox > Data Profiling > Field Profile. If you are unable to see the toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Field Profile object onto the dataflow designer.
You can see that the dragged Field Profile object contains an Input node and an Output node. The Input node is empty as we have not mapped any fields to it yet.
One-by-one map ShipName, CustomerID, Country, OrderDate, ProductName, UnitPrice, and Quantity from the source object to the Field Profile object’s Input node.
To configure the Field Profile object, right-click on its header and select Properties from the context menu.
A configuration window will open. The first screen you will see is the Layout Builder. This is where we can create or delete fields and change their names and data types.
Click Next. On the Properties window, specify the Statistics Type from the dropdown list.
The Field Statistics dropdown allows you to select the detail level of statistics to collect. Select from the following detail levels:
Basic Statistics: This is the default mode. It captures the most common statistical measures for the field’s data type.
No Statistics: No statistics are captured by the Data Profile.
Detailed Statistics – Case Sensitive Comparison: Additional statistical measures are captured by the Data Profile, for example Mean, Mode, Median etc. using case-sensitive comparison for strings.
Detailed Statistics – Case Insensitive Comparison: Additional statistics are captured by the Data Profile, using case insensitive comparison for strings.
In this case, we will select Detailed Statistics – Case Sensitive Comparison.
Click OK.
Right-click on Field Profile object’s header and select Preview Output from the context menu.
A Data Preview window will open and show you the statistics of each mapped field as a record.
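For a sense of what the basic and detailed measures represent, here is a small pandas sketch with made-up values (this is only an illustration, not Astera's profiler; the column name is hypothetical):

import pandas as pd

prices = pd.Series([10.0, 12.5, 12.5, 30.0], name="UnitPrice")

# Basic-style measures: data type, min/max, count
print(prices.dtype, prices.min(), prices.max(), prices.count())

# Detailed-style measures: mean, median, mode
print(prices.mean(), prices.median(), prices.mode().tolist())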
Observe that the Field Profile object contains an Output node. Once expanded, you will see various statistical measures as fields with output ports.
We can write these statistical measures to a destination file.
Right-click on the Output node and go to Write To > Delimited File Destination.
A Delimited File Destination object will be added to the dataflow designer with auto-mapped fields from the Output node.
A Job Progress window will open at this instant and will show you the trace of the job.
You can open the delimited file that contains field statistics from the link provided in the Job Progress window.
The Data Profile feature provides complete data field statistics – basic and detailed – containing information such as the data type, minimum/maximum values, data count, error count, etc. The statistics are collected for each of the selected fields at the time the dataflow runs.
In this document, we will learn how to create a Data Profile in Astera.
We want to collect statistics on these fields of data. For this purpose, we will use Astera’s Data Profile feature.
To get the Data Profile object from the Toolbox, go to Toolbox > Data Profiling > Data Profile. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Data Profile object onto the dataflow designer.
You can see that the Data Profile object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map the fields from the source object onto the profile object.
To configure the Data Profile object, right-click on its header and select Properties from the context menu.
A configuration window will open. The first screen you will see is the Layout Builder. This is where we can create or delete fields, change field names, and their data type.
Click Next. This is the Properties window.
Here we will provide the Profile File path to specify where the profile should be stored.
Specify the type of Field Statistics to be collected.
The Field Statistics dropdown allows you to choose the detail level of statistics to collect. Select from the following detail levels:
Basic Statistics: This is the default mode. It captures the most common statistical measures for the field’s data type.
No Statistics: No statistics are captured by the Data Profile.
Detailed Statistics – Case Sensitive Comparison: Additional statistical measures are captured by the Data Profile, for example Mean, Mode, Median etc. using case-sensitive comparison for strings.
Detailed Statistics – Case Insensitive Comparison: Additional statistics are captured by the Data Profile, using case insensitive comparison for strings.
In this case, we will select Detailed Statistics – Case Sensitive Comparison.
Click OK.
A Job Progress window will open at this instant and will show you the trace of the job.
Click on the Profile link provided in the Job Progress window and the profile will open in Astera. Expand the Profile node to see each field inside the object. Click on these fields to see the collected statistical values.
This section talks about the various database write strategies offered within Astera.
Database Diff Processor is one of the four Database Write Strategies offered in Astera. Its purpose is to synchronize the data present in two separate datasets. The object compares the two datasets and performs write actions (insert and update) on the destination table so that both tables contain the same information.
In this use case, we have a sample dataset of customers that is stored in a database table. Currently, this dataset contains 10 records, but two more customer records are to be added later on. Furthermore, updated phone numbers are to be added for two other customers.
We want to write the initial dataset to another database table and ensure that whenever the aforementioned changes are made, they are applied to both tables. To achieve this, we will be using the Database Diff Processor object in Astera.
Drag-and-drop the Database Table Source object from Toolbox > Sources > Database Table Source onto the dataflow designer. Configure this object so that it reads data from the source table.
Drag-and-drop the Database Diff Processor object from Toolbox > Database Write Strategies > Database Diff Processor onto the dataflow designer. Auto-map all of the elements from the source object to the Database Diff Processor object.
Right-click on the header of the Database Diff Processor object and select Properties.
This will open the Database Connection window. Here, you will have to enter the credentials for the database you want to connect to. Alternatively, you can connect to a recently used database by selecting it from the Recently Used dropdown list.
In this case, we will connect to a test database that contains an empty Customers table.
Once you have entered the required credentials, click Next.
On the next window, you will have to pick a destination table, where the write actions (Insert and Update) will be performed.
In this case, we will pick the empty Customers table that has already been created in this database.
There are a couple of options related to Record Matching at the bottom of the screen:
Select a field to be used for matching records between the source table and the destination table. In this case, we will select CustomerID because it cannot change for each customer.
Check the Case Sensitive option if you want the comparison to be case sensitive. In this case, we will leave this option unchecked.
Now that the required table and Record Matching options have been selected, click OK to close the configuration window.
Run the dataflow to write your source data to the destination table. To preview the source data, right-click on the Database Table Source object and select Preview Output.
This is what the source data looks like:
To check whether this data has been written to the destination table, right-click on the Database Diff Processor object and go to Database Table > View Table Data.
The destination table will open in a separate tab within Astera.
The data present in the destination table is the same as that in the source table, showing that we have successfully written the data by running the dataflow.
The source dataset has been updated to include two more customer records. Moreover, two other customers have updated their phone numbers. This is what the source data looks like after the changes have been implemented:
Run the dataflow again to apply these changes to the destination table. This is what the destination table should look like when you open it in Astera after running the dataflow again:
The changes that were made to the source table have automatically been applied to the destination table as well, showing that the Database Diff Processor object has achieved its task.
This concludes using the Database Diff Processor write strategy in Astera.
The Source Diff Processor object is one of the Database Write Strategies offered in Astera. It works like the Database Diff Processor; however, unlike the Database Diff Processor, it is used to perform write actions (such as Insert, Update, and Delete) on file destinations. It stores a snapshot of the data processed in the first run in a CDC file, so the next time you run it, it will only import the new records.
We have a sample Employees dataset coming in from an Excel Workbook Source. Initially, we had records of 10 employees but later on, 2 more were added in the source dataset. We wish to apply a database write strategy that can read the data incrementally from file sources. To achieve this, we will use the Source Diff Processor in Astera.
Drag-and-drop the Source Diff Processor object from Toolbox > Database Write Strategy > Source Diff Processor onto the dataflow designer and map the source data to it.
Right-click on the Source Diff Processor object’s header and select Properties.
A Layout Builder window will open where you can modify your layout. Click Next.
The next window is the Incremental Write Options window.
Here, you have to specify the Record Matching field. This field is used to match and compare the incoming and existing records. We will select EmployeeID as the Record Matching field.
Case Sensitive – Check this option if you want to compare records on a case sensitive basis.
Sort Input – Check this option if you want to sort the incoming data.
Now, if the incoming dataset has a record with a new EmployeeID, i.e., an ID that is not present in the existing file being compared against the incoming file, Astera will perform the INSERT action.
If the EmployeeID is already present in the existing file, Astera will compare the records against that ID and will perform the UPDATE action on the fields where the information has changed.
If the EmployeeID is present in the existing file but not in the incoming file, it means that the record has been deleted. In this case, Astera will perform the DELETE action.
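Conceptually, the diff logic can be sketched in a few lines of Python, keyed on EmployeeID as in this example (an illustration only, not Astera's internal implementation):

existing = {10: {"EmployeeID": 10, "Phone": "111"}}          # snapshot from the last run
incoming = {10: {"EmployeeID": 10, "Phone": "222"},          # changed record -> UPDATE
            11: {"EmployeeID": 11, "Phone": "333"}}          # new key -> INSERT

actions = []
for emp_id, record in incoming.items():
    if emp_id not in existing:
        actions.append(("INSERT", record))
    elif record != existing[emp_id]:
        actions.append(("UPDATE", record))
for emp_id, record in existing.items():
    if emp_id not in incoming:
        actions.append(("DELETE", record))                   # missing from incoming -> DELETE

print(actions)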
In the Output Options section, you can either select the Single Output option or One Port for Each Action.
The Single Output option is selected if you wish to load your data into the destination without modifying it further on the basis of individual write actions. If you select Single Output, the database action such as INSERT, UPDATE, SKIP or ERROR will be chosen by the database write strategy’s logic rather than being specified by the user. Using a Single Output is recommended when a database write strategy is applied.
One Port for Each Action is used when you want to further transform or log your data. If you select One Port for Each Action, you will get separate nodes for each Diff action in the Source Diff Processor’s object.
In this example, we will select Single Output.
The third section in the Incremental Write Options window is the Incremental Transfer Information File Path option. Here, you must specify the file path where you want to store information related to the last run.
Now, if you have worked with Excel Workbook and Database table Sources in Astera, you would have noticed that the Database Table Source object gives you the option to read incremental changes. However, no such option is available in Excel or other file source objects. This option in the Source Diff Processor enables you to read incrementally from different file formats such as Excel, Delimited, and Fixed Length.
Click OK.
Now, right-click on the Source Diff Processor object’s header and select Preview Output.
Output preview for Single Output:
Output preview if One Port for Each Action is selected:
You can now write your data to any destination or perform any transformation on the dataset.
This concludes using the Source Diff Processor write strategy in Astera.
Data Driven Write Strategy is a set level functionality, which means that the entire incoming dataset must flow through it. It allows a user to stamp a database directive on the record so that when it reaches its destination, it will be loaded according to that directive to perform the specified write action. You can specify multiple rules within the properties of a Data Driven write strategy object. These rules are tried against each record from top to bottom. If a record passes the first rule, it will not be tried against the next rule(s).
Assume a scenario in which Orders data from a Database Table Source is written to a Database Table Destination. We want to DELETE those records where ShippedDate is before the year 2000 and declare those records where Freight is less than 10 as ERROR. We will use the Data Driven write strategy object to achieve this task.
Drag-and-drop the Data Driven object from Toolbox > Database Write Strategy > Data Driven onto the dataflow designer and map the source data to the Data Driven object.
Right-click on the header of the Data Driven object and select Properties.
A Layout Builder window will open where you can modify your layout. Click Next.
The next window is the Data Driven Write Strategy Conditions window, where you can specify rules to route your data. Click on the fx option to enable the Expression Builder.
Once you select this option, the Expression Builder will be enabled.
Now, specify the following rules in the Expression Builder and select Database Action Type as ERROR for the Freight rule and DELETE for the Date rule.
Year(ShippedDate) < 2000
Freight < 10
There are five Database actions: Insert, Update, Delete, Skip and Error. From these, you can select the action you want to be taken for a certain rule.
Once you are done specifying the rule(s), click OK.
You can now write your data to a Database Table Destination.
It is important to note here that while working with Database Write Strategies, Single Port is selected. Once you check the Single Port option in the Database Table Destination object, a box will appear in which you have to specify a field for matching database records. In our case, we will select OrderID.
We have successfully configured the settings and built the layout.
Let’s preview the output.
Data Driven output:
You can see that Astera has appended an error message with the records where Freight is less than 10. You can create an error log of these records or load them into a database if you want.
Now, whenever you access the same database table, you will see that the records where ShippedDate is before the year 2000 have been deleted.
This concludes using the Data Driven Write Strategy in Astera.
The Fixed Length Parser processes a stream of fixed-length data as input and returns its elements as parsed output. This is useful when the source data is in text format (all information is contained in a single field) and needs to be split into multiple fields for further integration.
In this document, we will learn how you can use Fixed Length Parser in Astera.
The sample data that we are using is a fixed length text file containing customers’ information.
Observe that all the information such as name, street number, street type, city, state, and zip code are concatenated in a single field. We want to parse the Contact_Details field into Name, Street Number, Street Type, and other respective fields.
To parse the data into separate fields, we will use the Fixed Length Parser object in Astera.
To parse the data, we first have to extract it. Since the data is stored in a continuous stream, we cannot extract it directly using a Fixed Length File Source object. Therefore, we will use a combination of transformations to get the desired outcome. The diagram below explains the process that we’ll be following, from extracting the data to parsing it.
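Before walking through the dataflow, here is a rough Python sketch of what the combined steps accomplish conceptually: read the text as one stream, then slice each line into named fields by fixed character widths. The field names, widths, and sample line are hypothetical and not taken from the actual sample file.

```python
# Illustrative sketch only: slicing a fixed-length text stream into named fields.
layout = [("Name", 12), ("StreetNumber", 4), ("StreetType", 8),
          ("City", 12), ("State", 2), ("ZipCode", 5)]

sample = "Jane Smith  42  Avenue  Springfield IL62704"

def parse_fixed_length(text):
    records = []
    for line in text.splitlines():
        record, pos = {}, 0
        for field, width in layout:
            record[field] = line[pos:pos + width].strip()
            pos += width
        records.append(record)
    return records

print(parse_fixed_length(sample))
```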
Let’s begin with extracting the data.
Go to Toolbox > Transformations > Constant Value and drag-and-drop the Constant Value object onto the designer.
Right-click on the object’s header and select Properties from the context-menu.
In the Constant Value field, provide the file path of the source file (file containing customers data in a stream). Click OK.
Next, go to Toolbox > Function Transformations > Files > ReadFileText(String filePath) - String, and drag-and-drop the object onto the designer.
You can see that the dragged transformation object has two sub-nodes: Input and Output.
Expand the Input node and map the Value field from the Constant Value Transformation object to the filePath field inside the Read File Text object.
This will redirect Astera to read the data from the given file path. Now we can use the Fixed Length Parser object to parse the text data into separate fields.
To get the Fixed Length Parser object, go to Toolbox > Text Processors > Fixed Length Parser and drag-and-drop the object onto the designer. Map the Value field, under the Read File Text object, onto the Text field inside the Fixed Length Parser object.
You can see that the dragged object also contains an Output sub-node, which is currently empty.
Configure the Fixed Length Parser object by right-clicking on its header and selecting Properties.
A properties window will open. Here, you will see three options; make sure the first two are checked.
Click Next, and you will be directed to a Source Fields window.
Here, you have to provide the name of each field that you want to parse the source data into, as shown below.
Click OK. Now, expand the Output node inside the Fixed Length Parser object. You will see all the fields that you have created in the previous step.
Right-click on the object’s header and select Preview Output from the context menu.
A Data Preview window will open. Expand the nodes, and you will see a parsed output of each record.
To store this parsed output, write it to a destination file.
Right-click on the Output sub-node inside the Fixed Length Parser object, and go to Write to > Excel Workbook Destination. An Excel Workbook Destination object will be added to the dataflow designer with auto-mapped fields.
Configure settings for the Excel Workbook Destination object.
Click on the Start Dataflow icon, located in the toolbar at the top, to create this destination file.
In addition to the standard logging functionality, Astera provides a special Data Quality Mode option, useful for advanced profiling and debugging. When a dataflow is created/opened in Data Quality Mode, most objects on the dataflow show the Messages node with output ports.
In this document, we will learn how to use the Data Quality Mode in Astera.
If you preview the Customers dataset output at this stage, you will see that some of the records have missing values in the Region and Fax fields.
Data quality rules are set so that records with empty Region values are marked as errors and records with empty Fax values are marked as warnings. A red exclamation sign in the Data Preview window identifies the records that have failed to match the rule and returned an error or a warning as a result.
Now, for instance, we want to collect information regarding the number of errors/warnings in a single record, the error/warning messages attached to these records, and write this information to a destination. For this purpose, we will use Data Quality Mode.
Once Data Quality Mode is activated, a Messages node will be added to all the objects in the dataflow.
The Messages node captures the following statistical information:
TotalCount
ErrorCount
WarningCount
InfoCount
MessagesText
DbAction
Custom
In addition, the FirstItem, LastItem, and Items sub-nodes provide a way to collect quality control data for each record. The quality control data includes details such as ElementName, MessageType, and Action, and can be written to a destination object for record-keeping purposes.
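To make this concrete, here is a hedged sketch of the kind of information the Messages node surfaces, shown as plain Python dictionaries rather than any real Astera API; all values are made-up examples.

```python
# Illustrative sketch only: summary statistics and one record-level item, mirroring
# the node and sub-node names above. These are not real Astera objects.
summary = {
    "TotalCount": 91,
    "ErrorCount": 4,
    "WarningCount": 22,
    "InfoCount": 0,
}

record_level_item = {
    "ElementName": "Region",
    "MessageType": "Error",
    "Action": "Reject",
    "MessagesText": "Region must not be null.",
}

print(summary)
print(record_level_item)
```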
Connecting the Messages node’s output ports to another object’s input ports on the dataflow makes it possible to get both summary statistics and record-level statistics for the dataset, which are useful for analysis and debugging. To do this:
Right-click on the Messages node inside the NoNullValues_Rule object and go to Write to > Delimited File Destination.
A Delimited Destination object is added to the designer with mapped fields.
Right-click on the header of the destination object and select Preview Output from the context menu.
A Data Preview window will open, showing error and warning information.
A Record Level Log captures the status (Success, Error, Warning, or Skip) for each of the records transferred, and includes snapshots of the source record and the destination record. It also provides additional details, such as error messages.
You can have any number of record level logs on the dataflow. Each record level log will collect the status of the records in the object that it is connected to.
In this document, we will learn how to use the Record Level Log object in Astera.
If you Preview Output for the Customers dataset, you will see that some of the records have empty Region and Fax fields.
If you hover over these warning signs, the error message will be displayed.
Now, when we run this dataflow, we want to know which records passed the validation check, which failed it, which contained errors, and which ended in only warnings.
For this purpose, we will use Record Level Log.
To get a Record Level Log object from the Toolbox, go to Toolbox > Data Profiling > Record Level Log. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Record Level Log object onto the dataflow designer.
Another way to get a Record Level Log object is to right-click on the Output node inside the Database Table Destination object and go to Write to > Record Level Log.
You can see that the dragged Record Level Log object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map the fields from the source object to the Log object.
To configure the Log object, right-click on its header and select Properties from the context menu.
A configuration window will open. The first window you will see is the Layout Builder window. This is where we can create or delete fields, change their name and data type.
Click Next, and you will be directed to a Properties window where you can configure settings for creating the log file.
Specify the Profile File path where Astera will save this log file. Log files are saved with a .prof extension.
Specify the Log Level Type from the dropdown list.
All – all records (including Success records) are logged
Errors – only error records are logged
Warnings – only warning records are logged
Errors and Warnings – both error and warning records are logged
Off – no logging
In this case, we will select Errors and Warnings as our log type.
Stop Logging After … Records with Errors – allows you to limit excessive logging by setting a cap on the maximum number of errors to be logged. The logging stops after the cap has been reached.
The default value is 1000 errors.
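Taken together, the log level and the error cap determine which records end up in the log. A minimal Python sketch of that filtering logic (illustrative only, not Astera’s implementation; the constants mirror the options above):

```python
# Illustrative sketch only: filtering records by log level with a cap on errors.
LOG_LEVEL = "Errors and Warnings"   # one of: All, Errors, Warnings, Errors and Warnings, Off
MAX_ERRORS = 1000                   # "Stop Logging After ... Records with Errors" default

def should_log(status, errors_logged_so_far):
    if LOG_LEVEL == "Off" or errors_logged_so_far >= MAX_ERRORS:
        return False                # logging stops once the cap is reached
    if LOG_LEVEL == "All":
        return True
    if LOG_LEVEL == "Errors":
        return status == "Error"
    if LOG_LEVEL == "Warnings":
        return status == "Warning"
    return status in ("Error", "Warning")   # Errors and Warnings

print(should_log("Warning", errors_logged_so_far=3))   # True
print(should_log("Success", errors_logged_so_far=3))   # False
```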
Click OK.
Click on the log file link provided in the Job Progress window.
Record Level Log will open in Astera showing you the status of logged records.
Astera stores the error logs in XML format. If you expand each record, it will give you the Field Name to which the error/warning message was attached, the Processing Step of the data check that resulted in the error, as well as the error Message.
If you click on View Record, Astera will show you the field values of the record failing the data quality rule.
The Data Quality Rules object found in the Data Profiling section of the Toolbox is used to apply one or more conditions, called Data Quality Rules, against incoming records. Records that do not meet the data quality rule criteria will be assigned the ‘Error’ status and may be optionally excluded from processing by the downstream objects.
Data Quality Rules is a record-level component which means that it does not require the entire dataset to flow through it. In other words, you can map a single or a couple of fields to the Data Quality Rules component to set up quality validation criteria and the transformed records can be mapped further in the dataflow.
Let’s understand the application and usage of Data Quality Rules with the following example.
Here we have sample data of employees of a fictitious organization which we have retrieved using an Excel Workbook Source.
If we look at the preview of the Employee_Report dataset, the values in the SalariedFlag column specify whether an employee is salaried in terms of 0 and 1.
1 = the employee is salaried
0 = the employee is non-salaried and therefore is eligible for overtime.
We can apply data quality rules to these values and identify which employees are not salaried and therefore, are eligible for overtime. The Data Quality Rules object will process all records and those that do not match the criteria will be returned with an error. This means that in this example, the salaried employees with the salary flag ‘True’ will return an error, whereas the records of employees with the salary flag ‘False’ will pass the data quality rule.
To do this, drag the Data Quality Rules object from the Data Profiling section in the Toolbox and drop it onto the dataflow designer.
Now, map the SalariedFlag field to the Data Quality Rules object.
Right-click on the Data Quality Rules object and select Properties from the context menu.
This will open a new window. This is the Layout Builder, where you can see the ‘SalariedFlag’ field we have mapped from our source.
Click Next to proceed to the Data Quality Rules window.
Once a new rule is added, the options on this window will activate and the rule will be added to the grid.
Let’s explore these options one by one:
Description: The Description field contains the name or description of the rule. By default, the rules are termed as Rule1, Rule2 and so on, depending on the number of rules you add. But you can also rename the rules for better understanding and convenience.
In our case, as we want to set a data quality criteria to identify non-salaried employees, we can rename the rule as “NonSalariedEmployeesRule.”
Attach rule to the field: This is a drop-down list from which you can attach a rule to a particular field. You can see that there is a root node named Data Quality Rules.
Listed within the Data Quality Rules node are the fields mapped to the Data Quality Rules object. Here we have only one field mapped to which we want to apply this rule. In case you want to apply a rule to the whole dataset, you can simply double-click on the Data Quality Rules root node and the rule will be applied to all fields mapped to the Data Quality Rules object.
In this case, we will map the rule to the SalariedFlag field.
Expression box: This is where you can type in the expression for your rule.
In this example, we want to validate records with the Salary Flag ‘False.’ To do this we will write the expression:
‘SalariedFlag = 0’ in the Expression field.
Observe that Astera simultaneously shows the compile status of your expression below the expression box.
It says ‘Successful,’ so we can click OK. If the expression is incorrect, it will give you an error instead, and you will have to correct the expression before clicking OK.
Show Message: We can also write a message to show up with the errors, which can also be written to the error log. Let’s write a message:
‘Salaried employees are not eligible for overtime.’
This message will help identify why a particular record was marked erroneous. And in case multiple rules are applied, the message will point out which rule a particular record failed to satisfy.
Next, we have two checkboxes:
Active – to activate a rule.
Is Error – when this is checked, all records that return an error will not be written to a target, which means that only the records that have passed the data quality rule will flow further in the dataflow pipeline.
However, if we uncheck this option, the Warning checkbox will automatically be checked. Records that fail to match the rule will then be returned with a warning and will still be written to the target.
In this case, let’s keep the errors as errors by checking the Is Error box.
Now we have set up a data quality rule.
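Conceptually, the rule behaves like the following Python sketch (illustrative only, not Astera’s implementation): records that satisfy SalariedFlag = 0 pass, and the rest are flagged as errors with the message we defined.

```python
# Illustrative sketch only: records satisfying the rule pass, the rest are flagged
# as errors with an attached message (mirroring the SalariedFlag = 0 rule).
RULE_MESSAGE = "Salaried employees are not eligible for overtime."

def apply_rule(record):
    if record["SalariedFlag"] == 0:          # non-salaried: passes the rule
        return record, None
    return record, RULE_MESSAGE              # salaried: marked as an error

for rec in [{"EmployeeID": 1, "SalariedFlag": 0}, {"EmployeeID": 2, "SalariedFlag": 1}]:
    record, error = apply_rule(rec)
    print(record, "->", "OK" if error is None else f"Error: {error}")
```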
Now, let’s look at the preview. Right-click on the Data Quality Rules object and select Preview Output from the context menu.
You can see that the records that matched the rule (those with a ‘False’ salary flag) have been validated. On the other hand, the records that failed to match the rule (those with a ‘True’ flag) have returned an error, denoted by a red warning sign.
If you move the cursor over this warning sign, it will show the error message in the tooltip. This is especially useful in cases where you have applied more than one rule and you want to track which records have failed to match which rule or when you want to store the erroneous records in an error log.
So now that we have validated the records against our data quality rule, we can map it to a target, which is a Delimited File Destination in this case. We will name this file ‘Employees eligible for overtime,’ so the records of employees with the ‘False’ salaried flag will pass through the Data Quality Rules object and consequently be mapped to the destination file. Let’s do the mapping.
Now, if we open the Properties window of the destination file, you can see the Do Not Process Records With Errors option on the last window. It is checked by default in all target formats in Astera. Therefore, when we run this dataflow, all records that matched the data quality rule will be written to the destination file, whereas records that failed to match the rule and returned an error will be omitted.
The Delimited Parser in Astera reads and processes a single stream of text in delimited format as input and returns its elements as parsed output. It enables users to transform otherwise semi-structured data into a structured format.
In this document, we will learn to use the Delimited Parser to parse an incoming text stream in Astera.
In this case, we are using the Delimited File Source to extract our source data. You can download this sample data from the following link:
The source file contains customers’ contact information including their name, address, postal code, phone number, etc.
Upon previewing the data, you can see that it is difficult to decipher fields and elements, since the data is in a single text stream with fields and records separated by delimiters. To make sense of this data, each record needs to be parsed into its elements in respective fields.
To do this, we will use the Delimited Parser object.
To get the Delimited Parser object, go to Toolbox > Text Processors > Delimited Parser and drag-and-drop the object onto the designer.
You can see that the dragged object contains a single Text field.
Map the Customer_Info field inside the source object onto the Text field inside DelimitedParser object.
Right-click on the object’s header and select Properties.
A configuration window will open as shown below.
Let’s look at the properties on this window.
Parse Data Pattern – Contains three patterns in which the dataset can be parsed:
Single Record – Data is parsed into a single record with multiple fields. Users need to provide a field delimiter, and a text qualifier, if necessary.
Multiple Records – Data is parsed into multiple records with a single or multiple fields. Users need to provide a field delimiter as well as a record delimiter.
Field Arrays – Data is parsed into an array of records and fields. Users need to provide a field value delimiter and an array separator.
The source data in this case contains multiple records with many different fields. Therefore, we will set the Parse Data Pattern option to Multiple Records.
Provide a Field Delimiter and a Record Delimiter. Since the source file also contains a Text Qualifier, specify it as well.
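For intuition, the Multiple Records pattern behaves roughly like the following Python sketch using the standard csv module (illustrative only; the delimiters, qualifier, field names, and sample stream are assumptions, not taken from the actual sample file).

```python
import csv, io

# Illustrative sketch only: the "Multiple Records" pattern, with an assumed field
# delimiter (,), record delimiter (newline), and text qualifier (").
stream = ('ALFKI,"Maria Anders","Obere Str. 57",Berlin,12209\n'
          'ANATR,"Ana Trujillo","Avda. de la Constitucion 2222",Mexico D.F.,05021')
field_names = ["CustomerID", "ContactName", "Address", "City", "PostalCode"]

reader = csv.reader(io.StringIO(stream), delimiter=",", quotechar='"')
records = [dict(zip(field_names, row)) for row in reader]
for record in records:
    print(record)
```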
Click Next. This is the Layout Builder screen.
Here, write the names of the fields that you want to create.
Click OK. The Delimited Parser object now has new fields in the Output node.
To preview data, right-click on the object’s header and select Preview Output from the context menu.
A Data Preview window will open. Upon expanding the records, you can view the parsed output.
To store this parsed output, you can write it to a destination file or use it for some transformation further in the dataflow.
This concludes using the Delimited Parser in Astera.
The Delimited Serializer converts a structured dataset into a single text stream with fields and records separated by delimiters or identified by text qualifiers. Serialized data with delimiters can be shared or stored in a form that allows recovery of its original structure.
In this document, we will learn how to use a Delimited Serializer to serialize structured data in Astera.
In this case, we are using the Customers table from the Northwind database. You can download this sample data from the following link:
The source file contains customers’ contact information, including their ContactName, Address, PostalCode, Phone, etc., in a structured format.
We want to convert the information contained in multiple fields into a single text stream separated by a delimiter.
To perform this task, we will use the Delimited Serializer object in Astera.
To get the Delimited Serializer object, go to Toolbox > Text Processors > Delimited Serializer and drag-and-drop the object onto the designer.
You can see that the dragged object contains a Text field with an output port and an Input sub-node which is currently empty.
Auto-map source fields by dragging and dropping the top node of the source object, that is Customers, onto the Input node of the transformation object – Delimited Serializer.
Right-click on the object’s header and select Properties.
A configuration window will open as shown below.
Let’s look at the properties on this window.
Field Delimiter – Allows users to specify a delimiter for the source fields from the dropdown list.
Text Qualifier – Allows users to specify qualifiers at the start and end of a text stream. In most cases, the text qualifier encloses an entire record.
Build Operation Type – Contains two options in which a dataset can be serialized:
One Record Per Input – creates a single text record separated by delimiters for the entire data set.
One Record Per Transaction – creates as many text records as there are records in the source file, separated by the field delimiter only.
Let’s leave the properties as default, and click OK. The data has been serialized.
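The difference between the two build operation types can be pictured with a short Python sketch (illustrative only; the sample records, qualifier handling, and the separator used between records in the One Record Per Input case are assumptions):

```python
# Illustrative sketch only: serializing structured records into delimited text.
records = [
    {"ContactName": "Maria Anders", "City": "Berlin", "Phone": "030-0074321"},
    {"ContactName": "Ana Trujillo", "City": "Mexico D.F.", "Phone": "(5) 555-4729"},
]

def serialize(record, field_delimiter=",", qualifier='"'):
    return field_delimiter.join(f"{qualifier}{v}{qualifier}" for v in record.values())

# One Record Per Transaction: one text record per source record.
per_transaction = [serialize(r) for r in records]

# One Record Per Input: the whole dataset collapsed into a single text record
# (the separator between records here is an assumption).
per_input = ",".join(per_transaction)

print(per_transaction)
print(per_input)
```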
To preview the data, right-click on the Delimited Serializer object’s header and select Preview Output from the context menu.
A Data Preview window will open, showing the serialized data with field delimiters.
To store this serialized output, write it to a destination file or you can use this data further in the dataflow.
This concludes using the Delimited Serializer in Astera.
The Fixed Length Serializer is useful when source data is stored in multiple fields and needs to be converted to text format (the entire data stored in a single field). This is helpful in scenarios where you have to:
Store large datasets, so you compress the information in a single field
Transport volumes of data over a network
In this document, we will learn how to use Fixed Length Serializer in Astera.
The source file contains customers’ ContactDetails.
To preview this data, right-click on the source object’s header, and select Preview Output from the context menu.
A Data Preview window will open, displaying the source data.
Now, we want to convert the information contained in multiple fields into a single field in text format.
To perform this task, we will use the Fixed Length Serializer object in Astera.
To get the Fixed Length Serializer object, go to Toolbox > Text Processors > Fixed Length Serializer, and drag-and-drop the object onto the designer.
You can see that the dragged object contains a Text field and an Input sub-node, which is currently empty.
Auto-map source fields by dragging-and-dropping top node of the source object, ContactDetails, onto the Input node of the transformation object – Fixed Length Serializer.
Right-click on the object’s header and select Properties.
A configuration window will open, as shown below.
Check the options according to the contents and format of your source file.
In this example, both options have been checked, as the first row in the source file contains a header, and the data also contains multiple records.
Specify the Record Delimiter of the source data. In this case, it is <CR><LF>.
You can specify any delimiter based on the format of your source file.
Under the Builder Options, select the relevant type from the drop-down list. For this dataset, we will use One Record Per Input.
Click Next. This will open the Destination Fields screen, where you can see all the incoming fields from the source object.
Here you can delete fields, change their Header or Data Type, modify the string’s Start Position, define field Length, and customize the fields according to your requirements.
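Under the hood, fixed-length serialization amounts to padding (or truncating) each field value to its defined length and concatenating the results into one text string. A minimal Python sketch of that idea (illustrative only; the layout and record are hypothetical):

```python
# Illustrative sketch only: serializing fields into a fixed-length string by padding
# or truncating each value to its defined length.
layout = [("Name", 20), ("City", 15), ("ZipCode", 5)]

def serialize_fixed_length(record):
    return "".join(str(record[field]).ljust(length)[:length] for field, length in layout)

# Pads Name to 20, City to 15, and ZipCode to 5 characters, then concatenates them.
print(serialize_fixed_length({"Name": "Maria Anders", "City": "Berlin", "ZipCode": "12209"}))
```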
Click OK.
Right-click on the Fixed Length Serializer object, and select Preview Output.
A Data Preview window will open, displaying the serialized data.
To store this serialized output, write it to a destination file.
Right-click on the FixedLengthSerializer node, and go to Write to > Fixed Length File Destination. A Fixed Length File Destination object is added to the dataflow designer with a Text field auto-mapped to it.
Click on the Start Dataflow icon, located in the toolbar at the top, to create the destination file.
A Fixed Length File Destination file will successfully be created. You can find its link in the Job progress window.
We have added Language Parser functionality to the Expression Builder to enable advanced expressions such as interpolated strings and verbatim strings. This enables users to compile different values from incoming datasets into an expression and present it as an interpolated string in the output.
In this article, we will look at a use case and understand how the Language Parser works in Astera.
We have some EmployeeData from a fictitious organization stored in an Excel spreadsheet. We will work with that data to create:
An interpolated string
A verbatim string
We will use the fields in the EmployeeData Excel sheet and interpolate the data into an EmployeeInfo string. After string interpolation, we will create a new field to apply escape characters inside an Expression transformation object through a verbatim string.
Astera also provides extensive and detailed error information for issues that may occur during data parsing and string interpolation. We will use some examples to see how error information works in Astera.
Follow through the steps below to see how it works:
Retrieve the source data.
Next, we will drag-and-drop the Expression Transformation object to the designer, to create string expressions. Map fields from the Excel source object (EmployeeData).
Right-click on the Expression transformation object and select Properties. A Layout Builder will open, where you can use the Expression Editor to define the expressions for each field.
Let’s discuss how to create an interpolated string in an Expression transformation object.
Create a new field in the Layout Builder. Since this field will only return an output, we will check the Output box and set the data type as ‘String.’ In this case, we have named the new field as EmployeeInfo.
In the Expression Editor, we will define the Interpolated String Expression for the new field. You can either write the expression in the given field or click the ellipses in the left corner of the field to go to the Expression Editor.
The interpolated string expression will be built in the Expression box.
Here, we will create an interpolated string using the EmployeeID, LastName, FirstName, Title, TitleOfCourtesy, HireDate, and City fields from our source data.
The interpolated string expression in this case will be written as:
During both the preview and the runtime, the variables or parameters enclosed in {} will take input from the source data and return the output for each record in the new EmployeeInfo field. Here’s an example of the parameters and the source values for the first record:
Hence, the output of the interpolated string for the first record will return as: “Ms. Nancy Davolio from Seattle was hired on 01/04/06 and works as Sales Representative.”
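As an analogue, the same interpolation can be sketched with a Python f-string (illustrative only; this is not Astera’s expression syntax, and the record values below simply mirror the first record’s values listed in the table at the end of this section):

```python
# Illustrative sketch only: a Python f-string analogue of the interpolated
# EmployeeInfo expression, using the first record's values from the source data.
record = {
    "TitleOfCourtesy": "Ms.", "FirstName": "Nancy", "LastName": "Davolio",
    "City": "Seattle", "HireDate": "01/04/06", "Title": "Sales Representative",
}

employee_info = (
    f"{record['TitleOfCourtesy']} {record['FirstName']} {record['LastName']} "
    f"from {record['City']} was hired on {record['HireDate']} "
    f"and works as {record['Title']}."
)
print(employee_info)
# Ms. Nancy Davolio from Seattle was hired on 01/04/06 and works as Sales Representative.
```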
Now, if we look at the preview, you can see that a new field ‘EmployeeInfo’ has been created and the interpolated string output for each record has been returned in that field.
Let’s go over how to use a Verbatim String with a line break in an Expression transformation object.
Create a new field in the Layout Builder for the verbatim string. Since this field will only give an output, we will check the Output box and set the data type as String. In this case, we will name the new field as Office.
In the Expression field of the Layout Editor, we will define the verbatim string expression for that field. You can either write the expression in the corresponding expression field or click on the ellipses in the left corner of the field to enter the Expression Editor.
The verbatim string expression will be built in the Expression box on the Expression Builder screen.
It is important to note that a verbatim string only works with an at sign (@) before the expression and returns the value as-is throughout the output field.
In this example, we want the following output to be returned in the Office field that we created:
Output:
Office
North Street, 301, San Francisco
The verbatim string expression to achieve this output will be written as:
@”Office
North Street, 301, San Francisco”
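For comparison, the closest Python analogue is a multi-line string literal, which likewise preserves the line break exactly as written (illustrative only; the @ prefix is specific to the verbatim syntax described above):

```python
# Illustrative sketch only: a multi-line string preserves the line break as-is,
# similar to how the verbatim string is returned unchanged in the Office field.
office = """Office
North Street, 301, San Francisco"""
print(office)
```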
The output preview will show the verbatim string when you hover over values in the verbatim field, or you can export it to any destination to view the data.
The error information feature in Astera has now been improved to provide more comprehensive information about errors to users in real time. The information it provides is now more precise and specific, which helps in debugging errors as you build expression logic.
Let’s work on a few examples to explore how the error information functionality has been improved and how it can help in resolving errors and successfully compiling expressions in the Expression Builder.
For an incomplete expression, the user is instantly notified that the expression status is Not Compiled, along with a message explaining that there is an unexpected token and that a further expression is expected for successful compilation.
‘Unterminated String Literal’ means that a string variable is not closed properly, either because of an un-escaped character in it or because of a line break. Hence, the expression won’t compile until the string is closed.
Successful compilation after fixing the error:
The string has now been closed, and the Compile Status has been updated to ‘Successful’.
The ‘Invalid Identifier’ error message appears when a field name is not valid. If you look at the given object, there is no field by the name TitleCourtesy; hence the invalid identifier error. The available field is TitleOfCourtesy.
This concludes using the Language Parser functionality in Astera.
To learn how you can configure a Database Table Source object, click .
Learn how to configure settings for Excel Workbook Destination from .
In this case, we have a simple dataflow designed to perform a data quality check. It contains customers’ data coming in from an . A object is added to validate data for null values and perform warning checks.
To activate this feature, click on the Data Quality Mode icon located at the top of the dataflow designer.
Configure settings for the to save this data.
In this case we have a simple dataflow performing a data quality check process. It contains a Customers dataset stored in an . Then, a is applied to validate data for error and warning checks and finally, data is written to a .
A is applied to identify null records in the Region field as errors, and empty records in the Fax field as warnings. Upon previewing its output you will see that the records that failed to match the rule have returned an error, denoted by a red warning sign.
After configuring settings for the Log object, click on the Start Dataflow icon from the toolbar located at the top of the window. A Job Progress window will open at this instant and will show you the trace of the job.
Here, we will set rules or the data quality criteria. Click this button to add a new rule.
Or you can click this button to enter the Expression Builder window where you can choose an expression from Astera's library of built-in expressions, or you can write one of your own.
You can add as many rules as you want by clicking this button. Similarly, you can also delete a rule by pointing to it in the grid and selecting right-click > Delete. In this example, we will work with the single rule that has already been set, so let’s go ahead and click OK.
The records that fail to match the data quality rule can be written and stored in a separate error log. Click to learn how you can store erroneous records using a Record Level Log object.
In this case, we are using an .
Configure settings for the object.
Learn to configure settings for a Fixed Length File Destination from .
Note: In this case, the source data of Employees is stored in an Excel file. We will extract it using the object in a dataflow.
| Field Name | Source Value |
| --- | --- |
| TitleOfCourtesy | Ms. |
| FirstName | Nancy |
| LastName | Davolio |
| City | Seattle |
| HireDate | 01/04/06 |
| Title | Sales Representative |