Astera is proud to unveil version 10.2 of our industry-leading products offering cutting-edge features and capabilities.
Encompassing numerous areas of operation, the features include:
Microsoft SharePoint functionality, now available via the Cloud Connection object, allows users to easily store and access files while working on the dataflow designer.
A new OCR (Optical Character Recognition) capability that simplifies data extraction and processing.
Additionally, Astera Data Stack's API area has also been upgraded, offering users an even more streamlined experience. This area serves as a one-stop platform within the Astera umbrella, allowing users to consume and manage APIs in a code-free environment.
Finally, Astera 10.2 comes with an updated user interface that includes a wealth of UI improvements and bug fixes, enhancing the overall user experience.
The Report Model component in Astera 10.2 has introduced, modified, and enhanced some new and existing features to make the process of data extraction even more flexible and user-friendly.
The highlights of this release include:
Verification of created fields
The addition of the pattern bar context menu
An option to change data region type.
Additionally, we have introduced:
Optical Character Recognition (OCR), which allows users to read data from PDFs containing scanned documents. These new additions have made the experience of capturing data easier than ever before.
Astera now provides the functionality to extract data from PDFs that contain scanned documents using Optical Character Recognition.
When provided with such a PDF, the tool recognizes it as an image PDF and automatically starts the OCR process. This option makes scanned documents available for extraction, minimizing the effort of manual data entry from such documents. Users can select the Resolution for OCR, allowing them to get the best result for their documents.
Additionally, to ensure correct data extraction, as noise elements can cause erroneous data to be extracted, an Edit Mode is also available for the users to clean and tweak the extracted data.
Edit Mode allows you to deal with the data as a text file and make changes accordingly.
Report Models now have the functionality to verify whether data fields have been captured properly for all data instances by checking for any non-blank character adjacent to an instance of a data field. This option gives users a one-click check for the data fields they have created.
Additionally, to allow users better visibility of the erroneous fields, navigation between instances of the data field is also provided along with an option to auto-adjust field lengths for all data fields within the selected data region.
Users can now access wildcards and other additional features for patterns in a report model through a context menu by right-clicking on the pattern box.
Now, if need be, users can change a data region to an append region and vice versa within the Model Layout panel.
This allows users flexibility in changing the model layout as they are creating their extraction template.
The multipart format is a way of structuring data in an API request or response that allows multiple files or data types to be transmitted together as a single unit.
In Astera Data Stack, on the consumption side, the multipart format can be used to simplify the process of uploading or downloading large files, or to send a single request that contains both file data and metadata.
Now, you can consume APIs in Astera using an API client that supports multipart content.
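As a generic illustration of the multipart format itself (outside Astera), a file and its metadata can be sent together in a single multipart/form-data request; the endpoint and field names below are hypothetical, using the Python requests library:

import requests

# Hypothetical endpoint and field names, for illustration only
url = "https://api.example.com/v1/documents"
with open("invoice.pdf", "rb") as f:
    files = {"file": ("invoice.pdf", f, "application/pdf")}   # binary file part
    data = {"department": "finance", "uploadedBy": "jdoe"}    # metadata parts
    response = requests.post(url, files=files, data=data)     # sent as multipart/form-data
print(response.status_code)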
AWS Signature authentication is the process of verifying the authenticity of requests made to Amazon Web Services (AWS) using the AWS Signature method.
This authentication process involves calculating a digital signature for each request using the requester’s access key and secret access key, along with details about the request being made. AWS verifies the signature against the user’s access credentials and grants access to the requested resources if the signature is valid.
The AWS Signature authentication method ensures that requests are securely transmitted and that only authorized users can access AWS resources.
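For illustration, the core of AWS Signature Version 4 is a chain of HMAC-SHA256 operations that derives a signing key from the secret access key and then signs a canonical "string to sign". The sketch below shows only that derivation, with placeholder credentials; constructing the canonical request is omitted:

import hashlib
import hmac

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sigv4_signature(secret_key: str, date_stamp: str, region: str, service: str, string_to_sign: str) -> str:
    # Derive the signing key: date -> region -> service -> "aws4_request"
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    # Sign the string to sign and return the hex-encoded signature
    return hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

# Placeholder values for illustration only
print(sigv4_signature("EXAMPLE_SECRET", "20240101", "us-east-1", "s3", "example-string-to-sign"))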
NTLM (NT LAN Manager) authentication is a Microsoft proprietary authentication protocol used to authenticate users in a Windows-based network.
It provides secure authentication by using a challenge-response mechanism, where the server sends a challenge to the client, and the client sends a response that is encrypted using a hash of the user’s password.
NTLM authentication is used in various Microsoft products, including Windows, Internet Explorer, and Microsoft Office.
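Outside Astera, one common way to exercise NTLM from a script is the third-party requests-ntlm package; the URL, domain, and credentials below are placeholders:

import requests
from requests_ntlm import HttpNtlmAuth  # pip install requests-ntlm

# Placeholder URL, domain, and credentials for illustration only
response = requests.get(
    "https://intranet.example.com/api/reports",
    auth=HttpNtlmAuth("EXAMPLEDOMAIN\\jdoe", "password"),
)
print(response.status_code)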
A raw preview request and response feature allows API developers to view the exact request and response payloads being exchanged between clients and servers in their APIs.
This feature provides a detailed look at the headers, body, and metadata of the HTTP request and response, which can help API developers debug issues, test APIs, and optimize performance. By using raw preview request and response capabilities, API developers can gain a deeper understanding of how their APIs are being used and troubleshoot issues quickly and efficiently.
Curl is a command-line tool that can be used to send HTTP requests to APIs and retrieve the respective responses.
It allows API developers and testers to easily interact with APIs and perform tasks such as testing, debugging, and troubleshooting. Curl supports various HTTP methods such as GET, POST, PUT, and DELETE, and can handle HTTP headers, cookies, and authentication.
It is a simple yet powerful tool that is widely used in API development and management.
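For example, a minimal curl invocation against a hypothetical endpoint might look like this:

curl -X POST "https://api.example.com/v1/orders" -H "Content-Type: application/json" -d '{"orderId": 123, "status": "shipped"}'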
API logging is the process of keeping track of how an application programming interface (API) is being used.
It helps to understand how often the API is being used, how long each request takes, and any errors that occur. API logging can be used for troubleshooting, monitoring performance, and identifying security threats.
The logs can be stored locally or in a cloud-based system, where they can be analyzed to provide insights.
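As a minimal sketch of the idea (not Astera's implementation), an API client can log the method, URL, status code, and duration of each call:

import logging
import time
import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("api")

def logged_get(url: str) -> requests.Response:
    # Record how long the request takes and whether it succeeded
    start = time.monotonic()
    response = requests.get(url)
    elapsed_ms = (time.monotonic() - start) * 1000
    logger.info("GET %s -> %s in %.1f ms", url, response.status_code, elapsed_ms)
    return response

# Hypothetical endpoint for illustration only
logged_get("https://api.example.com/v1/health")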
The Install Manager installs the dependencies required for running auto-generate layout (AGL) and optical character recognition (OCR) on your system/machine.
The auto-generate layout option allows for the generation of an extraction template at the click of a button. With optical character recognition, scanned PDFs can be processed by Astera to extract data.
AGL was introduced in Astera 10.0, and OCR has been introduced in Astera 10.2.
When you install the client and the server, you’ll see two install managers (one for the client and one for the server) installed.
You can run the Install Manager directly from there, or open the client under an Admin account and go to Tools > Run Install Manager to install the dependencies.
Note: If the client and server are on the same machine, then you need to run only one Install Manager out of the two (client and server). However, if the client and server are on different machines, then you’ll have to run the install manager for the client and the server on their respective machines.
You can learn more about the install manager and its setup here.
Project refresh has been greatly improved.
Project loading times have been improved.
A new UI has been implemented for the ‘Add New Item’ window in the Project with better-looking icons and a side panel that shows a description.
Improvements in the Job trace window have been made.
Deployment selections have been improved.
Scheduler refresh has been improved to work more efficiently.
This concludes the release notes for Astera 10.2.
Astera Data Stack lets users design an API flow, which opens with Request and Response objects already present in the flow. These can then be used in a pipeline according to the user's application.
A simple, configured API flow can look like this:
Note: API Consumption is not a new module; it was already present in Astera Data Stack and has now been integrated with the rest of the API functionality.
When it comes to the Consumption of APIs, Astera lets users configure an API Connection, its corresponding API Client object, along with an API Browser to maintain various imported or custom API Collections.
Astera Data Stack also makes use of various HTTP methods for Consumption such as GET, POST, PUT, DELETE, and PATCH.
More information related to HTTP Methods can be found here.
Apart from that, the product also offers Pagination functionalities, details for which can be found here.
Enhancements to API Consumption
Pagination
Automated Read till End options for Page Number and Offset-limit paginations (a client-side sketch follows this feature list).
Cursor Pagination using body fields.
Pagination support added for POST Requests
OAuth2
OAuth2 token caching and auto-refresh.
New features of API Consumption
OAuth2 and E-Tag
OAuth 2 Grant Flow: Authorization Code with PKCE
E-tags for request caching and concurrency control
C-API Connectors
Create and manage Custom C-API Connectors
The C-API connectors library includes:
Zendesk Support
Zendesk Sales CRM
HubSpot CRM
Box API
Other New Features
Auto-redirect API calls.
Use of Default browser for authentication as an alternative to Embedded Browser.
Import Postman API collections.
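The Read till End behavior for page-number pagination, mentioned in the list above, can be pictured with a generic client-side loop. This is a sketch only; the endpoint and parameter names are hypothetical and this is not Astera's internal implementation:

import requests

def read_all_pages(base_url: str, page_size: int = 100):
    # Keep requesting the next page until the API returns an empty page
    records, page = [], 1
    while True:
        resp = requests.get(base_url, params={"page": page, "limit": page_size})
        resp.raise_for_status()
        batch = resp.json().get("data", [])
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records

# Hypothetical endpoint for illustration only
all_rows = read_all_pages("https://api.example.com/v1/customers")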
The Server Browser in Astera Data Stack lets the user manage and publish APIs once they have been deployed.
A wide range of functionalities are offered in the deployment of APIs, including setting authentication and security functions.
Within the Server Browser, user roles can be assigned, and specific resources can be provided to each role, respective to their area of application. The Server Browser can also auto-generate a Swagger URL.
Once the APIs have been deployed, they can also be viewed in a dashboard present in Astera Data Stack.
Finally, users can also utilize Astera for logging and tracing.
Astera 10.3 is here, brimming with excitement and a sea of new features!
With Astera's new AI Automapping feature, field mapping becomes easier, alongside seamless connectivity through providers such as Azure SQL, Google BigQuery, and many more.
Astera’s AI-powered data extraction is a game-changer, while Astera Data Stack's warehousing component introduces AI-Select and more. Unlock the full potential of your APIs with enhanced connectivity options and new features in Astera Data Stack.
Finally, experience the power of the Data Analytics Workbench, and refine your data with ease using Astera Data Prep. Elevate your data game with Astera 10.3!
Windows authentication is a security feature that allows users to log into a system using their Windows credentials.
Astera Data Stack leverages this authentication method and provides the option to register new users using Windows authentication through its Server Browser interface.
In response to Microsoft Outlook discontinuing support for Basic Authentication and advocating the use of Modern Authentication, the SendMail object in workflow tasks has been updated.
You can view what Microsoft had to say about this here.
It now incorporates a new feature of Modern Authentication, enabling users to seamlessly add authentication credentials directly from within the SendMail object.
This enhancement simplifies the process and ensures compatibility with the latest authentication standards recommended by Microsoft.
The option has also been added in the Cluster Settings as seen below:
Existing Astera customers can easily upgrade to version 10.3.1 by executing an .exe upgrade script, which automates the repository update to the latest release.
This streamlined approach enhances the efficiency and effectiveness of the upgrade process, ensuring a smoother transition for users.
Note: This upgrade applies to v10.0 and later versions. Earlier versions cannot be upgraded in place and will need a clean repository as part of the upgrade.
The Catalog feature in Astera Data Stack is a centralized repository where one can store artifacts and share them with other users as needed.
The artifacts within the catalog are stored in a Catalog table. The security aspect of the Resource Catalog lets the user grant access only to the people with whom they wish to share the artifacts in the catalog.
For more information on Resource Catalog, please visit the documentation site here.
With the release of Astera 10.3, there have been quite a few new developments in the connectors’ domain.
Microsoft SharePoint is an enterprise document management and collaboration platform that helps organizations manage critical content. Its enterprise content management capabilities streamline flows and centralize important content to enhance collaboration.
The SharePoint provider is present within the Cloud Connection object dropdown.
Simply drag and drop the Cloud Storage Connection object from the toolbox onto the designer.
Right-click on the object header and select Properties from the context menu.
This will open a new window.
Select Microsoft SharePoint Document Library from the Providers dropdown menu.
Note: The SharePoint connection can be accessed in any object where Cloud files are available.
You can learn more about the SharePoint connector here.
The Azure SQL Database is a fully managed platform as a service (PaaS) engine that handles most database management functions such as upgrading, patching, backups, and monitoring without user involvement.
In Astera Data Stack, users can access Azure SQL Databases using the Database Table Source, Database Table Destination, DB Lookup, SQL Statement Lookup, and Database Write Strategies objects. They can also connect with the Run SQL Script task in a workflow.
Google BigQuery is a serverless, highly scalable data warehouse that comes with a built-in query engine. The query engine can run SQL queries on terabytes of data in a matter of seconds, and on petabytes in minutes.
This kind of performance is achieved without having to manage any infrastructure and without having to create or rebuild indexes.
In Astera Data Stack, users can connect with Google BigQuery as a database source or destination. As a source, both flat and hierarchical data can be read; only flat data can be written to a Google BigQuery destination.
Azure Data Lake Gen 2 is a cloud-based big data storage and analytics solution provided by Microsoft Azure. It offers scalable and cost-effective storage for structured, semi-structured, and unstructured data.
The Azure Data Lake Gen 2 provider is present within the Cloud Connector object in Astera.
Improvements have been made to the user-friendliness of the Cloud Browser.
Changing the Cloud Connection and browsing cloud folders have been made easier in Astera 10.3. The Cloud Browser's functionality with SharePoint has also been improved.
In the File System Items Source object, multiple filters are supported for both local and cloud connections.
In Astera 10.3, the overall user interface of Astera has been improved and revamped. There are quite a lot of new features being introduced in this version. Let us take a look at them.
To provide users with greater control over their report editor, we have introduced a new feature called Pages to Read. This feature enables users to filter and selectively display specific PDF pages within the report editor, ensuring a focused and efficient document analysis experience.
The Auto Create Table option empowers users to select and create tables within the document seamlessly. With this addition, users can automate the table creation process, saving valuable time and effort.
The Report Model Path Parameterization feature introduces enhanced flexibility to dataflows by enabling users to specify a customizable Report Model path in addition to the Report Source path. This crucial enhancement allows for runtime parameterization, granting the ability to dynamically change the Report Model or apply different templates to various report sources.
As a result, automating data extraction from Report Models becomes significantly easier, streamlining the overall data processing and analysis workflows. This feature empowers users with increased control and adaptability, facilitating more efficient and versatile data-driven decision-making processes.
Astera now uses AI to recommend report model templates, allowing you to automatically generate models for multiple source files at once. By specifying the layout and document type, Astera recommends the most suitable model templates, saving you valuable time and energy when building your data extraction processes.
With this new feature, you can streamline your workflow and eliminate the need for manual data extraction. In this document, we will see how to use this feature to create the report models.
To view more information on AI-Powered Data Extraction, click here.
Astera now provides the functionality to extract data from PDFs that contain scanned documents using Optical Character Recognition.
When provided with such a PDF, the tool recognizes it as an image PDF. However, the Use OCR option must first be enabled manually by the user. This option makes scanned documents available for extraction, minimizing the effort of manual data entry from such documents. Users can select the Resolution for OCR, allowing them to get the best result for their documents.
Additionally, to ensure correct data extraction, as noise elements can cause erroneous data to be extracted, an Edit Mode is also available for the users to clean and tweak the extracted data.
Edit Mode allows you to deal with the data as a text file and make changes accordingly.
To learn more about loading PDFs with OCR, click here.
Report Models now have the functionality to verify whether data fields have been captured properly for all data instances by checking for any non-blank character adjacent to an instance of a data field. This option gives users a one-click check for the data fields they have created.
Additionally, to allow users better visibility of the erroneous fields, navigation between instances of the data field is also provided along with an option to auto-adjust field lengths for all data fields within the selected data region.
To learn more about Data field verification, click here.
Users can now access wildcards and other additional features for patterns in a report model through a context menu by right-clicking on the pattern box.
Now, if need be, users can change a data region to an append region and vice versa within the Model Layout panel.
This allows users flexibility in changing the model layout as they are creating their extraction template. To learn more about this feature, click here.
Astera Dataprep is a dynamic platform designed for rigorous data cleansing, transformation, and preparation activities. With its user-oriented and preview-focused interface, it offers significant functionality and visibility to streamline the preparation process.
The system allows for quick operations by seamlessly interchanging between scripting and point-and-click methodologies. Serving as a crucial intersection of data engineering and data science, Astera Data Prep is an invaluable tool in any data-driven operation.
Astera Dataprep is a data manipulation tool that offers interactive data correction, ATL scripting with auto-completion, and UI-ATL synchronization. It provides smooth navigation, action history tracking, and comprehensive data quality assurance.
The tool promotes scripting efficiency and reusability with template scripts and supports real-time visual insights for data analysis. It enables rich data transformations, including resolving cardinalities and merging datasets. Astera Dataprep streamlines data preparation, enhances productivity, and ensures data accuracy and integrity.
Dataprep offers 60+ ATL commands to enable a comprehensive set of data preparation strategies. ATL is a smart scripting language integrated with IntelliSense, where data engineers benefit from script auto-completion, reducing the need for constant reference to documentation. This proprietary language can generate code snippets, enabling users to fill in the required fields effortlessly.
This streamlined approach enhances productivity, enabling data engineers to focus on the specific requirements of their data preparation tasks without the burden of repetitive syntax or command structure.
It hosts all ATL commands and command-related operations. It is a multipurpose artifact that also serves as a preparation process navigation browser.
The Data Source Browser, while not a new feature, is a vital part of Dataprep. It hosts all file sources, catalog sources, cloud sources, and project sources to be imported into the Dataprep artifact.
Note: While the Data Source Browser is essential in Dataprep, it is not specific to it.
Dataprep scripts are reusable and hence can be used as a source as well as a transformation in other artifacts such as dataflow, workflow, and analytics workbench.
The grid view hosts a preview-centric interactive grid that automatically updates in real-time to display the transformed data upon each transformation/modification. It is a dynamic grid that provides instant feedback on data quality.
The Dataprep Profile Browser is a side window providing a comprehensive view of the data with graphs, charts, and field-level profile tables. It keeps a check on data health and highlights the presence of invalid entries, missing values, duplicates, etc.
Within Dataprep, this is a borderless and headerless 2x2 grid that enhances the experience of data reading, joining, union, and lookup.
Users can navigate smoothly using point-and-click actions in the ATL editor. This includes action history tracking, allowing users to review and backtrack changes made during the data preparation process for transparency, editing, and control.
Astera Data Stack's Data Model component has also introduced a handful of new features for the Astera 10.3 release.
The AI Select feature in Astera Data Stack assists users in identifying potential Fact and Dimension candidates from their selected entities. To do so, users first select Build Dimensional Model from the main menu bar.
In cases where users are unsure about classifying entities as Facts or Dimensions, this feature leverages AI capabilities to automatically determine the appropriate classification, streamlining the data modeling process.
Astera Data Stack incorporates an advanced AI-powered feature enabling users to deduce relationships between entities. This capability extends to recognizing self-referencing relationships as well as associations between the fields of different entities.
By leveraging AI algorithms, Astera Data Stack facilitates automated inference of intricate entity relationships for efficient data modeling.
With the release of Astera 10.3, the area of APIs also brings a plethora of new features and improvements with it. Astera is a robust platform that enables seamless integration and efficient management of APIs.
It provides a comprehensive set of tools to create, publish, secure, and monitor APIs.
The API browser provides a convenient option to import pre-built and pre-tested CAPI connectors directly from Astera’s GitHub repository. These connectors are carefully curated and include a comprehensive list of endpoints that have been thoroughly tested and configured for seamless consumer use.
This option allows users to easily access and integrate these connectors into their projects, ensuring reliable and efficient connectivity with the associated APIs.
A developer portal, also known as an API portal or API developer portal, is a website or platform that serves as a central hub for developers who are interested in consuming or integrating with APIs. It provides documentation, and support for developers to understand, explore, and use APIs effectively.
Astera 10.3 brings a beta release of this portal.
Multipart/form-data is a MIME (Multipurpose Internet Mail Extensions) media type used for sending binary data or files along with other form fields in HTTP requests.
The Request Publish object now supports this format type, allowing the designing of APIs that function to upload files and download files.
Client certificates are digital certificates that are used by clients (such as web browsers or client applications) to authenticate themselves to a server during a secure communication process, typically over HTTPS (HTTP over SSL/TLS).
Options to generate, import, and export client certificates allow users to manage them effectively and integrate them into their security infrastructure.
We have integrated our tool with the Swagger UI component, allowing us to display the Swagger files of deployed APIs in a well-formatted and user-friendly manner.
This integration provides an enhanced user interface and experience for viewing and interacting with the API documentation.
We have introduced the option to generate test flows from the Server Browser. A test flow can now be generated after the APIs have been deployed.
We can create a test flow either for a singleton API or for the entire group of APIs.
The multipart format is a way of structuring data in an API request or response that allows multiple files or data types to be transmitted together as a single unit.
In Astera, on the consumption side, the multipart format can be used to simplify the process of uploading or downloading large files, or to send a single request that contains both file data and metadata.
Now, you can consume APIs in Astera Data Stack using an API client that supports multipart content.
AWS Signature authentication is the process of verifying the authenticity of requests made to Amazon Web Services (AWS) using the AWS Signature method.
This authentication process involves calculating a digital signature for each request using the requester’s access key and secret access key, along with details about the request being made. AWS verifies the signature against the user’s access credentials and grants access to the requested resources if the signature is valid.
The AWS Signature authentication method ensures that requests are securely transmitted and that only authorized users can access AWS resources.
NTLM (NT LAN Manager) authentication is a Microsoft proprietary authentication protocol used to authenticate users in a Windows-based network.
It provides secure authentication by using a challenge-response mechanism, where the server sends a challenge to the client, and the client sends a response that is encrypted using a hash of the user’s password.
NTLM authentication is used in various Microsoft products, including Windows, Internet Explorer, and Microsoft Office.
A raw preview request and response feature allows API developers to view the exact request and response payloads being exchanged between clients and servers in their APIs.
This feature provides a detailed look at the headers, body, and metadata of the HTTP request and response, which can help API developers debug issues, test APIs, and optimize performance. By using raw preview request and response capabilities, API developers can gain a deeper understanding of how their APIs are being used and troubleshoot issues quickly and efficiently.
Curl is a command-line tool that can be used to send HTTP requests to APIs and retrieve the respective responses.
It allows API developers and testers to easily interact with APIs and perform tasks such as testing, debugging, and troubleshooting. Curl supports various HTTP methods such as GET, POST, PUT, and DELETE, and can handle HTTP headers, cookies, and authentication.
It is a simple yet powerful tool that is widely used in API development and management.
API logging is the process of keeping track of how an application programming interface (API) is being used.
It helps to understand how often the API is being used, how long each request takes, and any errors that occur. API logging can be used for troubleshooting, monitoring performance, and identifying security threats.
The logs can be stored locally or in a cloud-based system, where they can be analyzed to provide insights.
Support for the following feature has been added to the product:
XML/SOAP APIs in the API Client
The Data Analytics component of the Astera Data Stack brings us the Data Analytics Workbench, alongside quite a lot of new features. From Linear Regression to Distribution Plot objects, this component has quite a lot to offer to the users.
The Analytics Workbench is a powerful tool for designing and visualizing data science models and analytical graphs. With its intuitive drag-and-drop interface, users can easily construct complex analytical workflows and explore data patterns.
This artifact streamlines the process, enabling efficient data analysis and visualization in a user-friendly manner.
The SMD designer offers a grid-based interface that facilitates the design and visualization of models using drag-and-drop functionality. Users can easily arrange components, connect data flows, and configure properties to create sophisticated models.
This intuitive interface enhances the ease and efficiency of model design and visualization tasks.
The Linear Regression object empowers users to establish and model the correlation between a quantitative response variable and one or multiple independent variables. It achieves this by fitting a linear equation to the provided data.
Linear regression is a diagnostic and predictive analytics technique that offers insights into data relationships and makes future predictions based on observed patterns.
In Analytics Workbench, users have the flexibility to choose between four model estimation types:
Ordinary Least Square
Weighted Least Square
Generalized Least Square
Penalized Least Square
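As a small numerical illustration of the first two estimation types (a generic NumPy sketch with toy data, not Astera's engine), ordinary least squares minimizes the unweighted squared residuals, while weighted least squares rescales each observation by the square root of its weight:

import numpy as np

# Toy data: one predictor plus an intercept column
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x), x])

# Ordinary Least Squares
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Weighted Least Squares: scale rows by the square root of each observation's weight
w = np.array([1.0, 1.0, 2.0, 2.0, 1.0])
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print(beta_ols, beta_wls)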
A Decision Tree is a supervised learning algorithm utilized for classifying a dependent variable based on features within a dataset. In the Analytics Workbench, the Decision Tree object offers users a range of options such as test-train split configurations, null value handling, scaling and normalization techniques, decision tree criteria and splitting methods, as well as pruning strategies.
These features enhance the flexibility and customization of decision tree-based classification tasks.
The Generalized Linear Model object enables users to define a flexible generalization of linear regression that allows for response variables with error distribution models other than a normal distribution.
There are two components of the Generalized Linear Model (GLM).
Family Parameter
Gaussian
Binomial
Poisson
Gamma
Link Function
Identity
Log
Probit
Logit
Square Root
Inverse
A Generalized Linear Model uses a specific combination of the link functions and family parameters for a suitable fit to the data.
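For instance, count data is commonly modeled with the Poisson family and its default log link. The sketch below is a hedged illustration using the open-source statsmodels package with toy data, not Astera's own implementation:

import numpy as np
import statsmodels.api as sm

# Toy count data with a single predictor
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2, 3, 6, 10, 15])
X = sm.add_constant(x)

# The Poisson family uses the log link by default, i.e. log(mu) = X @ beta
model = sm.GLM(y, X, family=sm.families.Poisson())
result = model.fit()
print(result.params)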
The Pre-Analytics Testing object wraps several statistical tests that a user performs on the data to determine an accurate statistical model to fit the source data.
The Pre-Analytics Testing object presents users with established parametric and non-parametric tests to evaluate the data against the underlying statistical assumptions.
The Pre-Analytics Testing object hosts the following tests and graphs on each screen:
Heteroscedasticity scatter plot
Multicollinearity bar chart
Outlier Detection box plot
Normality Detection histogram
The Correlation Analysis object enables users to compute Covariance and different types of Correlation such as Heterogenous Correlation, Partial Correlation, and Correlation - Significance Level between data fields.
The strength of the association is measured by computing correlation coefficients. In Analytics Workbench, users have the option to compute different types of correlation coefficients.
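As a quick generic illustration (NumPy, with toy data, not Astera's object), the Pearson correlation coefficient is the covariance of two fields divided by the product of their standard deviations:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 10.1])

covariance = np.cov(x, y)[0, 1]
pearson_r = np.corrcoef(x, y)[0, 1]  # covariance normalized by the two standard deviations
print(covariance, pearson_r)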
A contingency table, also known as a cross-tabulation or a two-way frequency table, is a table used to summarize the relationship between two categorical variables. It displays the distribution of the data by counting the number of occurrences of each combination of categories.
In the analytics workbench, we support multiple contingency types:
Frequency
Probability
Percentage
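To make the three contingency types concrete, here is a generic pandas sketch (toy data, not Astera's object) showing the same cross-tabulation as counts, probabilities, and percentages:

import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "status": ["Active", "Inactive", "Active", "Active", "Inactive"],
})

frequency = pd.crosstab(df["region"], df["status"])                     # raw counts
probability = pd.crosstab(df["region"], df["status"], normalize="all")  # proportions of the total
percentage = probability * 100                                          # percentages
print(frequency, probability, percentage, sep="\n\n")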
The Distributional Plots object allows users to visualize categorical data variables using mainstream plots such as:
Bar charts
Pie charts
Histograms
Frequency polygons
Spike plots
The Distributional Plots object has a drill-down functionality and an interactive interface with several configuration options. It is used to visualize a general profile of the user’s data.
The Basic Plots object allows a user to understand and analyze their data/transformations through visual graphs such as:
Line charts
Scatter plots
They provide interactive visuals with growing capabilities and features that enable an in-depth understanding of the nature of a user’s data and its trends.
The Predictive Analysis object helps users predict the behavior of the dependent variable on a given test dataset. To make predictions, the object requires information about the analytical model fitted on a particular training dataset.
Once the champion model is selected in the analytics workbench, it can then be used in a dataflow through the Predictive Analysis object for predictions.
The Undo/Redo manager allows users to undo/redo any actions that they have performed in the analytics workbench.
The Install Manager installs the dependencies required for running auto-generate layout (AGL) and optical character recognition (OCR) on your system/machine.
Auto-generate layout allows for the generation of an extraction template at the click of a button. With optical character recognition, scanned PDFs can be processed by Astera to extract data.
AGL was introduced in Astera 10.0, and OCR has been introduced in Astera 10.2.
When you install the client and the server, you’ll see two install managers (one for the client and one for the server) installed.
You can run the Install Manager directly from there, or open the client under an Admin account and go to Tools > Run Install Manager to install the dependencies.
Note: If the client and server are on the same machine, then you need to run only one Install Manager out of the two (client and server). However, if the client and server are on different machines, then you’ll have to run the install manager for the client and the server on their respective machines.
You can learn more about the install manager and its setup here.
Project refresh has been greatly improved.
Project loading times have been improved.
A new UI has been implemented for the ‘Add New Item’ window in the Project with better-looking icons and a side panel that shows a description.
Improvements in the Job trace window have been made.
Deployment selections have been improved.
Scheduler refresh has been improved to work more efficiently.
This concludes the release notes for Astera 10.3.
Astera 10.5, where precision meets innovation in data governance. Unleash the power of smooth data management with our Governance platform's features, from intuitive UI enhancements to advanced AI-driven data enrichment and profiling.
Astera's Access Management ensures secure data marketplace navigation, while our Business Glossary, generated intelligently by AI, adds clarity to your vocabulary.
Experience a refined user interface, and optimized AI functionalities, setting a new benchmark in data governance.
Elevate your data trek with Astera 10.5 – where efficiency, visibility, and performance converge effortlessly, all within an intuitive drag-and-drop interface.
Advanced search and filtering capabilities help users find exactly what they are looking for. The Astera Governance platform offers these features and much more.
We've integrated AI to automatically generate business titles, descriptions for assets/artifacts, and field descriptions, enhancing clarity and efficiency in our processes.
Data profiling refers to the process of examining, analyzing, reviewing, and summarizing data sets. The Astera Governance platform offers advanced functionality that encompasses Data Profiling.
Data quality is a measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability, and whether it's up to date.
A business glossary is a set of data-related terms and definitions. The Astera Governance platform also supports a business glossary, which is generated using AI.
The Access Management section of the Astera Data Governance platform lets the owner set resources to each of the users, based on their roles.
Enhancements have been made to the Cloud Browser functionality, refining source parameter usage in FTP List, and improving password handling in the File Transfer Task.
Improvements in user experience include better informative messaging during server downtime and streamlined communication for file uploads/downloads.
User interface improvements encompass enhanced homepage pagination. The Edit Toolbar button has a polished appearance, and project explorer panels feature icons for improved visibility.
The Ask AI feature has been enhanced, and UI improvements contribute to a more user-friendly experience. Forward/back buttons in the Job Progress window have been improved, enhancing usability.
Improvements in the AI Mapper performance and error handling have been made. The 'Build Using AI' feature has seen refined UI improvements for seamless user interaction.
AI Mapper optimizations enhance performance, and parsing of Functional Groups in EDI Source has been improved.
UI improvements for the Build Using AI feature have been made. Cluster repository building now features clearer communication.
For existing customers, a repository upgrade and service restart are required. It is recommended that upgrading customers use the Repository Upgrade Utility.
This concludes the Astera 10.5 Release Notes.
Astera 10.4 revolutionizes data management with Google BigQuery and Azure Data Lake connectors, Excel Worksheet Parameterization, and AI-driven mapping precision.
Furthermore, explore advanced API Management capabilities, Cloud Browser in Scheduler, refined AI features in the Astera Report Model, and innovations like GraphQL APIs.
Experience a data revolution with Astera 10.4's user-friendly drag-and-drop interface, simplifying complex tasks without the need for coding. Elevate your data journey as the intuitive design puts advanced management capabilities at your fingertips, ensuring a seamless and efficient process.
Unleash the potential of your data management with this intuitive and transformative release.
The Worksheet Parameterization feature empowers users to efficiently extract data from specific worksheets in an Excel file through convenient looping mechanisms. When accessing the configuration parameter screen, the option to define the desired worksheet is displayed.
This selection is made flexible and customizable as the worksheet option can be parametrized using variables or a configuration file. This comprehensive feature enhances data retrieval processes, simplifying and optimizing data handling from multiple worksheets in Excel.
In Astera Data Stack, the AI Automapper utilizes semantic relationships to facilitate field mapping. By analyzing the context and meaning of the fields, it intelligently establishes connections and maps them accordingly.
This advanced approach streamlines the process, enhancing accuracy and efficiency in field mapping tasks.
The grid is now better aligned, making it easier to sort and filter columns. Connected to an intuitive interface and to the expression language, this enhancement is a step closer to a more streamlined experience.
MongoDB is a document-oriented database in which one collection holds different documents. The MongoDB Destination object in Astera Data Stack provides functionality to write data onto it. This component provides functionality to control how data should be written in collections.
In Astera Data Stack, users will be able to connect with Google BigQuery as a database source or destination. As a source, both flat and hierarchical data can be read. For destination, flat data can be written to Google BigQuery.
Azure Data Lake Gen 2 is a cloud-based big data storage and analytics solution provided by Microsoft Azure. It offers scalable and cost-effective storage for structured, semi-structured, and unstructured data.
The Azure Data Lake Gen 2 provider will be present within the Cloud Connector object in Astera.
In Astera API Management, the user can now make API requests in any content type by providing the content type along with its serialized content string.
In Astera API Management, the user can now make API requests using the application/x-www-form-urlencoded payload content type.
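For reference, here is a generic example (using the Python requests library and a hypothetical endpoint, not Astera itself) of an application/x-www-form-urlencoded request; passing a plain dictionary as data produces this content type by default:

import requests

# Hypothetical endpoint; requests encodes a dict passed via `data` as
# application/x-www-form-urlencoded by default
response = requests.post(
    "https://api.example.com/v1/token",
    data={"grant_type": "client_credentials", "client_id": "my-app"},
)
print(response.status_code)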
Define a custom response for your deployed APIs when the flow processing does not give an output. This gives developers flexibility in designing APIs according to the desired standards.
Improvements have been made to the user-friendliness of the Cloud Browser. Cloud Browser’s support has been added to the Scheduler.
After utilizing the AI Feature to generate the layout, we noticed that certain fields were being replicated within the table collection region.
However, through thorough investigation and refinement of our prompt, we managed to reduce field duplication by around 80%, based on our analysis of the implemented enhancements.
The output generated by our AI Feature used to be stored in a system folder, but there has been a modification to this functionality. Now, the cache is stored in a repository table named AICache.
We've now incorporated an option to sort or unsort the layout of the model. This enhancement adds a sorting feature accessible through the context menu in various regions. By clicking the new sort button, users can conveniently arrange the created fields in alphabetical order.
These changes will be immediately reflected in the data preview as well, and this sorting functionality is applicable across all types of regions, ensuring a more organized experience.
We have added an additional document type support for our AI Feature to extract data and create report models from Account Statements.
GraphQL APIs are a query language and runtime for APIs, enabling clients to request the data they need, reducing over-fetching and under-fetching. They offer a more efficient, flexible, and self-documenting approach to data retrieval and manipulation compared to traditional REST APIs.
Astera API Management now supports the use of GraphQL APIs as the input content type for an API. The feature can be seen within the Input Content Type drop-down menu of the API Client object.
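As a generic illustration of the GraphQL-over-HTTP convention (the endpoint and schema below are hypothetical and this is not Astera-specific), a query is typically sent as a JSON body with a single query field:

import requests

query = """
query {
  customers(first: 5) {
    id
    name
  }
}
"""

# Hypothetical GraphQL endpoint for illustration only
response = requests.post("https://api.example.com/graphql", json={"query": query})
print(response.json())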
Astera now provides the functionality to extract data from PDFs that contain scanned documents using Optical Character Recognition. This option has now made scanned documents available for extraction to users, minimizing the effort of manual data entry from such documents.
Users can select the Resolution for OCR, allowing them to get the best result for their documents.
The Toast Notification feature in Astera Data Stack has seen quite a lot of improvement and enhancement, making it more efficient and seamless.
In Astera Data Stack, the AI mapping approach has been changed. The product is now implementing a Waterfall model for this feature.
In Astera 10.4, many bugs have been addressed, and the user experience has been improved by enabling automatic refresh after actions like adding a catalog item; manual refresh is no longer required.
Support for the Report Source object has now been added to the resource catalog.
The Azure SQL Database is a fully managed platform as a service (PaaS) database engine that handles most of the database management functions such as upgrading, patching, backups, and monitoring without user involvement.
In Astera 10.4, the Azure SQL Server authentication is available.
This concludes the Astera 10.4 release notes.
The Astera Data Stack has evolved quite a bit in recent years. Astera 10.1, our latest release, brings an armada of new features and enhancements, including some new connectors, GIT functionality, and more!
Astera 10.1 also brings Astera's API Component to the user. With Astera, users can now enjoy the competitive edge of Astera’s powerful ETL engine to create code-free integrations and publish them through natively designed REST APIs to enable real-time data sharing between different departments within an organization, across platforms, or external partners without compromising security.
To further talk about the details of each component, here is what Astera 10.1 has to offer.
MongoDB Connector (Beta)
MongoDB is a source-available cross-platform document-oriented database program. It lets the user store data in flexible, JSON-like documents with optional schemas. This means that fields can vary from document to document and data structure can be changed over time.
With the release of Astera 10.1 comes the MongoDB Connector to the Astera toolbox. This will allow the user to configure a MongoDB Server as a source when creating ETL pipelines on the Astera designer.
The MongoDB Source object can be brought onto the designer through drag and drop. It looks like this:
Once dropped, it can then be configured through the Properties menu by right-clicking on the object and selecting Properties. As you can see below, the connector asks the user to input a number of different values.
From entering the address of the Primary Server Cluster connection to allowing the object to have a read preference, the MongoDB Source object allows for a range of functionality.
The object is useful for using MongoDB as a source to perform all kinds of transformations as well as loading onto destinations.
The MongoDB Source configuration also includes a built-in filter that can be used to transform the data as it is read.
Parquet Connector (Beta)
Apache Parquet is a column storage file format used by Hadoop systems such as Pig, Spark, and Hive. The file format is language-independent and has a binary representation. Parquet efficiently stores large data sets and has the extension of “.parquet”.
Astera 10.1 also brings the Parquet Source and Parquet Destination connectors to the Astera toolbox. The user can simply drag and drop the respective objects and configure them to read and write from and onto Parquet format files.
Some key features of Parquet, as relevant to Astera, are:
It encodes the data.
It stores data in a column layout.
It offers optional compression, resulting in smaller file sizes.
Parquet File Source
The Parquet File Source object can be found in the Sources section of the Toolbox.
It can then be configured by opening the properties. Simply right-click on the object and select Properties from the context menu.
Parquet File Destination
Similarly, Astera also brings the Parquet File Destination object to the Destination section of the Toolbox.
It lets the user fetch and map data from various kinds of sources that the Parquet format supports.
Once we open the properties of the Parquet File Destination object, we can view the following:
As you can see above, the object even lets the user select from compression methods which include:
Snappy
Gzip
For more information on the Parquet File Source and Parquet File Destination objects, click on the links below.
GIT in Astera Data Stack
GIT is an essential part of data integration and is widely used in the industry. It allows you to create a repository, clone a repository, and create branches from the master branch to work on. Given how essential a tool it is, Astera 10.1 introduces GIT functionality in Astera.
Astera provides GIT options where the user can create and work from branches, push to and pull from a remote location, and make changes, all from within the application.
Within Astera, repositories can be cloned and opened:
As you can see above, Astera offers every essential GIT functionality that the user can employ, including Fetch, Merge, and Clone as well as viewing Branch History.
GIT in Astera also lets the user resolve any conflicts that may arise between branches.
Conflict resolution in GIT makes it more feasible for the user to keep track of what branch contains which information.
Full Client with Built-in Server for Centerprise Student
Astera 10.1 brings Centerprise Student, a full client with a built-in server. This means that students, when using Astera, don’t have to install client and server applications separately.
It certainly makes operations more convenient.
Full Client with Built-in Server for Report Models
Astera 10.1 brings the availability of a full client which has a built-in server for Report Models.
This means that users do not have to install the Server and Client applications separately, but rather just the convenient installation of a single server-client integrated application.
RM Enterprise
With the release of Astera 10.1, we bring you RM Enterprise. It lets the user access the complete functionality of Astera's Report Model module, with a separate server for Client-Server communication.
RM Enterprise offers the full services of this module, including defining a report model to create a reusable extraction template.
Astera is a one-stop platform that allows the user to Consume and Manage APIs in a code-free environment.
With Astera, the user is provided with both a client and a server application to install. Instead of the Integration Server, Astera utilizes an installer called the Astera Server.
Once both are installed, the user can access all features of the product.
API Consumption
Astera 10.1 brings a list of enhancements and new features to API Consumption.
Existing customers of Astera 10.0 and earlier versions require a repository upgrade, along with re-generation of the project “.car” files and re-deployment, to upgrade to 10.1. Upgrades to 10.1 are supported from any earlier version.
This concludes the round-up of new features and improvements in Astera’s 10.1 line-up.
Astera’s data management platform has grown by leaps and bounds over the past couple of years, and things are no different this time around. The 10.0 release for Astera is focused on improvements and fixes to further enhance user experience. Moreover, we have added some key new features to the platform, including cloud functionality and a beta version for Auto-Generate Layout (AGL). Read on for more details!
The Report Model component in Astera 10.0 has introduced, modified, and enhanced some new and existing features to make the data extraction process even more flexible and user-friendly. The highlights of this release include an AI-enhanced feature of Auto-Generate Layout (Beta) which allows users to create a report model with a single click without having to create data regions or fields manually. Moreover, we’ve also introduced improved options and functionalities for field and region properties. These new additions have made the experience of capturing unstructured data easier than ever before.
Here’s what is new and improved.
Auto-Generate Layout (Beta)
Report Models now provide the functionality of auto-creating the data regions and data fields with just one click. This feature automatically recognizes name-entity pairs and tabular data regions and captures fields in the respective regions. This option makes the extraction process much more efficient as it minimizes the effort of designing report models from scratch. Additionally, to make the extraction template more robust and customized, users can further tweak the auto-generated layout option to fit their requirements.
The following sub-components help in the further customization of the auto-generated layout:
Auto-Generate Table (Beta): Users can now create a tabular data region by selecting the area on the report model’s designer and clicking on this option. The tool automatically identifies the pattern and auto-creates fields within the data region.
Auto-Create Fields (Single Instance) (Beta): Users can create single instance fields by selecting the values on the designer and clicking on this option to automatically create the fields.
Defining Comma Separated Values in Start Position
Users will now have the ability to specify multiple strings to define the Start Position of a data field in the Field Properties panel. This feature is particularly useful when the unstructured document does not have a consistent format or a fixed pattern.
Note: This option is only available when the Follows String In Current Line or the Follows String In Previous Line option is selected.
Case Sensitive and Regular Expression Checkbox
We have introduced two new checkboxes under the Size and Position section of the Field Properties panel:
Case Sensitive: To make the searching for the start position string case sensitive.
Regular Expression: To specify a regular expression as the starting position of the data field.
Note: These checkboxes appear only for the Follows String In Current Line and Follows String In Previous Line options.
Remove Specified String
We have introduced a new checkbox for Specific Strings under the Remove section of the Field Properties panel. Here, you can specify a string that you want to remove from the data field. You can also define multiple strings separated by commas to remove them from all the records for a particular field.
Calculation Box
The UI for the Formula Field has been improved. You can now see the calculation box in the Field Properties panel. You can also make any changes to the expression written inside the box by clicking on the ellipses option on the left. This will take the user to the Calculate Field Properties window where they can choose from the built-in library of functions.
Reorder Field Position
You can re-arrange the positions of data fields within a region under the Model Layout tab using a simple drag-and-drop functionality. This allows users to manage the order of the fields/columns while previewing data or when writing to a destination.
Region End Type - Till Regular Expression/Specific Text
Users can define the endpoint of a region by specifying a particular regular expression or a specific string text. The tool ends the region (by searching the position) on the line where the text/regular expression is found. This allows the user to capture specific areas within an unstructured document in a more robust and flexible manner.
The following is an overview of what is being rolled out in the new and improved version of Astera’s platform, including all of its components. This includes features related to cloud accessibility, security, and the user interface.
Users can now browse files from the cloud and write to files that are stored in a cloud destination. 10.0 supports two cloud connectors: Amazon S3 and Microsoft Azure Blob. There are several ways of connecting to the cloud, such as by using the Cloud Storage Connection object in a Shared Action (.sact), or by clicking on the Browse Cloud Files option available in certain objects/tabs when selecting a source file.
In case a user forgets their password, they can now utilize the password recovery feature. After verifying an admin email, users can click on the Forgot Password option at the time of login. An OTP is sent to their verified email, allowing them to reset the password.
We’ve made some enhancements to the look and feel of the product. Since the product is entirely UI-based, it is of the utmost importance that the UI provides the ease of access that users expect from it.
Here’s a list of the UI components that have been improved:
Wizards
Job Monitor
Server UI
Job Progress Window
Scheduler UI
Verification Errors/Messages
The Security and User Management area has also been upgraded:
Issues regarding User Roles have been fixed.
User Credential security has been upgraded.
Server configuration and deployment, specifically cloud deployment, was a big focus of this release. We’ve fixed several issues that users were encountering when deploying the server on the cloud.
This concludes the round-up of new features and improvements in Astera’s 10.0 line-up.
Note: The overall speed and performance of the application depend on the configuration of your machine. More memory and higher processing speed on the system will result in faster performance, especially when transferring large amounts of data as the application takes advantage of the multicore hardware to parallelize operations.
Astera Data Stack is built on a client-server architecture. The client is the part of the application which a user can run locally on their machine, whereas the server performs processing and querying requested by the client. In simple words, the client sends a request to the server, and the server, in turn, responds to the request. Therefore, database drivers are installed only on the Astera Data Stack server. This enables horizontal scaling by adding multiple clients to an existing cluster of servers and eliminating the need to install drivers on every machine.
The Astera client and server applications communicate using a REST architecture. REST-compliant systems, often called RESTful systems, are characterized by statelessness and a separation of concerns between the client and server, which means that the two can be implemented independently as long as each side knows what format of messages to send to the other. The server communicates with the client over HTTPS, encrypted using a key/certificate signed by a certificate authority. This protects the data from interception, since the plaintext is only ever transmitted in encrypted form.
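As a conceptual sketch of this stateless request/response pattern (the URL and endpoint below are hypothetical and do not represent an actual Astera server API):

import requests  # third-party HTTP library, used here purely for illustration

# Each request is self-contained (stateless) and travels over HTTPS, so the
# payload is encrypted between the client and the server.
response = requests.get(
    "https://astera-server.example.com/api/status",  # hypothetical endpoint
    verify=True,   # validate the server's signed certificate
    timeout=10,
)
print(response.status_code)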
Astera 10 is a major release and not a direct upgrade of Astera 9. This means that migrating from Astera 9 to Astera 10 won’t require you to uninstall Astera 9 since Astera 10 can be installed side by side with Astera 9 on a system. Astera 10 is backward compatible, and therefore, most of the flows created on Astera 9 can run on Astera 10 without modifications. However, the deployment archives (*.car files) created with previous versions are not directly compatible with v10.x. All the deployments must be regenerated and deployed again in the latest version. As with most major releases of any complex software, we recommend that you upgrade your lower-level environment first, so you have an opportunity to test and verify any existing flows. This will make it possible to identify any migration issues you may encounter early in the upgrade cycle.
Note: In this document, we will show how you can migrate from Astera 9 to the all-new Astera 10. However, you can follow these steps to upgrade from version 8 to version 10 as well.
The installation package for Astera 10 (64-bit) contains two setup (.exe) files:
AsteraDataIntegrator.exe – for Astera client
IntegrationServer.exe – for Astera Integration Server
The setup files for Astera 10 can be downloaded from the following location: https://www.astera.com/download-center/
Refer to the installation section of this document to learn how to install the Astera 10 client and server.
Like Astera versions 8 and 9, Astera 10 comes with a single licensing key (for the server) rather than two separate keys for the Astera server and client. However, the licensing key for Astera 10 has changed, which means you cannot use your Astera 9 key to register Astera 10. The single licensing key for Astera 10 is used to register the Astera server, and it controls how many clients can connect to the server as well as the functionality available to the connected clients.
If you’re planning to migrate to Astera 10, please contact Astera to get your new Astera 10 license key.
The Astera 10 client can be configured with multiple servers; however, it can only connect to one server at a time. Jobs scheduled, queued, or running on another registered server will continue to run without interruption even if the client is not currently connected to that server.
All servers pointing to a single repository database form a cluster of servers sharing the common workload of queued jobs. A cluster of v10 servers you configure will be up and running and processing jobs in a similar way to v9. You can see which server in the cluster actually processed a job by right-clicking the DEFAULT node in Server Explorer and opening the Job Monitor window.
Astera provides an option to perform an in-place repository upgrade. This means that if you don’t wish to set up a new repository when you move to v10, you can use your existing repository from the previous version and upgrade its cluster database. All the jobs scheduled in the repository used in v9 will appear in v10 after the upgrade.
The following steps explain how to upgrade an existing repository for migration from v9 to v10:
In Astera 10, go to Server > Manage > Upgrade Cluster Database.
Provide credentials of the version 9 repository and click OK:
Now go to Server menu > Manage > Server Properties.
In the Server Connection Properties tab, click on the ellipsis button next to Cluster DB Info.
In the Database Connection window, provide the credentials of the repository you just upgraded and click OK.
After completing this step, we recommend that you restart the Integration service.
If you wish to set up a new repository in Astera 10, go to Server menu > Configure > Build Repository Database and Configure Server.
In case you chose to build a new cluster repository in Astera 10, your next step is to import all the scheduled jobs created in Astera 9.
For this, open the scheduler in Astera 9 from Server > Job Schedules.
You will see all the scheduled jobs listed in the scheduler. Select the jobs you want to migrate. To select all jobs, you can also use the shortcut key Ctrl+A.
Click on the Export Schedule icon in the Scheduler toolbar.
Point to the directory and folder where you want to save the scheduled jobs and click OK.
Now Astera will create a separate XML file with ‘.Sched’ extension for each scheduled job and save it in the designated folder.
A message window will pop up to notify that your scheduled jobs have been exported successfully. Click OK.
Now you need to import the scheduled jobs in Astera 10 to complete the migration process.
For this, open Astera 10 client and go to Server > Job Schedules.
This will open the Scheduler tab. To import the existing jobs, click the ‘Import Schedule’ button in the Scheduler toolbar.
Point the path to the directory where you have saved the schedule files. Select all the scheduled jobs you want to import and click ‘Open.’
You can see that the existing jobs scheduled in Astera 9 have been successfully migrated to Astera 10 and a new Job ID has been assigned to each job.
After building a new cluster repository in Astera 10, an alternate way of shifting all the scheduled jobs from Astera 9 to Astera 10 is to use pre-designed flows.
Prior to any upgrade, we strongly recommend that you take a full backup of your repository database. Also, upgrading a lower-level environment first (such as QA, UAT, etc.) is recommended prior to upgrading the production environment. This will make it possible to sort out/resolve any issues before upgrading production.
Steps to Upgrade:
Using Astera 9 client, run the following dataflow to export existing schedules into a comma-delimited file.
Next, open the downloaded file in Astera 9. The dataflow will look like this:
Note: Prior to running the dataflow, you will need to update the Database Table Source object to point to the database where the Astera repository resides. Also, in the properties of Delimited File Destination, set up an appropriate file path where you want your file to be saved.
Once the objects have been configured, run the dataflow. This will create a CSV file containing data of all the schedules that existed in v9.
Next, take note of any existing Cluster Settings. You can check them by right-clicking the cluster in the Server Explorer and selecting Cluster Settings from the context menu. These settings will need to be re-configured manually after the upgrade.
It may be helpful to take screenshots of those settings for later reference. The settings include Staging Directory, Purge Job Frequency Options, Email Notification Setup, etc.
Open Astera 10 client. Go to Server menu > Manage > Build Cluster Database. Point it to the database hosting the Astera repository.
Important Note: This will reset the repository.
Use the dataflow below to import the schedules you exported previously in Astera 9.
The dataflow will look like this:
Note: Prior to running the dataflow, in the properties of Delimited File Source, you must import the CSV file you created in v9 that has data of all the schedules. Also, you will need to change the configuration of the Database Table Destination object to point to the database where Astera 10 repository resides.
Once the objects are properly configured, save and run the dataflow.
Next, open Server Explorer, right-click on DEFAULT, and select Cluster Settings.
Now, manually re-configure the relevant settings. You can use the screenshots of Cluster Settings you previously took for reference in version 9. Optionally, you can manually reconfigure the Server Profiles setting if a non-default profile was used prior to the upgrade.
Now, restart the Astera server.
This completes the upgrade.
You can download the flows by clicking on the links below:
When you are starting the migration process, it is recommended to keep Astera 9 and Astera 10 servers running in parallel. This is to avoid any interruption in jobs that are currently running.
We also recommend that you initiate the migration process with a lower-level, testing environment and then promote your deployment to a higher-level environment as needed. This will help ensure a smooth migration, with any flow compatibility issues spotted early in the transition cycle.
Astera 10 is a major release and not a direct upgrade of Astera 7.6. This means that migrating from Astera 7.x to Astera 10 won’t require you to uninstall Astera 7, since Astera 10 can be installed side by side with Astera 7 on a system. Astera 10 is backward compatible, and therefore, most of the flows created on Astera 7 can run on Astera 10 without modifications. However, as with most major releases of any complex software, we recommend that you upgrade your lower-level environment first, so you have an opportunity to test and verify any existing flows. This will make it possible to identify any migration issues you may encounter early in the upgrade cycle. In this document, we will cover how you can migrate from Astera 7.x to the all-new Astera 10.
Installing
The installation package for Astera 10 (64-bit) contains two setup (.exe) files:
AsteraDataIntegrator.exe – for Astera client, and
IntegrationServer.exe – for Astera Integration Server
The setup files for Astera 10 can be downloaded from the following location:
https://www.astera.com/download-center/
Refer to the installation section of this document to learn how to install the Astera 10 client and server.
Licensing
Unlike previous releases of Astera, Astera 10 comes with a single licensing key (for the server) rather than two separate keys for the Astera server and client. The licensing key for Astera 10 has changed, which means you cannot use your Astera 7 key to register Astera 10. The single licensing key for Astera 10 is used to register the Astera server, and it controls how many clients can connect to the server as well as the functionality available to the connected clients.
If you’re planning to migrate to Astera 10, please contact Astera to get your new Astera 10 license key.
Cluster and Server Management in Server Explorer
The Astera 10 client can be configured with multiple servers; however, it can only connect to one server at a time. Jobs scheduled, queued, or running on another registered server will continue to run without interruption even if the client is not currently connected to that server.
All servers pointing to a single repository database form a cluster of servers sharing the common workload of queued jobs. A cluster of v10 servers you configure will be up and running and processing jobs in a similar way to 7.6, even though the v10 client can only connect to and manage one v10 server at a time. You can see which server in the cluster actually processed a job by right-clicking the cluster and opening the Server Jobs window.
Repository
You need to set up a new repository to communicate with the Astera 10 server. While upgrading the previous releases of Astera 7, you would simply go to Server > Upgrade Cluster Database. However, while migrating to Astera 10, you need to set up a repository in a new database from scratch to communicate with the server(s) and store the record of server activity. To set up a repository in Astera 10, go to Server menu > Configure > Build Repository Database and Configure Server.
Once you have built a cluster repository in Astera 10, the next step is to migrate the scheduled jobs you created in Astera 7.
For this, open the Job Scheduler in Astera 7 from Server > Job Schedules.
You will see all the scheduled jobs listed in the Scheduler. Select the jobs you want to migrate.
Click the Export Schedule button in the Scheduler toolbar.
Point to the directory and folder where you want to save the scheduled jobs and click OK.
Now Astera will create a separate XML file with ‘.Sched’ extension for each scheduled job and save it in the designated folder.
A message window will pop up to notify you that your scheduled jobs have been exported successfully. Click OK.
Now you have to import the job files in Astera 10 to complete the migration process. For this, open Astera 10 client and go to Server > Job Schedules.
This will open the Scheduler tab. To import the existing jobs, click the Import Schedule button in the Scheduler toolbar.
Point the path to the directory where you have saved the schedule files. Select all the scheduled jobs you want to import and click Open.
You can see that the existing jobs scheduled in Astera 7 have been successfully migrated to Astera 10 and a new Job ID has been assigned to each job.
Prior to any upgrade, we strongly recommend that you take a full backup of your repository database. Also, upgrading a lower-level environment first (such as QA, UAT, etc.) is recommended prior to upgrading the production environment. This will make it possible to sort out and resolve any issues before upgrading production.
Steps to Upgrade:
Using Astera 7 client, run the following dataflow to export existing schedules into a comma-delimited file.
Note: Prior to running the dataflow, you will need to update the Database Table Source object to point to the database where the Astera repository resides.
Take note of any existing Cluster Settings. You can check it by right-clicking the cluster in Server Explorer and selecting Cluster Settings in the context menu. These settings will need to be re-configured manually after the upgrade.
It may be helpful to take screenshots of those settings for later reference. The settings include: Staging Directory, Purge Job Frequency Options, Email Notification Setup, and optionally, Server Profiles if a non-default profile was used prior to the upgrade.
Open Astera 10 client. Go to Server menu > Manage > Build Cluster Database. Point it to the database hosting the Astera repository.
Important Note: This will reset the repository.
Use the dataflow below to import the schedules you exported previously in Astera 7.
The dataflow will look like this:
Note: Prior to running the dataflow, in the properties of Delimited File Source, you must import the CSV file you created in v7 that has data of all the schedules. Also, you will need to change the configuration of the Database Table Destination object to point to the database where Astera 10 repository resides.
Once the objects are properly configured, save and run the dataflow.
Next, open Server Explorer, right-click on DEFAULT and select Cluster Settings.
Now, manually re-configure the relevant settings. You can use the screenshots of Cluster Settings you previously took for reference in version 7. Optionally, you can manually reconfigure the Server Profiles setting if a non-default profile was used prior to the upgrade.
Now, restart the Astera server.
This completes the upgrade.
When you are starting the migration process, it is recommended to keep Astera 7 and Astera 10 servers running in parallel. This is to avoid any interruption in jobs that are currently running.
We also recommend that you initiate the migration process with a lower-level, testing environment and then promote your deployment to a higher-level environment as needed. This will help ensure a smooth migration, with any flow compatibility issues spotted early in the transition cycle.
To learn more about the MongoDB Source connector, refer to the MongoDB Source documentation.
For more information on Git in Astera, refer to the Git documentation.
To learn more about creating cloud connections and browsing files from cloud storage, refer to the Cloud Connection documentation.
Microsoft has introduced a new email authentication method called Modern Authentication, which uses OAuth 2.0 to access emails.
Support for this method has been added in Astera Data Stack, so emails can now be read using Modern Authentication.
Follow the rest of the steps explained in this document to build a cluster database and set up a repository in Astera 10.
Download the accompanying dataflow.
Next, install the Astera 10 client and server on your machine. You can read more about installing the Astera 10 client and server in the installation section of this document.
Download the attached example dataflows.
A new feature in Astera 10 enables you to create an admin email to access the Astera server. As a result, you will also be able to use the Forgot Password option while logging in. Refer to the admin email verification section to learn how to verify the admin email in Astera 10.
Follow the steps explained in this document to build a cluster database and set up a repository in Astera 10.
Download the attached dataflow.
Next, install the Astera 10 client and server on your machine. You can read more about installing the Astera 10 client and server in the installation section of this document.
Download the attached example dataflow.
Client Application Processor | Dual Core or greater (recommended); 2.0 GHz or greater
Server Application Processor | Quad Core or greater (recommended)
Repository Database | MS SQL Server 2008 R2 or newer, or PostgreSQL 9.x or newer, for hosting the repository database
Operating System - Client | Windows 10 (recommended)
Operating System - Server | Windows 10 or Windows Server 2012 or newer
Memory | Client: 4 GB or greater (recommended); Server: 8 GB or greater (recommended), 32 GB or greater for heavy data processing
Hard Disk Space | Client: 300 MB (if .NET Framework is pre-installed); Server: 700 MB (if .NET Framework is pre-installed); additional 300 MB if .NET Framework is not installed
Other | Requires Microsoft .NET Framework 4.8 for the client and .NET Core Runtime 6.0 for the server
Adding an XML/JSON File Source object to a dataflow allows you to read and transfer data from an XML or a JSON file.
In this section, we will cover how to get an XML/JSON File Source object on the dataflow designer from the Toolbox.
To get an XML/JSON File Source object from the Toolbox, go to Toolbox > Sources > XML/JSON File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the XML/JSON File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the XML/JSON File Source object, right-click on the header and select Properties from the context menu.
When you select the Properties option from the context menu, a dialog box will open.
This is where you can configure properties for the XML/JSON File Source object.
The first step is to provide the File Path and Schema Location for the XML/JSON Source object. By providing the file path and schema, you are building the connectivity to the source dataset.
Check the JSON Format checkbox if your source file is a JSON file.
Check the Provide Inner XML checkbox to get the XML markup representing only the child nodes of the parent node.
Note: In this case we are going to be using an XML/JSON file with Orders sample data in the parent node and Order Details sample data in the child node.
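For reference, here is a minimal Python sketch (the element and attribute names are made up, not taken from the sample file) showing what such a parent/child structure looks like and what Inner XML means for it:

import xml.etree.ElementTree as ET

# Made-up snippet in the spirit of the sample described above: Orders data in
# the parent node, Order Details data in the child node.
sample = """<Orders>
  <Order OrderID="10248" CustomerID="VINET">
    <OrderDetails ProductID="11" Quantity="12" UnitPrice="14.00"/>
  </Order>
</Orders>"""

root = ET.fromstring(sample)
order = root.find("Order")

# "Provide Inner XML" corresponds to returning only the markup of the child nodes:
inner_xml = "".join(ET.tostring(child, encoding="unicode") for child in order)
print(inner_xml)  # prints the <OrderDetails .../> markup only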
Once you have specified the data reading options in this window, click Next.
On the XML Layout window, you can view the layout of your XML/JSON source file.
After you are done viewing the layout, click Next. You will be taken to a new window, Config Parameters. Here, you can define the parameters for the XML/JSON File Source.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
After you have configured the source object, click OK.
You have successfully configured your XML/JSON File Source object. The fields from the source object can now be mapped to other objects in a dataflow.
Once you have created the repository and configured the server, the next step is to login using your Astera account credentials.
You will not be able to design any dataflows or workflows on the client if you haven’t logged in to your Astera account. The options will be disabled.
Go to Server > Configure > Step 2: Login as admin.
This will direct you to a login screen where you can provide your user credentials.
If you are using Astera 10 for the first time, you can login using the default credentials as follows:
Username: admin
Password: Admin123
After you log in, you will see that the options in the Astera Client are enabled.
You can use these options for as long as your trial period is active. To fully activate the options and the product, you’ll have to enter your license.
If you don’t want Astera to show you the server connection screen every time you run the client application, you can skip that by modifying the settings.
To do that, go to Tools > Options > Client Startup and select the Auto Connect to Server option. When this option is enabled, Astera will store the server details you entered previously and use them to automatically reconnect to the server every time you run the application.
The next step after logging in is to unlock Astera using the License key.
You can connect to different servers right from the Server Explorer window in the Client. Go to the Server Explorer window and click on the Connect to Server icon.
A prompt will appear that will confirm if you want to disconnect from the current Server and establish connection to a different server. Click Yes to proceed.
Note: A client cannot be connected to multiple servers at once.
You will be directed to the Server Connection screen. Enter the required server information (Server URI and Port Number) to connect to the server and click Connect.
If the connection is successfully established, you should be able to see the connected server in the Server Explorer window.
In this section, we will discuss how to install and configure the Astera Server and Client applications.
Run ‘IntegrationServer.exe’ from the installation package to start the server installation setup.
You’ll be directed to the welcome screen. Click Next to continue.
On the next screen you will see the license agreement. You can only continue if you choose to accept the terms of the license agreement. Click Next to continue.
On the next screen, enter the user details and click Next to continue.
Select the type of installation (Complete or Custom) you want to proceed with and click Next.
Select Install to complete the installation.
Select Finish to finish the installation process.
Run the ‘AsteraDataIntegrator’ application from the installation package to start the client installation setup.
You’ll be directed to the welcome screen. Click Next to continue.
On the next screen you will see the license agreement. You can only continue if you choose to accept the terms of the license agreement. Click Next to continue.
On the next screen, enter the user details and click Next to continue.
Select the type of installation (Complete or Custom) you want to proceed with and click Next.
If you select custom installation, you can choose specific component(s) that you want to download.
In this case, we want to install the complete package, so we’ll select Complete on the Setup Type screen and click Next.
Select Install to complete the installation.
Select Finish to finish the installation process.
This is how you install Astera client and server applications. The next step is to establish a connection between Astera client and server.
Before you start using the Astera server, a repository must be set up. Astera supports SQL Server and PostgreSQL for building cluster databases, which can then be used for maintaining the repository. The repository is where job logs, job queues, and schedules are kept.
To see these options, go to Server > Configure > Step 1: Build repository database and configure server.
The first step is to point to the SQL Server or PostgreSQL instance where you want to build the repository and provide the credentials to establish the connection.
Note: Astera will not create the database itself, only the tables. A database must be created beforehand, or an existing database can be used. We recommend giving Astera its own database for this purpose.
Go to Server > Configure > Step 1: Build repository database and configure server.
Select SQL Server from the Data Provider drop-down list and provide the credentials for establishing the connection.
From the drop-down list next to the Database option, select the database on the SQL instance where you want to host the repository.
Click Test Connection to test whether the connection has been successfully established. You should see the following message if the connection is successful.
Click OK to exit the test connection window, then click OK again; the following message will appear. Select Yes to proceed.
The repository is now set up and configured with the server to be used.
The next step is to log in using your credentials.
Go to Server > Configure > Step 1: Build repository database and configure server.
Select PostgreSQL from the Data Provider drop-down list and provide the credentials for establishing the connection.
From the drop-down list next to the Database option, select the database on the PostgreSQL instance where you want to host the repository.
Click Test Connection to test whether the connection has been successfully established. You should see the following message if the connection is successful.
Click OK and the following message will appear. Select Yes to proceed.
The repository is now set up and configured with the server to be used.
The next step is to log in using your credentials.
The Join transformation object joins records from two record sets. The join functionality is similar to standard SQL joins, but the distinguishing advantage of Astera's implementation is that you can join records from any two sources and not just two database tables.
This article covers how you can use Join transformation in Astera.
Suppose we have two database tables - Customers and Orders, as shown in the screenshot below, and we want to join these two tables.
Let’s see how we can join the two tables using the Join transformation object in Astera:
Drag-and-drop the Join transformation object from the Transformations section in the Toolbox. To open the Toolbox, go to View > Toolbox.
Map the fields from the source objects to the Join transformation object.
Note: To quickly add fields to the layout, drag-and-drop the bold node’s output port of the object whose layout you wish to replicate to the bold Join node of the Join object.
To set the properties for the Join transformation, double-click on the object or right-click and select Properties.
The first window is the Layout Builder window. You can manage the layout for your transformation (add or remove fields) from this window. Click Next to go to the next window.
The next window is the Relation Join Transformation Properties window. Select the Join Type from the drop-down menu. Astera supports four types of joins:
Inner Join – Joins records from two record sets based on matching values in key fields. Any unmatched records are discarded.
Left Outer Join – Similar to Inner Join, but unmatched records from the left record set (also called ‘first record set’) are preserved, and null values are written for the unmatched record in the right record set (also called ‘second record set’).
Right Outer Join – Similar to Inner Join, but unmatched records from the right record set (also called ‘second record set’) are preserved, and null values are written for the unmatched record in the left record set (also called ‘first record set’).
Full Outer Join – Similar to Inner Join, but unmatched records from either record set are preserved, and null values are written for the unmatched record in the other record set (see the sketch below for an illustration).
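The following minimal Python sketch (not Astera’s implementation; the record sets and field names are made up) shows a Left Outer Join on CustomerID. An Inner Join would simply drop the unmatched record instead:

# Tiny, made-up Customers and Orders record sets keyed on CustomerID.
customers = [{"CustomerID": "ALFKI", "Name": "Alfreds"},
             {"CustomerID": "ANATR", "Name": "Ana Trujillo"}]
orders = [{"CustomerID": "ALFKI", "OrderID": 10643}]

def left_outer_join(left, right, key):
    joined = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            joined.extend({**l, **m} for m in matches)
        else:
            # Unmatched left record is preserved; right-side fields get null (None).
            joined.append({**l, "OrderID": None})
    return joined

for record in left_outer_join(customers, orders, "CustomerID"):
    print(record)
# {'CustomerID': 'ALFKI', 'Name': 'Alfreds', 'OrderID': 10643}
# {'CustomerID': 'ANATR', 'Name': 'Ana Trujillo', 'OrderID': None}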
Other options in this window:
Join in Database: Check this option if you want to join the tables in the database.
Case Sensitivity: Check this option if you want a case sensitive match of the values in the key fields.
Sort (Left/Right) Input: Specify whether the left input, the right input, or both, need to be sorted.
Select the key fields from the Left Field and Right Field drop-down lists. Click Next, then OK.
Note: You can add multiple fields to create a composite key field.
You can now preview the output and see the consolidated data.
This window consists of options common to most objects in a dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be outputted by the object. When this option is unchecked, records with errors will be outputted by the object, and a record message will be attached to the record. This record message can then feed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect statistics.
The Comments input allows you to enter comments associated with this object.
For the AGL and OCR features to work in Astera, several packages must be installed, such as Python and Java. To avoid the tedious process of installing these separately, Astera provides a built-in Install Manager in the tool.
In this document, we will look at how to use the Install Manager to install these packages. If the server is on a separate machine from the client, we will also see how to install the packages on the server machine.
Open Astera as an administrator.
Once Astera is open, go to Tools > Run Install Manager.
The Install Manager welcome window will appear. Click on Next.
If the prerequisite packages are already installed, the Install Manager will inform you about them and give you the option to uninstall or update them as needed. If you want to uninstall them, select the package and click on Uninstall.
Once the uninstallation is complete, the Install Manager will notify you.
If you want to update the packages, select them and click on Update.
If the prerequisite packages are not installed, then the Install Manager will present you with the option to install them. Check the box next to the pre-requisite package, and then click on Install.
The packages installed for AGL are listed as follows:
The packages installed for OCR are listed as follows:
Python packages
During the installation, the Install Manager window will display a progress bar showing the installation progress.
You can also cancel the installation at any point if necessary.
Once the installation is complete, the Install Manager will notify you. Click Close to exit the Install Manager.
The packages for AGL and OCR usage are now installed, and the features are ready to use.
Note: The packages are ready to use when both the Integration Server and the Astera client are installed on the same machine.
If the Integration Server is installed on a separate machine, the packages for AGL and OCR need to be installed there as well. If the client is also installed on the server machine, you can proceed with the installation as described in the previous section.
If the client is not installed, follow the steps below.
To access the Install Manager on this machine, open the Start menu and search for “Install Manager for Integration Server”.
Run this Install Manager as admin.
Install Manager is now open and can be used as described from step 3 onwards in the previous section.
This concludes our discussion on how to use the install manager for Astera.
In some cases, it may be necessary to supply a license key without prompting the end user to do so. For example, in a scenario where the end user does not have access to install software, a systems administrator may do this as part of a script.
One possible solution is to place the license key in a text file. This way, the administrator can easily license each machine without having to go through the licensing prompt for each user.
Here’s a step-by-step guide to supplying a license key without prompting the user:
To get started, create a new text document that will hold the license key required to access the application.
In the text document, enter a valid license key. The key must be the only thing in the document, and it must be on the very first line. Make sure there are no unnecessary leading or trailing spaces, lines, or any characters other than those of the license key.
Name the text document “Serial” and save it in the Integration Server Folder of the application located in Program Files on your PC. For instance, if the application is Astera, save the Text Document in the “Astera Integration Server 10” folder. This folder contains the files and settings for the server application. The directory path would be as follows:
C:\Program Files\Astera Software\Astera Integration Server 10.
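As an example of the scripted scenario mentioned above, a minimal sketch of such an admin script might look as follows (the key value is a placeholder, and the path assumes the default installation location and a .txt extension for the text document created in the previous step):

from pathlib import Path

# Placeholder key and default install path; adjust both for your environment.
serial_path = Path(r"C:\Program Files\Astera Software\Astera Integration Server 10\Serial.txt")
license_key = "XXXX-XXXX-XXXX-XXXX"  # not a real key

# The key must be the only content, on the very first line, with no extra
# whitespace, lines, or other characters.
serial_path.write_text(license_key.strip())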
Note: This approach works for all Astera applications, except for Astera API Management and Astera Express. For API Management, there is a different server. Thus, for it, you’ll need to locate the corresponding server folder and follow the same steps. Whereas for Astera Express, since there is no server involved, simply copy the Serial text document to the “Astera Express 10” folder, and the remaining steps remain unchanged.
Finally, restart the Astera Integration Server 10 service to complete the process. This step ensures that, from now on, when the user launches Astera or any other application by Astera, they will not be prompted to enter a license key.
Note: There is no need to restart the service for Astera Express as it does not have a corresponding server.
Also, please keep in mind that all license restrictions are still in effect, and this process only bypasses the user prompt for the key.
In conclusion, by following these simple steps, system administrators can easily supply a license key without prompting the end user. This approach is particularly useful when installing software remotely or when licensing multiple machines.
To activate Astera on your machine, you need to enter the license key provided with the product. When the license key is entered, the client sends a request to the licensing server to grant the permission to connect. This action can only be performed when the client machine is connected to the internet.
However, Astera provides an alternative method to activate the license offline by providing an activation code to users who request it. Follow the steps below for offline activation:
1. Go to the menu bar and click on Server > Configure > Step 4: Enter License Key
2. Click on Unlock using a key.
3. Type your Name and Organization, and paste the Key provided to you. Then, click Unlock. Do the same if you are changing (rather than activating) your license offline.
4. Another pop-up window will show an error about the server being unavailable, since you cannot connect to the licensing server while offline. Click OK.
5. Click on the Activate using a key button.
6. Now, copy the Key and the Machine Hash and email it to support@astera.com. The Machine Hash is unique for every machine. Make sure you send the correct Key and Machine Hash as it is very important for generating the correct Activation Code.
7. You will receive an activation code from the support staff via e-mail. Paste this code into the Activation Code textbox and click on Activate.
8. You have successfully activated Astera on your machine offline using the activation code. Click OK.
9. A pop-up window will notify you that the client needs to restart for the new license to take effect. Click OK and restart the client.
You have successfully completed the offline activation of Astera.
R is a programming language needed to run some of the data science related objects in Astera, such as objects in the Testing and Diagnostics and Analytical Models sections of the Toolbox.
Version 4.0.2 of R is the most compatible with the Astera client.
1. First, go to https://cran.r-project.org/bin/windows/base/old/4.0.2/ and download R-4.0.2 for Windows by clicking its installer (the .exe file).
2. Locate the directory where you want to save your installer and click Save.
Here, we are saving the installer in the Downloads folder.
3. Run the installer by clicking on it. Leave all the installation settings at their defaults. R will be installed in your Program Files folder.
4. Next, launch the Astera client and go to Server > Manage > Server Properties.
The Server Connection Properties tab will open.
5. Provide the path of the bin folder of the R-4.0.2 package in the File Path textbox.
Alternatively, you can also click on the folder icon on the right and select the bin path from the directory.
6. Click on the Save icon in the secondary menu bar to save the changes made in the Server Connection Properties tab.
7. It will prompt you to reboot the server. Click Yes, and then click OK.
8. Click OK to start the installation of the packages.
9. A green bar will show the installation progress. Wait till it is complete.
10. Once the installation is complete, a dialog box will pop up, notifying you that the installation was successful. Click OK.
You have successfully integrated R-4.0.2 with your Astera client.
This article introduces the role-based access control mechanism in Astera. This means that administrators can grant or restrict access to various users within the organization, based on their role in the entire data management cycle.
In this article, we will look at the user lists and role management features in detail.
Note: When you run the application for the first time, sign in using the default credentials provided on our help site.
Username: admin
Password: Admin123
Once you have logged in, you have the option to create new users, and we recommend doing this as a first step.
To create/register a new user, right-click on the DEFAULT server node in the Server Explorer window and select User List from the context menu.
This will open the Server Browser panel.
Under the Security node, right-click on the User node and select Register User from the context menu.
This will open a new window. You can see several fields that must be filled in to register a new user.
Note: Astera currently supports three authentication types when registering a user: Astera, Windows, and Azure Authentication.
Once the fields have been filled, click Register and a new user will be registered.
Now that a new user is registered, the next step is to assign roles to the user.
Select the user you want to assign the role(s) to and right-click on it. From the context menu, select Edit User Roles.
A new window will open where you can see all the roles available in Astera, whether default or custom created. Since we haven’t created any custom roles, we’ll see the three default roles: Developer, Operator, and Root.
Select the role that you want to assign to the user and click on the arrows in the middle section of the screen. You’ll see that the selected role will get transferred from the All Roles section to the User Roles section.
Note: You can assign multiple roles to a single user.
After you have assigned the roles, click OK, and the selected role(s) will be assigned to the user.
Astera lets the admin manage the resources available to any user. The admin can grant permissions to resources or restrict them.
To edit role resources, right-click on any of the roles and select Edit Role Resources from the context menu.
This will open a new window. Here, you can see four nodes on the left under which resources can be assigned.
The admin can grant a role resources from the Url node and the Cmd node, access to deployments from the REST node, and access to Catalog artifacts from the Catalog node.
Expanding the Url node shows the following resources:
Expanding the Cmd node shows the following checkboxes as resources:
If we expand the REST checkbox, we can see a list of available API resources, including endpoints you might have deployed.
Upon expanding the Catalog node, we can see the artifacts that have been added to the Catalog, along with which endpoint permissions are to be given.
This concludes User Roles and Access Control in Astera Data Stack.
Astera Data Stack can read data from a wide range of file sources and database providers. In this article, we have compiled a list of file formats, data providers, and web-applications that are supported for use in Astera Data Stack.
Amazon Aurora
Azure SQL Server
MySQL
Amazon Aurora Postgres
Amazon RDS
Amazon Redshift
DB2
Google BigQuery
Google Cloud SQL
MariaDB
Microsoft Azure
Microsoft Dynamics CRM
MongoDB (as a Source)
MS Access
MySQL
Netezza
Oracle
PostgreSQL
PowerBI
Salesforce (Legacy)
Salesforce Rest
SAP Hana
Snowflake
SQL Server
Sybase
Tableau
Teradata
Vertica
In addition, Astera features an ODBC connector that uses the Open Database Connectivity (ODBC) interface by Microsoft to access data in database management systems using SQL as a standard.
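For context, this is what a generic ODBC connection looks like outside Astera: a DSN (or driver string) plus standard SQL. Below is a minimal sketch using the third-party pyodbc package, with a hypothetical DSN, credentials, and table name:

import pyodbc  # third-party ODBC bridge, shown for illustration only

# Hypothetical DSN and credentials; the query is plain SQL.
conn = pyodbc.connect("DSN=MyWarehouse;UID=reader;PWD=secret")
cursor = conn.cursor()
cursor.execute("SELECT CustomerID, CompanyName FROM Customers")
for row in cursor.fetchall():
    print(row)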
COBOL
Delimited files
Fixed length files
XML/JSON
Excel workbooks
PDFs
Report sources
Text files
Microsoft Message Queue
EDI formats (including X12, EDIFACT, HL7)
Microsoft Dynamics CRM
Microsoft Azure Blob Storage
Microsoft SharePoint
Amazon S3 Bucket Storage
Azure Data Lake Gen 2
PowerBI
Salesforce
SAP
Tableau
AS2
FTP (File Transfer Protocol)
HDFS (Hadoop Distributed File System)
SCP (Secure Copy Protocol)
SFTP (Secure File Transfer Protocol)
SOAP (Simple Object Access Protocol)
REST (REpresentational State Transfer)
Using the SOAP and REST web services connector, you can easily connect to any data source that uses SOAP protocol or can be exposed via REST API.
Here are some applications that you can connect to using the API Client object in Astera Data Stack:
FinancialForce
Force.com Applications
Google Analytics
Google Cloud
Google Drive
Hubspot
IBM DB2 Warehouse
Microsoft Azure
OneDrive
Oracle Cloud
Oracle Eloqua
Oracle Sales Cloud
Oracle Service Cloud
Salesforce Lightning
ServiceMAX
SugarCRM
Veeva CRM
The list is non-exhaustive.
You can also build a custom transformation or connector from the ground up quickly and easily using the Microsoft .NET APIs, and retrieve data from various other sources.
The COBOL File Source object can fetch data from a COBOL source file, provided the source file is available. The data present in this file can then be processed further in the dataflow and written to a destination of your choice.
Expand the Sources section of the Toolbox and select the COBOL Source object.
Drag-and-drop the COBOL Source object onto the dataflow. It will appear like this:
By default, the COBOL Source object is empty.
To configure it according to your requirements, right-click on the object and select Properties from the context menu.
Alternatively, you can open the properties window by double-clicking on the COBOL Source object header.
The following is the properties tab of the COBOL Source object.
File Path: Clicking on this option allows you to define a path to the data file of a COBOL File.
Note: File Path accepts files with the .dat and .txt extensions (it can also accept files with an .EBC extension).
For our use case, we will be using a sample file with an .EBC extension.
Encoding: This drop-down option allows us to select the encoding from multiple options.
In this case, we will be using the IBM EBCDIC (US-Canada) encoding.
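As a rough, Astera-independent illustration of what this encoding choice means, IBM EBCDIC (US-Canada) corresponds to Python’s cp037 codec (the sample text is made up):

# EBCDIC bytes differ from ASCII/UTF-8, which is why the encoding must be set
# correctly before the file can be read.
ebcdic_bytes = "HELLO".encode("cp037")   # IBM EBCDIC (US-Canada)
print(ebcdic_bytes)                      # b'\xc8\xc5\xd3\xd3\xd6'
print(ebcdic_bytes.decode("cp037"))      # HELLO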
Record Delimiter: This allows you to select the kind of delimiter from the drop-down menu.
<CR> (Carriage Return): Moves the cursor to the beginning of the line without advancing to the next line.
<LF> (Line Feed): Moves the cursor down to the next line without returning to the beginning of the line.
<CR><LF>: Does both.
For our use case, we have selected the following.
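For reference (a generic illustration, not Astera-specific), the three delimiters correspond to the byte sequences \r, \n, and \r\n:

# Splitting the same made-up records with each delimiter type.
cr_data   = b"REC1\rREC2\r"      # <CR> only
lf_data   = b"REC1\nREC2\n"      # <LF> only
crlf_data = b"REC1\r\nREC2\r\n"  # <CR><LF>

print(cr_data.split(b"\r"))      # [b'REC1', b'REC2', b'']
print(lf_data.split(b"\n"))      # [b'REC1', b'REC2', b'']
print(crlf_data.split(b"\r\n"))  # [b'REC1', b'REC2', b'']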
Copybook: This option allows us to define a path to the schema file of a COBOL File.
Note: Copybook accepts files with the .txt and .cpy extensions.
For our use case, we are using a file with the .cpy extension.
Next, three checkboxes can be configured according to the user application. There is also a Record Filter Expression field given under these checkboxes.
Ignore Line Numbers at Start of Lines: Check this option when the data file has incremental line numbers at the start of each line; these numbers will be ignored.
Zone Decimal Sign Explicit: Controls whether there is an extra character for the minus sign of a negative integer.
Fields with COMP Usage Store Data in a Nibble: Checking this box will ignore the COMP storage formats in which the data is stored.
COMP formats range from COMP-1 to COMP-6 in COBOL files.
Record Filter Expression: Here, we can add a filter expression that we wish to apply to the records in the COBOL File.
On previewing output, the result will be filtered according to the expression.
Once done with this configuration, click Next, and you will be taken to the next part of the properties tab.
The COBOL Source Layout window lets the user check values which have been read as an input.
Expand the Source node, and you will be able to check each of the values and records that have been selected as an input.
Expanding the nodes further shows the data definition and field details.
Once these values have been checked, click Next.
The Config Parameters window will now open. Here, you can further configure and define parameters for the COBOL Source Object.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next.
Now, a new window, General Options, will appear.
Here, you can add any Comments that you wish to add. The rest of the options in this window have been disabled for this object.
Once done, click OK.
The COBOL Source object has now been configured. The extracted data can now be transformed and written to various destinations.
This concludes our discussion on the COBOL Source Object and its configuration in Astera Data Stack.
Astera’s Delimited File Destination provides the functionality to write data to a delimited file. The Delimited File Destination gives you control over the structure and content of the file, including numeric, date, and Boolean formats, encodings, text qualifiers (quotes), and character sets. You can choose to create a new file or append data to an existing file.
To get a Delimited File Destination object from the Toolbox, go to Toolbox > Destinations > Delimited File Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Delimited File Destination object onto the designer.
The dragged destination object is empty right now. This is because the object has not been configured yet.
For a Delimited Destination object to work, data fields should be mapped to the object so that the mapped data can be written to the destination.
Configure the source object and place it onto the designer next to the Delimited File Destination object.
Note: In this case a Customers sample table has been used from a Database Table Source to write to the Delimited File Destination.
Now map the source object to the destination object. Mapping can be done in any of the following ways:
By dragging and dropping the parent nodes onto each other for automatic mapping.
By manually dragging the source parent node onto the destination parent node.
By writing the source layout directly to a Delimited File Destination through the context menu of the source’s parent node.
The fields are now mapped.
To configure the Delimited File Destination object, right-click on its header and select Properties from the context menu. A dialog box will open.
Provide the File Path. This is where the delimited destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
File Contains Header - Check this box to read headers from the source file.
Field Delimiter - Allows you to select a delimiter from the drop-down list for the fields.
Record Delimiter - Allows you to select the delimiter for the records. The available choices are the carriage-return/line-feed combination, carriage-return, and line-feed. You can also type a record delimiter of your choice instead of choosing one of the available options.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
A Text Qualifier is a symbol that identifies where text begins and ends. It is used specifically when importing data.
Apply Text Qualifier to all Fields will add the specified qualifier to all the fields that have been mapped.
For example, if you need to import a comma-delimited text file and a field value itself contains a comma, the text qualifier ensures the value is read as a single field rather than being split across adjacent cells (see the sketch after this list).
Use Null Text to specify a certain value that you do not want in your data and want it to be replaced by a null value.
Choose Append to File (If Exists) to append data to an existing file; otherwise, a new file will be created, and creating a new file will overwrite any existing file data.
Check Hierarchical Destination when the data in the source file needs to be sorted into hierarchies in the destination file.
Check Write to Multiple Files to save the data to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
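As a quick illustration of the text qualifier option described in this list (the values are made up; this is generic CSV behavior, not Astera code), note how the comma inside the company name is protected by the qualifier:

import csv
import io

# Write one header row and one data row whose value contains the field delimiter.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(["CustomerID", "CompanyName"])
writer.writerow(["ALFKI", "Alfreds Futterkiste, Ltd."])
print(buffer.getvalue())
# CustomerID,CompanyName
# ALFKI,"Alfreds Futterkiste, Ltd."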
Once the data reading options have been specified in this window, click Next.
The next window is the Layout Builder. On this window, the layout of the delimited destination file can be modified.
To add a new field to the layout, go to the last row of the layout (Name column), which will be blank, and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select the subsequent properties for it. A new field will be added to the layout.
Note: Adding a new field (Email) to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Deleting the Fax field from the layout.
Right-click on the highlighted row; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
To change the position of any field and move it above or below another field in the layout, select the row and use the Move Up/Move Down buttons.
Note: The Move Up/Move Down icons can be found at the top left of the builder.
For example, to move the Country field right below the Region field, select the Country row and use the Move Up button to move it from the 9th row to the 8th row.
Once the object layout is configured, click Next. A new window will appear, Config Parameters, which allows you to further configure and define parameters for the delimited destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: If you want to specify parametrized values for Field/Record Delimiter, make sure to follow the same syntax of delimiters as in the dropdown menus of these options on the Properties window.
Parameters left blank will use their default values assigned on the properties page.
Next, a General Options window will appear. In this window:
Comments can be added.
General options related to the processing of records in the destination file are available.
Click OK.
The DelimitedDest object is now configured according to the changes that were made in the properties window.
Note: The changes that were made in this case are:
Added a new field Email in the layout.
Moved the Country field below the Region field.
The Delimited File Destination object is successfully configured, and the destination file can now be created by running the dataflow.
Each source on the dataflow is represented as a source object. You can have any number of sources in the dataflow, and they can feed into zero or more destinations.
The following source types are supported by the dataflow engine:
Flat File Sources:
Tree File Sources:
Database Sources:
Data Model
All sources can be added to the dataflow by picking a source type in the Toolbox and dropping it onto the dataflow. File sources can also be added by dragging and dropping a file from an Explorer window. Database sources can be dragged and dropped from the Data Source Browser. For more details on adding sources to the dataflow, see Introducing Dataflows.
Adding a Delimited File Source object allows you to transfer data from a delimited file. An example of what a delimited file source object looks like is shown below.
To configure the properties of a Delimited File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a Fixed-Length File Source object allows you to transfer data from a fixed-length file. An example of what a Fixed-Length File Source object looks like is shown below.
To configure the properties of a Fixed-Length File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an Excel Workbook Source object allows you to transfer data from an Excel file. An example of what an Excel Workbook Source object looks like is shown below.
To configure the properties of an Excel Workbook Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a COBOL File Source object allows you to transfer data from a COBOL file. An example of what a COBOL File Source object looks like is shown below.
To configure the properties of a COBOL File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an XML/JSON File Source object allows you to transfer data from an XML file. An example of what an XML/JSON File Source object looks like is shown below.
To configure the properties of an XML/JSON File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
General Properties window:
File Path – Specifies the location of the source XML file. Using UNC paths is recommended if running the dataflow on a server.
Schema File Path – Specifies the location of the XSD file controlling the layout of the XML source file.
Note: Astera can generate a schema based on the content of the source XML file. The data types will be assigned based on the source file’s content.
Optional Record Filter Expression – Allows you to enter an expression to selectively filter incoming records according to your criteria. You can use the Expression Builder to help you create your filter expression. For more information on using Expression Builder, see Expression Builder.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
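For reference, a UNC path takes the form \\ServerName\SharedFolder\Orders.xml (the server, share, and file names here are only placeholders), whereas a local path such as C:\Data\Orders.xml resolves only on the machine it was created on and may not be accessible when the dataflow runs on a remote server.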
Adding a Database Table Source object allows you to transfer data from a database table. An example of what a Database Table Source object looks like is shown below.
To configure the properties of a Database Table Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your source, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected source.
Pick Source Table window:
Select a source table using the Pick Table dropdown.
Select Full Load if you want to read the entire table.
Select Incremental Load Based on Audit Fields to perform an incremental read starting at a record where the previous read left off.
Incremental load based on Audit Fields builds on the concept of Change Data Capture (CDC), a set of reading and writing patterns designed to optimize large-scale data transfers by moving only the data that has changed. Astera implements CDC using the Audit Fields pattern, which uses the create time or last update time to determine which records have been inserted or updated since the last transfer and transfers only those records.
Advantages
Most efficient of the CDC patterns. Only records modified since the last transfer are retrieved by the query, putting little stress on the source database and network bandwidth.
Disadvantages
Requires update date time and/or create date time fields to be present and correctly populated
Does not capture deletes
Requires index on the audit field(s) for efficient performance
To use the Audit Fields strategy, select the Audit Field and an optional Alternate Audit Field from the appropriate dropdown menus. Also, specify the path to the file that will store incremental transfer information.
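Conceptually, an incremental read based on an audit field issues a query along the following lines. This is only a sketch with hypothetical table and field names, not the exact SQL that Astera generates:

SELECT * FROM Orders
WHERE ModifiedDtTm > '2023-06-01 14:30:00'  -- highest audit value saved from the previous run
ORDER BY ModifiedDtTm

The highest audit-field value returned is then saved to the incremental transfer file and used as the cutoff for the next run.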
Where Clause window:
You can enter an optional SQL expression serving as a filter for the incoming records. The expression should start with the WHERE keyword, followed by the filter you wish to apply.
For example, WHERE CreatedDtTm >= ‘2001/01/05’
General Options window:
The Comments input allows you to enter comments associated with this object.
Adding a SQL Query Source object allows you to transfer data returned by a SQL query. An example of what a SQL Query Source object looks like is shown below.
To configure the properties of a SQL Query Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your SQL Query, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected database.
SQL Query Source window:
Enter the SQL expression controlling which records should be returned by this source. The expression should follow SQL syntax conventions for the chosen database provider.
For example, select OrderId, OrderName, CreatedDtTm from Orders.
Source or Destination is a Delimited File
If your source or destination is a Delimited File, you can set the following properties
First Row Contains Header - Check this option if the first row of your file contains the column headers. In the case of a source file, this indicates that the source contains headers.
Field Delimiter - Allows you to select the delimiter for the fields, such as a comma or a tab. You can also type the delimiter of your choice instead of choosing one of the available options.
Record Delimiter - Allows you to select the delimiter for the records in the file. The choices available are the carriage-return line-feed combination <CR/LF>, carriage-return <CR>, and line-feed <LF>. You can also type the record delimiter of your choice instead of choosing one of the available options. For more information on Record Delimiters, please refer to the Glossary.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
Quote Char - Allows you to select the quote character to be used in the delimited file. This quote character tells the system to overlook any special characters, such as field delimiters, appearing inside the specified quotation marks. The options available are the double quote (") and the single quote ('). A sample file illustrating these options is shown below.
You can also use the Build fields from an existing file feature to build the destination fields from an existing file instead of typing the layout manually.
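For illustration, a comma-delimited file with a header row and the double quote (") as the quote character might look like this (sample values only):

OrderId,CustomerName,Country
1001,"Smith, John",USA
1002,"Doe, Jane",Canada

Here, the commas inside the quoted CustomerName values are treated as data rather than as field delimiters.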
Source or Destination is a Microsoft Excel Worksheet
If the Source and/or the Destination chosen is a Microsoft Excel Worksheet, you can set the following properties:
First Row Contains Header - Check this option if the first row of your file contains the column headers. In the case of a source file, this indicates that the source contains headers.
Worksheet - Allows you to select a specific worksheet from the selected Microsoft Excel file.
You can also use the Build fields from an existing file feature to build the destination fields from an existing file instead of typing the layout manually.
Source or Destination is a Fixed Length File
If the Source and/or the Destination chosen is a Fixed Length File, you can set the following properties:
First Row Contains Header - Check this option if the first row of your file contains the column headers. In the case of a source file, this indicates that the source contains headers.
Record Delimiter - Allows you to select the delimiter for the records in the file. The choices available are the carriage-return line-feed combination <CR/LF>, carriage-return <CR>, and line-feed <LF>. You can also type the record delimiter of your choice instead of choosing one of the available options. For more information on Record Delimiters, please refer to the Glossary.
Encoding - Allows you to choose the encoding scheme for the fixed-length file from a list of choices. The default value is Unicode (UTF-8).
You can also use the Build fields from an existing file feature to build the destination fields from an existing file instead of typing the layout manually.
Using the Length Markers window, you can create the layout of your fixed-length file. The Length Markers window has a ruler placed at the top of the window. To insert a field length marker, click in the window at the desired position. For example, if you want a field to contain five characters and the field starts at position five, you need to click at marker position nine.
In case the records don’t have a delimiter and you rely on knowing the size of a record, the number in the RecordLength box is used to specify the character length for a single record.
You can delete a field length marker by clicking the marker.
Source or Destination is an XML file
If the source is an XML file, you can set the following options:
Source File Path specifies the file path of the source XML file.
Schema File Path specifies the file path of the XML schema (XSD file) that applies to the selected source XML file.
Record Filter Expression allows you to optionally specify an expression used as a filter for incoming source records from the selected source XML file. The filter can refer to a field or fields inside any node inside the XML hierarchy.
The following options are available for destination XML files.
Destination File Path specifies the file path of the destination XML file.
Encoding - Allows you to choose the encoding scheme for the XML file from a list of choices. The default value is Unicode (UTF-8).
Format XML Output instructs Astera to add line breaks to the destination XML file for improved readability.
Read From Schema File specifies the file path of the XML schema (XSD file) that will be used to generate the destination XML file.
Root Element specifies the root element from the list of the available elements in the selected schema file.
Generate Destination XML Schema Based on Source Layout creates the destination XML layout to mirror the layout of the source.
Root Element specifies the name of the root element for the destination XML file.
Generate Fields as XML Attributes specifies that fields will be written as XML attributes (as opposed to XML elements) in the destination XML file.
Record Node specifies the name of the node that will contain each record transferred.
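For illustration, with Record Node set to Order (a hypothetical node name) and sample field values, a record written with Generate Fields as XML Attributes checked might look like this:

<Order OrderId="1001" Country="USA" />

With the option unchecked, the same record is written with fields as elements:

<Order>
  <OrderId>1001</OrderId>
  <Country>USA</Country>
</Order>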
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
When importing from a fixed-width, delimited, or Excel file, you can specify the following advanced reading options:
Header Spans x Rows - If your source file has a header that spans more than 1 row, select the number of rows for the header using this control.
Skip Initial Records - Sets the number of records which you want skipped at the beginning of the file. This option can be set whether or not your source file has a header. If your source file has a header, the first record after the specified number of rows to skip will be used as the header row.
Raw Text Filter - Only records starting with the filter string will be imported. The remaining records will be filtered out.
You can optionally use regular expressions to specify your filter. For example, the regular expression ^[12][4] will only include records starting with 1 or 2, and whose second character is 4.
Note: Astera supports Regular Expressions implemented with the Microsoft .NET Framework and uses the Microsoft version of named captures for regular expressions.
The Raw Text Filter setting is not available for Excel source files.
If your source is a fixed-length file, delimited file, or Excel spreadsheet, it may contain an optional header row. A header row is the first record in the file that specifies field names and, in the case of a fixed-length file, the positioning of fields in the record.
If your source file has a header row, you can specify how you want the system to handle differences between the actual source file and the source layout specified in the setting. Differences may arise because the source file has a different field order than the source layout, has extra or fewer fields than the source layout, or has field names that differ or have changed since the layout was created.
By selecting from the available options, you can have Astera handle those differences exactly as required by your situation. These options are described in more detail below:
Enforce exact header match – Lets Astera Data Stack proceed with the transfer only if the source file’s layout matches the source layout defined in the setting exactly. This includes checking for the same number and order of fields and field names.
Columns order in file may be different from the layout – Lets Astera Data Stack ignore the sequence of fields in the source file, and match them to the source layout using the field names.
Column headers in file may be different from the layout – This mode is used by default whenever the source file does not have a header row. You can also enable it manually if you want to match the first field in the layout with the first field in the source file, the second field in the layout with the second field in the source file, and so on. This option will match the fields using their order as described above even if the field names are not matched successfully. We recommend that you use this mode only if you are sure that the source file has the same field sequence as what is defined in the source layout.
The Field Layout window is available in the properties of most objects on the dataflow to help you specify the fields making up the object. The table below explains the attributes you can set in the Field Layout window.
The table below provides a list of all the attributes available for a particular layout type.
Astera supports a variety of formats for each data type. For example, for Dates, you can specify the date as “April 12” or “12-Apr-08”. Data Formats can be configured independently for source and for destination, giving you the flexibility to correctly read source data and change its format as it is transferred to destination.
If you are transferring from a flat file (for example, Delimited or Fixed-Width), you can specify the format of a field so that the system can correctly read the data from that field.
If you do not specify a data format, the system will try to guess the correct format for the field. For example, Astera is able to correctly interpret any of the following as a Date:
April 12
12-Apr-08
04-12-2008
Saturday, 12 April 2008
and so on
Astera comes with a variety of pre-configured formats for each supported data type. These formats are listed in the Sample Formats section below. You can also create and save your own data formats.
To select a data format for a source field, go to Source Fields and expand the Format dropdown menu next to the appropriate field.
Sample Formats
Dates:
Booleans:
Integers:
Real Numbers:
Numeric Format Specifiers:
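As a rough illustration of the kinds of format strings involved (these are common .NET-style patterns shown as examples, not an authoritative list of the product's built-in formats): the date format dd-MMM-yy renders 12-Apr-08, MM-dd-yyyy renders 04-12-2008, the numeric format #,##0.00 renders 1,250.75, and 0.00% renders 0.125 as 12.50%.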
Delimited files are one of the most commonly used data sources and are used in a variety of situations. The Delimited File Source object in Astera provides the functionality to read data from a delimited file.
In this article, we will cover how to use a Delimited File Source object.
To get a Delimited File Source object from the Toolbox, go to Toolbox > Sources > Delimited File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Delimited File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Delimited File Source object, right-click on its header and select Properties from the context menu.
As soon as you have selected the Properties option from the context menu, a dialog box will open.
This is where you can configure the properties for the Delimited File Source object.
The first step is to provide the File Path for the delimited source file. By providing the file path, you are building the connectivity to the source dataset.
Note: In this case, we are going to be using a delimited file with sample Orders data. This file works with the following options:
File Contains Headers
Record Delimiter is specified as CR/LF
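For illustration, the first few lines of such a file might look like this (sample values only, not the actual dataset):

OrderID,CustomerID,OrderDate,ShipCountry
10248,VINET,1996-07-04,France
10249,TOMSP,1996-07-05,Germany

The first line supplies the headers, and each record ends with a CR/LF delimiter.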
The dialog box has some other configuration options:
If the source file contains headers, and you want Astera to read headers from the source file, check the File Contains Header option.
If you want your file to be read in portions, upon selecting the Partition File for Reading option, Astera will read your file according to the specified Partition Count. For instance, if a file with 1000 rows has a Partition Count of 2 specified, the file will be read in two partitions of 500 each. This is a back-end process that makes data reading more efficient and helps in processing data faster. This will not have any effect on your output.
The Record Delimiter field allows you to select the delimiter for the records in the fields. The choices available are carriage-return line-feed combination <CR/LF>, carriage-return - CR and line-feed - LF. You can also type the record delimiter of your choice instead of choosing from the available options.
In case the records do not have a delimiter and you rely on knowing the size of a record, the number in the Record Length field can be used to specify the character length for a single record.
The Encoding field allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
A Text Qualifier is a symbol that identifies where text begins and ends. It is used specifically when importing data. For example, when importing a comma-delimited text file, commas separate the fields that will be placed in adjacent cells; a text qualifier such as the double quote ensures that commas appearing inside a field value are read as data rather than as field delimiters.
To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File option. Astera IDE provides extensive user interface capabilities for processing hierarchical structures.
Use the Null Text option to specify a value in your data that you want replaced with a null value when the file is read.
Check the Allow Record Delimiter Inside a Field Text option when you have the record delimiter as text inside your data and want that to be read as it is.
Advanced File Options
In the Header spans over field, specify the number of rows that your header takes. Use this option when your header spans multiple rows.
Check the Enforce exact header match option if you want processing to proceed only when the file's header exactly matches the defined layout.
Check the Column order in file may be different from the layout option if the field order in the source file may differ from the field order in Astera’s layout; the fields are then matched by name.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
Check the Use SmartMatch with Synonym Dictionary option when the header values vary between the source layout and Astera’s layout. You can create a Synonym Dictionary file to store values for alternate headers. You can also use the Synonym Dictionary file to facilitate automapping between objects on the flow diagram that use alternate names in field layouts.
To skip any unwanted rows at the beginning of your file, you can specify the number of records that you want to omit through the Skip initial records option.
Raw text filter
If you do not want to apply any filter and want to process all records, check No filter. Process all records.
To process only records that begin with a specific value, check the Process if begins with option and enter that value in the provided field.
To process only records that match a specific pattern, check the Process if matches this regular expression option and enter the regular expression in the provided field.
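For example, the regular expression ^ORD (a hypothetical prefix) would process only those records whose raw text begins with ORD.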
String Processing
String Processing options apply when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want them to be written as null values to the database destination.
Check the Trim strings option when you want to remove extra leading and trailing spaces from field values.
Once you have specified the data reading options on this window, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of the delimited source file.
If you want to add a new field to your layout, go to the last, blank row of the layout (in the Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field you want to add and select its subsequent properties. A new field will be added to the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted line; a context menu will appear with the option to Delete.
Selecting this option will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Note: Modifying the layout (adding or deleting fields) from the Layout Builder in Astera will not make any changes to the actual source file. The layout is specific to Astera only.
After you are done customizing the layout, click Next. You will be directed to a new window, Config Parameters. Here, you can define parameters for the Delimited File Source object.
Parameters make flows easier to deploy by eliminating hardcoded values and allow multiple configurations to be changed with a single value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Once you have configured the source object, click OK.
The Delimited File Source object is now configured according to the changes made.
It has been modified from its previous configuration and reflects all the changes that were made in the Layout Builder.
In this case, the modifications that were made are:
Added the CustomerName column.
Deleted the ShipCountry column.
You have successfully configured your Delimited File Source object. The fields from the source object can now be mapped to other objects in the dataflow.
The Excel File Source object in Astera supports all formats of Excel. In this article, we will be discussing:
Various ways to get the Excel Workbook Source object on the dataflow designer.
Configuring the Excel Workbook Source object according to our required layout and settings.
In this section, we will cover the various ways to get an Excel Workbook Source object on the dataflow designer.
To get an Excel File Source object from the Toolbox, go to Toolbox > Sources > Excel Workbook Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet. We will discuss the configuration properties for the Excel Workbook Source object in the next section.
If you already have a project defined and Excel source files are part of that project, you can directly drag-and-drop the Excel source files from the project tree onto the dataflow designer. The Excel File Source objects in this case will already be configured; Astera Data Stack detects the connectivity and layout information from the source file itself.
Note: In this case, we are using an Excel file with Customers data. The file is a part of an existing project folder.
To get an Excel File Source object from the Project Explorer, go to the Project Explorer window and expand the project tree.
Select the Excel file you want to bring in as the source and drag-and-drop it on the designer. In this case, we are working with Customers -Excel Source.xls file so we will drag-and-drop it onto the designer.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
To get an Excel Workbook Source directly from the file location, open the folder containing the Excel file.
Drag-and-drop the Excel file from the folder onto the designer in Astera.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
To configure the Excel Workbook Source object, right-click on its header and select Properties from the context menu.
As soon as you have selected the Properties option from the context menu, a dialog box will open.
This is where you can configure your properties for the Excel Workbook Source object.
The first step is to provide the File Path for the Excel source. By providing the file path, you are building the connectivity to the source dataset.
Note: In this case, we are going to be using an Excel file with sample Customers data.
The dialog box has some other configuration options:
If your source file contains headers and you want your Astera source layout to read headers from the source file, check the File Contains Header box.
If you have blank rows in your file, you can use the Consecutive Blank Rows to Indicate End of File option to specify the number of blank rows that will indicate the end of the file.
Use the Worksheet option to specify the worksheet in your Excel file that you want to read data from.
In the Start Address option, you can indicate the cell (for example, B2) from which you want Astera to start reading the data.
Advanced File Options
In the Header spans over option, specify the number of rows that your header takes. Use this option when your header spans multiple rows.
Check the Enforce exact header match option if you want processing to proceed only when the file's header exactly matches the defined layout.
Check the Column order in file may be different from the layout option if the field order in the source file may differ from the field order in Astera’s layout; the fields are then matched by name.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
Check the Use SmartMatch with Synonym Dictionary option when the header values vary between the source layout and Astera’s layout. You can create a Synonym Dictionary file to store the values for alternate headers. You can also use the Synonym Dictionary file to facilitate automapping between objects that use alternate names in field layouts.
String Processing
String Processing options apply when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want them to be written as null values to the database destination.
Check the Trim strings option when you want to remove extra leading and trailing spaces from field values.
Once you have specified the data reading options on this screen, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of your Excel source file.
If you want to add a new field to your layout, go to the last row of your layout (Name column) and double-click on it. A blinking text cursor will appear. Type in the name of the field you want to add and select subsequent properties for it. A new field will be added to the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted line and select Delete from the context menu.
Selecting this option will delete the entire row from the layout.
Note: Modifying the layout (adding or deleting fields) in the Layout Builder window in Astera will not make any changes to the actual source file. The layout is specific to Astera only.
If you want to change the position of any field and want to move it below or above another field in the layout, you can do this by selecting the row and using Move up/move down keys.
Note: You will find the Move up/Move down icons on the top left of the Layout Builder.
For example: To move the Country field right below the Region field, we will select the row and use the Move up key to move this field from the 9th row to the 8th.
Other options that the Layout Builder provides:
After you are done customizing the layout in the Object Builder, click Next. You will be taken to a new window, Config Parameters. Here, you can further configure and define parameters for the Excel source.
Parameters make flows easier to deploy by eliminating hardcoded values and allow multiple configurations to be changed with a single value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Once you have been through all the configuration options, click OK.
The ExcelSource object is now configured according to the changes made.
You have successfully configured your Excel Workbook Source object. The fields from the source object can now be mapped to other objects in the dataflow.
The Excel Workbook Report object in Astera is designed to tabulate information from selected fields and present the results in a one- or two-dimensional matrix. This feature enables deeper analysis of data by organizing it in a way that facilitates the identification of trends, patterns, and insights.
To get the object from the Toolbox, go to Toolbox > Destinations > Excel Workbook Report. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Report object onto the designer.
The dragged report object is empty right now. This is because the data fields are not mapped to it yet. While any source can be used, for this particular use case, we will demonstrate using a Report Source that is extracting data from a PDF source file.
Configure the source object and place it onto the designer next to the Excel Workbook Report object.
Note: We are using the data that we extracted from a customer invoice with the help of Astera's report modeling tool.
Now map the data fields from the source object to the report object.
To configure the Excel Workbook Report object, right-click on the header, select Properties from the context menu and a dialog box will open.
Provide the File Path. This is where the excel report file will be saved.
The dialog box has some other configuration options, such as Worksheet and Start Address, which work just like those of the Excel Workbook Destination object.
Once the File Path and data reading options have been specified on this screen, click Next.
The next window is the Layout Builder. On this window, the layout of the Excel report file can be modified.
Here, you can write names of fields as you want them to appear in your destination in the Header column and specify the relevant Aggregate Functions for them.
Aggregate Functions define how the data will be summarized in the report:
Group By: Groups records based on unique values in the specified field.
Sum: Calculates the total sum of the specified field.
Count: Counts the number of records.
Average: Calculates the average value of the specified field.
Max: Finds the maximum value in the specified field.
Min: Finds the minimum value in the specified field.
First: Returns the first record in a sorted list based on the specified field.
Last: Returns the last record in a sorted list based on the specified field.
Variance: Calculates the variance of the specified field.
Standard Deviation: Calculates the standard deviation of the specified field.
None: Includes the field in the report without applying any aggregation. This is useful when you want certain field values in the data lines but don’t want to apply any aggregation on them.
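As a minimal illustration of how Group By and Sum combine (sample values, not the actual invoice data), consider these input rows:

OrderID, ITEM, QUANTITY, TOTAL
101, Widget, 2, 20.00
101, Bolt, 5, 5.00
102, Widget, 1, 10.00

Grouping by OrderID and applying Sum to QUANTITY and TOTAL produces a subtotal of 7 and 25.00 for order 101 and 1 and 10.00 for order 102, with a grand total of 8 and 35.00.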
For this case:
AccountID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual accounts.
OrderID: We will select the Group By option from the Aggregate Function drop-down list for this field, as we want to see orders within each account.
TOTAL: For this field we will select the Aggregate Function Sum, to calculate the total amount per order.
QUANTITY: For this field we will select the Aggregate Function Sum, to calculate the total quantity per order.
ITEM: Since we want to show item names in the data lines but do not want to apply any aggregates on them, we will select Aggregate Function None.
The same approach will be applied to the ITEM CODE, DESCRIPTION, and PRICE fields. We will select Aggregate Function None for each of these fields to ensure that their specific values are displayed in separate data lines without any aggregation.
Click Next. The Report Options window will now open.
Report Type: You can select from three report types: Summary, Cross Tab, or Time Series.
Note: For this use case we have chosen Summary as the Report Type.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for Subtotal field.
Grand Total Text: You can specify the name for Grand total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis. For example, if you have customer names like "john" and "John," enabling this option will treat them as distinct groups rather than combining them into a single group.
Style
You can also modify the style of your report.
Show Data Lines: Check this option if you want to see the actual data records along with the subtotals and grand totals.
Insert Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Write Grand Total: Adds the grand total to the report. If unchecked, the grand total won't be included.
Insert Blank Line Before Subtotal: Inserts a blank line before each subtotal in the report.
Insert Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted by the specified Group By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build the aggregate using the incoming data whether it is sorted or not.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears.
Note: If you want to see only the subtotals and the grand total you can uncheck the Show Data Lines option in the Report Options wizard. This will display only the summarized totals without individual records.
A Crosstab Summary displays summarized information about two fields in a two-dimensional matrix. The values for one field are displayed down the left-most column of the matrix and the values for the other key field are displayed across the top row as columns. This two-dimensional arrangement displays only a single measure at a time.
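For illustration, a crosstab of the Sum of order totals by customer and product might look like this (sample values only):

            Prod-A    Prod-B    Row Total
Cust-1      120.00     30.00       150.00
Cust-2       45.00     80.00       125.00
Grand Total                        275.00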
Let’s see how we can make a Cross Tab Summary using the Excel Workbook Report object.
Since we want to use information from two tables (orders and order details), we have joined them and used an Expression object to calculate the total. We can then map the required data fields from both tables to the Excel Workbook Report object.
For this case:
CustomerID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual customers.
ProductID: We will select the None option from the Aggregate Function drop-down list for this field, as we want the product values to spread across the top row as columns.
Total: We will select the Sum option from the Aggregate Function drop-down list for this field as we want totals for each type of product and the totals for each customer.
Note: For this use case we have chosen Cross Tab as the Report Type.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for Subtotal field.
Row Total Text: You can specify the name for Row total field.
Grand Total Text: You can specify the name for Grand total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis.
Crosstab Options
Column Field: You can select which field or attribute you want to use as the column headers in the resulting cross-tabulation report.
Note: The selected column field should have Aggregate Function None selected in the layout builder.
Row Totals – Check this option if you want to add each row’s total to your report.
Style
You can also modify the style of your report.
Add Grand Total: Inserts the grand total in the report.
Add Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Add Row Total: Inserts the row totals in the report.
Add Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted by the specified Group By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build the aggregate using the incoming data whether it is sorted or not.
After defining the options, click OK.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears. The summary in the table shows the sales data for different products purchased by various customers, identified by their CustomerID.
A Time Series summary displays summarized information about two key fields in a two-dimensional matrix. The values for one field are displayed down the left-most column of the matrix, and the time intervals (such as days, months, quarters, or years) are displayed across the top row as columns.
Let’s see how we can make a Time Series Summary using the Excel Workbook Report object.
Since we want to use information from two tables (orders and order details), we have joined them and used an Expression object to calculate the total. We can then map the required data fields from both tables to the Excel Workbook Report object.
For this case:
CustomerID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual customers.
OrderDate: We will select the None option from the Aggregate Function drop-down list for this field, as we want the date values to spread across the top row as columns.
Total: We will select the Sum option from the Aggregate Function drop-down list for this field as we want the totals for each customer.
Note: For this use case we have chosen Time Series as the Report Type.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for Subtotal field.
Row Total Text: You can specify the name for Row total field.
Grand Total Text: You can specify the name for Grand total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis.
Timeseries Report Options
Time Unit Drop-down: You can specify the time interval for the time series analysis. Available options include:
Year: Analyze data on a yearly basis.
Month: Analyze data on a monthly basis.
Day: Analyze data on a daily basis.
Week: Analyze data on a weekly basis.
Quarter: Analyze data on a quarterly basis.
Start Date: You can specify the start date for the time series analysis. This defines the beginning of the time period for which data will be analyzed.
End Date: You can specify the end date for the time series analysis. This defines the end of the time period for which data will be analyzed.
Date Field: Field from the dataset that contains the date or timestamp information. The selected date field will be used to create the time series.
Style
You can also modify the style of your report.
Add Grand Total: Inserts the grand total in the report.
Add Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Add Row Total: Inserts the row totals in the report.
Add Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted by the specified Group By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build the aggregate using the incoming data whether it is sorted or not.
After defining the options, click OK.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears. This summary table shows the number of sales across different years for customers, identified by their CustomerID.
The Email Source object in Astera enables users to retrieve data from emails and process the incoming email attachments.
In this section, we will cover how to get the Email Source object onto the dataflow designer from the Toolbox.
To get an Email Source object from the Toolbox, go to Toolbox > Sources > Email Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Email Source object onto the designer.
You can see some built-in fields and an Attachments node.
Double-click on the header of the Email Source object to go to the Properties window.
A configuration window for the Email Source object will open. The Email Connection window is where you will specify the connection details.
Url: The address of the mail server on which the connection will be configured.
Login Name: The login name of the email account (typically the email address).
Password: Password of the user.
Port: The port of the mail server to connect to. For example, 587 is the standard SMTP submission port, and 993 is the standard port for IMAP over SSL.
Connection Logging: Connection logging records different types of messages or events exchanged between the client and the server, which the user can review for debugging or when errors occur.
Astera supports four types of Connection Logging:
Verbose: Captures everything.
Debug: Captures only the content that can be used in debugging.
Info: Captures information and general messages.
Error: Captures only the errors.
If you have configured email settings before, you can access the configured settings from the drop-down list next to the Recent option. Otherwise, provide server settings for the mailing platform that you want to use. In this case, we are using an Outlook server.
Test your connection by clicking Test Connection; this will give you the option to send a test email to the login address.
Click Next. This is the Email Source Properties window. There are two important parts in this window:
Download attachment options
Email reading options
Check the Download Attachments option if you want to download the contents of your email. Specify the directory where you want to save the email attachments, in the provided field next to Directory.
The second part of the Email Source Properties window has the email reading options that you can work with to configure various settings.
Read Unread Only – Check this option if you only want to process unread emails.
Mark Email as Read – Check this option if you want to mark processed emails as read.
Folder – From the drop-down list next to Folder, you can select the specific folder to check, for example, Inbox, Outbox, Sent Items etc.
Filters - You can apply various filters to only process specific emails in the folder.
From Filter: Filters emails based on the sender’s email address.
Subject Filter: Filters emails based on the text of the subject line.
Body Filter: Filters emails based on the body text.
Click OK.
Right-click on the Email Source object’s header and select Preview Output from the context menu.
A Data Preview window will open and will show you the preview of the extracted data.
Notice that the output only contains emails from the email address specified in the Filter section.
The File System Items Source in Astera Data Stack is used to provide metadata information to a task in a dataflow or workflow. In a dataflow, it can be used in conjunction with a source object, especially in cases where you want to process multiple files through the transformation and loading process.
In a workflow, the File System Items Source object can be used to provide input paths to a subsequent object such as a RunDataflow task.
Let’s see how it works in a dataflow.
Here we have a dataflow that we want to run on multiple source files that contain Customer_Data from a fictitious organization. We are going to use the source object as a transformation and provide the location of the source files using a File System Items Source object. The File System Items Source will provide the path to the location where our source files reside, and the source object will pick up the source files from that location, one by one, and pass them on for further processing in the dataflow.
Here, we want to sort the data, filter out records of customers from Germany and write the filtered records into a database table. The source data is stored in delimited (.csv) files.
First, change the source object into a Transformation object. This is because the data is stored in multiple delimited files and we want to process all of them in the dataflow. For this, right-click on the source object’s header and click Transformation in the context menu.
You can see that the color of the source object has changed from green to purple which indicates that the source object has been changed into a transformation object.
Notice that the source object now has two nodes: Input and Output. The Input node has an input mapping port which means that it can take the path to the source file from another object.
Now we will use a File System Items Source object to provide a path to Customer_Data Transformation object. Go to the Sources section in the Toolbox and drag-and-drop the File System Items Source object onto the designer.
If you look at the File System Items Source object, you can see that the layout is pre-populated with fields such as FileName, FileNameWithoutExtension, Extension, FullPath, Directory, ReadOnly, Size, and other attributes of the files.
To configure the properties of the File System Items Source object, right-click on the File System Items Source object’s header and go to Properties.
This will open the File System Properties window.
The first thing you need to do is point the Path to the directory or folder where your source files reside.
You can see a couple of other options on this screen:
Filter: If your specified source location contains multiple files in different formats, you can use this option to filter and read only files in the specified format. For instance, our source folder contains PDF, .txt, .doc, .xls, and .csv files, so we will write “*.csv” in the Filter field to filter and read delimited files only.
Include items in subdirectories: Check this option if you want to process files present in the sub-directories.
Include Entries for Directories: Check this option if you want entries for the directories themselves, in addition to files, to be included in the output.
Once you have specified the Path and other options, click OK.
Now right-click on the File System Items Source object’s header and select Preview Output.
You can see that the File System Items Source object has filtered out delimited files from the specified location and has returned the metadata in the output. You can see the FileName, FileNameWithoutExtension, Extension, FullPath, Directory, and other attributes such as whether the file is ReadOnly, FileSize, LastAccessed, and other details in the output.
Now let’s start mapping. Map the FullPath field from the File System Items Source object to the FullPath field under the Input node in the Customer_Data Transformation object.
Once mapped, when we run the dataflow, the File System Items Source will pass the path to the source files, one by one, to the Customer_Data Transformation object. The Customer_Data Transformation object will read the data from the source file and pass it to the subsequent transformation object to be processed further in the dataflow.
In a workflow, the File System Items Source object can be used to provide input paths to a subsequent task such as a RunDataflow task. Let’s see how this works.
We want to design a workflow to orchestrate the process of extracting customer data stored in delimited files, sorting that data, filtering out records of customers from Germany and loading the filtered records in a database table.
We have already designed a dataflow for the process and have called this dataflow in our workflow using the RunDataflow task object.
We have multiple source files that we want to process in this dataflow. So, we will use a File System Items Source object to provide the path to our source files to the RunDataFlow task. For this, go to the Sources section in the Toolbox and drag-and-drop the File System Items Source onto the designer.
If you look at the File System Items Source, you can see that the layout is pre-populated with fields such as FileName, FileNameWithoutExtension, Extension, FullPath, Directory, ReadOnly, Size, and other attributes of the files. Also, notice the small blue icon with the letter ‘s’; it indicates that the object is set to run in Singleton mode.
By default, all objects in a workflow are set to execute in Singleton mode. However, since we have multiple files to process in the dataflow, we will set the File System Items Source object to run in loop. For this, right-click on the File System Items Source and click Loop in the context menu.
You can see that the color of the object has changed to purple, and it now has this purple icon over the header which denotes the loop function.
It also has these two mapping ports on the header to map the File System Items Source object to the subsequent action in the workflow. Let’s map it to the RunDataflowTask.
To configure the properties of the File System Items Source, right-click on the File System Item Source object’s header and go to Properties.
This will open the File System Items Source Properties window.
The first thing you need to do is point the Path to the directory or folder where your source files reside.
You can see a couple of other options on this window:
Filter: If your specified source location contains multiple files in different formats, you can use this option to filter and read only files in the specified format. For instance, our source folder contains PDF, .txt, .doc, .xls, and .csv files, so we will write “*.csv” in the Filter field to filter and read delimited files only.
Include items in subdirectories: Check this option if you want to process files present in the sub-directories.
Include Entries for Directories: Check this option if you want entries for the directories themselves, in addition to files, to be included in the output.
Once you have specified the Path and other options, click OK.
Now right-click on the File System Items Source object’s header and click Preview Output.
You can see that the File System Items Source object has filtered out delimited files from the specified location and has returned the metadata in the output. You can see the FileName, FileNameWithoutExtension, Extension, FullPath, Directory, and other attributes such as whether the file is ReadOnly, FileSize, LastAccessed, and other details in the output.
Now let’s start mapping. Map the FullPath field from the File System Items Source object to the FilePath variable in the RunDataflow task.
Once mapped, upon running the workflow, the File System Items Source object will pass the paths to the source files, one by one, to the RunDataflow task. In other words, the File System Items Source acts as a driver that provides source files to the RunDataflow task, which then processes them in the dataflow.
When the File System Items Source is set to run in a loop, the dataflow will run ‘n’ times, where ‘n’ is the number of files passed by the File System Items Source to the RunDataflow task. For instance, you can see that we have six source files in the specified folder. The File System Items Source object will pass these six files one by one to the RunDataflow task to be processed in the dataflow.
This concludes using the File System Items Source object in Astera Data Stack.
The Database Table Source object provides the functionality to retrieve data from a database table. It also provides change data capture functionality to perform incremental reads, and supports multi-way partitioning, which partitions a database table into multiple chunks and reads these chunks in parallel. This feature brings about major performance benefits for database reads.
The object also enables you to specify a WHERE clause and sort order to control the result set.
In this article, we will be discussing how to:
Get a Database Table Source object on the dataflow designer.
Configure the Database Table Source object according to the required layout and settings.
We will also be discussing some best practices for using a Database Table Source object.
To get a Database Table Source from the Toolbox, go to Toolbox > Sources > Database Table Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Database Table Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Database Table Source object, right-click on its header and select Properties from the context menu.
A dialog box will open.
This is where you can configure the properties for the Database Table Source object.
The first step is to specify the Database Connection for the source object.
Provide the required credentials. You can also use the Recently Used drop-down menu to connect to a recently connected database.
You will find a drop-down list next to the Data Provider.
This is where you select the specific database provider to connect to. The connection credentials will vary according to the provider selected.
Test Connection to make sure that your database connection is successful and click Next.
Next, you will see a Pick Source Table and Reading Options window. On this window, you will select the table from the database that you previously connected to and configure the table from the given options.
From the Pick Table field, choose the table that you want to read the data from.
Note: We will be using the Customers table in this case.
Once you pick a table, an icon will show up beside the Pick Table field, providing the following options:
View Data: You can view data in a separate window in Astera.
View Schema: You can view the schema of your database table from here.
View in Database Browser: You can see the selected table in the Database Source Browser in Astera.
Table Partition Options
This feature substantially improves the performance of large data movement jobs. Partitioning is done by selecting a field and defining value ranges for each partition. At runtime, Astera generates and runs multiple queries against the source table and processes the result set in parallel.
Check the Partition Table for Reading option if you want your table to be read in partitions.
You can specify the Number of Partitions.
The Pick Key for the Partition drop-down will let you choose the key field for partitioning the table.
If you have specific key values based on which you want to partition the table, you can use the Specify Key Values (Separated by comma) option.
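Conceptually, partitioned reading splits the table read into several range-based queries that run in parallel. As a rough sketch only (with a hypothetical boundary value, not the SQL Astera actually generates), two partitions keyed on CustomerID might correspond to queries like:

SELECT * FROM Customers WHERE CustomerID < 'M'
SELECT * FROM Customers WHERE CustomerID >= 'M'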
The Favor Centerprise Layout option is useful in cases where your source database table layout has changed over time, but the layout built in Astera is static, and you want to continue using your dataflows with the updated source table layout. Check this option to have Astera favor its own layout over the database layout.
Incremental Read Options
The Database Table Source object provides incremental read functionality based on the concept of audit fields. Incremental read is one of the three change data capture approaches supported by Astera. Audit fields are fields that are updated when a record is created or modified. Examples of audit fields include created date time, modified date time, and version number.
Incremental read works by keeping track of the highest value for the specified audit field. On the next run, only the records with a value higher than the saved value are retrieved. This feature is useful in situations where two applications need to be kept in sync and the source table maintains audit field values for rows.
Select Full Load if you want to read the entire table.
Select Incremental Load Based on Audit Fields to perform an incremental read. Astera will start reading records from where the last read left off.
Checking the Perform full load on next run option will override the incremental load setting on the next run and perform a full load instead.
Use Audit Field to select the field used to determine which records have changed since the last read.
In File Path, specify the path to the file that will store the incremental transfer information.
The next window is the Layout Builder. In this window you can modify the layout of your database table.
Note: By default, Astera reads the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted row; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Note: Modifying the layout (adding or deleting fields) from the Layout Builder in Astera will not make any changes to the actual database table. The layout is only specific to Astera.
If you want to change the position of any field and move it below or above another field in the layout, select the row and use the Move up/Move down icons.
Note: You will find the Move up/Move down icons on the top left of the builder.
For example, to move the Country field right below the Region field, we select the row and use the Move up icon to move the field from the 9th row to the 8th.
After you are done customizing the Layout Builder, click Next. You will be taken to a new window, Where Clause. Here, you can provide a WHERE clause, which will filter the records from your database table.
Note: If the WHERE clause is left blank, Astera will read the database table with its default values.
For instance, suppose you add a WHERE clause that selects all the customers from the country "Mexico" in the Customers table.
Your output will then be filtered, so that only the records satisfying the WHERE condition are read by Astera.
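For the example above, the filter you provide would look something like the condition below (shown as an illustration; Astera applies it to the query it builds against the Customers table):

```sql
-- The condition entered in the Where Clause window (illustrative):
--   Country = 'Mexico'
-- which makes the effective query equivalent to:
SELECT * FROM Customers WHERE Country = 'Mexico';
```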
Once you have configured the Database Table Source object, click Next.
A new window, Config Parameters will open. Here, you can define parameters for the Database Table Source object.
Parameters simplify the deployment of flows by eliminating hardcoded values and provide an easy way of changing multiple configurations with a single value change.
Note: Parameters can be changed in the Config Parameters wizard page. Parameters left blank will use their default values assigned on the properties page.
Click OK.
You have successfully configured your Database Table Source object. The fields from the source object can now be mapped to other objects in a dataflow.
To get the Database Table Source object from the Data Source Browser, go to View > Data Source Browser or press Ctrl + Alt + D.
A new window will open. You can see that the pane is empty right now. This is because we are not connected to any database source yet.
To connect the browser to a database source, click on the first icon located at the top left corner of the pane.
A Database Connection box will open.
This is where you can connect to your database from the browser.
You can either connect to a Recently Used database or create a new connection.
Note: In this case we will use one of our recent connections.
To create a new connection, select your Data Provider from the drop-down list.
Note: We will be using the SQL Server in this case.
The next step is to fill in the required credentials. Also, to ensure that the connection is successfully made, select Test Connection.
Once you test your connection, a dialog box will indicate whether the test was successful or not.
Click OK.
Once you have connected the browser, your Data Source Browser will now have the databases that you have on your server.
Select the database that you want to work with and then choose the table you want to use.
Note: In this case we will be using the Northwind database and Customers table.
Drag-and-drop Customers table onto the designer in Astera.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
Right-clicking on the Database Table Source object will also display options for the database table.
Show in DB Browser - Will show where the table resides in the database in the Database Browser.
View Table Data - Builds a query and displays all the data from the table.
View Table Schema - Displays the schema of the database table.
Create Table - Creates a table on a database based on the schema.
Once you have logged into the Astera client, you can set up an admin email to access the Astera server. This will also allow you to use the "Forgot Password" option at the time of login.
In this document, we will discuss how to verify admin email in Astera.
1. Once logged in, proceed to associate an email address with the admin user by verifying that email address.
Go to Server > Configure > Step 3: Verify Admin Email
2. Unless you have already set up an email address in the Mail Setup section of Cluster Settings, the following dialog box will pop up, asking you to configure your email settings.
Click on Yes to open your cluster settings.
Click on the Mail Setup tab.
3. Enter your email server settings.
4. Now, right-click on the Cluster Settings active tab and click on Save & Close in order to save the mail setup.
5. Re-visit the Verify Admin Email step by going to Server > Configure > Step 3: Verify Admin Email.
This time, the Configure Email dialogue box will open.
6. Enter the email address you previously set up and click on Send OTP.
7. Use the OTP from the email you received, enter it in the Configure Email dialog, and proceed.
On correct entry of the OTP, a dialog will appear confirming that the email has been successfully configured.
8. Click OK to exit it. We can confirm our email configuration by going to the User List.
Right click on DEFAULT under Server Connections in the Server Explorer and go to User List.
9. This opens the User List where you can confirm that the email address has been configured with the admin user.
The feature is now configured and can be used whenever needed by clicking Forgot Password in the login window.
This opens the Password Reset window, where you can enter the OTP sent to the specified email for the user and proceed to reset your password.
This concludes our discussion on verifying admin email in Astera.
The ETL and ELT functionality of Astera Data Stack is represented by Dataflows. When you open a new Dataflow, you’re provided with an empty canvas known as the dataflow designer. This is accompanied by a Toolbox that contains an extensive variety of objects, including Sources, Destinations, Transformations, and more.
Using the Toolbox objects and the user-friendly drag-and-drop interface, you can design ETL pipelines from scratch on the Dataflow designer.
The Dataflow Toolbar also consists of various options.
These include:
Undo/Redo: The Dataflow designer supports unlimited Undo and Redo capability. You can quickly Undo/Redo the last action done, or Undo/Redo several actions at once.
Auto Layout Diagram: The Auto Layout feature allows you to arrange objects on the designer, improving its visual representation.
Zoom (%): The Zoom feature helps you adjust the display size of the designer. Additionally, you can select a custom zoom percentage by clicking on the Zoom % input box and typing in your desired value.
Auto-Size All: The Auto-Size All feature resizes all the objects so that all fields of the expanded nodes are visible and the empty area inside each object is cropped out.
Expand All: The Expand All feature expands or enlarges the objects on the designer, improving the visual representation.
Collapse All: The Collapse All feature closes or collapses the objects on the designer, improving the visual representation and reducing clutter.
Use Orthogonal Links: The Use Orthogonal Links feature replaces the straight links between objects with orthogonal (right-angled) links.
Data Quality Mode: Data Quality Mode in Astera enhances Dataflows with advanced profiling and debugging by adding a Messages node to objects. This node captures statistical information such as TotalCount, ErrorCount, and WarningCount.
Safe Mode: The Safe Mode option allows you to study and debug your Dataflows in cases when access to source files or databases is not available. You can open a Dataflow/Subflow and then proceed to debug or understand it after activating Safe Mode.
Show Diagram Overview: This feature opens a Diagram Overview panel, allowing you to get an overview of the whole Dataflow designer.
Link Actions to Create Maps Using AI: The AI Auto-mapper semantically maps fields between different data layouts, automatically linking related fields, for example, "Country" to "Nation."
In the next sections, we will go over the object-wise documentation for the various Sources, Destinations, Transformations, and other objects in the Dataflow Toolbox.
Silent installation refers to the installation of software or applications on a computer system without requiring any user interaction or input. In a silent installation, the installation process occurs in the background, without displaying any user interfaces, prompts, or dialog boxes that would normally require the user to make choices or provide information. This type of installation is particularly useful in scenarios where an administrator or IT professional needs to deploy software across multiple computers or systems efficiently and consistently.
Obtain the installer file you want to install silently. This could be an executable (.exe), Microsoft Installer (MSI), or any other installer format.
For example, in this article we will be using the ReportMiner.exe file to perform the silent installation.
To initiate the silent installation, you'll need to use a command-line interface. Open the Command Prompt as an administrator.
To achieve this, first search for “Command Prompt” in the Windows search bar then right-click the Command Prompt app, and select Run as administrator from the context menu. This will launch the Command Prompt with administrative privileges.
Locate the installation file and open its location in Windows Explorer. Once you have located the file in Windows Explorer, the full path will be displayed in the address bar at the top of the window. The address bar shows the complete path from the drive letter to the file's location.
For example, this file is located at "C:\Users\muhammad.hasham\Desktop\Silent Installation Files" as evident with the full path displayed in the address bar.
Alternatively, you can also right-click the file and select Properties from the context menu. In the Properties Window, you'll find a Location field that displays the full path to the file.
To install the file silently, change the Command Prompt's current location to the folder containing the installer. To do so, enter the following command in the Command Prompt:
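The command takes the general form below, where the folder path is a placeholder for your own installer location:

```
cd "<path\to\installer\folder>"
```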
For example:
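Using the folder shown earlier, the command would be:

```
cd "C:\Users\muhammad.hasham\Desktop\Silent Installation Files"
```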
Use the appropriate command to run the installer in silent mode. This command might involve specifying command-line switches that suppress dialogs and prompts.
General File:
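The original command is not reproduced in this copy of the document. As an illustration, for an .exe installer the general pattern is the installer name followed by its silent switches; the switches shown below (/s /v"/qn") are typical of InstallShield-packaged installers and are an assumption here, so confirm the exact switches for your installer:

```
<InstallerName>.exe /s /v"/qn"
```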
Example:
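Continuing the assumption above, a silent installation of the ReportMiner installer might look like this:

```
ReportMiner.exe /s /v"/qn"
```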
General File:
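The second command form is also not preserved here. As one possibility, if the installer were distributed as an MSI package (the article notes MSI as a supported format), standard Windows Installer syntax would apply; this is generic msiexec usage, not an Astera-specific command:

```
msiexec /i "<InstallerName>.msi" /qn
```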
Example:
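For example, assuming a hypothetical ReportMiner.msi package:

```
msiexec /i "ReportMiner.msi" /qn
```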
Note: Specifying INSTALLDIR="path\to\install\files" during installation is entirely optional. If you provide this parameter, the software will be installed in the designated location. However, if you omit this parameter, the software will be installed by default in the "Program Files" folder on the C drive.
To run it in this manner:
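Assuming the same illustrative switches as above, passing a custom installation directory might look like the following; both the switches and the target path (C:\Astera\ReportMiner) are placeholders, not Astera-documented values:

```
ReportMiner.exe /s /v"/qn INSTALLDIR=C:\Astera\ReportMiner"
```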
The silent installation might take some time. Wait for the installation process to finish. Depending on the software, you might receive an output indicating the progress and success of the installation.
After the installation is complete, verify that the software is installed as expected. You might want to check the installation directory, program shortcuts, or any other relevant indicators.
Use the provided command in the Command Prompt to remove the silently installed file.
General File:
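The uninstall command depends on how the product was installed. For a product installed through Windows Installer, the generic pattern uses msiexec with the /x switch; this is standard msiexec usage shown as an assumption, not an Astera-documented command:

```
msiexec /x "<InstallerName>.msi" /qn
```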
Example:
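For example, again assuming a hypothetical ReportMiner.msi package:

```
msiexec /x "ReportMiner.msi" /qn
```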
This concludes our discussion on Silent Installations.
The Fixed-Length File Source object in Astera provides a high-speed reader for files containing fixed length records. It supports files with record delimiters as well as files without record delimiters.
In this section, we will cover how to get Fixed Length File Source object on the dataflow designer from the Toolbox.
To get a Fixed Length File Source object from the Toolbox, go to Toolbox > Sources > Fixed Length File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Fixed Length File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Fixed Length File Source object, right-click on its header and select Properties from the context menu.
When you select the Properties option from the context menu, a dialog box will open.
This is where you configure the properties for Fixed Length File Source object.
The first step is to provide the File Path for the Fixed Length File Source object. By providing the File Path you are building the connectivity to the source dataset.
Note: In this case we are going to be using a fixed length file that contains Orders sample data. This file works with the following options:
File Contains Headers
Record Delimiter is specified as
The dialog box has some other configuration options:
If the source file contains a header and you want the Astera source layout to read headers from the source file, check the File Contains Header option.
If you want the file to be read in portions, for instance, your file has data over 1000 rows, upon selecting Partition File for Reading, Astera will read your file according to the specified Partition Count. For example, a file with 1000 rows, with the Partition Count specified as 2, will be read in two partitions of 500 rows each. This is a back-end process that makes data reading more efficient and helps in processing data faster. This will not have any effect on your output.
The Record Delimiter field allows you to select the delimiter for the records in the source file. The available choices are the carriage-return line-feed combination <CR><LF>, carriage-return <CR>, and line-feed <LF>. You can also type a record delimiter of your choice instead of choosing from the available options.
In case the records do not have a delimiter and you rely on knowing the size of a record, the number in the Record Length field is used to specify the character length for a single record.
The Encoding field allows you to choose the encoding scheme for the source file from a list of choices. The default value is Unicode (UTF-8).
Check the This is a COBOL data file option if you are working with COBOL files and do not have COBOL copybooks. You can still import this data by visually marking fields in the layout builder and specifying field data types. For more advanced parsing of COBOL files, you can use Astera’s COBOL File Source.
To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File option. Astera IDE provides extensive user interface capabilities for processing hierarchical structures.
Advanced File Options
In the Header spans over field, give the number of rows that your header takes. Refer to this option when your header spans over multiple rows.
Check the Enforce exact header match option if you want the header to be read as it is.
Check the Column order in file may be different from the layout option if the field order in your source file may differ from the field order in Astera’s layout.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
Check the Use SmartMatch with Synonym Dictionary option when the header values vary in the source layout and Astera’s layout. You can create a Synonym Dictionary file to store the values for alternate headers. You can also use the Synonym Dictionary file to facilitate automapping between objects on the flow diagram that use alternate names in field layouts.
To skip any unwanted rows at the beginning of your file, you can specify the number of records that you want to omit through the Skip initial records option.
Raw text filter
If you do not want to apply any filter and process all records, check the No filter. Process all records option.
If you only want to process records that begin with a specific value, check the Process if begins with option and specify that value in the provided field.
If you only want to process records that match a specific pattern, check the Process if matches this regular expression option and provide the regular expression in the provided field.
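For instance, with the Orders sample file, a hypothetical pattern that processes only the lines starting with a five-digit order number would be:

```
^\d{5}
```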
String Processing
String processing options are useful when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want them to be treated as null values in the database destination that you are writing to; otherwise, they will be written as empty strings.
Check the Trim strings option when you want to omit any extra spaces in the field value.
Once you have specified the data reading options on this window, click Next.
The next window is the Length Markers window, where you can place markers to specify the column boundaries in your data.
Using the Length Markers window, you can create the layout of your fixed-length file. To insert a field length marker, you can click in the window at any point. For example, if you want to set the length of a field to contain five characters and the field starts at five, then you need to click at the marker position nine.
Note: In this case we are using a fixed length file with Orders sample data.
If you point your cursor to where the data starts (in this case, next to OrderID) and double-click, Astera will automatically detect columns and place markers in your data. Blue lines will appear as markers on the detected columns.
You can modify the markers manually. To delete a marker, double-click on the column which has been marked.
In this case, we removed the second marker and instead added markers after CustomerID and EmployeeID.
In this way, you can add as many markers as there are columns/fields in the dataset.
You can also use the Build from Specs feature to help you build destination fields based on an existing file instead of manually specifying the layout.
After you have built the layout by inserting the field markers, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of your fixed length source file.
If you want to add a new field to your layout, go to the last, blank row of the layout (in the Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field you want to add and select the subsequent properties for it. A new field will be added to the source layout.
Note: Make sure to specify the length of the field that you have added in the properties of the field.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted row; a context menu will open with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Note: Modifying the layout (adding or deleting fields) from the Layout Builder in Astera will not make any changes to the actual source file. The layout is specific to Astera only.
The Layout Builder also provides other options for customizing the field attributes.
After you are done customizing the layout in the Layout Builder window, click Next. You will be taken to a new window, Config Parameters. Here, you can define parameters for the Fixed Length File Source.
Parameters simplify the deployment of flows by eliminating hardcoded values and provide an easy way of changing multiple configurations with a single value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Once you have been through all configuration options, click OK.
The FixedLengthFileSource object is now configured.
The Fixed Length File Source object has now been modified from its previous configuration. The new object has all the modifications that we specified in the Layout Builder.
In this case, the modifications that we made were:
Separated the EmployeeID column from the OrderDate column.
Added the CustomerName column.
You have successfully configured your Fixed Length File Source object. The fields from the source object can now be mapped to other objects in a dataflow.
The Data Profile feature provides complete data field statistics – basic and detailed – containing information such as the data type, minimum/maximum values, data count, and error count. The statistics are collected for each of the selected fields at the time the dataflow runs.
In this document, we will learn how to create a Data Profile in Astera.
We want to collect statistics on these fields of data. For this purpose, we will use Astera’s Data Profile feature.
To get the Data Profile object from the Toolbox, go to Toolbox > Data Profiling > Data Profile. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Data Profile object onto the dataflow designer.
You can see that the Data Profile object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map the fields from the source object onto the profile object.
Note: A Data Profile object is designed to capture statistics for an entire field layout. For this reason, it should be linked to the main Output port of the object whose field statistics you wish to collect.
To configure the Data Profile object, right-click on its header and select Properties from the context menu.
A configuration window will open. The first screen you will see is the Layout Builder. This is where we can create or delete fields, change field names, and their data type.
Click Next. This is the Properties window.
Here we will provide the Profile File path to specify where the profile should be stored.
Specify the type of Field Statistics to be collected.
Field Statistics dropdown allows you to choose detail levels of statistics to collect. Select among the following detail levels:
Basic Statistics: This is the default mode. It captures the most common statistical measures for the field’s data type.
No Statistics: No statistics are captured by the Data Profile.
Detailed Statistics – Case Sensitive Comparison: Additional statistical measures are captured by the Data Profile, for example Mean, Mode, and Median, using case-sensitive comparison for strings.
Detailed Statistics – Case Insensitive Comparison: Additional statistics are captured by the Data Profile, using case-insensitive comparison for strings.
In this case, we are collecting Detailed Statistics – Case Sensitive Comparison.
Click OK.
A Job Progress window will open and show you the trace of the job.
Click on the Profile link provided in the Job Progress window and the profile will open in Astera. Expand the Profile node to see each field inside the object. Click on these fields to see the collected statistical values.
The Data Quality Rules object found in the Data Profiling section of the Toolbox is used to apply one or more conditions, called Data Quality Rules, against incoming records. Records that do not meet the data quality rule criteria will be assigned the ‘Error’ status and may be optionally excluded from processing by the downstream objects.
Data Quality Rules is a record-level component which means that it does not require the entire dataset to flow through it. In other words, you can map a single or a couple of fields to the Data Quality Rules component to set up quality validation criteria and the transformed records can be mapped further in the dataflow.
Let’s understand the application and usage of Data Quality Rules with the following example.
Here we have sample data of employees of a fictitious organization which we have retrieved using an Excel Workbook Source.
If we look at the preview of the Employee_Report dataset, the values in the SalariedFlag column specify whether an employee is salaried in terms of 0 and 1.
1 = the employee is salaried
0 = the employee is non-salaried and therefore is eligible for overtime.
We can apply data quality rules to these values and identify which employees are not salaried and therefore, are eligible for overtime. The Data Quality Rules object will process all records and those that do not match the criteria will be returned with an error. This means that in this example, the salaried employees with the salary flag ‘True’ will return an error, whereas the records of employees with the salary flag ‘False’ will pass the data quality rule.
To do this, drag the Data Quality Rules object from the Data Profiling section in the Toolbox and drop it onto the dataflow designer.
Now, map the SalariedFlag field to the Data Quality Rules object.
Right-click on the Data Quality Rules object and select Properties from the context menu.
This will open a new window. This is the Layout Builder, where you can see the ‘SalariedFlag’ field we have mapped from our source.
Click Next to proceed to the Data Quality Rules window.
Add a new rule using the add rule button. Once a new rule is added, the options on this window will activate and the rule will be added to the grid.
Let’s explore these options one by one:
Description: The Description field contains the name or description of the rule. By default, the rules are named Rule1, Rule2, and so on, depending on the number of rules you add. However, you can also rename the rules for better understanding and convenience.
In our case, since we want to set a data quality criterion to identify non-salaried employees, we can rename the rule as "NonSalariedEmployeesRule."
Attach rule to the field: This is a drop-down list using which you can attach a rule to a particular field. You can see that there is a root node named Data Quality Rules.
Listed within the Data Quality Rules node are the fields mapped to the Data Quality Rules object. Here we have only one field mapped to which we want to apply this rule. In case you want to apply a rule to the whole dataset, you can simply double-click on the Data Quality Rules root node and the rule will be applied to all fields mapped to the Data Quality Rules object.
In this case, we will map the rule to the SalariedFlag field.
Expression box: This is where you can type in the expression for your rule.
In this example, we want to validate records with the Salary Flag ‘False.’ To do this we will write the expression:
‘SalariedFlag = 0’ in the Expression field.
Observe that, simultaneously, Astera shows you a compile status of your expression below the expression box.
It says ‘Successful’ so we can click OK. Alternatively, it will give you an error if the expression is incorrect and you will have to correct the expression before clicking OK.
Show Message: We can also write a message to show up with the errors, which can also be written to the error log. Let’s write a message:
‘Salaried employees are not eligible for overtime.’
This message will help identify why a particular record was marked erroneous. And in case multiple rules are applied, the message will point out which rule was not qualified by a particular record.
Next, we have two checkboxes:
Active – to activate a rule.
Is Error – When this is checked, all records that return an error will not be written to a target, which means that only the records that have passed the data quality rule will flow further in the dataflow pipeline.
However, if we uncheck this option, the Warning checkbox will be checked automatically. The records that failed to match the rule will then be returned with a warning and will still be written to a target.
In this case, let’s keep the errors as errors by checking the Is Error box.
Now we have set up a data quality rule.
Now, let’s look at the preview. Right-click on the Data Quality Rules object and select Preview Output from the context menu.
You can see that the records that have matched the rule, the records with ‘False’ salary flag, have been validated. On the other hand, the records that failed to match the rule, the records with the ‘True’ flag, have returned an error, denoted by a red warning sign.
If you move the cursor over this warning sign, it will show the error message in the tooltip. This is especially useful in cases where you have applied more than one rule and you want to track which records have failed to match which rule or when you want to store the erroneous records in an error log.
So now that we have validated the records against our data quality rule, we can map it to a target which is a Delimited File Destination in this case. We will name this file ‘Employees eligible for overtime,’ so the records of employees with the ‘False’ salaried flag will be passing through the Data Quality Rules object and consequently be mapped to the destination file. Let’s do the mapping.
Now, if we open the Properties window of the destination file, you can see the option, Do Not Process Records With Errors on the last window. It is checked by default in all target formats in Astera. Therefore, when we run this dataflow, all records that have matched the data quality rule will be written to the destination file, whereas records that failed to match the rule and returned an error will be omitted.
A PDF Form Source object provides users with the functionality of extracting data from a fillable PDF document. A fillable PDF document consists of certain data points, or digital fields, which are editable by a user using any modern PDF viewer. Such documents are often used in place of official paper documents on the web. The PDF Form Source object detects these data points, extracts the written data, and creates corresponding fields for them.
In this article, we will explore how to make use of the PDF Form Source object in Astera to retrieve data.
Note: This is a Scholarship Application Form with fillable data fields for Personal Information, Contact Details, and Education Qualifications.
Select the PDF Form Source object from the Toolbox and drag-and-drop it onto the dataflow designer.
Right-click on the PDF Form Source object’s header and select the Properties option from the context menu.
A configuration window will open, as shown below.
Provide the File Path for the fillable PDF document.
Owner Password: If the file is protected, enter the password configured by the owner of the fillable PDF document. If the file is not protected, this option can be left blank.
Use UTF-8 Encoding: Check this option if the file is UTF-8 (Unicode Transformation Format, 8-bit) encoded.
Click Next.
This is the Layout Builder window, where you can see the data fields extracted from the fillable PDF document. Click Next.
This is the Config Parameters window. Click Next.
This is the General Options window. Click OK.
Right-click on the PDF Form Source object’s header and select Preview Output from the context menu.
View the data through the Data Preview window.
The data is now available for mapping. For simplicity, we will delete the non-required data fields and store the output in a separate file. To store the data, we must write it to a destination file.
We are using a Delimited Destination object. Drag-and-drop the Delimited Destination object onto the dataflow designer and map the fields from the PDF Form Source object to the destination object.
Right-click on the fields that you do not want to store and select the Remove Element option.
Note:
Do not delete the data fields from the PDF Form Source object, as it will disturb the layout that has been generated for the detected data fields.
Simply double-click or right-click on the Delimited Destination object’s header and select the Properties option from the context menu. Specify the File Path where you want to store the destination file. Click OK.
To preview the data, right-click on the destination object’s header and select Preview Output from the context menu.
Here, you can see data of the selected fields.
This is how a PDF Form Source object is used in Astera Data Stack to extract data points/digital fields from fillable PDF documents.
The Data Model Query object in Astera Data Stack allows you to extract multiple tables from a deployed data model. This is especially useful when you’re writing data to a fact table via the Fact Loader object, since the fact table contains attributes from multiple source tables.
In this article, we’ll be looking at how you can configure the Data Model Query object and use it to extract data from a source model.
Let’s assume that we have the following source model.
In this example, we’ll extract all of these tables as a source via the Data Model Query object.
To get the Data Model Query object from the toolbox, go to Toolbox > Sources > Data Model Query.
Drag and drop the Data Model Query object onto the dataflow designer.
The object is currently empty because we are yet to configure it.
To configure the Data Model Query object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click on the object header.
A configuration window will pop up.
Using this window, you can configure the properties of the Data Model Query object.
On the Database Connection screen, you’ll notice that the Data Provider dropdown menu is limited to just one option: Astera Data Model. This option represents the data models that are deployed on the server and are available for usage.
Once you’ve provided your Astera Data Stack credentials and a server connection, you can select a deployed model from the Database dropdown menu.
Note: The default username is admin, and the default password is Admin123.
In this case, we’ll select DW_Source, which represents the source model that was shown earlier.
Once you’ve chosen a deployed model, click Next.
The Query Source Layout screen will appear.
On the Query Source Layout screen, you can select a root entity from a list of entities present in the source model, via the Root Entity dropdown menu.
The root entity serves as a starting point for a tree layout that includes all of the entities that you need to extract data from.
In this case, the root entity in the source data model is InvoiceLines.
Once you’ve chosen a root entity, a tree/hierarchical layout starting from the root entity will appear on the left side of the screen. You can expand the inner nodes to reveal the fields present in other entities of the source model.
Checking the Allow Collections option will enable collection nodes in the tree layout.
In the Where Clause textbox, you can add an optional SQL statement that will serve as a filter for incoming records.
Click OK once you’ve chosen a root entity. You’ve now configured the Data Model Query object. The tree layout, starting from the root entity, will be visible in the object.
The fields present in this layout can now be mapped further to other objects in the dataflow.
This concludes our discussion on the Data Model Query object.
The SQL Query Source object enables you to retrieve data from a database using an SQL query or a stored procedure. You can specify any valid SELECT statement or a stored procedure call as a query. In addition, you can parameterize your queries dynamically, thereby allowing you to change their values at runtime.
In this article, we will be looking at how you can configure the SQL Query Source object and use it to retrieve data in Astera Data Stack.
Before moving on to the actual configuration, we will have to get the SQL Query Source object from the Toolbox onto the dataflow designer. To do so, go to Toolbox > Sources > SQL Query Source. In case you are unable to view the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the SQL Query Source object onto the designer.
The source object is currently empty as we have not configured it yet.
To configure the SQL Query Source object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click the header of the source object.
A new window will pop up when you click on Properties in the context menu.
In this window, we will configure properties for the SQL Query Source object.
On this Database Connection window, enter information for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you want to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test was successful. Close this window by clicking OK, and then, click Next.
The next window will present a blank page for you to enter your required SQL query. Here, you can enter any valid SELECT statement or stored procedure to read data from the database you connected to in the previous step.
The curly brackets located on the right side of the window indicate that the use of parameters is supported, which implies that you can replace a regular value with one that is parameterized and can be changed during runtime.
In this example, we will be reading the Orders table from the Northwind database.
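A minimal query for this example could look like either of the following; any valid SELECT statement, including column lists and joins, is acceptable:

```sql
-- Read the entire Orders table.
SELECT * FROM Orders;

-- Or read selected columns only (illustrative).
SELECT OrderID, CustomerID, OrderDate FROM Orders;
```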
Once you have entered the SQL query, click Next.
The following window will allow you to check or uncheck certain options that may be utilized while processing the dataset, if needed.
When checked, the Trim Trailing Spaces option will refine the dataset by removing extra whitespace present after the last character in a line, up until the end of that line. This option is checked by default.
The Dynamic Layout option is unchecked by default. When checked, it will automatically enable two other sub-options.
Delete Field In Subsequent Objects: When checked, this option will delete all fields that are present in subsequent objects.
Add Fields In Subsequent Objects: When checked, this option will add fields that are present in the source object to subsequent objects.
Choose your desired options and click Next.
The next window is the Layout Builder. Here, you can modify the layout of the table that is being read from the database. However, these modifications will only persist within Astera and will not apply to the actual database table.
To delete a certain field, right-click on its serial column and select Delete from the context menu. In this example, we have deleted the OrderDate field.
To change the position of a field, click its serial column and use the Move up/Move down icons located in the toolbar of the window. In this example, we have moved up the EmployeeID field using the Move up icon, thus shifting the CustomerID field to the third row. You can move other fields up or down in a similar manner, allowing you to modify the entire order of the fields present in the table.
Once you are done customizing your layout, click Next.
In the Config Parameters window, you can define certain parameters for the SQL Query Source object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
Enter your desired values for these parameters, if any, and click Next.
Finally, a General Options window will appear. Here, you are provided with:
A text box to add Comments.
A set of General Options that have been disabled.
To conclude the configuration, click OK.
You have successfully configured the SQL Query Source object. The fields are now visible and can be mapped to other objects in the dataflow.
Apache Parquet is a columnar storage file format used by Hadoop ecosystem tools such as Pig, Spark, and Hive. The file format is language-independent and has a binary representation. Parquet is used to efficiently store large datasets and uses the .parquet extension.
The key features of Parquet with respect to Astera Data Stack are:
It offers compression, resulting in a smaller file size post-compression.
It encodes the data.
It stores data in a columnar layout.
In Astera Data Stack, you can use a Parquet file in which the cardinality of the data is maintained, i.e., all columns must have the same number of values.
Note: There should only be one row for each data field.
Drag and drop the Parquet File Source from the Sources section of the Toolbox onto the dataflow designer.
Right-click on the Parquet File Source object and select Properties from the context menu.
This will open a new window.
Let’s have a look at the options present here.
File Location
File Path: This is where you will provide the path to the .parquet file.
Data Load Option
If you wish to control memory consumption (at the cost of potentially increased read time), the Data Load option can be used.
Batch Size: This is where the size of each batch is defined.
Advanced File Processing: String Processing
Treat empty string as null value: Checking this will give a null value on every empty string.
Trim strings: Checking this box will trim the strings.
Once done, click Next and you will be led to the Layout Builder screen.
The layout will be automatically built. Otherwise, you can build it using the Build Layout from layout spec option at the top of the screen.
Once done, click Next and you will be taken to the Config Parameters screen.
This allows you to further configure and define dynamic parameters for the Parquet source file.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next and you will be taken to the General Options screen.
Here, you can add any comments that you wish to add.
Click OK and the Parquet File Source object will be configured.
You can now map these fields to other objects as part of the dataflow.
The following data types are supported for Parquet files in Astera Data Stack:
Integer
Time/Timestamp
Date
String
Float
Real
Decimal
Double
Byte Array
Guid
Base64
Integer96
Image
Hierarchy is not supported.
This concludes our discussion on the definition and configuration of the Parquet File Source object in Astera Data Stack.
A Report Model extracts data from an unstructured file into a structured format using extraction logic. It can be used through the Report Source object inside dataflows to leverage the advanced transformation features in Astera Data Stack.
In this section, we will cover how to get the Report Source object onto the dataflow designer from the Toolbox.
To get a Report Source object from the Toolbox, go to Toolbox > Sources > Report Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Report Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Report Source object, right-click on its header and select Properties from the context menu.
A configuration window for Report Source will open.
First, provide the File Path of the unstructured file (your report) for which you have created a Report Model.
Then, specify the File Path for the associated Report Model.
Click OK, and the fields added in the extraction model will appear inside the Report Source object with a sub-node, Items_Info, in our case.
Right-click on the Report Source object’s header and select Preview Output from the context menu.
A Data Preview window will open and show you the data extracted through the Report Model.
Astera Data Stack gives the user the ability to use a MongoDB Source object as part of the ETL pipeline. MongoDB is a NoSQL application data platform, available both self-hosted and as a fully managed cloud service, that provides storage and retrieval of data modeled in forms other than the tabular relations used in relational databases.
The MongoDB Source object can be configured in accordance with the user's application in Astera.
To start, drag and drop the MongoDB Source object from the Sources section of the toolbox onto the dataflow.
To configure the MongoDB Source object, right-click on it and select Properties from the context menu.
This will open the Properties screen.
User Name: The name of the user connecting to the MongoDB cluster.
Password: The password of the user connecting to the MongoDB cluster.
Primary Server Name: The address of the primary server cluster for connection.
Database: The database to be selected from the MongoDB server.
Authentication Database: The database used for authentication.
Enable set of replica: Allow the server to access the secondary cluster in case of unavailability of the primary server.
Use TLS: Check this option if you are using TLS authentication.
Secondary Server Name: The address of the secondary server cluster for connection.
Read Preference –
This drop-down menu allows the user to select which server to be given preference to while reading the data.
Primary: Data will only be read from the primary server.
Secondary: Data will only be fetched from the secondary server.
Primary Preferred: Preference will be given to the primary server, but if it is unavailable, data will be fetched from the secondary server.
Secondary Preferred: Preference will be given to the secondary server, but if it is unavailable, data will be fetched from the primary server.
Nearest: Preference will be given to the server closest to the connection in terms of region and IP.
Note: You can also select some advanced connection info through the option next to Test Connection.
Once the credentials have been filled, you can test the connection by selecting Test Connection.
Once done, click Next and you will be led to the MongoDB Collection screen.
Here, you can pick a collection that you wish to fetch the data from using the Pick Collection drop-down menu.
Once the collection is selected, the layout will be built.
There are three ways to generate the layout:
Astera auto-generates the layout based on the first 100 records by default.
The user can provide a JSON schema and Astera will generate the layout.
The user can manually create the layout.
Once the layout has been built, click Next and you will be led to the MongoDB Filter screen.
Here, you can provide a query to filter out your records based on some criteria.
Note: Functions cannot be applied to MongoDB fields on the filter screen, and records cannot be filtered based on a criterion that depends on array-type fields.
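Filters use standard MongoDB query syntax. For instance (the field names below are hypothetical and depend on your collection), the following filter fetches only documents where Country is "Mexico" and OrderTotal exceeds 100:

```
{ "Country": "Mexico", "OrderTotal": { "$gt": 100 } }
```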
Click Next and you will be taken to the MongoDB SortBy screen.
Here, you can set a limit to fetch a specified number of records, or provide a number to skip the first ‘n’ records.
You can also sort your data based on a single field or a collection of fields.
Note: We cannot sort the data based on array-type fields.
Click Next and you will be taken to the Config Parameters screen.
Parameters simplify the deployment of flows by eliminating hardcoded values and provide a dynamic way of changing multiple configurations with a single value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next and you will be faced with the General Options screen.
Here, you can add any comments that you wish to add.
Click OK and the MongoDB Source object will be configured.
The source data can now be further used in an ETL pipeline with transformation or destination objects.
This concludes our discussion on the MongoDB Source object in Astera Data Stack.
Note: To open the source file for editing in a new tab, click the icon next to the File Path input and select Edit File.
To generate the schema, click the icon next to the Schema File Path input and select Generate.
To edit an existing schema, click the icon next to the Schema File Path input and select Edit File. The schema will open for editing in a new tab.
Note: Astera makes it possible to generate an XSD file from the layout of the selected source XML file. This feature is useful when you don’t have the XSD file available. Note that all fields are assigned the data type of String in the generated schema. To use this feature, expand the control and select Generate.
To open the Data Formats window, click the icon located in the Toolbar at the top of the designer.
In this case, we will use data from a Customers dataset.
After configuring the settings for the Data Profile object, click on the Start Dataflow icon from the toolbar located at the top of the window.
Here, we will set the rules, or data quality criteria. Click the add rule button to add a new rule.
Alternatively, you can use the expression builder button to open the Expression Builder window, where you can choose an expression from Astera's library of built-in expressions or write one of your own.
You can add as many rules as you want by clicking the add rule button; similarly, you can delete a rule by pointing to it in the grid and selecting right-click > Delete. In this example, we will work with the single rule that has been set, so let’s go ahead and click OK.
The records that fail to match the data quality rule can be written and stored in a separate error log. Refer to the Record Level Log documentation to learn how you can store erroneous records using a Record Level Log object.
You can also delete the data fields in the destination file by using the Layout Builder, or map only the relevant fields onto the nodes of the destination object. Refer to the Delimited File Destination documentation to learn more about that object.
| Column Name | Description |
|---|---|
| Alternate Header | Assigns an alternate header value to the field. |
| Data Type | Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean. |
| Allows Null | Controls whether the field allows blank or NULL values in it. |
| Output | The output checkbox allows you to choose whether or not you want to enable data from a particular field to flow through further in the dataflow pipeline. |
| Calculation | Defines functions through expressions for any field in your data. |
| Attribute | Description |
|---|---|
| Name | The system pre-fills this item for you based on the field header. Field names do not allow spaces. Field names are used to refer to the fields in the Expression Builder or tools where a field is used in a calculation formula. |
| Header | Represents the field name specified in the header row of the file. Field headers may contain spaces. |
| Data Type | Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean. |
| Format | Specifies the format of the values stored in that field, depending on the field’s data type. For example, for dates you can choose between DD-MM-YY, YYYY-MM-DD, or other available formats. |
| Start Position | Specifies the position of the field’s first character relative to the beginning of the record. Note: This option is only available for the fixed length layout type. |
| Length | Specifies the maximum number of characters allotted for a value in the field. The actual value may be shorter than what is allowed by the Length attribute. Note: This option is only available for fixed length and database layout types. |
| Column Name | Specifies the column name of the database table. Note: This option is only available in database layout. |
| DB Type | Specifies the database-specific data type that the system assigns to the field based on the field's data. Each database (Oracle, SQL, Sybase, etc.) has its own DB types. For example, Long is only available in Oracle for the data type String. Note: This option is only available in database layout. |
| Decimal Places | Specifies the number of decimal places for a data type specified as Real. Note: This option is only available in database layout. |
| Allows Null | Controls whether the field allows blank or NULL values in it. |
| Default Value | Specifies the value that is assigned to the field in any one of the following cases: the source field does not have a value; the field is not found in the source layout; the destination field is not mapped to a source field. Note: This option is only available in destination layout. |
| Sequence | Represents the column order in the source file. You can change the column order of the data being imported by simply changing the number in the Sequence field. The other fields in the layout will then be reordered accordingly. |
| Description | Contains information about the field to help you remember its purpose. |
| Alignment | Specifies the positioning of the field’s value relative to the start position of the field. Available alignment modes are LEFT, CENTER, and RIGHT. Note: This option is only available for the fixed length layout type. |
| Primary Key | Denotes the primary key field (or part of a composite primary key) for the table. Note: This option is only available in database layout. |
| System Generated | Indicates that the field will be automatically assigned an increasing Integer number during the transfer. Note: This option is only available in database layout. |
| Layout Type | Attributes Available |
|---|---|
| Source Delimited file and Excel worksheet | Name, Header, Data type, Format |
| Source Fixed Length file | Name, Header, Data type, Format, Start position, Length |
| Source Database Table and SQL query | Column name, Name, Data type, DB type, Length, Decimal places, Allows null |
| Destination Delimited file and Excel worksheet | Name, Header, Data type, Format, Allows null, Default value |
| Destination Fixed Length file | Sequence, Name, Header, Description, Data type, Format, Start position, Length, Allows null, Default value, Alignment |
| Destination Database Table | Column name, Name, Data type, DB type, Length, Decimal places, Allows null, Primary key, System generated |
| Format | Sample Value |
|---|---|
| dd-MMM-yyyy | 12-Apr-2008 |
| yyyy-MM-dd | 2008-04-12 |
| dd-MM-yy | 12-04-08 |
| MM-dd-yyyy | 04-12-2008 |
| MM/dd/yyyy | 04/12/2008 |
| MM/dd/yy | 04/12/08 |
| dd-MMM-yy | 12-Apr-08 |
| M | April 12 |
| D | 12 April 2008 |
| mm-dd-yyyy hh:mm:ss tt | 04-12-2008 11:04:53 PM |
| M/d/yyyy hh:mm:ss tt | 4/12/2008 11:04:53 PM |
| Format | Sample Value |
|---|---|
| Y/N | Y/N |
| 1/0 | 1/0 |
| T/F | T/F |
| True/False | True/False |
| Format | Sample Value |
|---|---|
| ###### | 123456 |
| #### | 1234 |
| ####;0;(####) | -1234 |
| .##%;0;(.##%) | 123456789000% |
| .##%;(.##%) | 1234567800% |
| $###,###,###,### | $1,234,567,890,000 |
| $###,###,###,##0 | $1,234,567,890,000 |
| ###,### | 123450 |
| #,# | 1,000 |
| ##.00 | 35 |
| ###,###.## | 12,345.67 |
| ##.## | 12.34 |
| $###,###,###,### | $1,234,567,890,000 |
| $###,###,###,##0 | $1,234,567,890,000 |
| .##%;(.##%); | .1234567800% |
| .##%;0;(.##%) | .12345678900% |
"0" (Zero placeholder): If the value being formatted has a digit in the position where the '0' appears in the format string, then that digit is copied to the result string; otherwise, a '0' appears in the result string. The position of the leftmost '0' before the decimal point and the rightmost '0' after the decimal point determines the range of digits that are always present in the result string. The "00" specifier causes the value to be rounded to the nearest digit preceding the decimal, where rounding away from zero is always used. For example, formatting 34.5 with "00" would result in the value 35.
"#" (Digit placeholder): If the value being formatted has a digit in the position where the '#' appears in the format string, then that digit is copied to the result string. Otherwise, nothing is stored in that position in the result string. Note that this specifier never displays the '0' character if it is not a significant digit, even if '0' is the only digit in the string. It will display the '0' character if it is a significant digit in the number being displayed. The "##" format string causes the value to be rounded to the nearest digit preceding the decimal, where rounding away from zero is always used. For example, formatting 34.5 with "##" would result in the value 35.
"." (Decimal point): The first '.' character in the format string determines the location of the decimal separator in the formatted value; any additional '.' characters are ignored.
"," (Thousand separator and number scaling): The ',' character serves as both a thousand separator specifier and a number scaling specifier. Thousand separator specifier: if one or more ',' characters is specified between two digit placeholders (0 or #) that format the integral digits of a number, a group separator character is inserted between each number group in the integral part of the output. Number scaling specifier: if one or more ',' characters is specified immediately to the left of the explicit or implicit decimal point, the number to be formatted is divided by 1000 each time a number scaling specifier occurs. For example, if the string "0,," is used to format the number 100 million, the output is "100".
"%" (Percentage placeholder): The presence of a '%' character in a format string causes a number to be multiplied by 100 before it is formatted. The appropriate symbol is inserted in the number itself at the location where the '%' appears in the format string.
"E0", "E+0", "E-0", "e0", "e+0", "e-0" (Scientific notation): If any of the strings "E", "E+", "E-", "e", "e+", or "e-" are present in the format string and are followed immediately by at least one '0' character, then the number is formatted using scientific notation with an 'E' or 'e' inserted between the number and the exponent. The number of '0' characters following the scientific notation indicator determines the minimum number of digits to output for the exponent. The "E+" and "e+" formats indicate that a sign character (plus or minus) should always precede the exponent. The "E", "E-", "e", or "e-" formats indicate that a sign character should only precede negative exponents.
'ABC' or "ABC" (Literal string): Characters enclosed in single or double quotes are copied to the result string and do not affect formatting.
";" (Section separator): The ';' character is used to separate sections for positive, negative, and zero numbers in the format string. If there are two sections in the custom format string, the leftmost section defines the formatting of positive and zero numbers, while the rightmost section defines the formatting of negative numbers. If there are three sections, the leftmost section defines the formatting of positive numbers, the middle section defines the formatting of zero numbers, and the rightmost section defines the formatting of negative numbers.
Other (All other characters): Any other character is copied to the result string and does not affect formatting.
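For a rough, illustrative comparison only, the short Python snippet below reproduces a few of the numeric effects described above, namely rounding ties away from zero, inserting group separators, and scaling by 100 for percentages. Python's format mini-language is not the same as these custom format strings, so treat this purely as a sketch of the behavior.

```python
# Illustrative only: not the custom format-string engine described above, just plain
# Python calls that show the same numeric effects.
from decimal import Decimal, ROUND_HALF_UP

# "00" or "##" round to the nearest integer, with ties rounded away from zero: 34.5 -> 35
print(Decimal("34.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP))  # 35

# A ',' between digit placeholders inserts a group separator in the integral part
print(f"{1234567890000:,}")  # 1,234,567,890,000

# A '%' placeholder multiplies the value by 100 before formatting
print(f"{0.1234:.2%}")  # 12.34%
```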
The following columns are available:
Data Type: Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean.
Start Position: Specifies the position from which the column/field starts.
Length: Defines the length of a column/field.
Alignment: Specifies the alignment of the values in a column/field. The options provided are right, left, and center.
Allows Null: Controls whether the field allows blank or NULL values in it.
Expressions: Defines functions through expressions for any field in your data.
Transformations are used to perform a variety of operations on data as it moves through the dataflow pipeline. Astera Data Stack provides a full complement of built-in transformations, enabling you to build sophisticated data maps. Astera Data Stack transformations are divided into two types: record level and set level.
Record level transformations are used to create derived values by applying a lookup, function, or expression to fields from a single record. Examples of record level transformations include lookup, expression, and function transformations.
Set level transformations, on the other hand, operate on a group of records and may result in joining, reordering, elimination, or aggregation of records. Set transformations include join, sort, route, distinct, etc. Data sources and destinations are also considered set transformations.
In Astera, data records flow between set transformations. Record level transformations are used to transform or augment individual fields during these movements.
A record level transformation can be connected to only one set transformation. For instance, a lookup or expression cannot receive input from two different set transformations.
Apart from transformations that combine multiple data sources, such as Join, Merge, and Union, a transformation can receive input from only one set transformation. Transformations can, however, receive input from any number of record level transformations, as long as all of these record level transformations receive their input from the same set transformation.
The Expression transformation object in Astera applies an expression, or a piece of logic, to one or more incoming values. It can also return values that do not depend on any user-provided input data. Expressions can additionally be defined as variables and reused in other calculations.
The Expression transformation object uses an expression as the logic to transform data. You can write an expression of your own or use Astera's extensive library of built-in functions for string manipulation, data conversion, date and time manipulation, and more. You can also perform operations such as mathematical calculations and comparisons using the Expression transformation object.
In this example, we have a sample dataset, Customers, stored in an Excel file. The address information in this source is split into multiple fields such as Address, Region, City, Country, and PostalCode. We want to concatenate the information in these fields and return it as a full address in a new field. For this, we will use the Expression transformation object.
Retrieve your data in the dataflow by using one of the source objects from the Sources section in the Toolbox. In this example, we will work with an Excel Workbook Source.
Drag-and-drop the Expression transformation object from Toolbox > Transformations > Expression onto the designer.
Map the fields to be concatenated from the source object to the Expression transformation object. In this example, we have mapped the Address, Region, City, Country and PostalCode fields.
Now right-click on the Expression transformation object and select Properties from the context menu.
This will open the Layout Builder window where you can add or remove fields and modify your layout.
The following options are available in the Layout Builder window:
Name: This is where the field name is specified. You can change the name of existing fields if required.
Data Type: Specifies the data type of the mapped fields.
Input: When checked, the field will be mapped as an input, with an input mapping port, to take data input from a source.
Output: When checked, the field will be mapped as an output. If an expression is present, the expression will be applied to this output.
Variable: Turns the field into a variable that can then be used in other fields. Variable expressions are calculated first, and their values are then available to the other expressions that use them. Once a field is marked as a Variable, it cannot be used for input or output mapping.
Expression: This is where the expression used for modifying the field or group of fields is specified.
Since we want to write the address details from multiple fields into a single field, let's create a new field named Full_Address, specify its Data Type as String, and check the Output option. Then, click the ellipsis button next to the Expression box for this field to open the Expression Builder window.
You will find the following options in the Expression Builder window:
Functions: An extensive library of built-in functions from where you can select any function according to your requirement.
Expressions: Here, you can write an expression rule or choose one from the built-in functions in Astera.
Objects: In this panel, you can find all the fields in your layout listed under the Expression node. You can double click on any field name to map it to your expression.
In this example, we can either use a concatenate function from the built-in functions or write an expression of our own to return the complete address information in a single field.
Address + ' ' + Region + ' ' + City + ' ' + Country + ' ' + PostalCode
Note: Once you’ve built the expression, click on the Compile button to check whether or not the expression was compiled successfully. If the Compile Status is ‘Successful’, click OK.
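For reference, the same concatenation logic can be sketched outside Astera in plain Python. The record below uses the field names from this example with made-up sample values; note that, like the expression above, it inserts a separator space even when a field is empty.

```python
# Minimal sketch of the Full_Address expression: join the address parts with spaces.
record = {
    "Address": "Obere Str. 57",
    "Region": "",
    "City": "Berlin",
    "Country": "Germany",
    "PostalCode": "12209",
}

full_address = " ".join(
    str(record[field]) for field in ("Address", "Region", "City", "Country", "PostalCode")
)
print(full_address)  # Obere Str. 57  Berlin Germany 12209
```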
You can now see your expression appear in the Expression field. Click Next.
A General Options screen will now open where you can add Comments and specify other General Options. Once you’re through with these general settings, click OK.
General Options screen: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example, a destination file that will capture record messages or a log that will capture messages and collect statistics as well.
In the Comments input section, you can write comments related to the object.
To preview the output, right-click on the Expression transformation object and select Preview Output from the context menu.
Here’s a preview of the concatenated output:
You can now map your output to a destination or another transformation in the dataflow. In this example, we will be writing our output to a Delimited File Destination.
You may rename your Destination object from the context menu options for this object. Here, we will rename it as Full_Address.
This concludes using the Expression transformation object in Astera.
The Aggregate transformation object provides the functionality to create aggregations of your dataset, using aggregate functions such as Sum, Count, First, Last, Min, Max, Average, Var or Standard Deviation. The dataset can be split into groups so that the aggregate value(s) can be generated for the group instead of the whole dataset. For example, calculate product count by month of year, or get average sales price by region and year.
Aggregate Transformation can be applied to unsorted data or data sorted on group by values. When applied to an input stream that is sorted on group by fields, Aggregate Transformation performs substantially better and consumes very little memory. Alternatively, when applied to unsorted datasets, Aggregate Transformation may consume substantial memory resources for large data sets and may slow down the performance of the server.
In this scenario, we have product data stored in a CSV file. The source file contains information such as the ProductID, SupplierID, UnitPrice, and QuantityPerUnit of various products. We want to derive the following information from our source data:
Number of products per category
Total price of all the products per category
Minimum price per category
Maximum price per category
We will use the Aggregate Transformation object to derive the required information.
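Before configuring the object, here is a minimal Python sketch, using a few made-up rows, of the aggregations we are about to set up: a Count, Sum, Max, and Min of UnitPrice grouped by CategoryID.

```python
# Minimal sketch of the aggregations in this example: group by CategoryID, then
# compute Count, Sum, Max, and Min of UnitPrice per group. Rows are made up.
from collections import defaultdict

products = [
    {"CategoryID": 1, "ProductID": 1, "UnitPrice": 18.0},
    {"CategoryID": 1, "ProductID": 2, "UnitPrice": 19.0},
    {"CategoryID": 2, "ProductID": 3, "UnitPrice": 10.0},
]

prices_by_category = defaultdict(list)
for row in products:
    prices_by_category[row["CategoryID"]].append(row["UnitPrice"])

for category_id, prices in prices_by_category.items():
    print(
        category_id,
        len(prices),   # number of products per category (Count)
        sum(prices),   # TotalPricePerCategory (Sum)
        max(prices),   # MaxPricePerCategory (Max)
        min(prices),   # MinPricePerCategory (Min)
    )
```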
From the sources section in the Toolbox, drag-and-drop a Delimited File Source object to the dataflow designer.
To work with the Aggregate Transformation, drag-and-drop the Aggregate Transformation object from Toolbox > Transformations > Aggregate.
Right-click on the transformation object and select Properties. The Layout builder window will now open.
Here, you can write names of fields that you want to map to the transformation object in the Name column and specify the relevant Aggregate Functions for them.
For this case:
CategoryID: We will select the Group-By option from the Aggregate Function drop-down list for this field as we want to group the records based on the available product categories.
ProductID: For this field we will select the Aggregate Function Count, in order to calculate the number of products per category.
UnitPrice: We will map this field thrice.
To calculate TotalPricePerCategory, select the Sum function.
To calculate MaxPricePerCategory, select the Max function.
To calculate MinPricePerCategory, select the Min function.
Click on Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted on the specified Group-By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
Click on Next. The Config Parameters window will now open, where you can further configure and define parameters for the Aggregate transformation.
Click Next. This is the General Options window. Click OK.
General Options Window:
This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages
When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors
When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
The Comments input allows you to enter comments associated with this object.
After you have configured the properties, click OK.
You will see the fields in the object that were added in the Layout Builder window.
Map the data fields from the source object to the transformation object. You can auto-map the entire dataset from the source to the transformation object, or only map selected fields that you want to work with. In this case, we will map CategoryID, ProductID and UnitPrice as those are the fields we want to find aggregations for.
Note: The UnitPrice field has been mapped three times, as these mappings will determine TotalPricePerCategory, MaxPricePerCategory, and MinPricePerCategory.
Right-click on the Aggregate transformation object and click Preview Output.
You will see that the specified Aggregate Functions have been applied.
The Distinct transformation object in Astera removes duplicate records from the incoming dataset. You can use all fields in the layout to identify duplicate records, or specify a subset of fields, also called key fields, whose combination of values will be used to filter out duplicates.
Consider a scenario where we have data coming in from an Excel Workbook Source and the dataset contains duplicate records. We want to filter out all the duplicate records from our source data and create a new dataset with distinct records. We can do this by using the Distinct transformation object in Astera Data Stack. To achieve this, we will specify the data fields containing duplicate values as Key Fields.
In order to add a separate node for duplicate records inside the Distinct transformation object, we will check the option: Add Duplicate Records. Then we will map both distinct and duplicate outputs to a Delimited File Destination.
Let’s see how to do that.
Drag-and-drop an Excel Workbook Source from the Toolbox to the dataflow as our source data is stored in an Excel file.
To apply the Distinct transformation to your source data, go to Toolbox > Transformations > Distinct and drag-and-drop the Distinct transformation object onto the dataflow. Then, map the fields from the source object by dragging the top node of the ExcelSource object onto the top node of the Distinct transformation object.
Now, right-click on the Distinct transformation object and select Properties. This will open the Layout Builder window where you can modify fields (add or remove fields) and the object layout.
Click Next. The Distinct Transformation Properties window will now open.
Data Ordering:
Data is Presorted on Key Fields: Select this option if the incoming data is already sorted based on defined key fields.
Sort Incoming Data: Select this option if your source data is unsorted and you want to sort it.
Work with Unsorted Data: When this option is selected, the Distinct transformation object will work with unsorted data.
In this window, the distinct function can be applied to the fields containing duplicate records by adding them under Key Field.
Note: In this case, we will specify the Name and Type fields as Key Fields.
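Conceptually, the Distinct object with Name and Type as Key Fields behaves like the following Python sketch, which keeps the first record for each key and collects later repeats separately. The sample rows and the Price field are made up.

```python
# Minimal sketch of Distinct with key fields (Name, Type): the first occurrence of a
# key is kept as distinct, later occurrences of the same key are set aside as duplicates.
records = [
    {"Name": "Chai", "Type": "Beverage", "Price": 18.0},
    {"Name": "Chai", "Type": "Beverage", "Price": 18.0},
    {"Name": "Chang", "Type": "Beverage", "Price": 19.0},
]

seen = set()
distinct, duplicates = [], []
for row in records:
    key = (row["Name"], row["Type"])
    if key in seen:
        duplicates.append(row)   # written separately when duplicates output is enabled
    else:
        seen.add(key)
        distinct.append(row)     # the distinct output

print(distinct)
print(duplicates)
```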
You can now write the Distinct output to a destination object. In this case, we will write our output into a Delimited destination object.
Right-click on Delimited Destination object and click Preview Output.
Your output will look like this:
To add duplicate records in your dataset check the Add Duplicates Output option in the Distinct Transformation Properties window.
When you check this option, three nodes will be added to the Distinct transformation object:
Input
Output_Distinct
Output_Duplicate
Note: When you check the Add Duplicate Records option, mappings from the source object to the Distinct transformation object will be removed.
Now, map the objects by dragging the top node of ExcelSource object to the Input node of the Distinct transformation object.
You can now write the Output_Distinct and Output_Duplicate nodes to two different destination objects. In this case we will write our output into a Delimited destination object.
Distinct output:
Duplicate output:
As evident, the duplicate records have been successfully separated from your source data.
The Passthru Transformation object creates a new dataset based on the elements that were passed to the transformation. This is useful for organizing datasets for better readability and grouping of values that are otherwise calculated over and over again (e.g. a Sequence Generator Transformation).
In this document, we will learn how to use the Passthru Transformation object in Astera.
In this case, we are using an XML/JSON File Source.
The source file contains customers’ information in the parent node and their order and shipping details in the collection/child node.
Preview the data by right-clicking on the source object's header and selecting Preview Output.
A Data Preview window will open, showing you the preview of the hierarchical data.
Now, we want to create a field in the parent node that contains the count of orders that arrived late for each customer and write these records to a destination file. This new field in the parent node will depend on two fields, RequiredDate and ShippedDate, that are already present in the collection/child node.
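The logic we want can be sketched in a few lines of Python, assuming an order counts as late when its ShippedDate falls after its RequiredDate; the customer record and dates below are made up.

```python
# Minimal sketch of the LateOrders field: count child orders shipped after their
# required date. The layout mirrors the parent (customer) and child (Orders) nodes.
from datetime import date

customer = {
    "CustomerID": "ALFKI",
    "Orders": [
        {"RequiredDate": date(2018, 1, 10), "ShippedDate": date(2018, 1, 12)},
        {"RequiredDate": date(2018, 1, 20), "ShippedDate": date(2018, 1, 18)},
    ],
}

customer["LateOrders"] = sum(
    1 for order in customer["Orders"]
    if order["ShippedDate"] > order["RequiredDate"]
)
print(customer["LateOrders"])  # 1
```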
For this purpose, we will use the Passthru Transformation object.
To get a Passthru Transformation object from the Toolbox, go to Toolbox > Transformations > Passthru and drag-and-drop the Passthru object onto the designer.
You can see that the dragged transformation object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map source fields to the transformation object by dragging-and-dropping the top node of the source object, SampleData to the top node of transformation object, Passthru.
Now the mapping is done. Let’s configure the Passthru Transformation object.
Right-click on the transformation object’s header and select Properties from the context menu. A configuration window for the Passthru Transformation object will open. The first window is the Layout Builder window. This is where we can create or delete fields, change their name or data types, mark any field as a variable, or attach expressions to any field.
Create a new field under the parent node. Let’s call it LateOrders and mark it as an Output field.
Click Next, and you will be directed to a Layout Modifications window.
Select the LateOrders field. Here you can modify any field by applying expressions to it.
Write the expression that counts the number of late arriving orders. Click the "…" button located next to the Expression box to open the Expression Builder window, where you can make use of Astera's extensive library of built-in functions and expressions.
Click on the Orders node and a panel (Options for Collection) will appear. These options are only available for collection nodes. This is where we will specify the rule for routing late arriving orders. Select Route Based on Rules and a new section for adding rules will appear on the screen.
Add a new rule by clicking on the Add switch condition icon.
Now, write an expression to route late arriving orders and name this rule as “LateArrivals”.
Click OK. Now observe that a new collection node, Orders_LateArrivals, has been added to the Passthru Transformation object.
To preview data, right-click on the header of the transformation object and select Preview Output from the context menu.
A Data Preview window will open. On expanding records, you will get corresponding order details and appended details of late arriving orders.
To store the output for late arriving orders, write it to a destination file.
Right-click on the collection node, Orders_LateArrivals, and go to Write to > Excel Workbook Destination. An Excel Workbook Destination object will be added to the dataflow designer with auto-mapped fields from the collection field.
Configure settings for Excel Workbook Destination object.
Click on the Start Dataflow icon located in the toolbar at the top of the designer window to create the destination file.
Upon clicking this icon, an excel file will successfully be created. You can find the link to this file in the Job progress window.
Merge transformation in Astera is designed to merge data fragments from disparate sources, based on some predefined logic, and present it in a consolidated form to draw actionable insights.
Let's assume that there is an organization that maintains customer data in two different departments, Marketing and Sales. The Marketing department stores its information in a database table, while the Sales department maintains an Excel sheet for storing customer information. We want to merge the information from both sources so that we have consolidated data.
Drag-and-drop the relevant source objects from the Toolbox to the designer. (Click here to find how to set up sources.)
Note: In this case, the marketing department has the customer information stored in a database, whereas the sales department records customer information in an Excel file. Therefore, we will use a Database Table Source and an Excel Workbook Source as source objects.
Since the Merge transformation object merges data from a single source only, we will first combine both sets of records using the Union transformation object. We will then map fields from the data sources to the Union transformation object and add a new field, DataSource, to keep track of which information is coming from which source.
Drag the Merge transformation object from the transformations section in the Toolbox and drop it on the dataflow designer.
This is what a Merge transformation object looks like:
Map the Union transformation object’s output to the Merge transformation object.
Right-click on the Merge transformation object to set up transformation properties in the Layout Builder window. This is what the Layout Builder window looks like:
In the Layout builder window, specify the Primary Key. This is a common identifier that identifies similar records from various sources and merges the information against these records.
(Since we are consolidating different customer records, we will set up CustomerID as the Primary Key in this case.)
Next, you have to specify the field to be used as Version. If your data is coming from multiple sources, the Version field shows which source the data is coming from in the final merged output. In this case, we will use the Data Source field we added in the Union transformation object as the Version field.
Next, specify the Survivor Type for each field. Survivor Type allows you to choose the survivor values – the values you want to retain from your data sources – for each field. Survivor Types are set as First by default. However, depending on your case, you can choose the Survivor Type from the following options:
Since CustomerID, CompanyName, and ContactName records are common in both the source files (Customers_Marketing and Customers_Sales), we will set the Survivor Type as First for these fields. For the other fields with missing records, the Survivor Type will be set as follows:
Once you have set the Survivor Type, specify Precedence for each field. Precedence is the order in which you want the source data to be assessed. For instance, we have common data fields in both the sources, but different and missing records. We can set appropriate Precedence values to bring data from the desired data source.
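As a rough illustration of how survivor rules resolve values, the Python sketch below merges two records sharing a CustomerID, using "First" for fields present in both sources and "First If Not Null, Otherwise Second" for fields with gaps. The records and field values are made up.

```python
# Minimal sketch of survivor-value logic for one CustomerID present in both sources.
marketing = {"CustomerID": 1, "CompanyName": "Alfreds", "Address": None, "Region": "BE"}
sales = {"CustomerID": 1, "CompanyName": "Alfreds", "Address": "Obere Str. 57", "Region": None}

rules = {
    "CompanyName": "First",
    "Address": "First If Not Null, Otherwise Second",
    "Region": "First If Not Null, Otherwise Second",
}

merged = {"CustomerID": marketing["CustomerID"]}
for field, rule in rules.items():
    first, second = marketing[field], sales[field]
    if rule == "First":
        merged[field] = first
    else:
        merged[field] = first if first is not None else second

print(merged)  # {'CustomerID': 1, 'CompanyName': 'Alfreds', 'Address': 'Obere Str. 57', 'Region': 'BE'}
```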
Next, you can set a specific Condition, and the Merge transformation will process records based on the criteria specified for a particular field.
(In this case, we have specified ‘IsNotNull’ for Address and Region fields since we want none of these fields to be empty or have missing records.)
Depending on the requirements of the business case, you can add a logical expression in the Expression field to process the incoming data value and transform it into the output according to the logic defined. The Expression field can be used for mathematical and financial calculations, date and time manipulations, comparisons and conversion functions.
Click Next to proceed to the Merge Transformation Properties window. Here, you will see the following three checkboxes:
Case Sensitive – Check if data is to be assessed on a case-sensitive basis
Sort Input – Check if the incoming data is not already sorted
Version Order Descending – Check if you want the data to be sorted in a descending version order
Click Next to proceed to the General Options window. Here, you can add Comments, instructions, or any relevant information about the transformation. This will not change or alter your transformation action in any way.
You may also skip this step by clicking OK in the previous step (on the Merge Transformation window) to close the Transformation Properties window.
To get the output, right-click on the Merge transformation object, and click on Preview Output. You will get the merged records based on your specified transformation properties.
Data Preview before applying Merge transformation:
Data Preview after applying Merge transformation:
Merge transformations can be applied in cases where related data is scattered across different sources. Astera makes it extremely convenient for users to get consolidated data that is stored in different sources, while also allowing them the flexibility to choose how the output should appear through the various transformation properties.
The List Lookup transformation object is a type of lookup that stores its lookup information in the dataflow's metadata, which means that your lookup data is stored in the dataflow itself. List Lookup uses a list of values for both the input and output fields. You can use it to look up certain values in your source data and replace them with other desired information. Alternatively, you can define a list of values in the lookup grid in the object's properties, and each incoming value is then looked up in the grid when you run your dataflow.
Let’s see how this object functions in Astera.
In this example, we are working with a Customers Fixed-Length File Source that contains customer information for a fictitious organization. The Customers data contains information about customers belonging to different countries. We want to convert the country names in this data into CountryCodes by using the List Lookup transformation object.
Drag-and-drop a Fixed-Length File Source from the Sources section in the Toolbox to the designer window.
To preview the incoming data, right-click on the source object’s header and select Preview Output.
To start working, drag-and-drop the List Lookup object from Toolbox>Transformations>List Lookup.
This is what a List Lookup object looks like:
Map the field from the source dataset you want to look up values for, to the Value field in the List Lookup object.
Now, right-click on the List Lookup object and select Properties from the context menu. The List Lookup Map Properties window will now open.
Here, the first option is the Case Sensitive Lookup checkbox, which is checked by default. When this option is checked, the List Lookup will look up values on a case-sensitive basis. If you do not want to perform a case-sensitive lookup, you can uncheck this option.
Next, you can see that there is a table where we can specify the Source Value and the Destination Value. Source Values are the values from your source data, and Destination Values are the values you want to write in place of the source values.
For example, if we write the Destination Value as 'DE' against the Source Value 'Germany', Astera will write 'DE' in place of 'Germany' in the output.
This is one way of specifying the lookup values. However, there can be a lot of source values, and typing them manually can be a tedious task. There is a more efficient way of doing this in Astera.
If you right-click on the List Lookup object, you can see that there is an option called Fill Lookup List with Unique Input Values.
Selecting this option prepopulates the Source Value column with the unique values found in your source data.
Now, all you have to do is type in the Destination Values, that is, the codes corresponding to each country name.
Once you have populated the lookup list, click Next to proceed to the Lookup Options window.
In case the lookup list does not return a match for a given source value, one of the following options should be selected:
No Message – Will not mark the unmatched source value as an error or warning
Add Error – The List Lookup table will trigger an error for the records that found no match in the lookup field
Add Warning – The List Lookup will generate a warning and return a null value for records from the source that do not have any matches in the lookup table
Additionally, when the value is not found in the lookup list, you can choose from the following options to assign it a value:
Assign Source Value – Will return the original value from the source.
Assign Null – Will return a null value for each unmatched source record.
This Value – You can type in a specific value in the given field, and the List Lookup will return the same value for each unmatched source value.
In this example, we will add an error and return the source value if the lookup value is not found, so we will select the Add Error and Assign Source Value options. You can choose your preferred options and click OK.
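For illustration, the behavior chosen here, replacing matched country names with codes and passing unmatched source values through, can be sketched in Python as follows; the lookup entries are only examples.

```python
# Minimal sketch of the List Lookup with the Assign Source Value option: unmatched
# values fall back to the original source value.
country_codes = {"Germany": "DE", "France": "FR", "Mexico": "MX"}

for country in ["Germany", "France", "Atlantis"]:
    code = country_codes.get(country, country)  # no match: keep the source value
    print(country, code)
```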
Now, if we preview the output, you can see that for each country name from the source table, the List Lookup has returned a corresponding code value.
These CountryCodes will flow through the annotated output port if you want to write your data to a destination.
This is how we can map the lookup values to a target or a transformation in the dataflow using the output port.
This concludes using the List Lookup transformation object in Astera.
You can create many-to-one mappings with the help of the Denormalize transformation object in Astera. Denormalizing, also known as pivoting, allows you to combine a number of records into a single record (simply put, it brings data from rows to columns). It is useful for reducing the number of tables in the schema, which simplifies querying and can improve read performance.
The TaxInfo source data contains information about TaxType (City Tax, County Tax, State Tax, and Federal Tax), Tax Amount, and SSN (Social Security Number) of taxpayers.
We want to reduce the number of rows and create separate fields for City tax, County tax, State tax, and Federal tax.
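Conceptually, the reshaping we want looks like the following Python sketch: one output record per SSN, with the tax amounts spread into one column per TaxType. The rows and amounts are made up.

```python
# Minimal sketch of the pivot: rows keyed on SSN, TaxType values become columns.
rows = [
    {"SSN": "111-11-1111", "TaxType": "City", "TaxAmount": 100.0},
    {"SSN": "111-11-1111", "TaxType": "State", "TaxAmount": 250.0},
    {"SSN": "111-11-1111", "TaxType": "Federal", "TaxAmount": 900.0},
    {"SSN": "111-11-1111", "TaxType": "County", "TaxAmount": 50.0},
]

pivoted = {}
for row in rows:
    record = pivoted.setdefault(row["SSN"], {"SSN": row["SSN"]})
    record[row["TaxType"]] = row["TaxAmount"]

print(list(pivoted.values()))
```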
Let’s see how we can use the Denormalize transformation object to achieve this.
First, we will use the Sort object to sort our source data based on the key field, SSN in our case.
Drag-and-drop the Denormalize transformation object from the Transformations section in the Toolbox.
Right-click on the Denormalize transformation object and select Properties from the context menu.
Following are the properties available for the Denormalize transformation object:
Layout Builder Window:
The Layout Builder window is used to add and/or remove fields, as well as to select their data type. The fields added in the Layout Builder will show in the Output node inside the object, as well as in all Input nodes corresponding to the number of mapping groups created (see below), with the exception of the key field(s).
Denormalize (Many-to-One) Transformation Properties Window:
Select Keys: Using the Select Keys dropdown, select the field or fields that uniquely identify the record. These keys will be used to match records between the normalized source and the denormalized destination.
Sort Input: Check this option only if values in the matching field (or fields) are not already sorted.
Driver Field Value: Enter the pivot values for your Denormalize transformation object. Using the example below, the pivot values would be City, State, Federal, and County.
Note: Entering Driver Field Values is required prior to mapping the Denormalize object. For each entry in the Driver Field Value grid, a new input mapping group is created in the object box.
General Options Window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
The Comments input allows you to enter comments associated with this object.
After you have configured the properties, click OK.
An Input mapping node will be created for each value previously specified in the Driver Field Value grid.
Map the fields and preview the output to view the denormalized data.
Constant Value Transformation returns a single, predefined value for all records in a dataset.
In this example, we have an Excel worksheet containing employee data. The data is being written to a database table. The database table contains an additional field that stores department information. We want to pass a constant value to the Department field. To do this, we will use the Constant Value transformation object in Astera. The Constant Value transformation will be mapped to the Department field to append the name of the department to the final output.
Drag-and-drop an Excel Workbook source from the Sources section in the Toolbox.
Select a destination object from the Destinations section in the Toolbox. This is where the transformed data will be written and stored. We will use a Database Table Destination.
Map the Employees dataset from the source object to the destination table, Employees_Database.
Note: There is an additional field (Department) in the destination object, but there is no such field in the incoming dataset. To append the Department field to the final output, we will use the Constant Value transformation object.
Now, drag-and-drop the Constant Value transformation object from Toolbox > Transformations > Constant Value.
Right-click on the Constant Value transformation object and select Properties.
The Constant Value Map Properties window will now open. Here you will see a Constant Value section where you can write any value to be appended to your output dataset.
In this case, the value will be ‘Marketing’, to specify the department of the employees in the source dataset.
Click Next. A General Options window will now open. Click OK.
General Options window:
This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages
When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors
When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
Your transformation object will now look like this:
Now map the Value field from the Constant Value object to the Department field in the Employees_Database destination object.
Right-click on the Employees_Database destination object and click Preview Output.
Your final output will look like this:
The Department field in the destination has been successfully populated with the specified constant value in the final output, through the use of a Constant Value transformation object.
The Reconcile Transformation object in Astera enables users to identify and reconcile new, updated, or deleted information entries within the existing data source. It can be applied in a wide variety of business scenarios that require a user to identify changes in multiple data records and capture them efficiently to drive critical business decisions.
Consider an example where we have a sample data of complaints filed by customers regarding the products and services provided by a company. Assume that source file 1 contains details and status of complaints on January 1st, 2018, and source file 2 contains details and status of complaints on February 1st, 2018. We want to track the progress of the resolved complaints during that one month.
To do so, we will reconcile the information contained in the source data files and capture changes using the Reconcile Transformation object.
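The core idea can be sketched in Python: compare the two snapshots keyed on Complaint_ID and flag any field whose value changed between them. The complaint IDs and the Status field below are made up.

```python
# Minimal sketch of reconciliation: compare two snapshots by Complaint_ID and flag changes.
january = {101: {"Status": "Open"}, 102: {"Status": "Open"}}
february = {101: {"Status": "Closed"}, 102: {"Status": "Open"}}

for complaint_id in january.keys() & february.keys():
    for field, old_value in january[complaint_id].items():
        new_value = february[complaint_id][field]
        changed = old_value != new_value   # the "change flag" for this field
        print(complaint_id, field, old_value, new_value, changed)
```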
Drag-and-drop the Reconcile Transformation object from Toolbox> Transformations> Reconcile on the data flow designer.
This is what a Reconcile Transformation object looks like:
You can see the transformation object contains three child nodes (Output, Input_1, and Input_2) under the parent node, Reconcile.
Expand the input nodes to map fields from the source files.
Map the data fields from the source objects that you want to reconcile to the respective input node in the Reconcile Transformation object.
Right click on the Reconcile Transformation object’s header and select Properties.
This will open the Reconcile Transformation Properties window where you will see the following options:
Case Sensitive – Check this option if you want to derive a case-sensitive output
Sort Input 1 – Check this option if the incoming data from source 1 is not sorted
Sort Input 2 – Check this option if the incoming data from source 2 is not sorted
You can choose the Reconcile Output Type from the following options:
Side By Side Element With Change Flag – If you want values from both sources presented side by side, with a separate column presenting the reconciled output as a flag: true in case of an update, and false if the record remains unchanged.
Original Layout – If you want to get the reconciled output for each record and corresponding information in the reconciled field.
Original Layout With Changed Element Collection – Applies when working with hierarchical data, to reconcile the information contained in child nodes.
Once you have selected the preferred Output Type, you can specify the records to be shown in the output by applying the Record Filter and Inner Node Filter. You may choose one, multiple, or all of the available options by checking the corresponding boxes.
Click Next to proceed to the Layout Builder window. Here you will have to specify a Key. This will be the common identifier in both the source files that will identify and reconcile records. In this case, we want to reconcile the progress on complaints made against each complaint_ID; therefore, we will select Complaint_ID as our Key.
Now go to the Survivor Value drop-down list to set the Survivor Value for each data field. Survivor Values are the values from your source datasets which you want to retain in the output.
You may select from the following Survivor Value options:
Second – If you want to derive the output value from the second source
First – If you want to derive the output value from the first source
First If Not Null, Otherwise Second – If you want to output a value from the first source if the record is not null, otherwise from the second source.
Second If not Null, Otherwise First – If you want to output a value from the second source if it is not null, otherwise from the first source.
Higher – If the input values are integers, and you want to choose the higher value
Lower – If the input values are integers, and you want to select the lower value
Expression – If you want to derive the output value based on a formula expression
Note: You will only need to specify the Survivor Value if you want to get the Original Layout or Original Layout With Changed Element Collection as output. The Survivor Value option does not apply if you want to get Side by Side Element with Change Flag as your output, since both of the source values are retained when this option is selected.
Click Next to proceed to the General Options window, then click OK.
General Options window - This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages - When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors - When this option is checked, records with errors will not be output by the object. When this option is unchecked, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect their statistics.
Now, right-click on the Reconcile Transformation object’s header and select Preview Output to get the reconciled output.
You will get one of the following outputs according to the output type selected in the Reconcile Transformation Properties window.
Side by Side Element with Change Flag
Original Layout
Original Layout With Changed Element Collection
Reconcile Transformation objects can be applied in a variety of business cases, particularly those where monitoring the changes in assorted data records is crucial in driving critical business decisions. Here are some of the benefits and uses of the Reconcile Transformation object:
Reconciles data by deriving old and new values for specific fields in the source data
Allows users to choose from various layout options to reconcile changes in the most appropriate way
Works effectively with structured and unstructured (hierarchical) data formats
Offers the flexibility to select the information to be retained through different survivor value options
The Sort Transformation object in Astera is used to sort an incoming data stream. It also provides the option to remove duplicate values from the input.
It is a blocking transformation, which means that input records are accumulated until the end of the input. Blocking transformations affect the performance of the overall dataflow because subsequent steps cannot be executed until all the records have been received and processed by the blocking transformation.
The Sort Transformation uses storage on the server for temporary data during sorting. The server must have enough capacity to store the entire data set and index.
We have retrieved the OrderDetails data from a database table. The dataset contains fields such as OrderID, ProductID, UnitPrice, Quantity, and Discount. This data is unsorted and we want to sort it in the ascending order of UnitPrice.
Drag the Sort Transformation object from the Transformations section in the Toolbox and drop it on the dataflow designer.
Map fields from the source object to the Sort Transformation object.
To configure the properties of the Sort Transformation object, right-click on its header and select Properties from the context menu.
A Layout Builder window will appear.
In this window you can either:
Add Member Objects or Collection Objects to the layout.
Edit the elements of the Sort object. The Layout Builder allows you to add or remove fields in the layout, as well as select their data type. The fields added in the Layout Builder will be added to the Input node inside the object box. Once you’re done making changes to the layout, click Next.
The next window is the Sort Transformation Properties window.
Here, you can specify the sorting criteria. You will see the following options on this screen:
Return Distinct Values Only: Check this option if you want to remove duplicate values from the output.
Treat Null as the Lowest Value: Check this option if you want a null value to be returned first in the ascending sort order, and conversely, have the null value returned last in the descending sort order.
Case Sensitive: Check this option if you require case sensitive comparison for strings.
On the same screen, you need to select the sorting Field from the drop-down list and set the Sort Order as Ascending or Descending.
Note: In this case, the sorting Field is UnitPrice and the Sort Order is Ascending.
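For reference, sorting by UnitPrice in ascending order with nulls treated as the lowest value can be sketched in Python as shown below; the rows are made up and None stands in for a null UnitPrice.

```python
# Minimal sketch of the sort: ascending by UnitPrice, with null (None) treated as lowest.
orders = [
    {"OrderID": 1, "UnitPrice": 14.0},
    {"OrderID": 2, "UnitPrice": None},
    {"OrderID": 3, "UnitPrice": 9.8},
]

# Rows with None get a key starting with False, so they sort before all real prices.
orders.sort(key=lambda row: (row["UnitPrice"] is not None, row["UnitPrice"]))
print(orders)
```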
The last window is the General Options window. Here you can add Comments or specify some General Options. Once done, click OK and the window will close.
You can now map the Sort Transformation object to a destination and preview the output.
The output now shows the entire source data sorted in the ascending order of UnitPrice.
This is how the Sort Transformation can be used in Astera.
The Subflow transformation object is used to call a Subflow that will run as part of your Dataflow. The Subflow acts like a wrapper for the objects it contains. Subflows can be seen as ‘black boxes’ inside your Dataflow, simplifying and streamlining the Dataflow design, increasing reusability, achieving an easier-to-understand view of your Dataflow, and possibly eliminating the need to know what is going on inside the Subflow so that you can focus on the output it creates. Over time, if the logic inside your Subflow changes, you can modify the Subflow, and the modified Subflow can now be used by the Dataflow calling your Subflow.
Subflows can be nested, meaning that a Subflow can call other Subflows. The output of the Subflow can be fed into downstream objects on your Dataflow, just like the output of any Dataflow object.
To add a Subflow transformation object, drag the Subflow object from the Transformations section in the Toolbox and drop it on to the dataflow designer.
An example of what a Subflow object might look like is shown below.
To configure the properties of a Subflow object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Subflow Properties window:
Enter the file path of your subflow in the Path input. Using UNC paths is recommended to allow for remote execution of your dataflow on a server.
General Options Window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object, and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When unchecked, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect their statistics.
The Comments input allows you to enter comments associated with this object.
Creating a Subflow is similar to creating a regular Dataflow, because a Subflow is essentially a Dataflow or a sub-Dataflow. The difference between the two, however, is that a Subflow may optionally have an input and output.
The Subflow input makes it possible to feed data into the Subflow from an upstream object on the Dataflow that calls the Subflow. The Subflow’s output is used to send data to the downstream Dataflow object connected to the Subflow.
To create a new Subflow, go to File > New > Subflow on the main menu.
Designing a Subflow is similar to designing a Dataflow. For more information on working with Dataflows, see the Creating Dataflow chapter.
When a Subflow tab is active, the flow Toolbox has an additional group labeled Subflow. This group has two objects that control the input and output properties of your subflow.
Subflow Input object is a connector controlling the input layout of your Subflow. Any data feeding into the Subflow will pass through the Subflow Input when the Subflow is called by a Dataflow or another Subflow.
To add the Subflow Input, drag the Subflow Input object from the Subflow group in the Toolbox and drop it on to the Subflow designer.
To configure the properties of a Subflow Input object after it is added to the Subflow, right-click on it and select Properties from the context menu. The following properties are available:
Layout Builder screen:
The Meta Object Builder screen allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Meta Object Builder will show in the SubflowInput1 node inside the object box.
General Options screen:
This screen shares the options common to most objects on the dataflow.
Subflow Output object is a connector controlling the output layout of your subflow. Any data leaving the subflow will pass through the Subflow Output when the subflow is called by a dataflow or another subflow.
To add the subflow output, drag the Subflow Output object from the Subflow group in the Flow toolbox and drop it on to the subflow.
To configure the properties of a Subflow Output object after it was added to the Subflow, right-click on it and select Properties from the context menu. The following properties are available:
Layout Builder window:
The Meta Object Builder window allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Meta Object Builder will show in the SubflowOutput1 node inside the object box.
General Options window:
This screen shares the options common to most objects on the Dataflow.
Some examples of using Subflows are shown below:
Astera provides an array of source options to read and extract data from. Different source objects can be found in Toolbox > Sources.
For a list of supported sources and data providers, and for a detailed overview of the different source objects available in Astera's dataflows, see the relevant sections of this documentation.
Transformations in dataflows are used to perform a variety of operations on data as it moves through the dataflow pipeline. Astera provides an extensive library of built-in transformations enabling you to cleanse, convert, and transform data as per your business needs. Transformations can be found in Toolbox > Transformations. For a detailed review of transformations, see the relevant section of this documentation.
In this article, we will discuss:
How various sources in dataflows can be used as a transformation.
Some common scenarios where you could use a source as a transformation.
While the basic function of source objects in dataflows is to extract data and bring it to the designer for further integration, a source object can also be used as a transformation function.
To use a source object as a transformation, you will need to:
Select the relevant source object from Toolbox > Sources and drag-and-drop it onto the designer.
Right-click on the source header and select Transformation from the context menu.
As soon as the Transformation option is selected from the context menu, the header color of the source object will change from green to purple. This is because, by default, Source objects in Astera's dataflows are indicated by a green header and Transformation objects are indicated by a purple header. Hence, the change in color.
Listed below are the source objects that can be used as a transformation:
Note: Some sources in Astera cannot be used as transformations. These sources are: ADO.Net Metadata Collections, COBOL Source, SQL Query Source, Multi-table Query Source, and FTP List Directory Contents.
Generally, source objects are used as transformations when the source file path is dynamic.
In the next section of the article, we will discuss how to use a Delimited File Source object as a transformation.
A Delimited File Source object can be used as a transformation when it takes a dynamic file path, that is, when multiple files with the same layout need to be processed in a single dataflow or workflow.
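To make the idea concrete, the Python sketch below walks a folder of delimited files that share one layout and parses each with the same logic; the folder path is hypothetical.

```python
# Minimal sketch of a dynamic file path: every .csv file in the folder is parsed
# with the same layout. The folder path is a placeholder.
import csv
from pathlib import Path

source_folder = Path("C:/SampleData/Customers")

for file_path in sorted(source_folder.glob("*.csv")):
    with open(file_path, newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle):  # same field layout for every file
            print(file_path.name, record)
```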
Drag-and-drop the Delimited File Source object onto the designer.
Go to the object’s properties and provide the File Path for the delimited source file.
Once you have provided the File Path and configured the properties of the source object, click OK. Now, right-click on the header and select Transformation from the context menu, to change it to a Transformation object.
The header of the Delimited Source object will change to purple indicating that the source object is now converted into a Transformation object.
The transformed DelimitedSourceTrans object will now have two nodes:
Input node: To map the file path of the folder that contains delimited files that are to be processed.
Output node: On expanding the Output node, you will see the data fields in the delimited source file. Map these fields to other objects in a dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Delimited source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, point the path to the directory and folder where the delimited files are located.
Map the FullPath field from FileSystem to the DelimitedSource object’s input node (FilePath).
Now our Delimited Source Transformation object is ready. To preview the data, right-click on the DelimitedSourceTrans object and select Preview Output.
Once you select Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed preview format. Click on the + icon right next to the root node of the DelimitedSourceTran object to expand the node and preview your data.
You now have an expanded version of your data:
Root Node: Object Path – DelimitedSourceTran
Sub Node:
Input: Displays the path of the file that is being used as the input for this data.
Output: Displays the fields in the source data.
This is how you use a Delimited File Source as a transformation.
Next, we will see how to use the Excel Workbook Source object as a transformation.
The Excel Workbook Source can be used as a transformation when you have multiple Excel files with the same layout, and want to process them together in a dataflow or workflow.
Drag-and-drop the Excel Workbook Source object onto the designer.
Go to the object's properties and provide the File Path for the Excel source file.
Once you have configured the properties, right-click on the object's header and select Transformation from the context menu.
The header of the ExcelSource object will change to purple, indicating that the ExcelSource object has been converted into a transformation object.
The transformed ExcelSource object will now have two nodes:
Input node:
FilePath: To map the path of the folder that contains excel files that are to be processed.
Output node: On expanding this node, you will be able to see the data fields in the excel source file. Map these fields to other objects in the dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Excel source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the path of the directory and folder where the excel files are located.
Map the FullPath field from FileSystem to the ExcelSource object’s Input node (FilePath).
Map the Value field from a Constant Value object, containing the worksheet name, to the ExcelSource object's Input node (Worksheet).
Now our Excel Source transformation object is ready. To preview the data, right-click on the ExcelSourceTrans object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed preview format. Click on the + icon right next to the root node, ExcelSourceTran, to expand the node and preview your data.
You will see the following nodes:
Root Node: Object Path – ExcelSourceTran
Sub Node:
Input: Gives the file path of the file that is being used as the input for this data.
Output: Displays the fields in the source data.
This is how you use an Excel Workbook Source as a transformation.
Now we will discuss how to use an XML/JSON File Source as a transformation in Astera.
The XmlJson File Source object can be used as a transformation when you have multiple XML or JSON files with the same layout, and want to process them in a dataflow or a workflow.
Drag-and-drop the XML/JSON File Source object onto the designer.
Go to the object’s properties and provide the File Path for the XML source file and its schema.
The header of the XmlJsonSource object will change to purple indicating the conversion from a source object to a transformation object.
The transformed XmlJsonSource object will now have two nodes:
Input node: To map the file path of the folder that contains XmlJson files that are to be processed.
Output node: Once expanded, you will be able to see the data fields in the XmlJson source file. You can map these fields to other objects in a dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the XmlJson source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the Path of the directory and folder where the XML/JSON files are located.
Map the FullPath field from FileSystem to XmlJsonSource object’s Input node (FilePath).
Now our XmlJson source transformation object is ready. To preview the data, right-click on the XmlJsonSourceTrans object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data initially appears in its condensed form. To expand the data and preview your output, click on the + icon right next to the root node – XmlJsonSourceTran.
You now have an expanded version of your data:
Root Node: Object Path – XmlJsonSourceTran
Sub Node:
Input: Gives the file path of the file that is used as the input for this data.
Output: Displays the fields in the source data.
This is how you use an XmlJson File Source as a transformation.
In the next section of the article, we will discuss how to use Report Source as a transformation in dataflows.
The Report Source object can be used as a transformation when you have multiple report files with the same layout and want to process them in a dataflow or a workflow.
Drag-and-drop the Report Source object onto the designer.
Go to the properties and provide the File Path for the report source and its report model.
The header of the ReportSource object will change to purple, indicating the conversion from a source object to a transformation object.
The transformed ReportSource object will now have two nodes:
Input node: Map the file path of the folder that contains report files that are to be processed.
Output node: When expanded, you will be able to see the data fields that are in the report source file. You can map these fields to other objects in the dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Report source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the path of the directory and folder where the report files are located.
Map the FullPath field from FileSystem to the ReportModel object’s Input node (FilePath).
Now our Report Source Transformation object is ready. To preview the data, right-click on the report source object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data initially appears in its condensed form. To expand the data and preview your output, click on the + icon right next to the root node – ReportModelTran. Then, to further expand the data, click on the + icon right next to the sub node – Output.
You now have an expanded version of your data:
Root Node: Object Path – ReportModelTrans
Sub Node:
Input: Gives the file path of the file that is being used as the input for this data.
Output: On further expansion, it shows the fields and data contained in the report model.
This is how you use the Report Source object as a transformation object.
The Normalize transformation object in Astera Data Stack is used to create one-to-many mappings. It allows users to create multiple records from a single record by transposing the columns in a dataset into rows. In other words, you can take a dataset that has many columns and turn it into one that has many rows.
In this use case, we have a sample Taxpayers Excel dataset that contains information on the types and amounts of taxes paid by taxpayers. This includes taxpayers’ Social Security Number (SSN) and the different types of taxes that they have paid. These types are divided into different fields, such as City, County, State, and Federal, with each column containing the amount paid by each customer for a particular tax type. Our goal is to reduce the number of fields and increase the number of records by specifying the tax type in one consolidated field. To do this, we will use the Normalize object in Astera.
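To illustrate the one-to-many mapping before building the flow, consider a single source record and the normalized records it would produce. The values below are made up and only show the shape of the data:
Source record: SSN = 123-45-6789, City = 200, County = 150, State = 900, Federal = 2500
Normalized records (SSN, TaxType, TaxAmount):
123-45-6789, City, 200
123-45-6789, County, 150
123-45-6789, State, 900
123-45-6789, Federal, 2500
Each normalized record carries the SSN, a TaxType taken from the original column name, and the TaxAmount taken from that column’s value.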
Drag the relevant source object from the Toolbox and drop it onto the designer. In this case, we will select the Excel Workbook Source object from Toolbox > Sources > Excel Workbook Source and configure it so that it reads data from the Taxpayers’ dataset.
To preview the data, right-click on the object header and select Preview Output from the context menu. Here is a look at the dataset:
Drag the Normalize object from Toolbox > Transformations > Normalize and drop it onto the designer.
You will notice that the object contains one Output node and two Input nodes by default.
Any field mapped as a new member to one Input node will appear in all of the input nodes as well as the Output node. In this case, we will map the SSN field from the source object to an Input node.
Right-click on the header of the Normalize object and select Properties from the context menu.
A dialogue box will appear, which is used to configure the Normalize object.
In the Layout Builder window, create the layout of your normalized dataset by providing field names. In this case, we have already mapped SSN from the source and will create two new fields, one for the TaxAmount and the other for the TaxType.
Once you have created the layout, click Next.
In the Normalize (One to Many) Transformation Properties window, make appropriate selections for the following options:
Number of Mapping Groups: Here, you can specify the number of mapping groups that are required. Increasing this number from 2 will also increase the number of Input nodes in the object. In this case, there are four tax types. Hence, we will increase the number to 4.
Omit Record If this Element is Null: From this drop-down menu, you can select a field from your layout. If an element in this field is null, the entire record containing that null element will be omitted from the dataset. In this case, we will keep the default selection, which denotes that this option will not be applied to any field.
Once you have made the required selections, click Next.
On the last window, which is the General Options window, you will be provided with an empty text box for Comments. Moreover, you can also select a few options that are common to most objects in Astera.
Clear Incoming Record Messages: When this option is checked, any messages coming in from the preceding object will be cleared.
Do Not Overwrite Default Values with Nulls: When this option is checked, actual values are not overwritten with null values in the output.
In this case, we will leave the options unchecked. Once you are done, click OK.
Now that you have configured the Normalize object, you will notice that new input nodes have been added to the object based on our selection for the Number of Mapping Groups option. Each node contains the layout we specified in the Layout Builder window.
The next step is to make the required mappings from the source object to the Normalize object. These are the mappings needed for this particular use case:
Map SSN from the Excel Workbook Source object to SSN in all four input nodes of the Normalize object.
Map City to TaxAmount in the first input node, County to TaxAmount in the second input node, State to TaxAmount in the third input node, and Federal to TaxAmount in the fourth input node.
Map the City Field Name to TaxType in the first input node, the County Field Name to TaxType in the second input node, the State Field Name to TaxType in the third input node, and the Federal Field Name to TaxType in the fourth input node. To map field names, right-click on the mapping link, hover over Change Map Type, and select Field Name.
Here is what the final dataflow should look like:
Preview the output to have a look at the normalized dataset.
You can map these fields further to other objects in the dataflow using the output node of the Normalize object.
This concludes using the Normalize object in Astera.
The Sequence Generator Transformation object in Astera is used to add sequences of integer values to a dataflow. A sequence can start at any number and use any step; for example, a start value of 50 with a step of 5 produces 50, 55, 60, 65, and so on.
The Astera Data Stack can either create a sequence instantly at the dataflow’s run-time (this is called in-memory sequence), or it can read sequence control data from a database table as your dataflow is executed.
In the case of in-memory sequences, a sequence always starts at the Start Value provided in the SeqGenerator: Context Information Properties window. In the case of database sequences, the last value used is recorded in the control database, and a new start value is used every time the sequence is invoked.
This makes it possible to generate ever-increasing values for the sequence each time the dataflow runs. In effect, such a sequence is a chain of sequences with non-overlapping values.
Here, we have retrieved data from an Orders table using a Database Table Source object. We will use the Sequence Generator Transformation object to generate a sequence for the OrderNo field in our source data. Let’s see how this works.
Drag the Sequence Generator Transformation object from the Transformations section in the Toolbox and drop it onto the dataflow designer.
Map the required fields from the source object to a destination object.
To configure the properties of the Sequence Generator Transformation object, right-click on its header and select Properties from the context menu.
This will open the Context Information Properties window.
In this window, you can choose between three different types of sequence generations and specify the Sequence Details.
A description of these three methods is given below:
In Memory: The sequence will be created in memory at the dataflow run-time. The sequence always starts at the specified Start Value in the sequence properties.
Sequence Details:
Start Value – The initial value for the sequence
Step – The increment value
Database Table: The sequence control information for the database table can be managed within Astera through the Manage Sequences option.
Connection: Specify the connection to the database where the sequences will be stored
Sequence: Select the sequence from the list of available sequences in the database.
Note: To manage database sequences, go to Menu > Tools > Sequences.
Batch Size: Specifies the minimum number of values to be allocated to the sequence.
Use Memory Sequence during preview: Prevents the user from breaking the sequence cycle during a data preview by substituting with an in-memory sequence, which does not affect (i.e. increase) the database sequence’s current value.
Sequence Object: The sequence control information is read from a native sequence object in a SQL Server or Oracle database.
Connection: Specify the connection to the database that stores your sequences.
Sequence: Select the sequence from the list of available sequences.
Use Memory Sequence during previews: Prevents the user from breaking the sequence cycle during a data preview by substituting with an in-memory sequence.
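For reference, the kind of native sequence that the Sequence Object option reads from can be created in SQL Server with standard DDL. The statement below is only an illustration – the schema, name, and values are placeholders and are not something Astera requires you to run:
CREATE SEQUENCE dbo.OrderNoSequence START WITH 1 INCREMENT BY 1;
Because the current value of such a sequence is stored in the database, the generated values keep increasing across dataflow runs, as described above.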
Note: In this case we will use the In-Memory sequence generator option.
Let’s specify the Sequence Details as follows:
Start Value: 0
Step: 1
In the destination object, a new field will be created where the sequence generator value will be mapped.
Note: In this case, the OrderNo field has been created in the Excel Workbook Destination object.
The NextVal field will be mapped to the OrderNo field in the destination object.
You can see the output of the Excel Workbook Destination object in the Data Preview window.
The sequence has been generated in the new field, OrderNo.
This is how the Sequence Generator Transformation is used in Astera.
A Route transformation object invokes one or more paths in the dataflow, in accordance with some decision logic expressed as a set of Rules. Using the Route transformation object, you can create some custom logic involving multiple paths and adapt it to suit your data processing scenario.
For example, a record passing some rule will be written to Destination1, a record passing another rule will be written to Destination2, and a record which fails to pass any rules can still be written to a Destination, and be fed to a downstream transformation object.
To add a Route transformation object, drag the Route object from the Transformations section in the Toolbox, and drop it on the dataflow designer.
An example of what a Route object might look like is shown below:
To configure the properties of a Route object after it was added to the dataflow, right-click on the object’s header, and select Properties from the context menu. The following properties are available:
Layout Builder window:
The Layout Builder window allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Layout Builder will be added to the Input node inside the object box, as well as in all Rule nodes corresponding to the number of rules created (see below).
Route Transformation Properties window:
The Route Transformation Properties window provides the interface for managing Route rules.
Type a descriptive name for the rule in the Description field.
Click Compile to check for any syntax errors in your rule. The Compile Status should read “Successful” for a successful compilation.
To activate the rule, check the Active checkbox.
Note: Each Route rule you add here will have its own Rule node in the Route object box once you close the Properties window. You can then map it to downstream objects in the dataflow as needed.
Note: All Route transformation objects have a Default node. This node handles records not passing any rules defined in your Route transformation object.
General Options window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When it is unchecked, records with errors will be output by the object with a record message attached to each record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages as well as collect their statistics.
The Comments input allows you to enter comments associated with this object.
An example of using the Route transformation object is shown below.
After you have successfully installed Astera client and server applications, open the client application and you will see the Server Connection screen as pictured below.
Enter the Server URI and Port Number to establish the connection.
The server URI will be the IP address of the machine where Astera Integration server is installed.
Server URI: https://<IP_address>
Note: You can ask your network administrator for the IP address of the machine where the Astera Integration Server is installed. Alternatively, launch the command prompt and run the ipconfig command to get the IP configuration details for the machine, then use that information to provide the Server URI.
The default port for the secure connection between the client and the Astera Integration Server is 9262. For example, if the server machine’s IP address were 192.168.0.15, you would enter https://192.168.0.15 as the Server URI and 9262 as the Port Number.
If you have connected to any server recently, you can automatically connect to that server by selecting that server from the Recently Used drop-down list.
Click Connect after you have filled out the information required.
The client will now connect to the selected server. You should be able to see the server listed in the Server Explorer tree when the client application opens.
To open Server Explorer go to Server > Server Explorer or use the keyboard shortcut Ctrl + Alt + E.
The yellow icon with an exclamation mark means that the server is not configured. Before you can start working with the Astera client, you will have to create a repository and configure the server.
Existing Astera customers can upgrade to the latest version of Astera Data Stack by running an executable (.exe) utility that automates the repository upgrade to the latest release. This streamlined approach makes the upgrade process more efficient, ensuring a smoother transition for users.
Note: This upgrade applies to v10.0 and later. Earlier versions cannot be upgraded in place and will need a clean repository as part of the upgrade.
To start, download and run the latest server and client installers to upgrade the build.
Note: Depending on the build, the user can upgrade the client, the server, or both.
Run the Repository Upgrade Utility to upgrade the repository.
Note: If you do not perform this step, you might encounter an error.
Once run, you will be faced with the following prompt.
Click OK and the repository will be upgraded.
Once done, you will be able to view all jobs, schedules, and deployments that you previously worked with in the Job Monitor, Scheduler, and Deployment windows.
This concludes the working of the Repository Upgrade Utility in Astera Data Stack.
You can either write an expression directly in the Expression box in the Layout Builder window, or you can build one in the Expression Builder. To access the Expression Builder, click the icon next to the Expression box. An Expression Builder window will open.
Drag-and-drop the appropriate source objects and point them towards the files that you want to reconcile. In this example, we will be working with an
Note: In this case we will write the data to an .
Or you can expand the dropdown located in the main toolbar and select Subflow as shown below.
Once you have provided the file path and configured the properties of the object, click OK. Now right-click on the header and select Transformation from the context menu, to change it into a transformation object.
Worksheet: This option can be used when you have more than one worksheet in an Excel source file and want to use a particular worksheet in the dataflow or workflow. This can be done by specifying the worksheet name using a Constant Value object, which you can find in Toolbox > Transformations > Constant Value.
Once you’ve provided both paths and configured the source object, click OK. Now, right-click on the header and select Transformation from the context menu to change it into a transformation object.
Once you’ve provided both the paths and configured the properties of the object, click OK. Now right-click on the header and select Transformation from the context menu, to change it to a transformation object.
To learn how you can configure an Excel Workbook Source object, click here.
Note: We have the Orders table as our source from a . We will map the fields OrderDate, RequiredDate, ShippedDate, ShipVia and Freight to an object.
Click the icon to create a new rule.
In the Expression input, enter an expression for the rule, for example, LTV > 60 and LTV <= 80, or any rule or condition you want to apply to your data (a sketch of a possible rule set follows these steps). Alternatively, you can click the button to open the Expression Builder window – a tool that allows you to visually build your rule using the record tree and IntelliSense.
Add other Route rules if necessary. To delete an existing Route rule, select it and click the icon.
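For instance, a Route object that partitions records by loan-to-value ratio might use a rule set along these lines (the LTV field and the rule names are purely illustrative, following the example expression above):
LowLTV: LTV <= 60
MidLTV: LTV > 60 and LTV <= 80
HighLTV: LTV > 80
Records that do not satisfy any of these rules are passed to the Default node.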
The available Survivor Types are described below:
First – Returns data from the first data source for that field
Last – Returns data from the last data source for that field
Maximum – Returns the maximum value from all available input data sources
Minimum – Returns the minimum value from all available input data sources
Count – Returns the total count of all values that exist in the field
Sum – Aggregates the values that exist in that field in all the input sources and then returns the arithmetic sum of those values
Comma Separated Values – Separates the values that exist in that field in all the input sources with a comma and then returns that representation. This option is only available when the output field is assigned the 'String' Data Type.
Field – Survivor Type
ContactTitle – First
Address – First
City – First
Region – Last
PostalCode – First
Country – First
Phone – Last
Fax – Last
DataSource – Comma Separated Values
The File Lookup Transformation object in Astera is used to look up values coming from a source. It uses an Excel or delimited file that contains the lookup values as well as the output values to perform the lookup. A file lookup can be performed based on a single lookup field or a combination of fields.
Similarly, a File Lookup Transformation object can return a single output field from a lookup table or a combination of fields. In either case, the output field or fields are returned from the records in which the lookup values match the incoming values.
In this scenario, we have a Customers dataset from a fictitious organization stored in a database source. It contains information about customers from different countries. We want to replace the country names in our database with country codes by switching them with the lookup values (country codes) stored in an Excel file. To achieve this, we will use the File Lookup Transformation object.
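For illustration, the lookup file in this kind of scenario is typically a simple two-column sheet, for example (a hypothetical excerpt – your actual file and codes may differ):
Country, Code
Germany, DE
France, FR
Brazil, BR
Mexico, MX
The incoming Country values are matched against the Country column, and the corresponding Code values are returned as the output.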
Select the relevant source object from the Sources section in the Toolbox. In this example, we will use Customers data stored in a database table.
Right-click on the source object’s header and select Preview Output. You can see the country names in the Country field which we want to convert into country codes.
Drag-and-drop the File Lookup Transformation object from Toolbox > Transformations > File Lookup onto the dataflow designer.
Now, let’s configure the Transformation object.
Right-click on the header of File Lookup Transformation object and select Properties.
A File Lookup Map Properties window will open where you can see an option for Source File Type.
Source File Type: Here, you need to specify the type of your lookup file.
You can perform the task using an Excel or Delimited lookup file.
Select the Source File Type from the dropdown menu. In this example, our country codes are stored in an Excel file so we will specify the Source File Type as Excel.
Click Next to proceed to the File Lookup Details window. You will see two options:
File Location: Here, you need to specify the File Path to the lookup source file.
Options:
First Row Contains Header: You can check this option if your lookup file contains a header in the first row. Otherwise, you can leave it unchecked.
Worksheet: If your lookup file contains multiple worksheets, you can select the worksheet you want to use to perform the lookup.
Click Next to proceed to the Layout Builder. Here, you can make changes to the object’s layout by modifying the existing fields or creating new fields.
Once you are done, click Next.
On the next window, you will see various Lookup Options.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value if more than one match is found for a single value in your lookup file. The option expands into a drop-down list where you can select one of the following three options:
Return First: Will return the first matched value found.
Return Last: Will return the last value among all the matched values.
Return All: Will return all the values in the lookup file that match a source value.
If Value Is Not Found In The Lookup List:
In case no lookup value is found against a source value, you can choose one of the following three options to determine what message, if any, is appended to your output.
No Message: There will be no message and the output will be the same as the input value.
Add Error: An error message will appear with the output.
Add Warning: A warning will appear with the output.
If Value Is Not Found In The Lookup List, Assign Value:
If no lookup value is found against a source value, you can assign an output value of your choice.
Assign Source Value: Will return the source value in the output.
Assign Null: Will assign null to your output value.
This Value: You can select this option and assign any value of your choice.
Click Next. This will take you to the Config Parameters window, where you can further configure and define parameters for the File Lookup Transformation object.
Once you have configured the File Lookup Transformation object, click OK.
Map the Country field from source object to the Country field in the File Lookup Transformation object. Now map the Code field from the transformation object to the Country field in the Database Table Destination object.
This is what your dataflow will look like:
Map the remaining fields from the source object to the destination object.
Right-click on the destination object’s header and select Preview Output.
You can see that the country names in the database table have been successfully converted into country codes.
This concludes using the File Lookup Transformation object in Astera.
The Tree Transform object in Astera enables users to transform data in a hierarchical structure. Users can create new fields in the parent node based on the data in the collection/child node. Tree Transform supports rule-based routing, filtering, merging and sorting of data while maintaining its hierarchy.
In this document, we will learn to use the Tree Transform object in Astera.
In this case, we are using the XML/JSON File Source to extract our source data. You can download this sample data from here.
The source file contains customers’ information in the parent node and their orders and shipping details in the collection/child node.
You can preview this data by right-clicking on source object’s header > Preview Output.
A Data Preview window will open, displaying a preview of the hierarchical source data.
Now, we want to create a new field in the parent node that contains the count of orders that arrived late for each customer and route these records to a destination file while maintaining the hierarchical format of the dataset.
This new field in the parent node will depend on two fields: RequiredDate and ShippedDate, that are already present in the collection/child node.
In other words, we are trying to transform hierarchical data without flattening its structure. For this purpose, we will use the Tree Transform object.
To get a Tree Transform object from the Toolbox, go to Toolbox > Transformations > Tree Transform and drag-and-drop the Tree Transform object onto the designer.
The transformation object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map source fields onto the transformation by dragging and dropping the root node of the source object CustomerOrders onto the root node of the transformation object – TreeXform.
Now that the mapping is done, let’s configure the TreeXform object.
Right-click on the object’s header and select Properties from the context menu. A configuration window for Tree Transform will open. The first window is the Layout Builder window. This is where we can create or delete fields, change their name or data types, mark any field as a variable or attach expressions to fields.
Create a new field under the parent node. Let’s name it LateOrders.
Click Next, and you will be directed to a Layout Modifications window.
Select the LateOrders field. Here you can modify any field by applying expressions to it.
Write the expression that counts the number of late arriving orders.
Click “…” next to the expression box to open the Expression Builder where you can make use of the extensive library of built-in functions and expressions in Astera.
This creates the field values for LateOrders based on the field values of RequiredDate and ShippedDate fields.
Click on Orders node and a panel (Options for Collection) will appear. These options are only available for collection nodes.
Options for Collection:
Show – Shows the collection node in the Tree Transform object.
Flatten With Item Count – Flattens data based on the record count in the collection node against each item in the parent node.
Flatten Based on Rules – Flattens a part of hierarchical data based on predefined rules.
Route Based on Rules – Routes and creates subsets of data based on predefined rules.
Merge With Item Count – Merges data based on the record count in the collection node against each item in the parent node.
Hide – Hides collection node from the Tree Transform object.
Calculation Formula – An expression box used for writing rules to route or flatten hierarchical data.
Sort Based On Keys – Sorts hierarchical data based on the field in the collection node. Only available for Show and Flatten With Item Count options.
This is where we will specify the rule for routing late arriving orders. Select Route Based on Rules. A new section for adding rules will appear in the window.
Now, write an expression to route late arriving orders and name this rule “LateArrivals”.
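As an illustration of what such a rule could look like, an order can be treated as late when it shipped after its required date, which is a simple comparison of the two collection fields (shown here as a sketch only – the exact syntax may vary):
ShippedDate > RequiredDate
Orders satisfying this rule are routed into the LateArrivals subset.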
Click OK. Now observe that a new collection node Orders_LateArrivals has been added to the Tree Transform object.
To preview data, right-click on the header of the transformation object and select Preview Output from the context menu.
A Data Preview window will open. On expanding the records, you will get corresponding order details and appended details of late arriving orders.
To store the output for late arriving orders, you can write it to a destination file or use that data further in the dataflow.
This concludes using the Tree Transform transformation object in Astera.
The Tree Join Transformation object in Astera enables users to create complex, hierarchical data structures such as EDI or XML documents with ease. Unlike the standard relational join which combines left and right elements to create a new record, the Tree Join Transformation object allows users to create collection and member nodes. It also enables users to join datasets in parent-child hierarchies using a key field. It is a set level transformation that operates on a group of records.
In this document, we will learn to use the Tree Join Transformation object in Astera.
In this use case we have two different source datasets. The first data contains information about Customers and has fields such as CustomerName, CustomerID, Address, etc.
The second dataset contains details of Orders placed by customers. It includes fields such as OrderID, CustomerID, and order details such as RequiredDate, ShippedDate and other shipping details.
We will join these two datasets using the Tree Join Transformation object and create a hierarchical dataset in which all orders placed by a customer along with the order details are represented in a parent-child hierarchy.
Unlike the regular Join, which combines two datasets in a flat layout, the Tree Join Transformation joins data from two different datasets into a hierarchical data structure.
In this use case, each record from the first dataset that contains Customer details will be a parent node, and under the parent node, the details of Orders placed by that customer will be returned in a child node.
1. To get the Tree Join Transformation object from the Toolbox, go to Toolbox > Transformations > Tree Join and drag-and-drop the Tree Join Transformation object onto the designer.
2. Now map fields from the Customer source dataset to the TreeJoin object.
3. Right-click on the Tree Join object header and go to Properties from the context menu. In the Tree Join Layout Builder window, you can see the fields from the Customer dataset listed under the root node.
4. Next, click on the TreeJoin node; you will see that the small icons or buttons at the top of the screen become active. If you click on the icon, you will get two options:
Add Member Object – To add a new member node to your layout
Add a Collection Object – To add a new collection node under the parent node. It will return all corresponding records as a collection under the parent node.
In this case we will Add a Member Object to create a separate record for each order placed by a customer, under a separate customer record node.
5. Add a Member Object to this root node. A dialogue box will open to name your member object.
In this case, let’s name it ‘Orders’ and click OK. A member object has been added to our parent node.
6. Click OK, to close the properties window. Now map the Orders dataset to the member node that we created in the previous step to complete the layout.
7. Go to the properties of the TreeJoin object again. We have already created the layout, so we will proceed to the next window.
8. In the TreeJoin Transformation Properties window, we must specify the Join Key.
The join key is a common field or identifier in both datasets that is used to identify and join records in a tree-like structure. The parent and child fields refer to this same common field in the two source datasets, and it serves as the key identifier for joining records.
Parent Field – Join field from the first dataset.
Child Field – Same field as the parent field, selected from the second dataset.
In this case, the CustomerID field is common in both the datasets, so we will use it as the join Key.
9. Click on the Parent field dropdown button. Expand the TreeJoin node and select the CustomerID field.
10. Click on the Child field column and expand the TreeJoin root node. Scroll down to your member node, expand this node and select the CustomerID field from the second dataset.
Let’s discuss the other options on the properties window:
Join In Database – Lets you join the tables in the database itself rather than in-memory. However, it applies only when both the tables are sourced from the same database.
Case Sensitive – To process and join records on a case sensitive basis.
11. Now that the layout and the TreeJoin properties are ready, click OK.
12. Right-click on the TreeJoin object and select Preview Output.
The TreeJoin object has returned the customer records in parent nodes. Upon expanding the node, you can see the order placed by the customer listed as its member unit under the parent node.
If we choose to Add a Collection Object in the Layout Builder, all the records for orders placed by a customer will be returned in a collection under a single parent node for each customer.
13. The joined dataset can now be written to a desired destination. In this case we will write it to an XML File Destination object.
To know more about writing to an XML File Destination object, click here.
This concludes using the Tree Join Transformation object in Astera.
You can download the file for this use case from the following link:
The Union Transformation object in Astera is used to combine incoming data from two or more inputs into a single output. Its functionality is similar to the UNION operator in an SQL query. It has multiple input nodes and a single output node. It puts together two sets of data irrespective of any repetition that might occur in the datasets. In order to perform this transformation on two datasets, their cardinality must be the same.
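To illustrate the analogy, combining two departments’ customer records in SQL would look roughly like this (the column names are hypothetical; UNION ALL is used because, like the Union Transformation, it keeps duplicate records):
SELECT CustomerID, CustomerName, 'Marketing' AS Category FROM Customers_Marketing
UNION ALL
SELECT CustomerID, CustomerName, 'Sales' AS Category FROM Customers_Sales;
The Union Transformation object produces the same kind of result visually; the Category column in this sketch corresponds to the Category field added to the layout in the steps below.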
Note: An input node in a union transformation cannot receive data from two different set transformations.
In this example, we have customer data from two different departments, Sales and Marketing, stored in two separate Excel Workbook Source files. We want to combine this data into a single dataset using a Union Transformation object. To keep track of records coming in from each department, we will also add a new field, Category, to the layout of the Union Transformation object and pass the value using a Variables object.
To work with a Union Transformation object, drag-and-drop the Union Transformation object from Toolbox > Transformations > Union onto the dataflow designer.
Map the Customers_Marketing data to Input_1 and Customers_Sales data to Input_2 in the Union Transformation object.
Now, right-click on the Union Transformation object’s header and select Properties.
The first window is the Layout Builder window, where you can customize your layout or modify your fields. You can also provide a default value to be used in place of null values.
Add a new field, name it Category and specify its Data Type as String.
Click Next.
Next is the Union Transformation Properties window, where two input nodes, Input_1 and Input_2, are defined by default. You can rename them if you want, and you can define any number of input nodes based on the number of datasets you want to combine using the Union Transformation object.
Click OK.
Now, map the categories of respective departments from the Variables resource object to the Category field in the Union Transformation object. This is done to identify which department a particular record is coming from.
Now, we have successfully configured the Union Transformation object.
Right-click on the Union Transformation object’s header and select Preview Output.
You can see that the Union Transformation has successfully combined the two datasets into a single, unified dataset.
Note: A Union Transformation will show the combined set of fields from both sources in the result, regardless of whether a field is present in one or both datasets. In the final output, records from a dataset that does not have one or more of these fields will be assigned a null value in those fields.
You can now further transform your dataset or write it to a destination.
This concludes working with the Union Transformation object in Astera.
The Data Cleanse Transformation object is a new addition to Astera's library of transformations. It makes it all the more convenient for business users to cleanse raw data and present it in a more refined, standardized, and enterprise-ready format. Using the Data Cleanse Transformation object, users can clean up data from null values and redundant text and characters, and prepare raw data for transformation, validation, profiling, and record matching functions.
Retrieve the data you want to cleanse using the relevant Source object. (Click here to learn more about setting up Sources.)
Now drag the Data Cleanse Transformation object from the Transformations section in the Toolbox and drop it onto the designer.
This is what a Data Cleanse transformation object looks like.
Map data from the source object to the Data Cleanse Transformation object. You can either auto-map the entire data set or map a few fields manually.
Now you have to specify the criteria for data cleansing. Right-click on the Data Cleanse Transformation object and select Properties from the context menu.
This will open a new window where you have to set up the properties for data cleansing. The first window is the Layout Builder window. Here you can customize the layout of your dataset by adding, removing or renaming fields. Once you have created the layout, click Next to proceed to the next window.
This is where you set up the data cleanse criteria for your source data.
You can find various data cleanse options arranged in different sections. Let’s explore them one by one.
The options provided within this category allow you to remove values, spaces, tabs, and line breaks from your data. You can find the following options within this category:
All whitespaces – Removes all whitespaces from the data
Leading and trailing whitespaces – Removes whitespaces preceding and succeeding the values
Tabs and line breaks – Removes tabs and line breaks within source values
Duplicate whitespaces – Removes double spaces from the data
Letters – Removes all letters from the data
Digits – Removes all digits from the data
Punctuation – Removes all punctuation from the data
Specified Character – Removes any specific character from the data
As the name suggests, the options within this category allow you to replace null values inside a string or numeric field with a corresponding value – blank in case of a string, and zero, in case of a numeric field.
Null strings with blanks: Replaces all null strings with blanks
Null numerics with zeros: Replaces all null numeric values with zeros
The Find and Replace options enable users to replace a value in the source dataset with another value.
It also provides users the option to choose whether the find and replace function is to be performed on a case sensitive basis. You can select a search mode from three options:
Normal – Will perform a normal find and replace function
As in this example, we want to change the status from ‘Planned’ to ‘Scheduled.’
So, we will type in ‘Planned’ in the Find field and ‘Scheduled’ in the Replace field.
Now, if we look at the output, we can see that the Data Cleanse Transformation Object has found and replaced the status values from ‘Planned’ to ‘Scheduled.’
Extended – Allows you to search for tabs (\t), newlines (\r\n), or a character by its value (\o, \x, \b, \d, \t, \n, \r and \) and replace it with the desired value
In the example below, we want to replace whitespaces within our source values with a hyphen (-).
So, we will type ‘\s’ in the Find field and ‘-’ in the Replace field.
Now, if we look at the output, we can see that the Data Cleanse Transformation object has found and replaced whitespaces from within the values with a hyphen.
Preview before applying the “Extended Find and Replace” function.
Preview after applying the “Extended Find and Replace” function.
Regular Expressions – Allows you to find and replace a value based on a regular expression.
In the example below, we want to replace the “ALFKI” value(s) in the CustomerID field with “A1234”.
For this, we will write a regex in the Find field and the desired value in the Replace field.
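As a sketch (the exact pattern depends on your requirements), the two entries could be:
Find: ^ALFKI$
Replace: A1234
The ^ and $ anchors ensure that only values consisting entirely of ALFKI are replaced, rather than values that merely contain that text.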
Now, if we look at the preview, you can see that Astera has replaced values in the source data with the desired values.
Case options allow users to convert the letter case of source data to Upper, Lower, or Title case.
You can choose from the following options:
None – Keeps the letter case as is.
Upper – Changes all letters to upper case.
Lower – Changes all letters to lower case.
Title – Changes all letters to title case.
The Modify Data option provides you the flexibility and convenience of applying an expression to all fields in your data. Check the Run expression on all fields option to activate this feature.
The Run Expression on all fields feature was previously called ApplyToAll and was offered as a standalone transformation in Astera 7.5 and earlier releases. However, it had limited functionality compared to the all-new Data Cleanse Transformation object, which is why it has been replaced altogether by the Data Cleanse Transformation object in Astera 7.6 and now in Astera 8.0.
The Run Expression on all fields feature is enabled by default for any existing flows created prior to Astera 7.6. This means that existing flows created on Astera 7.5 or an earlier release will continue to work seamlessly on 8.0 or later upgrades and won’t require any modification at all.
Here, you can choose from the extensive library of built-in expressions and apply it to all mapped fields by adding it to a “$FieldValue” parameter.
As in this example, we have mapped a regular expression to the “$FieldValue” parameter.
Now if we look at the preview, you can see that Astera has applied the regular expression to all fields and removed whitespaces from the values.
Preview before running the expression on all fields:
Preview after running the expression on all fields:
This function was previously performed using the ApplyToAll transformation in Astera 7.5 and previous releases. However, now you can perform this and other data cleanse tasks using the Data Cleanse Transformation object.
The license key provided to you contains information about how many Astera clients can connect to a single server as well as the functionality available to the connected clients.
Note: You cannot use your existing set of keys (from version 6 or 7). If you are planning to migrate from version 7 (or earlier) to version 8, 9 or 10, please contact sales@astera.com, as you will need a new license key.
After you have configured the server, and logged in with the admin credentials, the last step is to insert your license key.
Go to Server > Configure > Step 4: Enter License Key.
On the License Management window, click on Unlock using a key.
Enter the details to unlock Astera – Name, Organization, and Product Key and select Unlock.
You’ll be shown the message that your license has been successfully activated.
Note: The connected client applications will shut down for the server license to take effect.
Your client is now activated. To check your license status, you can go to Tools > Manage Server License.
This opens a window containing information about your license.
This concludes unlocking Astera client and server applications using a single licensing key.
Each destination on the dataflow is represented as a destination object. You can have any number of destinations on the dataflow. Each destination can only receive data from a single source. To feed multiple sources into a destination, you need to connect them through a transformation object, for example, Merge or Union. For more information on transformations, see the Creating Transformations article.
The following destination types are supported by the dataflow engine:
Flat File Destinations:
Tree File Destinations:
Database Destinations:
All destinations can be added to the dataflow by grabbing a destination type from the Toolbox and dropping it on the dataflow. File destinations can also be added by dragging-and-dropping a file from an Explorer window while pressing the ‘Shift’ key. Database destinations can be dragged-and-dropped from the Data Source Browser while holding down the ‘Shift’ key. For more details on adding destinations, see the Introducing Dataflows article.
Adding a Delimited File Destination object allows you to write to a delimited file. An example of what a Delimited File Destination object looks like is shown below.
To configure the properties of a Delimited File Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a Fixed-Length File Destination object allows you to write to a fixed-length file. An example of what a Fixed-Length File Destination object looks like is shown below.
To configure the properties of a Fixed-Length File Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an Excel Workbook Destination object allows you to write to an Excel file. An example of what an Excel Workbook Destination object looks like is shown below.
To configure the properties of an Excel Workbook Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an XML/JSON File Destination object allows you to write to an XML file. An example of what an XML/JSON File Destination object looks like is shown below.
To configure the properties of an XML/JSON File Destination Object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
General Properties window:
File Path – Specifies the location of the destination XML file. Using UNC paths is recommended if running the dataflow on a server.
Note: To open an existing destination file for editing in a new tab, click the icon next to the File Path input, and select Edit File.
File Options:
Using the Encoding dropdown, select the appropriate encoding scheme for your destination file.
Check the Format XML Output checkbox to have line breaks inserted into the destination XML file for improved readability.
Schema Options:
Read From Schema File Specified Below – Specifies the location of the XSD file controlling the layout of the XML destination file.
Note: You can generate the schema based on the content of the destination XML file if the file already exists. The data types will be assigned based on the destination file’s content. Note that the existing destination file will be overwritten when the dataflow runs.
To generate the schema, click the icon next to the Schema File input, and select Generate.
To edit an existing schema, click the icon next to the Schema File input, and select Edit File. The schema will open for editing in a new tab.
Using the Root Element dropdown, select the node that should be the root of your destination schema. Any nodes up the tree will be excluded.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the destination. Using UNC paths is recommended.
Adding a Database Table Destination object allows you to write to a database table. An example of what a Database Table Destination object looks like is shown below.
Destination Connection screen – Allows you to enter the connection information for your destination, such as Server Name, Database and Schema, as well as credentials for connecting to the selected destination.
Pick Table window:
Database Transaction Management: Enable Transaction Management if you want to wrap your transfer inside a transaction. Depending on your database settings, this can give you performance improvements during the transfer. When Transaction Management is enabled, you should choose between always committing the transaction at the end of the transfer, or only committing it if there were no errors. Any errors would result in the entire transaction being rolled back.
Preserve System Generated Key Values: This option is only available if you have assigned at least one field in your destination layout as a System Generated field. If enabled, Astera will pass the incoming value from the source to the system generated field. Otherwise, the incoming source value will be ignored, and the system will write auto-increasing values to the destination System Generated field.
Data Load Options: Specify how your records are inserted into the destination database. The available types are Use Single Record Insert, Bulk Insert with Batch Size, and Bulk Insert with All Records in One Batch.
These types allow you to customize your transfer to balance performance vs. logging needs. Bulk inserts typically result in a better performance (faster transfer for a given number of records), but they also come with less logging, and less ability to undo unwanted inserts should you need to.
Use Single Record Insert: Records are inserted into a destination table one-by-one. Performance is the slowest among the three insert types. However, any errors or warnings during the transfer are displayed to you immediately as the transfer progresses.
Bulk Insert with All Records in One Batch: Typically a quick method of transferring large amounts of data. Keep in mind, however, that any database-specific errors in your transfer will not show until the end of the transfer, when the entire batch is written to the destination database.
Note: Not all database providers support this type of insert.
Bulk Insert with Batch Size: A good tradeoff between performance and logging needs. Records are inserted in batches of the specified size. Typically, larger batch sizes result in better transfer speeds; however, performance gains may diminish with relatively large batch sizes.
Note: Not all database providers support this type of insert.
Note: Bulk insert may not be available if there are certain data types in a destination table. In this case the transfer will proceed as “single insert”.
The SQL Statement Destination object offers extra flexibility over database destination objects by letting you apply custom INSERT or UPDATE SQL code that controls what will be written into the destination table. An example of what an SQL Statement Destination object looks like is shown below.
To configure the properties of an SQL Statement Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Database Connection window – Allows you to enter the connection information for your SQL Statement, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected database.
SQL Query window: In the SQL Query window, you can enter an SQL expression controlling which fields and records should be written to the destination. The SQL expression should follow standard SQL syntax conventions for the chosen database provider.
For example,
Insert into Orders values (@OrderId, '@OrderName', '@CreatedDtTm')
Notice the @ symbol in front of a field name. This makes the field appear in the field list inside the object box so that the field can be mapped. Fields without the @ symbol in front of them will not show in the list of fields, but they can still receive values according to the logic of the SQL statement itself.
For example,
Insert into Orders (OrderId, OrderName, CreatedDtTm) values (@OrderId, '@OrderName', '2010/01/01')
Note: You can optionally use $ parameters inside your SQL expression.
Astera Data Stack introduces an innovative AI Matching feature which leverages the power of Artificial Intelligence to perform intelligent matching. This feature works based on semantic similarity, ensuring more accurate and comprehensive matching results.
In Astera Data Stack, the AI Match object can be found in the Toolbox and can be used within the scope of the Dataflow.
For our use case, we have a Customers dataset from the sales department as shown below:
We want to replace the values in the Country column of the sales dataset by semantically matching them with Country values from the Customers dataset provided by the marketing team, ensuring both departments follow a unified naming standard.
To get started, let’s drag-and-drop an Excel Workbook Source object and configure it with the customers dataset provided by the sales department.
Next, drag-and-drop the AI Match object from the Toolbox onto the Dataflow and auto-map the fields from the Excel Workbook Source onto the AI Match object.
Once all the fields have been mapped, right-click on the AI Match object and select Properties from the context menu.
This will open the Layout Builder screen, which shows the layout of the incoming dataset. Click Next.
The AIMatch Transformation Properties screen will open. Let’s configure these properties.
File Path: This is where we provide the path of the file on the basis of which we want to perform our semantic matching.
Worksheet: This is where we can define which Excel sheet data to use if there are multiple sheets.
Lookup Field: This is the field based on which we are performing the lookup.
Incoming Field: This lets us define the lookup field from the incoming dataset
For our use case, let’s select the Country Field for both.
Once done, click OK and right-click on the AI Match object to preview its output.
As you can see below, the values in the Country field have been semantically matched and replaced from the file, using AI. We can also see that, since the country Pakistan did not have a matching field in the marketing dataset, it hasn’t been replaced.
Now, let’s drag-and-drop a Database Table Destination object and map the matched data onto it.
Running this Dataflow will write the data to the destination table.
This concludes the working of the AI Match object in Astera Data Stack.
The SQL Statement Lookup object in Astera is used to look up certain values that are mapped to it from a source object. It uses an SQL statement to access a table that contains the lookup values and their corresponding output values. Once the lookup is performed, the SQL Statement Lookup object returns either a single or multiple output fields, depending on the nature of the lookup table. Similarly, the lookup can be performed based on one lookup field or multiple lookup fields. When the incoming values match the lookup values, the output field or fields for those particular records are returned by the SQL Statement Lookup object.
In this use case, we will read data from the Customers table in the Northwind database using a Database Table Source object. This table contains customer information from a fictitious organization and will serve as the source table. Our purpose is to use an SQL Statement Lookup object to find some information about the orders placed by customers. This data is stored in a separate table called Orders, which will serve as the lookup table.
Drag-and-drop the Database Table Source object from Toolbox > Sources > Database Table Source onto the dataflow designer. Configure the object so that it reads data from the Customers table.
To learn how you can configure a Database Table Source object, click here.
Now, drag-and-drop the SQL Statement Lookup Transformation object from Toolbox > Transformations > SQL Statement Lookup onto the dataflow designer, next to the source object.
Right-click on the header of the SQL Statement Lookup object and select Properties from the context menu.
This will open a new window.
Here, we need to configure the properties of the SQL Statement Lookup object.
In the Database Connection window, enter details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test is successful. When the connection has been successfully established, close it by clicking OK, and then click Next.
The next window will present a blank space for you to write an SQL statement. Here, you can enter any valid SELECT statement or stored procedure to read any table from the database that was specified earlier. This table will serve as the lookup table.
In this case, we will be reading data from the Orders table.
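For reference, a minimal statement for this lookup might look like the following. The columns shown are the standard Northwind Orders columns and are only an illustration; any valid SELECT statement over your lookup table will work.
SELECT OrderID, CustomerID, EmployeeID, OrderDate, ShippedDate, Freight FROM Orders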
Enter the SQL statement and click OK. This will take you back to the dataflow designer.
As you can see, the SQL Statement Lookup object has been populated with all the fields present in the Orders table.
The next step is to choose an incoming field or multiple incoming fields from the source object, based on which the lookup action will be performed. This field needs to be mapped to the transformation object.
In this case, we can clearly see that CustomerID is a common element between the two tables. Hence, this field will be used to perform the lookup. It will be mapped from the Database Table Source object to the SQL Statement Lookup object as a new member.
Right-click on the transformation object’s header and select Properties to open the Properties window. Keep clicking Next until you reach the Layout Builder window. Here, you can customize the layout by modifying the existing fields or creating new fields.
Once you are done, click Next.
On the next window, you can define one or more lookup conditions. These conditions will determine what values are returned when the lookup is complete.
You will have to make appropriate selections from three drop-down lists:
Database Element Name: This list contains all the elements present in the SQL Lookup object. Select the element that you wish to use as a lookup field. In this case, it is CustomerID.
Operator: This list contains a set of operators that are used to define the condition. In this case, we will be using the ‘equals to’ operator because the lookup value is supposed to match the incoming value.
Input Element: This list contains the elements that have been mapped to the lookup object. In this case, the only input element available is CustomerID from the Customers table.
Once you are done defining the condition, click Next.
The next window will allow you to choose a Lookup Caching Type. The following options are available:
No Caching: No data will be stored in cache. This option is selected by default.
Static: The lookup values are stored in a cache. Once the cache is created, the lookup object will always query the cache instead of the lookup table. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables another sub-option:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Persistent: Saves the lookup values in a cache file that can be reused for future lookups. When you choose this option, the following sub-options are enabled:
Rebuild Persistent Cache on Next Run: Checking this option will allow the contents of the cache file to be modified after every run.
Cache File Name: Here, you can enter a name for your cache file.
In this case, we will select the No Caching option. Once you are done, click Next.
On the next window, you will see multiple lookup options.
The page provides a set of options for different scenarios that could be faced during a lookup.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value if more than one match is found for a single value in the lookup table. You can select one of the three options that appear in the drop-down list:
Return First: Returns the first matched value.
Return Last: Returns the last value among all matched values.
Return All: Returns all the matched values.
If Value Is Not Found In the Lookup List
If no lookup values are found for a source value, you can choose which of the following messages, if any, is appended to the output:
No Message: The output value will be the same as the input value and no message will appear with it.
Add Error: An error message will appear with the output.
Add Warning: A warning message will appear with the output.
If Value Is Not Found in the Lookup List, Assign Value
If no lookup value is found for a source value, you can assign an output value of your choice.
Assign Source Value: Returns the source value in the output.
Assign Null: Returns null in the output.
This Value: Allows you to enter any value that will be returned in the output.
In this case, we want to look up the details for all of the orders placed by every customer. Hence, we will select Return All from the drop-down list in the Multiple Matches Found Option. This will automatically disable the rest of the options available on the screen.
Once you are done choosing the option, click Next.
On the next window, you can define certain parameters for the SQL Statement Lookup object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
In this case, we will be leaving them blank. Click Next.
On the last window, you will be provided with a text box to add Comments. The General Options in this window have been disabled.
You are now done configuring the SQL Statement Lookup object. Click OK.
Right-click on the SQL Lookup object’s header and select Preview Output.
You will be able to see the following results:
Scroll down the Data Preview window to see the rest of the results.
The SQL Statement Lookup object has successfully returned the details for the orders placed by every customer in the Customers table (Source table) by comparing the CustomerID to its counterpart in the Orders table (lookup table).
This concludes using the SQL Statement Lookup Transformation object in Astera.
The Database Table Destination object in Astera provides the functionality to write data to a database table. This destination option provides a great deal of control over how data is written to a database table with its extended data loading options.
Astera supports a wide range of on-premise and cloud-based databases including SQL Server, Oracle, DB2, Sybase, MySQL, Salesforce, Microsoft Dynamics CRM, and more. Astera delivers highly-optimized implementations for these database connectors including high-performance bulk insert, set-based updates and transaction management. This, combined with Astera’s parallel-processing architecture, delivers industrial-strength performance and scalability.
To add a Database Table Destination object to your dataflow, go to Toolbox > Destinations > Database Table Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Database Table Destination object onto the designer.
The Database Table Destination object is currently empty, that is, it has no fields or mappings. This is because the object has not been configured yet. There are two empty sub-nodes, Input and Output, under the DatabaseDest root node.
To configure the properties of the Database Table Destination object, right-click on the header and select Properties from the context menu.
This will open a new window, Database Connection, in Astera.
First, you will need to select the relevant data provider from the Data Provider drop-down list.
This is where you select the specific database provider you want to connect to. For instance, if you want to write your data to a SQL Server database, select SQL Server from the list. The connection details will vary according to the data provider selected.
Test Connection to make sure that your database connection is successful and click Next.
Now, you need to provide details to configure a connection with the destination database.
Enter your User ID and Password. You can also use the Recently Used drop-down list to connect to a recently-connected database.
The next window is the Pick Table window. Here, you can choose from the following options:
Pick Table: To append data into an existing table.
Create/Replace: To write data to a new table or replace an existing table.
Truncate Table: To overwrite data in an existing table.
Choose the option based on your requirement. In this case, we will select the Create/Replace Table option and create a new table in the database.
Note: We will be creating a new table CustomerDetails.
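Conceptually, the Create/Replace Table option has Astera create the table from the mapped layout; you do not write the statement yourself. As a purely illustrative sketch (the column names and data types here are hypothetical and will actually be derived from your layout), the generated table resembles:
CREATE TABLE CustomerDetails (CustomerID NVARCHAR(5), CompanyName NVARCHAR(40), Country NVARCHAR(15))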
For a database destination object to work, data fields should be mapped to the object so that the mapped data can be written to the destination.
Configure the source object and place it onto the designer next to the Database Table Destination object.
Note: In this case, sample Customers data, coming in from an Excel Workbook Source, will be written to a Database Table Destination.
Map the source object to the destination object. Data mapping from source to the destination can be done in the following ways:
By dragging and dropping the parent node from the source object onto the destination object.
By mapping the output port of the source object onto the input port of the destination object.
By right-clicking on the parent node inside the source object and selecting Write to > Database Table Destination from the context menu.
The fields are now mapped.
The Pick Table window has some other configuration options.
Define Input Ports for Mapping
Single Port: Provides a single input port, so the same write action is applied to all incoming records rather than treating records individually.
Individual Ports for Actions: Provides a separate input port for each action, so records are processed individually according to the action of the port they are mapped to. The actions provided are: Insert, Delete, Update, and Upsert.
Database Options
Use constraint based write: Use this option when the destination layout has constraints that you want to enforce while writing.
Preserve system generated key values: To generate unique values for the selected primary key in the dataset. This option is only available if you assign at least one field in your destination layout as the System Generated field.
Use transaction
Always commit transaction on completion: When you want the whole transaction to be processed regardless of errors.
Rollback if there are any errors: When you want the transaction to be rolled back completely if any errors occur, instead of processing the dataset.
Check field lengths to compare the lengths of the incoming dataset's fields with the lengths defined in the destination layout.
Check for null values in the incoming dataset.
Write null strings as zero length strings: where string values are null, they will be written as zero-length strings.
Write null numeric values as zeros: for numeric data types, null values will be written as zeros.
Disable indexes during load to improve performance for lengthy load operations.
Data Load Options
Bulk insert with batch size when you want the whole dataset to be loaded in batches of the specified size. Typically, larger batch sizes result in better transfer speeds; however, the performance gains diminish at relatively large batch sizes.
Bulk insert with all records in one batch when you want all the records to be loaded into the table in a single batch. In this case, any database-specific error in your transfer won't show until the end of the transfer.
Use single record insert when you want records to be loaded individually. Records are inserted into the destination table one by one. This loading option renders the slowest performance among the three insert types; however, any errors or warnings during the transfer are displayed immediately as the transfer progresses.
Bulk Copy Options
Use Internal Transaction: When specified, each batch of the bulk-copy operation will occur within a transaction.
Fire Triggers: When specified, it will cause the server to fire the insert triggers for rows being inserted into the database.
Keep Nulls: Preserves null values in the destination table regardless of the settings for default values. When not specified, null values are replaced by default values where applicable.
Table Lock: Obtain a bulk update lock for the duration of the bulk copy operation. When not specified, row locks are used.
Check Constraints: Check constraints while data is being inserted. By default, constraints are not checked.
Keep Identity: Preserve source identity values. When not specified, identity values are assigned by the destination.
Default: Use the default values for all options.
Parallel Writing is used when you want to expedite the data loading process by increasing the number of writers for that dataset.
Once you have specified your options on this screen, click Next.
The next window you will see is the Layout Builder. Here, the layout of the database destination file can be modified.
To add a new field to the layout, go to the last, blank row of the layout (under the Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select its properties. A new field will be added to the destination table's layout.
Note: In this example, we will add a new field AccountType to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Here, we are deleting the Fax field from the layout.
Right-click on the highlighted row; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use Move up/Move down buttons.
Note: Find the Move up/Move down icons on the top left of the builder.
For example: To move the Country field right below the Region field, select the row and click the Move up button in the toolbar at the top, to move the field up from the 9th to the 8th position.
Once the object layout is configured, click Next. This will take you to the Config Parameters window where you can further configure and define parameters for the database destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change during the runtime.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next. A General Options window will appear. Here you have the following options:
Comments can be added.
General Options are given, which relate to the processing of records in the destination file.
Clear Incoming Record Messages clears any messages coming in from objects preceding the current object.
Do Not Process Records With Errors prevents erroneous records from being processed further for the output.
Do Not Overwrite Default Values with Nulls makes sure that values are not overwritten with null values in the output.
Click OK.
The DatabaseDest object is now configured according to the settings made in the properties window.
Note: The changes that were made in this case are:
Added a new field, AccountType, to the layout and mapped its value from the Constant Value Transformation object.
Moved the Country field below the Region field.
The Database Table Destination object is now successfully configured, and the destination file can now be created by running the dataflow.
The job can be traced through the Job Progress window once the job starts running.
The Switch Transformation object matches source data against the criteria specified by the user and, wherever the criteria are met, replaces the information in the particular field with the desired output (also specified in the layout). This gives users more control over their data and helps them manage it better.
There are two modes in Switch transformation:
Basic Mode
Enhanced Mode
The Basic Mode in the Switch transformation object matches specific values in the source data and replaces them with the desired output. Enhanced Mode enables users to set lookup criteria by writing expressions, which makes the feature more flexible.
Select your source by dragging the relevant object from the Sources section in the Toolbox on to the dataflow designer and configure the connection by putting in relevant details.
Note: In this example, we are working with an Excel Workbook Source that contains employee information for a fictitious organization, but you can select the source type from a wide range of options provided in Astera.
After setting up the connection and configuring the source file, drag the Switch transformation object from the Toolbox. If the Toolbox is hidden, go to View > Toolbox > Transformation > Switch.
Map the required fields from the source to the Switch transformation object.
Either double-click on the Switch Transformation object to open the Properties window or right-click on the object and go to Properties from the list.
The first window is the Layout Builder window. Here you can manage the fields (add and/or remove the fields) to make your Switch field layout.
Note: Switch Transformation only allows one output field. If you check the output box next to multiple fields, Astera will show a verification error.
After specifying the layout and selecting the relevant output field, click Next. This will take you to the Switch Map Properties window. At this stage, you can select the mode of the Switch transformation and assign the rules in the Case Value and Output Value sections.
Astera will look for the values specified in the Case Value column in the source file and replace them with the corresponding values in the Output Value column.
Note: Switch transformation only allows one default value; if you select multiple default values, Astera will give a verification error.
In this example, the source table stores department information as numeric codes. We will use the Switch transformation object in Basic Mode to replace the stored numeric codes with descriptive values.
Data Preview (Before Switch)
Data Preview (After Switch)
Steps 1-5 are going to remain the same even when working with the Enhanced Mode in Astera.
After you have created the layout in the Layout Builder window in the object’s properties, click Next and go to the Switch Map Properties window and select Enhanced Mode.
Note: Switch transformation is not field-specific in Enhanced Mode; therefore, the option for selecting the Switch Field is disabled.
An organization stores information about employees' salaries and has set criteria for issuing credit cards that depend on each individual's salary. In this scenario, to see which individual is eligible for which perk, define the salary range in the Case Expression field and specify the corresponding output in the Output Expression section (see the screenshot above). To store the information in a separate field, we created a new field (CreditCard) in the Layout Builder and selected it as the Output, as sketched below.
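As a sketch of what such rules might look like (the salary thresholds, card names, and exact expression syntax below are purely illustrative; adjust them to your own criteria):
Case Expression: Salary >= 100000, Output Expression: "Platinum"
Case Expression: Salary >= 50000, Output Expression: "Gold"
Default Output Expression: "Standard"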
Data Preview (Before Switch)
Data Preview (After Switch)
The Database Lookup object in Astera is used to look up values from a source. It uses a database table that contains the lookup values as well as a set of corresponding output values. When the lookup is performed, the object returns either a single output field or multiple output fields, depending on the nature of the lookup table. Similarly, the lookup can be performed based on one lookup field or multiple lookup fields. In each case, the output field or fields are returned from the records in which the lookup values match the incoming values.
In this use case, we have a sample Customers dataset that is stored in a database table. Within this dataset, there is a field that contains the country of residence for each customer. We have another database table that contains all of these countries and their corresponding codes. Our goal is to replace the full country names with codes while writing the customer dataset to an Excel file. To do this, we will use a Database Lookup object.
Drag the relevant source object from the Toolbox and drop it onto the designer. In this case, we will select the Database Table Source object from Toolbox > Sources > Database Table Source and configure it so that it reads data from the Customers dataset.
To learn how you can configure a Database Table Source object, click here.
To preview the data, right-click on the object header and select Preview Output from the context menu. Here, you can see that there is a field that contains each customer's country of residence.
Drag the Database Lookup object from Toolbox > Transformations > Database Lookup and drop it onto the designer.
Right-click on the header of the Database Lookup object and select Properties from the context menu.
This will open a new window on your screen.
Here, you are required to configure the properties for the Database Lookup object.
On the Database Connection window, enter the details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A new window will open, showing whether your test is successful or has ended in an error. When the connection has been successfully established, close it by clicking OK, and then click Next.
The next window is the Database Lookup Map Properties window. Here, you can pick a table from the database that you have connected to.
In this case, we will select the table named CountryCodeLookup. This table contains the code for each country and will serve as the lookup table in our use case.
In the text box provided under the Pick Table option, you can enter a where clause to modify the lookup query. In this case, we will leave it empty.
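If you did want to narrow the lookup, a simple condition on the lookup table's columns can be entered here. For example, a purely illustrative clause such as the following would restrict the lookup to rows that actually contain a country name:
Country IS NOT NULL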
Once you have chosen a table, click Next.
On the next window, you can choose a Lookup Cache Type from the following options:
No Caching: No data will be stored in a cache. This option is selected by default.
Static: The lookup values are stored in a cache. Once the cache is created, the transformation object will always query the cache instead of the lookup table. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables another sub-option:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Dynamic: The lookup values are stored in a temporary cache file, which is deleted once the dataflow has been executed. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables other sub-options:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Cache Key Column: Defines a matching key field to check whether a record already exists in the cache.
Persistent: Saves the lookup values in a cache file that can be reused for future lookups. When you choose this option, the following sub-options are enabled:
Rebuild Persistent Cache on Next Run: Checking this option will allow the contents of the cache file to be modified after every run.
Cache File Name: Here, you can enter a name for your cache file.
In this case, we will select the No Caching option. Once you are done, click Next.
On the Lookup Options window, you can choose between multiple lookup options.
This page provides a set of options for different scenarios that could be faced during a lookup.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value if more than one match is found for a single value in the lookup table. You can select one of the three options that appear in the drop-down list:
Return First: Returns the first matched value.
Return Last: Returns the last value among all matched values.
Return All: Returns all matched values.
If Value Is Not Found In the Lookup List
If no lookup values are found for a source value, you can choose which of the following messages, if any, is appended to the output:
No Message: The output value will be the same as the input value and no message will appear with it.
Add Error: An error message will appear with the output.
Add Warning: A warning message will appear with the output.
If Value Is Not Found in the Lookup List, Assign Value
If no lookup value is found for a source value, you can assign an output value of your choice.
Assign Source Value: Returns the source value in the output.
Assign Null: Returns null in the output.
This Value: Allows you to enter any value that will be returned in the output.
In this case, there is only one code for each country. Therefore, we will choose Return First from the drop-down list in the Multiple Matches Found Option. Moreover, we will leave the other options at their default selection i.e. No Message under If Value Is Not Found in the Lookup List, and Assign Null under If Value Is Not Found, Assign Value.
Once you are done choosing the options, click Next.
On the Config Parameters window, you can define certain parameters for the Database Lookup object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
In this case, we will leave them blank. Click Next.
On the last window, which is the General Options window, you will be provided with a text box to add Comments. The General Options in this window have been disabled.
You are now done configuring the Database Lookup object. Click OK to close the configuration window.
Expand the Database Lookup object to view the layout of the lookup table. In this case, it contains two fields, Country and Code. The former contains the full name of each country and the latter contains each country’s code.
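For illustration, the lookup table pairs each country name with its code, for example a row mapping Germany to DE or Mexico to MX; the actual values depend on your CountryCodeLookup table.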
Map the Country field from the Database Table Source object to its counterpart in the Database Lookup object.
Drag an Excel Workbook Destination object from Toolbox > Destinations > Excel Workbook Destination and drop it onto the designer. Configure the object by providing a name and the path to the directory and folder where you want to save your destination file.
To learn how you can configure an Excel Workbook Destination object, click here.
Auto-map the source dataset to the destination object.
Delete the mapping link between the Country fields in the source and destination. To do this, right-click on the mapping link and select Delete from the context menu.
Map the Code field from the Database Lookup object to the Country field in the destination object. This is what the final dataflow should look like:
Right-click on the destination object’s header and select Preview Output from the context menu.
In the Data Preview window, you will see that each country name has been replaced by its corresponding code.
This concludes using the Database Lookup Transformation object in Astera.
The SQL Statement Destination object in Astera offers extra flexibility over other destination objects by providing the option to apply custom INSERT, UPDATE, or DELETE SQL statements to control what is written to the destination table. The object can also be used to call stored procedures. Moreover, you can parameterize your SQL statement using the Parameterize Replacement functionality.
In this article, we will be looking at how you can configure and use the SQL Statement Destination object in Astera.
Before moving on to the actual configuration, we will have to get an SQL Statement Destination object from the Toolbox. To do so, go to Toolbox > Destinations > SQL Statement Destination. In case you are unable to view the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the SQL Statement Destination object onto the designer.
The destination object is currently empty because we are yet to map any data fields to it.
To configure the SQL Statement Destination object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click the header of the destination object to go to its Properties.
A new window will open when you click on Properties from the context menu.
Here, you need to configure the properties for the SQL Statement Destination object.
On the Database Connection window, enter the details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test is successful. Close it by clicking OK, and then click Next.
The next window will present a blank page for you to enter an appropriate SQL statement for the required outcome. This can consist of an INSERT, UPDATE, or DELETE statement that manipulates the data being written to the database.
The curly brackets on the right side of the window indicate that the use of parameters is supported, which implies that you can replace a regular value with a parameterized value that can be changed during runtime.
In this use-case, we will be inserting new records into an existing table, named TESTTABLE, that has three columns: OrderID, CustomerID, and EmployeeID.
Notice the @ symbol in front of a field name. This makes the field appear in the field list inside the object box so that the field can be mapped. The fields that do not have a @ symbol in front of them will not show in the list of fields, but they can still receive values according to the logic of the SQL statement itself. String fields need to be surrounded by single quotes, whereas Integer fields do not. In this case, CustomerID is a String field, while OrderID and EmployeeID are Integer fields.
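Putting these rules together, the statement for this use case could look like the following. This is only a sketch based on the table and columns described above; adjust it to your own table names and data types.
INSERT INTO TESTTABLE (OrderID, CustomerID, EmployeeID) VALUES (@OrderID, '@CustomerID', @EmployeeID)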
The Database Options given at the bottom of the window provide support for transaction management. Checking the Use Transaction option will enable two other sub-options:
Always commit transaction on completion: Ensures that the job is completed regardless of any erroneous records.
Rollback if there are any errors: Aborts the job in case of one or more erroneous records.
Once you have entered an SQL statement and chosen your desired option, click Next.
On the new Config Parameters window, you can define certain parameters for the SQL Statement Destination object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
At the end, a General Options window will appear. Here, you are provided with:
A text box to add Comments.
A set of General Options related to the processing of records.
To conclude the configuration, click OK.
For a destination object to work, data fields must be mapped to it from a source. In this case, we will be using an SQL Query Source object to get data from the Orders table in the Northwind database.
Configure the source object and place it next to the SQL Statement Destination object.
Map the required data fields from the source object to the destination object. This can be done in the following ways:
By dragging and dropping the parent node of the source object onto that of the destination object.
By individually dragging and dropping the required fields from the source object onto their respective nodes in the destination object.
To preview the output, right-click on the destination object’s header and select Preview Output from the context menu. In this case, you will see the following result:
You can now write data to the destination table by running the dataflow.
This is how we use the SQL Statement Destination object in Astera.
Astera’s XML/JSON File Destination object provides the functionality to write data to an XML or JSON file when the data is in hierarchical format.
In order to understand how to write to an XML/JSON File Destination object, we will walk through a use case in which we convert flat datasets into a hierarchical dataset and then write the transformed data to an XML file.
Customers and Orders data from database tables will be used as source objects. We will then join them using the TreeJoin Transformation to create a hierarchical dataset.
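As a simplified, purely illustrative sketch of the hierarchical shape produced by the join, each Customer record becomes a parent carrying a collection of its Orders records, for example Customer (CustomerID, CompanyName) containing Orders (OrderID, OrderDate, Freight); the actual field names come from your source tables.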
To get an XML/JSON File Destination object from the Toolbox, go to Toolbox > Destinations > XML/JSON File Destination and drag-and-drop it onto the designer. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
The dragged destination object is empty right now. This is because the object has not been configured yet.
Note: In this case, we will be using the use case discussed earlier, taking the TreeJoin transformation as the source of the data to be written to the XML/JSON File Destination.
A schema file is needed to write data to an XML/JSON File Destination. To create the schema file, right-click on the source object (the TreeJoin transformation in this case); a context menu will appear. Select the option Generate XML Schema for layout.
A new dialog box will open where you will be asked to save the XML schema file. Give the File Name and file path for the schema file and click Save.
The schema file has been created in the specified location. To view this file, go to the file location and open the file in Astera.
The opened file would look like the one below:
To configure the XML/JSON File Destination object, right-click on its header and select Properties from the context-menu.
A window, Destination XML File, will open. Here, we will specify the file locations, i.e., the File Path and Schema File, for the XmlJsonDest object.
The destination file will now be saved in the provided file location. Click OK, and map the destination object to the source object before further configuration.
The XmlJsonDest object will now have the layout of the source object (Treejoin Transformation in this case).
To map the source object to the destination object, the following ways of mapping can be used:
By dragging and dropping the parent node of the source object (TreeJoin node in the TreeJoin table) onto the child node of the destination object (TreeJoin node in the XmlJsonDest object) for automatic mapping.
By manually mapping the source parent node (TreeJoin in the TreeJoin table) by dragging it to the respective destination child node (TreeJoin in the XmlJsonDest object).
The fields are now mapped.
Once the file locations have been specified and the mappings have been done, further properties can be defined.
XML Layout
The next window after the Destination XML File window is the XML Layout window.
This window shows the XML layout for the XmlJsonDest object. The collection nodes for the object can be seen in this window with their fields.
Config Parameters
Click Next, and a window, Config Parameters, will open, which will allow us to further configure and define parameters for the XML/JSON Destination object.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
General Options
Click Next, and a new window, General Options, will open. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The XmlJsonDest object has been successfully configured and the destination file can now be created by running the dataflow.
The Parquet File Destination object allows the user to fetch and map data from the various kinds of sources that Astera supports and write it to Parquet files, a format that can efficiently store large datasets. It can also be used with various transformations.
The Parquet File Destination object in Astera offers compression methods to reduce file size and control memory consumption.
Drag and drop the Parquet File Destination object from the Destinations section of the Toolbox.
Note: For our use case, we already have an Excel Workbook Source object configured.
Right-click on the Parquet File Destination object and select Properties from the context menu.
This will open the Properties screen.
Now, let’s look at the options present on this screen.
File Location
File Path: This is where the file path to the destination file is to be defined. It will be created once the dataflow is executed.
Options
Compression Method - You can select a compression method from this drop-down menu.
Snappy: This method offers high speed and reasonable compression.
Gzip: This method allows the reduction of data size at a fast rate.
Append to File (If Exists): This option will append data to the destination if there is a previously existing file present with data.
Write Numeric Nulls As Zero: Checking this box will write all null values as zero.
Write Null Booleans As False: Checking this box will write all Null Boolean values as false.
Once done, click Next and you will be led to the Layout Builder screen.
Here, the layout is going to be mapped for the destination. It can be built from the incoming data source or can be altered by the user.
We will be using our pre-configured Excel Workbook Source to map the incoming data to the Parquet File Destination object.
Open the Layout Builder again and it will be populated.
Click Next and you will be taken to the Config Parameters screen.
Parameters allow easier deployment of flows by eliminating hardcoded values and provide a dynamic way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next and you will be led to the General Options screen.
Here, you can add any comments that you wish to add.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be processed by the object.
Do Not Overwrite Default Values with Nulls: Selecting this option will make sure that values are not overwritten with null values in the output.
Click OK and the Parquet File Destination object will be configured.
This concludes the configuration of the Parquet File Destination object in Astera.
The Excel Workbook Destination object in Astera provides the functionality to write data to Microsoft Excel workbooks. An important thing to note here is that it is not necessary to have Microsoft Excel installed on the machine for the Excel destination object in Astera to work. The feature gives you the option to specify the worksheet and the starting cell where the data write begins.
To get the object from the Toolbox, go to Toolbox > Destinations > Excel Workbook Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Destination object onto the designer.
The dragged destination object is empty right now. This is because no data fields have been mapped to it yet. In this case, we will use a simple source-to-Excel-destination mapping scenario as an example.
Configure the source object and place it onto the designer next to the Excel Workbook Destination object.
Note: We are using a sample table containing Customers data from an SQL database.
Now map the data fields from the source object to the destination object. Mapping can be done in the following ways:
i. By dragging and dropping the parent node of the source object onto the parent node of the destination object for auto-mapping the layout.
ii. By creating a map from the source parent node to the destination parent node.
iii. By directly writing the fields in the source layout to an Excel Destination through the source context menu of its parent node.
The fields are now mapped.
To configure the Excel Workbook Destination object, right-click on the header, select Properties from the context menu and a dialog box will open.
Provide the File Path. This is where the Excel destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
Check the First Row Contains Header option to write the field names as a header in the first row of the destination file.
The Worksheet field can be used to specify the name of a worksheet for either overwriting the data in an already existing worksheet or adding a new worksheet.
Choose Append to File (If Exists) to append to an existing file or create a new file. Creating a new file will overwrite any existing file.
Check Write to Multiple Files to save the data to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
Once the data reading options have been specified on this screen, click Next.
The next window is the Layout Builder. On this window, the layout of the excel destination file can be modified.
To add a new field to the layout, go to the last, blank row of the layout (under the Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select its properties. A new field will be added to the destination layout.
Note: Adding a new field (Email) to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Deleting the Fax field from the layout.
Right-click on the highlighted row; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use the Move up/Move down buttons.
Note: Find the Move up/Move down icons on the top left of the builder.
For example: To move the Country field right below the Region field, select the row and use the Move up button to move it from the 9th row to the 8th row.
The row is now moved from the 9th position to the 8th position.
Once the object layout is configured, click Next. A new window, Config Parameters, will appear, which allows you to further configure and define parameters for the Excel destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
A General Options window will appear. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The ExcelDest object is now configured according to the changes that were made in the properties window.
Note: The changes that were made in this case are:
Added a new field, Email, to the layout.
Moved the Country field below the Region field.
The Excel Workbook Destination object is successfully configured, and the destination file can now be created by running the dataflow.
Add a new rule by clicking on the Add switch condition icon.
Now, click on this button to open the Expression Builder.
Learn more about how the Expression Builder works in Astera.
This section talks about the various database write strategies offered within Astera.
The Field Profile feature captures statistics for selected fields from one or several objects. Field Profile is essentially a transformation object as it provides Input and Output ports similar to other transformations. These output ports make it possible to feed the statistics collected to another object on the dataflow.
In this document, we will learn how to create a Field Profile in Astera.
In this case, we have extracted data from a sample Invoices Database Table Source.
We want to collect detailed statistics for some of these fields and write them to a Delimited File Destination. For this purpose, we will use Astera's Field Profile feature.
To get a Field Profile object from the Toolbox, go to Toolbox > Data Profiling > Field Profile. If you are unable to see the toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Field Profile object onto the dataflow designer.
You can see that the dragged Field Profile object contains an Input node and an Output node. The Input node is empty as we have not mapped any fields to it yet.
One-by-one map ShipName, CustomerID, Country, OrderDate, ProductName, UnitPrice, and Quantity from the source object to the Field Profile object’s Input node.
Note: Statistics will be collected only for the fields linked to the Input node of the Field Profile object. This way, you can selectively collect statistics for a subset of fields from the selected field layout.
To configure the Field Profile object, right-click on its header and select Properties from the context menu.
A configuration window will open. The first screen you will see is the Layout Builder. This is where we can create or delete fields and change their names and data types.
Click Next. On the Properties window, specify the Statistics Type from the dropdown list.
The Field Statistics dropdown allows you to select the level of detail of the statistics to collect. Select from the following detail levels:
Basic Statistics: This is the default mode. It captures the most common statistical measures for the field’s data type.
No Statistics: No statistics are captured by the Data Profile.
Detailed Statistics – Case Sensitive Comparison: Additional statistical measures are captured by the Data Profile, for example Mean, Mode, and Median, using case-sensitive comparison for strings.
Detailed Statistics – Case Insensitive Comparison: Additional statistics are captured by the Data Profile, using case insensitive comparison for strings.
In this case, we will select Detailed Statistics – Case Sensitive Comparison.
Click OK.
Right-click on Field Profile object’s header and select Preview Output from the context menu.
A Data Preview window will open and show you the statistics of each mapped field as a record.
Observe that the Field Profile object contains an Output node. Once expanded, you will see various statistical measures as fields with output ports.
We can write these statistical measures to a destination file.
Right-click on the Output node and go to Write To > Delimited File Destination.
A Delimited File Destination object will be added to the dataflow designer with auto-mapped fields from the Output node.
Configure settings for your Delimited File Destination from here.
Once the dataflow is run, a Job Progress window will open and show you the trace of the job.
You can open the delimited file that contains field statistics from the link provided in the Job Progress window.
MongoDB is a document-oriented database in which one collection holds different documents. The MongoDB Destination object in Astera Data Stack provides the functionality to write data to a MongoDB database and control how that data should be written to collections.
While writing data, the number of fields, content, and size of the documents can differ from one document to another. This can easily be catered to by configuring write concerns, which describe the level of acknowledgment requested from MongoDB for write operations.
MongoDB is mainly used for Big Data.
The MongoDB Destination object can be used to map incoming data from a source to the MongoDB server. MongoDB makes it easier for users to store both structured and unstructured data.
For our use case, we already have an XML/JSON Source object configured in a Dataflow.
To start, drag-and-drop the MongoDB Destination object from the Destinations section of the Toolbox onto the Dataflow.
Right-click on the MongoDB Destination object and select Properties from the context menu.
This will open a new window.
User Name: This is where we enter the user name for the MongoDB or local server.
Password: The password for the MongoDB or local server is entered here.
Primary Server Name: The name of the primary cluster is entered here.
Database: This is where we select the database to which we wish to write the data.
Authentication Database: This is the database used for authentication.
Port: The port used to handle incoming and outgoing requests to the server.
Enable Set of Replica: Selecting this checkbox allows the use of a secondary cluster.
Secondary Server Name: The name of the secondary cluster is entered here.
Use TLS: Select this option if the server requires TLS security.
Once your credentials have been filled, test the connection, and click Next.
For our use case, we have input the credentials to use the MongoDB Destination for our local server.
We will now be taken to the MongoDB Pick Collection screen.
For our use case, we will select Create/Replace and add a new Collection.
Database Operations – These operations are used when we are picking an already existing collection.
Insert: To insert a new record into the collection.
Update: To update an existing record in the collection.
Delete: To delete a record from the collection.
Upsert: To insert and update a record in the collection.
Select Fields for matching database records: Selecting from this drop-down menu lets the user select fields based on which to match the records for the selected database operation.
Write Concern Options – Selecting from these options lets the server provide an acknowledgment to the user based on how the process was carried out.
ACKNOWLEDGED: This will return an acknowledgment in the Job trace window if the process stops after getting an error or if the process successfully completes.
UNACKNOWLEDGED: This option will not return an acknowledgment, no matter how the data write is carried out.
MAJORITY: If there are multiple primary and secondary servers, this option will return an acknowledgment once the write has been processed by the majority of the servers.
W1: Selecting this option will return an acknowledgment when the primary server has been processed.
W2: Selecting this option will return an acknowledgment when the primary server and one secondary server have been processed.
W3: Selecting this option will return an acknowledgment when the primary server and two secondary servers have been processed.
Data Load Options – These options let the user define how the data is going to be loaded into the database.
Bulk insert with batch size: This will insert all records divided into batch sizes that the user has defined.
Bulk insert with all records in one batch: This will insert all records in a single batch.
Use single record insert: This option will treat every record individually and insert them one by one.
Select Type of Bulk Insert: Selecting from this drop-down menu lets the user define whether the Bulk Insert will be Ordered or UnOrdered.
In the case of Ordered, data writing will be stopped if an error is encountered between record insertions.
In the case of UnOrdered, data writing will continue despite any errors being encountered.
Click Next and you will be led to the MongoDB Layout screen.
Currently, our layout is empty since we have not mapped any fields to it.
We will map the incoming fields from the XML/JSON Source object to the MongoDB Destination object.
We will then reopen the MongoDB Layout screen.
As you can see below, the entire layout has now been defined.
Click OK and the MongoDB Destination object will be configured.
Select the Start Dataflow option in the main toolbar and the data will be written to the destination.
As you can see in the Job Progress window, the data has been successfully written to the destination.
Note: You can see the acknowledgment in the window since we selected the respective option.
This concludes the configuration of the MongoDB Destination object in Astera.
In addition to the standard logging functionality, Astera provides a special Data Quality Mode option, useful for advanced profiling and debugging. When a dataflow is created/opened in Data Quality Mode, most objects on the dataflow show the Messages node with output ports.
In this document, we will learn how to use the Data Quality Mode in Astera.
In this case, we have a simple dataflow designed to perform a data quality check. It contains customers’ data coming in from an Excel Workbook Source. A Data Quality Rule object is added to validate data for null values and perform warning checks.
If you preview the Customers dataset output at this stage, you will see that some of the records have missing values in the Region and Fax fields.
Data quality rules are set so that records with empty Region values are marked as errors and records with empty Fax values are marked as warnings. A red exclamation sign in the Data Preview window identifies the records that have failed to match the rule and returned an error or a warning as a result.
Now, for instance, we want to collect information regarding the number of errors/warnings in a single record, the error/warning messages attached to these records, and write this information to a destination. For this purpose, we will use Data Quality Mode.
Note: The Record Level Log feature also collects and records this information, but we cannot process it further in the dataflow.
Once Data Quality Mode is activated, a Messages node will be added to all the objects in the dataflow.
The Messages node captures the following statistical information:
TotalCount
ErrorCount
WarningCount
InfoCount
MessagesText
DbAction
Custom
In addition, the FirstItem, LastItem, and Items sub-nodes provide a way to collect quality control data for each of the records. The quality control data includes items such as ElementName, MessageType, and Action, and can be written to a destination object for record-keeping purposes.
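To make these counts concrete for this example: a record that is missing both its Region and Fax values would be expected to show an ErrorCount of 1 (from the Region rule) and a WarningCount of 1 (from the Fax rule), with MessagesText carrying the corresponding error and warning messages.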
Connecting the Messages node's output ports to another object's input ports on the dataflow makes it possible to get both summary statistics and record-level statistics for the dataset, which are useful for analysis and debugging. To do this:
Right-click on the Messages node inside the NoNullValues_Rule object and go to Write to > Delimited File Destination.
A Delimited Destination object is added to the designer with mapped fields.
Configure settings for the Delimited File Destination to save this data. After configuring the settings for the Delimited File Destination object, click on the Start Dataflow icon in the toolbar located at the top of the window.
Right-click on the header of the destination object and select Preview Output from the context menu.
A Data Preview window will open, showing error and warning information.
The Quick Profile option in Astera gives users the ability to preview field statistics of any set-level object in the dataflow at design time. It provides information such as the data type, minimum/maximum values, data count, error count, etc., which can be used to identify and correct data quality issues while designing flows.
The Quick Profile window can be viewed for an entire flow by clicking View > Quick Profile or using the shortcut key Ctrl+Alt+A.
Note: If there are multiple objects in a flow, using Quick Profile from the View tab or the shortcut key will provide information on the columns of the last object in the flow.
To view field statistics at a particular object in the dataflow, right-click on the object’s header and select Quick Profile.
In this case, we are using data from a Loans Excel File Source:
A window like this will slide up from the bottom of the screen:
Quick Profile provides an overview of the content and quality of all the fields, allowing us to determine whether the data is suitable for further transformation. When creating the flows, we may use this functionality at any point to identify any erroneous data that might be affecting the final results.
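As a rough analogy, the statistics Quick Profile reports resemble a per-column summary of the dataset. The sketch below uses pandas and a small made-up sample in place of the Loans source; it is illustrative only, not the mechanism Astera uses.

```python
# Rough analogy of Quick Profile's per-column statistics using pandas.
# The sample data is made up; the real example uses a Loans Excel source.
import pandas as pd

df = pd.DataFrame({
    "LoanAmount": [12000, 8500, None, 20000],
    "Term": [36, 60, 36, 48],
})

profile = pd.DataFrame({
    "DataType": df.dtypes.astype(str),
    "Count": df.count(),          # non-null values per column
    "NullCount": df.isna().sum(),
    "Min": df.min(),
    "Max": df.max(),
})
print(profile)
```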
Note: The data processed by Quick Profile is in-memory and can only be stored permanently if exported to an Excel file. To do that, right-click anywhere in the Quick Profile window and select Export to Excel.
Enter a valid file name and click Save.
This concludes the use of the Quick Profile feature in Astera.
Data Driven Write Strategy is a set-level functionality, which means that the entire incoming dataset must flow through it. It allows a user to stamp a database directive on each record so that when the record reaches its destination, it is loaded according to that directive and the specified write action is performed. You can specify multiple rules within the properties of a Data Driven write strategy object. Each record is evaluated against these rules from top to bottom; once a record matches a rule, it is not evaluated against the remaining rules.
Assume a scenario in which Orders data from a Database Table Source is written to a Database Table Destination. We want to DELETE those records where ShippedDate is before the year 2000 and declare those records where Freight is less than 10 as ERROR. We will use the Data Driven write strategy object to achieve this task.
Drag-and-drop the Data Driven object from Toolbox > Database Write Strategy > Data Driven onto the dataflow designer and map the source data to the Data Driven object.
Right-click on the header of the Data Driven object and select Properties.
A Layout Builder window will open where you can modify your layout. Click Next.
The next window is the Data Driven Write Strategy Conditions window, where you can specify rules to route your data. Click on the fx option to enable the Expression Builder.
Once you select this option, the Expression Builder will be enabled.
Now, specify the following rules in the Expression Builder and select Database Action Type as ERROR for the Freight rule and DELETE for the Date rule.
Year(ShippedDate) < 2000
Freight < 10
There are five Database actions: Insert, Update, Delete, Skip and Error. From these, you can select the action you want to be taken for a certain rule.
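To make the top-to-bottom, first-match-wins evaluation concrete, here is a small illustrative sketch that mirrors the two rules and actions used in this example. It is not Astera's implementation; the field names and the INSERT default are assumptions for illustration.

```python
# Illustrative sketch of top-to-bottom, first-match-wins rule evaluation.
# Mirrors the example rules; the INSERT default is an assumption.
from datetime import datetime

rules = [
    (lambda r: r["ShippedDate"].year < 2000, "DELETE"),  # Date rule
    (lambda r: r["Freight"] < 10, "ERROR"),              # Freight rule
]

def database_action(record, default="INSERT"):
    for condition, action in rules:
        if condition(record):
            return action  # first matching rule wins; later rules are skipped
    return default

print(database_action({"ShippedDate": datetime(1998, 5, 4), "Freight": 32.4}))  # DELETE
print(database_action({"ShippedDate": datetime(2003, 1, 9), "Freight": 4.5}))   # ERROR
```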
Once you are done specifying the rule(s), click OK.
You can now write your data to a Database Table Destination.
It is important to note here that, while working with Database Write Strategies, the Single Port option should be selected. Once you check the Single Port option in the Database Table Destination object, a box will appear in which you have to specify a field for matching database records. In our case, we will select OrderID.
We have successfully configured the settings and built the layout.
Let’s preview the output.
Data Driven output:
You can see that Astera has appended an error message with the records where Freight is less than 10. You can create an error log of these records or load them into a database if you want.
Note: Since the Freight rule is our second rule, only records that did not match the first rule are evaluated against it. This is why the error message appears only with records where ShippedDate falls in the year 2000 or later and Freight is less than 10.
Now, whenever you access the same database table, you will see that the records where ShippedDate is before the year 2000 have been deleted.
This concludes using the Data Driven Write Strategy in Astera.
The Source Diff Processor object is one of the Database Write Strategies offered in Astera. It works like the Database Diff Processor; however, unlike the Database Diff Processor, it is used to perform write actions (such as Insert, Update, and Delete) on file destinations. It stores a snapshot of the data processed in the first run in an incremental transfer (CDC) file, so the next time you run it, only the new or changed records are processed.
We have a sample Employees dataset coming in from an Excel Workbook Source. Initially, we had records of 10 employees, but later on, two more were added to the source dataset. We wish to apply a database write strategy that can read data incrementally from file sources. To achieve this, we will use the Source Diff Processor in Astera.
Drag-and-drop the Source Diff Processor object from Toolbox > Database Write Strategy > Source Diff Processor onto the dataflow designer and map the source data to it.
Right-click on the Source Diff Processor object’s header and select Properties.
A Layout Builder window will open where you can modify your layout. Click Next.
The next window is the Incremental Write Options window.
Here, you have to specify the Record Matching field. This field is used to match and compare the incoming and existing records. We will select EmployeeID as the Record Matching field.
Case Sensitive – Check this option if you want to compare records on a case-sensitive basis.
Sort Input – Check this option if you want to sort the incoming data.
Now, if the incoming dataset has a record with a new EmployeeID, i.e., an ID that is not present in the existing file being compared against the incoming file, Astera will perform the INSERT action.
If the EmployeeID is already present in the existing file, Astera will compare the records against that ID and will perform the UPDATE action on the fields where the information has changed.
If the EmployeeID is present in the existing file but not in the incoming file, it means that the particular record has been deleted. In this case, Astera will perform the DELETE action.
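The following sketch illustrates this diff logic in plain Python: an incoming dataset is compared against the previously stored snapshot on the EmployeeID record-matching field, and each record is assigned an INSERT, UPDATE, or DELETE action. It is a conceptual illustration with made-up sample values, not Astera's internal code.

```python
# Conceptual sketch of the Source Diff logic: compare incoming records with
# the stored snapshot on a record-matching key and derive write actions.
def diff_actions(snapshot, incoming, key="EmployeeID"):
    old = {r[key]: r for r in snapshot}
    new = {r[key]: r for r in incoming}
    actions = []
    for k, record in new.items():
        if k not in old:
            actions.append(("INSERT", record))
        elif record != old[k]:
            actions.append(("UPDATE", record))
    for k, record in old.items():
        if k not in new:
            actions.append(("DELETE", record))
    return actions

snapshot = [{"EmployeeID": 1, "Name": "Davolio"}, {"EmployeeID": 2, "Name": "Fuller"}]
incoming = [{"EmployeeID": 1, "Name": "Davolio"}, {"EmployeeID": 3, "Name": "Leverling"}]
print(diff_actions(snapshot, incoming))
```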
In the Output Options section, you can either select the Single Output option or One Port for Each Action.
The Single Output option is selected if you wish to load your data into the destination without further modifying it on the basis of individual write actions. If you select Single Output, the database action, such as INSERT, UPDATE, SKIP, or ERROR, will be chosen by the database write strategy’s logic rather than being specified by the user. Using Single Output is recommended when a database write strategy is applied.
One Port for Each Action is used when you want to further transform or log your data. If you select One Port for Each Action, you will get separate nodes for each Diff action in the Source Diff Processor’s object.
In this example, we will select Single Output.
The third section in the Incremental Write Options window is the Incremental Transfer Information File Path option. Here, you must specify the file path where you want to store information related to the last run.
Now, if you have worked with Excel Workbook and Database Table Sources in Astera, you would have noticed that the Database Table Source object gives you the option to read incremental changes, whereas no such option is available in Excel or other file source objects. The Source Diff Processor fills this gap, enabling you to read incrementally from file formats such as Excel, Delimited, and Fixed Length.
Click OK.
Now, right-click on the Source Diff Processor object’s header and select Preview Output.
Output preview for Single Output:
Output preview if One Port for Each Action is selected:
You can now write your data to any destination or perform any transformation on the dataset.
This concludes using the Source Diff Processor write strategy in Astera.
A Record Level Log captures the status (Success, Error, Warning, or Skip) for each of the records transferred, and includes snapshots of the source record and the destination record. It also provides additional details, such as error messages.
You can have any number of record level logs on the dataflow. Each record level log will collect the status of the records in the object that it is connected to.
In this document, we will learn how to use the Record Level Log object in Astera.
In this case, we have a simple dataflow that performs a data quality check. It contains a Customers dataset stored in an Excel Workbook Source. A Data Quality Rule is then applied to validate the data for error and warning checks, and finally, the data is written to a Database Table Destination.
If you Preview Output for the Customers dataset, you will see that some of the records for the Region and Fax fields are empty.
A Data Quality Rule is applied to identify null records in the Region field as errors and empty records in the Fax field as warnings. Upon previewing its output, you will see that the records that failed to match the rule have returned an error or a warning, denoted by a red exclamation sign.
If you hover on these warning signs, it will show you the error message.
Now, when we run this dataflow, we want to know which records passed the validation check, which records failed it, which records contain errors, and which records ended with only warnings.
For this purpose, we will use Record Level Log.
To get a Record Level Log object from the Toolbox, go to Toolbox > Data Profiling > Record Level Log. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Record Level Log object onto the dataflow designer.
Another way to get a Record Level Log object is to right-click on the Output node inside the Database Table Destination object and go to Write to > Record Level Log.
You can see that the dragged Record Level Log object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map the fields from the source object to the Log object.
To configure the Log object, right-click on its header and select Properties from the context menu.
A configuration window will open. The first window you will see is the Layout Builder window. This is where we can create or delete fields and change their names and data types.
Click Next, and you will be directed to a Properties window where you can configure settings for creating the log file.
Specify the Profile File path where Astera will save this log file. Log files are saved with .prof extension.
Specify the Log Level Type from the dropdown list.
All – all records (including Success records) are logged
Errors – only error records are logged
Warnings – only warning records are logged
Errors and Warnings – both error and warning records are logged
Off – no logging
In this case, we will select Errors and Warnings as our log type.
Stop Logging After … Records with Errors – allows you to limit excessive logging by setting a cap on the maximum number of errors to be logged. The logging stops after the cap has been reached.
The default value is 1000 errors.
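The sketch below is a purely conceptual illustration of how the Log Level Type and the error cap interact: a record’s status is logged only if it falls within the selected level, and error logging stops once the cap is reached. It is not Astera's logging engine.

```python
# Conceptual illustration of log level filtering combined with the error cap.
LEVELS = {
    "All": {"Success", "Error", "Warning"},
    "Errors": {"Error"},
    "Warnings": {"Warning"},
    "Errors and Warnings": {"Error", "Warning"},
    "Off": set(),
}

def should_log(status, errors_logged, level="Errors and Warnings", max_errors=1000):
    if status == "Error" and errors_logged >= max_errors:
        return False  # cap reached: stop logging further errors
    return status in LEVELS[level]

print(should_log("Warning", errors_logged=0))    # True
print(should_log("Error", errors_logged=1000))   # False once the cap is hit
```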
Click OK.
After configuring the settings for the Log object, click on the Start Dataflow icon in the toolbar located at the top of the window. A Job Progress window will open and show you the trace of the job. Click on the log file link provided in the Job Progress window.
Record Level Log will open in Astera showing you the status of logged records.
Astera stores the error logs in XML format. If you expand each record, it will give you the Field Name to which the error/warning message was attached, the Processing Step of the data check that resulted in the error, as well as the error Message.
If you click on View Record, Astera will show you the field values of the record failing the data quality rule.
Database Diff Processor is one of the four Database Write Strategies offered in Astera. Its purpose is to synchronize the data present in two separate datasets. The object compares the two datasets and performs write actions (insert and update) on the destination table so that both tables contain the same information.
In this use case, we have a sample dataset of customers that is stored in a database table. Currently, this dataset contains 10 records, but two more customer records are to be added later on. Furthermore, updated phone numbers are to be added for two other customers.
We want to write the initial dataset to another database table and ensure that whenever the aforementioned changes are made, they are applied to both tables. To achieve this, we will be using the Database Diff Processor object in Astera.
Drag and drop the Database Table Source object from Toolbox > Sources > Database Table Source onto the dataflow designer. Configure this object so that it reads data from the source table.
To learn how you can configure a Database Table Source object, click here.
Drag-and-drop the Database Diff Processor object from Toolbox > Database Write Strategies > Database Diff Processor onto the dataflow designer. Auto-map all of the elements from the source object to the Database Diff Processor object.
Right-click on the header of the Database Diff Processor object and select Properties.
This will open the Database Connection window. Here, you will have to enter the credentials for the database you want to connect to. Alternatively, you can connect to a recently used database by selecting it from the Recently Used dropdown list.
In this case, we will connect to a test database that contains an empty Customers table.
Once you have entered the required credentials, click Next.
On the next window, you will have to pick a destination table, where the write actions (Insert and Update) will be performed.
In this case, we will pick the empty Customers table that has already been created in this database.
There are a couple of options related to Record Matching at the bottom of the screen:
Select a field to be used for matching records between the source table and the destination table. In this case, we will select CustomerID because it is unique and does not change for each customer.
Check the Case Sensitive option if you want the comparison to be case sensitive. In this case, we will leave this option unchecked.
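As a loose analogy, the combination of a Record Matching field and insert/update write actions behaves like an upsert keyed on that field. The sketch below uses Python's built-in sqlite3 module (SQLite 3.24+ upsert syntax) purely for illustration; Astera's Database Diff Processor is configured entirely through the UI.

```python
# Illustrative analogy only: an upsert keyed on the Record Matching field
# (CustomerID), using Python's built-in sqlite3 module (SQLite 3.24+).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, Phone TEXT)")

source_rows = [("ALFKI", "030-0074321"), ("ANATR", "(5) 555-4729")]
con.executemany(
    "INSERT INTO Customers (CustomerID, Phone) VALUES (?, ?) "
    "ON CONFLICT(CustomerID) DO UPDATE SET Phone = excluded.Phone",
    source_rows,
)
con.commit()
print(con.execute("SELECT * FROM Customers ORDER BY CustomerID").fetchall())
```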
Now that the required table and Record Matching options have been selected, click OK to close the configuration window.
Run the dataflow to write your source data to the destination table. To preview the source data, right-click on the Database Table Source object and select Preview Output.
This is what the source data looks like:
To check whether this data has been written to the destination table, right-click on the Database Diff Processor object and go to Database Table > View Table Data.
The destination table will open in a separate tab within Astera.
The data present in the destination table is the same as that in the source table, showing that we have successfully written the data by running the dataflow.
The source dataset has been updated to include two more customer records. Moreover, two other customers have updated their phone numbers. This is what the source data looks like after the changes have been implemented:
Run the dataflow again to apply these changes to the destination table. This is what the destination table should look like when you open it in Astera after running the dataflow again:
The changes that were made to the source table have automatically been applied to the destination table as well, showing that the Database Diff Processor object has achieved its task.
This concludes using the Database Diff Processor write strategy in Astera.
The Delimited Serializer converts a structured dataset into a single text stream with fields and records separated by delimiters or identified by text qualifiers. Data serialized with delimiters can be shared or stored in a form that allows its original structure to be recovered.
In this document, we will learn how to use the Delimited Serializer to serialize structured data in Astera.
In this case, we are using the Customers table from the Northwind database. You can download this sample data from the following link:
The source file contains customers’ contact information, including their ContactName, Address, PostalCode, Phone, etc., in a structured format.
We want to convert the information contained in multiple fields into a single text stream separated by a delimiter.
To perform this task, we will use the Delimited Serializer object in Astera.
To get the Delimited Serializer object, go to Toolbox > Text Processors > Delimited Serializer and drag-and-drop the object onto the designer.
You can see that the dragged object contains a Text field with an output port and an Input sub-node which is currently empty.
Auto-map source fields by dragging and dropping the top node of the source object, that is Customers, onto the Input node of the transformation object – Delimited Serializer.
Right-click on the object’s header and select Properties.
A configuration window will open as shown below.
Let’s look at the properties on this window.
Field Delimiter – Allows users to specify a delimiter for the source fields from the dropdown list.
Text Qualifier – Allows users to specify qualifiers at the start and end of a text stream. In most cases, the text qualifier encloses an entire record.
Build Operation Type – Contains two options in which a dataset can be serialized:
One Record Per Input – creates a single, delimiter-separated text record for the entire dataset.
One Record Per Transaction – creates one text record per source record, with fields separated by the field delimiter (see the sketch below).
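For intuition, the sketch below shows roughly what the One Record Per Transaction option produces: one delimited, qualified text record per source record. It uses Python's csv module and made-up Customers values; Astera performs the serialization itself.

```python
# Rough illustration of "One Record Per Transaction" serialization using
# Python's csv module; the sample values are made up.
import csv
import io

records = [
    {"ContactName": "Maria Anders", "Phone": "030-0074321"},
    {"ContactName": "Ana Trujillo", "Phone": "(5) 555-4729"},
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
for record in records:
    writer.writerow([record["ContactName"], record["Phone"]])

print(buffer.getvalue())
```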
Let’s leave the properties at their default values and click OK. The data has now been serialized.
To preview the data, right-click on the Delimited Serializer object’s header and select Preview Output from the context menu.
A Data Preview window will open, showing the serialized data with field delimiters.
To store this serialized output, write it to a destination file, or use the data further along in the dataflow.
This concludes using the Delimited Serializer in Astera.
The Delimited Parser in Astera reads and processes a single stream of text in delimited format as input and returns its elements as parsed output. It enables users to transform otherwise semi-structured data into a structured format.
In this document, we will learn to use the Delimited Parser to parse an incoming text stream in Astera.
In this case, we are using the Delimited File Source to extract our source data. You can download this sample data from the following link:
The source file contains customers’ contact information including their name, address, postal code, phone number, etc.
Upon previewing the data, you can see that it is difficult to decipher fields and elements since the data is in a single text stream, with fields and records separated by delimiters. To make sense of this data, each record needs to be parsed into its elements in respective fields.
To do this, we will use the Delimited Parser object.
To get the Delimited Parser object, go to Toolbox > Text Processors > Delimited Parser and drag-and-drop the object onto the designer.
You can see that the dragged object contains a single Text field.
Map the Customer_Info field inside the source object onto the Text field inside the DelimitedParser object.
Right-click on the object’s header and select Properties.
A configuration window will open as shown below.
Let’s look at the properties on this window.
Parse Data Pattern – Contains three patterns in which the dataset can be parsed:
Single Record – Data is parsed into a single record with multiple fields. Users need to provide a field delimiter and, if necessary, a text qualifier.
Multiple Records – Data is parsed into multiple records with one or more fields. Users need to provide a field delimiter as well as a record delimiter.
Field Arrays – Data is parsed into an array of records and fields. Users need to provide a field value delimiter and an array separator.
The source data in this case contains multiple records with many different fields. Therefore, we will set the Parse Data Pattern option to Multiple Records.
Provide a Field Delimiter and a Record Delimiter. The source file also contains a Text Qualifier.
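For intuition, the sketch below parses a small made-up text stream the way the Multiple Records pattern does: a record delimiter splits the stream into records, and the field delimiter and text qualifier split each record into fields. It uses Python's csv module and is illustrative only.

```python
# Rough illustration of the "Multiple Records" parse pattern: split on the
# record delimiter, then on the field delimiter, honoring the text qualifier.
import csv

text = '"Maria Anders","030-0074321"\r\n"Ana Trujillo","(5) 555-4729"\r\n'
for name, phone in csv.reader(text.splitlines(), delimiter=",", quotechar='"'):
    print(name, phone)
```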
Click Next. This is the Layout Builder screen.
Here, write the names of the fields that you want to create.
Click OK. The Delimited Parser object now has new fields in the Output node.
To preview data, right-click on the object’s header and select Preview Output from the context menu.
A Data Preview window will open. Upon expanding the records, you can view the parsed output.
To store this parsed output, you can write it to a destination file or use it for some transformation further in the dataflow.
This concludes using the Delimited Parser in Astera.
The Filter transformation object in Astera is used to filter records based on a pre-defined rule. The records that satisfy the specified rule are retained and can be mapped further in the dataflow, whereas the records that do not satisfy the condition are omitted. The rule or logic used to filter records can either be built from an extensive library of built-in functions, or you can write an expression of your own.
The Filter transformation is quite similar to Data Quality Rules in its functionality. However, unlike Data Quality Rules, which return an error or warning when the rule condition fails while still passing the record downstream, the Filter transformation completely removes any such records. The removed records, as well as their status, will not be accessible to any downstream object on the dataflow, including any type of log.
In this case, we have Customers data for a fictitious organization stored in a Delimited File Source. We want to filter the records in which:
Country = Germany
Contact title = Owner
To extract these records from our source data, we will use the Filter transformation object and write the relevant expression in the Expression Builder to achieve the desired output. We will then write the filtered output to a Fixed Length File Destination.
Retrieve your source data by using the relevant source object from the Sources section in the Toolbox. In this case, our sample Customers data is stored in a delimited file, so we will be using a Delimited File Source object to retrieve this data in the dataflow.
Next, drag and drop the Filter transformation object from the Transformations section of the Toolbox to the designer and map fields from the source object to the Filter transformation object.
Right-click on the Filter transformation object and select Properties.
A Layout builder window will now open. Here you can modify fields and the object layout.
Click Next. This will take you to the Filter Transformation Properties window. Here, you can see the following three sections:
Functions: An extensive library of built-in functions organized in various categories. From here, you can select functions according to your requirement.
Expression: The filter expression will appear in this Expression box. You can either write your own expression or choose from the built-in functions library.
Objects: Contains the object layout. You can double click on any element in the layout to write it in the Expression field.
In this example, we want to extract the records of customers whose ContactTitle is ‘Owner’ and Country is ‘Germany’. For this, we will write the following expression in the Expression box:
Country = “Germany” and ContactTitle = “Owner”
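For reference, the same rule expressed as a plain Python predicate over a couple of made-up records looks like this; it is illustrative only, and in Astera the expression above is all you need.

```python
# Illustrative Python equivalent of the filter expression above.
customers = [
    {"ContactName": "Maria Anders", "ContactTitle": "Owner", "Country": "Germany"},
    {"ContactName": "Ana Trujillo", "ContactTitle": "Sales Agent", "Country": "Mexico"},
]

filtered = [
    c for c in customers
    if c["Country"] == "Germany" and c["ContactTitle"] == "Owner"
]
print(filtered)
```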
After writing the expression, click on the Compile button to check if the expression is correct. If the Compile Status is Successful, the expression is correct. If not, then you need to check and rectify your expression before proceeding to the next window.
Click Next. This will take you to the Config Parameters window where you can further configure and define parameters for the Filter transformation object.
Click Next to proceed to the General Options window. This window consists of options common to most objects in a dataflow, such as:
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be outputted by the object. When this option is unchecked, records with errors will be outputted by the object, and a record message will be attached to the record. This record message can then feed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect statistics.
The Comments input allows you to enter comments associated with this object.
The Filter transformation object has been configured. Click OK.
To preview filtered records, right-click on the Filter transformation object and select Preview Output.
The output would look like this:
You can now write your output to a destination or further transform it by applying some transformation.
In this case, we will write our output to a Fixed Length file destination.
Rename your destination object by double-clicking its header. Here, we will rename it as German_Customers.
This concludes using the Filter transformation object in Astera.
Astera’s Fixed Length File Destination provides the functionality to write data to a Fixed Length File.
To get a Fixed Length File Destination object from the Toolbox, go to Toolbox > Destinations > Fixed Length File Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Fixed Length File Destination object onto the designer.
The dragged destination object is empty right now. This is because the object has not been configured yet.
For the Fixed Length File Destination object to work, it needs to be provided with a data source.
Configure the source object and place it onto the designer next to the Fixed Length File Destination object.
Note: In this case we will be using a Customers sample table from the Database Table Source to write to the Fixed Length File Destination.
Now map the source object to the destination object. The following ways can be used for mapping:
i. By dragging and dropping the parent nodes onto each other for automatic mapping.
ii. By mapping the source parent node by dragging it to the destination parent node manually.
iii. By directly writing the source layout to a Fixed Length File Destination through the source context menu of its parent node.
The fields are now mapped.
To configure the Fixed Length File Destination object, right-click on its header and select Properties from the context menu; a dialog box will open.
Provide the File Path. This is where the fixed length destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
Check the First Row Contains Header option if you want the first row of the destination file to contain field headers.
Record Delimiter - Allows you to select the delimiter for the records in the file. The available choices are the carriage-return line-feed combination, carriage-return, and line-feed. You can also type a record delimiter of your choice instead of choosing from the available options.
In case the records don’t have a delimiter, the Record Length field is used to specify the character length of a single record (see the sketch after these options).
Encoding - Allows you to choose the encoding scheme for the destination file from a list of choices. The default value is Unicode (UTF-8).
Check Append to File (If Exists) to append to an existing file instead of creating a new one. Creating a new file will overwrite any existing file.
Check the Write to Multiple Files option for the data to be saved to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File option. Astera provides extensive user interface capabilities for processing hierarchical structures.
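To illustrate what a fixed length layout means in practice, the sketch below pads each field to a declared width and joins records with a record delimiter (or, when no delimiter is used, records are simply concatenated at a fixed Record Length). The widths and field names are assumptions for illustration only.

```python
# Illustration of fixed length output: each field padded/truncated to a
# declared width; widths and field names are assumed for the example.
layout = [("CustomerID", 10), ("ContactName", 25), ("Country", 15)]

def to_fixed_length(record, record_delimiter="\r\n"):
    line = "".join(
        str(record.get(name, "")).ljust(width)[:width] for name, width in layout
    )
    return line + record_delimiter

print(repr(to_fixed_length(
    {"CustomerID": "ALFKI", "ContactName": "Maria Anders", "Country": "Germany"}
)))
```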
Once the data reading options have been specified on this window, click Next.
The next window is the Layout Builder. On this window, the layout of the fixed length destination file can be modified.
To add a new field to the layout, go to the last row of the layout (Name column), which will be blank, and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select its properties. A new field will be added to the layout.
Note: Adding a new field (Email) to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Deleting the Fax field from the layout.
Right-click on the highlighted row; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it above or below another field in the layout, select the row and use the Move up/Move down icons.
Note: Find the Move up/Move down icons on the top left of the builder.
For example: To move the Country field right below the Region field, select the row and use the Move up icon to move it from the 9th row to the 8th row.
The row is now moved from the 9th position to the 8th position.
Once the object layout is configured, click Next. A new window, Config Parameters, will appear, which allows you to further configure and define parameters for the fixed length destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
A General Options window will appear. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The FixedDest object is now configured according to the changes that were made in the properties window.
Note: The changes that were made in this case are:
Added a new field, Email in the layout.
Moved the Country field below the Region field.
The Fixed Length File Destination object is successfully configured, and the destination file can now be created by running the dataflow.
Upcoming...