Astera is proud to unveil version 10.2 of our industry-leading products offering cutting-edge features and capabilities.
Encompassing numerous areas of operation, the features include:
Microsoft SharePoint functionality, now available via the Cloud Connection object, allows users to easily store and access files while working on the dataflow designer.
A new OCR (Optical Character Recognition) capability that simplifies data extraction and processing.
Additionally, Astera Data Stack's API area has also been upgraded, offering users an even more streamlined experience. This area serves as a one-stop platform within the Astera umbrella, allowing users to consume and manage APIs in a code-free environment.
Finally, Astera 10.2 comes with an updated user interface that includes a wealth of UI improvements and bug fixes, enhancing the overall user experience.
The Report Model component in Astera 10.2 has introduced, modified, and enhanced some new and existing features to make the process of data extraction even more flexible and user-friendly.
The highlights of this release include:
Verification of created fields
The addition of the pattern bar context menu
An option to change data region type.
Additionally, we have introduced:
Optical Character Recognition (OCR), which allows users to read data from PDFs containing scanned documents. These new additions have made the experience of capturing data easier than ever before.
Astera now provides the functionality to extract data from PDFs that contain scanned documents using Optical Character Recognition.
When provided with such a PDF, the tool recognizes it as an image PDF and automatically starts the OCR process. This option makes scanned documents available for extraction, minimizing the effort of manual data entry from such documents. Users can select the Resolution for OCR, allowing them to get the best result for their documents.
Additionally, to ensure correct data extraction, as noise elements can cause erroneous data to be extracted, an Edit Mode is also available for the users to clean and tweak the extracted data.
Edit Mode allows you to deal with the data as a text file and make changes accordingly.
Report Models now have the functionality to verify whether data fields have been captured properly for all data instances by checking for any non-blank character adjacent to an instance of a data field. This option gives users a one-click check for the data fields they have created.
Additionally, to allow users better visibility of the erroneous fields, navigation between instances of the data field is also provided along with an option to auto-adjust field lengths for all data fields within the selected data region.
Users can now access wildcards and other additional features for patterns in a report model through a context menu by right-clicking on the pattern box.
Now, if need be, users can change a data region to an append region and vice versa within the Model Layout panel.
This allows users flexibility in changing the model layout as they are creating their extraction template.
The multipart format is a way of structuring data in an API request or response that allows multiple files or data types to be transmitted together as a single unit.
In Astera Data Stack, on the consumption side, the multipart format can be used to simplify the process of uploading or downloading large files, or to send a single request that contains both file data and metadata.
Now, you can consume APIs in Astera using an API client that supports multipart content.
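As a generic illustration of the multipart format itself (outside Astera), a file and its metadata can be sent together in a single multipart/form-data request; the endpoint and field names below are hypothetical, using the Python requests library:

import requests

# Hypothetical endpoint and field names, for illustration only
url = "https://api.example.com/v1/documents"
with open("invoice.pdf", "rb") as f:
    files = {"file": ("invoice.pdf", f, "application/pdf")}   # binary file part
    data = {"department": "finance", "uploadedBy": "jdoe"}    # metadata parts
    response = requests.post(url, files=files, data=data)     # sent as multipart/form-data
print(response.status_code)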
AWS Signature authentication is the process of verifying the authenticity of requests made to Amazon Web Services (AWS) using the AWS Signature method.
This authentication process involves calculating a digital signature for each request using the requester’s access key and secret access key, along with details about the request being made. AWS verifies the signature against the user’s access credentials and grants access to the requested resources if the signature is valid.
The AWS Signature authentication method ensures that requests are securely transmitted and that only authorized users can access AWS resources.
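For illustration, the core of AWS Signature Version 4 is a chain of HMAC-SHA256 operations that derives a signing key from the secret access key and then signs a canonical "string to sign". The sketch below shows only that derivation, with placeholder credentials; constructing the canonical request is omitted:

import hashlib
import hmac

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sigv4_signature(secret_key: str, date_stamp: str, region: str, service: str, string_to_sign: str) -> str:
    # Derive the signing key: date -> region -> service -> "aws4_request"
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    # Sign the string to sign and return the hex-encoded signature
    return hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

# Placeholder values for illustration only
print(sigv4_signature("EXAMPLE_SECRET", "20240101", "us-east-1", "s3", "example-string-to-sign"))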
NTLM (NT LAN Manager) authentication is a Microsoft proprietary authentication protocol used to authenticate users in a Windows-based network.
It provides secure authentication by using a challenge-response mechanism, where the server sends a challenge to the client, and the client sends a response that is encrypted using a hash of the user’s password.
NTLM authentication is used in various Microsoft products, including Windows, Internet Explorer, and Microsoft Office.
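Outside Astera, one common way to exercise NTLM from a script is the third-party requests-ntlm package; the URL, domain, and credentials below are placeholders:

import requests
from requests_ntlm import HttpNtlmAuth  # pip install requests-ntlm

# Placeholder URL, domain, and credentials for illustration only
response = requests.get(
    "https://intranet.example.com/api/reports",
    auth=HttpNtlmAuth("EXAMPLEDOMAIN\\jdoe", "password"),
)
print(response.status_code)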
A raw preview request and response feature allows API developers to view the exact request and response payloads being exchanged between clients and servers in their APIs.
This feature provides a detailed look at the headers, body, and metadata of the HTTP request and response, which can help API developers debug issues, test APIs, and optimize performance. By using raw preview request and response capabilities, API developers can gain a deeper understanding of how their APIs are being used and troubleshoot issues quickly and efficiently.
Curl is a command-line tool that can be used to send HTTP requests to APIs and retrieve the respective responses.
It allows API developers and testers to easily interact with APIs and perform tasks such as testing, debugging, and troubleshooting. Curl supports various HTTP methods such as GET, POST, PUT, and DELETE, and can handle HTTP headers, cookies, and authentication.
It is a simple yet powerful tool that is widely used in API development and management.
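For example, a minimal curl invocation against a hypothetical endpoint might look like this:

curl -X POST "https://api.example.com/v1/orders" -H "Content-Type: application/json" -d '{"orderId": 123, "status": "shipped"}'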
API logging is the process of keeping track of how an application programming interface (API) is being used.
It helps to understand how often the API is being used, how long each request takes, and any errors that occur. API logging can be used for troubleshooting, monitoring performance, and identifying security threats.
The logs can be stored locally or in a cloud-based system, where they can be analyzed to provide insights.
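As a minimal sketch of the idea (not Astera's implementation), an API client can log the method, URL, status code, and duration of each call:

import logging
import time
import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("api")

def logged_get(url: str) -> requests.Response:
    # Record how long the request takes and whether it succeeded
    start = time.monotonic()
    response = requests.get(url)
    elapsed_ms = (time.monotonic() - start) * 1000
    logger.info("GET %s -> %s in %.1f ms", url, response.status_code, elapsed_ms)
    return response

# Hypothetical endpoint for illustration only
logged_get("https://api.example.com/v1/health")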
The Install Manager installs the dependencies required for running auto-generate layout (AGL) and optical character recognition (OCR) on your system/machine.
The auto-generate layout option allows for the generation of an extraction template at the click of a button. With optical character recognition, scanned PDFs can be processed by Astera to extract data.
AGL was introduced in Astera 10.0, and OCR has been introduced in Astera 10.2.
When you install the client and the server, you’ll see two install managers (one for the client and one for the server) installed.
You can run the Install Manager directly from there, or open the client under an Admin account and go to Tools > Run Install Manager to install the dependencies.
Note: If the client and server are on the same machine, then you need to run only one Install Manager out of the two (client and server). However, if the client and server are on different machines, then you’ll have to run the install manager for the client and the server on their respective machines.
You can learn more about the install manager and its setup here.
Project refresh has been greatly improved.
Project loading times have been improved.
A new UI has been implemented for the ‘Add New Item’ window in the Project with better-looking icons and a side panel that shows a description.
Improvements in the Job trace window have been made.
Deployment selections have been improved.
Scheduler refresh has been improved to work more efficiently.
This concludes the release notes for Astera 10.2.
Astera Data Stack lets users design an API flow, which opens with Request and Response objects already present in the flow. These can then be used in a pipeline according to the user's application.
A simple, configured API flow can look like this:
Note: API Consumption is not a new module; it was already present in Astera Data Stack and has now been integrated with the rest of the API functionality.
When it comes to the Consumption of APIs, Astera lets users configure an API Connection, its corresponding API Client object, along with an API Browser to maintain various imported or custom API Collections.
Astera Data Stack also makes use of various HTTP methods for Consumption such as GET, POST, PUT, DELETE, and PATCH.
More information related to HTTP Methods can be found here.
Apart from that, the product also offers Pagination functionalities, details for which can be found here.
Enhancements to API Consumption
Pagination
Automated Read till End options for Page Number and Offset-limit paginations (a client-side sketch follows this feature list).
Cursor Pagination using body fields.
Pagination support added for POST Requests
OAuth2
OAuth2 token caching and auto-refresh.
New features of API Consumption
OAuth2 and E-Tag
OAuth 2 Grant Flow: Authorization Code with PKCE
E-tags for request caching and concurrency control
C-API Connectors
Create and manage Custom C-API Connectors
The C-API connectors library includes:
Zendesk Support
Zendesk Sales CRM
HubSpot CRM
Box API
Other New Features
Auto-redirect API calls.
Use of Default browser for authentication as an alternative to Embedded Browser.
Import Postman API collections.
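The Read till End behavior for page-number pagination, mentioned in the list above, can be pictured with a generic client-side loop. This is a sketch only; the endpoint and parameter names are hypothetical and this is not Astera's internal implementation:

import requests

def read_all_pages(base_url: str, page_size: int = 100):
    # Keep requesting the next page until the API returns an empty page
    records, page = [], 1
    while True:
        resp = requests.get(base_url, params={"page": page, "limit": page_size})
        resp.raise_for_status()
        batch = resp.json().get("data", [])
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records

# Hypothetical endpoint for illustration only
all_rows = read_all_pages("https://api.example.com/v1/customers")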
The Server Browser in Astera Data Stack lets the user manage and publish APIs once they have been deployed.
A wide range of functionalities are offered in the deployment of APIs, including setting authentication and security functions.
Within the Server Browser, user roles can be assigned, and specific resources can be provided to each role, respective to their area of application. The Server Browser can also auto-generate a Swagger URL.
Once the APIs have been deployed, they can also be viewed in a dashboard present in Astera Data Stack.
Finally, users can also utilize Astera for logging and tracing.
Astera 10.3 is here, brimming with excitement and a sea of new features!
With Astera's new AI Automapping feature, field mapping becomes easier, alongside seamless connectivity through providers such as Azure SQL, Google BigQuery, and many more.
Astera’s AI-powered data extraction is a game-changer, while Astera Data Stack's warehousing component introduces AI-Select and more. Unlock the full potential of your APIs with enhanced connectivity options and new features in Astera Data Stack.
Finally, experience the power of the Data Analytics Workbench, and refine your data with ease using Astera Data Prep. Elevate your data game with Astera 10.3!
Windows authentication is a security feature that allows users to log into a system using their Windows credentials.
Astera Data Stack leverages this authentication method and provides the option to register new users using Windows authentication through its Server Browser interface.
In response to Microsoft Outlook discontinuing support for Basic Authentication and advocating the use of Modern Authentication, the SendMail object in workflow tasks has been updated.
You can view what Microsoft had to say about this here.
It now incorporates a new feature of Modern Authentication, enabling users to seamlessly add authentication credentials directly from within the SendMail object.
This enhancement simplifies the process and ensures compatibility with the latest authentication standards recommended by Microsoft.
The option has also been added in the Cluster Settings as seen below:
Existing Astera customers can easily upgrade to version 10.3.1 by executing an .exe upgrade script, which automates the repository update to the latest release.
This streamlined approach enhances the efficiency and effectiveness of the upgrade process, ensuring a smoother transition for users.
Note: This upgrade applies to v10.0 and later versions. Earlier versions cannot be upgraded in place and will need a clean repository as part of the upgrade.
The Catalog feature in Astera Data Stack is a centralized repository where one can store artifacts and share them with other users as needed.
The artifacts within the catalog are stored in a Catalog table. The security aspect of the Resource Catalog lets the user grant access only to the people with whom they wish to share the artifacts in the catalog.
For more information on Resource Catalog, please visit the documentation site here.
With the release of Astera 10.3, there have been quite a few new developments in the connectors’ domain.
Microsoft SharePoint is an enterprise document management and collaboration platform that helps organizations manage critical content. Its enterprise content management capabilities streamline flows and centralize important content to enhance collaboration.
The SharePoint provider is present within the Cloud Connection object dropdown.
Simply drag and drop the Cloud Storage Connection object from the toolbox onto the designer.
Right-click on the object header and select Properties from the context menu.
This will open a new window.
Select Microsoft SharePoint Document Library from the Providers dropdown menu.
Note: The SharePoint connection can be accessed in any object where Cloud files are available.
You can learn more about the SharePoint connector here.
The Azure SQL Database is a fully managed platform as a service (PaaS) engine that handles most database management functions such as upgrading, patching, backups, and monitoring without user involvement.
In Astera Data Stack, users can access Azure SQL Databases using the Database Table Source, Database Table Destination, DB Lookup, SQL Statement Lookup, and Database Write Strategies objects. They can also connect with the Run SQL Script task in a workflow.
Google BigQuery is a serverless, highly scalable data warehouse that comes with a built-in query engine. The query engine can run SQL queries on terabytes of data in a matter of seconds, and on petabytes in minutes.
This kind of performance is achieved without having to manage any infrastructure and without having to create or rebuild indexes.
In Astera Data Stack, users can connect with Google BigQuery as a database source or destination. As a source, both flat and hierarchical data can be read; only flat data can be written to a Google BigQuery destination.
Azure Data Lake Gen 2 is a cloud-based big data storage and analytics solution provided by Microsoft Azure. It offers scalable and cost-effective storage for structured, semi-structured, and unstructured data.
The Azure Data Lake Gen 2 provider is present within the Cloud Connector object in Astera.
Improvements have been made to the user-friendliness of the Cloud Browser.
Changing the Cloud Connection and browsing cloud folders have been made easier in Astera 10.3. The Cloud Browser's functionality with SharePoint has also been improved.
In the File System Items Source object, multiple filters are supported for both local and cloud connections.
In Astera 10.3, the overall user interface of Astera has been improved and revamped. There are quite a lot of new features being introduced in this version. Let us take a look at them.
To provide users with greater control over their report editor, we have introduced a new feature called Pages to Read. This feature enables users to filter and selectively display specific PDF pages within the report editor, ensuring a focused and efficient document analysis experience.
The Auto Create Table option empowers users to select and create tables within the document seamlessly. With this addition, users can automate the table creation process, saving valuable time and effort.
The Report Model Path Parameterization feature introduces enhanced flexibility to dataflows by enabling users to specify a customizable Report Model path in addition to the Report Source path. This crucial enhancement allows for runtime parameterization, granting the ability to dynamically change the Report Model or apply different templates to various report sources.
As a result, automating data extraction from Report Models becomes significantly easier, streamlining the overall data processing and analysis workflows. This feature empowers users with increased control and adaptability, facilitating more efficient and versatile data-driven decision-making processes.
Astera now uses AI to recommend report model templates, allowing you to automatically generate models for multiple source files at once. By specifying the layout and document type, Astera recommends the most suitable model templates, saving you valuable time and energy when building your data extraction processes.
With this new feature, you can streamline your workflow and eliminate the need for manual data extraction. In this document, we will see how to use this feature to create the report models.
To view more information on AI-Powered Data Extraction, click here.
Astera now provides the functionality to extract data from PDFs that contain scanned documents using Optical Character Recognition.
When provided with such a PDF, the tool recognizes it as an image PDF. However, the Use OCR option must first be enabled manually by the user. This option makes scanned documents available for extraction, minimizing the effort of manual data entry from such documents. Users can select the Resolution for OCR, allowing them to get the best result for their documents.
Additionally, to ensure correct data extraction, as noise elements can cause erroneous data to be extracted, an Edit Mode is also available for the users to clean and tweak the extracted data.
Edit Mode allows you to deal with the data as a text file and make changes accordingly.
To learn more about loading PDFs with OCR, click here.
Report Models now have the functionality to verify whether data fields have been captured properly for all data instances by checking for any non-blank character adjacent to an instance of a data field. This option gives users a one-click check for the data fields they have created.
Additionally, to allow users better visibility of the erroneous fields, navigation between instances of the data field is also provided along with an option to auto-adjust field lengths for all data fields within the selected data region.
To learn more about Data field verification, click here.
Users can now access wildcards and other additional features for patterns in a report model through a context menu by right-clicking on the pattern box.
Now, if need be, users can change a data region to an append region and vice versa within the Model Layout panel.
This allows users flexibility in changing the model layout as they are creating their extraction template. To learn more about this feature, click here.
Astera Dataprep is a dynamic platform designed for rigorous data cleansing, transformation, and preparation activities. With its user-oriented and preview-focused interface, it offers significant functionality and visibility to streamline the preparation process.
The system allows for quick operations by seamlessly interchanging between scripting and point-and-click methodologies. Serving as a crucial intersection of data engineering and data science, Astera Data Prep is an invaluable tool in any data-driven operation.
Astera Dataprep is a data manipulation tool that offers interactive data correction, ATL scripting with auto-completion, and UI-ATL synchronization. It provides smooth navigation, action history tracking, and comprehensive data quality assurance.
The tool promotes scripting efficiency and reusability with template scripts and supports real-time visual insights for data analysis. It enables rich data transformations, including resolving cardinalities and merging datasets. Astera Dataprep streamlines data preparation, enhances productivity, and ensures data accuracy and integrity.
Dataprep offers 60+ ATL commands to enable a comprehensive set of data preparation strategies. ATL is a smart scripting language integrated with IntelliSense, where data engineers benefit from script auto-completion, reducing the need for constant reference to documentation. This proprietary language can generate code snippets, enabling users to fill in the required fields effortlessly.
This streamlined approach enhances productivity, enabling data engineers to focus on the specific requirements of their data preparation tasks without the burden of repetitive syntax or command structure.
It hosts all ATL commands and command-related operations. It is a multipurpose artifact that also serves as a preparation process navigation browser.
The Data Source Browser, while not a new feature, is a vital part of Dataprep. It hosts all file sources, catalog sources, cloud sources, and project sources to be imported into the Dataprep artifact.
Note: While the Data Source Browser is essential in Dataprep, it is not specific to it.
Dataprep scripts are reusable and hence can be used as a source as well as a transformation in other artifacts such as dataflow, workflow, and analytics workbench.
The grid view hosts a preview-centric interactive grid that automatically updates in real-time to display the transformed data upon each transformation/modification. It is a dynamic grid that provides instant feedback on data quality.
The Dataprep Profile Browser is a side window providing a comprehensive view of the data with graphs, charts, and field-level profile tables. It keeps a check on data health and highlights the presence of invalid entries, missing values, duplicates, etc.
Within Dataprep, this is a borderless and headerless 2x2 grid that enhances the experience of data reading, joining, union, and lookup.
Users can navigate smoothly using point-and-click actions in the ATL editor. This includes action history tracking, allowing users to review and backtrack changes made during the data preparation process for transparency, editing, and control.
Astera Data Stack's Data Model component has also introduced a handful of new features for the Astera 10.3 release.
The AI Select feature in Astera Data Stack assists users in identifying potential Fact and Dimension candidates from their selected entities. To do so, users first select Build Dimensional Model from the main menu bar.
In cases where users are unsure about classifying entities as Facts or Dimensions, this feature leverages AI capabilities to automatically determine the appropriate classification, streamlining the data modeling process.
Astera Data Stack incorporates an advanced AI-powered feature enabling users to deduce relationships between entities. This capability extends to recognizing self-referencing relationships as well as associations between the fields of different entities.
By leveraging AI algorithms, Astera Data Stack facilitates automated inference of intricate entity relationships for efficient data modeling.
With the release of Astera 10.3, the area of APIs also brings a plethora of new features and improvements with it. Astera is a robust platform that enables seamless integration and efficient management of APIs.
It provides a comprehensive set of tools to create, publish, secure, and monitor APIs.
The API browser provides a convenient option to import pre-built and pre-tested CAPI connectors directly from Astera’s GitHub repository. These connectors are carefully curated and include a comprehensive list of endpoints that have been thoroughly tested and configured for seamless consumer use.
This option allows users to easily access and integrate these connectors into their projects, ensuring reliable and efficient connectivity with the associated APIs.
A developer portal, also known as an API portal or API developer portal, is a website or platform that serves as a central hub for developers who are interested in consuming or integrating with APIs. It provides documentation, and support for developers to understand, explore, and use APIs effectively.
Astera 10.3 brings a beta release of this portal.
Multipart/form-data is a MIME (Multipurpose Internet Mail Extensions) media type used for sending binary data or files along with other form fields in HTTP requests.
The Request Publish object now supports this format type, allowing the designing of APIs that function to upload files and download files.
Client certificates are digital certificates that are used by clients (such as web browsers or client applications) to authenticate themselves to a server during a secure communication process, typically over HTTPS (HTTP over SSL/TLS).
Options to generate, import, and export client certificates allow users to manage them effectively and integrate them into their security infrastructure.
We have integrated our tool with the Swagger UI component, allowing us to display the Swagger files of deployed APIs in a well-formatted and user-friendly manner.
This integration provides an enhanced user interface and experience for viewing and interacting with the API documentation.
We have introduced the option to generate test flows from the Server Browser. A test flow can now be generated after the APIs have been deployed.
We can create a test flow either for a singleton API or for the entire group of APIs.
The multipart format is a way of structuring data in an API request or response that allows multiple files or data types to be transmitted together as a single unit.
In Astera, on the consumption side, the multipart format can be used to simplify the process of uploading or downloading large files, or to send a single request that contains both file data and metadata.
Now, you can consume APIs in Astera Data Stack using an API client that supports multipart content.
AWS Signature authentication is the process of verifying the authenticity of requests made to Amazon Web Services (AWS) using the AWS Signature method.
This authentication process involves calculating a digital signature for each request using the requester’s access key and secret access key, along with details about the request being made. AWS verifies the signature against the user’s access credentials and grants access to the requested resources if the signature is valid.
The AWS Signature authentication method ensures that requests are securely transmitted and that only authorized users can access AWS resources.
NTLM (NT LAN Manager) authentication is a Microsoft proprietary authentication protocol used to authenticate users in a Windows-based network.
It provides secure authentication by using a challenge-response mechanism, where the server sends a challenge to the client, and the client sends a response that is encrypted using a hash of the user’s password.
NTLM authentication is used in various Microsoft products, including Windows, Internet Explorer, and Microsoft Office.
A raw preview request and response feature allows API developers to view the exact request and response payloads being exchanged between clients and servers in their APIs.
This feature provides a detailed look at the headers, body, and metadata of the HTTP request and response, which can help API developers debug issues, test APIs, and optimize performance. By using raw preview request and response capabilities, API developers can gain a deeper understanding of how their APIs are being used and troubleshoot issues quickly and efficiently.
Curl is a command-line tool that can be used to send HTTP requests to APIs and retrieve the respective responses.
It allows API developers and testers to easily interact with APIs and perform tasks such as testing, debugging, and troubleshooting. Curl supports various HTTP methods such as GET, POST, PUT, and DELETE, and can handle HTTP headers, cookies, and authentication.
It is a simple yet powerful tool that is widely used in API development and management.
API logging is the process of keeping track of how an application programming interface (API) is being used.
It helps to understand how often the API is being used, how long each request takes, and any errors that occur. API logging can be used for troubleshooting, monitoring performance, and identifying security threats.
The logs can be stored locally or in a cloud-based system, where they can be analyzed to provide insights.
Support for the following feature has been added to the product:
XML/SOAP APIs in the API Client
The Data Analytics component of the Astera Data Stack brings us the Data Analytics Workbench, alongside quite a lot of new features. From Linear Regression to Distribution Plot objects, this component has quite a lot to offer to the users.
The Analytics Workbench is a powerful tool for designing and visualizing data science models and analytical graphs. With its intuitive drag-and-drop interface, users can easily construct complex analytical workflows and explore data patterns.
This artifact streamlines the process, enabling efficient data analysis and visualization in a user-friendly manner.
The SMD designer offers a grid-based interface that facilitates the design and visualization of models using drag-and-drop functionality. Users can easily arrange components, connect data flows, and configure properties to create sophisticated models.
This intuitive interface enhances the ease and efficiency of model design and visualization tasks.
The Linear Regression object empowers users to establish and model the correlation between a quantitative response variable and one or multiple independent variables. It achieves this by fitting a linear equation to the provided data.
Linear regression is a diagnostic and predictive analytics technique that offers insights into data relationships and makes future predictions based on observed patterns.
In Analytics Workbench, users have the flexibility to choose between four model estimation types:
Ordinary Least Square
Weighted Least Square
Generalized Least Square
Penalized Least Square
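As a small numerical illustration of the first two estimation types (a generic NumPy sketch with toy data, not Astera's engine), ordinary least squares minimizes the unweighted squared residuals, while weighted least squares rescales each observation by the square root of its weight:

import numpy as np

# Toy data: one predictor plus an intercept column
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x), x])

# Ordinary Least Squares
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Weighted Least Squares: scale rows by the square root of each observation's weight
w = np.array([1.0, 1.0, 2.0, 2.0, 1.0])
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print(beta_ols, beta_wls)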
A Decision Tree is a supervised learning algorithm utilized for classifying a dependent variable based on features within a dataset. In the Analytics Workbench, the Decision Tree object offers users a range of options such as test-train split configurations, null value handling, scaling and normalization techniques, decision tree criteria and splitting methods, as well as pruning strategies.
These features enhance the flexibility and customization of decision tree-based classification tasks.
The Generalized Linear Model object enables users to define a flexible generalization of linear regression that allows for response variables with error distribution models other than a normal distribution.
There are two components of the Generalized Linear Model (GLM).
Family Parameter
Gaussian
Binomial
Poisson
Gamma
Link Function
Identity
Log
Probit
Logit
Square Root
Inverse
A Generalized Linear Model uses a specific combination of the link functions and family parameters for a suitable fit to the data.
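For instance, count data is commonly modeled with the Poisson family and its default log link. The sketch below is a hedged illustration using the open-source statsmodels package with toy data, not Astera's own implementation:

import numpy as np
import statsmodels.api as sm

# Toy count data with a single predictor
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2, 3, 6, 10, 15])
X = sm.add_constant(x)

# The Poisson family uses the log link by default, i.e. log(mu) = X @ beta
model = sm.GLM(y, X, family=sm.families.Poisson())
result = model.fit()
print(result.params)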
The Pre-Analytics Testing object wraps several statistical tests that a user performs on the data to determine an accurate statistical model to fit the source data.
The Pre-Analytics Testing object presents users with established parametric and non-parametric tests to evaluate the data against the underlying statistical assumptions.
The Pre-Analytics Testing object hosts the following tests and graphs on each screen:
Heteroscedasticity scatter plot
Multicollinearity bar chart
Outlier Detection box plot
Normality Detection histogram
The Correlation Analysis object enables users to compute Covariance and different types of Correlation such as Heterogenous Correlation, Partial Correlation, and Correlation - Significance Level between data fields.
The strength of the association is measured by computing correlation coefficients. In Analytics Workbench, users have the option to compute different types of correlation coefficients.
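As a quick generic illustration (NumPy, with toy data, not Astera's object), the Pearson correlation coefficient is the covariance of two fields divided by the product of their standard deviations:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 10.1])

covariance = np.cov(x, y)[0, 1]
pearson_r = np.corrcoef(x, y)[0, 1]  # covariance normalized by the two standard deviations
print(covariance, pearson_r)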
A contingency table, also known as a cross-tabulation or a two-way frequency table, is a table used to summarize the relationship between two categorical variables. It displays the distribution of the data by counting the number of occurrences of each combination of categories.
In the analytics workbench, we support multiple contingency types:
Frequency
Probability
Percentage
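To make the three contingency types concrete, here is a generic pandas sketch (toy data, not Astera's object) showing the same cross-tabulation as counts, probabilities, and percentages:

import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "status": ["Active", "Inactive", "Active", "Active", "Inactive"],
})

frequency = pd.crosstab(df["region"], df["status"])                     # raw counts
probability = pd.crosstab(df["region"], df["status"], normalize="all")  # proportions of the total
percentage = probability * 100                                          # percentages
print(frequency, probability, percentage, sep="\n\n")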
The Distributional Plots object allows users to visualize categorical data variables using mainstream plots such as:
Bar charts
Pie charts
Histograms
Frequency polygons
Spike plots
The Distributional Plots object has a drill-down functionality and an interactive interface with several configuration options. It is used to visualize a general profile of the user’s data.
The Basic Plots object allows a user to understand and analyze their data/transformations through visual graphs such as:
Line charts
Scatter plots
They provide interactive visuals with growing capabilities and features that enable an in-depth understanding of the nature of a user’s data and its trends.
The Predictive Analysis object helps users predict the behavior of the dependent variable on a given test dataset. To make predictions, the object requires information about the analytical model fitted on a particular training dataset.
Once the champion model is selected in the analytics workbench, it can then be used in a dataflow through the Predictive Analysis object for predictions.
The Undo/Redo manager allows users to undo/redo any actions that they have performed in the analytics workbench.
The Install Manager installs the dependencies required for running auto-generate layout (AGL) and optical character recognition (OCR) on your system/machine.
Auto-generate layout allows for the generation of an extraction template at the click of a button. With optical character recognition, scanned PDFs can be processed by Astera to extract data.
AGL was introduced in Astera 10.0, and OCR has been introduced in Astera 10.2.
When you install the client and the server, you’ll see two install managers (one for the client and one for the server) installed.
You can run the Install Manager directly from there, or open the client under an Admin account and go to Tools > Run Install Manager to install the dependencies.
Note: If the client and server are on the same machine, then you need to run only one Install Manager out of the two (client and server). However, if the client and server are on different machines, then you’ll have to run the install manager for the client and the server on their respective machines.
You can learn more about the install manager and its setup here.
Project refresh has been greatly improved.
Project loading times have been improved.
A new UI has been implemented for the ‘Add New Item’ window in the Project with better-looking icons and a side panel that shows a description.
Improvements in the Job trace window have been made.
Deployment selections have been improved.
Scheduler refresh has been improved to work more efficiently.
This concludes the release notes for Astera 10.3.
Astera 10.5, where precision meets innovation in data governance. Unleash the power of smooth data management with our Governance platform's features, from intuitive UI enhancements to advanced AI-driven data enrichment and profiling.
Astera's Access Management ensures secure data marketplace navigation, while our Business Glossary, generated intelligently by AI, adds clarity to your vocabulary.
Experience a refined user interface, and optimized AI functionalities, setting a new benchmark in data governance.
Elevate your data trek with Astera 10.5 – where efficiency, visibility, and performance converge effortlessly, all within an intuitive drag-and-drop interface.
Advanced search and filtering capabilities help users find exactly what they are looking for. The Astera Governance platform offers these features and much more.
We've integrated AI to automatically generate business titles, descriptions for assets/artifacts, and field descriptions, enhancing clarity and efficiency in our processes.
Data profiling refers to the process of examining, analyzing, reviewing, and summarizing data sets. The Astera Governance platform offers advanced functionality that encompasses Data Profiling.
Data quality is a measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability, and whether it's up to date.
A business glossary is a set of data-related terms and definitions. The Astera Governance platform also supports a business glossary, which is generated using AI.
The Access Management section of the Astera Data Governance platform lets the owner set resources to each of the users, based on their roles.
Enhancements have been made to the Cloud Browser functionality, refining source parameter usage in FTP List, and improving password handling in the File Transfer Task.
Improvements in user experience include better informative messaging during server downtime and streamlined communication for file uploads/downloads.
User interface improvements encompass enhanced homepage pagination. The Edit Toolbar button has a polished appearance, and project explorer panels feature icons for improved visibility.
The Ask AI feature has been enhanced, and UI improvements contribute to a more user-friendly experience. Forward/back buttons in the Job Progress window have been improved, enhancing usability.
Improvements in the AI Mapper performance and error handling have been made. The 'Build Using AI' feature has seen refined UI improvements for seamless user interaction.
AI Mapper optimizations enhance performance, and parsing of Functional Groups in EDI Source has been improved.
UI improvements for the Build Using AI feature have been made. Cluster repository building now features clearer communication.
For existing customers, a repository upgrade and service restart are required. It is recommended that upgrading customers use the Repository Upgrade Utility.
This concludes the Astera 10.5 Release Notes.
Astera 10.4 revolutionizes data management with Google BigQuery and Azure Data Lake connectors, Excel Worksheet Parameterization, and AI-driven mapping precision.
Furthermore, explore advanced API Management capabilities, Cloud Browser in Scheduler, refined AI features in the Astera Report Model, and innovations like GraphQL APIs.
Experience a data revolution with Astera 10.4's user-friendly drag-and-drop interface, simplifying complex tasks without the need for coding. Elevate your data journey as the intuitive design puts advanced management capabilities at your fingertips, ensuring a seamless and efficient process.
Unleash the potential of your data management with this intuitive and transformative release.
The Worksheet Parameterization feature empowers users to efficiently extract data from specific worksheets in an Excel file through convenient looping mechanisms. When accessing the configuration parameter screen, the option to define the desired worksheet is displayed.
This selection is made flexible and customizable as the worksheet option can be parametrized using variables or a configuration file. This comprehensive feature enhances data retrieval processes, simplifying and optimizing data handling from multiple worksheets in Excel.
In Astera Data Stack, the AI Automapper utilizes semantic relationships to facilitate field mapping. By analyzing the context and meaning of the fields, it intelligently establishes connections and maps them accordingly.
This advanced approach streamlines the process, enhancing accuracy and efficiency in field mapping tasks.
The grid is now better aligned, making it easier to sort and filter columns. Connected to an intuitive interface and to the expression language, this enhancement is a step closer to a more streamlined experience.
MongoDB is a document-oriented database in which one collection holds different documents. The MongoDB Destination object in Astera Data Stack provides functionality to write data onto it. This component provides functionality to control how data should be written in collections.
In Astera Data Stack, users will be able to connect with Google BigQuery as a database source or destination. As a source, both flat and hierarchical data can be read. For destination, flat data can be written to Google BigQuery.
Azure Data Lake Gen 2 is a cloud-based big data storage and analytics solution provided by Microsoft Azure. It offers scalable and cost-effective storage for structured, semi-structured, and unstructured data.
The Azure Data Lake Gen 2 provider will be present within the Cloud Connector object in Astera.
In Astera API Management, the user can now make API requests in any content type by providing the content type along with its serialized content string.
In Astera API Management, the user can now make API requests using the application/x-www-form-urlencoded payload content type.
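For reference, here is a generic example (using the Python requests library and a hypothetical endpoint, not Astera itself) of an application/x-www-form-urlencoded request; passing a plain dictionary as data produces this content type by default:

import requests

# Hypothetical endpoint; requests encodes a dict passed via `data` as
# application/x-www-form-urlencoded by default
response = requests.post(
    "https://api.example.com/v1/token",
    data={"grant_type": "client_credentials", "client_id": "my-app"},
)
print(response.status_code)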
Define a custom response for your deployed APIs when the flow processing does not give an output. This gives developers flexibility in designing APIs according to the desired standards.
Improvements have been made to the user-friendliness of the Cloud Browser. Cloud Browser’s support has been added to the Scheduler.
After utilizing the AI Feature to generate the layout, we noticed that certain fields were being replicated within the table collection region.
However, through thorough investigation and refinement of our prompt, we managed to reduce field duplication by around 80%, based on our analysis of the implemented enhancements.
The output generated by our AI Feature used to be stored in a system folder, but there has been a modification to this functionality. Now, the cache is stored in a repository table named AICache.
We've now incorporated an option to sort or unsort the layout of the model. This enhancement adds a sorting feature accessible through the context menu in various regions. By clicking the new sort button, users can conveniently arrange the created fields in alphabetical order.
These changes will be immediately reflected in the data preview as well, and this sorting functionality is applicable across all types of regions, ensuring a more organized experience.
We have added an additional document type support for our AI Feature to extract data and create report models from Account Statements.
GraphQL APIs are a query language and runtime for APIs, enabling clients to request the data they need, reducing over-fetching and under-fetching. They offer a more efficient, flexible, and self-documenting approach to data retrieval and manipulation compared to traditional REST APIs.
Astera API Management now supports the use of GraphQL APIs as the input content type for an API. The feature can be seen within the Input Content Type drop-down menu of the API Client object.
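As a generic illustration of the GraphQL-over-HTTP convention (the endpoint and schema below are hypothetical and this is not Astera-specific), a query is typically sent as a JSON body with a single query field:

import requests

query = """
query {
  customers(first: 5) {
    id
    name
  }
}
"""

# Hypothetical GraphQL endpoint for illustration only
response = requests.post("https://api.example.com/graphql", json={"query": query})
print(response.json())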
Astera now provides the functionality to extract data from PDFs that contain scanned documents using Optical Character Recognition. This option has now made scanned documents available for extraction to users, minimizing the effort of manual data entry from such documents.
Users can select the Resolution for OCR, allowing them to get the best result for their documents.
The Toast Notification feature in Astera Data Stack has seen quite a lot of improvement and enhancement, making it more efficient and seamless.
In Astera Data Stack, the AI mapping approach has been changed. The product is now implementing a Waterfall model for this feature.
In Astera 10.4, many bugs have been addressed, and the user experience has been improved by enabling automatic refresh after actions like adding a catalog item; manual refresh is no longer required.
Support for the Report Source object has now been added to the resource catalog.
The Azure SQL Database is a fully managed platform as a service (PaaS) database engine that handles most of the database management functions such as upgrading, patching, backups, and monitoring without user involvement.
In Astera 10.4, the Azure SQL Server authentication is available.
This concludes the Astera 10.4 release notes.
The Astera Data Stack has evolved quite a bit in recent years. Astera 10.1, our latest release, brings an armada of new features and enhancements, including some new connectors, GIT functionality, and more!
Astera 10.1 also brings Astera's API Component to the user. With Astera, users can now enjoy the competitive edge of Astera’s powerful ETL engine to create code-free integrations and publish them through natively designed REST APIs to enable real-time data sharing between different departments within an organization, across platforms, or external partners without compromising security.
To further talk about the details of each component, here is what Astera 10.1 has to offer.
MongoDB Connector (Beta)
MongoDB is a source-available cross-platform document-oriented database program. It lets the user store data in flexible, JSON-like documents with optional schemas. This means that fields can vary from document to document and data structure can be changed over time.
With the release of Astera 10.1 comes the MongoDB Connector to the Astera toolbox. This will allow the user to configure a MongoDB Server as a source when creating ETL pipelines on the Astera designer.
The MongoDB Source object can be brought onto the designer through drag and drop. It looks like this:
Once dropped, it can then be configured through the Properties menu by right-clicking on the object and selecting Properties. As you can see below, the connector asks the user to input a number of different values.
From entering the address of the Primary Server Cluster connection to allowing the object to have a read preference, the MongoDB Source object allows for a range of functionality.
The object is useful for using MongoDB as a source to perform all kinds of transformations as well as loading onto destinations.
The MongoDB Source configuration also includes a built-in filter that can be used to transform the data as it is read.
Parquet Connector (Beta)
Apache Parquet is a column storage file format used by Hadoop systems such as Pig, Spark, and Hive. The file format is language-independent and has a binary representation. Parquet efficiently stores large data sets and has the extension of “.parquet”.
Astera 10.1 also brings the Parquet Source and Parquet Destination connectors to the Astera toolbox. The user can simply drag and drop the respective objects and configure them to read and write from and onto Parquet format files.
Some key features of Parquet, as relevant to Astera, are:
It encodes the data.
It stores data in a column layout.
It offers optional compression, resulting in smaller file sizes.
Parquet File Source
The Parquet File Source object can be found in the Sources section of the Toolbox.
It can then be configured by opening the properties. Simply right-click on the object and select Properties from the context menu.
Parquet File Destination
Similarly, Astera also brings the Parquet File Destination object to the Destination section of the Toolbox.
It lets the user fetch and map data from various kinds of sources that the Parquet format supports.
Once we open the properties of the Parquet File Destination object, we can view the following:
As you can see above, the object even lets the user select from compression methods which include:
Snappy
Gzip
For more information on the Parquet File Source and Parquet File Destination objects, click on the links below.
GIT in Astera Data Stack
GIT is an essential part of data integration and is widely used in the industry. It allows you to create a repository, clone a repository, and create branches from the master branch to work on. Given how essential a tool it is, Astera 10.1 introduces GIT functionality in Astera.
Astera provides GIT options where the user can create and work from branches, push to and pull from a remote location, and make changes, all from within the application.
Within Astera, repositories can be cloned and opened:
As you can see above, Astera offers every essential GIT functionality that the user can employ, including Fetch, Merge, and Clone as well as viewing Branch History.
GIT in Astera also lets the user resolve any conflicts that may arise between branches.
Conflict resolution in GIT makes it more feasible for the user to keep track of what branch contains which information.
Full Client with Built-in Server for Centerprise Student
Astera 10.1 brings Centerprise Student, a full client with a built-in server. This means that students, when using Astera, don’t have to install client and server applications separately.
It certainly makes operations more convenient.
Full Client with Built-in Server for Report Models
Astera 10.1 brings the availability of a full client which has a built-in server for Report Models.
This means that users do not have to install the Server and Client applications separately, but rather just the convenient installation of a single server-client integrated application.
RM Enterprise
With the release of Astera 10.1, we bring you RM Enterprise. It lets the user access the complete functionality of Astera's Report Model module, with a separate server for Client-Server communication.
RM Enterprise offers the full services of this module, including defining a report model to create a reusable extraction template.
Astera is a one-stop platform that allows the user to Consume and Manage APIs in a code-free environment.
With Astera, the user is provided with both a client and a server application to install. Instead of the Integration Server, Astera utilizes an installer called the Astera Server.
Once both are installed, the user can access all features of the product.
API Consumption
Astera 10.1 brings a list of enhancements and new features to API Consumption.
Existing customers of Astera 10.0 and earlier versions require a repository upgrade, along with re-generation of the project “.car” files and re-deployment, to upgrade to 10.1. Upgrades to 10.1 are supported from any earlier version.
This concludes the round-up of new features and improvements in Astera’s 10.1 line-up.
Astera’s data management platform has grown by leaps and bounds over the past couple of years, and things are no different this time around. The 10.0 release for Astera is focused on improvements and fixes to further enhance user experience. Moreover, we have added some key new features to the platform, including cloud functionality and a beta version for Auto-Generate Layout (AGL). Read on for more details!
The Report Model component in Astera 10.0 has introduced, modified, and enhanced some new and existing features to make the data extraction process even more flexible and user-friendly. The highlights of this release include an AI-enhanced feature of Auto-Generate Layout (Beta) which allows users to create a report model with a single click without having to create data regions or fields manually. Moreover, we’ve also introduced improved options and functionalities for field and region properties. These new additions have made the experience of capturing unstructured data easier than ever before.
Here’s what is new and improved.
Auto-Generate Layout (Beta)
Report Models now provide the functionality of auto-creating the data regions and data fields with just one click. This feature automatically recognizes name-entity pairs and tabular data regions and captures fields in the respective regions. This option makes the extraction process much more efficient as it minimizes the effort of designing report models from scratch. Additionally, to make the extraction template more robust and customized, users can further tweak the auto-generated layout option to fit their requirements.
The following sub-components help in the further customization of the auto-generated layout:
Auto-Generate Table (Beta): Users can now create a tabular data region by selecting the area on the report model’s designer and clicking on this option. The tool automatically identifies the pattern and auto-creates fields within the data region.
Auto-Create Fields (Single Instance) (Beta): Users can create single instance fields by selecting the values on the designer and clicking on this option to automatically create the fields.
Defining Comma Separated Values in Start Position
Users will now have the ability to specify multiple strings to define the Start Position of a data field in the Field Properties panel. This feature is particularly useful when the unstructured document does not have a consistent format or a fixed pattern.
Note: This option is only available when the Follows String In Current Line or the Follows String In Previous Line option is selected.
Case Sensitive and Regular Expression Checkbox
We have introduced two new checkboxes under the Size and Position section of the Field Properties panel:
Case Sensitive: To make the searching for the start position string case sensitive.
Regular Expression: To specify a regular expression as the starting position of the data field.
Note: These checkboxes appear only for the Follows String In Current Line and Follows String In Previous Line options.
Remove Specified String
We have introduced a new checkbox for Specific Strings under the Remove section of the Field Properties panel. Here, you can specify a string that you want to remove from the data field. You can also define multiple strings separated by commas to remove them from all the records for a particular field.
Calculation Box
The UI for the Formula Field has been improved. You can now see the calculation box in the Field Properties panel. You can also make any changes to the expression written inside the box by clicking on the ellipses option on the left. This will take the user to the Calculate Field Properties window where they can choose from the built-in library of functions.
Reorder Field Position
You can re-arrange the positions of data fields within a region under the Model Layout tab using a simple drag-and-drop functionality. This allows users to manage the order of the fields/columns while previewing data or when writing to a destination.
Region End Type - Till Regular Expression/Specific Text
Users can define the endpoint of a region by specifying a particular regular expression or a specific string text. The tool ends the region (by searching the position) on the line where the text/regular expression is found. This allows the user to capture specific areas within an unstructured document in a more robust and flexible manner.
The following is an overview of what is being rolled out in the new and improved version of Astera’s platform, including all of its components. This includes features related to cloud accessibility, security, and the user interface.
Users can now browse files from the cloud and write to files that are stored in a cloud destination. 10.0 supports two cloud connectors: Amazon S3 and Microsoft Azure Blob. There are several ways of connecting to the cloud, such as by using the Cloud Storage Connection object in a Shared Action (.sact), or by clicking on the Browse Cloud Files option available in certain objects/tabs when selecting a source file.
In case a user forgets their password, they can now utilize the password recovery feature. After verifying an admin email, users can click on the Forgot Password option at the time of login. An OTP is sent to their verified email, allowing them to reset the password.
We’ve made some enhancements to the look and feel of the product. Since the product is entirely UI-based, it is of the utmost importance that the UI provides the ease of access that users expect from it.
Here’s a list of the UI components that have been improved:
Wizards
Job Monitor
Server UI
Job Progress Window
Scheduler UI
Verification Errors/Messages
The Security and User Management area has also been upgraded:
Issues regarding User Roles have been fixed.
User Credential security has been upgraded.
Server configuration and deployment, specifically cloud deployment, was a big focus of this release. We’ve fixed several issues that users were encountering when deploying the server on the cloud.
This concludes the round-up of new features and improvements in Astera’s 10.0 line-up.
Note: The overall speed and performance of the application depend on the configuration of your machine. More memory and higher processing speed on the system will result in faster performance, especially when transferring large amounts of data as the application takes advantage of the multicore hardware to parallelize operations.
Astera Data Stack is built on a client-server architecture. The client is the part of the application which a user can run locally on their machine, whereas the server performs processing and querying requested by the client. In simple words, the client sends a request to the server, and the server, in turn, responds to the request. Therefore, database drivers are installed only on the Astera Data Stack server. This enables horizontal scaling by adding multiple clients to an existing cluster of servers and eliminating the need to install drivers on every machine.
The Astera client and server applications communicate using a REST architecture. REST-compliant systems, often called RESTful systems, are characterized by statelessness and a separation of concerns between the client and server, which means that the two can be implemented independently as long as each side knows what format of messages to send to the other. The server communicates with the client over HTTPS, encrypted using a key/certificate signed by a certificate authority. This protects the data from interception, since the plaintext is only ever transmitted in encrypted form.
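As a conceptual sketch of this stateless request/response pattern (the URL and endpoint below are hypothetical and do not represent an actual Astera server API):

import requests  # third-party HTTP library, used here purely for illustration

# Each request is self-contained (stateless) and travels over HTTPS, so the
# payload is encrypted between the client and the server.
response = requests.get(
    "https://astera-server.example.com/api/status",  # hypothetical endpoint
    verify=True,   # validate the server's signed certificate
    timeout=10,
)
print(response.status_code)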
Astera 10 is a major release and not a direct upgrade of Astera 9. This means that migrating from Astera 9 to Astera 10 won’t require you to uninstall Astera 9 since Astera 10 can be installed side by side with Astera 9 on a system. Astera 10 is backward compatible, and therefore, most of the flows created on Astera 9 can run on Astera 10 without modifications. However, the deployment archives (*.car files) created with previous versions are not directly compatible with v10.x. All the deployments must be regenerated and deployed again in the latest version. As with most major releases of any complex software, we recommend that you upgrade your lower-level environment first, so you have an opportunity to test and verify any existing flows. This will make it possible to identify any migration issues you may encounter early in the upgrade cycle.
Note: In this document, we will show how you can migrate from Astera 9 to the all-new Astera 10. However, you can follow these steps to upgrade from version 8 to version 10 as well.
The installation package for Astera 10 (64-bit) contains two setup (.exe) files:
AsteraDataIntegrator.exe – for Astera client
IntegrationServer.exe – for Astera Integration Server
The setup files for Astera 10 can be downloaded from the following location: https://www.astera.com/download-center/
Refer to the installation section of this document to learn how to install the Astera 10 client and server.
Like Astera versions 8 and 9, Astera 10 comes with a single licensing key (for the server) rather than two separate keys for the Astera server and client. However, the licensing key for Astera 10 has changed, which means you cannot use your Astera 9 key to register Astera 10. The single licensing key for Astera 10 is used to register the Astera server, and it controls how many clients can connect to the server as well as the functionality available to the connected clients.
If you’re planning to migrate to Astera 10, please contact Astera to get your new Astera 10 license key.
The Astera 10 client can be configured with multiple servers; however, it can only connect to one server at a time. Jobs scheduled, queued, or running on another registered server will continue to run without interruption even if the client is not currently connected to that server.
All servers pointing to a single repository database form a cluster of servers sharing the common workload of queued jobs. A cluster of v10 servers you configure will be up and running and processing jobs in a similar way to v9. You can see which server in the cluster actually processed a job by right-clicking the DEFAULT node in Server Explorer and opening the Job Monitor window.
Astera provides an option to perform an in-place repository upgrade. This means that if you don’t wish to set up a new repository when you move to v10, you can use your existing repository from the previous version and upgrade its cluster database. All the jobs scheduled in the repository used in v9 will appear in v10 after the upgrade.
The following steps explain how to upgrade an existing repository for migration from v9 to v10:
In Astera 10, go to Server > Manage > Upgrade Cluster Database.
Provide credentials of the version 9 repository and click OK:
Now go to Server menu > Manage > Server Properties.
In the Server Connection Properties tab, click on the ellipsis button next to Cluster DB Info.
In the Database Connection window, provide the credentials of the repository you just upgraded and click OK.
After completing this step, we recommend that you restart the Integration service.
If you wish to set up a new repository in Astera 10, go to Server menu > Configure > Build Repository Database and Configure Server.
In case you chose to build a new cluster repository in Astera 10, your next step is to import all the scheduled jobs created in Astera 9.
For this, open the scheduler in Astera 9 from Server > Job Schedules.
You will see all the scheduled jobs listed in the scheduler. Select the jobs you want to migrate. To select all jobs, you can also use the shortcut key Ctrl+A.
Click on the Export Schedule icon in the Scheduler toolbar.
Point to the directory and folder where you want to save the scheduled jobs and click OK.
Now Astera will create a separate XML file with ‘.Sched’ extension for each scheduled job and save it in the designated folder.
A message window will pop up to notify that your scheduled jobs have been exported successfully. Click OK.
Now you need to import the scheduled jobs in Astera 10 to complete the migration process.
For this, open Astera 10 client and go to Server > Job Schedules.
This will open the Scheduler tab. To import the existing jobs, click the ‘Import Schedule’ button in the Scheduler toolbar.
Point the path to the directory where you have saved the schedule files. Select all the scheduled jobs you want to import and click ‘Open.’
You can see that the existing jobs scheduled in Astera 9 have been successfully migrated to Astera 10 and a new Job ID has been assigned to each job.
After building a new cluster repository in Astera 10, an alternate way of shifting all the scheduled jobs from Astera 9 to Astera 10 is to use pre-designed flows.
Prior to any upgrade, we strongly recommend that you take a full backup of your repository database. Also, upgrading a lower-level environment first (such as QA, UAT, etc.) is recommended prior to upgrading the production environment. This will make it possible to sort out/resolve any issues before upgrading production.
Steps to Upgrade:
Using Astera 9 client, run the following dataflow to export existing schedules into a comma-delimited file.
Next, open the downloaded file in Astera 9. The dataflow will look like this:
Note: Prior to running the dataflow, you will need to update the Database Table Source object to point to the database where the Astera repository resides. Also, in the properties of Delimited File Destination, set up an appropriate file path where you want your file to be saved.
Once the objects have been configured, run the dataflow. This will create a CSV file containing data of all the schedules that existed in v9.
Next, take note of any existing Cluster Settings. You can check them by right-clicking the cluster in the Server Explorer and selecting Cluster Settings from the context menu. These settings will need to be re-configured manually after the upgrade.
It may be helpful to take screenshots of those settings for later reference. The settings include Staging Directory, Purge Job Frequency Options, Email Notification Setup, etc.
Open Astera 10 client. Go to Server menu > Manage > Build Cluster Database. Point it to the database hosting the Astera repository.
Important Note: This will reset the repository.
Use the dataflow below to import the schedules you exported previously in Astera 9.
The dataflow will look like this:
Note: Prior to running the dataflow, in the properties of Delimited File Source, you must import the CSV file you created in v9 that has data of all the schedules. Also, you will need to change the configuration of the Database Table Destination object to point to the database where Astera 10 repository resides.
Once the objects are properly configured, save and run the dataflow.
Next, open Server Explorer, right-click on DEFAULT, and select Cluster Settings.
Now, manually re-configure the relevant settings. You can use the screenshots of Cluster Settings you previously took for reference in version 9. Optionally, you can manually reconfigure the Server Profiles setting if a non-default profile was used prior to the upgrade.
Now, restart the Astera server.
This completes the upgrade.
You can download the flows by clicking on the links below:
When you are starting the migration process, it is recommended to keep Astera 9 and Astera 10 servers running in parallel. This is to avoid any interruption in jobs that are currently running.
We also recommend that you initiate the migration process with a lower-level, testing environment and then promote your deployment to a higher-level environment as needed. This will help ensure a smooth migration, with any flow compatibility issues spotted early in the transition cycle.
Astera 10 is a major release and not a direct upgrade of Astera 7.6. This means that migrating from Astera 7.x to Astera 10 won’t require you to uninstall Astera 7, since Astera 10 can be installed side by side with Astera 7 on a system. Astera 10 is backward compatible, and therefore, most of the flows created on Astera 7 can run on Astera 10 without modifications. However, as with most major releases of any complex software, we recommend that you upgrade your lower-level environment first, so you have an opportunity to test and verify any existing flows. This will make it possible to identify any migration issues you may encounter early in the upgrade cycle. In this document, we will cover how you can migrate from Astera 7.x to the all-new Astera 10.
Installing
The installation package for Astera 10 (64-bit) contains two setup (.exe) files:
AsteraDataIntegrator.exe – for Astera client, and
IntegrationServer.exe – for Astera Integration Server
The setup files for Astera 10 can be downloaded from the following location:
https://www.astera.com/download-center/
Refer to the installation section of this document to learn how to install the Astera 10 client and server.
Licensing
Unlike previous releases of Astera, Astera 10 comes with a single licensing key (for the server) rather than two separate keys for the Astera server and client. The licensing key for Astera 10 has changed, which means you cannot use your Astera 7 key to register Astera 10. The single licensing key for Astera 10 is used to register the Astera server, and it controls how many clients can connect to the server as well as the functionality available to the connected clients.
If you’re planning to migrate to Astera 10, please contact Astera to get your new Astera 10 license key.
Cluster and Server Management in Server Explorer
The Astera 10 client can be configured with multiple servers; however, it can only connect to one server at a time. Jobs scheduled, queued, or running on another registered server will continue to run without interruption even if the client is not currently connected to that server.
All servers pointing to a single repository database form a cluster of servers sharing the common workload of queued jobs. A cluster of v10 servers you configure will be up and running and processing jobs in a similar way to 7.6, even though the v10 client can only connect to and manage one v10 server at a time. You can see which server in the cluster actually processed a job by right-clicking the cluster and opening the Server Jobs window.
Repository
You need to set up a new repository to communicate with the Astera 10 server. While upgrading the previous releases of Astera 7, you would simply go to Server > Upgrade Cluster Database. However, while migrating to Astera 10, you need to set up a repository in a new database from scratch to communicate with the server(s) and store the record of server activity. To set up a repository in Astera 10, go to Server menu > Configure > Build Repository Database and Configure Server.
Once you have built a cluster repository in Astera 10, the next step is to migrate the scheduled jobs you created in Astera 7.
For this, open the Job Scheduler in Astera 7 from Server > Job Schedules.
You will see all the scheduled jobs listed in the Scheduler. Select the jobs you want to migrate.
Click the Export Schedule button in the Scheduler toolbar.
Point to the directory and folder where you want to save the scheduled jobs and click OK.
Now Astera will create a separate XML file with ‘.Sched’ extension for each scheduled job and save it in the designated folder.
A message window will pop up to notify you that your scheduled jobs have been exported successfully. Click OK.
Now you have to import the job files in Astera 10 to complete the migration process. For this, open Astera 10 client and go to Server > Job Schedules.
This will open the Scheduler tab. To import the existing jobs, click the Import Schedule button in the Scheduler toolbar.
Point the path to the directory where you have saved the schedule files. Select all the scheduled jobs you want to import and click Open.
You can see that the existing jobs scheduled in Astera 7 have been successfully migrated to Astera 10 and a new Job ID has been assigned to each job.
Prior to any upgrade, we strongly recommend that you take a full backup of your repository database. Also, upgrading a lower-level environment first (such as QA, UAT, etc.) is recommended prior to upgrading the production environment. This will make it possible to sort out and resolve any issues before upgrading production.
Steps to Upgrade:
Using Astera 7 client, run the following dataflow to export existing schedules into a comma-delimited file.
Note: Prior to running the dataflow, you will need to update the Database Table Source object to point to the database where the Astera repository resides.
Take note of any existing Cluster Settings. You can check it by right-clicking the cluster in Server Explorer and selecting Cluster Settings in the context menu. These settings will need to be re-configured manually after the upgrade.
It may be helpful to take screenshots of those settings for later reference. The settings include: Staging Directory, Purge Job Frequency Options, Email Notification Setup, and optionally, Server Profiles if a non-default profile was used prior to the upgrade.
Open Astera 10 client. Go to Server menu > Manage > Build Cluster Database. Point it to the database hosting the Astera repository.
Important Note: This will reset the repository.
Use the dataflow below to import the schedules you exported previously in Astera 7.
The dataflow will look like this:
Note: Prior to running the dataflow, in the properties of Delimited File Source, you must import the CSV file you created in v7 that has data of all the schedules. Also, you will need to change the configuration of the Database Table Destination object to point to the database where Astera 10 repository resides.
Once the objects are properly configured, save and run the dataflow.
Next, open Server Explorer, right-click on DEFAULT and select Cluster Settings.
Now, manually re-configure the relevant settings. You can use the screenshots of Cluster Settings you previously took for reference in version 7. Optionally, you can manually reconfigure the Server Profiles setting if a non-default profile was used prior to the upgrade.
Now, restart the Astera server.
This completes the upgrade.
When you are starting the migration process, it is recommended to keep Astera 7 and Astera 10 servers running in parallel. This is to avoid any interruption in jobs that are currently running.
We also recommend that you initiate the migration process with a lower-level, testing environment and then promote your deployment to a higher-level environment as needed. This will help ensure a smooth migration, with any flow compatibility issues spotted early in the transition cycle.
To learn more about the MongoDB Source connector, refer to the MongoDB Source documentation.
For more information on Git in Astera, refer to the Git documentation.
To learn more about creating cloud connections and browsing files from cloud storage, refer to the Cloud Connection documentation.
Microsoft has introduced a new email authentication method called Modern Authentication, which uses OAuth 2.0 to access emails.
Support for this method has been added in Astera Data Stack, so emails can now be read using Modern Authentication.
Follow the rest of the steps explained in this document to build a cluster database and set up a repository in Astera 10.
Download the accompanying dataflow.
Next, install the Astera 10 client and server on your machine. You can read more about installing the Astera 10 client and server in the installation section of this document.
Download the attached example dataflows.
A new feature in Astera 10 enables you to create an admin email to access the Astera server. As a result, you will also be able to use the Forgot Password option while logging in. Refer to the admin email verification section to learn how to verify the admin email in Astera 10.
Follow the steps explained in this document to build a cluster database and set up a repository in Astera 10.
Download the attached dataflow.
Next, install the Astera 10 client and server on your machine. You can read more about installing the Astera 10 client and server in the installation section of this document.
Download the attached example dataflow.
Client Application Processor | Dual Core or greater (recommended); 2.0 GHz or greater
Server Application Processor | Quad Core or greater (recommended)
Repository Database | MS SQL Server 2008 R2 or newer, or PostgreSQL 9.x or newer, for hosting the repository database
Operating System - Client | Windows 10 (recommended)
Operating System - Server | Windows 10 or Windows Server 2012 or newer
Memory | Client: 4 GB or greater (recommended); Server: 8 GB or greater (recommended), 32 GB or greater for heavy data processing
Hard Disk Space | Client: 300 MB (if .NET Framework is pre-installed); Server: 700 MB (if .NET Framework is pre-installed); additional 300 MB if .NET Framework is not installed
Other | Requires Microsoft .NET Framework 4.8 for the client and .NET Core Runtime 6.0 for the server
Adding an XML/JSON File Source object to a dataflow allows you to read and transfer data from an XML or a JSON file.
In this section, we will cover how to get an XML/JSON File Source object on the dataflow designer from the Toolbox.
To get an XML/JSON File Source object from the Toolbox, go to Toolbox > Sources > XML/JSON File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the XML/JSON File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the XML/JSON File Source object, right-click on the header and select Properties from the context menu.
When you select the Properties option from the context menu, a dialog box will open.
This is where you can configure properties for the XML/JSON File Source object.
The first step is to provide the File Path and Schema Location for the XML/JSON Source object. By providing the file path and schema, you are building the connectivity to the source dataset.
Check the JSON Format checkbox if your source file is a JSON file.
Check the Provide Inner XML checkbox to get the XML markup representing only the child nodes of the parent node.
Note: In this case we are going to be using an XML/JSON file with Orders sample data in the parent node and Order Details sample data in the child node.
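For reference, here is a minimal Python sketch (the element and attribute names are made up, not taken from the sample file) showing what such a parent/child structure looks like and what Inner XML means for it:

import xml.etree.ElementTree as ET

# Made-up snippet in the spirit of the sample described above: Orders data in
# the parent node, Order Details data in the child node.
sample = """<Orders>
  <Order OrderID="10248" CustomerID="VINET">
    <OrderDetails ProductID="11" Quantity="12" UnitPrice="14.00"/>
  </Order>
</Orders>"""

root = ET.fromstring(sample)
order = root.find("Order")

# "Provide Inner XML" corresponds to returning only the markup of the child nodes:
inner_xml = "".join(ET.tostring(child, encoding="unicode") for child in order)
print(inner_xml)  # prints the <OrderDetails .../> markup only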
Once you have specified the data reading options in this window, click Next.
On the XML Layout window, you can view the layout of your XML/JSON source file.
After you are done viewing the layout, click Next. You will be taken to a new window, Config Parameters. Here, you can define the parameters for the XML/JSON File Source.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
After you have configured the source object, click OK.
You have successfully configured your XML/JSON File Source object. The fields from the source object can now be mapped to other objects in a dataflow.
Once you have created the repository and configured the server, the next step is to login using your Astera account credentials.
You will not be able to design any dataflows or workflows on the client if you haven’t logged in to your Astera account. The options will be disabled.
Go to Server > Configure > Step 2: Login as admin.
This will direct you to a login screen where you can provide your user credentials.
If you are using Astera 10 for the first time, you can login using the default credentials as follows:
Username: admin
Password: Admin123
After you log in, you will see that the options in the Astera Client are enabled.
You can use these options for as long as your trial period is active. To fully activate the options and the product, you’ll have to enter your license.
If you don’t want Astera to show you the server connection screen every time you run the client application, you can skip that by modifying the settings.
To do that, go to Tools > Options > Client Startup and select the Auto Connect to Server option. When this option is enabled, Astera will store the server details you entered previously and use them to automatically reconnect to the server every time you run the application.
The next step after logging in is to unlock Astera using the License key.
You can connect to different servers right from the Server Explorer window in the Client. Go to the Server Explorer window and click on the Connect to Server icon.
A prompt will appear that will confirm if you want to disconnect from the current Server and establish connection to a different server. Click Yes to proceed.
Note: A client cannot be connected to multiple servers at once.
You will be directed to the Server Connection screen. Enter the required server information (Server URI and Port Number) to connect to the server and click Connect.
If the connection is successfully established, you should be able to see the connected server in the Server Explorer window.
In this section, we will discuss how to install and configure the Astera Server and Client applications.
Run ‘IntegrationServer.exe’ from the installation package to start the server installation setup.
You’ll be directed to the welcome screen. Click Next to continue.
On the next screen you will see the license agreement. You can only continue if you choose to accept the terms of the license agreement. Click Next to continue.
On the next screen, enter the user details and click Next to continue.
Select the type of installation (Complete or Custom) you want to proceed with and click Next.
Select Install to complete the installation.
Select Finish to finish the installation process.
Run the ‘AsteraDataIntegrator’ application from the installation package to start the client installation setup.
You’ll be directed to the welcome screen. Click Next to continue.
On the next screen you will see the license agreement. You can only continue if you choose to accept the terms of the license agreement. Click Next to continue.
On the next screen, enter the user details and click Next to continue.
Select the type of installation (Complete or Custom) you want to proceed with and click Next.
If you select custom installation, you can choose specific component(s) that you want to download.
In this case, we want to install the complete package, so we’ll select Complete on the Setup Type screen and click Next.
Select Install to complete the installation.
Select Finish to finish the installation process.
This is how you install Astera client and server applications. The next step is to establish a connection between Astera client and server.
Before you start using the Astera server, a repository must be set up. Astera supports SQL Server and PostgreSQL for building cluster databases, which can then be used for maintaining the repository. The repository is where job logs, job queues, and schedules are kept.
To see these options, go to Server > Configure > Step 1: Build repository database and configure server.
The first step is to point to the SQL Server or PostgreSQL instance where you want to build the repository and provide the credentials to establish the connection.
Note: Astera will not create the database itself, only the tables. A database must be created beforehand, or an existing database can be used. We recommend giving Astera its own database for this purpose.
Go to Server > Configure > Step 1: Build repository database and configure server.
Select SQL Server from the Data Provider drop-down list and provide the credentials for establishing the connection.
From the drop-down list next to the Database option, select the database on the SQL instance where you want to host the repository.
Click Test Connection to test whether the connection has been successfully established. You should see the following message if the connection is successful.
Click OK to exit the test connection window, then click OK again; the following message will appear. Select Yes to proceed.
The repository is now set up and configured with the server to be used.
The next step is to log in using your credentials.
Go to Server > Configure > Step 1: Build repository database and configure server.
Select PostgreSQL from the Data Provider drop-down list and provide the credentials for establishing the connection.
From the drop-down list next to the Database option, select the database on the PostgreSQL instance where you want to host the repository.
Click Test Connection to test whether the connection has been successfully established. You should see the following message if the connection is successful.
Click OK and the following message will appear. Select Yes to proceed.
The repository is now set up and configured with the server to be used.
The next step is to log in using your credentials.
The Join transformation object joins records from two record sets. The join functionality is similar to standard SQL joins, but the distinguishing advantage of Astera's implementation is that you can join records from any two sources and not just two database tables.
This article covers how you can use Join transformation in Astera.
Suppose we have two database tables - Customers and Orders, as shown in the screenshot below, and we want to join these two tables.
Let’s see how we can join the two tables using the Join transformation object in Astera:
Drag-and-drop the Join transformation object from the Transformations section in the Toolbox. To open the Toolbox, go to View > Toolbox.
Map the fields from the source objects to the Join transformation object.
Note: To quickly add fields to the layout, drag-and-drop the bold node’s output port of the object whose layout you wish to replicate to the bold Join node of the Join object.
To set the properties for the Join transformation, double-click on the object or right-click and select Properties.
The first window is the Layout Builder window. You can manage the layout for your transformation (add or remove fields) from this window. Click Next to go to the next window.
The next window is the Relation Join Transformation Properties window. Select the Join Type from the drop-down menu. Astera supports four types of joins:
Inner Join – Joins records from two record sets based on matching values in key fields. Any unmatched records are discarded.
Left Outer Join – Similar to Inner Join, but unmatched records from the left record set (also called ‘first record set’) are preserved, and null values are written for the unmatched record in the right record set (also called ‘second record set’).
Right Outer Join – Similar to Inner Join, but unmatched records from the right record set (also called ‘second record set’) are preserved, and null values are written for the unmatched record in the left record set (also called ‘first record set’).
Full Outer Join – Similar to Inner Join, but unmatched records from either record set are preserved, and null values are written for the unmatched record in the other record set (see the sketch below for an illustration).
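The following minimal Python sketch (not Astera’s implementation; the record sets and field names are made up) shows a Left Outer Join on CustomerID. An Inner Join would simply drop the unmatched record instead:

# Tiny, made-up Customers and Orders record sets keyed on CustomerID.
customers = [{"CustomerID": "ALFKI", "Name": "Alfreds"},
             {"CustomerID": "ANATR", "Name": "Ana Trujillo"}]
orders = [{"CustomerID": "ALFKI", "OrderID": 10643}]

def left_outer_join(left, right, key):
    joined = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            joined.extend({**l, **m} for m in matches)
        else:
            # Unmatched left record is preserved; right-side fields get null (None).
            joined.append({**l, "OrderID": None})
    return joined

for record in left_outer_join(customers, orders, "CustomerID"):
    print(record)
# {'CustomerID': 'ALFKI', 'Name': 'Alfreds', 'OrderID': 10643}
# {'CustomerID': 'ANATR', 'Name': 'Ana Trujillo', 'OrderID': None}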
Other options in this window:
Join in Database: Check this option if you want to join the tables in the database.
Case Sensitivity: Check this option if you want a case sensitive match of the values in the key fields.
Sort (Left/Right) Input: Specify whether the left input, the right input, or both, need to be sorted.
Select the key fields from the Left Field and Right Field drop-down lists. Click Next, then OK.
Note: You can add multiple fields to create a composite key field.
You can now preview the output and see the consolidated data.
This window consists of options common to most objects in a dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be outputted by the object. When this option is unchecked, records with errors will be outputted by the object, and a record message will be attached to the record. This record message can then feed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect statistics.
The Comments input allows you to enter comments associated with this object.
For the AGL and OCR features to work in Astera, several packages must be installed, such as Python and Java. To avoid the tedious process of installing these separately, Astera provides a built-in Install Manager in the tool.
In this document, we will look at how to use the Install Manager to install these packages. If the server is on a separate machine from the client, we will also see how to install the packages on the server machine.
Open Astera as an administrator.
Once Astera is open, go to Tools > Run Install Manager.
The Install Manager welcome window will appear. Click on Next.
If the prerequisite packages are already installed, the Install Manager will inform you about them and give you the option to uninstall or update them as needed. If you want to uninstall them, select the package and click on Uninstall.
Once the uninstallation is complete, the Install Manager will notify you.
If you want to update the packages, select them and click on Update.
If the prerequisite packages are not installed, then the Install Manager will present you with the option to install them. Check the box next to the pre-requisite package, and then click on Install.
The packages installed for AGL are listed as follows:
The packages installed for OCR are listed as follows:
Python packages
During the installation, the Install Manager window will display a progress bar showing the installation progress.
You can also cancel the installation at any point if necessary.
Once the installation is complete, the Install Manager will notify you. Click Close to exit the Install Manager.
The packages for AGL and OCR usage are now installed, and the features are ready to use.
Note: The packages are ready to use when both the Integration Server and the Astera client are installed on the same machine.
If the Integration Server is installed on a separate machine, the packages for AGL and OCR need to be installed there as well. If the client is also installed on the server machine, you can proceed with the installation as described in the previous section.
If the client is not installed, follow the steps below.
To access the Install Manager on this machine, open the Start menu and search for “Install Manager for Integration Server”.
Run this Install Manager as admin.
Install Manager is now open and can be used as described from step 3 onwards in the previous section.
This concludes our discussion on how to use the install manager for Astera.
In some cases, it may be necessary to supply a license key without prompting the end user to do so. For example, in a scenario where the end user does not have access to install software, a systems administrator may do this as part of a script.
One possible solution is to place the license key in a text file. This way, the administrator can easily license each machine without having to go through the licensing prompt for each user.
Here’s a step-by-step guide to supplying a license key without prompting the user:
To get started, create a new text document that will hold the license key required to access the application.
In the text document, enter a valid license key. The key must be the only thing in the document, and it must be on the very first line. Make sure there are no unnecessary leading or trailing spaces, lines, or any characters other than those of the license key.
Name the text document “Serial” and save it in the Integration Server Folder of the application located in Program Files on your PC. For instance, if the application is Astera, save the Text Document in the “Astera Integration Server 10” folder. This folder contains the files and settings for the server application. The directory path would be as follows:
C:\Program Files\Astera Software\Astera Integration Server 10.
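As an example of the scripted scenario mentioned above, a minimal sketch of such an admin script might look as follows (the key value is a placeholder, and the path assumes the default installation location and a .txt extension for the text document created in the previous step):

from pathlib import Path

# Placeholder key and default install path; adjust both for your environment.
serial_path = Path(r"C:\Program Files\Astera Software\Astera Integration Server 10\Serial.txt")
license_key = "XXXX-XXXX-XXXX-XXXX"  # not a real key

# The key must be the only content, on the very first line, with no extra
# whitespace, lines, or other characters.
serial_path.write_text(license_key.strip())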
Note: This approach works for all Astera applications, except for Astera API Management and Astera Express. For API Management, there is a different server. Thus, for it, you’ll need to locate the corresponding server folder and follow the same steps. Whereas for Astera Express, since there is no server involved, simply copy the Serial text document to the “Astera Express 10” folder, and the remaining steps remain unchanged.
Finally, restart the Astera Integration Server 10 service to complete the process. This step ensures that, from now on, when the user launches Astera or any other application by Astera, they will not be prompted to enter a license key.
Note: There is no need to restart the service for Astera Express as it does not have a corresponding server.
Also, please keep in mind that all license restrictions are still in effect, and this process only bypasses the user prompt for the key.
In conclusion, by following these simple steps, system administrators can easily supply a license key without prompting the end user. This approach is particularly useful when installing software remotely or when licensing multiple machines.
To activate Astera on your machine, you need to enter the license key provided with the product. When the license key is entered, the client sends a request to the licensing server to grant the permission to connect. This action can only be performed when the client machine is connected to the internet.
However, Astera provides an alternative method to activate the license offline by providing an activation code to users who request it. Follow the steps below for offline activation:
1. Go to the menu bar and click on Server > Configure > Step 4: Enter License Key
2. Click on Unlock using a key.
3. Type your Name and Organization, and paste the Key provided to you. Then, click Unlock. Do the same if you are changing (rather than activating) your license offline.
4. Another pop-up window will show an error about the server being unavailable, since you cannot connect to the licensing server while offline. Click OK.
5. Click on the Activate using a key button.
6. Now, copy the Key and the Machine Hash and email it to support@astera.com. The Machine Hash is unique for every machine. Make sure you send the correct Key and Machine Hash as it is very important for generating the correct Activation Code.
7. You will receive an activation code from the support staff via e-mail. Paste this code into the Activation Code textbox and click on Activate.
8. You have successfully activated Astera on your machine offline using the activation code. Click OK.
9. A pop-up window will notify you that the client needs to restart for the new license to take effect. Click OK and restart the client.
You have successfully completed the offline activation of Astera.
R is a programming language needed to run some of the data science related objects in Astera, such as objects in the Testing and Diagnostics and Analytical Models sections of the Toolbox.
Version 4.0.2 of R is the most compatible with the Astera client.
1. First, go to https://cran.r-project.org/bin/windows/base/old/4.0.2/ and download R-4.0.2 for Windows by clicking its installer (the .exe file).
2. Locate the directory where you want to save your installer and click Save.
Here, we are saving the installer in the Downloads folder.
3. Run the installer by clicking on it. Leave all the installation settings at their defaults. R will be installed in your Program Files folder.
4. Next, launch the Astera client and go to Server > Manage > Server Properties.
The Server Connection Properties tab will open.
5. Provide the path of the bin folder of the R-4.0.2 package in the File Path textbox.
Alternatively, you can also click on the folder icon on the right and select the bin path from the directory.
6. Click on the Save icon in the secondary menu bar to save the changes made in the Server Connection Properties tab.
7. It will prompt you to reboot the server. Click Yes, and then click OK.
8. Click OK to start the installation of the packages.
9. A green bar will show the installation progress. Wait till it is complete.
10. Once the installation is complete, a dialog box will pop up, notifying you that the installation was successful. Click OK.
You have successfully integrated R-4.0.2 with your Astera client.
This article introduces the role-based access control mechanism in Astera. This means that administrators can grant or restrict access to various users within the organization, based on their role in the entire data management cycle.
In this article, we will look at the user lists and role management features in detail.
Note: When you run the application for the first time, sign in using the default credentials provided on our help site.
Username: admin
Password: Admin123
Once you have logged in, you have the option to create new users, and we recommend doing this as a first step.
To create/register a new user, right-click on the DEFAULT server node in the Server Explorer window and select User List from the context menu.
This will open the Server Browser panel.
Under the Security node, right-click on the User node and select Register User from the context menu.
This will open a new window. You can see several fields that must be filled in to register a new user.
Note: Astera currently supports three authentication types when registering a user: Astera, Windows, and Azure Authentication.
Once the fields have been filled, click Register and a new user will be registered.
Now that a new user is registered, the next step is to assign roles to the user.
Select the user you want to assign the role(s) to and right-click on it. From the context menu, select Edit User Roles.
A new window will open where you can see all the roles available in Astera, whether default or custom created. Since we haven’t created any custom roles, we’ll see the three default roles: Developer, Operator, and Root.
Select the role that you want to assign to the user and click on the arrows in the middle section of the screen. You’ll see that the selected role will get transferred from the All Roles section to the User Roles section.
Note: You can assign multiple roles to a single user.
After you have assigned the roles, click OK, and the selected role(s) will be assigned to the user.
Astera lets the admin manage the resources available to any user. The admin can grant permissions to resources or restrict them.
To edit role resources, right-click on any of the roles and select Edit Role Resources from the context menu.
This will open a new window. Here, you can see four nodes on the left under which resources can be assigned.
The admin can grant a role resources from the Url node and the Cmd node, access to deployments from the REST node, and access to Catalog artifacts from the Catalog node.
Expanding the Url node shows the following resources:
Expanding the Cmd node shows the following checkboxes as resources:
If we expand the REST checkbox, we can see a list of available API resources, including endpoints you might have deployed.
Upon expanding the Catalog node, we can see the artifacts that have been added to the Catalog, along with which endpoint permissions are to be given.
This concludes User Roles and Access Control in Astera Data Stack.
Astera Data Stack can read data from a wide range of file sources and database providers. In this article, we have compiled a list of file formats, data providers, and web-applications that are supported for use in Astera Data Stack.
Amazon Aurora
Azure SQL Server
MySQL
Amazon Aurora Postgres
Amazon RDS
Amazon Redshift
DB2
Google BigQuery
Google Cloud SQL
MariaDB
Microsoft Azure
Microsoft Dynamics CRM
MongoDB (as a Source)
MS Access
MySQL
Netezza
Oracle
PostgreSQL
PowerBI
Salesforce (Legacy)
Salesforce Rest
SAP Hana
Snowflake
SQL Server
Sybase
Tableau
Teradata
Vertica
In addition, Astera features an ODBC connector that uses the Open Database Connectivity (ODBC) interface by Microsoft to access data in database management systems using SQL as a standard.
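For context, this is what a generic ODBC connection looks like outside Astera: a DSN (or driver string) plus standard SQL. Below is a minimal sketch using the third-party pyodbc package, with a hypothetical DSN, credentials, and table name:

import pyodbc  # third-party ODBC bridge, shown for illustration only

# Hypothetical DSN and credentials; the query is plain SQL.
conn = pyodbc.connect("DSN=MyWarehouse;UID=reader;PWD=secret")
cursor = conn.cursor()
cursor.execute("SELECT CustomerID, CompanyName FROM Customers")
for row in cursor.fetchall():
    print(row)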
COBOL
Delimited files
Fixed length files
XML/JSON
Excel workbooks
PDFs
Report sources
Text files
Microsoft Message Queue
EDI formats (including X12, EDIFACT, HL7)
Microsoft Dynamics CRM
Microsoft Azure Blob Storage
Microsoft SharePoint
Amazon S3 Bucket Storage
Azure Data Lake Gen 2
PowerBI
Salesforce
SAP
Tableau
AS2
FTP (File Transfer Protocol)
HDFS (Hadoop Distributed File System)
SCP (Secure Copy Protocol)
SFTP (Secure File Transfer Protocol)
SOAP (Simple Object Access Protocol)
REST (REpresentational State Transfer)
Using the SOAP and REST web services connector, you can easily connect to any data source that uses SOAP protocol or can be exposed via REST API.
Here are some applications that you can connect to using the API Client object in Astera Data Stack:
FinancialForce
Force.com Applications
Google Analytics
Google Cloud
Google Drive
Hubspot
IBM DB2 Warehouse
Microsoft Azure
OneDrive
Oracle Cloud
Oracle Eloqua
Oracle Sales Cloud
Oracle Service Cloud
Salesforce Lightning
ServiceMAX
SugarCRM
Veeva CRM
The list is non-exhaustive.
You can also build a custom transformation or connector from the ground up quickly and easily using the Microsoft .NET APIs, and retrieve data from various other sources.
The COBOL File Source object can fetch data from a COBOL source file, provided the source file is available. The data present in this file can then be processed further in the dataflow and written to a destination of your choice.
Expand the Sources section of the Toolbox and select the COBOL Source object.
Drag-and-drop the COBOL Source object onto the dataflow. It will appear like this:
By default, the COBOL Source object is empty.
To configure it according to your requirements, right-click on the object and select Properties from the context menu.
Alternatively, you can open the properties window by double-clicking on the COBOL Source object header.
The following is the properties tab of the COBOL Source object.
File Path: Clicking on this option allows you to define a path to the data file of a COBOL File.
Note: File Path accepts files with the .dat and .txt extensions (it can also accept files with an .EBC extension).
For our use case, we will be using a sample file with an .EBC extension.
Encoding: This drop-down option allows us to select the encoding from multiple options.
In this case, we will be using the IBM EBCDIC (US-Canada) encoding.
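As a rough, Astera-independent illustration of what this encoding choice means, IBM EBCDIC (US-Canada) corresponds to Python’s cp037 codec (the sample text is made up):

# EBCDIC bytes differ from ASCII/UTF-8, which is why the encoding must be set
# correctly before the file can be read.
ebcdic_bytes = "HELLO".encode("cp037")   # IBM EBCDIC (US-Canada)
print(ebcdic_bytes)                      # b'\xc8\xc5\xd3\xd3\xd6'
print(ebcdic_bytes.decode("cp037"))      # HELLO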
Record Delimiter: This allows you to select the kind of delimiter from the drop-down menu.
<CR> (Carriage Return): Moves the cursor to the beginning of the line without advancing to the next line.
<LF> (Line Feed): Moves the cursor down to the next line without returning to the beginning of the line.
<CR><LF>: Does both.
For our use case, we have selected the following.
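For reference (a generic illustration, not Astera-specific), the three delimiters correspond to the byte sequences \r, \n, and \r\n:

# Splitting the same made-up records with each delimiter type.
cr_data   = b"REC1\rREC2\r"      # <CR> only
lf_data   = b"REC1\nREC2\n"      # <LF> only
crlf_data = b"REC1\r\nREC2\r\n"  # <CR><LF>

print(cr_data.split(b"\r"))      # [b'REC1', b'REC2', b'']
print(lf_data.split(b"\n"))      # [b'REC1', b'REC2', b'']
print(crlf_data.split(b"\r\n"))  # [b'REC1', b'REC2', b'']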
Copybook: This option allows us to define a path to the schema file of a COBOL File.
Note: Copybook accepts files with the .txt and .cpy extensions.
For our use case, we are using a file with the .cpy extension.
Next, three checkboxes can be configured according to the user application. There is also a Record Filter Expression field given under these checkboxes.
Ignore Line Numbers at Start of Lines: Check this option when the data file has incremental line numbers at the start of each line; these numbers will be ignored.
Zone Decimal Sign Explicit: Controls whether there is an extra character for the minus sign of a negative integer.
Fields with COMP Usage Store Data in a Nibble: Checking this box will ignore the COMP storage formats in which the data is stored.
COMP formats range from COMP-1 to COMP-6 in COBOL files.
Record Filter Expression: Here, we can add a filter expression that we wish to apply to the records in the COBOL File.
On previewing output, the result will be filtered according to the expression.
Once done with this configuration, click Next, and you will be taken to the next part of the properties tab.
The COBOL Source Layout window lets the user check values which have been read as an input.
Expand the Source node, and you will be able to check each of the values and records that have been selected as an input.
Expanding the nodes further shows the data definition and field details.
Once these values have been checked, click Next.
The Config Parameters window will now open. Here, you can further configure and define parameters for the COBOL Source Object.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next.
Now, a new window, General Options, will appear.
Here, you can add any Comments that you wish to add. The rest of the options in this window have been disabled for this object.
Once done, click OK.
The COBOL Source object has now been configured. The extracted data can now be transformed and written to various destinations.
This concludes our discussion on the COBOL Source Object and its configuration in Astera Data Stack.
Astera’s Delimited File Destination provides the functionality to write data to a delimited file. The Delimited File Destination gives you control over the structure and content of the file, including numeric, date, and Boolean formats, encodings, text qualifiers (quotes), and character sets. You can choose to create a new file or append data to an existing file.
To get a Delimited File Destination object from the Toolbox, go to Toolbox > Destinations > Delimited File Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Delimited File Destination object onto the designer.
The dragged destination object is empty right now. This is because the object has not been configured yet.
For a Delimited Destination object to work, data fields should be mapped to the object so that the mapped data can be written to the destination.
Configure the source object and place it onto the designer next to the Delimited File Destination object.
Note: In this case a Customers sample table has been used from a Database Table Source to write to the Delimited File Destination.
Now map the source object to the destination object. Mapping can be done in any of the following ways:
By dragging and dropping the parent nodes onto each other for automatic mapping.
By manually dragging the source parent node onto the destination parent node.
By writing the source layout directly to a Delimited File Destination through the context menu of the source’s parent node.
The fields are now mapped.
To configure the Delimited File Destination object, right-click on its header and select Properties from the context menu. A dialog box will open.
Provide the File Path. This is where the delimited destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
File Contains Header - Check this box to read headers from the source file.
Field Delimiter - Allows you to select a delimiter from the drop-down list for the fields.
Record Delimiter - Allows you to select the delimiter for the records. The available choices are the carriage-return/line-feed combination, carriage-return, and line-feed. You can also type a record delimiter of your choice instead of choosing one of the available options.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
A Text Qualifier is a symbol that identifies where text begins and ends. It is used specifically when importing data.
Apply Text Qualifier to all Fields will add the specified qualifier to all the fields that have been mapped.
For example, if you need to import a comma-delimited text file and a field value itself contains a comma, the text qualifier ensures the value is read as a single field rather than being split across adjacent cells (see the sketch after this list).
Use Null Text to specify a certain value that you do not want in your data and want it to be replaced by a null value.
Choose Append to File (If Exists) to append data to an existing file; otherwise, a new file will be created, and creating a new file will overwrite any existing file data.
Check Hierarchical Destination when the data in the source file needs to be sorted into hierarchies in the destination file.
Check Write to Multiple Files to save the data to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
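As a quick illustration of the text qualifier option described in this list (the values are made up; this is generic CSV behavior, not Astera code), note how the comma inside the company name is protected by the qualifier:

import csv
import io

# Write one header row and one data row whose value contains the field delimiter.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(["CustomerID", "CompanyName"])
writer.writerow(["ALFKI", "Alfreds Futterkiste, Ltd."])
print(buffer.getvalue())
# CustomerID,CompanyName
# ALFKI,"Alfreds Futterkiste, Ltd."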
Once the data reading options have been specified in this window, click Next.
The next window is the Layout Builder. On this window, the layout of the delimited destination file can be modified.
To add a new field to the layout, go to the last row of the layout (Name column), which will be blank, and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select the subsequent properties for it. A new field will be added to the layout.
Note: Adding a new field (Email) to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Deleting the Fax field from the layout.
Right-click on the highlighted row; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
To change the position of any field and move it above or below another field in the layout, select the row and use the Move Up/Move Down buttons.
Note: The Move Up/Move Down icons can be found at the top left of the builder.
For example, to move the Country field right below the Region field, select the Country row and use the Move Up button to move it from the 9th row to the 8th row.
Once the object layout is configured, click Next. A new window will appear, Config Parameters, which allows you to further configure and define parameters for the delimited destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: If you want to specify parametrized values for Field/Record Delimiter, make sure to follow the same syntax of delimiters as in the dropdown menus of these options on the Properties window.
Parameters left blank will use their default values assigned on the properties page.
Next, a General Options window will appear. In this window:
Comments can be added.
General options related to the processing of records in the destination file are available.
Click OK.
The DelimitedDest object is now configured according to the changes that were made in the properties window.
Note: The changes that were made in this case are:
Added a new field Email in the layout.
Moved the Country field below the Region field.
The Delimited File Destination object is successfully configured, and the destination file can now be created by running the dataflow.
Each source on the dataflow is represented as a source object. You can have any number of sources in the dataflow, and they can feed into zero or more destinations.
The following source types are supported by the dataflow engine:
Flat File Sources:
Tree File Sources:
Database Sources:
Data Model
All sources can be added to the dataflow by picking a source type in the Toolbox and dropping it onto the dataflow. File sources can also be added by dragging and dropping a file from an Explorer window. Database sources can be dragged and dropped from the Data Source Browser. For more details on adding sources to the dataflow, see Introducing Dataflows.
Adding a Delimited File Source object allows you to transfer data from a delimited file. An example of what a delimited file source object looks like is shown below.
To configure the properties of a Delimited File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a Fixed-Length File Source object allows you to transfer data from a fixed-length file. An example of what a Fixed-Length File Source object looks like is shown below.
To configure the properties of a Fixed-Length File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an Excel Workbook Source object allows you to transfer data from an Excel file. An example of what an Excel Workbook Source object looks like is shown below.
To configure the properties of an Excel Workbook Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a COBOL File Source object allows you to transfer data from a COBOL file. An example of what a COBOL File Source object looks like is shown below.
To configure the properties of a COBOL File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an XML/JSON File Source object allows you to transfer data from an XML file. An example of what an XML/JSON File Source object looks like is shown below.
To configure the properties of an XML/JSON File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
General Properties window:
File Path – Specifies the location of the source XML file. Using UNC paths is recommended if running the dataflow on a server.
Schema File Path – Specifies the location of the XSD file controlling the layout of the XML source file.
Note: Astera can generate a schema based on the content of the source XML file. The data types will be assigned based on the source file’s content.
Optional Record Filter Expression – Allows you to enter an expression to selectively filter incoming records according to your criteria. You can use the Expression Builder to help you create your filter expression. For more information on using Expression Builder, see Expression Builder.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
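For reference, a UNC path takes the form \\ServerName\SharedFolder\Orders.xml (the server, share, and file names here are only placeholders), whereas a local path such as C:\Data\Orders.xml resolves only on the machine it was created on and may not be accessible when the dataflow runs on a remote server.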
Adding a Database Table Source object allows you to transfer data from a database table. An example of what a Database Table Source object looks like is shown below.
To configure the properties of a Database Table Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your source, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected source.
Pick Source Table window:
Select a source table using the Pick Table dropdown.
Select Full Load if you want to read the entire table.
Select Incremental Load Based on Audit Fields to perform an incremental read starting at a record where the previous read left off.
Incremental load based on Audit Fields builds on the concept of Change Data Capture (CDC), a set of reading and writing patterns designed to optimize large-scale data transfers by moving only the data that has changed. Astera implements CDC using the Audit Fields pattern, which uses the create time or last update time to determine which records have been inserted or updated since the last transfer and transfers only those records.
Advantages
Most efficient of the CDC patterns. Only records modified since the last transfer are retrieved by the query, putting little stress on the source database and network bandwidth.
Disadvantages
Requires update date time and/or create date time fields to be present and correctly populated
Does not capture deletes
Requires index on the audit field(s) for efficient performance
To use the Audit Fields strategy, select the Audit Field and an optional Alternate Audit Field from the appropriate dropdown menus. Also, specify the path to the file that will store incremental transfer information.
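Conceptually, an incremental read based on an audit field issues a query along the following lines. This is only a sketch with hypothetical table and field names, not the exact SQL that Astera generates:

SELECT * FROM Orders
WHERE ModifiedDtTm > '2023-06-01 14:30:00'  -- highest audit value saved from the previous run
ORDER BY ModifiedDtTm

The highest audit-field value returned is then saved to the incremental transfer file and used as the cutoff for the next run.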
Where Clause window:
You can enter an optional SQL expression serving as a filter for the incoming records. The expression should start with the WHERE keyword, followed by the filter you wish to apply.
For example, WHERE CreatedDtTm >= ‘2001/01/05’
General Options window:
The Comments input allows you to enter comments associated with this object.
Adding a SQL Query Source object allows you to transfer data returned by a SQL query. An example of what a SQL Query Source object looks like is shown below.
To configure the properties of a SQL Query Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your SQL Query, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected database.
SQL Query Source window:
Enter the SQL expression controlling which records should be returned by this source. The expression should follow SQL syntax conventions for the chosen database provider.
For example, select OrderId, OrderName, CreatedDtTm from Orders.
Source or Destination is a Delimited File
If your source or destination is a Delimited File, you can set the following properties
First Row Contains Header - Check this option if the first row of your file contains the column headers. In the case of a source file, this indicates that the source contains headers.
Field Delimiter - Allows you to select the delimiter for the fields, such as a comma or a tab. You can also type the delimiter of your choice instead of choosing one of the available options.
Record Delimiter - Allows you to select the delimiter for the records in the file. The choices available are the carriage-return line-feed combination <CR/LF>, carriage-return <CR>, and line-feed <LF>. You can also type the record delimiter of your choice instead of choosing one of the available options. For more information on Record Delimiters, please refer to the Glossary.
Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
Quote Char - Allows you to select the quote character to be used in the delimited file. This quote character tells the system to overlook any special characters, such as field delimiters, appearing inside the specified quotation marks. The options available are the double quote (") and the single quote ('). A sample file illustrating these options is shown below.
You can also use the Build fields from an existing file feature to build the destination fields from an existing file instead of typing the layout manually.
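For illustration, a comma-delimited file with a header row and the double quote (") as the quote character might look like this (sample values only):

OrderId,CustomerName,Country
1001,"Smith, John",USA
1002,"Doe, Jane",Canada

Here, the commas inside the quoted CustomerName values are treated as data rather than as field delimiters.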
Source or Destination is a Microsoft Excel Worksheet
If the Source and/or the Destination chosen is a Microsoft Excel Worksheet, you can set the following properties:
First Row Contains Header - Check this option if the first row of your file contains the column headers. In the case of a source file, this indicates that the source contains headers.
Worksheet - Allows you to select a specific worksheet from the selected Microsoft Excel file.
You can also use the Build fields from an existing file feature to build the destination fields from an existing file instead of typing the layout manually.
Source or Destination is a Fixed Length File
If the Source and/or the Destination chosen is a Fixed Length File, you can set the following properties:
First Row Contains Header - Check this option if the first row of your file contains the column headers. In the case of a source file, this indicates that the source contains headers.
Record Delimiter - Allows you to select the delimiter for the records in the file. The choices available are the carriage-return line-feed combination <CR/LF>, carriage-return <CR>, and line-feed <LF>. You can also type the record delimiter of your choice instead of choosing one of the available options. For more information on Record Delimiters, please refer to the Glossary.
Encoding - Allows you to choose the encoding scheme for the fixed-length file from a list of choices. The default value is Unicode (UTF-8).
You can also use the Build fields from an existing file feature to build the destination fields from an existing file instead of typing the layout manually.
Using the Length Markers window, you can create the layout of your fixed-length file. The Length Markers window has a ruler placed at the top of the window. To insert a field length marker, click in the window at the desired position. For example, if you want a field to contain five characters and the field starts at position five, you need to click at marker position nine.
In case the records don’t have a delimiter and you rely on knowing the size of a record, the number in the RecordLength box is used to specify the character length for a single record.
You can delete a field length marker by clicking the marker.
Source or Destination is an XML file
If the source is an XML file, you can set the following options:
Source File Path specifies the file path of the source XML file.
Schema File Path specifies the file path of the XML schema (XSD file) that applies to the selected source XML file.
Record Filter Expression allows you to optionally specify an expression used as a filter for incoming source records from the selected source XML file. The filter can refer to a field or fields inside any node inside the XML hierarchy.
The following options are available for destination XML files.
Destination File Path specifies the file path of the destination XML file.
Encoding - Allows you to choose the encoding scheme for the XML file from a list of choices. The default value is Unicode (UTF-8).
Format XML Output instructs Astera to add line breaks to the destination XML file for improved readability.
Read From Schema File specifies the file path of the XML schema (XSD file) that will be used to generate the destination XML file.
Root Element specifies the root element from the list of the available elements in the selected schema file.
Generate Destination XML Schema Based on Source Layout creates the destination XML layout to mirror the layout of the source.
Root Element specifies the name of the root element for the destination XML file.
Generate Fields as XML Attributes specifies that fields will be written as XML attributes (as opposed to XML elements) in the destination XML file.
Record Node specifies the name of the node that will contain each record transferred.
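For illustration, with Record Node set to Order (a hypothetical node name) and sample field values, a record written with Generate Fields as XML Attributes checked might look like this:

<Order OrderId="1001" Country="USA" />

With the option unchecked, the same record is written with fields as elements:

<Order>
  <OrderId>1001</OrderId>
  <Country>USA</Country>
</Order>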
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
When importing from a fixed-width, delimited, or Excel file, you can specify the following advanced reading options:
Header Spans x Rows - If your source file has a header that spans more than 1 row, select the number of rows for the header using this control.
Skip Initial Records - Sets the number of records which you want skipped at the beginning of the file. This option can be set whether or not your source file has a header. If your source file has a header, the first record after the specified number of rows to skip will be used as the header row.
Raw Text Filter - Only records starting with the filter string will be imported. The remaining records will be filtered out.
You can optionally use regular expressions to specify your filter. For example, the regular expression ^[12][4] will only include records starting with 1 or 2, and whose second character is 4.
Note: Astera supports Regular Expressions implemented with the Microsoft .NET Framework and uses the Microsoft version of named captures for regular expressions.
The Raw Text Filter setting is not available for Excel source files.
If your source is a fixed-length file, delimited file, or Excel spreadsheet, it may contain an optional header row. A header row is the first record in the file that specifies field names and, in the case of a fixed-length file, the positioning of fields in the record.
If your source file has a header row, you can specify how you want the system to handle differences between the actual source file and the source layout specified in the setting. Differences may arise because the source file has a different field order than the source layout, has extra or fewer fields than the source layout, or has field names that differ or have changed since the layout was created.
By selecting from the available options, you can have Astera handle those differences exactly as required by your situation. These options are described in more detail below:
Enforce exact header match – Lets Astera Data Stack proceed with the transfer only if the source file’s layout matches the source layout defined in the setting exactly. This includes checking for the same number and order of fields and field names.
Columns order in file may be different from the layout – Lets Astera Data Stack ignore the sequence of fields in the source file, and match them to the source layout using the field names.
Column headers in file may be different from the layout – This mode is used by default whenever the source file does not have a header row. You can also enable it manually if you want to match the first field in the layout with the first field in the source file, the second field in the layout with the second field in the source file, and so on. This option will match the fields using their order as described above even if the field names are not matched successfully. We recommend that you use this mode only if you are sure that the source file has the same field sequence as what is defined in the source layout.
The Field Layout window is available in the properties of most objects on the dataflow to help you specify the fields making up the object. The table below explains the attributes you can set in the Field Layout window.
The table below provides a list of all the attributes available for a particular layout type.
Astera supports a variety of formats for each data type. For example, for Dates, you can specify the date as “April 12” or “12-Apr-08”. Data Formats can be configured independently for source and for destination, giving you the flexibility to correctly read source data and change its format as it is transferred to destination.
If you are transferring from a flat file (for example, Delimited or Fixed-Width), you can specify the format of a field so that the system can correctly read the data from that field.
If you do not specify a data format, the system will try to guess the correct format for the field. For example, Astera is able to correctly interpret any of the following as a Date:
April 12
12-Apr-08
04-12-2008
Saturday, 12 April 2008
and so on
Astera comes with a variety of pre-configured formats for each supported data type. These formats are listed in the Sample Formats section below. You can also create and save your own data formats.
To select a data format for a source field, go to Source Fields and expand the Format dropdown menu next to the appropriate field.
Sample Formats
Dates:
Booleans:
Integers:
Real Numbers:
Numeric Format Specifiers:
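As a rough illustration of the kinds of format strings involved (these are common .NET-style patterns shown as examples, not an authoritative list of the product's built-in formats): the date format dd-MMM-yy renders 12-Apr-08, MM-dd-yyyy renders 04-12-2008, the numeric format #,##0.00 renders 1,250.75, and 0.00% renders 0.125 as 12.50%.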
Delimited files are one of the most commonly used data sources and are used in a variety of situations. The Delimited File Source object in Astera provides the functionality to read data from a delimited file.
In this article, we will cover how to use a Delimited File Source object.
To get a Delimited File Source object from the Toolbox, go to Toolbox > Sources > Delimited File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Delimited File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Delimited File Source object, right-click on its header and select Properties from the context menu.
As soon as you have selected the Properties option from the context menu, a dialog box will open.
This is where you can configure the properties for the Delimited File Source object.
The first step is to provide the File Path for the delimited source file. By providing the file path, you are building the connectivity to the source dataset.
Note: In this case, we are going to be using a delimited file with sample Orders data. This file works with the following options:
File Contains Headers
Record Delimiter is specified as CR/LF
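For illustration, the first few lines of such a file might look like this (sample values only, not the actual dataset):

OrderID,CustomerID,OrderDate,ShipCountry
10248,VINET,1996-07-04,France
10249,TOMSP,1996-07-05,Germany

The first line supplies the headers, and each record ends with a CR/LF delimiter.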
The dialog box has some other configuration options:
If the source file contains headers, and you want Astera to read headers from the source file, check the File Contains Header option.
If you want your file to be read in portions, upon selecting the Partition File for Reading option, Astera will read your file according to the specified Partition Count. For instance, if a file with 1000 rows has a Partition Count of 2 specified, the file will be read in two partitions of 500 each. This is a back-end process that makes data reading more efficient and helps in processing data faster. This will not have any effect on your output.
The Record Delimiter field allows you to select the delimiter for the records in the fields. The choices available are carriage-return line-feed combination <CR/LF>, carriage-return - CR and line-feed - LF. You can also type the record delimiter of your choice instead of choosing from the available options.
In case the records do not have a delimiter and you rely on knowing the size of a record, the number in the Record Length field can be used to specify the character length for a single record.
The Encoding field allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
A Text Qualifier is a symbol that identifies where text begins and ends. It is used specifically when importing data. For example, when importing a comma-delimited text file, commas separate the fields that will be placed in adjacent cells; a text qualifier such as the double quote ensures that commas appearing inside a field value are read as data rather than as field delimiters.
To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File option. Astera IDE provides extensive user interface capabilities for processing hierarchical structures.
Use the Null Text option to specify a value in your data that you want replaced with a null value when the file is read.
Check the Allow Record Delimiter Inside a Field Text option when you have the record delimiter as text inside your data and want that to be read as it is.
Advanced File Options
In the Header spans over field, specify the number of rows that your header takes. Use this option when your header spans multiple rows.
Check the Enforce exact header match option if you want processing to proceed only when the file's header exactly matches the defined layout.
Check the Column order in file may be different from the layout option if the field order in the source file may differ from the field order in Astera’s layout; the fields are then matched by name.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
Check the Use SmartMatch with Synonym Dictionary option when the header values vary between the source layout and Astera’s layout. You can create a Synonym Dictionary file to store values for alternate headers. You can also use the Synonym Dictionary file to facilitate automapping between objects on the flow diagram that use alternate names in field layouts.
To skip any unwanted rows at the beginning of your file, you can specify the number of records that you want to omit through the Skip initial records option.
Raw text filter
If you do not want to apply any filter and want to process all records, check No filter. Process all records.
To process only records that begin with a specific value, check the Process if begins with option and enter that value in the provided field.
To process only records that match a specific pattern, check the Process if matches this regular expression option and enter the regular expression in the provided field.
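For example, the regular expression ^ORD (a hypothetical prefix) would process only those records whose raw text begins with ORD.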
String Processing
String Processing options apply when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want them to be written as null values to the database destination.
Check the Trim strings option when you want to remove extra leading and trailing spaces from field values.
Once you have specified the data reading options on this window, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of the delimited source file.
If you want to add a new field to your layout, go to the last, blank row of the layout (in the Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field you want to add and select its subsequent properties. A new field will be added to the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted line; a context menu will appear with the option to Delete.
Selecting this option will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Note: Modifying the layout (adding or deleting fields) from the Layout Builder in Astera will not make any changes to the actual source file. The layout is specific to Astera only.
After you are done customizing the layout, click Next. You will be directed to a new window, Config Parameters. Here, you can define parameters for the Delimited File Source object.
Parameters make flows easier to deploy by eliminating hardcoded values and allow multiple configurations to be changed with a single value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Once you have configured the source object, click OK.
The Delimited File Source object is now configured according to the changes made.
It has been modified from its previous configuration and reflects all the changes that were made in the Layout Builder.
In this case, the modifications that were made are:
Added the CustomerName column.
Deleted the ShipCountry column.
You have successfully configured your Delimited File Source object. The fields from the source object can now be mapped to other objects in the dataflow.
The Excel File Source object in Astera supports all formats of Excel. In this article, we will be discussing:
Various ways to get the Excel Workbook Source object on the dataflow designer.
Configuring the Excel Workbook Source object according to our required layout and settings.
In this section, we will cover the various ways to get an Excel Workbook Source object on the dataflow designer.
To get an Excel File Source object from the Toolbox, go to Toolbox > Sources > Excel Workbook Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet. We will discuss the configuration properties for the Excel Workbook Source object in the next section.
If you already have a project defined and Excel source files are part of that project, you can directly drag-and-drop the Excel source files from the project tree onto the dataflow designer. The Excel File Source objects in this case will already be configured; Astera Data Stack detects the connectivity and layout information from the source file itself.
Note: In this case, we are using an Excel file with Customers data. The file is a part of an existing project folder.
To get an Excel File Source object from the Project Explorer, go to the Project Explorer window and expand the project tree.
Select the Excel file you want to bring in as the source and drag-and-drop it on the designer. In this case, we are working with Customers -Excel Source.xls file so we will drag-and-drop it onto the designer.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
To get an Excel Workbook Source directly from the file location, open the folder containing the Excel file.
Drag-and-drop the Excel file from the folder onto the designer in Astera.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
To configure the Excel Workbook Source object, right-click on its header and select Properties from the context menu.
As soon as you have selected the Properties option from the context menu, a dialog box will open.
This is where you can configure your properties for the Excel Workbook Source object.
The first step is to provide the File Path for the Excel source. By providing the file path, you are building the connectivity to the source dataset.
Note: In this case, we are going to be using an Excel file with sample Customers data.
The dialog box has some other configuration options:
If your source file contains headers and you want your Astera source layout to read headers from the source file, check the File Contains Header box.
If you have blank rows in your file, you can use the Consecutive Blank Rows to Indicate End of File option to specify the number of blank rows that will indicate the end of the file.
Use the Worksheet option to specify the worksheet in your Excel file that you want to read data from.
In the Start Address option, you can indicate the cell (for example, B2) from which you want Astera to start reading the data.
Advanced File Options
In the Header spans over option, specify the number of rows that your header takes. Use this option when your header spans multiple rows.
Check the Enforce exact header match option if you want processing to proceed only when the file's header exactly matches the defined layout.
Check the Column order in file may be different from the layout option if the field order in the source file may differ from the field order in Astera’s layout; the fields are then matched by name.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
Check the Use SmartMatch with Synonym Dictionary option when the header values vary between the source layout and Astera’s layout. You can create a Synonym Dictionary file to store the values for alternate headers. You can also use the Synonym Dictionary file to facilitate automapping between objects that use alternate names in field layouts.
String Processing
String Processing options apply when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want them to be written as null values to the database destination.
Check the Trim strings option when you want to remove extra leading and trailing spaces from field values.
Once you have specified the data reading options on this screen, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of your Excel source file.
If you want to add a new field to your layout, go to the last row of your layout (Name column) and double-click on it. A blinking text cursor will appear. Type in the name of the field you want to add and select subsequent properties for it. A new field will be added to the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted line and select Delete from the context menu.
Selecting this option will delete the entire row from the layout.
Note: Modifying the layout (adding or deleting fields) in the Layout Builder window in Astera will not make any changes to the actual source file. The layout is specific to Astera only.
If you want to change the position of any field and want to move it below or above another field in the layout, you can do this by selecting the row and using Move up/move down keys.
Note: You will find the Move up/Move down icons on the top left of the Layout Builder.
For example: To move the Country field right below the Region field, we will select the row and use the Move up key to move this field from the 9th row to the 8th.
Other options that the Layout Builder provides:
After you are done customizing the layout in the Object Builder, click Next. You will be taken to a new window, Config Parameters. Here, you can further configure and define parameters for the Excel source.
Parameters make flows easier to deploy by eliminating hardcoded values and allow multiple configurations to be changed with a single value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Once you have been through all the configuration options, click OK.
The ExcelSource object is now configured according to the changes made.
You have successfully configured your Excel Workbook Source object. The fields from the source object can now be mapped to other objects in the dataflow.
The Excel Workbook Report object in Astera is designed to tabulate information from selected fields and present the results in a one- or two-dimensional matrix. This feature enables deeper analysis of data by organizing it in a way that facilitates the identification of trends, patterns, and insights.
To get the object from the Toolbox, go to Toolbox > Destinations > Excel Workbook Report. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Report object onto the designer.
The dragged report object is empty right now. This is because the data fields are not mapped to it yet. While any source can be used, for this particular use case, we will demonstrate using a Report Source that is extracting data from a PDF source file.
Configure the source object and place it onto the designer next to the Excel Workbook Report object.
Note: We are using the data that we extracted from a customer invoice with the help of Astera's report modeling tool.
Now map the data fields from the source object to the report object.
To configure the Excel Workbook Report object, right-click on the header, select Properties from the context menu and a dialog box will open.
Provide the File Path. This is where the excel report file will be saved.
The dialog box has some other configuration options, such as Worksheet and Start Address, which work just like those of the Excel Workbook Destination object.
Once the File Path and data reading options have been specified on this screen, click Next.
The next window is the Layout Builder. On this window, the layout of the Excel report file can be modified.
Here, you can write names of fields as you want them to appear in your destination in the Header column and specify the relevant Aggregate Functions for them.
Aggregate Functions define how the data will be summarized in the report:
Group By: Groups records based on unique values in the specified field.
Sum: Calculates the total sum of the specified field.
Count: Counts the number of records.
Average: Calculates the average value of the specified field.
Max: Finds the maximum value in the specified field.
Min: Finds the minimum value in the specified field.
First: Returns the first record in a sorted list based on the specified field.
Last: Returns the last record in a sorted list based on the specified field.
Variance: Calculates the variance of the specified field.
Standard Deviation: Calculates the standard deviation of the specified field.
None: Includes the field in the report without applying any aggregation. This is useful when you want certain field values in the data lines but don’t want to apply any aggregation on them.
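As a minimal illustration of how Group By and Sum combine (sample values, not the actual invoice data), consider these input rows:

OrderID, ITEM, QUANTITY, TOTAL
101, Widget, 2, 20.00
101, Bolt, 5, 5.00
102, Widget, 1, 10.00

Grouping by OrderID and applying Sum to QUANTITY and TOTAL produces a subtotal of 7 and 25.00 for order 101 and 1 and 10.00 for order 102, with a grand total of 8 and 35.00.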
For this case:
AccountID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual accounts.
OrderID: We will select the Group By option from the Aggregate Function drop-down list for this field, as we want to see orders within each account.
TOTAL: For this field we will select the Aggregate Function Sum, to calculate the total amount per order.
QUANTITY: For this field we will select the Aggregate Function Sum, to calculate the total quantity per order.
ITEM: Since we want to show item names in the data lines but do not want to apply any aggregates on them, we will select Aggregate Function None.
The same approach will be applied to the ITEM CODE, DESCRIPTION, and PRICE fields. We will select Aggregate Function None for each of these fields to ensure that their specific values are displayed in separate data lines without any aggregation.
Click Next. The Report Options window will now open.
Report Type: You can select from three report types: Summary, Cross Tab, or Time Series.
Note: For this use case we have chosen Summary as the Report Type.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for Subtotal field.
Grand Total Text: You can specify the name for Grand total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis. For example, if you have customer names like "john" and "John," enabling this option will treat them as distinct groups rather than combining them into a single group.
Style
You can also modify the style of your report.
Show Data Lines: Check this option if you want to see the actual data records along with the subtotals and grand totals.
Insert Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Write Grand Total: Adds the grand total to the report. If unchecked, the grand total won't be included.
Insert Blank Line Before Subtotal: Inserts a blank line before each subtotal in the report.
Insert Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted by the specified Group By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build the aggregate using the incoming data whether it is sorted or not.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears.
Note: If you want to see only the subtotals and the grand total you can uncheck the Show Data Lines option in the Report Options wizard. This will display only the summarized totals without individual records.
A Crosstab Summary displays summarized information about two fields in a two-dimensional matrix. The values for one field are displayed down the left-most column of the matrix and the values for the other key field are displayed across the top row as columns. This two-dimensional arrangement displays only a single measure at a time.
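For illustration, a crosstab of the Sum of order totals by customer and product might look like this (sample values only):

            Prod-A    Prod-B    Row Total
Cust-1      120.00     30.00       150.00
Cust-2       45.00     80.00       125.00
Grand Total                        275.00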
Let’s see how we can make a Cross Tab Summary using the Excel Workbook Report object.
Since we want to use information from two tables (orders and order details), we have joined them and used an Expression object to calculate the total. We can then map the required data fields from both tables to the Excel Workbook Report object.
For this case:
CustomerID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual customers.
ProductID: We will select the None option from the Aggregate Function drop-down list for this field, as we want the product values to spread across the top row as columns.
Total: We will select the Sum option from the Aggregate Function drop-down list for this field as we want totals for each type of product and the totals for each customer.
Note: For this use case we have chosen Cross Tab as the Report Type.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for Subtotal field.
Row Total Text: You can specify the name for Row total field.
Grand Total Text: You can specify the name for Grand total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis.
Crosstab Options
Column Field: You can select which field or attribute you want to use as the column headers in the resulting cross-tabulation report.
Note: The selected column field should have Aggregate Function None selected in the layout builder.
Row Totals – Check this option if you want to add each row’s total to your report.
Style
You can also modify the style of your report.
Add Grand Total: Inserts the grand total in the report.
Add Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Add Row Total: Inserts the row totals in the report.
Add Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted by the specified Group By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build the aggregate using the incoming data whether it is sorted or not.
After defining the options, click OK.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears. The summary in the table shows the sales data for different products purchased by various customers, identified by their CustomerID.
A Time Series summary displays summarized information about two key fields in a two-dimensional matrix. The values for one field are displayed down the left-most column of the matrix, and the time intervals (such as days, months, quarters, or years) are displayed across the top row as columns.
Let’s see how we can make a Time Series Summary using the Excel Workbook Report object.
Since we want to use information from two tables (orders and order details), we have joined them and used an Expression object to calculate the total. We can then map the required data fields from both tables to the Excel Workbook Report object.
For this case:
CustomerID: We will select the Group By option from the Aggregate Function drop-down list for this field as we want to group the records based on individual customers.
OrderDate: We will select the None option from the Aggregate Function drop-down list for this field, as we want the date values to spread across the top row as columns.
Total: We will select the Sum option from the Aggregate Function drop-down list for this field as we want the totals for each customer.
Note: For this use case we have chosen Time Series as the Report Type.
Title: To provide a meaningful title to your report, enter a new title into the Title field.
Subtotal Text: You can specify the name for Subtotal field.
Row Total Text: You can specify the name for Row total field.
Grand Total Text: You can specify the name for Grand total field.
Enable Case Sensitive Match – Check this option if you want to Group your data on a case sensitive basis.
Timeseries Report Options
Time Unit Drop-down: You can specify the time interval for the time series analysis. Available options include:
Year: Analyze data on a yearly basis.
Month: Analyze data on a monthly basis.
Day: Analyze data on a daily basis.
Week: Analyze data on a weekly basis.
Quarter: Analyze data on a quarterly basis.
Start Date: You can specify the start date for the time series analysis. This defines the beginning of the time period for which data will be analyzed.
End Date: You can specify the end date for the time series analysis. This defines the end of the time period for which data will be analyzed.
Date Field: Field from the dataset that contains the date or timestamp information. The selected date field will be used to create the time series.
Style
You can also modify the style of your report.
Add Grand Total: Inserts the grand total in the report.
Add Blank Line Before Grand Total: Inserts a blank line before the grand total in the report.
Add Row Total: Inserts the row totals in the report.
Add Blank Line After Subtotal: Inserts a blank line after each subtotal in the report.
Click Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted by the specified Group By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build the aggregate using the incoming data whether it is sorted or not.
After defining the options, click OK.
The Excel Report object is successfully configured, and the report file can now be created by running the dataflow.
Below, you can see a sample of how the summary report appears. This summary table shows the number of sales across different years for customers, identified by their CustomerID.
The Email Source object in Astera enables users to retrieve data from emails and process the incoming email attachments.
In this section, we will cover how to get the Email Source object onto the dataflow designer from the Toolbox.
To get an Email Source object from the Toolbox, go to Toolbox > Sources > Email Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Email Source object onto the designer.
You can see some built-in fields and an Attachments node.
Double-click on the header of the Email Source object to go to the Properties window.
A configuration window for the Email Source object will open. The Email Connection window is where you will specify the connection details.
Url: The address of the mail server on which the connection will be configured.
Login Name: The login name of the email account (typically the email address).
Password: Password of the user.
Port: The port of the mail server to connect to. For example, 587 is the standard SMTP submission port, and 993 is the standard port for IMAP over SSL.
Connection Logging: Connection logging records different types of messages or events exchanged between the client and the server, which the user can review for debugging or when errors occur.
Astera supports four types of Connection Logging:
Verbose: Captures everything.
Debug: Captures only the content that can be used in debugging.
Info: Captures information and general messages.
Error: Captures only the errors.
If you have configured email settings before, you can access the configured settings from the drop-down list next to the Recent option. Otherwise, provide server settings for the mailing platform that you want to use. In this case, we are using an Outlook server.
Test your connection by clicking Test Connection; this will give you the option to send a test email to the login address.
Click Next. This is the Email Source Properties window. There are two important parts in this window:
Download attachment options
Email reading options
Check the Download Attachments option if you want to download the contents of your email. Specify the directory where you want to save the email attachments, in the provided field next to Directory.
The second part of the Email Source Properties window has the email reading options that you can work with to configure various settings.
Read Unread Only – Check this option if you only want to process unread emails.
Mark Email as Read – Check this option if you want to mark processed emails as read.
Folder – From the drop-down list next to Folder, you can select the specific folder to check, for example, Inbox, Outbox, Sent Items etc.
Filters - You can apply various filters to only process specific emails in the folder.
From Filter: Filters emails based on the sender’s email address.
Subject Filter: Filters emails based on the text of the subject line.
Body Filter: Filters emails based on the body text.
Click OK.
Right-click on the Email Source object’s header and select Preview Output from the context menu.
A Data Preview window will open and will show you the preview of the extracted data.
Notice that the output only contains emails from the email address specified in the Filter section.
The File System Items Source in Astera Data Stack is used to provide metadata information to a task in a dataflow or workflow. In a dataflow, it can be used in conjunction with a source object, especially in cases where you want to process multiple files through the transformation and loading process.
In a workflow, the File System Items Source object can be used to provide input paths to a subsequent object such as a RunDataflow task.
Let’s see how it works in a dataflow.
Here we have a dataflow that we want to run on multiple source files that contain Customer_Data from a fictitious organization. We are going to use the source object as a transformation and provide the location of the source files using a File System Items Source object. The File System Items Source will provide the path to the location where our source files reside, and the source object will pick up the source files from that location, one by one, and pass them on for further processing in the dataflow.
Here, we want to sort the data, filter out records of customers from Germany and write the filtered records into a database table. The source data is stored in delimited (.csv) files.
First, change the source object into a Transformation object. This is because the data is stored in multiple delimited files and we want to process all of them in the dataflow. For this, right-click on the source object’s header and click Transformation in the context menu.
You can see that the color of the source object has changed from green to purple which indicates that the source object has been changed into a transformation object.
Notice that the source object now has two nodes: Input and Output. The Input node has an input mapping port which means that it can take the path to the source file from another object.
Now we will use a File System Items Source object to provide a path to Customer_Data Transformation object. Go to the Sources section in the Toolbox and drag-and-drop the File System Items Source object onto the designer.
If you look at the File System Items Source object, you can see that the layout is pre-populated with fields such as FileName, FileNameWithoutExtension, Extension, FullPath, Directory, ReadOnly, Size, and other attributes of the files.
To configure the properties of the File System Items Source object, right-click on the File System Items Source object’s header and go to Properties.
This will open the File System Properties window.
The first thing you need to do is point the Path to the directory or folder where your source files reside.
You can see a couple of other options on this screen:
Filter: If your specified source location contains multiple files in different formats, you can use this option to filter and read only files in the specified format. For instance, our source folder contains PDF, .txt, .doc, .xls, and .csv files, so we will write “*.csv” in the Filter field to filter and read delimited files only.
Include items in subdirectories: Check this option if you want to process files present in the sub-directories.
Include Entries for Directories: Check this option if you want entries for the directories themselves, in addition to files, to be included in the output.
Once you have specified the Path and other options, click OK.
Now right-click on the File System Items Source object’s header and select Preview Output.
You can see that the File System Items Source object has filtered out delimited files from the specified location and has returned the metadata in the output. You can see the FileName, FileNameWithoutExtension, Extension, FullPath, Directory, and other attributes such as whether the file is ReadOnly, FileSize, LastAccessed, and other details in the output.
Now let’s start mapping. Map the FullPath field from the File System Items Source object to the FullPath field under the Input node in the Customer_Data Transformation object.
Once mapped, when we run the dataflow, the File System Items Source will pass the path to the source files, one by one, to the Customer_Data Transformation object. The Customer_Data Transformation object will read the data from the source file and pass it to the subsequent transformation object to be processed further in the dataflow.
In a workflow, the File System Items Source object can be used to provide input paths to a subsequent task such as a RunDataflow task. Let’s see how this works.
We want to design a workflow to orchestrate the process of extracting customer data stored in delimited files, sorting that data, filtering out records of customers from Germany and loading the filtered records in a database table.
We have already designed a dataflow for the process and have called this dataflow in our workflow using the RunDataflow task object.
We have multiple source files that we want to process in this dataflow. So, we will use a File System Items Source object to provide the path to our source files to the RunDataFlow task. For this, go to the Sources section in the Toolbox and drag-and-drop the File System Items Source onto the designer.
If you look at the File System Items Source, you can see that the layout is pre-populated with fields such as FileName, FileNameWithoutExtension, Extension, FullPath, Directory, ReadOnly, Size, and other attributes of the files. Also, notice the small blue icon with the letter ‘s’; it indicates that the object is set to run in Singleton mode.
By default, all objects in a workflow are set to execute in Singleton mode. However, since we have multiple files to process in the dataflow, we will set the File System Items Source object to run in loop. For this, right-click on the File System Items Source and click Loop in the context menu.
You can see that the color of the object has changed to purple, and it now has this purple icon over the header which denotes the loop function.
It also has these two mapping ports on the header to map the File System Items Source object to the subsequent action in the workflow. Let’s map it to the RunDataflowTask.
To configure the properties of the File System Items Source, right-click on the File System Item Source object’s header and go to Properties.
This will open the File System Items Source Properties window.
The first thing you need to do is point the Path to the directory or folder where your source files reside.
You can see a couple of other options on this window:
Filter: If your specified source location contains multiple files in different formats, you can use this option to filter and read only files in the specified format. For instance, our source folder contains PDF, .txt, .doc, .xls, and .csv files, so we will write “*.csv” in the Filter field to filter and read delimited files only.
Include items in subdirectories: Check this option if you want to process files present in the sub-directories.
Include Entries for Directories: Check this option if you want entries for the directories themselves, in addition to files, to be included in the output.
Once you have specified the Path and other options, click OK.
Now right-click on the File System Items Source object’s header and click Preview Output.
You can see that the File System Items Source object has filtered out delimited files from the specified location and has returned the metadata in the output. You can see the FileName, FileNameWithoutExtension, Extension, FullPath, Directory, and other attributes such as whether the file is ReadOnly, FileSize, LastAccessed, and other details in the output.
Now let’s start mapping. Map the FullPath field from the File System Items Source object to the FilePath variable in the RunDataflow task.
Once mapped, upon running the workflow, the File System Items Source object will pass the paths to the source files, one by one, to the RunDataflow task. In other words, the File System Items Source acts as a driver that provides source files to the RunDataflow task, which then processes them in the dataflow.
When the File System Items Source is set to run in a loop, the dataflow will run ‘n’ times, where ‘n’ is the number of files passed by the File System Items Source to the RunDataflow task. For instance, you can see that we have six source files in the specified folder. The File System Items Source object will pass these six files one by one to the RunDataflow task to be processed in the dataflow.
This concludes using the File System Items Source object in Astera Data Stack.
The Database Table Source object provides the functionality to retrieve data from a database table. It also provides change data capture functionality to perform incremental reads, and supports multi-way partitioning, which partitions a database table into multiple chunks and reads these chunks in parallel. This feature brings about major performance benefits for database reads.
The object also enables you to specify a WHERE clause and sort order to control the result set.
In this article, we will be discussing how to:
Get a Database Table Source object on the dataflow designer.
Configure the Database Table Source object according to the required layout and settings.
We will also be discussing some best practices for using a Database Table Source object.
To get a Database Table Source from the Toolbox, go to Toolbox > Sources > Database Table Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Database Table Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Database Table Source object, right-click on its header and select Properties from the context menu.
A dialog box will open.
This is where you can configure the properties for the Database Table Source object.
The first step is to specify the Database Connection for the source object.
Provide the required credentials. You can also use the Recently Used drop-down menu to connect to a recently connected database.
You will find a drop-down list next to the Data Provider.
This is where you select the specific database provider to connect to. The connection credentials will vary according to the provider selected.
Test Connection to make sure that your database connection is successful and click Next.
Next, you will see a Pick Source Table and Reading Options window. On this window, you will select the table from the database that you previously connected to and configure the table from the given options.
From the Pick Table field, choose the table that you want to read the data from.
Note: We will be using the Customers table in this case.
Once you pick a table, an icon will show up beside the Pick Table field, providing the following options:
View Data: You can view data in a separate window in Astera.
View Schema: You can view the schema of your database table from here.
View in Database Browser: You can see the selected table in the Database Source Browser in Astera.
Table Partition Options
This feature substantially improves the performance of large data movement jobs. Partitioning is done by selecting a field and defining value ranges for each partition. At runtime, Astera generates and runs multiple queries against the source table and processes the result set in parallel.
Check the Partition Table for Reading option if you want your table to be read in partitions.
You can specify the Number of Partitions.
The Pick Key for the Partition drop-down will let you choose the key field for partitioning the table.
If you have specific key values based on which you want to partition the table, you can use the Specify Key Values (Separated by comma) option.
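Conceptually, partitioned reading splits the table read into several range-based queries that run in parallel. As a rough sketch only (with a hypothetical boundary value, not the SQL Astera actually generates), two partitions keyed on CustomerID might correspond to queries like:

SELECT * FROM Customers WHERE CustomerID < 'M'
SELECT * FROM Customers WHERE CustomerID >= 'M'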
The Favor Centerprise Layout option is useful in cases where your source database table layout has changed over time, but the layout built in Astera is static, and you want to continue using your dataflows with the updated source table layout. Check this option to have Astera favor its own layout over the database layout.
Incremental Read Options
The Database Table Source object provides incremental read functionality based on the concept of audit fields. Incremental read is one of the three change data capture approaches supported by Astera. Audit fields are fields that are updated when a record is created or modified. Examples of audit fields include created date time, modified date time, and version number.
Incremental read works by keeping track of the highest value for the specified audit field. On the next run, only the records with a value higher than the saved value are retrieved. This feature is useful in situations where two applications need to be kept in sync and the source table maintains audit field values for rows.
Select Full Load if you want to read the entire table.
Select Incremental Load Based on Audit Fields to perform an incremental read. Astera will start reading records from where the last read left off.
Checking the Perform full load on next run option will override the incremental load setting on the next run and perform a full load instead.
Use Audit Field to select the field used to determine which records have changed since the last read.
In File Path, specify the path to the file that will store the incremental transfer information.
The next window is the Layout Builder. In this window you can modify the layout of your database table.
Note: By default, Astera reads the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted row; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Note: Modifying the layout (adding or deleting fields) from the Layout Builder in Astera will not make any changes to the actual database table. The layout is only specific to Astera.
If you want to change the position of any field and move it below or above another field in the layout, select the row and use the Move up/Move down icons.
Note: You will find the Move up/Move down icons on the top left of the builder.
For example, to move the Country field right below the Region field, we select the row and use the Move up icon to move the field from the 9th row to the 8th.
After you are done customizing the Layout Builder, click Next. You will be taken to a new window, Where Clause. Here, you can provide a WHERE clause, which will filter the records from your database table.
Note: If the WHERE clause is left blank, Astera will read the database table with its default values.
For instance, suppose you add a WHERE clause that selects all the customers from the country "Mexico" in the Customers table.
Your output will then be filtered, so that only the records satisfying the WHERE condition are read by Astera.
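For the example above, the filter you provide would look something like the condition below (shown as an illustration; Astera applies it to the query it builds against the Customers table):

```sql
-- The condition entered in the Where Clause window (illustrative):
--   Country = 'Mexico'
-- which makes the effective query equivalent to:
SELECT * FROM Customers WHERE Country = 'Mexico';
```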
Once you have configured the Database Table Source object, click Next.
A new window, Config Parameters will open. Here, you can define parameters for the Database Table Source object.
Parameters simplify the deployment of flows by eliminating hardcoded values and provide an easy way of changing multiple configurations with a single value change.
Note: Parameters can be changed in the Config Parameters wizard page. Parameters left blank will use their default values assigned on the properties page.
Click OK.
You have successfully configured your Database Table Source object. The fields from the source object can now be mapped to other objects in a dataflow.
To get the Database Table Source object from the Data Source Browser, go to View > Data Source Browser or press Ctrl + Alt + D.
A new window will open. You can see that the pane is empty right now. This is because we are not connected to any database source yet.
To connect the browser to a database source, click on the first icon located at the top left corner of the pane.
A Database Connection box will open.
This is where you can connect to your database from the browser.
You can either connect to a Recently Used database or create a new connection.
Note: In this case we will use one of our recent connections.
To create a new connection, select your Data Provider from the drop-down list.
Note: We will be using the SQL Server in this case.
The next step is to fill in the required credentials. Also, to ensure that the connection is successfully made, select Test Connection.
Once you test your connection, a dialog box will indicate whether the test was successful or not.
Click OK.
Once you have connected the browser, your Data Source Browser will now have the databases that you have on your server.
Select the database that you want to work with and then choose the table you want to use.
Note: In this case we will be using the Northwind database and Customers table.
Drag-and-drop Customers table onto the designer in Astera.
If you expand the dropped object, you will see that the layout for the source file is already built. You can even preview the output at this stage.
Right-clicking on the Database Table Source object will also display options for the database table.
Show in DB Browser - Will show where the table resides in the database in the Database Browser.
View Table Data - Builds a query and displays all the data from the table.
View Table Schema - Displays the schema of the database table.
Create Table - Creates a table on a database based on the schema.
Once you have logged into the Astera client, you can set up an admin email to access the Astera server. This will also allow you to use the "Forgot Password" option at the time of login.
In this document, we will discuss how to verify admin email in Astera.
1. Once logged in, proceed to associate an email address with the admin user by verifying that email address.
Go to Server > Configure > Step 3: Verify Admin Email
2. Unless you have already set up an email address in the Mail Setup section of Cluster Settings, the following dialog box will pop up, asking you to configure your email settings.
Click on Yes to open your cluster settings.
Click on the Mail Setup tab.
3. Enter your email server settings.
4. Now, right-click on the Cluster Settings active tab and click on Save & Close in order to save the mail setup.
5. Re-visit the Verify Admin Email step by going to Server > Configure > Step 3: Verify Admin Email.
This time, the Configure Email dialogue box will open.
6. Enter the email address you previously set up and click on Send OTP.
7. Use the OTP from the email you received, enter it in the Configure Email dialog, and proceed.
On correct entry of the OTP, a dialog will appear confirming that the email has been successfully configured.
8. Click OK to exit it. We can confirm our email configuration by going to the User List.
Right click on DEFAULT under Server Connections in the Server Explorer and go to User List.
9. This opens the User List where you can confirm that the email address has been configured with the admin user.
The feature is now configured and can be used whenever needed by clicking Forgot Password in the login window.
This opens the Password Reset window, where you can enter the OTP sent to the specified email for the user and proceed to reset your password.
This concludes our discussion on verifying admin email in Astera.
The ETL and ELT functionality of Astera Data Stack is represented by Dataflows. When you open a new Dataflow, you’re provided with an empty canvas known as the dataflow designer. This is accompanied by a Toolbox that contains an extensive variety of objects, including Sources, Destinations, Transformations, and more.
Using the Toolbox objects and the user-friendly drag-and-drop interface, you can design ETL pipelines from scratch on the Dataflow designer.
The Dataflow Toolbar also consists of various options.
These include:
Undo/Redo: The Dataflow designer supports unlimited Undo and Redo capability. You can quickly Undo/Redo the last action done, or Undo/Redo several actions at once.
Auto Layout Diagram: The Auto Layout feature allows you to arrange objects on the designer, improving its visual representation.
Zoom (%): The Zoom feature helps you adjust the display size of the designer. Additionally, you can select a custom zoom percentage by clicking on the Zoom % input box and typing in your desired value.
Auto-Size All: The Auto-Size All feature resizes all the objects so that all fields of the expanded nodes are visible and the empty area inside each object is cropped out.
Expand All: The Expand All feature expands or enlarges the objects on the designer, improving the visual representation.
Collapse All: The Collapse All feature closes or collapses the objects on the designer, improving the visual representation and reducing clutter.
Use Orthogonal Links: The Use Orthogonal Links feature replaces the straight links between objects with orthogonal (right-angled) links.
Data Quality Mode: Data Quality Mode in Astera enhances Dataflows with advanced profiling and debugging by adding a Messages node to objects. This node captures statistical information such as TotalCount, ErrorCount, and WarningCount.
Safe Mode: The Safe Mode option allows you to study and debug your Dataflows in cases when access to source files or databases is not available. You can open a Dataflow/Subflow and then proceed to debug or understand it after activating Safe Mode.
Show Diagram Overview: This feature opens a Diagram Overview panel, allowing you to get an overview of the whole Dataflow designer.
Link Actions to Create Maps Using AI: The AI Auto-mapper semantically maps fields between different data layouts, automatically linking related fields, for example, "Country" to "Nation."
In the next sections, we will go over the object-wise documentation for the various Sources, Destinations, Transformations, and other objects in the Dataflow Toolbox.
Silent installation refers to the installation of software or applications on a computer system without requiring any user interaction or input. In a silent installation, the installation process occurs in the background, without displaying any user interfaces, prompts, or dialog boxes that would normally require the user to make choices or provide information. This type of installation is particularly useful in scenarios where an administrator or IT professional needs to deploy software across multiple computers or systems efficiently and consistently.
Obtain the installer file you want to install silently. This could be an executable (.exe), Microsoft Installer (MSI), or any other installer format.
For example, in this article we will be using the ReportMiner.exe file to perform the silent installation.
To initiate the silent installation, you'll need to use a command-line interface. Open the Command Prompt as an administrator.
To achieve this, first search for “Command Prompt” in the Windows search bar then right-click the Command Prompt app, and select Run as administrator from the context menu. This will launch the Command Prompt with administrative privileges.
Locate the installation file and open its location in Windows Explorer. Once you have located the file in Windows Explorer, the full path will be displayed in the address bar at the top of the window. The address bar shows the complete path from the drive letter to the file's location.
For example, this file is located at "C:\Users\muhammad.hasham\Desktop\Silent Installation Files" as evident with the full path displayed in the address bar.
Alternatively, you can also right-click the file and select Properties from the context menu. In the Properties Window, you'll find a Location field that displays the full path to the file.
To install the file silently, change the Command Prompt's current location to the folder containing the installer. To do so, enter the following command in the Command Prompt:
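The command takes the general form below, where the folder path is a placeholder for your own installer location:

```
cd "<path\to\installer\folder>"
```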
For example:
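Using the folder shown earlier, the command would be:

```
cd "C:\Users\muhammad.hasham\Desktop\Silent Installation Files"
```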
Use the appropriate command to run the installer in silent mode. This command might involve specifying command-line switches that suppress dialogs and prompts.
General File:
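The original command is not reproduced in this copy of the document. As an illustration, for an .exe installer the general pattern is the installer name followed by its silent switches; the switches shown below (/s /v"/qn") are typical of InstallShield-packaged installers and are an assumption here, so confirm the exact switches for your installer:

```
<InstallerName>.exe /s /v"/qn"
```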
Example:
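Continuing the assumption above, a silent installation of the ReportMiner installer might look like this:

```
ReportMiner.exe /s /v"/qn"
```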
General File:
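The second command form is also not preserved here. As one possibility, if the installer were distributed as an MSI package (the article notes MSI as a supported format), standard Windows Installer syntax would apply; this is generic msiexec usage, not an Astera-specific command:

```
msiexec /i "<InstallerName>.msi" /qn
```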
Example:
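For example, assuming a hypothetical ReportMiner.msi package:

```
msiexec /i "ReportMiner.msi" /qn
```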
Note: Specifying INSTALLDIR="path\to\install\files" during installation is entirely optional. If you provide this parameter, the software will be installed in the designated location. However, if you omit this parameter, the software will be installed by default in the "Program Files" folder on the C drive.
To run it in this manner:
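Assuming the same illustrative switches as above, passing a custom installation directory might look like the following; both the switches and the target path (C:\Astera\ReportMiner) are placeholders, not Astera-documented values:

```
ReportMiner.exe /s /v"/qn INSTALLDIR=C:\Astera\ReportMiner"
```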
The silent installation might take some time. Wait for the installation process to finish. Depending on the software, you might receive an output indicating the progress and success of the installation.
After the installation is complete, verify that the software is installed as expected. You might want to check the installation directory, program shortcuts, or any other relevant indicators.
Use the provided command in the Command Prompt to remove the silently installed file.
General File:
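The uninstall command depends on how the product was installed. For a product installed through Windows Installer, the generic pattern uses msiexec with the /x switch; this is standard msiexec usage shown as an assumption, not an Astera-documented command:

```
msiexec /x "<InstallerName>.msi" /qn
```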
Example:
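For example, again assuming a hypothetical ReportMiner.msi package:

```
msiexec /x "ReportMiner.msi" /qn
```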
This concludes our discussion on Silent Installations.
The Fixed-Length File Source object in Astera provides a high-speed reader for files containing fixed length records. It supports files with record delimiters as well as files without record delimiters.
In this section, we will cover how to get Fixed Length File Source object on the dataflow designer from the Toolbox.
To get a Fixed Length File Source object from the Toolbox, go to Toolbox > Sources > Fixed Length File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Fixed Length File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Fixed Length File Source object, right-click on its header and select Properties from the context menu.
When you select the Properties option from the context menu, a dialog box will open.
This is where you configure the properties for Fixed Length File Source object.
The first step is to provide the File Path for the Fixed Length File Source object. By providing the File Path you are building the connectivity to the source dataset.
Note: In this case we are going to be using a fixed length file that contains Orders sample data. This file works with the following options:
File Contains Headers
Record Delimiter is specified as
The dialog box has some other configuration options:
If the source file contains a header and you want the Astera source layout to read headers from the source file, check the File Contains Header option.
If you want the file to be read in portions, for instance, your file has data over 1000 rows, upon selecting Partition File for Reading, Astera will read your file according to the specified Partition Count. For example, a file with 1000 rows, with the Partition Count specified as 2, will be read in two partitions of 500 rows each. This is a back-end process that makes data reading more efficient and helps in processing data faster. This will not have any effect on your output.
The Record Delimiter field allows you to select the delimiter for the records in the source file. The available choices are the carriage-return line-feed combination <CR><LF>, carriage-return <CR>, and line-feed <LF>. You can also type a record delimiter of your choice instead of choosing from the available options.
In case the records do not have a delimiter and you rely on knowing the size of a record, the number in the Record Length field is used to specify the character length for a single record.
The Encoding field allows you to choose the encoding scheme for the source file from a list of choices. The default value is Unicode (UTF-8).
Check the This is a COBOL data file option if you are working with COBOL files and do not have COBOL copybooks. You can still import this data by visually marking fields in the layout builder and specifying field data types. For more advanced parsing of COBOL files, you can use Astera’s COBOL File Source.
To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File option. Astera IDE provides extensive user interface capabilities for processing hierarchical structures.
Advanced File Options
In the Header spans over field, give the number of rows that your header takes. Refer to this option when your header spans over multiple rows.
Check the Enforce exact header match option if you want the header to be read as it is.
Check the Column order in file may be different from the layout option if the field order in your source file may differ from the field order in Astera’s layout.
Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
Check the Use SmartMatch with Synonym Dictionary option when the header values vary in the source layout and Astera’s layout. You can create a Synonym Dictionary file to store the values for alternate headers. You can also use the Synonym Dictionary file to facilitate automapping between objects on the flow diagram that use alternate names in field layouts.
To skip any unwanted rows at the beginning of your file, you can specify the number of records that you want to omit through the Skip initial records option.
Raw text filter
If you do not want to apply any filter and process all records, check the No filter. Process all records option.
If you only want to process records that begin with a specific value, check the Process if begins with option and specify that value in the provided field.
If you only want to process records that match a specific pattern, check the Process if matches this regular expression option and provide the regular expression in the provided field.
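For instance, with the Orders sample file, a hypothetical pattern that processes only the lines starting with a five-digit order number would be:

```
^\d{5}
```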
String Processing
String processing options are useful when you are reading data from a file system and writing it to a database destination.
Check the Treat empty string as null value option when you have empty cells in the source file and want them to be treated as null values in the database destination that you are writing to; otherwise, they will be written as empty strings.
Check the Trim strings option when you want to omit any extra spaces in the field value.
Once you have specified the data reading options on this window, click Next.
The next window is the Length Markers window, where you can place markers to specify the column boundaries in your data.
Using the Length Markers window, you can create the layout of your fixed-length file. To insert a field length marker, you can click in the window at any point. For example, if you want to set the length of a field to contain five characters and the field starts at five, then you need to click at the marker position nine.
Note: In this case we are using a fixed length file with Orders sample data.
If you point your cursor to where the data starts (in this case, next to OrderID) and double-click, Astera will automatically detect columns and place markers in your data. Blue lines will appear as markers on the detected columns.
You can modify the markers manually. To delete a marker, double-click on the column which has been marked.
In this case, we removed the second marker and instead added markers after CustomerID and EmployeeID.
In this way, you can add as many markers as there are columns/fields in the dataset.
You can also use the Build from Specs feature to help you build destination fields based on an existing file instead of manually specifying the layout.
After you have built the layout by inserting the field markers, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of your fixed length source file.
If you want to add a new field to your layout, go to the last, blank row of the layout (in the Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field you want to add and select the subsequent properties for it. A new field will be added to the source layout.
Note: Make sure to specify the length of the field that you have added in the properties of the field.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted row; a context menu will open with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Note: Modifying the layout (adding or deleting fields) from the Layout Builder in Astera will not make any changes to the actual source file. The layout is specific to Astera only.
The Layout Builder also provides other options for customizing the field attributes.
After you are done customizing the layout in the Layout Builder window, click Next. You will be taken to a new window, Config Parameters. Here, you can define parameters for the Fixed Length File Source.
Parameters simplify the deployment of flows by eliminating hardcoded values and provide an easy way of changing multiple configurations with a single value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Once you have been through all configuration options, click OK.
The FixedLengthFileSource object is now configured.
The Fixed Length File Source object has now been modified from its previous configuration. The new object has all the modifications that we specified in the Layout Builder.
In this case, the modifications that we made were:
Separated the EmployeeID column from the OrderDate column.
Added the CustomerName column.
You have successfully configured your Fixed Length File Source object. The fields from the source object can now be mapped to other objects in a dataflow.
The Data Profile feature provides complete data field statistics – basic and detailed – containing information such as the data type, minimum/maximum values, data count, and error count. The statistics are collected for each of the selected fields at the time the dataflow runs.
In this document, we will learn how to create a Data Profile in Astera.
We want to collect statistics on these fields of data. For this purpose, we will use Astera’s Data Profile feature.
To get the Data Profile object from the Toolbox, go to Toolbox > Data Profiling > Data Profile. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Data Profile object onto the dataflow designer.
You can see that the Data Profile object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map the fields from the source object onto the profile object.
Note: A Data Profile object is designed to capture statistics for an entire field layout. For this reason, it should be linked to the main Output port of the object whose field statistics you wish to collect.
To configure the Data Profile object, right-click on its header and select Properties from the context menu.
A configuration window will open. The first screen you will see is the Layout Builder. This is where we can create or delete fields, change field names, and their data type.
Click Next. This is the Properties window.
Here we will provide the Profile File path to specify where the profile should be stored.
Specify the type of Field Statistics to be collected.
Field Statistics dropdown allows you to choose detail levels of statistics to collect. Select among the following detail levels:
Basic Statistics: This is the default mode. It captures the most common statistical measures for the field’s data type.
No Statistics: No statistics are captured by the Data Profile.
Detailed Statistics – Case Sensitive Comparison: Additional statistical measures are captured by the Data Profile, for example Mean, Mode, and Median, using case-sensitive comparison for strings.
Detailed Statistics – Case Insensitive Comparison: Additional statistics are captured by the Data Profile, using case-insensitive comparison for strings.
In this case, we are collecting Detailed Statistics – Case Sensitive Comparison.
Click OK.
A Job Progress window will open and show you the trace of the job.
Click on the Profile link provided in the Job Progress window and the profile will open in Astera. Expand the Profile node to see each field inside the object. Click on these fields to see the collected statistical values.
The Data Quality Rules object found in the Data Profiling section of the Toolbox is used to apply one or more conditions, called Data Quality Rules, against incoming records. Records that do not meet the data quality rule criteria will be assigned the ‘Error’ status and may be optionally excluded from processing by the downstream objects.
Data Quality Rules is a record-level component which means that it does not require the entire dataset to flow through it. In other words, you can map a single or a couple of fields to the Data Quality Rules component to set up quality validation criteria and the transformed records can be mapped further in the dataflow.
Let’s understand the application and usage of Data Quality Rules with the following example.
Here we have sample data of employees of a fictitious organization which we have retrieved using an Excel Workbook Source.
If we look at the preview of the Employee_Report dataset, the values in the SalariedFlag column specify whether an employee is salaried in terms of 0 and 1.
1 = the employee is salaried
0 = the employee is non-salaried and therefore is eligible for overtime.
We can apply data quality rules to these values and identify which employees are not salaried and therefore, are eligible for overtime. The Data Quality Rules object will process all records and those that do not match the criteria will be returned with an error. This means that in this example, the salaried employees with the salary flag ‘True’ will return an error, whereas the records of employees with the salary flag ‘False’ will pass the data quality rule.
To do this, drag the Data Quality Rules object from the Data Profiling section in the Toolbox and drop it onto the dataflow designer.
Now, map the SalariedFlag field to the Data Quality Rules object.
Right-click on the Data Quality Rules object and select Properties from the context menu.
This will open a new window. This is the Layout Builder, where you can see the ‘SalariedFlag’ field we have mapped from our source.
Click Next to proceed to the Data Quality Rules window.
Add a new rule using the add rule button. Once a new rule is added, the options on this window will activate and the rule will be added to the grid.
Let’s explore these options one by one:
Description: The Description field contains the name or description of the rule. By default, the rules are named Rule1, Rule2, and so on, depending on the number of rules you add. However, you can also rename the rules for better understanding and convenience.
In our case, since we want to set a data quality criterion to identify non-salaried employees, we can rename the rule as "NonSalariedEmployeesRule."
Attach rule to the field: This is a drop-down list using which you can attach a rule to a particular field. You can see that there is a root node named Data Quality Rules.
Listed within the Data Quality Rules node are the fields mapped to the Data Quality Rules object. Here we have only one field mapped to which we want to apply this rule. In case you want to apply a rule to the whole dataset, you can simply double-click on the Data Quality Rules root node and the rule will be applied to all fields mapped to the Data Quality Rules object.
In this case, we will map the rule to the SalariedFlag field.
Expression box: This is where you can type in the expression for your rule.
In this example, we want to validate records with the Salary Flag ‘False.’ To do this we will write the expression:
‘SalariedFlag = 0’ in the Expression field.
Observe that, simultaneously, Astera shows you a compile status of your expression below the expression box.
It says ‘Successful’ so we can click OK. Alternatively, it will give you an error if the expression is incorrect and you will have to correct the expression before clicking OK.
Show Message: We can also write a message to show up with the errors, which can also be written to the error log. Let’s write a message:
‘Salaried employees are not eligible for overtime.’
This message will help identify why a particular record was marked erroneous. And in case multiple rules are applied, the message will point out which rule was not qualified by a particular record.
Next, we have two checkboxes:
Active – to activate a rule.
Is Error – When this is checked, all records that return an error will not be written to a target, which means that only the records that have passed the data quality rule will flow further in the dataflow pipeline.
However, if we uncheck this option, the Warning checkbox will be checked automatically. The records that failed to match the rule will then be returned with a warning and will still be written to a target.
In this case, let’s keep the errors as errors by checking the Is Error box.
Now we have set up a data quality rule.
Now, let’s look at the preview. Right-click on the Data Quality Rules object and select Preview Output from the context menu.
You can see that the records that have matched the rule, the records with ‘False’ salary flag, have been validated. On the other hand, the records that failed to match the rule, the records with the ‘True’ flag, have returned an error, denoted by a red warning sign.
If you move the cursor over this warning sign, it will show the error message in the tooltip. This is especially useful in cases where you have applied more than one rule and you want to track which records have failed to match which rule or when you want to store the erroneous records in an error log.
So now that we have validated the records against our data quality rule, we can map it to a target which is a Delimited File Destination in this case. We will name this file ‘Employees eligible for overtime,’ so the records of employees with the ‘False’ salaried flag will be passing through the Data Quality Rules object and consequently be mapped to the destination file. Let’s do the mapping.
Now, if we open the Properties window of the destination file, you can see the option, Do Not Process Records With Errors on the last window. It is checked by default in all target formats in Astera. Therefore, when we run this dataflow, all records that have matched the data quality rule will be written to the destination file, whereas records that failed to match the rule and returned an error will be omitted.
A PDF Form Source object provides users with the functionality of extracting data from a fillable PDF document. A fillable PDF document consists of certain data points, or digital fields, which are editable by a user using any modern PDF viewer. Such documents are often used in place of official paper documents on the web. The PDF Form Source object detects these data points, extracts the written data, and creates corresponding fields for them.
In this article, we will explore how to make use of the PDF Form Source object in Astera to retrieve data.
Note: This is a Scholarship Application Form with fillable data fields for Personal Information, Contact Details, and Education Qualifications.
Select the PDF Form Source object from the Toolbox and drag-and-drop it onto the dataflow designer.
Right-click on the PDF Form Source object’s header and select the Properties option from the context menu.
A configuration window will open, as shown below.
Provide the File Path for the fillable PDF document.
Owner Password: If the file is protected, enter the password configured by the owner of the fillable PDF document. If the file is not protected, this option can be left blank.
Use UTF-8 Encoding: Check this option if the file is UTF-8 (Unicode Transformation Format, 8-bit) encoded.
Click Next.
This is the Layout Builder window, where you can see the data fields extracted from the fillable PDF document. Click Next.
This is the Config Parameters window. Click Next.
This is the General Options window. Click OK.
Right-click on the PDF Form Source object’s header and select Preview Output from the context menu.
View the data through the Data Preview window.
The data is now available for mapping. For simplicity, we will delete the non-required data fields and store the output in a separate file. To store the data, we must write it to a destination file.
We are using a Delimited Destination object. Drag-and-drop the Delimited Destination object onto the dataflow designer and map the fields from the PDF Form Source object to the destination object.
Right-click on the fields that you do not want to store and select the Remove Element option.
Note:
Do not delete the data fields from the PDF Form Source object, as it will disturb the layout that has been generated for the detected data fields.
Simply double-click or right-click on the Delimited Destination object’s header and select the Properties option from the context menu. Specify the File Path where you want to store the destination file. Click OK.
To preview the data, right-click on the destination object’s header and select Preview Output from the context menu.
Here, you can see data of the selected fields.
This is how a PDF Form Source object is used in Astera Data Stack to extract data points/digital fields from fillable PDF documents.
The Data Model Query object in Astera Data Stack allows you to extract multiple tables from a deployed data model. This is especially useful when you’re writing data to a fact table via the Fact Loader object, since the fact table contains attributes from multiple source tables.
In this article, we’ll be looking at how you can configure the Data Model Query object and use it to extract data from a source model.
Let’s assume that we have the following source model.
In this example, we’ll extract all of these tables as a source via the Data Model Query object.
To get the Data Model Query object from the toolbox, go to Toolbox > Sources > Data Model Query.
Drag and drop the Data Model Query object onto the dataflow designer.
The object is currently empty because we are yet to configure it.
To configure the Data Model Query object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click on the object header.
A configuration window will pop up.
Using this window, you can configure the properties of the Data Model Query object.
On the Database Connection screen, you’ll notice that the Data Provider dropdown menu is limited to just one option: Astera Data Model. This option represents the data models that are deployed on the server and are available for usage.
Once you’ve provided your Astera Data Stack credentials and a server connection, you can select a deployed model from the Database dropdown menu.
Note: The default username is admin, and the default password is Admin123.
In this case, we’ll select DW_Source, which represents the source model that was shown earlier.
Once you’ve chosen a deployed model, click Next.
The Query Source Layout screen will appear.
On the Query Source Layout screen, you can select a root entity from a list of entities present in the source model, via the Root Entity dropdown menu.
The root entity serves as a starting point for a tree layout that includes all of the entities that you need to extract data from.
In this case, the root entity in the source data model is InvoiceLines.
Once you’ve chosen a root entity, a tree/hierarchical layout starting from the root entity will appear on the left side of the screen. You can expand the inner nodes to reveal the fields present in other entities of the source model.
Checking the Allow Collections option will enable collection nodes in the tree layout.
In the Where Clause textbox, you can add an optional SQL statement that will serve as a filter for incoming records.
Click OK once you’ve chosen a root entity. You’ve now configured the Data Model Query object. The tree layout, starting from the root entity, will be visible in the object.
The fields present in this layout can now be mapped further to other objects in the dataflow.
This concludes our discussion on the Data Model Query object.
The SQL Query Source object enables you to retrieve data from a database using an SQL query or a stored procedure. You can specify any valid SELECT statement or a stored procedure call as a query. In addition, you can parameterize your queries dynamically, thereby allowing you to change their values at runtime.
In this article, we will be looking at how you can configure the SQL Query Source object and use it to retrieve data in Astera Data Stack.
Before moving on to the actual configuration, we will have to get the SQL Query Source object from the Toolbox onto the dataflow designer. To do so, go to Toolbox > Sources > SQL Query Source. In case you are unable to view the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the SQL Query Source object onto the designer.
The source object is currently empty as we have not configured it yet.
To configure the SQL Query Source object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click the header of the source object.
A new window will pop up when you click on Properties in the context menu.
In this window, we will configure properties for the SQL Query Source object.
On this Database Connection window, enter information for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you want to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test was successful. Close this window by clicking OK, and then, click Next.
The next window will present a blank page for you to enter your required SQL query. Here, you can enter any valid SELECT statement or stored procedure to read data from the database you connected to in the previous step.
The curly brackets located on the right side of the window indicate that the use of parameters is supported, which implies that you can replace a regular value with one that is parameterized and can be changed during runtime.
In this example, we will be reading the Orders table from the Northwind database.
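A minimal query for this example could look like either of the following; any valid SELECT statement, including column lists and joins, is acceptable:

```sql
-- Read the entire Orders table.
SELECT * FROM Orders;

-- Or read selected columns only (illustrative).
SELECT OrderID, CustomerID, OrderDate FROM Orders;
```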
Once you have entered the SQL query, click Next.
The following window will allow you to check or uncheck certain options that may be utilized while processing the dataset, if needed.
When checked, the Trim Trailing Spaces option will refine the dataset by removing extra whitespace present after the last character in a line, up until the end of that line. This option is checked by default.
The Dynamic Layout option is unchecked by default. When checked, it will automatically enable two other sub-options.
Delete Field In Subsequent Objects: When checked, this option will delete all fields that are present in subsequent objects.
Add Fields In Subsequent Objects: When checked, this option will add fields that are present in the source object to subsequent objects.
Choose your desired options and click Next.
The next window is the Layout Builder. Here, you can modify the layout of the table that is being read from the database. However, these modifications will only persist within Astera and will not apply to the actual database table.
To delete a certain field, right-click on its serial column and select Delete from the context menu. In this example, we have deleted the OrderDate field.
To change the position of a field, click its serial column and use the Move up/Move down icons located in the toolbar of the window. In this example, we have moved up the EmployeeID field using the Move up icon, thus shifting the CustomerID field to the third row. You can move other fields up or down in a similar manner, allowing you to modify the entire order of the fields present in the table.
Once you are done customizing your layout, click Next.
In the Config Parameters window, you can define certain parameters for the SQL Query Source object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
Enter your desired values for these parameters, if any, and click Next.
Finally, a General Options window will appear. Here, you are provided with:
A text box to add Comments.
A set of General Options that have been disabled.
To conclude the configuration, click OK.
You have successfully configured the SQL Query Source object. The fields are now visible and can be mapped to other objects in the dataflow.
Apache Parquet is a columnar storage file format used by Hadoop ecosystem tools such as Pig, Spark, and Hive. The file format is language-independent and has a binary representation. Parquet is used to efficiently store large datasets and uses the .parquet extension.
The key features of Parquet with respect to Astera Data Stack are:
It offers compression, resulting in a smaller file size post-compression.
It encodes the data.
It stores data in a columnar layout.
In Astera Data Stack, you can use a Parquet file in which the cardinality of the data is maintained, i.e., all columns must have the same number of values.
Note: There should only be one row for each data field.
Drag and drop the Parquet File Source from the Sources section of the Toolbox onto the dataflow designer.
Right-click on the Parquet File Source object and select Properties from the context menu.
This will open a new window.
Let’s have a look at the options present here.
File Location
File Path: This is where you will provide the path to the .parquet file.
Data Load Option
If you wish to control memory consumption (at the cost of potentially increased read time), the Data Load option can be used.
Batch Size: This is where the size of each batch is defined.
Advanced File Processing: String Processing
Treat empty string as null value: Checking this will give a null value on every empty string.
Trim strings: Checking this box will trim the strings.
Once done, click Next and you will be led to the Layout Builder screen.
The layout will be automatically built. Otherwise, you can build it using the Build Layout from layout spec option at the top of the screen.
Once done, click Next and you will be taken to the Config Parameters screen.
This allows you to further configure and define dynamic parameters for the Parquet source file.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next and you will be taken to the General Options screen.
Here, you can add any comments that you wish to add.
Click OK and the Parquet File Source object will be configured.
You can now map these fields to other objects as part of the dataflow.
The following data types are supported for Parquet files in Astera Data Stack:
Integer
Time/Timestamp
Date
String
Float
Real
Decimal
Double
Byte Array
Guid
Base64
Integer96
Image
Hierarchy is not supported.
This concludes our discussion on the definition and configuration of the Parquet File Source object in Astera Data Stack.
A Report Model extracts data from an unstructured file into a structured format using extraction logic. It can be used through the Report Source object inside dataflows to leverage the advanced transformation features in Astera Data Stack.
In this section, we will cover how to get the Report Source object onto the dataflow designer from the Toolbox.
To get a Report Source object from the Toolbox, go to Toolbox > Sources > Report Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Report Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
To configure the Report Source object, right-click on its header and select Properties from the context menu.
A configuration window for Report Source will open.
First, provide the File Path of the unstructured file (your report) for which you have created a Report Model.
Then, specify the File Path for the associated Report Model.
Click OK, and the fields added in the extraction model will appear inside the Report Source object with a sub-node, Items_Info, in our case.
Right-click on the Report Source object’s header and select Preview Output from the context menu.
A Data Preview window will open and show you the data extracted through the Report Model.
Astera Data Stack gives the user the ability to use a MongoDB Source object as part of the ETL pipeline. MongoDB is a NoSQL application data platform, available both self-hosted and as a fully managed cloud service, that provides storage and retrieval of data modeled in forms other than the tabular relations used in relational databases.
The MongoDB Source object can be configured in accordance with the user's application in Astera.
To start, drag and drop the MongoDB Source object from the Sources section of the toolbox onto the dataflow.
To configure the MongoDB Source object, right-click on it and select Properties from the context menu.
This will open the Properties screen.
User Name: The name of the user connecting to the MongoDB cluster.
Password: The password of the user connecting to the MongoDB cluster.
Primary Server Name: The address of the primary server cluster for connection.
Database: The database to be selected from the MongoDB server.
Authentication Database: The database used for authentication.
Enable set of replica: Allow the server to access the secondary cluster in case of unavailability of the primary server.
Use TLS: Check this option if you are using TLS authentication.
Secondary Server Name: The address of the secondary server cluster for connection.
Read Preference –
This drop-down menu allows the user to select which server to be given preference to while reading the data.
Primary: Data will only be read from the primary server.
Secondary: Data will only be fetched from the secondary server.
Primary Preferred: Preference will be given to the primary server, but if it is unavailable, data will be fetched from the secondary server.
Secondary Preferred: Preference will be given to the secondary server, but if it is unavailable, data will be fetched from the primary server.
Nearest: Preference will be given to the server closest to the connection in terms of region and IP.
Note: You can also select some advanced connection info through the option next to Test Connection.
Once the credentials have been filled, you can test the connection by selecting Test Connection.
Once done, click Next and you will be led to the MongoDB Collection screen.
Here, you can pick a collection that you wish to fetch the data from using the Pick Collection drop-down menu.
Once the collection is selected, the layout will be built.
There are three ways to generate the layout:
Astera auto-generates the layout based on the first 100 records by default.
The user can provide a JSON schema and Astera will generate the layout.
The user can manually create the layout.
Once the layout has been built, click Next and you will be led to the MongoDB Filter screen.
Here, you can provide a query to filter out your records based on some criteria.
Note: Functions cannot be applied to MongoDB fields on the filter screen, and records cannot be filtered based on a criterion that depends on array-type fields.
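Filters use standard MongoDB query syntax. For instance (the field names below are hypothetical and depend on your collection), the following filter fetches only documents where Country is "Mexico" and OrderTotal exceeds 100:

```
{ "Country": "Mexico", "OrderTotal": { "$gt": 100 } }
```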
Click Next and you will be taken to the MongoDB SortBy screen.
Here, you can set a limit to fetch a specified number of records, or provide a number to skip the first ‘n’ records.
You can also sort your data based on a single field or a collection of fields.
Note: We cannot sort the data based on array-type fields.
Click Next and you will be taken to the Config Parameters screen.
Parameters simplify the deployment of flows by eliminating hardcoded values and provide a dynamic way of changing multiple configurations with a single value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next and you will be faced with the General Options screen.
Here, you can add any comments that you wish to add.
Click OK and the MongoDB Source object will be configured.
The source data can now be further used in an ETL pipeline with transformation or destination objects.
This concludes our discussion on the MongoDB Source object in Astera Data Stack.
Note: To open the source file for editing in a new tab, click the icon next to the File Path input and select Edit File.
To generate the schema, click the icon next to the Schema File Path input and select Generate.
To edit an existing schema, click the icon next to the Schema File Path input and select Edit File. The schema will open for editing in a new tab.
Note: Astera makes it possible to generate an XSD file from the layout of the selected source XML file. This feature is useful when you don’t have the XSD file available. Note that all fields are assigned the data type of String in the generated schema. To use this feature, expand the control and select Generate.
To open the Data Formats window, click the icon located in the Toolbar at the top of the designer.
In this case, we will use data from a Customers dataset.
After configuring the settings for the Data Profile object, click on the Start Dataflow icon from the toolbar located at the top of the window.
Here, we will set the rules, or data quality criteria. Click the add rule button to add a new rule.
Alternatively, you can use the expression builder button to open the Expression Builder window, where you can choose an expression from Astera's library of built-in expressions or write one of your own.
You can add as many rules as you want by clicking the add rule button; similarly, you can delete a rule by pointing to it in the grid and selecting right-click > Delete. In this example, we will work with the single rule that has been set, so let’s go ahead and click OK.
The records that fail to match the data quality rule can be written and stored in a separate error log. Refer to the Record Level Log documentation to learn how you can store erroneous records using a Record Level Log object.
You can also delete the data fields in the destination file by using the Layout Builder, or map only the relevant fields onto the nodes of the destination object. Refer to the Delimited File Destination documentation to learn more about that object.
| Column Name | Description |
|---|---|
| Alternate Header | Assigns an alternate header value to the field. |
| Data Type | Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean. |
| Allows Null | Controls whether the field allows blank or NULL values in it. |
| Output | The output checkbox allows you to choose whether or not you want to enable data from a particular field to flow through further in the dataflow pipeline. |
| Calculation | Defines functions through expressions for any field in your data. |
| Attribute | Description |
|---|---|
| Name | The system pre-fills this item for you based on the field header. Field names do not allow spaces. Field names are used to refer to the fields in the Expression Builder or tools where a field is used in a calculation formula. |
| Header | Represents the field name specified in the header row of the file. Field headers may contain spaces. |
| Data Type | Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean. |
| Format | Specifies the format of the values stored in that field, depending on the field’s data type. For example, for dates you can choose between DD-MM-YY, YYYY-MM-DD, or other available formats. |
| Start Position | Specifies the position of the field’s first character relative to the beginning of the record. Note: This option is only available for the fixed length layout type. |
| Length | Specifies the maximum number of characters allotted for a value in the field. The actual value may be shorter than what is allowed by the Length attribute. Note: This option is only available for fixed length and database layout types. |
| Column Name | Specifies the column name of the database table. Note: This option is only available in database layout. |
| DB Type | Specifies the database-specific data type that the system assigns to the field based on the field's data. Each database (Oracle, SQL, Sybase, etc.) has its own DB types. For example, Long is only available in Oracle for the data type String. Note: This option is only available in database layout. |
| Decimal Places | Specifies the number of decimal places for a data type specified as Real. Note: This option is only available in database layout. |
| Allows Null | Controls whether the field allows blank or NULL values in it. |
| Default Value | Specifies the value that is assigned to the field in any one of the following cases: the source field does not have a value; the field is not found in the source layout; the destination field is not mapped to a source field. Note: This option is only available in destination layout. |
| Sequence | Represents the column order in the source file. You can change the column order of the data being imported by simply changing the number in the Sequence field. The other fields in the layout will then be reordered accordingly. |
| Description | Contains information about the field to help you remember its purpose. |
| Alignment | Specifies the positioning of the field’s value relative to the start position of the field. Available alignment modes are LEFT, CENTER, and RIGHT. Note: This option is only available for the fixed length layout type. |
| Primary Key | Denotes the primary key field (or part of a composite primary key) for the table. Note: This option is only available in database layout. |
| System Generated | Indicates that the field will be automatically assigned an increasing Integer number during the transfer. Note: This option is only available in database layout. |
| Layout Type | Attributes Available |
|---|---|
| Source Delimited file and Excel worksheet | Name, Header, Data type, Format |
| Source Fixed Length file | Name, Header, Data type, Format, Start position, Length |
| Source Database Table and SQL query | Column name, Name, Data type, DB type, Length, Decimal places, Allows null |
| Destination Delimited file and Excel worksheet | Name, Header, Data type, Format, Allows null, Default value |
| Destination Fixed Length file | Sequence, Name, Header, Description, Data type, Format, Start position, Length, Allows null, Default value, Alignment |
| Destination Database Table | Column name, Name, Data type, DB type, Length, Decimal places, Allows null, Primary key, System generated |
| Format | Sample Value |
|---|---|
| dd-MMM-yyyy | 12-Apr-2008 |
| yyyy-MM-dd | 2008-04-12 |
| dd-MM-yy | 12-04-08 |
| MM-dd-yyyy | 04-12-2008 |
| MM/dd/yyyy | 04/12/2008 |
| MM/dd/yy | 04/12/08 |
| dd-MMM-yy | 12-Apr-08 |
| M | April 12 |
| D | 12 April 2008 |
| mm-dd-yyyy hh:mm:ss tt | 04-12-2008 11:04:53 PM |
| M/d/yyyy hh:mm:ss tt | 4/12/2008 11:04:53 PM |
| Format | Sample Value |
|---|---|
| Y/N | Y/N |
| 1/0 | 1/0 |
| T/F | T/F |
| True/False | True/False |
| Format | Sample Value |
|---|---|
| ###### | 123456 |
| #### | 1234 |
| ####;0;(####) | -1234 |
| .##%;0;(.##%) | 123456789000% |
| .##%;(.##%) | 1234567800% |
| $###,###,###,### | $1,234,567,890,000 |
| $###,###,###,##0 | $1,234,567,890,000 |
| ###,### | 123450 |
| #,# | 1,000 |
| ##.00 | 35 |
| ###,###.## | 12,345.67 |
| ##.## | 12.34 |
| $###,###,###,### | $1,234,567,890,000 |
| $###,###,###,##0 | $1,234,567,890,000 |
| .##%;(.##%); | .1234567800% |
| .##%;0;(.##%) | .12345678900% |
"0" (Zero placeholder): If the value being formatted has a digit in the position where the '0' appears in the format string, then that digit is copied to the result string; otherwise, a '0' appears in the result string. The position of the leftmost '0' before the decimal point and the rightmost '0' after the decimal point determines the range of digits that are always present in the result string. The "00" specifier causes the value to be rounded to the nearest digit preceding the decimal, where rounding away from zero is always used. For example, formatting 34.5 with "00" would result in the value 35.
"#" (Digit placeholder): If the value being formatted has a digit in the position where the '#' appears in the format string, then that digit is copied to the result string. Otherwise, nothing is stored in that position in the result string. Note that this specifier never displays the '0' character if it is not a significant digit, even if '0' is the only digit in the string. It will display the '0' character if it is a significant digit in the number being displayed. The "##" format string causes the value to be rounded to the nearest digit preceding the decimal, where rounding away from zero is always used. For example, formatting 34.5 with "##" would result in the value 35.
"." (Decimal point): The first '.' character in the format string determines the location of the decimal separator in the formatted value; any additional '.' characters are ignored.
"," (Thousand separator and number scaling): The ',' character serves as both a thousand separator specifier and a number scaling specifier. Thousand separator specifier: if one or more ',' characters is specified between two digit placeholders (0 or #) that format the integral digits of a number, a group separator character is inserted between each number group in the integral part of the output. Number scaling specifier: if one or more ',' characters is specified immediately to the left of the explicit or implicit decimal point, the number to be formatted is divided by 1000 each time a number scaling specifier occurs. For example, if the string "0,," is used to format the number 100 million, the output is "100".
"%" (Percentage placeholder): The presence of a '%' character in a format string causes a number to be multiplied by 100 before it is formatted. The appropriate symbol is inserted in the number itself at the location where the '%' appears in the format string.
"E0", "E+0", "E-0", "e0", "e+0", "e-0" (Scientific notation): If any of the strings "E", "E+", "E-", "e", "e+", or "e-" are present in the format string and are followed immediately by at least one '0' character, then the number is formatted using scientific notation with an 'E' or 'e' inserted between the number and the exponent. The number of '0' characters following the scientific notation indicator determines the minimum number of digits to output for the exponent. The "E+" and "e+" formats indicate that a sign character (plus or minus) should always precede the exponent. The "E", "E-", "e", or "e-" formats indicate that a sign character should only precede negative exponents.
'ABC' or "ABC" (Literal string): Characters enclosed in single or double quotes are copied to the result string and do not affect formatting.
";" (Section separator): The ';' character is used to separate sections for positive, negative, and zero numbers in the format string. If there are two sections in the custom format string, the leftmost section defines the formatting of positive and zero numbers, while the rightmost section defines the formatting of negative numbers. If there are three sections, the leftmost section defines the formatting of positive numbers, the middle section defines the formatting of zero numbers, and the rightmost section defines the formatting of negative numbers.
Other (All other characters): Any other character is copied to the result string and does not affect formatting.
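For a rough, illustrative comparison only, the short Python snippet below reproduces a few of the numeric effects described above, namely rounding ties away from zero, inserting group separators, and scaling by 100 for percentages. Python's format mini-language is not the same as these custom format strings, so treat this purely as a sketch of the behavior.

```python
# Illustrative only: not the custom format-string engine described above, just plain
# Python calls that show the same numeric effects.
from decimal import Decimal, ROUND_HALF_UP

# "00" or "##" round to the nearest integer, with ties rounded away from zero: 34.5 -> 35
print(Decimal("34.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP))  # 35

# A ',' between digit placeholders inserts a group separator in the integral part
print(f"{1234567890000:,}")  # 1,234,567,890,000

# A '%' placeholder multiplies the value by 100 before formatting
print(f"{0.1234:.2%}")  # 12.34%
```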
The following columns are available:
Data Type: Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean.
Start Position: Specifies the position from which the column/field starts.
Length: Defines the length of a column/field.
Alignment: Specifies the alignment of the values in a column/field. The options provided are right, left, and center.
Allows Null: Controls whether the field allows blank or NULL values in it.
Expressions: Defines functions through expressions for any field in your data.
Transformations are used to perform a variety of operations on data as it moves through the dataflow pipeline. Astera Data Stack provides a full complement of built-in transformations, enabling you to build sophisticated data maps. Astera Data Stack transformations are divided into two types: record level and set level.
Record level transformations are used to create derived values by applying a lookup, function, or expression to fields from a single record. Examples of record level transformations include lookup, expression, and function transformations.
Set level transformations, on the other hand, operate on a group of records and may result in joining, reordering, elimination, or aggregation of records. Set transformations include join, sort, route, distinct, etc. Data sources and destinations are also considered set transformations.
In Astera, data records flow between set transformations. Record level transformations are used to transform or augment individual fields during these movements.
A record level transformation can be connected to only one set transformation. For instance, a lookup or expression cannot receive input from two different set transformations.
Apart from transformations that combine multiple data sources, such as Join, Merge, and Union, a transformation can receive input from only one set transformation. Transformations can, however, receive input from any number of record level transformations, as long as all of these record level transformations receive their input from the same set transformation.
The Expression transformation object in Astera applies an expression, or a piece of logic, to one or more incoming values. It can also return values that do not depend on any user-provided input data. Expressions can additionally be defined as variables and reused in other calculations.
The Expression transformation object uses an expression as the logic to transform data. You can write an expression of your own or use Astera's extensive library of built-in functions for string manipulation, data conversion, date and time manipulation, and more. You can also perform operations such as mathematical calculations and comparisons using the Expression transformation object.
In this example, we have a sample dataset, Customers, stored in an Excel file. The address information in this source is split into multiple fields such as Address, Region, City, Country, and PostalCode. We want to concatenate the information in these fields and return it as a full address in a new field. For this, we will use the Expression transformation object.
Retrieve your data in the dataflow by using one of the source objects from the Sources section in the Toolbox. In this example, we will work with an Excel Workbook Source.
Drag-and-drop the Expression transformation object from Toolbox > Transformations > Expression onto the designer.
Map the fields to be concatenated from the source object to the Expression transformation object. In this example, we have mapped the Address, Region, City, Country and PostalCode fields.
Now right-click on the Expression transformation object and select Properties from the context menu.
This will open the Layout Builder window where you can add or remove fields and modify your layout.
The following options are available in the Layout Builder window:
Name: This is where the field name is specified. You can change the name of existing fields if required.
Data Type: Specifies the data type of the mapped fields.
Input: When checked, the field will be mapped as an input, with an input mapping port, to take data input from a source.
Output: When checked, the field will be mapped as an output. If an expression is present, the expression will be applied to this output.
Variable: Turns the field into a variable that can then be used in other fields. Variable expressions are calculated first, and their values are then available to the other expressions that use them. Once a field is marked as a Variable, it cannot be used for input or output mapping.
Expression: This is where the expression used for modifying the field or group of fields is specified.
Since we want to write the address details from multiple fields into a single field, let's create a new field named Full_Address, specify its Data Type as String, and check the Output option. Then, click the ellipsis button next to the Expression box for this field to open the Expression Builder window.
You will find the following options in the Expression Builder window:
Functions: An extensive library of built-in functions from where you can select any function according to your requirement.
Expressions: Here, you can write an expression rule or choose one from the built-in functions in Astera.
Objects: In this panel, you can find all the fields in your layout listed under the Expression node. You can double click on any field name to map it to your expression.
In this example, we can either use a concatenate function from the built-in functions or write an expression of our own to return the complete address information in a single field.
Address + ' ' + Region + ' ' + City + ' ' + Country + ' ' + PostalCode
Note: Once you’ve built the expression, click on the Compile button to check whether or not the expression was compiled successfully. If the Compile Status is ‘Successful’, click OK.
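For reference, the same concatenation logic can be sketched outside Astera in plain Python. The record below uses the field names from this example with made-up sample values; note that, like the expression above, it inserts a separator space even when a field is empty.

```python
# Minimal sketch of the Full_Address expression: join the address parts with spaces.
record = {
    "Address": "Obere Str. 57",
    "Region": "",
    "City": "Berlin",
    "Country": "Germany",
    "PostalCode": "12209",
}

full_address = " ".join(
    str(record[field]) for field in ("Address", "Region", "City", "Country", "PostalCode")
)
print(full_address)  # Obere Str. 57  Berlin Germany 12209
```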
You can now see your expression appear in the Expression field. Click Next.
A General Options screen will now open where you can add Comments and specify other General Options. Once you’re through with these general settings, click OK.
General Options screen: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example, a destination file that will capture record messages or a log that will capture messages and collect statistics as well.
In the Comments input section, you can write comments related to the object.
To preview the output, right-click on the Expression transformation object and select Preview Output from the context menu.
Here’s a preview of the concatenated output:
You can now map your output to a destination or another transformation in the dataflow. In this example, we will be writing our output to a Delimited File Destination.
You may rename your Destination object from the context menu options for this object. Here, we will rename it as Full_Address.
This concludes using the Expression transformation object in Astera.
The Aggregate transformation object provides the functionality to create aggregations of your dataset, using aggregate functions such as Sum, Count, First, Last, Min, Max, Average, Var or Standard Deviation. The dataset can be split into groups so that the aggregate value(s) can be generated for the group instead of the whole dataset. For example, calculate product count by month of year, or get average sales price by region and year.
Aggregate Transformation can be applied to unsorted data or data sorted on group by values. When applied to an input stream that is sorted on group by fields, Aggregate Transformation performs substantially better and consumes very little memory. Alternatively, when applied to unsorted datasets, Aggregate Transformation may consume substantial memory resources for large data sets and may slow down the performance of the server.
In this scenario, we have product data stored in a CSV file. The source file contains information such as the ProductID, SupplierID, UnitPrice, and QuantityPerUnit of various products. We want to derive the following information from our source data:
Number of products per category
Total price of all the products per category
Minimum price per category
Maximum price per category
We will use the Aggregate Transformation object to derive the required information.
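Before configuring the object, here is a minimal Python sketch, using a few made-up rows, of the aggregations we are about to set up: a Count, Sum, Max, and Min of UnitPrice grouped by CategoryID.

```python
# Minimal sketch of the aggregations in this example: group by CategoryID, then
# compute Count, Sum, Max, and Min of UnitPrice per group. Rows are made up.
from collections import defaultdict

products = [
    {"CategoryID": 1, "ProductID": 1, "UnitPrice": 18.0},
    {"CategoryID": 1, "ProductID": 2, "UnitPrice": 19.0},
    {"CategoryID": 2, "ProductID": 3, "UnitPrice": 10.0},
]

prices_by_category = defaultdict(list)
for row in products:
    prices_by_category[row["CategoryID"]].append(row["UnitPrice"])

for category_id, prices in prices_by_category.items():
    print(
        category_id,
        len(prices),   # number of products per category (Count)
        sum(prices),   # TotalPricePerCategory (Sum)
        max(prices),   # MaxPricePerCategory (Max)
        min(prices),   # MinPricePerCategory (Min)
    )
```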
From the sources section in the Toolbox, drag-and-drop a Delimited File Source object to the dataflow designer.
To work with the Aggregate Transformation, drag-and-drop the Aggregate Transformation object from Toolbox > Transformations > Aggregate.
Right-click on the transformation object and select Properties. The Layout builder window will now open.
Here, you can write names of fields that you want to map to the transformation object in the Name column and specify the relevant Aggregate Functions for them.
For this case:
CategoryID: We will select the Group-By option from the Aggregate Function drop-down list for this field as we want to group the records based on the available product categories.
ProductID: For this field we will select the Aggregate Function Count, in order to calculate the number of products per category.
UnitPrice: We will map this field thrice.
To calculate TotalPricePerCategory, select the Sum function.
To calculate MaxPricePerCategory, select the Max function.
To calculate MinPricePerCategory, select the Min function.
Click on Next. The Aggregate Transformation Properties window will now open.
There are three sorting options in Aggregate transformation:
Incoming data is pre-sorted on group by fields: This option requires the incoming data to already be sorted on the specified Group-By field(s).
Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
Click on Next. The Config Parameters window will now open, where you can further configure and define parameters for the Aggregate transformation.
Click Next. This is the General Options window. Click OK.
General Options Window:
This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages
When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors
When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
The Comments input allows you to enter comments associated with this object.
After you have configured the properties, click OK.
You will see the fields in the object that were added in the Layout Builder window.
Map the data fields from the source object to the transformation object. You can auto-map the entire dataset from the source to the transformation object, or only map selected fields that you want to work with. In this case, we will map CategoryID, ProductID and UnitPrice as those are the fields we want to find aggregations for.
Note: The UnitPrice field has been mapped three times, as these mappings will determine TotalPricePerCategory, MaxPricePerCategory, and MinPricePerCategory.
Right-click on the Aggregate transformation object and click Preview Output.
You will see that the specified Aggregate Functions have been applied.
The Distinct transformation object in Astera removes duplicate records from the incoming dataset. You can use all fields in the layout to identify duplicate records, or specify a subset of fields, also called key fields, whose combination of values will be used to filter out duplicates.
Consider a scenario where we have data coming in from an Excel Workbook Source and the dataset contains duplicate records. We want to filter out all the duplicate records from our source data and create a new dataset with distinct records. We can do this by using the Distinct transformation object in Astera Data Stack. To achieve this, we will specify the data fields containing duplicate values as Key Fields.
In order to add a separate node for duplicate records inside the Distinct transformation object, we will check the option: Add Duplicate Records. Then we will map both distinct and duplicate outputs to a Delimited File Destination.
Let’s see how to do that.
Drag-and-drop an Excel Workbook Source from the Toolbox to the dataflow as our source data is stored in an Excel file.
To apply the Distinct transformation to your source data, go to Toolbox > Transformations > Distinct and drag-and-drop the Distinct transformation object onto the dataflow. Then, map the fields from the source object by dragging the top node of the ExcelSource object onto the top node of the Distinct transformation object.
Now, right-click on the Distinct transformation object and select Properties. This will open the Layout Builder window where you can modify fields (add or remove fields) and the object layout.
Click Next. The Distinct Transformation Properties window will now open.
Data Ordering:
Data is Presorted on Key Fields: Select this option if the incoming data is already sorted based on defined key fields.
Sort Incoming Data: Select this option if your source data is unsorted and you want to sort it.
Work with Unsorted Data: When this option is selected, the Distinct transformation object will work with unsorted data.
In this window, the distinct function can be applied to the fields containing duplicate records by adding them under Key Field.
Note: In this case, we will specify the Name and Type fields as Key Fields.
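Conceptually, the Distinct object with Name and Type as Key Fields behaves like the following Python sketch, which keeps the first record for each key and collects later repeats separately. The sample rows and the Price field are made up.

```python
# Minimal sketch of Distinct with key fields (Name, Type): the first occurrence of a
# key is kept as distinct, later occurrences of the same key are set aside as duplicates.
records = [
    {"Name": "Chai", "Type": "Beverage", "Price": 18.0},
    {"Name": "Chai", "Type": "Beverage", "Price": 18.0},
    {"Name": "Chang", "Type": "Beverage", "Price": 19.0},
]

seen = set()
distinct, duplicates = [], []
for row in records:
    key = (row["Name"], row["Type"])
    if key in seen:
        duplicates.append(row)   # written separately when duplicates output is enabled
    else:
        seen.add(key)
        distinct.append(row)     # the distinct output

print(distinct)
print(duplicates)
```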
You can now write the Distinct output to a destination object. In this case, we will write our output into a Delimited destination object.
Right-click on Delimited Destination object and click Preview Output.
Your output will look like this:
To add duplicate records in your dataset check the Add Duplicates Output option in the Distinct Transformation Properties window.
When you check this option, three nodes will be added to the Distinct transformation object:
Input
Output_Distinct
Output_Duplicate
Note: When you check the Add Duplicate Records option, mappings from the source object to the Distinct transformation object will be removed.
Now, map the objects by dragging the top node of ExcelSource object to the Input node of the Distinct transformation object.
You can now write the Output_Distinct and Output_Duplicate nodes to two different destination objects. In this case we will write our output into a Delimited destination object.
Distinct output:
Duplicate output:
As evident, the duplicate records have been successfully separated from your source data.
The Passthru Transformation object creates a new dataset based on the elements that were passed to the transformation. This is useful for organizing datasets for better readability and grouping of values that are otherwise calculated over and over again (e.g. a Sequence Generator Transformation).
In this document, we will learn how to use the Passthru Transformation object in Astera.
In this case, we are using an XML/JSON File Source.
The source file contains customers’ information in the parent node and their order and shipping details in the collection/child node.
Preview the data by right-clicking on the source object's header and selecting Preview Output.
A Data Preview window will open, showing you the preview of the hierarchical data.
Now, we want to create a field in the parent node that contains the count of orders that arrived late for each customer and write these records to a destination file. This new field in the parent node will depend on two fields, RequiredDate and ShippedDate, that are already present in the collection/child node.
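The logic we want can be sketched in a few lines of Python, assuming an order counts as late when its ShippedDate falls after its RequiredDate; the customer record and dates below are made up.

```python
# Minimal sketch of the LateOrders field: count child orders shipped after their
# required date. The layout mirrors the parent (customer) and child (Orders) nodes.
from datetime import date

customer = {
    "CustomerID": "ALFKI",
    "Orders": [
        {"RequiredDate": date(2018, 1, 10), "ShippedDate": date(2018, 1, 12)},
        {"RequiredDate": date(2018, 1, 20), "ShippedDate": date(2018, 1, 18)},
    ],
}

customer["LateOrders"] = sum(
    1 for order in customer["Orders"]
    if order["ShippedDate"] > order["RequiredDate"]
)
print(customer["LateOrders"])  # 1
```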
For this purpose, we will use the Passthru Transformation object.
To get a Passthru Transformation object from the Toolbox, go to Toolbox > Transformations > Passthru and drag-and-drop the Passthru object onto the designer.
You can see that the dragged transformation object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map source fields to the transformation object by dragging-and-dropping the top node of the source object, SampleData to the top node of transformation object, Passthru.
Now the mapping is done. Let’s configure the Passthru Transformation object.
Right-click on the transformation object’s header and select Properties from the context menu. A configuration window for the Passthru Transformation object will open. The first window is the Layout Builder window. This is where we can create or delete fields, change their name or data types, mark any field as a variable, or attach expressions to any field.
Create a new field under the parent node. Let’s call it LateOrders and mark it as an Output field.
Click Next, and you will be directed to a Layout Modifications window.
Select the LateOrders field. Here you can modify any field by applying expressions to it.
Write the expression that counts the number of late arriving orders. Click the "…" button located next to the Expression box to open the Expression Builder window, where you can make use of Astera's extensive library of built-in functions and expressions.
Click on the Orders node and a panel (Options for Collection) will appear. These options are only available for collection nodes. This is where we will specify the rule for routing late arriving orders. Select Route Based on Rules and a new section for adding rules will appear on the screen.
Add a new rule by clicking on the Add switch condition icon.
Now, write an expression to route late arriving orders and name this rule as “LateArrivals”.
Click OK. Now observe that a new collection node, Orders_LateArrivals, has been added to the Passthru Transformation object.
To preview data, right-click on the header of the transformation object and select Preview Output from the context menu.
A Data Preview window will open. On expanding records, you will get corresponding order details and appended details of late arriving orders.
To store the output for late arriving orders, write it to a destination file.
Right-click on the collection node, Orders_LateArrivals, and go to Write to > Excel Workbook Destination. An Excel Workbook Destination object will be added to the dataflow designer with auto-mapped fields from the collection field.
Configure settings for Excel Workbook Destination object.
Click on the Start Dataflow icon located in the toolbar at the top of the designer window to create the destination file.
Upon clicking this icon, an excel file will successfully be created. You can find the link to this file in the Job progress window.
Merge transformation in Astera is designed to merge data fragments from disparate sources, based on some predefined logic, and present it in a consolidated form to draw actionable insights.
Let's assume that there is an organization that maintains customer data in two different departments, Marketing and Sales. The Marketing department stores its information in a database table, while the Sales department maintains an Excel sheet for storing customer information. We want to merge the information from both sources so that we have consolidated data.
Drag-and-drop the relevant source objects from the Toolbox to the designer. (Click here to find how to set up sources.)
Note: In this case, the marketing department has the customer information stored in a database, whereas the sales department records customer information in an Excel file. Therefore, we will use a Database Table Source and an Excel Workbook Source as source objects.
Since the Merge transformation object merges data from a single source only, we will first combine both sets of records using the Union transformation object. We will then map fields from the data sources to the Union transformation object and add a new field, DataSource, to keep track of which information is coming from which source.
Drag the Merge transformation object from the transformations section in the Toolbox and drop it on the dataflow designer.
This is what a Merge transformation object looks like:
Map the Union transformation object’s output to the Merge transformation object.
Right-click on the Merge transformation object to set up transformation properties in the Layout Builder window. This is what the Layout Builder window looks like:
In the Layout builder window, specify the Primary Key. This is a common identifier that identifies similar records from various sources and merges the information against these records.
(Since we are consolidating different customer records, we will set up CustomerID as the Primary Key in this case.)
Next, you have to specify the field to be used as Version. If your data is coming from multiple sources, the Version field shows which source the data is coming from in the final merged output. In this case, we will use the Data Source field we added in the Union transformation object as the Version field.
Next, specify the Survivor Type for each field. Survivor Type allows you to choose the survivor values – the values you want to retain from your data sources – for each field. Survivor Types are set as First by default. However, depending on your case, you can choose the Survivor Type from the following options:
Since CustomerID, CompanyName, and ContactName records are common in both the source files (Customers_Marketing and Customers_Sales), we will set the Survivor Type as First for these fields. For the other fields with missing records, the Survivor Type will be set as follows:
Once you have set the Survivor Type, specify Precedence for each field. Precedence is the order in which you want the source data to be assessed. For instance, we have common data fields in both the sources, but different and missing records. We can set appropriate Precedence values to bring data from the desired data source.
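As a rough illustration of how survivor rules resolve values, the Python sketch below merges two records sharing a CustomerID, using "First" for fields present in both sources and "First If Not Null, Otherwise Second" for fields with gaps. The records and field values are made up.

```python
# Minimal sketch of survivor-value logic for one CustomerID present in both sources.
marketing = {"CustomerID": 1, "CompanyName": "Alfreds", "Address": None, "Region": "BE"}
sales = {"CustomerID": 1, "CompanyName": "Alfreds", "Address": "Obere Str. 57", "Region": None}

rules = {
    "CompanyName": "First",
    "Address": "First If Not Null, Otherwise Second",
    "Region": "First If Not Null, Otherwise Second",
}

merged = {"CustomerID": marketing["CustomerID"]}
for field, rule in rules.items():
    first, second = marketing[field], sales[field]
    if rule == "First":
        merged[field] = first
    else:
        merged[field] = first if first is not None else second

print(merged)  # {'CustomerID': 1, 'CompanyName': 'Alfreds', 'Address': 'Obere Str. 57', 'Region': 'BE'}
```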
Next, you can set a specific Condition, and the Merge transformation will process records based on the criteria specified for a particular field.
(In this case, we have specified ‘IsNotNull’ for Address and Region fields since we want none of these fields to be empty or have missing records.)
Depending on the requirements of the business case, you can add a logical expression in the Expression field to process the incoming data value and transform it into the output according to the logic defined. The Expression field can be used for mathematical and financial calculations, date and time manipulations, comparisons and conversion functions.
Click Next to proceed to the Merge Transformation Properties window. Here, you will see the following three checkboxes:
Case Sensitive – Check if data is to be assessed on a case-sensitive basis
Sort Input – Check if the incoming data is not already sorted
Version Order Descending – Check if you want the data to be sorted in a descending version order
Click Next to proceed to the General Options window. Here, you can add Comments, instructions, or any relevant information about the transformation. This will not change or alter your transformation action in any way.
You may also skip this step by clicking OK in the previous step (on the Merge Transformation window) to close the Transformation Properties window.
To get the output, right-click on the Merge transformation object, and click on Preview Output. You will get the merged records based on your specified transformation properties.
Data Preview before applying Merge transformation:
Data Preview after applying Merge transformation:
Merge transformations can be applied in cases where related data is scattered across different sources. Astera makes it extremely convenient for users to get consolidated data that is stored in different sources, while also allowing them the flexibility to choose how the output should appear through the various transformation properties.
The List Lookup transformation object is a type of lookup that stores its lookup information in the dataflow's metadata, which means that your lookup data is stored in the dataflow itself. List Lookup uses a list of values for both the input and output fields. You can use it to look up certain values in your source data and replace them with other desired information. Alternatively, you can define a list of values in the lookup grid in the object's properties, and each incoming value is then looked up in the grid when you run your dataflow.
Let’s see how this object functions in Astera.
In this example, we are working with a Customers Fixed-Length File Source that contains customer information for a fictitious organization. The Customers data contains information about customers belonging to different countries. We want to convert the country names in this data into CountryCodes by using the List Lookup transformation object.
Drag-and-drop a Fixed-Length File Source from the Sources section in the Toolbox to the designer window.
To preview the incoming data, right-click on the source object’s header and select Preview Output.
To start working, drag-and-drop the List Lookup object from Toolbox>Transformations>List Lookup.
This is what a List Lookup object looks like:
Map the field from the source dataset you want to look up values for, to the Value field in the List Lookup object.
Now, right-click on the List Lookup object and select Properties from the context menu. The List Lookup Map Properties window will now open.
Here, the first option is the Case Sensitive Lookup checkbox, which is checked by default. When this option is checked, the List Lookup will look up values on a case-sensitive basis. If you do not want to perform a case-sensitive lookup, you can uncheck this option.
Next, you can see that there is a table where we can specify the Source Value and the Destination Value. Source Values are the values from your source data, and Destination Values are the values you want to write in place of the source values.
For example, if we write the Destination Value as 'DE' against the Source Value 'Germany', Astera will write 'DE' in place of 'Germany' in the output.
This is one way of specifying the lookup values. However, there can be a lot of source values, and typing them manually can be a tedious task. There is a more efficient way of doing this in Astera.
If you right-click on the List Lookup object, you can see that there is an option called Fill Lookup List with Unique Input Values.
Selecting this option prepopulates the Source Value column with the unique values found in your source data.
Now, all you have to do is type in the Destination Values, that is, the codes corresponding to each country name.
Once you have populated the lookup list, click Next to proceed to the Lookup Options window.
In case the lookup list does not return a match for a given source value, one of the following options should be selected:
No Message – Will not mark the unmatched source value as an error or warning
Add Error – The List Lookup table will trigger an error for the records that found no match in the lookup field
Add Warning – The List Lookup will generate a warning and return a null value for records from the source that do not have any matches in the lookup table
Additionally, when the value is not found in the lookup list, you can choose from the following options to assign it a value:
Assign Source Value – Will return the original value from the source.
Assign Null – Will return a null value for each unmatched source record.
This Value – You can type in a specific value in the given field, and the List Lookup will return the same value for each unmatched source value.
In this example, we will add an error and return the source value if the lookup value is not found, so we will select the Add Error and Assign Source Value options. You can choose your preferred options and click OK.
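For illustration, the behavior chosen here, replacing matched country names with codes and passing unmatched source values through, can be sketched in Python as follows; the lookup entries are only examples.

```python
# Minimal sketch of the List Lookup with the Assign Source Value option: unmatched
# values fall back to the original source value.
country_codes = {"Germany": "DE", "France": "FR", "Mexico": "MX"}

for country in ["Germany", "France", "Atlantis"]:
    code = country_codes.get(country, country)  # no match: keep the source value
    print(country, code)
```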
Now, if we preview the output, you can see that for each country name from the source table, the List Lookup has returned a corresponding code value.
These CountryCodes will flow through the annotated output port if you want to write your data to a destination.
This is how we can map the lookup values to a target or a transformation in the dataflow using the output port.
This concludes using the List Lookup transformation object in Astera.
You can create many-to-one mappings with the help of the Denormalize transformation object in Astera. Denormalizing, also known as pivoting, allows you to combine a number of records into a single record (simply put, it brings data from rows to columns). It is useful for reducing the number of tables in the schema, which simplifies querying and can improve read performance.
The TaxInfo source data contains information about TaxType (City Tax, County Tax, State Tax, and Federal Tax), Tax Amount, and SSN (Social Security Number) of taxpayers.
We want to reduce the number of rows and create separate fields for City tax, County tax, State tax, and Federal tax.
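Conceptually, the reshaping we want looks like the following Python sketch: one output record per SSN, with the tax amounts spread into one column per TaxType. The rows and amounts are made up.

```python
# Minimal sketch of the pivot: rows keyed on SSN, TaxType values become columns.
rows = [
    {"SSN": "111-11-1111", "TaxType": "City", "TaxAmount": 100.0},
    {"SSN": "111-11-1111", "TaxType": "State", "TaxAmount": 250.0},
    {"SSN": "111-11-1111", "TaxType": "Federal", "TaxAmount": 900.0},
    {"SSN": "111-11-1111", "TaxType": "County", "TaxAmount": 50.0},
]

pivoted = {}
for row in rows:
    record = pivoted.setdefault(row["SSN"], {"SSN": row["SSN"]})
    record[row["TaxType"]] = row["TaxAmount"]

print(list(pivoted.values()))
```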
Let’s see how we can use the Denormalize transformation object to achieve this.
First, we will use the Sort object to sort our source data based on the key field, SSN in our case.
Drag-and-drop the Denormalize transformation object from the Transformations section in the Toolbox.
Right-click on the Denormalize transformation object and select Properties from the context menu.
Following are the properties available for the Denormalize transformation object:
Layout Builder Window:
The Layout Builder window is used to add and/or remove fields, as well as to select their data type. The fields added in the Layout Builder will show in the Output node inside the object, as well as in all Input nodes corresponding to the number of mapping groups created (see below), with the exception of the key field(s).
Denormalize (Many-to-One) Transformation Properties Window:
Select Keys: Using the Select Keys dropdown, select the field or fields that uniquely identify the record. These keys will be used to match records between the normalized source and the denormalized destination.
Sort Input: Check this option only if values in the matching field (or fields) are not already sorted.
Driver Field Value: Enter the pivot values for your Denormalize transformation object. Using the example below, the pivot values would be City, State, Federal, and County.
Note: Entering Driver Field Values is required prior to mapping the Denormalize object. For each entry in the Driver Field Value grid, a new input mapping group is created in the object box.
General Options Window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
The Comments input allows you to enter comments associated with this object.
After you have configured the properties, click OK.
An Input mapping node will be created for each value previously specified in the Driver Field Value grid.
Map the fields and preview the output to view the denormalized data.
Constant Value Transformation returns a single, predefined value for all records in a dataset.
In this example, we have an Excel worksheet containing employee data. The data is being written to a database table. The database table contains an additional field that stores department information. We want to pass a constant value to the Department field. To do this, we will use the Constant Value transformation object in Astera. The Constant Value transformation will be mapped to the Department field to append the name of the department to the final output.
Drag-and-drop an Excel Workbook source from the Sources section in the Toolbox.
Select a destination object from the Destinations section in the Toolbox. This is where the transformed data will be written and stored. We will use a Database Table Destination.
Map the Employees dataset from the source object to the destination table, Employees_Database.
Note: There is an additional field (Department) in the destination object, but there is no such field in the incoming dataset. To append the Department field to the final output, we will use the Constant Value transformation object.
Now, drag-and-drop the Constant Value transformation object from Toolbox > Transformations > Constant Value.
Right-click on the Constant Value transformation object and select Properties.
The Constant Value Map Properties window will now open. Here you will see a Constant Value section where you can write any value to be appended to your output dataset.
In this case, the value will be ‘Marketing’, to specify the department of the employees in the source dataset.
Click Next. A General Options window will now open. Click OK.
General Options window:
This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages
When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors
When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages and collect statistics as well.
Your transformation object will now look like this:
Now map the Value field from the Constant Value object to the Department field in the Employees_Database destination object.
Right-click on the Employees_Database destination object and click Preview Output.
Your final output will look like this:
The Department field in the destination has been successfully populated with the specified constant value in the final output, through the use of a Constant Value transformation object.
The Reconcile Transformation object in Astera enables users to identify and reconcile new, updated, or deleted information entries within the existing data source. It can be applied in a wide variety of business scenarios that require a user to identify changes in multiple data records and capture them efficiently to drive critical business decisions.
Consider an example where we have a sample data of complaints filed by customers regarding the products and services provided by a company. Assume that source file 1 contains details and status of complaints on January 1st, 2018, and source file 2 contains details and status of complaints on February 1st, 2018. We want to track the progress of the resolved complaints during that one month.
To do so, we will reconcile the information contained in the source data files and capture changes using the Reconcile Transformation object.
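The core idea can be sketched in Python: compare the two snapshots keyed on Complaint_ID and flag any field whose value changed between them. The complaint IDs and the Status field below are made up.

```python
# Minimal sketch of reconciliation: compare two snapshots by Complaint_ID and flag changes.
january = {101: {"Status": "Open"}, 102: {"Status": "Open"}}
february = {101: {"Status": "Closed"}, 102: {"Status": "Open"}}

for complaint_id in january.keys() & february.keys():
    for field, old_value in january[complaint_id].items():
        new_value = february[complaint_id][field]
        changed = old_value != new_value   # the "change flag" for this field
        print(complaint_id, field, old_value, new_value, changed)
```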
Drag-and-drop the Reconcile Transformation object from Toolbox> Transformations> Reconcile on the data flow designer.
This is what a Reconcile Transformation object looks like:
You can see the transformation object contains three child nodes (Output, Input_1, and Input_2) under the parent node, Reconcile.
Expand the input nodes to map fields from the source files.
Map the data fields from the source objects that you want to reconcile to the respective input node in the Reconcile Transformation object.
Right click on the Reconcile Transformation object’s header and select Properties.
This will open the Reconcile Transformation Properties window where you will see the following options:
Case Sensitive – Check this option if you want to derive a case-sensitive output
Sort Input 1 – Check this option if the incoming data from source 1 is not sorted
Sort Input 2 – Check this option if the incoming data from source 2 is not sorted
You can choose the Reconcile Output Type from the following options:
Side By Side Element With Change Flag – If you want values from both sources presented side by side, with a separate column presenting the reconciled output as a flag: true in case of an update, and false if the record remains unchanged.
Original Layout – If you want to get the reconciled output for each record and corresponding information in the reconciled field.
Original Layout With Changed Element Collection – Applies when working with hierarchical data, to reconcile the information contained in child nodes.
Once you have selected the preferred Output Type, you can specify the records to be shown in the output by applying the Record Filter and Inner Node Filter. You may choose one, multiple, or all of the available options by checking the corresponding boxes.
Click Next to proceed to the Layout Builder window. Here you will have to specify a Key. This will be the common identifier in both the source files that will identify and reconcile records. In this case, we want to reconcile the progress on complaints made against each complaint_ID; therefore, we will select Complaint_ID as our Key.
Now go to the Survivor Value drop-down list to set the Survivor Value for each data field. Survivor Values are the values from your source datasets which you want to retain in the output.
You may select from the following Survivor Value options:
Second – If you want to derive the output value from the second source
First – If you want to derive the output value from the first source
First If Not Null, Otherwise Second – If you want to output a value from the first source if the record is not null, otherwise from the second source.
Second If not Null, Otherwise First – If you want to output a value from the second source if it is not null, otherwise from the first source.
Higher – If the input values are integers, and you want to choose the higher value
Lower – If the input values are integers, and you want to select the lower value
Expression – If you want to derive the output value based on a formula expression
Note: You will only need to specify the Survivor Value if you want to get the Original Layout or Original Layout With Changed Element Collection as output. The Survivor Value option does not apply if you want to get Side by Side Element with Change Flag as your output, since both of the source values are retained when this option is selected.
Click Next to proceed to the General Options window, then click OK.
General Options window - This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages - When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors - When this option is checked, records with errors will not be output by the object. When this option is unchecked, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect their statistics.
Now, right-click on the Reconcile Transformation object’s header and select Preview Output to get the reconciled output.
You will get one of the following outputs according to the output type selected in the Reconcile Transformation Properties window.
Side by Side Element with Change Flag
Original Layout
Original Layout With Changed Element Collection
Reconcile Transformation objects can be applied in a variety of business cases, particularly those where monitoring the changes in assorted data records is crucial in driving critical business decisions. Here are some of the benefits and uses of the Reconcile Transformation object:
Reconciles data by deriving old and new values for specific fields in the source data
Allows users to choose from various layout options to reconcile changes in the most appropriate way
Works effectively with structured and unstructured (hierarchical) data formats
Offers the flexibility to select the information to be retained through different survivor value options
The Sort Transformation object in Astera is used to sort an incoming data stream. It also provides the option to remove duplicate values from the input.
It is a blocking transformation, which means that input records are accumulated until the end of the input. Blocking transformations affect the performance of the overall dataflow because subsequent steps cannot be executed until all the records have been received and processed by the blocking transformation.
The Sort Transformation uses storage on the server for temporary data during sorting. The server must have enough capacity to store the entire data set and index.
We have retrieved the OrderDetails data from a database table. The dataset contains fields such as OrderID, ProductID, UnitPrice, Quantity, and Discount. This data is unsorted and we want to sort it in the ascending order of UnitPrice.
Drag the Sort Transformation object from the Transformations section in the Toolbox and drop it on the dataflow designer.
Map fields from the source object to the Sort Transformation object.
To configure the properties of the Sort Transformation object, right-click on its header and select Properties from the context menu.
A Layout Builder window will appear.
In this window you can either:
Add Member Objects or Collection Objects to the layout.
Edit the elements of the Sort object. The Layout Builder allows you to add or remove fields in the layout, as well as select their data type. The fields added in the Layout Builder will be added to the Input node inside the object box. Once you’re done making changes to the layout, click Next.
The next window is the Sort Transformation Properties window.
Here, you can specify the sorting criteria. You will see the following options on this screen:
Return Distinct Values Only: Check this option if you want to remove duplicate values from the output.
Treat Null as the Lowest Value: Check this option if you want a null value to be returned first in the ascending sort order, and conversely, have the null value returned last in the descending sort order.
Case Sensitive: Check this option if you require case sensitive comparison for strings.
On the same screen, you need to select the sorting Field from the drop-down list and set the Sort Order as Ascending or Descending.
Note: In this case, the sorting Field is UnitPrice and the Sort Order is Ascending.
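For reference, sorting by UnitPrice in ascending order with nulls treated as the lowest value can be sketched in Python as shown below; the rows are made up and None stands in for a null UnitPrice.

```python
# Minimal sketch of the sort: ascending by UnitPrice, with null (None) treated as lowest.
orders = [
    {"OrderID": 1, "UnitPrice": 14.0},
    {"OrderID": 2, "UnitPrice": None},
    {"OrderID": 3, "UnitPrice": 9.8},
]

# Rows with None get a key starting with False, so they sort before all real prices.
orders.sort(key=lambda row: (row["UnitPrice"] is not None, row["UnitPrice"]))
print(orders)
```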
The last window is the General Options window. Here you can add Comments or specify some General Options. Once done, click OK and the window will close.
You can now map the Sort Transformation object to a destination and preview the output.
The output now shows the entire source data sorted in the ascending order of UnitPrice.
This is how the Sort Transformation can be used in Astera.
The Subflow transformation object is used to call a Subflow that will run as part of your Dataflow. The Subflow acts like a wrapper for the objects it contains. Subflows can be seen as ‘black boxes’ inside your Dataflow, simplifying and streamlining the Dataflow design, increasing reusability, achieving an easier-to-understand view of your Dataflow, and possibly eliminating the need to know what is going on inside the Subflow so that you can focus on the output it creates. Over time, if the logic inside your Subflow changes, you can modify the Subflow, and the modified Subflow can now be used by the Dataflow calling your Subflow.
Subflows can be nested, meaning that a Subflow can call other Subflows. The output of the Subflow can be fed into downstream objects on your Dataflow, just like the output of any Dataflow object.
To add a Subflow transformation object, drag the Subflow object from the Transformations section in the Toolbox and drop it on to the dataflow designer.
An example of what a Subflow object might look like is shown below.
To configure the properties of a Subflow object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Subflow Properties window:
Enter the file path of your subflow in the Path input. Using UNC paths is recommended to allow for remote execution of your dataflow on a server.
General Options Window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object, and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When unchecked, records with errors will be output by the object, and a record message will be attached to the record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect their statistics.
The Comments input allows you to enter comments associated with this object.
Creating a Subflow is similar to creating a regular Dataflow, because a Subflow is essentially a Dataflow or a sub-Dataflow. The difference between the two, however, is that a Subflow may optionally have an input and output.
The Subflow input makes it possible to feed data into the Subflow from an upstream object on the Dataflow that calls the Subflow. The Subflow’s output is used to send data to the downstream Dataflow object connected to the Subflow.
To create a new Subflow, go to File > New > Subflow on the main menu.
Designing a Subflow is similar to designing a Dataflow. For more information on working with Dataflows, see the Creating Dataflow chapter.
When a Subflow tab is active, the flow Toolbox has an additional group labeled Subflow. This group has two objects that control the input and output properties of your subflow.
Subflow Input object is a connector controlling the input layout of your Subflow. Any data feeding into the Subflow will pass through the Subflow Input when the Subflow is called by a Dataflow or another Subflow.
To add the Subflow Input, drag the Subflow Input object from the Subflow group in the Toolbox and drop it on to the Subflow designer.
To configure the properties of a Subflow Input object after it is added to the Subflow, right-click on it and select Properties from the context menu. The following properties are available:
Layout Builder screen:
The Meta Object Builder screen allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Meta Object Builder will show in the SubflowInput1 node inside the object box.
General Options screen:
This screen shares the options common to most objects on the dataflow.
Subflow Output object is a connector controlling the output layout of your subflow. Any data leaving the subflow will pass through the Subflow Output when the subflow is called by a dataflow or another subflow.
To add the subflow output, drag the Subflow Output object from the Subflow group in the Flow toolbox and drop it on to the subflow.
To configure the properties of a Subflow Output object after it was added to the Subflow, right-click on it and select Properties from the context menu. The following properties are available:
Layout Builder window:
The Meta Object Builder window allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Meta Object Builder will show in the SubflowOutput1 node inside the object box.
General Options window:
This screen shares the options common to most objects on the Dataflow.
Some examples of using Subflows are shown below:
Astera provides an array of source options to read and extract data from. Different source objects can be found in Toolbox > Sources.
For a list of supported sources and data providers, and for a detailed overview of the different source objects available in Astera's dataflows, see the relevant sections of this documentation.
Transformations in dataflows are used to perform a variety of operations on data as it moves through the dataflow pipeline. Astera provides an extensive library of built-in transformations enabling you to cleanse, convert, and transform data as per your business needs. Transformations can be found in Toolbox > Transformations. For a detailed review of transformations, see the relevant section of this documentation.
In this article, we will discuss:
How various sources in dataflows can be used as a transformation.
Some common scenarios where you could use a source as a transformation.
While the basic function of source objects in dataflows is to extract data and bring it to the designer for further integration, a source object can also be used as a transformation function.
To use a source object as a transformation, you will need to:
Select the relevant source object from Toolbox > Sources and drag-and-drop it onto the designer.
Right-click on the source header and select Transformation from the context menu.
As soon as the Transformation option is selected from the context menu, the header color of the source object will change from green to purple. This is because, by default, Source objects in Astera's dataflows are indicated by a green header and Transformation objects are indicated by a purple header. Hence, the change in color.
Listed below are the source objects that can be used as a transformation:
Note: Some sources in Astera cannot be used as transformations. These sources are: ADO.Net Metadata Collections, COBOL Source, SQL Query Source, Multi-table Query Source, and FTP List Directory Contents.
Generally, source objects are used as transformations when the source file path is dynamic.
In the next section of the article, we will discuss how to use a Delimited File Source object as a transformation.
A Delimited File Source object can be used as a transformation when it takes a dynamic file path, that is, when multiple files with the same layout need to be processed in a single dataflow or workflow.
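To make the idea concrete, the Python sketch below walks a folder of delimited files that share one layout and parses each with the same logic; the folder path is hypothetical.

```python
# Minimal sketch of a dynamic file path: every .csv file in the folder is parsed
# with the same layout. The folder path is a placeholder.
import csv
from pathlib import Path

source_folder = Path("C:/SampleData/Customers")

for file_path in sorted(source_folder.glob("*.csv")):
    with open(file_path, newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle):  # same field layout for every file
            print(file_path.name, record)
```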
Drag-and-drop the Delimited File Source object onto the designer.
Go to the object’s properties and provide the File Path for the delimited source file.
Once you have provided the File Path and configured the properties of the source object, click OK. Now, right-click on the header and select Transformation from the context menu, to change it to a Transformation object.
The header of the Delimited Source object will change to purple indicating that the source object is now converted into a Transformation object.
The transformed DelimitedSourceTrans object will now have two nodes:
Input node: To map the file path of the folder that contains delimited files that are to be processed.
Output node: On expanding the Output node, you will see the data fields in the delimited source file. Map these fields to other objects in a dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Delimited source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, point the path to the directory and folder where the delimited files are located.
Map the FullPath field from FileSystem to the DelimitedSource object’s input node (FilePath).
Now our Delimited Source Transformation object is ready. To preview the data, right-click on the DelimitedSourceTrans object and select Preview Output.
Once you select Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed preview format. Click on the + icon right next to the root node of the DelimitedSourceTran object to expand the node and preview your data.
You now have an expanded version of your data:
Root Node: Object Path – DelimitedSourceTran
Sub Node:
Input: Displays the path of the file that is being used as the input for this data.
Output: Displays the fields in the source data.
This is how you use a Delimited File Source as a transformation.
Next, we will see how to use the Excel Workbook Source object as a transformation.
The Excel Workbook Source can be used as a transformation when you have multiple Excel files with the same layout, and want to process them together in a dataflow or workflow.
Drag-and-drop the Excel Workbook Source object onto the designer.
Go to the object's properties and provide the File Path for the Excel source file.
Once you have configured the properties, right-click on the object's header and select Transformation from the context menu.
The header of the ExcelSource object will change to purple, indicating that the ExcelSource object has been converted into a transformation object.
The transformed ExcelSource object will now have two nodes:
Input node:
FilePath: To map the path of the folder that contains excel files that are to be processed.
Output node: On expanding this node, you will be able to see the data fields in the excel source file. Map these fields to other objects in the dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Excel source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the path of the directory and folder where the excel files are located.
Map the FullPath field from FileSystem to the ExcelSource object’s Input node (FilePath).
Map the Value field from a Constant Value object, containing the worksheet name, to the ExcelSource object's Input node (Worksheet).
Now our Excel Source transformation object is ready. To preview the data, right-click on the ExcelSourceTrans object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data that you see in the above image is in its condensed preview format. Click on the + icon right next to the root node, ExcelSourceTran, to expand the node and preview your data.
You will see the following nodes:
Root Node: Object Path – ExcelSourceTran
Sub Node:
Input: Gives the file path of the file that is being used as the input for this data.
Output: Displays the fields in the source data.
This is how you use an Excel Workbook Source as a transformation.
Now we will discuss how to use an XML/JSON File Source as a transformation in Astera.
The XmlJson File Source object can be used as a transformation when you have multiple XML or JSON files with the same layout, and want to process them in a dataflow or a workflow.
Drag-and-drop the XML/JSON File Source object onto the designer.
Go to the object’s properties and provide the File Path for the XML source file and its schema.
The header of the XmlJsonSource object will change to purple indicating the conversion from a source object to a transformation object.
The transformed XmlJsonSource object will now have two nodes:
Input node: To map the file path of the folder that contains XmlJson files that are to be processed.
Output node: Once expanded, you will be able to see the data fields in the XmlJson source file. You can map these fields to other objects in a dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the XmlJson source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the Path of the directory and folder where the XML/JSON files are located.
Map the FullPath field from FileSystem to XmlJsonSource object’s Input node (FilePath).
Now our XmlJson source transformation object is ready. To preview the data, right-click on the XmlJsonSourceTrans object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data initially appears in its condensed form. To expand the data and preview your output, click on the + icon right next to the root node – XmlJsonSourceTran.
You now have an expanded version of your data:
Root Node: Object Path – XmlJsonSourceTran
Sub Node:
Input: Gives the file path of the file that is used as the input for this data.
Output: Displays the fields in the source data.
This is how you use an XmlJson File Source as a transformation.
In the next section of the article, we will discuss how to use Report Source as a transformation in dataflows.
The Report Source object can be used as a transformation when you have multiple report files with the same layout and want to process them in a dataflow or a workflow.
Drag-and-drop the Report Source object onto the designer.
Go to the properties and provide the File Path for the report source and its report model.
The header of the ReportSource object will change to purple, indicating the conversion from a source object to a transformation object.
The transformed ReportSource object will now have two nodes:
Input node: Map the file path of the folder that contains report files that are to be processed.
Output node: When expanded, you will be able to see the data fields that are in the report source file. You can map these fields to other objects in the dataflow through the output mapping ports.
Use the File System Item Source to pass the file path information in the input node of the Report source-transformation object. Drag-and-drop it from the Toolbox > Sources section.
In the File System Source Properties, provide the path of the directory and folder where the report files are located.
Map the FullPath field from FileSystem to the ReportModel object’s Input node (FilePath).
Now our Report Source Transformation object is ready. To preview the data, right-click on the report source object and select Preview Output.
On selecting Preview Output, you will be able to view the data in the Data Preview pane. The data initially appears in its condensed form. To expand the data and preview your output, click on the + icon right next to the root node – ReportModelTran. Then, to further expand the data, click on the + icon right next to the sub node – Output.
You now have an expanded version of your data:
Root Node: Object Path – ReportModelTrans
Sub Node:
Input: Gives the file path of the file that is being used as the input for this data.
Output: On further expansion, it shows the fields and data contained in the report model.
This is how you use the Report Source object as a transformation object.
The Normalize transformation object in Astera Data Stack is used to create one-to-many mappings. It allows users to create multiple records from a single record by transposing the columns in a dataset into rows. In other words, you can take a dataset that has many columns and turn it into one that has many rows.
In this use case, we have a sample Taxpayers Excel dataset that contains information on the types and amounts of taxes paid by taxpayers. This includes taxpayers’ Social Security Number (SSN) and the different types of taxes that they have paid. These types are divided into different fields, such as City, County, State, and Federal, with each column containing the amount paid by each customer for a particular tax type. Our goal is to reduce the number of fields and increase the number of records by specifying the tax type in one consolidated field. To do this, we will use the Normalize object in Astera.
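To illustrate the one-to-many mapping before building the flow, consider a single source record and the normalized records it would produce. The values below are made up and only show the shape of the data:
Source record: SSN = 123-45-6789, City = 200, County = 150, State = 900, Federal = 2500
Normalized records (SSN, TaxType, TaxAmount):
123-45-6789, City, 200
123-45-6789, County, 150
123-45-6789, State, 900
123-45-6789, Federal, 2500
Each normalized record carries the SSN, a TaxType taken from the original column name, and the TaxAmount taken from that column’s value.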
Drag the relevant source object from the Toolbox and drop it onto the designer. In this case, we will select the Excel Workbook Source object from Toolbox > Sources > Excel Workbook Source and configure it so that it reads data from the Taxpayers’ dataset.
To preview the data, right-click on the object header and select Preview Output from the context menu. Here is a look at the dataset:
Drag the Normalize object from Toolbox > Transformations > Normalize and drop it onto the designer.
You will notice that the object contains one Output node and two Input nodes by default.
Any field mapped as a new member to one Input node will appear in all of the input nodes as well as the Output node. In this case, we will map the SSN field from the source object to an Input node.
Right-click on the header of the Normalize object and select Properties from the context menu.
A dialogue box will appear, which is used to configure the Normalize object.
In the Layout Builder window, create the layout of your normalized dataset by providing field names. In this case, we have already mapped SSN from the source and will create two new fields, one for the TaxAmount and the other for the TaxType.
Once you have created the layout, click Next.
In the Normalize (One to Many) Transformation Properties window, make appropriate selections for the following options:
Number of Mapping Groups: Here, you can specify the number of mapping groups that are required. Increasing this number from 2 will also increase the number of Input nodes in the object. In this case, there are four tax types. Hence, we will increase the number to 4.
Omit Record If this Element is Null: From this drop-down menu, you can select a field from your layout. If an element in this field is null, the entire record containing that null element will be omitted from the dataset. In this case, we will keep the default selection, which denotes that this option will not be applied to any field.
Once you have made the required selections, click Next.
On the last window, which is the General Options window, you will be provided with an empty text box for Comments. Moreover, you can also select a few options that are common to most objects in Astera.
Clear Incoming Record Messages: When this option is checked, any messages coming in from the preceding object will be cleared.
Do Not Overwrite Default Values with Nulls: When this option is checked, actual values are not overwritten with null values in the output.
In this case, we will leave the options unchecked. Once you are done, click OK.
Now that you have configured the Normalize object, you will notice that new input nodes have been added to the object based on our selection for the Number of Mapping Groups option. Each node contains the layout we specified in the Layout Builder window.
The next step is to make the required mappings from the source object to the Normalize object. These are the mappings needed for this particular use case:
Map SSN from the Excel Workbook Source object to SSN in all four input nodes of the Normalize object.
Map City to TaxAmount in the first input node, County to TaxAmount in the second input node, State to TaxAmount in the third input node, and Federal to TaxAmount in the fourth input node.
Map the City Field Name to TaxType in the first input node, the County Field Name to TaxType in the second input node, the State Field Name to TaxType in the third input node, and the Federal Field Name to TaxType in the fourth input node. To map field names, right-click on the mapping link, hover over Change Map Type, and select Field Name.
Here is what the final dataflow should look like:
Preview the output to have a look at the normalized dataset.
You can map these fields further to other objects in the dataflow using the output node of the Normalize object.
This concludes using the Normalize object in Astera.
The Sequence Generator Transformation object in Astera is used to add sequences of integer values to a dataflow. A sequence can start at any number and use any step; for example, a start value of 50 with a step of 5 produces 50, 55, 60, 65, and so on.
The Astera Data Stack can either create a sequence instantly at the dataflow’s run-time (this is called in-memory sequence), or it can read sequence control data from a database table as your dataflow is executed.
In the case of in-memory sequences, a sequence always starts at the Start Value provided in the SeqGenerator: Context Information Properties window. In the case of database sequences, the last value used is recorded in the control database, and a new start value is used every time the sequence is invoked.
This makes it possible to generate ever-increasing values for the sequence each time the dataflow runs. In effect, such a sequence is a chain of sequences with non-overlapping values.
Here, we have retrieved data from an Orders table using a Database Table Source object. We will use the Sequence Generator Transformation object to generate a sequence for the OrderNo field in our source data. Let’s see how this works.
Drag the Sequence Generator Transformation object from the Transformations section in the Toolbox and drop it onto the dataflow designer.
Map the required fields from the source object to a destination object.
To configure the properties of the Sequence Generator Transformation object, right-click on its header and select Properties from the context menu.
This will open the Context Information Properties window.
In this window, you can choose between three different types of sequence generations and specify the Sequence Details.
A description of these three methods is given below:
In Memory: The sequence will be created in memory at the dataflow run-time. The sequence always starts at the specified Start Value in the sequence properties.
Sequence Details:
Start Value – The initial value for the sequence
Step – The increment value
Database Table: The sequence control information for the database table can be managed within Astera through the Manage Sequences option.
Connection: Specify the connection to the database where the sequences will be stored
Sequence: Select the sequence from the list of available sequences in the database.
Note: To manage database sequences, go to Menu > Tools > Sequences.
Batch Size: Specifies the minimum number of values to be allocated to the sequence.
Use Memory Sequence during preview: Prevents the user from breaking the sequence cycle during a data preview by substituting with an in-memory sequence, which does not affect (i.e. increase) the database sequence’s current value.
Sequence Object: The sequence control information is read from a native sequence object in a SQL Server or Oracle database.
Connection: Specify the connection to the database that stores your sequences.
Sequence: Select the sequence from the list of available sequences.
Use Memory Sequence during previews: Prevents the user from breaking the sequence cycle during a data preview by substituting with an in-memory sequence.
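For reference, the kind of native sequence that the Sequence Object option reads from can be created in SQL Server with standard DDL. The statement below is only an illustration – the schema, name, and values are placeholders and are not something Astera requires you to run:
CREATE SEQUENCE dbo.OrderNoSequence START WITH 1 INCREMENT BY 1;
Because the current value of such a sequence is stored in the database, the generated values keep increasing across dataflow runs, as described above.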
Note: In this case we will use the In-Memory sequence generator option.
Let’s specify the Sequence Details as follows:
Start Value: 0
Step: 1
In the destination object, a new field will be created where the sequence generator value will be mapped.
Note: In this case, the OrderNo field has been created in the Excel Workbook Destination object.
The NextVal field will be mapped to the OrderNo field in the destination object.
You can see the output of the Excel Workbook Destination object in the Data Preview window.
The sequence has been generated in the new field, OrderNo.
This is how the Sequence Generator Transformation is used in Astera.
A Route transformation object invokes one or more paths in the dataflow, in accordance with some decision logic expressed as a set of Rules. Using the Route transformation object, you can create some custom logic involving multiple paths and adapt it to suit your data processing scenario.
For example, a record passing some rule will be written to Destination1, a record passing another rule will be written to Destination2, and a record which fails to pass any rules can still be written to a Destination, and be fed to a downstream transformation object.
To add a Route transformation object, drag the Route object from the Transformations section in the Toolbox, and drop it on the dataflow designer.
An example of what a Route object might look like is shown below:
To configure the properties of a Route object after it was added to the dataflow, right-click on the object’s header, and select Properties from the context menu. The following properties are available:
Layout Builder window:
The Layout Builder window allows you to add or remove fields in the field layout, as well as select their data type. The fields added in the Layout Builder will be added to the Input node inside the object box, as well as in all Rule nodes corresponding to the number of rules created (see below).
Route Transformation Properties window:
The Route Transformation Properties window provides the interface for managing Route rules.
Type a descriptive name for the rule in the Description field.
Click Compile to check for any syntax errors in your rule. The Compile Status should read “Successful” for a successful compilation.
To activate the rule, check the Active checkbox.
Note: Each Route rule you add here will have its own Rule node in the Route object box once you close the Properties window. You can then map it to downstream objects in the dataflow as needed.
Note: All Route transformation objects have a Default node. This node handles records not passing any rules defined in your Route transformation object.
General Options window: This window shares options common to most objects in the dataflow.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be output by the object. When it is unchecked, records with errors will be output by the object with a record message attached to each record. This record message can then be fed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages as well as collect their statistics.
The Comments input allows you to enter comments associated with this object.
An example of using the Route transformation object is shown below.
After you have successfully installed Astera client and server applications, open the client application and you will see the Server Connection screen as pictured below.
Enter the Server URI and Port Number to establish the connection.
The server URI will be the IP address of the machine where Astera Integration server is installed.
Server URI: https://<IP_address>
Note: You can ask your network administrator for the IP address of the machine where the Astera Integration Server is installed. Alternatively, launch the command prompt and run the ipconfig command to get the IP configuration details for the machine, then use that information to provide the Server URI.
The default port for the secure connection between the client and the Astera Integration Server is 9262. For example, if the server machine’s IP address were 192.168.0.15, you would enter https://192.168.0.15 as the Server URI and 9262 as the Port Number.
If you have connected to any server recently, you can automatically connect to that server by selecting that server from the Recently Used drop-down list.
Click Connect after you have filled out the information required.
The client will now connect to the selected server. You should be able to see the server listed in the Server Explorer tree when the client application opens.
To open Server Explorer go to Server > Server Explorer or use the keyboard shortcut Ctrl + Alt + E.
The yellow icon with an exclamation mark means that the server is not configured. Before you can start working with the Astera client, you will have to create a repository and configure the server.
Existing Astera customers can upgrade to the latest version of Astera Data Stack by running an executable (.exe) utility that automates the repository upgrade to the latest release. This streamlined approach makes the upgrade process more efficient, ensuring a smoother transition for users.
Note: This upgrade applies to v10.0 and later. Earlier versions cannot be upgraded in place and will need a clean repository as part of the upgrade.
To start, download and run the latest server and client installers to upgrade the build.
Note: Depending on the build, the user can upgrade the client, the server, or both.
Run the Repository Upgrade Utility to upgrade the repository.
Note: If you do not perform this step, you might encounter an error.
Once run, you will be faced with the following prompt.
Click OK and the repository will be upgraded.
Once done, you will be able to view all jobs, schedules, and deployments that you previously worked with in the Job Monitor, Scheduler, and Deployment windows.
This concludes the working of the Repository Upgrade Utility in Astera Data Stack.
You can either write an expression directly in the Expression box in the Layout Builder window, or you can build one in the Expression Builder. To access the Expression Builder, click the icon next to the Expression box. An Expression Builder window will open.
Drag-and-drop the appropriate source objects and point them towards the files that you want to reconcile. In this example, we will be working with an
Note: In this case we will write the data to an .
Or you can expand the dropdown located in the main toolbar and select Subflow as shown below.
Once you have provided the file path and configured the properties of the object, click OK. Now right-click on the header and select Transformation from the context menu, to change it into a transformation object.
Worksheet: This option can be used when you have more than one worksheet in an Excel source file and want to use a particular worksheet in the dataflow or workflow. This can be done by specifying the worksheet name using a Constant Value object, which you can find in Toolbox > Transformations > Constant Value.
Once you’ve provided both paths and configured the source object, click OK. Now, right-click on the header and select Transformation from the context menu to change it into a transformation object.
Once you’ve provided both the paths and configured the properties of the object, click OK. Now right-click on the header and select Transformation from the context menu, to change it to a transformation object.
To learn how you can configure an Excel Workbook Source object, click here.
Note: We have the Orders table as our source from a . We will map the fields OrderDate, RequiredDate, ShippedDate, ShipVia and Freight to an object.
Click the icon to create a new rule.
In the Expression input, enter an expression for the rule, for example, LTV > 60 and LTV <= 80, or any rule or condition you want to apply to your data (a sketch of a possible rule set follows these steps). Alternatively, you can click the button to open the Expression Builder window – a tool that allows you to visually build your rule using the record tree and IntelliSense.
Add other Route rules if necessary. To delete an existing Route rule, select it and click the icon.
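For instance, a Route object that partitions records by loan-to-value ratio might use a rule set along these lines (the LTV field and the rule names are purely illustrative, following the example expression above):
LowLTV: LTV <= 60
MidLTV: LTV > 60 and LTV <= 80
HighLTV: LTV > 80
Records that do not satisfy any of these rules are passed to the Default node.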
The available Survivor Types are described below:
First – Returns data from the first data source for that field
Last – Returns data from the last data source for that field
Maximum – Returns the maximum value from all available input data sources
Minimum – Returns the minimum value from all available input data sources
Count – Returns the total count of all values that exist in the field
Sum – Aggregates the values that exist in that field in all the input sources and then returns the arithmetic sum of those values
Comma Separated Values – Separates the values that exist in that field in all the input sources with a comma and then returns that representation. This option is only available when the output field is assigned the 'String' Data Type.
Field – Survivor Type
ContactTitle – First
Address – First
City – First
Region – Last
PostalCode – First
Country – First
Phone – Last
Fax – Last
DataSource – Comma Separated Values
The File Lookup Transformation object in Astera is used to look up values coming from a source. It uses an Excel or delimited file that contains the lookup values as well as the output values to perform the lookup. A file lookup can be performed based on a single lookup field or a combination of fields.
Similarly, a File Lookup Transformation object can return a single output field from a lookup table or a combination of fields. In either case, the output field or fields are returned from the records in which the lookup values match the incoming values.
In this scenario, we have a Customers dataset from a fictitious organization stored in a database source. It contains information about customers from different countries. We want to replace the country names in our database with country codes by switching them with the lookup values (country codes) stored in an Excel file. To achieve this, we will use the File Lookup Transformation object.
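For illustration, the lookup file in this kind of scenario is typically a simple two-column sheet, for example (a hypothetical excerpt – your actual file and codes may differ):
Country, Code
Germany, DE
France, FR
Brazil, BR
Mexico, MX
The incoming Country values are matched against the Country column, and the corresponding Code values are returned as the output.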
Select the relevant source object from the Sources section in the Toolbox. In this example, we will use Customers data stored in a database table.
Right-click on the source object’s header and select Preview Output. You can see the country names in the Country field which we want to convert into country codes.
Drag-and-drop the File Lookup Transformation object from Toolbox > Transformations > File Lookup onto the dataflow designer.
Now, let’s configure the Transformation object.
Right-click on the header of File Lookup Transformation object and select Properties.
A File Lookup Map Properties window will open where you can see an option for Source File Type.
Source File Type: Here, you need to specify the type of your lookup file.
You can perform the task using an Excel or Delimited lookup file.
Select the Source File Type from the dropdown menu. In this example, our country codes are stored in an Excel file so we will specify the Source File Type as Excel.
Click Next to proceed to the File Lookup Details window. You will see two options:
File Location: Here, you need to specify the File Path to the lookup source file.
Options:
First Row Contains Header: You can check this option if your lookup file contains a header in the first row. Otherwise, you can leave it unchecked.
Worksheet: If your lookup file contains multiple worksheets, you can select the worksheet you want to use to perform the lookup.
Click Next to proceed to the Layout Builder. Here, you can make changes to the object’s layout by modifying the existing fields or creating new fields.
Once you are done, click Next.
On the next window, you will see various Lookup Options.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value if more than one match is found for a single value in your lookup file. The option expands into a drop-down list where you can select one of the following three options:
Return First: Will return the first matched value found.
Return Last: Will return the last value among all the matched values.
Return All: Will return all the values in the lookup file that match a source value.
If Value Is Not Found In The Lookup List:
In case no lookup value is found against a source value, you can choose one of the following three options to determine what message, if any, is appended to your output.
No Message: There will be no message and the output will be the same as the input value.
Add Error: An error message will appear with the output.
Add Warning: A warning will appear with the output.
If Value Is Not Found In The Lookup List, Assign Value:
If no lookup value is found against a source value, you can assign an output value of your choice.
Assign Source Value: Will return the source value in the output.
Assign Null: Will assign null to your output value.
This Value: You can select this option and assign any value of your choice.
Click Next. This will take you to the Config Parameters window, where you can further configure and define parameters for the File Lookup Transformation object.
Once you have configured the File Lookup Transformation object, click OK.
Map the Country field from source object to the Country field in the File Lookup Transformation object. Now map the Code field from the transformation object to the Country field in the Database Table Destination object.
This is what your dataflow will look like:
Map the remaining fields from the source object to the destination object.
Right-click on the destination object’s header and select Preview Output.
You can see that the country names in the database table have been successfully converted into country codes.
This concludes using the File Lookup Transformation object in Astera.
The Tree Transform object in Astera enables users to transform data in a hierarchical structure. Users can create new fields in the parent node based on the data in the collection/child node. Tree Transform supports rule-based routing, filtering, merging and sorting of data while maintaining its hierarchy.
In this document, we will learn to use the Tree Transform object in Astera.
In this case, we are using the XML/JSON File Source to extract our source data. You can download this sample data from here.
The source file contains customers’ information in the parent node and their orders and shipping details in the collection/child node.
You can preview this data by right-clicking on source object’s header > Preview Output.
A Data Preview window will open, displaying a preview of the hierarchical source data.
Now, we want to create a new field in the parent node that contains the count of orders that arrived late for each customer and route these records to a destination file while maintaining the hierarchical format of the dataset.
This new field in the parent node will depend on two fields: RequiredDate and ShippedDate, that are already present in the collection/child node.
In other words, we are trying to transform hierarchical data without flattening its structure. For this purpose, we will use the Tree Transform object.
To get a Tree Transform object from the Toolbox, go to Toolbox > Transformations > Tree Transform and drag-and-drop the Tree Transform object onto the designer.
The transformation object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map source fields onto the transformation by dragging and dropping the root node of the source object CustomerOrders onto the root node of the transformation object – TreeXform.
Now that the mapping is done, let’s configure the TreeXform object.
Right-click on the object’s header and select Properties from the context menu. A configuration window for Tree Transform will open. The first window is the Layout Builder window. This is where we can create or delete fields, change their name or data types, mark any field as a variable or attach expressions to fields.
Create a new field under the parent node. Let’s name it LateOrders.
Click Next, and you will be directed to a Layout Modifications window.
Select the LateOrders field. Here you can modify any field by applying expressions to it.
Write the expression that counts the number of late arriving orders.
Click “…” next to the expression box to open the Expression Builder where you can make use of the extensive library of built-in functions and expressions in Astera.
This creates the field values for LateOrders based on the field values of RequiredDate and ShippedDate fields.
Click on Orders node and a panel (Options for Collection) will appear. These options are only available for collection nodes.
Options for Collection:
Show – Shows the collection node in the Tree Transform object.
Flatten With Item Count – Flattens data based on the record count in the collection node against each item in the parent node.
Flatten Based on Rules – Flattens a part of hierarchical data based on predefined rules.
Route Based on Rules – Routes and creates subsets of data based on predefined rules.
Merge With Item Count – Merges data based on the record count in the collection node against each item in the parent node.
Hide – Hides collection node from the Tree Transform object.
Calculation Formula – An expression box used for writing rules to route or flatten hierarchical data.
Sort Based On Keys – Sorts hierarchical data based on the field in the collection node. Only available for Show and Flatten With Item Count options.
This is where we will specify the rule for routing late arriving orders. Select Route Based on Rules. A new section for adding rules will appear in the window.
Now, write an expression to route late arriving orders and name this rule “LateArrivals”.
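As an illustration of what such a rule could look like, an order can be treated as late when it shipped after its required date, which is a simple comparison of the two collection fields (shown here as a sketch only – the exact syntax may vary):
ShippedDate > RequiredDate
Orders satisfying this rule are routed into the LateArrivals subset.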
Click OK. Now observe that a new collection node Orders_LateArrivals has been added to the Tree Transform object.
To preview data, right-click on the header of the transformation object and select Preview Output from the context menu.
A Data Preview window will open. On expanding the records, you will get corresponding order details and appended details of late arriving orders.
To store the output for late arriving orders, you can write it to a destination file or use that data further in the dataflow.
This concludes using the Tree Transform transformation object in Astera.
The Tree Join Transformation object in Astera enables users to create complex, hierarchical data structures such as EDI or XML documents with ease. Unlike the standard relational join which combines left and right elements to create a new record, the Tree Join Transformation object allows users to create collection and member nodes. It also enables users to join datasets in parent-child hierarchies using a key field. It is a set level transformation that operates on a group of records.
In this document, we will learn to use the Tree Join Transformation object in Astera.
In this use case we have two different source datasets. The first data contains information about Customers and has fields such as CustomerName, CustomerID, Address, etc.
The second dataset contains details of Orders placed by customers. It includes fields such as OrderID, CustomerID, and order details such as RequiredDate, ShippedDate and other shipping details.
We will join these two datasets using the Tree Join Transformation object and create a hierarchical dataset in which all orders placed by a customer along with the order details are represented in a parent-child hierarchy.
Unlike the regular Join, which combines two datasets in a flat layout, the Tree Join Transformation joins data from two different datasets into a hierarchical data structure.
In this use case, each record from the first dataset that contains Customer details will be a parent node, and under the parent node, the details of Orders placed by that customer will be returned in a child node.
1. To get the Tree Join Transformation object from the Toolbox, go to Toolbox > Transformations > Tree Join and drag-and-drop the Tree Join Transformation object onto the designer.
2. Now map fields from the Customer source dataset to the TreeJoin object.
3. Right-click on the Tree Join object header and go to Properties from the context menu. In the Tree Join Layout Builder window, you can see the fields from the Customer dataset listed under the root node.
4. Next, click on the TreeJoin node; you will see that the small icons or buttons at the top of the screen become active. If you click on the icon, you will get two options:
Add Member Object – To add a new member node to your layout
Add a Collection Object – To add a new collection node under the parent node. It will return all corresponding records as a collection under the parent node.
In this case we will Add a Member Object to create a separate record for each order placed by a customer, under a separate customer record node.
5. Add a Member Object to this root node. A dialogue box will open to name your member object.
In this case, let’s name it ‘Orders’ and click OK. A member object has been added to our parent node.
6. Click OK, to close the properties window. Now map the Orders dataset to the member node that we created in the previous step to complete the layout.
7. Go to the properties of the TreeJoin object again. We have already created the layout, so we will proceed to the next window.
8. In the TreeJoin Transformation Properties window, we must specify the Join Key.
The join key is a common field or identifier in both datasets that is used to identify and join records in a tree-like structure. The parent and child fields refer to this same common field in the two source datasets, and it serves as the key identifier for joining records.
Parent Field – Join field from the first dataset.
Child Field – Same field as the parent field, selected from the second dataset.
In this case, the CustomerID field is common in both the datasets, so we will use it as the join Key.
9. Click on the Parent field dropdown button. Expand the TreeJoin node and select the CustomerID field.
10. Click on the Child field column and expand the TreeJoin root node. Scroll down to your member node, expand this node and select the CustomerID field from the second dataset.
Let’s discuss the other options on the properties window:
Join In Database – Lets you join the tables in the database itself rather than in-memory. However, it applies only when both the tables are sourced from the same database.
Case Sensitive – To process and join records on a case sensitive basis.
11. Now that the layout and the TreeJoin properties are ready, click OK.
12. Right-click on the TreeJoin object and select Preview Output.
The TreeJoin object has returned the customer records in parent nodes. Upon expanding the node, you can see the order placed by the customer listed as its member unit under the parent node.
If we choose to Add a Collection Object in the Layout Builder, all the records for orders placed by a customer will be returned in a collection under a single parent node for each customer.
13. The joined dataset can now be written to a desired destination. In this case we will write it to an XML File Destination object.
To know more about writing to an XML File Destination object, click here.
This concludes using the Tree Join Transformation object in Astera.
You can download the file for this use case from the following link:
The Union Transformation object in Astera is used to combine incoming data from two or more inputs into a single output. Its functionality is similar to the UNION operator in an SQL query. It has multiple input nodes and a single output node. It puts together two sets of data irrespective of any repetition that might occur in the datasets. In order to perform this transformation on two datasets, their cardinality must be the same.
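To illustrate the analogy, combining two departments’ customer records in SQL would look roughly like this (the column names are hypothetical; UNION ALL is used because, like the Union Transformation, it keeps duplicate records):
SELECT CustomerID, CustomerName, 'Marketing' AS Category FROM Customers_Marketing
UNION ALL
SELECT CustomerID, CustomerName, 'Sales' AS Category FROM Customers_Sales;
The Union Transformation object produces the same kind of result visually; the Category column in this sketch corresponds to the Category field added to the layout in the steps below.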
Note: An input node in a union transformation cannot receive data from two different set transformations.
In this example, we have customer data from two different departments, Sales and Marketing, stored in two separate Excel Workbook Source files. We want to combine this data into a single dataset using a Union Transformation object. To keep track of records coming in from each department, we will also add a new field, Category, to the layout of the Union Transformation object and pass the value using a Variables object.
To work with a Union Transformation object, drag-and-drop the Union Transformation object from Toolbox > Transformations > Union onto the dataflow designer.
Map the Customers_Marketing data to Input_1 and Customers_Sales data to Input_2 in the Union Transformation object.
Now, right-click on the Union Transformation object’s header and select Properties.
The first window is the Layout Builder window, where you can customize your layout or modify your fields. You can also provide a default value to be used in place of null values.
Add a new field, name it Category and specify its Data Type as String.
Click Next.
Next is the Union Transformation Properties window, where two input nodes, Input_1 and Input_2, are defined by default. You can rename them if you want, and you can define any number of input nodes based on the number of datasets you want to combine using the Union Transformation object.
Click OK.
Now, map the categories of respective departments from the Variables resource object to the Category field in the Union Transformation object. This is done to identify which department a particular record is coming from.
Now, we have successfully configured the Union Transformation object.
Right-click on the Union Transformation object’s header and select Preview Output.
You can see that the Union Transformation has successfully combined the two datasets into a single, unified dataset.
Note: A Union Transformation will show the combined set of fields from both sources in the result, regardless of whether a field is present in one or both datasets. In the final output, records from a dataset that does not have one or more of these fields will be assigned a null value in those fields.
You can now further transform your dataset or write it to a destination.
This concludes working with the Union Transformation object in Astera.
The Data Cleanse Transformation object is a new addition to Astera's library of transformations. It makes it all the more convenient for business users to cleanse raw data and present it in a more refined, standardized, and enterprise-ready format. Using the Data Cleanse Transformation object, users can clean up data from null values and redundant text and characters, and prepare raw data for transformation, validation, profiling, and record matching functions.
Retrieve the data you want to cleanse using the relevant Source object. (Click here to learn more about setting up Sources.)
Now drag the Data Cleanse Transformation object from the Transformations section in the Toolbox and drop it onto the designer.
This is what a Data Cleanse transformation object looks like.
Map data from the source object to the Data Cleanse Transformation object. You can either auto-map the entire data set or map a few fields manually.
Now you have to specify the criteria for data cleansing. Right-click on the Data Cleanse Transformation object and select Properties from the context menu.
This will open a new window where you have to set up the properties for data cleansing. The first window is the Layout Builder window. Here you can customize the layout of your dataset by adding, removing or renaming fields. Once you have created the layout, click Next to proceed to the next window.
This is where you set up the data cleanse criteria for your source data.
You can find various data cleanse options arranged in different sections. Let’s explore them one by one.
The options provided within this category allow you to remove values, spaces, tabs, and line breaks from your data. You can find the following options within this category:
All whitespaces – Removes all whitespaces from the data
Leading and trailing whitespaces – Removes whitespaces preceding and succeeding the values
Tabs and line breaks – Removes tabs and line breaks within source values
Duplicate whitespaces – Removes double spaces from the data
Letters – Removes all letters from the data
Digits – Removes all digits from the data
Punctuation – Removes all punctuation from the data
Specified Character – Removes any specific character from the data
As the name suggests, the options within this category allow you to replace null values inside a string or numeric field with a corresponding value – blank in case of a string, and zero, in case of a numeric field.
Null strings with blanks: Replaces all null strings with blanks
Null numerics with zeros: Replaces all null numeric values with zeros
The Find and Replace options enable users to replace a value in the source dataset with another value.
It also provides users the option to choose whether the find and replace function is to be performed on a case sensitive basis. You can select a search mode from three options:
Normal – Will perform a normal find and replace function
As in this example, we want to change the status from ‘Planned’ to ‘Scheduled.’
So, we will type in ‘Planned’ in the Find field and ‘Scheduled’ in the Replace field.
Now, if we look at the output, we can see that the Data Cleanse Transformation Object has found and replaced the status values from ‘Planned’ to ‘Scheduled.’
Extended – Allows you to search for tabs (\t), newlines (\r\n), or a character by its value (\o, \x, \b, \d, \t, \n, \r and \) and replace it with the desired value
In the example below, we want to replace whitespaces within our source values with a hyphen (-).
So, we will type ‘\s’ in the Find field and ‘-’ in the Replace field.
Now, if we look at the output, we can see that the Data Cleanse Transformation object has found and replaced whitespaces from within the values with a hyphen.
Preview before applying the “Extended Find and Replace” function.
Preview after applying the “Extended Find and Replace” function.
Regular Expressions – Allows you to find and replace a value based on a regular expression.
In the example below, we want to replace the “ALFKI” value(s) in the CustomerID field with “A1234”.
For this, we will write a regex in the Find field and the desired value in the Replace field.
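As a sketch (the exact pattern depends on your requirements), the two entries could be:
Find: ^ALFKI$
Replace: A1234
The ^ and $ anchors ensure that only values consisting entirely of ALFKI are replaced, rather than values that merely contain that text.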
Now, if we look at the preview, you can see that Astera has replaced values in the source data with the desired values.
Case options allow users to convert the letter case of source data to Upper, Lower, or Title case.
You can choose from the following options:
None – Keeps the letter case as is.
Upper – Changes all letters to upper case.
Lower – Changes all letters to lower case.
Title – Changes all letters to title case.
The Modify Data option provides you the flexibility and convenience of applying an expression to all fields in your data. Check the Run expression on all fields option to activate this feature.
The Run Expression on all fields feature was previously called ApplyToAll and was offered as a standalone transformation in Astera 7.5 and earlier releases. However, it had limited functionality compared to the all-new Data Cleanse Transformation object, which is why it has been replaced altogether by the Data Cleanse Transformation object in Astera 7.6 and now in Astera 8.0.
The Run Expression on all fields feature is enabled by default for any existing flows created prior to Astera 7.6. This means that existing flows created on Astera 7.5 or an earlier release will continue to work seamlessly on 8.0 or later upgrades and won’t require any modification at all.
Here, you can choose from the extensive library of built-in expressions and apply it to all mapped fields by adding it to a “$FieldValue” parameter.
As in this example, we have mapped a regular expression to the “$FieldValue” parameter.
Now if we look at the preview, you can see that Astera has applied the regular expression to all fields and removed whitespaces from the values.
Preview before running the expression on all fields:
Preview after running the expression on all fields:
This function was previously performed using the ApplyToAll transformation in Astera 7.5 and previous releases. However, now you can perform this and other data cleanse tasks using the Data Cleanse Transformation object.
The license key provided to you contains information about how many Astera clients can connect to a single server as well as the functionality available to the connected clients.
Note: You cannot use your existing set of keys (from version 6 or 7). If you are planning to migrate from version 7 (or earlier) to version 8, 9 or 10, please contact sales@astera.com, as you will need a new license key.
After you have configured the server, and logged in with the admin credentials, the last step is to insert your license key.
Go to Server > Configure > Step 4: Enter License Key.
On the License Management window, click on Unlock using a key.
Enter the details to unlock Astera – Name, Organization, and Product Key and select Unlock.
You’ll be shown the message that your license has been successfully activated.
Note: The connected client applications will shut down for the server license to take effect.
Your client is now activated. To check your license status, you can go to Tools > Manage Server License.
This opens a window containing information about your license.
This concludes unlocking Astera client and server applications using a single licensing key.
Each destination on the dataflow is represented as a destination object. You can have any number of destinations on the dataflow. Each destination can only receive data from a single source. To feed multiple sources into a destination, you need to connect them through a transformation object, for example, Merge or Union. For more information on transformations, see the Creating Transformations article.
The following destination types are supported by the dataflow engine:
Flat File Destinations:
Tree File Destinations:
Database Destinations:
All destinations can be added to the dataflow by grabbing a destination type from the Toolbox and dropping it on the dataflow. File destinations can also be added by dragging-and-dropping a file from an Explorer window while pressing the ‘Shift’ key. Database destinations can be dragged-and-dropped from the Data Source Browser while holding down the ‘Shift’ key. For more details on adding destinations, see the Introducing Dataflows article.
Adding a Delimited File Destination object allows you to write to a delimited file. An example of what a Delimited File Destination object looks like is shown below.
To configure the properties of a Delimited File Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding a Fixed-Length File Destination object allows you to write to a fixed-length file. An example of what a Fixed-Length File Destination object looks like is shown below.
To configure the properties of a Fixed-Length File Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an Excel Workbook Destination object allows you to write to an Excel file. An example of what an Excel Workbook Destination object looks like is shown below.
To configure the properties of an Excel Workbook Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu.
Adding an XML/JSON File Destination object allows you to write to an XML file. An example of what an XML/JSON File Destination object looks like is shown below.
To configure the properties of an XML/JSON File Destination Object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
General Properties window:
File Path – Specifies the location of the destination XML file. Using UNC paths is recommended if running the dataflow on a server.
Note: To open an existing destination file for editing in a new tab, click the icon next to the File Path input, and select Edit File.
File Options:
Using the Encoding dropdown, select the appropriate encoding scheme for your destination file.
Check the Format XML Output checkbox to have line breaks inserted into the destination XML file for improved readability.
Schema Options:
Read From Schema File Specified Below – Specifies the location of the XSD file controlling the layout of the XML destination file.
Note: You can generate the schema based on the content of the destination XML file if the file already exists. The data types will be assigned based on the destination file’s content. Note that the existing destination file will be overwritten when the dataflow runs.
To generate the schema, click the icon next to the Schema File input, and select Generate.
To edit an existing schema, click the icon next to the Schema File input, and select Edit File. The schema will open for editing in a new tab.
Using the Root Element dropdown, select the node that should be the root of your destination schema. Any nodes up the tree will be excluded.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the destination. Using UNC paths is recommended.
Adding a Database Table Destination object allows you to write to a database table. An example of what a Database Table Destination object looks like is shown below.
Destination Connection screen – Allows you to enter the connection information for your destination, such as Server Name, Database and Schema, as well as credentials for connecting to the selected destination.
Pick Table window:
Database Transaction Management: Enable Transaction Management if you want to wrap your transfer inside a transaction. Depending on your database settings, this can give you performance improvements during the transfer. When Transaction Management is enabled, you should choose between always committing the transaction at the end of the transfer, or only committing it if there were no errors. Any errors would result in the entire transaction being rolled back.
Preserve System Generated Key Values: This option is only available if you have assigned at least one field in your destination layout as a System Generated field. If enabled, Astera will pass the incoming value from the source to the system generated field. Otherwise, the incoming source value will be ignored, and the system will write auto-increasing values to the destination System Generated field.
Data Load Options: Specify how your records are inserted into the destination database. The available types are Use Single Record Insert, Bulk Insert with Batch Size, and Bulk Insert with All Records in One Batch.
These types allow you to customize your transfer to balance performance vs. logging needs. Bulk inserts typically result in a better performance (faster transfer for a given number of records), but they also come with less logging, and less ability to undo unwanted inserts should you need to.
Use Single Record Insert: Records are inserted into a destination table one-by-one. Performance is the slowest among the three insert types. However, any errors or warnings during the transfer are displayed to you immediately as the transfer progresses.
Bulk Insert with All Records in One Batch: Typically a quick method of transferring large amounts of data. Keep in mind, however, that any database-specific errors in your transfer will not show until the end of the transfer, when the entire batch is written to the destination database.
Note: Not all database providers support this type of insert.
Bulk Insert with Batch Size: A good tradeoff between performance and logging needs. Records are inserted in batches of the specified size. Typically, larger batch sizes result in better transfer speeds; however, performance gains may diminish with relatively large batch sizes.
Note: Not all database providers support this type of insert.
Note: Bulk insert may not be available if there are certain data types in a destination table. In this case the transfer will proceed as “single insert”.
The SQL Statement Destination object offers extra flexibility over database destination objects by letting you apply custom INSERT or UPDATE SQL code that controls what will be written into the destination table. An example of what an SQL Statement Destination object looks like is shown below.
To configure the properties of an SQL Statement Destination object after it was added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Database Connection window – Allows you to enter the connection information for your SQL Statement, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected database.
SQL Query window: In the SQL Query window, you can enter an SQL expression controlling which fields and records should be written to the destination. The SQL expression should follow standard SQL syntax conventions for the chosen database provider.
For example,
Insert into Orders values (@OrderId, '@OrderName', '@CreatedDtTm')
Notice the @ symbol in front of a field name. This makes the field appear in the field list inside the object box so that the field can be mapped. Fields without the @ symbol in front of them will not show in the list of fields, but they can still receive values according to the logic of the SQL statement itself.
For example,
Insert into Orders (OrderId, OrderName, CreatedDtTm) values (@OrderId, '@OrderName', '2010/01/01')
Note: You can optionally use $ parameters inside your SQL expression.
Astera Data Stack introduces an innovative AI Matching feature which leverages the power of Artificial Intelligence to perform intelligent matching. This feature works based on semantic similarity, ensuring more accurate and comprehensive matching results.
In Astera Data Stack, the AI Match object can be found in the Toolbox and can be used within the scope of the Dataflow.
For our use case, we have a Customers dataset from the sales department as shown below:
We want to replace the values in the Country column of the sales dataset by semantically matching them with Country values from the Customers dataset provided by the marketing team, ensuring both departments follow a unified naming standard.
To get started, let’s drag-and-drop an Excel Workbook Source object and configure it with the customers dataset provided by the sales department.
Next, drag-and-drop the AI Match object from the Toolbox onto the Dataflow and auto-map the fields from the Excel Workbook Source onto the AI Match object.
Once all the fields have been mapped, right-click on the AI Match object and select Properties from the context menu.
This will open the Layout Builder screen, which shows the layout of the incoming dataset. Click Next.
The AIMatch Transformation Properties screen will open. Let’s configure these properties.
File Path: This is where we provide the path of the file on the basis of which we want to perform our semantic matching.
Worksheet: This is where we can define which Excel sheet data to use if there are multiple sheets.
Lookup Field: This is the field based on which we are performing the lookup.
Incoming Field: This lets us define the lookup field from the incoming dataset
For our use case, let’s select the Country Field for both.
Once done, click OK and right-click on the AI Match object to preview its output.
As you can see below, the values in the Country field have been semantically matched and replaced from the file, using AI. We can also see that, since the country Pakistan did not have a matching field in the marketing dataset, it hasn’t been replaced.
Now, let’s drag-and-drop a Database Table Destination object and map the matched data onto it.
Running this Dataflow will write the data to the destination table.
This concludes the working of the AI Match object in Astera Data Stack.
The SQL Statement Lookup object in Astera is used to look up certain values that are mapped to it from a source object. It uses an SQL statement to access a table that contains the lookup values and their corresponding output values. Once the lookup is performed, the SQL Statement Lookup object returns either a single or multiple output fields, depending on the nature of the lookup table. Similarly, the lookup can be performed based on one lookup field or multiple lookup fields. When the incoming values match the lookup values, the output field or fields for those particular records are returned by the SQL Statement Lookup object.
In this use case, we will read data from the Customers table in the Northwind database using a Database Table Source object. This table contains customer information from a fictitious organization and will serve as the source table. Our purpose is to use an SQL Statement Lookup object to find some information about the orders placed by customers. This data is stored in a separate table called Orders, which will serve as the lookup table.
Drag-and-drop the Database Table Source object from Toolbox > Sources > Database Table Source onto the dataflow designer. Configure the object so that it reads data from the Customers table.
To learn how you can configure a Database Table Source object, click here.
Now, drag-and-drop the SQL Statement Lookup Transformation object from Toolbox > Transformations > SQL Statement Lookup onto the dataflow designer, next to the source object.
Right-click on the header of the SQL Statement Lookup object and select Properties from the context menu.
This will open a new window.
Here, we need to configure the properties of the SQL Statement Lookup object.
In the Database Connection window, enter details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test is successful. When the connection has been successfully established, close it by clicking OK, and then click Next.
The next window will present a blank space for you to write an SQL statement. Here, you can enter any valid SELECT statement or stored procedure to read any table from the database that was specified earlier. This table will serve as the lookup table.
In this case, we will be reading data from the Orders table.
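For reference, a minimal statement for this lookup might look like the following. The columns shown are the standard Northwind Orders columns and are only an illustration; any valid SELECT statement over your lookup table will work.
SELECT OrderID, CustomerID, EmployeeID, OrderDate, ShippedDate, Freight FROM Orders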
Enter the SQL statement and click OK. This will take you back to the dataflow designer.
As you can see, the SQL Statement Lookup object has been populated with all the fields present in the Orders table.
The next step is to choose an incoming field or multiple incoming fields from the source object, based on which the lookup action will be performed. This field needs to be mapped to the transformation object.
In this case, we can clearly see that CustomerID is a common element between the two tables. Hence, this field will be used to perform the lookup. It will be mapped from the Database Table Source object to the SQL Statement Lookup object as a new member.
Right-click on the transformation object’s header and select Properties to open the Properties window. Keep clicking Next until you reach the Layout Builder window. Here, you can customize the layout by modifying the existing fields or creating new fields.
Once you are done, click Next.
On the next window, you can define one or more lookup conditions. These conditions will determine what values are returned when the lookup is complete.
You will have to make appropriate selections from three drop-down lists:
Database Element Name: This list contains all the elements present in the SQL Lookup object. Select the element that you wish to use as a lookup field. In this case, it is CustomerID.
Operator: This list contains a set of operators that are used to define the condition. In this case, we will be using the ‘equals to’ operator because the lookup value is supposed to match the incoming value.
Input Element: This list contains the elements that have been mapped to the lookup object. In this case, the only input element available is CustomerID from the Customers table.
Once you are done defining the condition, click Next.
The next window will allow you to choose a Lookup Caching Type. The following options are available:
No Caching: No data will be stored in cache. This option is selected by default.
Static: The lookup values are stored in a cache. Once the cache is created, the lookup object will always query the cache instead of the lookup table. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables another sub-option:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Persistent: Saves the lookup values in a cache file that can be reused for future lookups. When you choose this option, the following sub-options are enabled:
Rebuild Persistent Cache on Next Run: Checking this option will allow the contents of the cache file to be modified after every run.
Cache File Name: Here, you can enter a name for your cache file.
In this case, we will select the No Caching option. Once you are done, click Next.
On the next window, you will see multiple lookup options.
The page provides a set of options for different scenarios that could be faced during a lookup.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value if more than one match is found for a single value in the lookup table. You can select one of the three options that appear in the drop-down list:
Return First: Returns the first matched value.
Return Last: Returns the last value among all matched values.
Return All: Returns all the matched values.
If Value Is Not Found In the Lookup List
If no lookup values are found for a source value, you can choose which of the following messages, if any, is appended to the output:
No Message: The output value will be the same as the input value and no message will appear with it.
Add Error: An error message will appear with the output.
Add Warning: A warning message will appear with the output.
If Value Is Not Found in the Lookup List, Assign Value
If no lookup value is found for a source value, you can assign an output value of your choice.
Assign Source Value: Returns the source value in the output.
Assign Null: Returns null in the output.
This Value: Allows you to enter any value that will be returned in the output.
In this case, we want to look up the details for all of the orders placed by every customer. Hence, we will select Return All from the drop-down list in the Multiple Matches Found Option. This will automatically disable the rest of the options available on the screen.
Once you are done choosing the option, click Next.
On the next window, you can define certain parameters for the SQL Statement Lookup object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
In this case, we will be leaving them blank. Click Next.
On the last window, you will be provided with a text box to add Comments. The General Options in this window have been disabled.
You are now done configuring the SQL Statement Lookup object. Click OK.
Right-click on the SQL Lookup object’s header and select Preview Output.
You will be able to see the following results:
Scroll down the Data Preview window to see the rest of the results.
The SQL Statement Lookup object has successfully returned the details for the orders placed by every customer in the Customers table (Source table) by comparing the CustomerID to its counterpart in the Orders table (lookup table).
This concludes using the SQL Statement Lookup Transformation object in Astera.
The Database Table Destination object in Astera provides the functionality to write data to a database table. This destination option provides a great deal of control over how data is written to a database table with its extended data loading options.
Astera supports a wide range of on-premise and cloud-based databases including SQL Server, Oracle, DB2, Sybase, MySQL, Salesforce, Microsoft Dynamics CRM, and more. Astera delivers highly-optimized implementations for these database connectors including high-performance bulk insert, set-based updates and transaction management. This, combined with Astera’s parallel-processing architecture, delivers industrial-strength performance and scalability.
To add a Database Table Destination object to your dataflow, go to Toolbox > Destinations > Database Table Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Database Table Destination object onto the designer.
The Database Table Destination object is currently empty, that is, it has no fields or mappings. This is because the object has not been configured yet. There are two empty sub-nodes, Input and Output, under the DatabaseDest root node.
To configure the properties of the Database Table Destination object, right-click on the header and select Properties from the context menu.
This will open a new window, Database Connection, in Astera.
First, you will need to select the relevant data provider from the Data Provider drop-down list.
This is where you select the specific database provider you want to connect to. For instance, if you want to write your data to a SQL Server database, select SQL Server from the list. The connection details will vary according to the data provider selected.
Test Connection to make sure that your database connection is successful and click Next.
Now, you need to provide details to configure a connection with the destination database.
Enter your User ID and Password. You can also use the Recently Used drop-down list to connect to a recently-connected database.
The next window is the Pick Table window. Here, you can choose from the following options:
Pick Table: To append data into an existing table.
Create/Replace: To write data to a new table or replace an existing table.
Truncate Table: To overwrite data in an existing table.
Choose the option based on your requirement. In this case, we will select the Create/Replace Table option and create a new table in the database.
Note: We will be creating a new table CustomerDetails.
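Conceptually, the Create/Replace Table option has Astera create the table from the mapped layout; you do not write the statement yourself. As a purely illustrative sketch (the column names and data types here are hypothetical and will actually be derived from your layout), the generated table resembles:
CREATE TABLE CustomerDetails (CustomerID NVARCHAR(5), CompanyName NVARCHAR(40), Country NVARCHAR(15))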
For a database destination object to work, data fields should be mapped to the object so that the mapped data can be written to the destination.
Configure the source object and place it onto the designer next to the Database Table Destination object.
Note: In this case, sample Customers data, coming in from an Excel Workbook Source, will be written to a Database Table Destination.
Map the source object to the destination object. Data mapping from source to the destination can be done in the following ways:
By dragging and dropping the parent node from the source object onto the destination object.
By mapping the output port of the source object onto the input port of the destination object.
By right-clicking on the parent node inside the source object and selecting Write to > Database Table Destination from the context menu.
The fields are now mapped.
The Pick Table window has some other configuration options.
Define Input Ports for Mapping
Single Port: Provides a single input port, so the same write action is applied to all incoming records rather than treating records individually.
Individual Ports for Actions: Provides a separate input port for each action, so records are processed individually according to the action of the port they are mapped to. The actions provided are: Insert, Delete, Update, and Upsert.
Database Options
Use constraint based write: Use this option when the destination layout has constraints that you want to enforce while writing.
Preserve system generated key values: To generate unique values for the selected primary key in the dataset. This option is only available if you assign at least one field in your destination layout as the System Generated field.
Use transaction
Always commit transaction on completion: When you want the whole transaction to be processed regardless of errors.
Rollback if there are any errors: When you want the transaction to be rolled back completely if any errors occur, instead of processing the dataset.
Check field lengths to compare the lengths of the incoming dataset's fields with the lengths defined in the destination layout.
Check for null values in the incoming dataset.
Write null strings as zero length strings: where string values are null, they will be written as zero-length strings.
Write null numeric values as zeros: for numeric data types, null values will be written as zeros.
Disable indexes during load to improve performance for lengthy load operations.
Data Load Options
Bulk insert with batch size when you want the whole dataset to be loaded in batches of the specified size. Typically, larger batch sizes result in better transfer speeds; however, the performance gains diminish at relatively large batch sizes.
Bulk insert with all records in one batch when you want all the records to be loaded into the table in a single batch. In this case, any database-specific error in your transfer won't show until the end of the transfer.
Use single record insert when you want records to be loaded individually. Records are inserted into the destination table one by one. This loading option renders the slowest performance among the three insert types; however, any errors or warnings during the transfer are displayed immediately as the transfer progresses.
Bulk Copy Options
Use Internal Transaction: When specified, each batch of the bulk-copy operation will occur within a transaction.
Fire Triggers: When specified, it will cause the server to fire the insert triggers for rows being inserted into the database.
Keep Nulls: Preserves null values in the destination table regardless of the settings for default values. When not specified, null values are replaced by default values where applicable.
Table Lock: Obtain a bulk update lock for the duration of the bulk copy operation. When not specified, row locks are used.
Check Constraints: Check constraints while data is being inserted. By default, constraints are not checked.
Keep Identity: Preserve source identity values. When not specified, identity values are assigned by the destination.
Default: Use the default values for all options.
Parallel Writing is used when you want to expedite the data loading process by increasing the number of writers for that dataset.
Once you have specified your options on this screen, click Next.
The next window you will see is the Layout Builder. Here, the layout of the database destination file can be modified.
To add a new field to the layout, go to the last, blank row of the layout (under the Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select its properties. A new field will be added to the destination table's layout.
Note: In this example, we will add a new field AccountType to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Here, we are deleting the Fax field from the layout.
Right-click on the highlighted row; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use Move up/Move down buttons.
Note: Find the Move up/Move down icons on the top left of the builder.
For example: To move the Country field right below the Region field, select the row and click the Move up button in the toolbar at the top, to move the field up from the 9th to the 8th position.
Once the object layout is configured, click Next. This will take you to the Config Parameters window where you can further configure and define parameters for the database destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change during the runtime.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next. A General Options window will appear. Here you have the following options:
Comments can be added.
General Options are given, which relate to the processing of records in the destination file.
Clear Incoming Record Messages clears any messages coming in from objects preceding the current object.
Do Not Process Records With Errors prevents erroneous records from being processed further for the output.
Do Not Overwrite Default Values with Nulls makes sure that values are not overwritten with null values in the output.
Click OK.
The DatabaseDest object is now configured according to the settings made in the properties window.
Note: The changes that were made in this case are:
Added a new field, AccountType, to the layout and mapped its value from the Constant Value Transformation object.
Moved the Country field below the Region field.
The Database Table Destination object is now successfully configured, and the destination file can now be created by running the dataflow.
The job can be traced through the Job Progress window once the job starts running.
The Switch Transformation object matches source data against the criteria specified by the user and, wherever the criteria are met, replaces the information in the particular field with the desired output (also specified in the layout). This gives users more control over their data and helps them manage it better.
There are two modes in Switch transformation:
Basic Mode
Enhanced Mode
The Basic Mode in the Switch transformation object matches specific values in the source data and replaces them with the desired output. Enhanced Mode enables users to set lookup criteria by writing expressions, which makes the feature more flexible.
Select your source by dragging the relevant object from the Sources section in the Toolbox on to the dataflow designer and configure the connection by putting in relevant details.
Note: In this example, we are working with an Excel Workbook Source that contains employee information for a fictitious organization, but you can select the source type from a wide range of options provided in Astera.
After setting up the connection and configuring the source file, drag the Switch transformation object from the Toolbox. If the Toolbox is hidden, go to View > Toolbox > Transformation > Switch.
Map the required fields from the source to the Switch transformation object.
Either double-click on the Switch Transformation object to open the Properties window or right-click on the object and go to Properties from the list.
The first window is the Layout Builder window. Here you can manage the fields (add and/or remove the fields) to make your Switch field layout.
Note: Switch Transformation only allows one output field. If you check the output box next to multiple fields, Astera will show a verification error.
After specifying the layout and selecting the relevant output field, click Next. This will take you to the Switch Map Properties window. At this stage, you can select the mode of the Switch transformation and assign the rules in the Case Value and Output Value sections.
Astera will look for the values specified in the Case Value column in the source file and replace them with the corresponding values in the Output Value column.
Note: Switch transformation only allows one default value; if you select multiple default values, Astera will give a verification error.
In this example, the source table stores department information as numeric codes. We will use the Switch transformation object in Basic Mode to replace the stored numeric codes with descriptive values.
Data Preview (Before Switch)
Data Preview (After Switch)
Steps 1-5 are going to remain the same even when working with the Enhanced Mode in Astera.
After you have created the layout in the Layout Builder window in the object’s properties, click Next and go to the Switch Map Properties window and select Enhanced Mode.
Note: Switch transformation is not field-specific in Enhanced Mode; therefore, the option for selecting the Switch Field is disabled.
An organization stores information about employees' salaries and has set criteria for issuing credit cards that depend on each individual's salary. In this scenario, to see which individual is eligible for which perk, define the salary range in the Case Expression field and specify the corresponding output in the Output Expression section (see the screenshot above). To store the information in a separate field, we created a new field (CreditCard) in the Layout Builder and selected it as the Output, as sketched below.
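As a sketch of what such rules might look like (the salary thresholds, card names, and exact expression syntax below are purely illustrative; adjust them to your own criteria):
Case Expression: Salary >= 100000, Output Expression: "Platinum"
Case Expression: Salary >= 50000, Output Expression: "Gold"
Default Output Expression: "Standard"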
Data Preview (Before Switch)
Data Preview (After Switch)
The Database Lookup object in Astera is used to look up values from a source. It uses a database table that contains the lookup values as well as a set of corresponding output values. When the lookup is performed, the object returns either a single output field or multiple output fields, depending on the nature of the lookup table. Similarly, the lookup can be performed based on one lookup field or multiple lookup fields. In each case, the output field or fields are returned from the records in which the lookup values match the incoming values.
In this use case, we have a sample Customers dataset that is stored in a database table. Within this dataset, there is a field that contains the country of residence for each customer. We have another database table that contains all of these countries and their corresponding codes. Our goal is to replace the full country names with codes while writing the customer dataset to an Excel file. To do this, we will use a Database Lookup object.
Drag the relevant source object from the Toolbox and drop it onto the designer. In this case, we will select the Database Table Source object from Toolbox > Sources > Database Table Source and configure it so that it reads data from the Customers dataset.
To learn how you can configure a Database Table Source object, click here.
To preview the data, right-click on the object header and select Preview Output from the context menu. Here, you can see that there is a field that contains each customer's country of residence.
Drag the Database Lookup object from Toolbox > Transformations > Database Lookup and drop it onto the designer.
Right-click on the header of the Database Lookup object and select Properties from the context menu.
This will open a new window on your screen.
Here, you are required to configure the properties for the Database Lookup object.
On the Database Connection window, enter the details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A new window will open, showing whether your test is successful or has ended in an error. When the connection has been successfully established, close it by clicking OK, and then click Next.
The next window is the Database Lookup Map Properties window. Here, you can pick a table from the database that you have connected to.
In this case, we will select the table named CountryCodeLookup. This table contains the code for each country and will serve as the lookup table in our use case.
In the text box provided under the Pick Table option, you can enter a where clause to modify the lookup query. In this case, we will leave it empty.
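If you did want to narrow the lookup, a simple condition on the lookup table's columns can be entered here. For example, a purely illustrative clause such as the following would restrict the lookup to rows that actually contain a country name:
Country IS NOT NULL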
Once you have chosen a table, click Next.
On the next window, you can choose a Lookup Cache Type from the following options:
No Caching: No data will be stored in a cache. This option is selected by default.
Static: The lookup values are stored in a cache. Once the cache is created, the transformation object will always query the cache instead of the lookup table. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables another sub-option:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Dynamic: The lookup values are stored in a temporary cache file, which is deleted once the dataflow has been executed. When you select this option, the following sub-options are enabled:
Fill Cache With All Lookup Values at Start: Fills the cache with all of the lookup values at the start and continues to use this cache for every lookup.
Cache After First Use: Uses the database table for the first lookup and fills the cache right after it is done. This cache is then used for every subsequent lookup. Checking this option enables other sub-options:
Cache Commit Count: Defines the number of records collected per cache chunk before they are committed to the cache.
Cache Key Column: Defines a matching key field to check whether a record already exists in the cache.
Persistent: Saves the lookup values in a cache file that can be reused for future lookups. When you choose this option, the following sub-options are enabled:
Rebuild Persistent Cache on Next Run: Checking this option will allow the contents of the cache file to be modified after every run.
Cache File Name: Here, you can enter a name for your cache file.
In this case, we will select the No Caching option. Once you are done, click Next.
On the Lookup Options window, you can choose between multiple lookup options.
This page provides a set of options for different scenarios that could be faced during a lookup.
If Multiple Values Are Found
Multiple Matches Found Option: This option provides the flexibility to choose the output value if more than one match is found for a single value in the lookup table. You can select one of the three options that appear in the drop-down list:
Return First: Returns the first matched value.
Return Last: Returns the last value among all matched values.
Return All: Returns all matched values.
If Value Is Not Found In the Lookup List
If no lookup values are found for a source value, you can choose which of the following messages, if any, is appended to the output:
No Message: The output value will be the same as the input value and no message will appear with it.
Add Error: An error message will appear with the output.
Add Warning: A warning message will appear with the output.
If Value Is Not Found in the Lookup List, Assign Value
If no lookup value is found for a source value, you can assign an output value of your choice.
Assign Source Value: Returns the source value in the output.
Assign Null: Returns null in the output.
This Value: Allows you to enter any value that will be returned in the output.
In this case, there is only one code for each country. Therefore, we will choose Return First from the drop-down list in the Multiple Matches Found Option. Moreover, we will leave the other options at their default selection i.e. No Message under If Value Is Not Found in the Lookup List, and Assign Null under If Value Is Not Found, Assign Value.
Once you are done choosing the options, click Next.
On the Config Parameters window, you can define certain parameters for the Database Lookup object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
In this case, we will leave them blank. Click Next.
On the last window, which is the General Options window, you will be provided with a text box to add Comments. The General Options in this window have been disabled.
You are now done configuring the Database Lookup object. Click OK to close the configuration window.
Expand the Database Lookup object to view the layout of the lookup table. In this case, it contains two fields, Country and Code. The former contains the full name of each country and the latter contains each country’s code.
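For illustration, the lookup table pairs each country name with its code, for example a row mapping Germany to DE or Mexico to MX; the actual values depend on your CountryCodeLookup table.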
Map the Country field from the Database Table Source object to its counterpart in the Database Lookup object.
Drag an Excel Workbook Destination object from Toolbox > Destinations > Excel Workbook Destination and drop it onto the designer. Configure the object by providing a name and the path to the directory and folder where you want to save your destination file.
To learn how you can configure an Excel Workbook Destination object, click here.
Auto-map the source dataset to the destination object.
Delete the mapping link between the Country fields in the source and destination. To do this, right-click on the mapping link and select Delete from the context menu.
Map the Code field from the Database Lookup object to the Country field in the destination object. This is what the final dataflow should look like:
Right-click on the destination object’s header and select Preview Output from the context menu.
In the Data Preview window, you will see that each country name has been replaced by its corresponding code.
This concludes using the Database Lookup Transformation object in Astera.
The SQL Statement Destination object in Astera offers extra flexibility over other destination objects by providing the option to apply custom INSERT, UPDATE, or DELETE SQL statements to control what is written to the destination table. The object can also be used to call stored procedures. Moreover, you can parameterize your SQL statement using the Parameterize Replacement functionality.
In this article, we will be looking at how you can configure and use the SQL Statement Destination object in Astera.
Before moving on to the actual configuration, we will have to get an SQL Statement Destination object from the Toolbox. To do so, go to Toolbox > Destinations > SQL Statement Destination. In case you are unable to view the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the SQL Statement Destination object onto the designer.
The destination object is currently empty because we are yet to map any data fields to it.
To configure the SQL Statement Destination object, right-click on its header and select Properties from the context menu. Alternatively, you can double-click the header of the destination object to go to its Properties.
A new window will open when you click on Properties from the context menu.
Here, you need to configure the properties for the SQL Statement Destination object.
On the Database Connection window, enter the details for the database you wish to connect to.
Use the Data Provider drop-down list to specify which database provider you wish to connect to. The required credentials will vary according to your chosen provider.
Provide the required credentials. Alternatively, use the Recently Used drop-down list to connect to a recently connected database.
Test Connection to ensure that you have successfully connected to the database. A separate window will appear, showing whether your test is successful. Close it by clicking OK, and then click Next.
The next window will present a blank page for you to enter an appropriate SQL statement for the required outcome. This can consist of an INSERT, UPDATE, or DELETE statement that manipulates the data being written to the database.
The curly brackets on the right side of the window indicate that the use of parameters is supported, which implies that you can replace a regular value with a parameterized value that can be changed during runtime.
In this use-case, we will be inserting new records into an existing table, named TESTTABLE, that has three columns: OrderID, CustomerID, and EmployeeID.
Notice the @ symbol in front of a field name. This makes the field appear in the field list inside the object box so that the field can be mapped. The fields that do not have a @ symbol in front of them will not show in the list of fields, but they can still receive values according to the logic of the SQL statement itself. String fields need to be surrounded by single quotes, whereas Integer fields do not. In this case, CustomerID is a String field, while OrderID and EmployeeID are Integer fields.
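Putting these rules together, the statement for this use case could look like the following. This is only a sketch based on the table and columns described above; adjust it to your own table names and data types.
INSERT INTO TESTTABLE (OrderID, CustomerID, EmployeeID) VALUES (@OrderID, '@CustomerID', @EmployeeID)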
The Database Options given at the bottom of the window provide support for transaction management. Checking the Use Transaction option will enable two other sub-options:
Always commit transaction on completion: Ensures that the job is completed regardless of any erroneous records.
Rollback if there are any errors: Aborts the job in case of one or more erroneous records.
Once you have entered an SQL statement and chosen your desired option, click Next.
On the new Config Parameters window, you can define certain parameters for the SQL Statement Destination object.
These parameters facilitate an easier deployment of flows by excluding hardcoded values and providing a more convenient method of configuration. If left blank, they will assume the default values that were initially assigned to them.
At the end, a General Options window will appear. Here, you are provided with:
A text box to add Comments.
A set of General Options related to the processing of records.
To conclude the configuration, click OK.
For a destination object to work, data fields must be mapped to it from a source. In this case, we will be using an SQL Query Source object to get data from the Orders table in the Northwind database.
Configure the source object and place it next to the SQL Statement Destination object.
Map the required data fields from the source object to the destination object. This can be done in the following ways:
By dragging and dropping the parent node of the source object onto that of the destination object.
By individually dragging and dropping the required fields from the source object onto their respective nodes in the destination object.
To preview the output, right-click on the destination object’s header and select Preview Output from the context menu. In this case, you will see the following result:
You can now write data to the destination table by running the dataflow.
This is how we use the SQL Statement Destination object in Astera.
Astera’s XML/JSON File Destination object provides the functionality to write data to an XML or JSON file when the data is in hierarchical format.
In order to understand how to write to an XML/JSON File Destination object, we will walk through a use case in which we convert flat datasets into a hierarchical dataset and then write the transformed data to an XML file.
Customers and Orders data from database tables will be used as source objects. We will then join them using the TreeJoin Transformation to create a hierarchical dataset.
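As a simplified, purely illustrative sketch of the hierarchical shape produced by the join, each Customer record becomes a parent carrying a collection of its Orders records, for example Customer (CustomerID, CompanyName) containing Orders (OrderID, OrderDate, Freight); the actual field names come from your source tables.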
To get an XML/JSON File Destination object from the Toolbox, go to Toolbox > Destinations > XML/JSON File Destination and drag-and-drop it onto the designer. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
The dragged destination object is empty right now. This is because the object has not been configured yet.
Note: In this case, we will be using the use case discussed earlier, taking the TreeJoin transformation as the source of the data to be written to the XML/JSON File Destination.
A schema file is needed to write data to an XML/JSON File Destination. To create the schema file, right-click on the source object (the TreeJoin transformation in this case); a context menu will appear. Select the option Generate XML Schema for layout.
A new dialog box will open where you will be asked to save the XML schema file. Give the File Name and file path for the schema file and click Save.
The schema file has been created in the specified location. To view this file, go to the file location and open the file in Astera.
The opened file would look like the one below:
To configure the XML/JSON File Destination object, right-click on its header and select Properties from the context-menu.
A window, Destination XML File, will open. Here, we will specify the file locations, i.e., the File Path and Schema File, for the XmlJsonDest object.
The destination file will now be saved in the provided file location. Click OK, and map the destination object to the source object before further configuration.
The XmlJsonDest object will now have the layout of the source object (Treejoin Transformation in this case).
To map the source object to the destination object, the following ways of mapping can be used:
By dragging and dropping the parent node of the source object (TreeJoin node in the TreeJoin table) onto the child node of the destination object (TreeJoin node in the XmlJsonDest object) for automatic mapping.
By manually mapping the source parent node (TreeJoin in the TreeJoin table) by dragging it to the respective destination child node (TreeJoin in the XmlJsonDest object).
The fields are now mapped.
Once the file locations have been specified and the mappings have been done, further properties can be defined.
XML Layout
The next window after the Destination XML File window is the XML Layout window.
This window shows the XML layout for the XmlJsonDest object. The collection nodes for the object can be seen in this window with their fields.
Config Parameters
Click Next, and a window, Config Parameters, will open, which will allow us to further configure and define parameters for the XML/JSON Destination object.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
General Options
Click Next, and a new window, General Options, will open. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The XmlJsonDest object has been successfully configured and the destination file can now be created by running the dataflow.
The Parquet File Destination object allows the user to fetch and map data from the various kinds of sources that Astera supports and write it to Parquet files, a format that can efficiently store large datasets. It can also be used with various transformations.
The Parquet File Destination object in Astera offers compression methods to reduce file size and control memory consumption.
Drag and drop the Parquet File Destination object from the Destinations section of the Toolbox.
Note: For our use case, we already have an Excel Workbook Source object configured.
Right-click on the Parquet File Destination object and select Properties from the context menu.
This will open the Properties screen.
Now, let’s look at the options present on this screen.
File Location
File Path: This is where the file path to the destination file is to be defined. It will be created once the dataflow is executed.
Options
Compression Method - You can select a compression method from this drop-down menu.
Snappy: This method offers high speed and reasonable compression.
Gzip: This method allows the reduction of data size at a fast rate.
Append to File (If Exists): This option will append data to the destination if there is a previously existing file present with data.
Write Numeric Nulls As Zero: Checking this box will write all null values as zero.
Write Null Booleans As False: Checking this box will write all Null Boolean values as false.
Once done, click Next and you will be led to the Layout Builder screen.
Here, the layout is going to be mapped for the destination. It can be built from the incoming data source or can be altered by the user.
We will be using our pre-configured Excel Workbook Source to map the incoming data to the Parquet File Destination object.
Open the Layout Builder again and it will be populated.
Click Next and you will be taken to the Config Parameters screen.
Parameters allow easier deployment of flows by eliminating hardcoded values and provide a dynamic way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
Click Next and you will be led to the General Options screen.
Here, you can add any comments that you wish to add.
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be processed by the object.
Do Not Overwrite Default Values with Nulls: Selecting this option will make sure that values are not overwritten with null values in the output.
Click OK and the Parquet File Destination object will be configured.
This concludes the configuration of the Parquet File Destination object in Astera.
The Excel Workbook Destination object in Astera provides the functionality to write data to Microsoft Excel workbooks. An important thing to note here is that it is not necessary to have Microsoft Excel installed on the machine for the Excel destination object in Astera to work. The feature gives you the option to specify the worksheet and the starting cell where the data write begins.
To get the object from the Toolbox, go to Toolbox > Destinations > Excel Workbook Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Excel Workbook Destination object onto the designer.
The dragged destination object is empty right now. This is because no data fields have been mapped to it yet. In this case, we will use a simple source-to-Excel-destination mapping scenario as an example.
Configure the source object and place it onto the designer next to the Excel Workbook Destination object.
Note: We are using a sample table containing Customers data from an SQL database.
Now map the data fields from the source object to the destination object. Mapping can be done in the following ways:
i. By dragging and dropping the parent node of the source object onto the parent node of the destination object for auto-mapping the layout.
ii. By creating a map from the source parent node to the destination parent node.
iii. By directly writing the fields in the source layout to an Excel Destination through the source context menu of its parent node.
The fields are now mapped.
To configure the Excel Workbook Destination object, right-click on the header, select Properties from the context menu and a dialog box will open.
Provide the File Path. This is where the Excel destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
Check the First Row Contains Header option to write the field names as a header in the first row of the destination file.
The Worksheet field can be used to specify the name of a worksheet for either overwriting the data in an already existing worksheet or adding a new worksheet.
Choose Append to File (If Exists) to append to an existing file or create a new file. Creating a new file will overwrite any existing file.
Check Write to Multiple Files to save the data to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
Once the data reading options have been specified on this screen, click Next.
The next window is the Layout Builder. On this window, the layout of the excel destination file can be modified.
To add a new field to the layout, go to the last, blank row of the layout (under the Name column) and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select its properties. A new field will be added to the destination layout.
Note: Adding a new field (Email) to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Deleting the Fax field from the layout.
Right-click on the highlighted row; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it below or above another field in the layout, select the row and use the Move up/Move down buttons.
Note: Find the Move up/Move down icons on the top left of the builder.
For example: To move the Country field right below the Region field, select the row and use the Move up button to move it from the 9th row to the 8th row.
The row is now moved from the 9th position to the 8th position.
Once the object layout is configured, click Next. A new window, Config Parameters, will appear, which allows you to further configure and define parameters for the Excel destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
A General Options window will appear. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The ExcelDest object is now configured according to the changes that were made in the properties window.
Note: The changes that were made in this case are:
Added a new field, Email, to the layout.
Moved the Country field below the Region field.
The Excel Workbook Destination object is successfully configured, and the destination file can now be created by running the dataflow.
Add a new rule by clicking on the Add switch condition icon.
Now, click on this button to open the Expression Builder.
Learn more about how the Expression Builder works in Astera.
This section talks about the various database write strategies offered within Astera.
The Field Profile feature captures statistics for selected fields from one or several objects. Field Profile is essentially a transformation object as it provides Input and Output ports similar to other transformations. These output ports make it possible to feed the statistics collected to another object on the dataflow.
In this document, we will learn how to create a Field Profile in Astera.
In this case, we have extracted data from a sample Invoices Database Table Source.
We want to collect detailed statistics for some of these fields and write them to a Delimited File Destination. For this purpose, we will use Astera's Field Profile feature.
To get a Field Profile object from the Toolbox, go to Toolbox > Data Profiling > Field Profile. If you are unable to see the toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Field Profile object onto the dataflow designer.
You can see that the dragged Field Profile object contains an Input node and an Output node. The Input node is empty as we have not mapped any fields to it yet.
One-by-one map ShipName, CustomerID, Country, OrderDate, ProductName, UnitPrice, and Quantity from the source object to the Field Profile object’s Input node.
Note: Statistics will be collected only for the fields linked to the Input node of the Field Profile object. This way, you can selectively collect statistics for a subset of fields from the selected field layout.
To configure the Field Profile object, right-click on its header and select Properties from the context menu.
A configuration window will open. The first screen you will see is the Layout Builder. This is where we can create or delete fields and change their names and data types.
Click Next. On the Properties window, specify the Statistics Type from the dropdown list.
The Field Statistics dropdown allows you to select the level of detail of the statistics to collect. Select from the following detail levels:
Basic Statistics: This is the default mode. It captures the most common statistical measures for the field’s data type.
No Statistics: No statistics are captured by the Data Profile.
Detailed Statistics – Case Sensitive Comparison: Additional statistical measures are captured by the Data Profile, for example Mean, Mode, and Median, using case-sensitive comparison for strings.
Detailed Statistics – Case Insensitive Comparison: Additional statistics are captured by the Data Profile, using case insensitive comparison for strings.
In this case, we will select Detailed Statistics – Case Sensitive Comparison.
Click OK.
Right-click on Field Profile object’s header and select Preview Output from the context menu.
A Data Preview window will open and show you the statistics of each mapped field as a record.
Observe that the Field Profile object contains an Output node. Once expanded, you will see various statistical measures as fields with output ports.
We can write these statistical measures to a destination file.
Right-click on the Output node and go to Write To > Delimited File Destination.
A Delimited File Destination object will be added to the dataflow designer with auto-mapped fields from the Output node.
Configure settings for your Delimited File Destination from here.
Once the dataflow is run, a Job Progress window will open and show you the trace of the job.
You can open the delimited file that contains field statistics from the link provided in the Job Progress window.
MongoDB is a document-oriented database in which one collection holds different documents. The MongoDB Destination object in Astera Data Stack provides the functionality to write data to a MongoDB database and control how that data should be written to collections.
While writing data, the number of fields, content, and size of the documents can differ from one document to another. This can easily be catered to by configuring write concerns, which describe the level of acknowledgment requested from MongoDB for write operations.
MongoDB is mainly used for Big Data.
The MongoDB Destination object can be used to map incoming data from a source to the MongoDB server. MongoDB makes it easier for users to store both structured and unstructured data.
For our use case, we already have an XML/JSON Source object configured in a Dataflow.
To start, drag-and-drop the MongoDB Destination object from the Destinations section of the Toolbox onto the Dataflow.
Right-click on the MongoDB Destination object and select Properties from the context menu.
This will open a new window.
User Name: This is where we enter the user name for the MongoDB or local server.
Password: The password for the MongoDB or local server is entered here.
Primary Server Name: The name of the primary cluster is entered here.
Database: This is where we select the database to which we wish to write the data.
Authentication Database: This is the database used for authentication.
Port: The port used to handle incoming and outgoing requests to the server.
Enable Set of Replica: Selecting this checkbox allows the use of a secondary cluster.
Secondary Server Name: The name of the secondary cluster is entered here.
Use TLS: Select this option if the server requires TLS security.
Once your credentials have been filled, test the connection, and click Next.
For our use case, we have input the credentials to use the MongoDB Destination for our local server.
We will now be taken to the MongoDB Pick Collection screen.
For our use case, we will select Create/Replace and add a new Collection.
Database Operations – These operations are used when we are picking an already existing collection.
Insert: To insert a new record into the collection.
Update: To update an existing record in the collection.
Delete: To delete a record from the collection.
Upsert: To insert and update a record in the collection.
Select Fields for matching database records: Selecting from this drop-down menu lets the user select fields based on which to match the records for the selected database operation.
Write Concern Options – Selecting from these options lets the server provide an acknowledgment to the user based on how the process was carried out.
ACKNOWLEDGED: This will return an acknowledgment in the Job trace window if the process stops after getting an error or if the process successfully completes.
UNACKNOWLEDGED: This option will not return an acknowledgment, no matter how the data write is carried out.
MAJORITY: If there are multiple primary and secondary servers, this option will return an acknowledgment once the write has been processed by the majority of the servers.
W1: Selecting this option will return an acknowledgment when the primary server has been processed.
W2: Selecting this option will return an acknowledgment when the primary server and one secondary server have been processed.
W3: Selecting this option will return an acknowledgment when the primary server and two secondary servers have been processed.
Data Load Options – These options let the user define how the data is going to be loaded into the database.
Bulk insert with batch size: This will insert all records divided into batch sizes that the user has defined.
Bulk insert with all records in one batch: This will insert all records in a single batch.
Use single record insert: This option will treat every record individually and insert them one by one.
Select Type of Bulk Insert: Selecting from this drop-down menu lets the user define whether the Bulk Insert will be Ordered or UnOrdered.
In the case of Ordered, data writing will be stopped if an error is encountered between record insertions.
In the case of UnOrdered, data writing will continue despite any errors being encountered.
Click Next and you will be led to the MongoDB Layout screen.
Currently, our layout is empty since we have not mapped any fields to it.
We will map the incoming fields from the XML/JSON Source object to the MongoDB Destination object.
We will then reopen the MongoDB Layout screen.
As you can see below, the entire layout has now been defined.
Click OK and the MongoDB Destination object will be configured.
Select the Start Dataflow option in the main toolbar and the data will be written to the destination.
As you can see in the Job Progress window, the data has been successfully written to the destination.
Note: You can see the acknowledgment in the window since we selected the respective option.
This concludes the configuration of the MongoDB Destination object in Astera.
In addition to the standard logging functionality, Astera provides a special Data Quality Mode option, useful for advanced profiling and debugging. When a dataflow is created/opened in Data Quality Mode, most objects on the dataflow show the Messages node with output ports.
In this document, we will learn how to use the Data Quality Mode in Astera.
In this case, we have a simple dataflow designed to perform a data quality check. It contains customers’ data coming in from an Excel Workbook Source. A Data Quality Rule object is added to validate data for null values and perform warning checks.
If you preview the Customers dataset output at this stage, you will see that some of the records have missing values in the Region and Fax fields.
Data quality rules are set so that records with empty Region values are marked as errors and records with empty Fax values are marked as warnings. A red exclamation sign in the Data Preview window identifies the records that have failed to match the rule and returned an error or a warning as a result.
Now, for instance, we want to collect information regarding the number of errors/warnings in a single record, the error/warning messages attached to these records, and write this information to a destination. For this purpose, we will use Data Quality Mode.
Note: The Record Level Log feature also collects and records this information, but we cannot process it further in the dataflow.
Once Data Quality Mode is activated, a Messages node will be added to all the objects in the dataflow.
The Messages node captures the following statistical information:
TotalCount
ErrorCount
WarningCount
InfoCount
MessagesText
DbAction
Custom
In addition, the FirstItem, LastItem, and Items sub-nodes provide a way to collect quality control data for each of the records. The quality control data includes items such as ElementName, MessageType, and Action, and can be written to a destination object for record-keeping purposes.
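To make these counts concrete for this example: a record that is missing both its Region and Fax values would be expected to show an ErrorCount of 1 (from the Region rule) and a WarningCount of 1 (from the Fax rule), with MessagesText carrying the corresponding error and warning messages.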
Connecting the Messages node's output ports to another object's input ports on the dataflow makes it possible to get both summary statistics and record-level statistics for the dataset, which are useful for analysis and debugging. To do this:
Right-click on the Messages node inside the NoNullValues_Rule object and go to Write to > Delimited File Destination.
A Delimited Destination object is added to the designer with mapped fields.
Configure settings for the Delimited File Destination to save this data. After configuring the settings for the Delimited File Destination object, click on the Start Dataflow icon in the toolbar located at the top of the window.
Right-click on the header of the destination object and select Preview Output from the context menu.
A Data Preview window will open, showing error and warning information.
The Quick Profile option in Astera gives users the ability to preview field statistics of any set-level object in the dataflow at design time. It provides information such as the data type, minimum/maximum values, data count, error count, etc., which can be used to identify and correct data quality issues while designing flows.
The Quick Profile window can be viewed for an entire flow by clicking View > Quick Profile or using the shortcut key Ctrl+Alt+A.
Note: If there are multiple objects in a flow, using Quick Profile from the View tab or the shortcut key will provide information on the columns of the last object in the flow.
To view field statistics at a particular object in the dataflow, right-click on the object’s header and select Quick Profile.
In this case, we are using data from a Loans Excel File Source:
A window like this will slide up from the bottom of the screen:
Quick Profile provides an overview of the content and quality of all the fields, allowing us to determine whether the data is suitable for further transformation. When creating the flows, we may use this functionality at any point to identify any erroneous data that might be affecting the final results.
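As a rough analogy, the statistics Quick Profile reports resemble a per-column summary of the dataset. The sketch below uses pandas and a small made-up sample in place of the Loans source; it is illustrative only, not the mechanism Astera uses.

```python
# Rough analogy of Quick Profile's per-column statistics using pandas.
# The sample data is made up; the real example uses a Loans Excel source.
import pandas as pd

df = pd.DataFrame({
    "LoanAmount": [12000, 8500, None, 20000],
    "Term": [36, 60, 36, 48],
})

profile = pd.DataFrame({
    "DataType": df.dtypes.astype(str),
    "Count": df.count(),          # non-null values per column
    "NullCount": df.isna().sum(),
    "Min": df.min(),
    "Max": df.max(),
})
print(profile)
```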
Note: The data processed by Quick Profile is in-memory and can only be stored permanently if exported to an Excel file. To do that, right-click anywhere in the Quick Profile window and select Export to Excel.
Enter a valid file name and click Save.
This concludes the use of the Quick Profile feature in Astera.
Data Driven Write Strategy is a set-level functionality, which means that the entire incoming dataset must flow through it. It allows a user to stamp a database directive on each record so that when the record reaches its destination, it is loaded according to that directive and the specified write action is performed. You can specify multiple rules within the properties of a Data Driven write strategy object. Each record is evaluated against these rules from top to bottom; once a record matches a rule, it is not evaluated against the remaining rules.
Assume a scenario in which Orders data from a Database Table Source is written to a Database Table Destination. We want to DELETE those records where ShippedDate is before the year 2000 and declare those records where Freight is less than 10 as ERROR. We will use the Data Driven write strategy object to achieve this task.
Drag-and-drop the Data Driven object from Toolbox > Database Write Strategy > Data Driven onto the dataflow designer and map the source data to the Data Driven object.
Right-click on the header of the Data Driven object and select Properties.
A Layout Builder window will open where you can modify your layout. Click Next.
The next window is the Data Driven Write Strategy Conditions window, where you can specify rules to route your data. Click on the fx option to enable the Expression Builder.
Once you select this option, the Expression Builder will be enabled.
Now, specify the following rules in the Expression Builder and select Database Action Type as ERROR for the Freight rule and DELETE for the Date rule.
Year(ShippedDate) < 2000
Freight < 10
There are five Database actions: Insert, Update, Delete, Skip and Error. From these, you can select the action you want to be taken for a certain rule.
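To make the top-to-bottom, first-match-wins evaluation concrete, here is a small illustrative sketch that mirrors the two rules and actions used in this example. It is not Astera's implementation; the field names and the INSERT default are assumptions for illustration.

```python
# Illustrative sketch of top-to-bottom, first-match-wins rule evaluation.
# Mirrors the example rules; the INSERT default is an assumption.
from datetime import datetime

rules = [
    (lambda r: r["ShippedDate"].year < 2000, "DELETE"),  # Date rule
    (lambda r: r["Freight"] < 10, "ERROR"),              # Freight rule
]

def database_action(record, default="INSERT"):
    for condition, action in rules:
        if condition(record):
            return action  # first matching rule wins; later rules are skipped
    return default

print(database_action({"ShippedDate": datetime(1998, 5, 4), "Freight": 32.4}))  # DELETE
print(database_action({"ShippedDate": datetime(2003, 1, 9), "Freight": 4.5}))   # ERROR
```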
Once you are done specifying the rule(s), click OK.
You can now write your data to a Database Table Destination.
It is important to note here that, while working with Database Write Strategies, the Single Port option should be selected. Once you check the Single Port option in the Database Table Destination object, a box will appear in which you have to specify a field for matching database records. In our case, we will select OrderID.
We have successfully configured the settings and built the layout.
Let’s preview the output.
Data Driven output:
You can see that Astera has appended an error message with the records where Freight is less than 10. You can create an error log of these records or load them into a database if you want.
Note: Since the Freight rule is our second rule, only records that did not match the first rule are evaluated against it. This is why the error message appears only with records where ShippedDate falls in the year 2000 or later and Freight is less than 10.
Now, whenever you access the same database table, you will see that the records where ShippedDate is before the year 2000 have been deleted.
This concludes using the Data Driven Write Strategy in Astera.
The Source Diff Processor object is one of the Database Write Strategies offered in Astera. It works like the Database Diff Processor; however, unlike the Database Diff Processor, it is used to perform write actions (such as Insert, Update, and Delete) on file destinations. It stores a snapshot of the data processed in the first run in an incremental transfer (CDC) file, so the next time you run it, only the new or changed records are processed.
We have a sample Employees dataset coming in from an Excel Workbook Source. Initially, we had records of 10 employees, but later on, two more were added to the source dataset. We wish to apply a database write strategy that can read data incrementally from file sources. To achieve this, we will use the Source Diff Processor in Astera.
Drag-and-drop the Source Diff Processor object from Toolbox > Database Write Strategy > Source Diff Processor onto the dataflow designer and map the source data to it.
Right-click on the Source Diff Processor object’s header and select Properties.
A Layout Builder window will open where you can modify your layout. Click Next.
The next window is the Incremental Write Options window.
Here, you have to specify the Record Matching field. This field is used to match and compare the incoming and existing records. We will select EmployeeID as the Record Matching field.
Case Sensitive – Check this option if you want to compare records on a case-sensitive basis.
Sort Input – Check this option if you want to sort the incoming data.
Now, if the incoming dataset has a record with a new EmployeeID, i.e., an ID that is not present in the existing file being compared against the incoming file, Astera will perform the INSERT action.
If the EmployeeID is already present in the existing file, Astera will compare the records against that ID and will perform the UPDATE action on the fields where the information has changed.
If the EmployeeID is present in the existing file but not in the incoming file, it means that the particular record has been deleted. In this case, Astera will perform the DELETE action.
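The following sketch illustrates this diff logic in plain Python: an incoming dataset is compared against the previously stored snapshot on the EmployeeID record-matching field, and each record is assigned an INSERT, UPDATE, or DELETE action. It is a conceptual illustration with made-up sample values, not Astera's internal code.

```python
# Conceptual sketch of the Source Diff logic: compare incoming records with
# the stored snapshot on a record-matching key and derive write actions.
def diff_actions(snapshot, incoming, key="EmployeeID"):
    old = {r[key]: r for r in snapshot}
    new = {r[key]: r for r in incoming}
    actions = []
    for k, record in new.items():
        if k not in old:
            actions.append(("INSERT", record))
        elif record != old[k]:
            actions.append(("UPDATE", record))
    for k, record in old.items():
        if k not in new:
            actions.append(("DELETE", record))
    return actions

snapshot = [{"EmployeeID": 1, "Name": "Davolio"}, {"EmployeeID": 2, "Name": "Fuller"}]
incoming = [{"EmployeeID": 1, "Name": "Davolio"}, {"EmployeeID": 3, "Name": "Leverling"}]
print(diff_actions(snapshot, incoming))
```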
In the Output Options section, you can either select the Single Output option or One Port for Each Action.
The Single Output option is selected if you wish to load your data into the destination without further modifying it on the basis of individual write actions. If you select Single Output, the database action, such as INSERT, UPDATE, SKIP, or ERROR, will be chosen by the database write strategy’s logic rather than being specified by the user. Using Single Output is recommended when a database write strategy is applied.
One Port for Each Action is used when you want to further transform or log your data. If you select One Port for Each Action, you will get separate nodes for each Diff action in the Source Diff Processor’s object.
In this example, we will select Single Output.
The third section in the Incremental Write Options window is the Incremental Transfer Information File Path option. Here, you must specify the file path where you want to store information related to the last run.
Now, if you have worked with Excel Workbook and Database Table Sources in Astera, you would have noticed that the Database Table Source object gives you the option to read incremental changes, whereas no such option is available in Excel or other file source objects. The Source Diff Processor fills this gap, enabling you to read incrementally from file formats such as Excel, Delimited, and Fixed Length.
Click OK.
Now, right-click on the Source Diff Processor object’s header and select Preview Output.
Output preview for Single Output:
Output preview if One Port for Each Action is selected:
You can now write your data to any destination or perform any transformation on the dataset.
This concludes using the Source Diff Processor write strategy in Astera.
A Record Level Log captures the status (Success, Error, Warning, or Skip) for each of the records transferred, and includes snapshots of the source record and the destination record. It also provides additional details, such as error messages.
You can have any number of record level logs on the dataflow. Each record level log will collect the status of the records in the object that it is connected to.
In this document, we will learn how to use the Record Level Log object in Astera.
In this case, we have a simple dataflow that performs a data quality check. It contains a Customers dataset stored in an Excel Workbook Source. A Data Quality Rule is then applied to validate the data for error and warning checks, and finally, the data is written to a Database Table Destination.
If you Preview Output for the Customers dataset, you will see that some of the records for the Region and Fax fields are empty.
A Data Quality Rule is applied to identify null records in the Region field as errors and empty records in the Fax field as warnings. Upon previewing its output, you will see that the records that failed to match the rule have returned an error or a warning, denoted by a red exclamation sign.
If you hover on these warning signs, it will show you the error message.
Now, when we run this dataflow, we want to know which records passed the validation check, which records failed it, which records contain errors, and which records ended with only warnings.
For this purpose, we will use Record Level Log.
To get a Record Level Log object from the Toolbox, go to Toolbox > Data Profiling > Record Level Log. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Record Level Log object onto the dataflow designer.
Another way to get a Record Level Log object is to right-click on the Output node inside the Database Table Destination object and go to Write to > Record Level Log.
You can see that the dragged Record Level Log object is empty right now. This is because we have not mapped any fields to it yet.
Auto-map the fields from the source object to the Log object.
To configure the Log object, right-click on its header and select Properties from the context menu.
A configuration window will open. The first window you will see is the Layout Builder window. This is where we can create or delete fields and change their names and data types.
Click Next, and you will be directed to a Properties window where you can configure settings for creating the log file.
Specify the Profile File path where Astera will save this log file. Log files are saved with .prof extension.
Specify the Log Level Type from the dropdown list.
All – all records (including Success records) are logged
Errors – only error records are logged
Warnings – only warning records are logged
Errors and Warnings – both error and warning records are logged
Off – no logging
In this case, we will select Errors and Warnings as our log type.
Stop Logging After … Records with Errors – allows you to limit excessive logging by setting a cap on the maximum number of errors to be logged. The logging stops after the cap has been reached.
The default value is 1000 errors.
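The sketch below is a purely conceptual illustration of how the Log Level Type and the error cap interact: a record’s status is logged only if it falls within the selected level, and error logging stops once the cap is reached. It is not Astera's logging engine.

```python
# Conceptual illustration of log level filtering combined with the error cap.
LEVELS = {
    "All": {"Success", "Error", "Warning"},
    "Errors": {"Error"},
    "Warnings": {"Warning"},
    "Errors and Warnings": {"Error", "Warning"},
    "Off": set(),
}

def should_log(status, errors_logged, level="Errors and Warnings", max_errors=1000):
    if status == "Error" and errors_logged >= max_errors:
        return False  # cap reached: stop logging further errors
    return status in LEVELS[level]

print(should_log("Warning", errors_logged=0))    # True
print(should_log("Error", errors_logged=1000))   # False once the cap is hit
```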
Click OK.
After configuring the settings for the Log object, click on the Start Dataflow icon in the toolbar located at the top of the window. A Job Progress window will open and show you the trace of the job. Click on the log file link provided in the Job Progress window.
Record Level Log will open in Astera showing you the status of logged records.
Astera stores the error logs in XML format. If you expand each record, it will give you the Field Name to which the error/warning message was attached, the Processing Step of the data check that resulted in the error, as well as the error Message.
If you click on View Record, Astera will show you the field values of the record failing the data quality rule.
Database Diff Processor is one of the four Database Write Strategies offered in Astera. Its purpose is to synchronize the data present in two separate datasets. The object compares the two datasets and performs write actions (insert and update) on the destination table so that both tables contain the same information.
In this use case, we have a sample dataset of customers that is stored in a database table. Currently, this dataset contains 10 records, but two more customer records are to be added later on. Furthermore, updated phone numbers are to be added for two other customers.
We want to write the initial dataset to another database table and ensure that whenever the aforementioned changes are made, they are applied to both tables. To achieve this, we will be using the Database Diff Processor object in Astera.
Drag and drop the Database Table Source object from Toolbox > Sources > Database Table Source onto the dataflow designer. Configure this object so that it reads data from the source table.
To learn how you can configure a Database Table Source object, click here.
Drag-and-drop the Database Diff Processor object from Toolbox > Database Write Strategies > Database Diff Processor onto the dataflow designer. Auto-map all of the elements from the source object to the Database Diff Processor object.
Right-click on the header of the Database Diff Processor object and select Properties.
This will open the Database Connection window. Here, you will have to enter the credentials for the database you want to connect to. Alternatively, you can connect to a recently used database by selecting it from the Recently Used dropdown list.
In this case, we will connect to a test database that contains an empty Customers table.
Once you have entered the required credentials, click Next.
On the next window, you will have to pick a destination table, where the write actions (Insert and Update) will be performed.
In this case, we will pick the empty Customers table that has already been created in this database.
There are a couple of options related to Record Matching at the bottom of the screen:
Select a field to be used for matching records between the source table and the destination table. In this case, we will select CustomerID because it is unique and does not change for each customer.
Check the Case Sensitive option if you want the comparison to be case sensitive. In this case, we will leave this option unchecked.
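As a loose analogy, the combination of a Record Matching field and insert/update write actions behaves like an upsert keyed on that field. The sketch below uses Python's built-in sqlite3 module (SQLite 3.24+ upsert syntax) purely for illustration; Astera's Database Diff Processor is configured entirely through the UI.

```python
# Illustrative analogy only: an upsert keyed on the Record Matching field
# (CustomerID), using Python's built-in sqlite3 module (SQLite 3.24+).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, Phone TEXT)")

source_rows = [("ALFKI", "030-0074321"), ("ANATR", "(5) 555-4729")]
con.executemany(
    "INSERT INTO Customers (CustomerID, Phone) VALUES (?, ?) "
    "ON CONFLICT(CustomerID) DO UPDATE SET Phone = excluded.Phone",
    source_rows,
)
con.commit()
print(con.execute("SELECT * FROM Customers ORDER BY CustomerID").fetchall())
```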
Now that the required table and Record Matching options have been selected, click OK to close the configuration window.
Run the dataflow to write your source data to the destination table. To preview the source data, right-click on the Database Table Source object and select Preview Output.
This is what the source data looks like:
To check whether this data has been written to the destination table, right-click on the Database Diff Processor object and go to Database Table > View Table Data.
The destination table will open in a separate tab within Astera.
The data present in the destination table is the same as that in the source table, showing that we have successfully written the data by running the dataflow.
The source dataset has been updated to include two more customer records. Moreover, two other customers have updated their phone numbers. This is what the source data looks like after the changes have been implemented:
Run the dataflow again to apply these changes to the destination table. This is what the destination table should look like when you open it in Astera after running the dataflow again:
The changes that were made to the source table have automatically been applied to the destination table as well, showing that the Database Diff Processor object has achieved its task.
This concludes using the Database Diff Processor write strategy in Astera.
The Delimited Serializer converts a structured dataset into a single text stream with fields and records separated by delimiters or identified by text qualifiers. Data serialized with delimiters can be shared or stored in a form that allows its original structure to be recovered.
In this document, we will learn how to use the Delimited Serializer to serialize structured data in Astera.
In this case, we are using the Customers table from the Northwind database. You can download this sample data from the following link:
The source file contains customers’ contact information, including their ContactName, Address, PostalCode, Phone, etc., in a structured format.
We want to convert the information contained in multiple fields into a single text stream separated by a delimiter.
To perform this task, we will use the Delimited Serializer object in Astera.
To get the Delimited Serializer object, go to Toolbox > Text Processors > Delimited Serializer and drag-and-drop the object onto the designer.
You can see that the dragged object contains a Text field with an output port and an Input sub-node which is currently empty.
Auto-map source fields by dragging and dropping the top node of the source object, that is Customers, onto the Input node of the transformation object – Delimited Serializer.
Right-click on the object’s header and select Properties.
A configuration window will open as shown below.
Let’s look at the properties on this window.
Field Delimiter – Allows users to specify a delimiter for the source fields from the dropdown list.
Text Qualifier – Allows users to specify qualifiers at the start and end of a text stream. In most cases, the text qualifier encloses an entire record.
Build Operation Type – Contains two options in which a dataset can be serialized:
One Record Per Input – creates a single, delimiter-separated text record for the entire dataset.
One Record Per Transaction – creates one text record per source record, with fields separated by the field delimiter (see the sketch below).
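For intuition, the sketch below shows roughly what the One Record Per Transaction option produces: one delimited, qualified text record per source record. It uses Python's csv module and made-up Customers values; Astera performs the serialization itself.

```python
# Rough illustration of "One Record Per Transaction" serialization using
# Python's csv module; the sample values are made up.
import csv
import io

records = [
    {"ContactName": "Maria Anders", "Phone": "030-0074321"},
    {"ContactName": "Ana Trujillo", "Phone": "(5) 555-4729"},
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
for record in records:
    writer.writerow([record["ContactName"], record["Phone"]])

print(buffer.getvalue())
```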
Let’s leave the properties at their default values and click OK. The data has now been serialized.
To preview the data, right-click on the Delimited Serializer object’s header and select Preview Output from the context menu.
A Data Preview window will open, showing the serialized data with field delimiters.
To store this serialized output, write it to a destination file, or use the data further along in the dataflow.
This concludes using the Delimited Serializer in Astera.
The Delimited Parser in Astera reads and processes a single stream of text in delimited format as input and returns its elements as parsed output. It enables users to transform otherwise semi-structured data into a structured format.
In this document, we will learn to use the Delimited Parser to parse an incoming text stream in Astera.
In this case, we are using the Delimited File Source to extract our source data. You can download this sample data from the following link:
The source file contains customers’ contact information including their name, address, postal code, phone number, etc.
Upon previewing the data, you can see that it is difficult to decipher fields and elements since the data is in a single text stream, with fields and records separated by delimiters. To make sense of this data, each record needs to be parsed into its elements in respective fields.
To do this, we will use the Delimited Parser object.
To get the Delimited Parser object, go to Toolbox > Text Processors > Delimited Parser and drag-and-drop the object onto the designer.
You can see that the dragged object contains a single Text field.
Map the Customer_Info field inside the source object onto the Text field inside the DelimitedParser object.
Right-click on the object’s header and select Properties.
A configuration window will open as shown below.
Let’s look at the properties on this window.
Parse Data Pattern – Contains three patterns in which the dataset can be parsed:
Single Record – Data is parsed into a single record with multiple fields. Users need to provide a field delimiter and, if necessary, a text qualifier.
Multiple Records – Data is parsed into multiple records with one or more fields. Users need to provide a field delimiter as well as a record delimiter.
Field Arrays – Data is parsed into an array of records and fields. Users need to provide a field value delimiter and an array separator.
The source data in this case contains multiple records with many different fields. Therefore, we will set the Parse Data Pattern option to Multiple Records.
Provide a Field Delimiter and a Record Delimiter. The source file also contains a Text Qualifier.
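For intuition, the sketch below parses a small made-up text stream the way the Multiple Records pattern does: a record delimiter splits the stream into records, and the field delimiter and text qualifier split each record into fields. It uses Python's csv module and is illustrative only.

```python
# Rough illustration of the "Multiple Records" parse pattern: split on the
# record delimiter, then on the field delimiter, honoring the text qualifier.
import csv

text = '"Maria Anders","030-0074321"\r\n"Ana Trujillo","(5) 555-4729"\r\n'
for name, phone in csv.reader(text.splitlines(), delimiter=",", quotechar='"'):
    print(name, phone)
```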
Click Next. This is the Layout Builder screen.
Here, write the names of the fields that you want to create.
Click OK. The Delimited Parser object now has new fields in the Output node.
To preview data, right-click on the object’s header and select Preview Output from the context menu.
A Data Preview window will open. Upon expanding the records, you can view the parsed output.
To store this parsed output, you can write it to a destination file or use it for some transformation further in the dataflow.
This concludes using the Delimited Parser in Astera.
The Filter transformation object in Astera is used to filter records based on a pre-defined rule. The records that satisfy the specified rule are retained and can be mapped further in the dataflow, whereas the records that do not satisfy the condition are omitted. The rule or logic used to filter records can either be built from an extensive library of built-in functions, or you can write an expression of your own.
The Filter transformation is quite similar to Data Quality Rules in its functionality. However, unlike Data Quality Rules, which return an error or warning when the rule condition fails while still passing the record downstream, the Filter transformation completely removes any such records. The removed records, as well as their status, will not be accessible to any downstream object on the dataflow, including any type of log.
In this case, we have Customers data for a fictitious organization stored in a Delimited File Source. We want to filter the records in which:
Country = Germany
Contact title = Owner
To extract these records from our source data, we will use the Filter transformation object and write the relevant expression in the Expression Builder to achieve the desired output. We will then write the filtered output to a Fixed Length File Destination.
Retrieve your source data by using the relevant source object from the Sources section in the Toolbox. In this case, our sample Customers data is stored in a delimited file, so we will be using a Delimited File Source object to retrieve this data in the dataflow.
Next, drag and drop the Filter transformation object from the Transformations section of the Toolbox to the designer and map fields from the source object to the Filter transformation object.
Right-click on the Filter transformation object and select Properties.
A Layout builder window will now open. Here you can modify fields and the object layout.
Click Next. This will take you to the Filter Transformation Properties window. Here, you can see the following three sections:
Functions: An extensive library of built-in functions organized in various categories. From here, you can select functions according to your requirement.
Expression: The filter expression will appear in this Expression box. You can either write your own expression or choose from the built-in functions library.
Objects: Contains the object layout. You can double click on any element in the layout to write it in the Expression field.
In this example, we want to extract the records of customers whose ContactTitle is ‘Owner’ and Country is ‘Germany’. For this, we will write the following expression in the Expression box:
Country = “Germany” and ContactTitle = “Owner”
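For reference, the same rule expressed as a plain Python predicate over a couple of made-up records looks like this; it is illustrative only, and in Astera the expression above is all you need.

```python
# Illustrative Python equivalent of the filter expression above.
customers = [
    {"ContactName": "Maria Anders", "ContactTitle": "Owner", "Country": "Germany"},
    {"ContactName": "Ana Trujillo", "ContactTitle": "Sales Agent", "Country": "Mexico"},
]

filtered = [
    c for c in customers
    if c["Country"] == "Germany" and c["ContactTitle"] == "Owner"
]
print(filtered)
```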
After writing the expression, click on the Compile button to check if the expression is correct. If the Compile Status is Successful, the expression is correct. If not, then you need to check and rectify your expression before proceeding to the next window.
Click Next. This will take you to the Config Parameters window where you can further configure and define parameters for the Filter transformation object.
Click Next to proceed to the General Options window. This window consists of options common to most objects in a dataflow, such as:
Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is checked, records with errors will not be outputted by the object. When this option is unchecked, records with errors will be outputted by the object, and a record message will be attached to the record. This record message can then feed into downstream objects in the dataflow, for example a destination file that will capture record messages, or a log that will capture messages, as well as collect statistics.
The Comments input allows you to enter comments associated with this object.
The Filter transformation object has been configured. Click OK.
To preview filtered records, right-click on the Filter transformation object and select Preview Output.
The output would look like this:
You can now write your output to a destination or further transform it by applying some transformation.
In this case, we will write our output to a Fixed Length file destination.
Rename your destination object by double-clicking its header. Here, we will rename it as German_Customers.
This concludes using the Filter transformation object in Astera.
Astera’s Fixed Length File Destination provides the functionality to write data to a Fixed Length File.
To get a Fixed Length File Destination object from the Toolbox, go to Toolbox > Destinations > Fixed Length File Destination. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag-and-drop the Fixed Length File Destination object onto the designer.
The dragged destination object is empty right now. This is because the object has not been configured yet.
For the Fixed Length File Destination object to work, it needs to be provided with a data source.
Configure the source object and place it onto the designer next to the Fixed Length File Destination object.
Note: In this case we will be using a Customers sample table from the Database Table Source to write to the Fixed Length File Destination.
Now map the source object to the destination object. The following ways can be used for mapping:
i. By dragging and dropping the parent nodes onto each other for automatic mapping.
ii. By mapping the source parent node by dragging it to the destination parent node manually.
iii. By directly writing the source layout to a Fixed Length File Destination through the source context menu of its parent node.
The fields are now mapped.
To configure the Fixed Length File Destination object, right-click on its header and select Properties from the context menu; a dialog box will open.
Provide the File Path. This is where the fixed length destination file will be saved.
The dialog box has some other configuration options. Let’s go over these options:
Options:
Check the First Row Contains Header option if you want the first row of the destination file to contain field headers.
Record Delimiter - Allows you to select the delimiter for the records in the file. The available choices are the carriage-return line-feed combination, carriage-return, and line-feed. You can also type a record delimiter of your choice instead of choosing from the available options.
In case the records don’t have a delimiter, the Record Length field is used to specify the character length of a single record (see the sketch after these options).
Encoding - Allows you to choose the encoding scheme for the destination file from a list of choices. The default value is Unicode (UTF-8).
Check Append to File (If Exists) to append to an existing file instead of creating a new one. Creating a new file will overwrite any existing file.
Check the Write to Multiple Files option for the data to be saved to multiple files instead of one single file. This can be done within a single dataflow through the destination object and supporting transformations.
To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File option. Astera provides extensive user interface capabilities for processing hierarchical structures.
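To illustrate what a fixed length layout means in practice, the sketch below pads each field to a declared width and joins records with a record delimiter (or, when no delimiter is used, records are simply concatenated at a fixed Record Length). The widths and field names are assumptions for illustration only.

```python
# Illustration of fixed length output: each field padded/truncated to a
# declared width; widths and field names are assumed for the example.
layout = [("CustomerID", 10), ("ContactName", 25), ("Country", 15)]

def to_fixed_length(record, record_delimiter="\r\n"):
    line = "".join(
        str(record.get(name, "")).ljust(width)[:width] for name, width in layout
    )
    return line + record_delimiter

print(repr(to_fixed_length(
    {"CustomerID": "ALFKI", "ContactName": "Maria Anders", "Country": "Germany"}
)))
```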
Once the data reading options have been specified on this window, click Next.
The next window is the Layout Builder. On this window, the layout of the fixed length destination file can be modified.
To add a new field to the layout, go to the last row of the layout (Name column), which will be blank, and double-click on it; a blinking text cursor will appear. Type in the name of the field to be added and select its properties. A new field will be added to the layout.
Note: Adding a new field (Email) to the layout.
To delete a field from the layout, click on the serial column of the row that is to be deleted. The selected row will be highlighted in blue.
Note: Deleting the Fax field from the layout.
Right-click on the highlighted row; a context menu will appear with the option to Delete.
Selecting Delete will delete the entire row.
The field is now deleted from the layout and won’t appear in the output.
To change the position of any field and move it above or below another field in the layout, select the row and use the Move up/Move down icons.
Note: Find the Move up/Move down icons on the top left of the builder.
For example: To move the Country field right below the Region field, select the row and use the Move up icon to move it from the 9th row to the 8th row.
The row is now moved from the 9th position to the 8th position.
Once the object layout is configured, click Next. A new window, Config Parameters, will appear, which allows you to further configure and define parameters for the fixed length destination file.
Parameters can provide easier deployment of flows by eliminating hardcoded values and provide an easier way of changing multiple configurations with a simple value change.
Note: Parameters left blank will use their default values assigned on the properties page.
A General Options window will appear. On this window:
Comments can be added.
General Options are given, which relate to processing of records in the destination file.
Click OK.
The FixedDest object is now configured according to the changes that were made in the properties window.
Note: The changes that were made in this case are:
Added a new field, Email in the layout.
Moved the Country field below the Region field.
The Fixed Length File Destination object is successfully configured, and the destination file can now be created by running the dataflow.
Upcoming...