Astera 10.3 - Release Notes

Astera 10.3 is here, brimming with excitement and a sea of new features!

With Astera's new AI Automapping feature, field mapping becomes easier, alongside seamless connectivity through providers such as Azure SQL, Google BigQuery, and many more.

Astera’s AI-powered data extraction is a game-changer, while Astera Data Stack's warehousing component introduces AI-Select and more. Unlock the full potential of your APIs with enhanced connectivity options and new features in Astera Data Stack.

Finally, experience the power of the Data Analytics Workbench, and refine your data with ease using Astera Dataprep. Elevate your data game with Astera 10.3!

Dataflows

AI Automapper

In Astera Data Stack, the AI Automapper utilizes semantic relationships to facilitate field mapping. By analyzing the context and meaning of the fields, it intelligently establishes connections and maps them accordingly.

This advanced approach streamlines the process, enhancing accuracy and efficiency in field mapping tasks.

Windows Authentication

Windows authentication is a security feature that allows users to log into a system using their Windows credentials.

Astera Data Stack leverages this authentication method and provides the option to register new users using Windows authentication through its Server Browser interface.

Modern Authentication in SendMail and ClusterSetting

In response to Microsoft Outlook discontinuing support for Basic Authentication and advocating the use of Modern Authentication, the SendMail object in workflow tasks has been updated.

You can view what Microsoft had to say about this here.

The SendMail object now supports Modern Authentication, enabling users to add authentication credentials directly from within the object.

This enhancement simplifies the process and ensures compatibility with the latest authentication standards recommended by Microsoft.

The option has also been added to the Cluster Settings.

Repository Upgrade Utility

Existing Astera customers can easily upgrade to version 10.3.1 by running an executable (.exe) script, which automates the repository update to the latest release.

This streamlined approach enhances the efficiency and effectiveness of the upgrade process, ensuring a smoother transition for users.

Note: This upgrade applies to v10.0 and later. Earlier versions cannot be upgraded and will need a clean repository as part of the upgrade.

Resource Catalog

The Catalog feature in Astera Data Stack is a centralized repository where one can store artifacts and share them with users as per the application.

The artifacts within the catalog are stored in a Catalog table. The security settings of the Resource Catalog let users grant permission only to the people with whom they wish to share catalog artifacts.

For more information on Resource Catalog, please visit the documentation site here.

Connectors

With the release of Astera 10.3, there have been quite a few new developments in the connectors’ domain.

Microsoft SharePoint

Microsoft SharePoint is an enterprise document management and collaboration platform that helps organizations manage critical content. Its enterprise content management capabilities streamline flows and centralize important content to enhance collaboration.

The SharePoint provider is available in the Providers dropdown of the Cloud Storage Connection object.

Simply drag and drop the Cloud Storage Connection object from the toolbox onto the designer.

Right-click on the object header and select Properties from the context menu.

This will open a new window.

Select Microsoft SharePoint Document Library from the Providers dropdown menu.

Note: The SharePoint connection can be accessed in any object where Cloud files are available.

You can learn more about the SharePoint connector here.

Azure SQL

The Azure SQL Database is a fully managed platform as a service (PaaS) engine that handles most database management functions such as upgrading, patching, backups, and monitoring without user involvement.

In Astera Data Stack, users can access Azure SQL Databases using the Database Table Source, Database Table Destination, DB Lookup, SQL Statement Lookup, and Database Write Strategies objects. They can also connect through the Run SQL Script task in a workflow.

Google BigQuery – Preview

Google BigQuery is a serverless, highly scalable data warehouse that comes with a built-in query engine. The query engine can run SQL queries on terabytes of data in a matter of seconds, and on petabytes in minutes.

This kind of performance is achieved without having to manage any infrastructure and without having to create or rebuild indexes.

In Astera Data Stack, users can connect to Google BigQuery as a database source or destination. As a source, both flat and hierarchical data can be read. Only flat data can be written to the Google BigQuery destination.

Azure Data Lake Gen 2 Storage – Preview

Azure Data Lake Gen 2 is a cloud-based big data storage and analytics solution provided by Microsoft Azure. It offers scalable and cost-effective storage for structured, semi-structured, and unstructured data.

The Azure Data Lake Gen 2 provider is available within the Cloud Storage Connection object in Astera.

Cloud Browser Usability

Improvements have been made to the user-friendliness of the Cloud Browser.

Changing the Cloud Connection and browsing cloud folders have been made easier in Astera 10.3. Cloud Browser’s functionality with SharePoint has also been improved.

File System Items Source

In the File System Items Source object, multiple filters are supported for both local and cloud connections.

Report Model

In Astera 10.3, the overall user interface of Astera has been improved and revamped. There are quite a lot of new features being introduced in this version. Let us take a look at them.

Pages to Read Option for PDFs

To provide users with greater control over their report editor, we have introduced a new feature called “Pages to Read.” This feature enables users to filter and selectively display specific PDF pages within the report editor, ensuring a focused and efficient document analysis experience.

Auto Create Table (Preview)

The Auto Create Table option empowers users to select and create tables within the document seamlessly. With this addition, users can automate the table creation process, saving valuable time and effort.

Worksheet Parameterization

The Worksheet Parameterization feature lets users extract data from specific worksheets in an Excel file by looping over them. The configuration parameter screen now displays an option to define the desired worksheet, and this option can be parameterized using variables or a configuration file.

When parameterized, the chosen worksheet overrides the default one set in the source file. This allows the Report Source to iterate systematically through each worksheet within the Excel files and extract data from the worksheet specified in the configuration parameters.

This comprehensive feature enhances data retrieval processes, simplifying and optimizing data handling from multiple worksheets in Excel.

Report Model Path Parameterization

The Report Model Path Parameterization feature introduces enhanced flexibility to dataflows by enabling users to specify a customizable Report Model path in addition to the Report Source path. This crucial enhancement allows for runtime parameterization, granting the ability to dynamically change the Report Model or apply different templates to various report sources.

As a result, automating data extraction from Report Models becomes significantly easier, streamlining the overall data processing and analysis workflows. This feature empowers users with increased control and adaptability, facilitating more efficient and versatile data-driven decision-making processes.

AI-Powered Data Extraction

Astera now uses AI to recommend report model templates, allowing you to automatically generate models for multiple source files at once. By specifying the layout and document type, Astera recommends the most suitable model templates, saving you valuable time and energy when building your data extraction processes.

With this new feature, you can streamline your workflow and reduce the manual effort of building report models for data extraction.

To view more information on AI-Powered Data Extraction, click here.

Extracting data from Scanned PDFs via OCR

Astera now provides the functionality to extract data from PDFs that contain scanned documents using Optical Character Recognition.

When provided such a PDF, the tool recognizes it as an image PDF. However, the Use OCR option must be enabled manually by the users first. This option has now made scanned documents available for extraction to users, minimizing the effort of manual data entry from such documents. Users can select the Resolution for OCR, allowing them to get the best result for their documents.

Edit Mode

Noise elements in a scanned document can cause erroneous data to be extracted. To ensure correct data extraction, an Edit Mode is available for users to clean and tweak the extracted data.

Edit Mode allows you to deal with the data as a text file and make changes accordingly.

To learn more about loading PDFs with OCR, click here.

Data Field Verification

Report Models can now verify whether data fields have been captured properly for all data instances by checking for non-blank characters adjacent to instances of the data fields. This option gives users a one-click check for the data fields they have created.

Additionally, to allow users better visibility of the erroneous fields, navigation between instances of the data field is also provided along with an option to auto-adjust field lengths for all data fields within the selected data region.

To learn more about Data field verification, click here.
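The adjacency check described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not Astera's implementation: the function name, the fixed-width line input, and the zero-based column positions are all hypothetical.

```python
def find_adjacent_characters(lines, start, length):
    """Flag field instances that have a non-blank character right next to them.

    lines  : text lines of the data region (hypothetical input shape)
    start  : zero-based column where the field begins
    length : field width in characters
    Returns a list of (line_index, side) tuples, side being "left" or "right".
    """
    end = start + length  # first column after the field
    issues = []
    for index, line in enumerate(lines):
        if not line[start:end].strip():
            continue  # nothing captured on this line, so nothing to verify
        left = line[start - 1] if 0 < start <= len(line) else " "
        right = line[end] if end < len(line) else " "
        if not left.isspace():
            issues.append((index, "left"))
        if not right.isspace():
            issues.append((index, "right"))
    return issues


# Hypothetical two-line report; the name field starts at column 9, width 11.
report = [
    "INV-001  " + "Widget".ljust(12) + "12.50",
    "INV-002  " + "Extra-Long Widget".ljust(12) + " 7.25",
]
print(find_adjacent_characters(report, start=9, length=11))
```

In this sample, the second line is flagged on the right because the name runs past the field width, which is the kind of instance an auto-adjust of field lengths would address.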

Pattern Box Context Menu

Users can now access wildcards and other additional features for patterns in a report model through a context menu by right-clicking on the pattern box.

Append Region and Data Region Interchangeability

Now, if need be, users can change a data region to an append region and vice versa within the Model Layout panel.

This allows users flexibility in changing the model layout as they are creating their extraction template. To learn more about this feature, click here.

Dataprep

Astera Dataprep is a dynamic platform designed for rigorous data cleansing, transformation, and preparation activities. With its user-oriented and preview-focused interface, it offers significant functionality and visibility to streamline the preparation process.

The system allows for quick operations by letting users switch seamlessly between scripting and point-and-click methods. Serving as a crucial intersection of data engineering and data science, Astera Dataprep is an invaluable tool in any data-driven operation.

Astera Dataprep is a data manipulation tool that offers interactive data correction, ATL scripting with auto-completion, and UI-ATL synchronization. It provides smooth navigation, action history tracking, and comprehensive data quality assurance.

The tool promotes scripting efficiency and reusability with template scripts and supports real-time visual insights for data analysis. It enables rich data transformations, including resolving cardinalities and merging datasets. Astera Dataprep streamlines data preparation, enhances productivity, and ensures data accuracy and integrity.

ATL Commands and ATL Intellisense

Dataprep offers 60+ ATL commands to enable a comprehensive set of data preparation strategies. ATL is a smart scripting language integrated with IntelliSense, where data engineers benefit from script auto-completion, reducing the need for constant reference to documentation. This proprietary language can generate code snippets, enabling users to fill in the required fields effortlessly.

This streamlined approach enhances productivity, enabling data engineers to focus on the specific requirements of their data preparation tasks without the burden of repetitive syntax or command structure.

ATL Editor

The ATL Editor hosts all ATL commands and command-related operations. It is a multipurpose artifact that also serves as a navigation browser for the preparation process.

Data Source Browser

The Data Source Browser, while not a new feature, is a vital part of Dataprep. It hosts all file sources, catalog sources, cloud sources, and project sources to be imported into the Dataprep artifact.

Note: While the Data Source Browser is essential in Dataprep, it is not specific to it.

Dataprep Source and Transformation

Dataprep scripts are reusable and hence can be used as a source as well as a transformation in other artifacts such as dataflow, workflow, and analytics workbench.

Grid View

The grid view hosts a preview-centric interactive grid that automatically updates in real-time to display the transformed data upon each transformation/modification. It is a dynamic grid that provides instant feedback on data quality.

Dataprep Profile Browser

The Dataprep Profile Browser is a side window providing a comprehensive view of the data with graphs, charts, and field-level profile tables. It keeps a check on data health and highlights the presence of invalid entries, missing values, duplicates, etc.

Operations via drag and drop from Data Source Browser

Within Dataprep, this is a borderless and headerless 2x2 grid that enhances the experience of reading, joining, union, and lookup operations on data.

Dataprep Navigation

Users can navigate smoothly using point-and-click actions in the ATL editor. This includes action history tracking, allowing users to review and backtrack changes made during the data preparation process for transparency, editing, and control.

Data Models

Astera Data Stack's Data Model component has also introduced a handful of new features for the Astera 10.3 release.

AI Select

The AI Select feature in Astera Data Stack assists users in identifying potential Fact and Dimension candidates from their selected entities. To use it, users first select Build Dimensional Model from the main menu bar.

In cases where users are unsure about classifying entities as Facts or Dimensions, this feature leverages AI capabilities to automatically determine the appropriate classification, streamlining the data modeling process.

Infer Relationships with AI

Astera Data Stack incorporates an advanced AI-powered feature enabling users to deduce relationships between entities. This capability extends to recognizing self-referencing relationships as well as associations between the fields of different entities.

By leveraging AI algorithms, Astera Data Stack facilitates automated inference of intricate entity relationships for efficient data modeling.

API Flow

With the release of Astera 10.3, the area of APIs also brings a plethora of new features and improvements with it. Astera is a robust platform that enables seamless integration and efficient management of APIs.

It provides a comprehensive set of tools to create, publish, secure, and monitor APIs.

For more information on Astera, please visit the documentation site here.

Import Custom CAPIs

The API browser provides a convenient option to import pre-built and pre-tested CAPI connectors directly from Astera’s GitHub repository. These connectors are carefully curated and include a comprehensive list of endpoints that have been thoroughly tested and configured for seamless consumer use.

This option allows users to easily access and integrate these connectors into their projects, ensuring reliable and efficient connectivity with the associated APIs.

Developer Portal (Beta)

A developer portal, also known as an API portal or API developer portal, is a website or platform that serves as a central hub for developers who are interested in consuming or integrating with APIs. It provides documentation and support for developers to understand, explore, and use APIs effectively.

Astera 10.3 brings a beta release of this portal.

Multipart/form-data support for designing file transfer APIs

Multipart/form-data is a MIME (Multipurpose Internet Mail Extensions) media type used for sending binary data or files along with other form fields in HTTP requests.

The Request Publish object now supports this format type, allowing users to design APIs that upload and download files.
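To show what a multipart/form-data payload looks like on the wire, here is a minimal Python sketch that encodes text fields and files into one request body. It uses only the standard library; the function name and the field names are hypothetical, and it says nothing about how Astera builds or parses such requests internally.

```python
import uuid


def build_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields : dict of name -> str
    files  : dict of name -> (filename, content_bytes, content_type)
    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex  # random delimiter between parts
    parts = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
        parts.append(part.encode("utf-8"))
    for name, (filename, content, content_type) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n"
            "\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


# Hypothetical upload: one form field plus one small CSV file.
header_value, body = build_multipart(
    {"description": "Quarterly report"},
    {"file": ("report.csv", b"id,total\n1,42\n", "text/csv")},
)
print(header_value)
print(body.decode("utf-8"))
```

The returned header value goes into the request's Content-Type header so the receiver knows which boundary separates the parts.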

Certificate Store: generate, import, or export certificates

Client certificates are digital certificates that are used by clients (such as web browsers or client applications) to authenticate themselves to a server during a secure communication process, typically over HTTPS (HTTP over SSL/TLS).

These generate, import, and export options allow us to manage client certificates effectively and integrate them into our security infrastructure.

Show Swagger UI

We have integrated our tool with the Swagger UI component, allowing us to display the Swagger files of deployed APIs in a well-formatted and user-friendly manner.

This integration provides an enhanced user interface and experience for viewing and interacting with the API documentation.

Testflow generation enhancements, from the server browser

We have introduced the option to generate the test flows from the server browser. The test flow can now be generated after the API(s) deployment.

We can create a test flow either for a single API or for the entire group of APIs.

Multipart (API Consumption)

The multipart format is a way of structuring data in an API request or response that allows multiple files or data types to be transmitted together as a single unit.

On the consumption side in Astera, the multipart format can be used to simplify the process of uploading or downloading large files, or to send a single request that contains both file data and metadata.

Now, you can consume APIs in Astera Data Stack using an API client that supports multipart content.

AWS Signature Authentication

AWS Signature authentication is the process of verifying the authenticity of requests made to Amazon Web Services (AWS) using the AWS Signature method.

This authentication process involves calculating a digital signature for each request using the requester’s access key and secret access key, along with details about the request being made. AWS verifies the signature against the user’s access credentials and grants access to the requested resources if the signature is valid.

The AWS Signature authentication method ensures that requests are securely transmitted and that only authorized users can access AWS resources.
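The signature calculation described above can be sketched with Python's standard library. This is a simplified outline of AWS Signature Version 4, not Astera's implementation: the function names are hypothetical, the secret key is a placeholder, and the caller is assumed to have already built the canonical request string.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service):
    """Derive the per-day, per-region, per-service signing key."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_request(secret_key, amz_date, region, service, canonical_request):
    """Return the hex signature for an already canonicalized request."""
    date_stamp = amz_date[:8]  # YYYYMMDD part of the timestamp
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    signing_key = derive_signing_key(secret_key, date_stamp, region, service)
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical GET request with an empty body; all values are placeholders.
canonical_request = "\n".join([
    "GET",                      # HTTP method
    "/",                        # canonical URI
    "",                         # canonical query string
    "host:example.amazonaws.com\nx-amz-date:20230801T000000Z\n",  # canonical headers
    "host;x-amz-date",          # signed headers
    hashlib.sha256(b"").hexdigest(),  # hash of the empty payload
])
print(sign_request("PLACEHOLDER-SECRET", "20230801T000000Z", "us-east-1", "s3", canonical_request))
```

The resulting signature is sent in the Authorization header together with the access key ID and the credential scope, so the secret access key itself never travels with the request.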

NTLM Authentication

NTLM (NT LAN Manager) authentication is a Microsoft proprietary authentication protocol used to authenticate users in a Windows-based network.

It provides secure authentication by using a challenge-response mechanism, where the server sends a challenge to the client, and the client sends a response that is encrypted using a hash of the user’s password.

NTLM authentication is used in various Microsoft products, including Windows, Internet Explorer, and Microsoft Office.
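The challenge-response idea can be illustrated with a generic Python sketch. It is deliberately not the NTLM algorithm: SHA-256 and HMAC stand in for NTLM's own hash and response computations, and all function names are hypothetical. The point is only that the password never crosses the network, just a response derived from its hash and a one-time challenge.

```python
import hashlib
import hmac
import secrets


def password_hash(password: str) -> bytes:
    # Stand-in hash; NTLM derives its password hash differently.
    return hashlib.sha256(password.encode("utf-8")).digest()


def make_challenge() -> bytes:
    # Server side: a fresh random challenge for every login attempt.
    return secrets.token_bytes(16)


def client_response(password: str, challenge: bytes) -> bytes:
    # Client side: key the response with the password hash.
    return hmac.new(password_hash(password), challenge, hashlib.sha256).digest()


def server_verify(stored_hash: bytes, challenge: bytes, response: bytes) -> bool:
    # Server side: recompute the expected response and compare in constant time.
    expected = hmac.new(stored_hash, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


stored = password_hash("correct horse")       # what the server keeps on file
challenge = make_challenge()                   # sent to the client
answer = client_response("correct horse", challenge)
print(server_verify(stored, challenge, answer))
```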

Raw preview request/response

A raw preview request and response feature allows API developers to view the exact request and response payloads being exchanged between clients and servers in their APIs.

This feature provides a detailed look at the headers, body, and metadata of the HTTP request and response, which can help API developers debug issues, test APIs, and optimize performance. By using raw preview request and response capabilities, API developers can gain a deeper understanding of how their APIs are being used and troubleshoot issues quickly and efficiently.

Copy CURL Command

Curl is a command-line tool that can be used to send HTTP requests to APIs and retrieve the respective responses.

It allows API developers and testers to easily interact with APIs and perform tasks such as testing, debugging, and troubleshooting. Curl supports various HTTP methods such as GET, POST, PUT, and DELETE, and can handle HTTP headers, cookies, and authentication.

It is a simple yet powerful tool that is widely used in API development and management.
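As an illustration of how a request maps to a curl command line, the following Python sketch assembles one from a method, URL, headers, and body. The helper name and the sample endpoint are hypothetical, and the output format of Astera's own Copy CURL Command option may differ.

```python
import shlex


def to_curl(method, url, headers=None, body=None):
    """Build a shell-safe curl command for the given request parts."""
    parts = ["curl", "-X", method.upper()]
    for name, value in (headers or {}).items():
        parts += ["-H", f"{name}: {value}"]  # one -H flag per header
    if body is not None:
        parts += ["--data", body]            # request payload
    parts.append(url)
    # shlex.quote protects spaces and quotes so the command can be pasted as-is.
    return " ".join(shlex.quote(part) for part in parts)


print(to_curl(
    "post",
    "https://api.example.com/orders",
    headers={"Content-Type": "application/json"},
    body='{"item": "widget", "qty": 2}',
))
```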

API Logging

API logging is the process of keeping track of how an application programming interface (API) is being used.

It helps to understand how often the API is being used, how long each request takes, and any errors that occur. API logging can be used for troubleshooting, monitoring performance, and identifying security threats.

The logs can be stored locally or in a cloud-based system, where they can be analyzed to provide insights.
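The kind of information an API log captures (which call ran, how long it took, and whether it failed) can be sketched with a small Python decorator. This is a generic illustration using the standard logging module; the logger name, the decorator, and the sample handler are hypothetical and unrelated to Astera's logging internals.

```python
import functools
import logging
import time

logger = logging.getLogger("api")  # hypothetical logger name


def log_api_call(func):
    """Log duration and outcome of every call to the wrapped handler."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # logger.exception records the traceback along with the message.
            logger.exception("%s failed after %.1f ms", func.__name__, elapsed_ms)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s succeeded in %.1f ms", func.__name__, elapsed_ms)
        return result
    return wrapper


@log_api_call
def get_order(order_id):
    # Hypothetical handler standing in for a deployed API endpoint.
    return {"id": order_id, "status": "shipped"}


logging.basicConfig(level=logging.INFO)
get_order(17)
```

Pointing the logger at a file handler or a remote handler instead of the console is what decides whether the logs end up stored locally or in a cloud-based system.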

Support

Support has been added for the following feature in Astera Data Stack:

  • XML/SOAP APIs in the API Client

Data Analytics Workbench

The Data Analytics component of the Astera Data Stack brings us the Data Analytics Workbench, alongside quite a lot of new features. From Linear Regression to Distribution Plot objects, this component has quite a lot to offer to the users.

Analytics Workbench

The Analytics Workbench is a powerful tool for designing and visualizing data science models and analytical graphs. With its intuitive drag-and-drop interface, users can easily construct complex analytical workflows and explore data patterns.

This artifact streamlines the process, enabling efficient data analysis and visualization in a user-friendly manner.

SMD Dashboard Designer

The SMD designer offers a grid-based interface that facilitates the design and visualization of models using drag-and-drop functionality. Users can easily arrange components, connect data flows, and configure properties to create sophisticated models.

This intuitive interface enhances the ease and efficiency of model design and visualization tasks.

Linear Regression

The Linear Regression object empowers users to establish and model the correlation between a quantitative response variable and one or multiple independent variables. It achieves this by fitting a linear equation to the provided data.

Linear regression is a diagnostic and predictive analytics technique that offers insights into data relationships and makes future predictions based on observed patterns.

In Analytics Workbench, users have the flexibility to choose between four model estimation types:

  1. Ordinary Least Square

  2. Weighted Least Square

  3. Generalized Least Square

  4. Penalized Least Square
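To make "fitting a linear equation to the provided data" concrete, here is a minimal Python sketch of ordinary least squares for a single independent variable. It is a textbook closed-form calculation with a hypothetical function name, not the Linear Regression object's implementation, and it does not cover the weighted, generalized, or penalized variants.

```python
def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: advertising spend vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 15.0, 19.0, 22.0, 24.0]
intercept, slope = fit_ols(spend, units)
print(f"units = {intercept:.2f} + {slope:.2f} * spend")
print("prediction for spend = 6:", intercept + slope * 6.0)
```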

Decision Tree

A Decision Tree is a supervised learning algorithm utilized for classifying a dependent variable based on features within a dataset. In the Analytics Workbench, the Decision Tree object offers users a range of options such as test-train split configurations, null value handling, scaling and normalization techniques, decision tree criteria and splitting methods, as well as pruning strategies.

These features enhance the flexibility and customization of decision tree-based classification tasks.
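The release notes do not list which splitting criteria the Decision Tree object offers. As an assumption for illustration only, the sketch below computes two widely used ones, Gini impurity and entropy, for the class labels in a node; the function names are hypothetical.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means a pure node."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits; 0 means a pure node."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


node = ["churn", "churn", "stay", "stay", "stay", "stay"]  # hypothetical labels
print(gini_impurity(node), entropy(node))
```

A tree-building algorithm evaluates candidate splits and prefers the one whose child nodes have the lowest weighted impurity.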

Generalized Linear Model

The Generalized Linear Model object enables users to define a flexible generalization of linear regression that allows for response variables with error distribution models other than a normal distribution.

There are two components of the Generalized Linear Model (GLM).

  1. Family Parameter

  • Gaussian

  • Binomial

  • Poisson

  • Gamma

  2. Link Function

  • Identity

  • Log

  • Probit

  • Logit

  • Square Root

  • Inverse

A Generalized Linear Model uses a specific combination of the link functions and family parameters for a suitable fit to the data.
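The role of the link function can be sketched in Python: it maps the mean of the response to the linear predictor, and its inverse maps a linear predictor back to a predicted mean. The dictionary below covers the six links listed above using their standard textbook definitions; the names `LINKS` and `predict_mean` are hypothetical, and the sketch does no model fitting.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used by the probit link

# Each entry is (link, inverse_link).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "sqrt": (math.sqrt, lambda eta: eta ** 2),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),
}


def predict_mean(link, coefficients, features):
    """Apply the inverse link to the linear predictor sum(coef * feature)."""
    eta = sum(c * x for c, x in zip(coefficients, features))
    _, inverse_link = LINKS[link]
    return inverse_link(eta)


# Hypothetical coefficients; the first feature is the constant 1 for the intercept.
print(predict_mean("logit", [-1.0, 0.8], [1.0, 2.5]))  # a probability (Binomial family)
print(predict_mean("log", [0.2, 0.1], [1.0, 3.0]))     # a positive rate (Poisson family)
```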

Pre-Analytics Testing

The Pre-Analytics Testing object wraps several statistical tests that a user performs on the data to determine an accurate statistical model to fit the source data.

The object presents users with established parametric and non-parametric tests to evaluate whether the data meets the assumptions of a candidate model.

The Pre-Analytics Testing object hosts the following tests and graphs on each screen:

  1. Heteroscedasticity scatter plot

  2. Multicollinearity bar chart

  3. Outlier Detection box plot

  4. Normality Detection histogram
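As one example of such a check, box plots conventionally mark points lying more than 1.5 interquartile ranges beyond the quartiles as outliers. The Python sketch below applies that convention; whether the Outlier Detection screen uses the same rule is an assumption, and the function name is hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


order_totals = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # hypothetical sample
print(iqr_outliers(order_totals))
```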

Correlation Analysis

The Correlation Analysis object enables users to compute Covariance and different types of Correlation, such as Heterogeneous Correlation, Partial Correlation, and Correlation - Significance Level, between data fields.

The strength of the association is measured by computing correlation coefficients. In Analytics Workbench, users have the option to compute different types of correlation coefficients.
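For reference, here is a minimal Python sketch of sample covariance, the Pearson correlation coefficient, and first-order partial correlation, using their standard formulas. Pearson is not named in the text above and is shown as the most common coefficient; the function names are hypothetical and this is not the Correlation Analysis object's implementation.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def partial_correlation(xs, ys, zs):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical fields from a dataset.
price = [10.0, 12.0, 15.0, 19.0, 25.0]
demand = [95.0, 90.0, 80.0, 72.0, 60.0]
season = [1.0, 2.0, 1.0, 2.0, 1.0]
print(covariance(price, demand))
print(pearson(price, demand))
print(partial_correlation(price, demand, season))
```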

Contingency Table

A contingency table, also known as a cross-tabulation or a two-way frequency table, is a table used to summarize the relationship between two categorical variables. It displays the distribution of the data by counting the number of occurrences of each combination of categories.

In the Analytics Workbench, we support multiple contingency types:

  1. Frequency

  2. Probability

  3. Percentage
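The three contingency types can be sketched in Python by counting category pairs and then rescaling the counts. One assumption to note: probabilities and percentages here are taken over the grand total, whereas a tool could equally normalize by row or by column. The function name is hypothetical.

```python
from collections import Counter


def contingency(rows, cols, kind="frequency"):
    """Cross-tabulate two categorical fields.

    Returns a dict keyed by (row_category, col_category).
    kind is one of "frequency", "probability", or "percentage".
    """
    counts = Counter(zip(rows, cols))
    total = sum(counts.values())
    if kind == "frequency":
        return dict(counts)
    if kind == "probability":
        return {pair: count / total for pair, count in counts.items()}
    if kind == "percentage":
        return {pair: 100 * count / total for pair, count in counts.items()}
    raise ValueError(f"unknown contingency type: {kind}")


# Hypothetical survey data.
region = ["North", "North", "South", "South", "South"]
plan = ["Basic", "Pro", "Basic", "Basic", "Pro"]
for kind in ("frequency", "probability", "percentage"):
    print(kind, contingency(region, plan, kind))
```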

Distributional Plots

The Distributional Plots object allows users to visualize categorical data variables using mainstream plots such as:

  • Bar charts

  • Pie charts

  • Histograms

  • Frequency polygons

  • Spike plots

The Distributional Plots object has a drill-down functionality and an interactive interface with several configuration options. It is used to visualize a general profile of the user’s data.

Basic Plots

The Basic Plots object allows a user to understand and analyze their data/transformations through visual graphs such as:

  • Line charts

  • Scatter plots

They provide interactive visuals with growing capabilities and features that enable an in-depth understanding of the nature of a user’s data and its trends.

Predictive Analysis

The Predictive Analysis object helps users predict the behavior of the dependent variable on a given test dataset. To make predictions, the object requires information about the analytical model fitted on a particular training dataset.

Once the champion model is selected in the Analytics Workbench, the workbench can be used in a dataflow through the Predictive Analysis object to make predictions.

Undo/Redo manager

The Undo/Redo manager allows users to undo/redo any actions that they have performed in the analytics workbench.

Astera Install Manager

The Install Manager installs the dependencies required for running auto-generate layout (AGL) and optical character recognition (OCR) on your system/machine.

Auto-generate layout allows for the generation of an extraction template at the click of a button. With optical character recognition, scanned PDFs can be processed by Astera to extract data.

AGL was introduced in Astera 10.0, and OCR has been introduced in Astera 10.2.

When you install the client and the server, you’ll see two install managers (one for the client and one for the server) installed.

You can run the Install Manager directly, or open the client under an Admin account and go to Tools > Run Install Manager to install the dependencies.

Note: If the client and server are on the same machine, then you need to run only one Install Manager out of the two (client and server). However, if the client and server are on different machines, then you’ll have to run the install manager for the client and the server on their respective machines.

You can learn more about the install manager and its setup here.

UI Fixes and Improvements

Project

  • Project refresh has been greatly improved.

  • Project loading times have been improved.

  • A new UI has been implemented for the ‘Add New Item’ window in the Project with better-looking icons and a side panel that shows a description.

Jobs

  • Improvements in the Job trace window have been made.

Deployment and Scheduling

  • Deployment selections have been improved.

  • Scheduler refresh has been improved to work more efficiently.

This concludes the release notes for Astera 10.3.

© Copyright 2023, Astera Software