Report Model Tutorial
In this tutorial, we will explore the new and improved features of Astera's Report Model component. To extract data from a document, you create a report model, customize it using the available properties and options, and then select a destination to write the extracted data to, for instance, an Excel sheet or a database table.
Once you have designed a report model, test it by previewing the data and collecting statistical information.
The extracted data can be massaged further to conform to downstream needs, verified for quality, and sent to the destination of your choice. The created project can be deployed to automate the entire process of data extraction and data validation from documents that have a similar layout.
This tutorial will demonstrate how Astera creates and automates the data extraction process and expedites data preparation with features such as Data Exporting, Workflow Orchestration, Email/FTP/Folder Integration, Data Verification and Scheduling Extraction.
The process involves two important steps as shown in the figure below:
Astera uses a template-based extraction model to extract data from unstructured file sources. The user designs a template that directs and guides Astera's data extraction process. We refer to this template as a Report Model or an Extraction Template, and we use these terms interchangeably throughout this tutorial.
In the next section, we will discuss the anatomy of a report model.
A report model comprises seven main components that appear in the Model Layout tab as shown below:
A brief description for each of these components is given in the table below:
Feature – Description
Data Region – It is a section (in an unstructured file) that contains desired data points which are to be extracted. The data region can cover any number of lines in a source report and is specified by repeating pattern(s).
Header and Footer Regions – The data region at the top/bottom of a page is referred to as the header/footer region respectively. Add these regions if you are trying to extract information that is repeating on each page in the header/footer sections.
Single Instance Data Region – It is a sub-region that extracts a single set of data points within a data region. In the data preview, it is shown as a single record.
Collection Data Region – It is a sub-region that extracts multiple sets of data points within a data region. It is a collection of records captured in a hierarchical structure. For example, a collection of product items under a single Order ID.
Append Data Region – It is a region that you can add as part of a report model that would otherwise be left out of a data region.
Data Fields – It is the area within a data region containing the information that is to be extracted.
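To make the hierarchy concrete, the following Python sketch models how a data region with a single instance field and a collection sub-region could nest, using the Order ID example from the table. It is only an illustration: the class names, field names, and sample values are hypothetical and do not reflect how Astera represents a report model internally.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical names for illustration only; not Astera's internal model.

@dataclass
class OrderItem:
    # One record captured by a collection data region.
    product: str
    quantity: int

@dataclass
class Order:
    # A data region: a single-instance field plus a collection sub-region.
    order_id: str  # single instance: one value per order
    items: List[OrderItem] = field(default_factory=list)  # collection: many per order

# Made-up sample values.
order = Order(order_id="10248")
order.items.append(OrderItem("Widget", 12))
order.items.append(OrderItem("Gadget", 5))

item_count = len(order.items)
```

Here a single Order ID owns several item records, which is the parent-child shape a collection data region produces in the data preview.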
Astera's user-friendly interface enables business users to easily accomplish a wide range of data extraction tasks without relying on or employing expensive IT resources. With its easy-to-use, visual interface, the tool walks you through the process of identifying your desired data points, building the extraction logic, and sending it to the destination of your choice. The screenshot below displays important panels/windows in a Report Model.
The windows and panels shown in the screenshot above are discussed in the subsequent sections.
The unstructured file is loaded onto the Report Model designer where data regions are defined by identifying and specifying repeated patterns in the source report. Data fields are then captured from the defined data regions.
Astera extracts data based on the repeatedly appearing patterns in the source report. You have to identify and specify those repeating patterns in a report model to create data regions.
To define a data region, write the pattern in the Pattern Box (the orange region in the screenshot). A pattern can be any combination of letters, words, numbers, or alphanumeric characters. Report Models in Astera also have built-in wildcards for defining patterns.
Sometimes a single pattern cannot cover and define a data region. In such a case, you can make use of multiple patterns together to define the extraction logic.
Note: Astera supports a maximum of five different patterns for a single data/collection region in a report model.
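The idea of defining a data region by a repeating pattern can be sketched in a few lines of Python. This is a conceptual illustration only, not Astera's matching engine: it uses Python regular expressions rather than Astera's wildcard syntax, and the function name `find_region_starts` and the sample report text are made up for the example.

```python
import re
from typing import List

def find_region_starts(lines: List[str], pattern: str) -> List[int]:
    """Return the indexes of lines that match the pattern (region start lines)."""
    compiled = re.compile(pattern)
    return [i for i, line in enumerate(lines) if compiled.search(line)]

# Made-up sample report text.
report = [
    "ORDER ID: 10248",
    "  Widget      12",
    "  Gadget       5",
    "ORDER ID: 10249",
    "  Sprocket     3",
]

# Every line that begins with "ORDER ID:" starts a new data region.
starts = find_region_starts(report, r"^ORDER ID:")
```

Each matching line marks the start of one instance of the region, which is why a pattern needs to appear repeatedly and consistently in the source report.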
The toolbar has various options that facilitate the data extraction process. The purpose and functionality of the icons present in the toolbar are discussed below:
The toolbar can be repositioned to any of the sides of the designer as preferred.
Report Browser contains features and layout panels for building extraction models and exporting extracted data.
There are two main tabs in a Report Browser panel:
The Model Layout panel displays the layout of your report model or extraction template. It contains data regions and fields built according to a custom extraction logic.
You can add and delete regions and fields, edit their properties, and export data directly to an Excel sheet, a CSV file, or a database table using the options available in this window.
Extracted data can be directly exported to an Excel sheet, a delimited file, or to a database table such as Microsoft SQL Server, Access, PostgreSQL, MySQL, or ODBC.
This exported data can then be used in various flow documents (a dataflow, a workflow, or a subflow).
For more information on Report Browser, refer to this article.
Region Name – Allows you to change the name of the data region.
Region Type – Tells the type of your region.
The Region Details section lets you further customize your data region.
Region End Type – The options in the Region End Type drop-down list let you specify where your data region ends. The available options are as follows:
Line Count – Ends your region after a specified number of lines.
Overlapping Container – Used when there are multiple data regions with overlapping lines.
Container Region – Used when a data region contains a sub-region within its boundaries.
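As a rough illustration of the Line Count option, the Python sketch below starts a region at each line containing a keyword and ends it after a fixed number of lines. The function name, the keyword-based start rule, and the sample data are assumptions made for the example; they are not Astera's implementation.

```python
from typing import List

def regions_by_line_count(lines: List[str], keyword: str, line_count: int) -> List[List[str]]:
    """Start a region at each line containing `keyword`; end it after `line_count` lines."""
    regions = []
    for i, line in enumerate(lines):
        if keyword in line:
            regions.append(lines[i:i + line_count])
    return regions

# Made-up sample report text.
report = [
    "ACCOUNT: 1001",
    "Name: Jane Doe",
    "City: Austin",
    "ACCOUNT: 1002",
    "Name: John Roe",
    "City: Boston",
]

# Each region covers the pattern line plus the one line that follows it.
regions = regions_by_line_count(report, "ACCOUNT:", 2)
```

With a line count of 2, the City lines fall outside every region and would not be available for field capture.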
For more information on the Region Properties panel, refer to this article.
Let’s discuss the options available on this panel:
Case Sensitive Pattern Match – This option matches the data on a case-sensitive basis. For example, ‘Account’ and ‘account’ will be treated as two different patterns by Astera if the Case Sensitive Pattern Match icon is selected.
Pattern is a Regular Expression – When this option is selected, Astera reads the specified pattern as a regular expression. A regular expression is a special text string used to describe a search pattern. You can think of regular expressions as wildcards. For example, wildcard notations such as *.txt are used to find all text files in a file manager.
Floating Pattern – The Floating Pattern option within the report model component allows you to capture each data field that matches the specified pattern no matter where it is located on the report model’s designer.
Float Fields – The Float Fields option will automatically be highlighted to the right of the Floating Pattern option when it is checked. The Float Fields option ensures that the line spacing also floats and is based on the line used to capture the first field. This option is selected by default but can also be unselected if you would like the field position to remain fixed.
Pattern Count – This option lets you increase the number of patterns used to define the region.
Apply Pattern to Line – This option is useful when the specified pattern does not capture the first line of the desired data region. For instance, when there is some information above the pattern keyword, then we increase Apply Pattern to Line from 0.
Multi-Column – This option is used when you have data residing in multiple columns.
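The difference between case-sensitive and case-insensitive matching, and between wildcard notation and a regular expression, can be shown with Python's standard `re` and `fnmatch` modules. This is a general illustration of the concepts, not of Astera's pattern syntax; the sample line and file name are made up.

```python
import re
from fnmatch import fnmatchcase

line = "account No: 12345"

# Case-sensitive match: 'Account' does not match 'account'.
case_sensitive = re.search("Account", line) is not None
# Case-insensitive match: both spellings are treated as the same pattern.
case_insensitive = re.search("Account", line, re.IGNORECASE) is not None

# Wildcard notation, as in a file manager: *.txt matches any name ending in .txt.
wildcard_hit = fnmatchcase("orders.txt", "*.txt")
# A regular expression that expresses the same idea.
regex_hit = re.fullmatch(r".*\.txt", "orders.txt") is not None
```

Regular expressions can describe far more than simple wildcards, for example "five digits after a colon", which is what makes them useful when a plain keyword cannot pin down a region.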
For more information on Pattern Properties panel, refer to this article.
This panel appears when you define a data field within a data region.
The Field Properties panel allows users to customize the defined fields with the help of the following options:
Field Name – Allows users to assign a name to a data field.
Data Type – Provides an option to set the data type to string, real, date, etc.
Composite Type – Resolves a composite field such as full address or full name into components.
Format – Allows users to change the format of a date field.
Value If Null – Performs actions in cases where the field value is null.
None: This is the default setting. If ‘None’ is selected, the field will remain the same. For example, if the field in question is an empty address field, the cell will be displayed as empty in the preview.
Apply specified default: A string can be typed in for use here, such as ‘N/A.’ When the program finds a null value, the specified value will appear in the previewed cell instead of an empty cell.
Use from previous record: This returns the value of the preceding record in the same field.
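The three Value If Null behaviors can be sketched in Python as follows. The function name `fill_nulls` and the mode labels are hypothetical, and the sketch assumes that "Use from previous record" carries the last available value forward across consecutive nulls; the source does not say how Astera handles that case.

```python
from typing import List, Optional

def fill_nulls(values: List[Optional[str]], mode: str, default: str = "N/A") -> List[Optional[str]]:
    """Apply a null-handling mode: 'none', 'default', or 'previous' (hypothetical labels)."""
    result: List[Optional[str]] = []
    previous: Optional[str] = None
    for value in values:
        if value is None:
            if mode == "default":
                value = default      # Apply specified default
            elif mode == "previous":
                value = previous     # Use from previous record (assumed to carry forward)
            # mode == "none": leave the value empty
        result.append(value)
        previous = value
    return result

# Made-up field values; None stands for an empty cell.
cities = ["Austin", None, "Boston", None]

unchanged = fill_nulls(cities, "none")
with_default = fill_nulls(cities, "default")
with_previous = fill_nulls(cities, "previous")
```

"Use from previous record" is useful for reports where a value such as a customer name is printed once and implied for the rows beneath it.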
Here, you can specify the size and position of a data field.
Start Position – Allows users to manually specify the start position of a data field.
Line/Column – Allows users to define coordinates to specify the starting position of the data field.
Length – Allows users to set the length of a data field.
Height – Allows users to set the height of a data field.
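Conceptually, a field defined by line, column, length, and height is a rectangular slice of the region's text. The Python sketch below illustrates this; the function name `extract_field`, the zero-based indexes, and the sample region are assumptions for the example and may not match how Astera counts positions.

```python
from typing import List

def extract_field(region: List[str], line: int, column: int, length: int, height: int = 1) -> List[str]:
    """Take `height` lines starting at `line`, and `length` characters starting at `column`."""
    rows = region[line:line + height]
    return [row[column:column + length].strip() for row in rows]

# Made-up region text.
region = [
    "ORDER ID: 10248   DATE: 2024-01-15",
    "SHIP TO:  Jane Doe",
]

# Zero-based line and column indexes are an assumption of this sketch.
order_id = extract_field(region, line=0, column=10, length=5)
name = extract_field(region, line=1, column=10, length=8)
```

Because the slice is positional, a field captured this way depends on the value appearing at the same column in every instance of the region.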
For more information on the Field Properties panel, refer to this article.
Astera provides a complete solution for automated extraction of data. Its user-friendly interface enables business users with little or no programming knowledge to easily accomplish a wide range of data extraction tasks without employing expensive IT resources.
In the following sections, we will learn how to build an extraction template, verify it against sample data, and export the extracted data to a dedicated destination.
Open a Report Model in Astera by going to File > New > Report Model.
Provide the File Path for the unstructured file from your local or shared directory.
Click OK. The text file containing Orders invoice data is displayed in the report model’s designer.
Astera will use this file to create a report model. Astera supports extraction of unstructured data from text, Excel, RTF, PRN, EDI, or PDF files.
There are many options available on the Report Options panel to configure how you want Astera to read your file. The reading options depend on the file type and content type of your data. For example, if you have a PDF file, you can specify the Scaling Factor, Font, Tab Size, Passwords and Pages to Read.
You can read more about these options here.
In this example, we are extracting Orders invoice data from a text file.
1. Header Region
Let’s take a look at the report document we have opened in the report model editor. At the top of the document is some general information, including company name and report dates. Then we have some account information, followed by order information including individual order items. Notice that this document also has a repeating header on each page. To extract the data from the header, we will need to add a Header to our report model.