The Report Browser contains features and layout panels for building extraction models and exporting extracted data.
The Report Browser has two main panels:
Model Layout panel
Data Export Settings panel
The Model Layout panel displays the layout of your report model or extraction template. It contains data regions and fields built according to custom extraction logic.
You can add and delete regions and fields, edit their properties, and export data directly to an Excel sheet, a CSV file, or a database table using the options available in this window.
In the figure below, you can see an extraction model that has a hierarchical structure. This model contains a single instance region (Account_Info) as well as collection regions (Orders_Info and Order_Details).
There are icons placed at the top of the Model Layout panel. These icons enable you to perform various tasks. A brief description for each of these icons is given in the table below:
Options provided in the Data Export Settings panel enable users to export data to an Excel sheet, a CSV file, or a database table. The exported data can then be used in various flow documents (a dataflow, a workflow, or a subflow).
The icons placed on the top of the Data Export Settings panel perform different functions related to data export. A brief description for each of these icons is given in the table below:
Learn about how to export data here.
This section introduces the interface of a report model in Astera.
Name | Purpose & Functionality |
---|---|
Add Single Instance Data Region | Adds a new single-instance data region |
Add Collection Data Region | Adds a new collection data region |
Delete Region | Deletes the selected data region |
Create New Export Setting and Run (to Excel) | Exports extracted data to an Excel file |
Create New Export Setting and Run | Exports extracted data to an Excel file, a CSV file, or a database table |
Preview Selected Export | Displays a preview of the selected export |
Run Selected Export | Runs the selected export setting |
Edit Export Setting | Allows editing the selected export setting |
Remove Export Setting | Removes/deletes the selected export setting |
Create Dataflow and Open | Creates a dataflow with the report model source and opens it |
To extract data from an unstructured document via template-based extraction, you first need to capture the parts of the document that contain the data. In Astera, the area captured within your source report is called a data region.
Creating data regions is the first step in designing a reusable extraction template. An extraction template is called a report model, and data regions are the backbone of report models because they tell Astera where to extract the data from.
Data regions are defined by specifying a pattern, and they may span any number of lines in a source file, depending on the use case.
There are different types of data regions, such as header, footer, single-instance, collection, and append regions, as well as special regions such as name-entity and table regions.
In this article, we will learn about the types of regions and their purpose in Astera.
As the name explains, header and footer regions are used to extract the information occurring at the beginning or end of a page. The data extracted in these regions can be a date, page number, author/company name, or any other recurring information at the top/bottom of every page.
Note: A report model is incomplete without a main data region. Users must therefore create a main data region first, and then create other types of regions.
On page 1 of the Orders Report (shown below), the first 3 lines can be extracted in a header region as they contain the time, date, "from" details, and title of the report. Moreover, these lines occur on every page of the report.
The last line contains the page number, repeating on each page. Therefore, page numbers can be extracted in a footer region.
A single-instance data region is a sub-region that extracts a single set of data points.
A single-instance data region is used when the relationship between parent and child data regions is one-to-one. For example, in the sample data of the Orders Report, an order can only be placed by one account. Therefore, we will use a single-instance data region to extract the details of the account from the data source.
A main data region is always a single-instance data region.
A single-instance data region node has a blue icon in the Model Layout panel.
A collection data region is a sub-region that extracts multiple sets of data points.
We use a collection data region when the relationship between parent and child data regions is one-to-many. For example, in the same Orders Report, there can be multiple items in one order. Therefore, we will use a collection data region to extract the details of the items ordered.
Collection data regions have a yellow icon in the Model Layout panel.
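The one-to-one and one-to-many relationships described above can be pictured with a small data structure. The following is only an illustrative sketch in Python: the class and field names (`Account`, `Order`, `OrderDetail`, `item`, `quantity`) and the exact nesting are assumptions made for this example, not Astera's internal representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrderDetail:
    # One record of a collection region holding ordered items (hypothetical fields).
    item: str
    quantity: int


@dataclass
class Order:
    # One record of a collection region; a single order owns many detail rows.
    order_id: str
    details: List[OrderDetail] = field(default_factory=list)


@dataclass
class Account:
    # Single-instance data: exactly one set of account data points,
    # with a one-to-many link to its orders.
    name: str
    orders: List[Order] = field(default_factory=list)


account = Account(name="Example Account")
order = Order(order_id="1001")
order.details.append(OrderDetail(item="Widget", quantity=2))
order.details.append(OrderDetail(item="Gadget", quantity=1))
account.orders.append(order)
```

Here the account carries a single set of values, while each list can grow to any number of entries, which mirrors the difference between single-instance and collection regions.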
An append region is a region whose extracted data is concatenated or linked with the main data region. It lets you capture information that would otherwise be left out of a data region. It can appear before or after the main data region.
In the example below, the grand total can be extracted in an append region as it needs to be concatenated with the main data region, Account. Notice that the grand total does not follow a pattern or logic like the lines containing the account details. Therefore, it needs to be in a separate append region.
If needed, a data region can be converted to an append data region and vice versa within the Model Layout panel.
Right-click on the data region in the Model Layout panel and select the Change to Append Region option from the context menu.
Once changed, the Model Layout panel shows the node as an append region.
The Region Properties panel also changes to reflect the append region.
To change your append region to a data region, right-click on the append data region in the Model Layout panel and select the Change to Data Region option from the context menu.
In addition to the aforementioned data regions, Astera has some additional regions which are created when the layout is auto-generated using the Auto-Generate Layout (AGL) feature. The primary difference between AGL regions and other data regions is that AGL regions cannot be created manually. A user must run AGL for these regions to be created.
A table data region is the first region that is created when AGL is run. A table data region contains multiple sets of data points, just like collection regions. A table region is detected based on the names of the columns of a table. In Astera, column names are called headers. Users can adjust the height of the header after AGL extracts a table region to improve the accuracy of the area detected within the region.
A table region cannot be created independently or manually. It can only be created when an AGL operation detects a table in an unstructured document.
Note: Column headers in a table region are different from a header region mentioned at the beginning of this article.
A name-entity region contains a single set of data points as key-value pairs. Name-entity regions are conceptually similar to single-instance data regions: there is a one-to-one relationship between the keys and values.
Name-entity regions cannot be created independently or manually. They are only created when an AGL operation detects key-value pairs in an unstructured document.
In this article, we will explore and discuss the various properties of data regions in a report model. But first, let us explore what a data region is.
A Data Region is an area of data captured within your report. It can cover any number of lines in a source report. This area is defined by specifying a pattern.
Setting up a data region is one of the first steps of extracting data from a report. Data regions are the backbones of Report Models – they direct Astera where to extract data from.
To support the selection of data regions, Astera contains a set of Region Properties.
Move All Field Markers Left One Character: This option moves all the field markers towards the left by one character.
Move All Field Markers Right One Character: This option moves all the field markers towards the right by one character.
Auto Create Fields: This option automatically creates the data fields.
Auto Determine Field Names: This option automatically determines the field names.
Auto Create Fields (Single Instance - Beta): This option automatically creates the data fields in a single instance data region.
Region Name: Allows you to change the name of the data region.
Region Type: Shows the type of your region.
The Region Details group-box lets you further customize your data region.
Region End Type: With the options available in the drop-down list, you can specify where you want to end your data region. The options in this drop-down list are as follows:
Line Count: Ends your region after a specified number of lines.
Blank Line: Ends your region where a blank line occurs.
Last Field Ends: Ends your region at the last data field within your data region.
Another Region Starts (100 Rows Max): Ends your first data region where the next data region begins. This is used for variable-length data regions.
Till Regular Expression: Ends your region when the specified regular expression occurs.
Till Specific Text: Ends your region when the specific text string occurs.
Line Count: Allows the user to vary the line count according to the requirement of the data region.
Note: The Line Count option appears dynamically, depending on which option is selected from the dropdown menu of Region End Type.
Overlapping Container: Used when there are multiple data regions with overlapping lines.
Container Region: Used when a data region contains a sub-region within its boundaries.
This is how you can define and modify Region Properties in a report model.
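To make the Region End Type options more concrete, here is a rough Python sketch of how an end-of-region rule could be applied to a list of text lines. It is not Astera's implementation: the function name `region_end`, the mode names, and the choice to exclude the terminating line from the region are all assumptions made for illustration.

```python
import re
from typing import List, Optional


def region_end(lines: List[str], start: int, end_type: str,
               value: Optional[str] = None, line_count: int = 1) -> int:
    """Return the index one past the last line of a region starting at `start`."""
    if end_type == "line_count":
        # Line Count: the region spans a fixed number of lines.
        return min(start + line_count, len(lines))
    for i in range(start + 1, len(lines)):
        line = lines[i]
        if end_type == "blank_line" and line.strip() == "":
            return i
        if end_type == "till_specific_text" and value is not None and value in line:
            return i
        if end_type == "till_regex" and value is not None and re.search(value, line):
            return i
    # No terminator found: the region runs to the end of the input.
    return len(lines)


sample = ["ACCOUNT: 123", "Name: A", "", "ORDER 1", "Total: 5"]
by_count = region_end(sample, 0, "line_count", line_count=2)
by_blank = region_end(sample, 0, "blank_line")
by_text = region_end(sample, 0, "till_specific_text", value="ORDER")
by_regex = region_end(sample, 0, "till_regex", value=r"^Total:\s+\d+")
```

Each mode scans forward from the first line of the region and stops at the first line that satisfies its rule, which is why variable-length regions need a terminator rather than a fixed line count.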
In this document, we will discuss the Field Properties Panel and various options it offers in Astera. But first, let us briefly discuss the concept of data fields.
A data field is the area within a data region containing the useful information. It captures data points and writes them in the columns of a table.
Data fields, together with data regions, make up the template for extracting information from unstructured source files.
Once a data field is added within a data region, a Field Properties panel will appear right above the designer, providing options for basic configuration settings.
The Field Properties panel allows users to customize the captured data fields with the help of the following options in the toolbar:
Move field marker left one character: This option moves the field marker to the left by one character.
Move field marker right one character: This option moves the field marker to the right by one character.
Decrease field length by one character: This option decreases the field length by one character.
Increase field length by one character: This option increases the field length by one character.
Auto determine field length: This option determines the length of the selected data field automatically.
Delete field: This option deletes the selected data field.
Name: Allows the user to assign a name to a data field. You can type in any name depending on the content of the extracted data points. The assigned field name must be unique and must not contain spaces.
Data Type: Provides the option to specify the data type of the field, such as string, real, date, etc.
Note: The data type of every data field appears next to the field name in the Model Layout tab.
Format: Allows the user to change the format of a date field.
Composite Type: Resolves a composite field such as full address or full name into parsed components.
Composite data contains details about a record that can be further split into smaller elements. For example, a record about a customer transaction might contain a date field. Date fields are processed by a built-in parser that splits the date into hour, day, month, year etc.
Value If Null: Specifies the action to take when the extracted field value is null.
None: This is the default setting. If None is selected, no action is taken to replace the value in an empty cell. For example, if the field has some null records, the cells within the field are displayed as empty in the preview.
Apply Specified Default: A specific string value can be assigned in case the extracted data point is null. When the program finds a null value, the specified value will appear in the output instead of an empty cell.
Use from Previous Record: Returns the value of the previous record in the same data field.
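The three Value If Null behaviors can be sketched as a small function over a column of extracted values. This is a hypothetical illustration, not the product's code: the name `fill_nulls`, the mode strings, and the assumption that a filled value is carried forward across consecutive nulls are all choices made for this example.

```python
from typing import List, Optional


def fill_nulls(values: List[Optional[str]], mode: str = "none",
               default: str = "") -> List[Optional[str]]:
    """Apply a Value If Null style rule to a list of field values."""
    result: List[Optional[str]] = []
    previous: Optional[str] = None
    for value in values:
        if value is None:
            if mode == "apply_specified_default":
                value = default
            elif mode == "use_from_previous_record":
                # Assumed behavior: reuse the last value, even if it was itself filled.
                value = previous
            # mode == "none": leave the null untouched.
        result.append(value)
        previous = value
    return result


column = ["A", None, "B", None, None]
filled_previous = fill_nulls(column, "use_from_previous_record")
filled_default = fill_nulls(column, "apply_specified_default", "N/A")
unchanged = fill_nulls(column)
```

Use from Previous Record is typically handy when a value is printed only once for a group of rows, while Apply Specified Default suits fields where a missing value has a known meaning.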
You can find some additional options in this section to clean the extracted data.
Let's discuss the available options.
In this section, you can specify the size and position of a data field.
The Start Position option allows users to define the start position of a data field.
Fixed: Set a fixed start position of a data field from where you select the data field while capturing it.
Follows String in the Current Line: Set the position of a data field to start after the specified string in the same line.
Follows String in the Previous Line: Set the position of a data field to start after the specified string in its preceding line.
Case Sensitive: Allows users to search the specified string on a case sensitive basis.
Regular Expression: Allows users to use a regular expression to search the string followed by the data field.
The user can define multiple strings separated by commas to define the Start Position.
Note: The Case Sensitive and Regular Expression options are only enabled when the selected Start Position is either Follows String in the Current Line or Follows String in the Previous Line. Both options apply to the string specified in the text box.
Line/Column: An invisible grid with coordinates overlays every report model. These coordinates can be used to specify the start position of a data field in a report model by referencing a certain line and column.
The values for Selection Length, Line, and Column can be found at the bottom-right corner of the report model when a point/area on the source file, opened in the designer, is selected.
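As a rough illustration of the Follows String in the Current Line option, the sketch below finds the column where a field would start after a marker string. The function name `start_after` and its parameters are hypothetical, and trying the comma-separated strings in order is an assumption about how multiple strings are resolved.

```python
import re
from typing import List, Optional


def start_after(line: str, strings: List[str], case_sensitive: bool = True,
                is_regex: bool = False) -> Optional[int]:
    """Return the column just after the first listed string found in `line`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    for s in strings:
        # Plain strings are escaped so that characters like "." match literally.
        pattern = s if is_regex else re.escape(s)
        match = re.search(pattern, line, flags)
        if match:
            return match.end()
    return None


line = "Customer Name: James Smith"
markers = "Account:,Name:".split(",")  # multiple strings separated by commas
column = start_after(line, markers)
value = line[column:].strip() if column is not None else None
```

For Follows String in the Previous Line, the same search would run on the preceding line and the resulting column would be applied to the current one.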
Length: This menu allows users to set the length of a data field. You can select from the following options:
Characters: Allows users to set the length of a data field up to a certain number of characters. For example, if the value for this option is set to 5 for a data field, James123 will be extracted as James.
Ends At Two Consecutive Blanks: Ends a data field once it reaches two consecutive blank characters.
Till the End Of Line: Ends a data field on the last character in the line.
Till Specified String: Ends a data field once it reaches a specified string.
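The Length options can be mimicked with ordinary string slicing. The following is a minimal sketch under stated assumptions: `field_text` and its mode names are invented for this example, and the terminating string or blanks are assumed not to be part of the extracted value.

```python
def field_text(line: str, start: int, length_mode: str,
               chars: int = 0, until: str = "") -> str:
    """Cut a field out of `line`, beginning at column `start`."""
    rest = line[start:]
    if length_mode == "characters":
        return rest[:chars]
    if length_mode == "two_consecutive_blanks":
        end = rest.find("  ")
    elif length_mode == "till_specified_string":
        end = rest.find(until)
    else:  # "till_end_of_line"
        end = -1
    # If no terminator is found, the field runs to the end of the line.
    return rest if end == -1 else rest[:end]


fixed = field_text("James123", 0, "characters", chars=5)
blanks = field_text("James Smith   42", 0, "two_consecutive_blanks")
until_text = field_text("Total: 100 USD", 7, "till_specified_string", until=" USD")
whole = field_text("abc def", 4, "till_end_of_line")
```

The first call reproduces the example from the text, where a length of 5 characters turns James123 into James.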
Height: This menu allows users to set the height of a data field. You can select from the following options:
Line Count: Set the height of the data field to a certain number of lines.
Till Blank Row: Ends the data field once a blank row is reached.
Ends At Row With Blank First Character: Ends the data field once it reaches a row starting with a blank character (a space).
Ends at Row with Blank Last Character: Ends the data field once it reaches a row ending with a blank character (a space).
Till Region Ends: The data field continues till the end of its data region. This option determines the height of a data field based on the height of the data region.
Note: The default height of data fields is set to Line Count.
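The Height options follow a similar idea, but across lines instead of characters. The sketch below is only an approximation for illustration: `field_height`, the mode names, and the assumption that the terminating row is excluded from the field are not taken from the product.

```python
from typing import List


def field_height(lines: List[str], start: int, mode: str, line_count: int = 1) -> int:
    """Return how many lines the field spans, beginning at line `start`.

    `lines` is assumed to hold the lines of the enclosing data region.
    """
    if mode == "line_count":
        return min(line_count, len(lines) - start)
    height = 1
    for line in lines[start + 1:]:
        if mode == "till_blank_row" and line.strip() == "":
            break
        if mode == "blank_first_character" and line[:1] == " ":
            break
        if mode == "blank_last_character" and line[-1:] == " ":
            break
        height += 1
    # "till_region_ends" never breaks, so the field runs to the last line.
    return height


region = ["12 Main Street", "Springfield", "", "Next record"]
blank_row_height = field_height(region, 0, "till_blank_row")
default_height = field_height(region, 0, "line_count", 1)
full_height = field_height(region, 0, "till_region_ends")
```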
This is how you can use the options available in the Field Properties panel to configure the settings that help you capture the data points in a field.
Report Options are used to specify or adjust the settings for the source file used for your report model. This window appears as you load a new source file to build an extraction template.
Data File Location
Path - This option shows the path of the source file you have imported in Astera. You can also change the source file by clicking on the folder icon in this option.
Reading Options
Remove Blank Lines - If selected, this option will remove all the blank lines within your source file.
Depending on the file type, the options change. For PDFs, as shown above, you get the option to change the Scaling Factor, Owner and User Passwords and Pages to Read.
Owner and User Passwords - Passwords needed for reading password protected files.
Pages to Read - This option allows you to read only specific pages instead of the entire file.
Other Options
Tab Size - Increase or decrease the tab size for your report. The default value is 8 characters.
Displayed Line Count - Specifies the number of lines you want to view in the report viewer.
Font - Here you can change the font of the text that appears within your report. You have the option to select the font size and bold or italicize the text.
Style - Here you can select the theme for the Astera editor. Choose which color scheme you prefer.
Encoding - You can choose the Encoding for your file. The default is Unicode (UTF-8) but you can change this if you have a file with specific characters.
Culture - You can define the Culture and select to use the specified culture.
Scaling Factor - This applies to the PDF's spacing value on reading.
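Two of the reading options, Tab Size and Remove Blank Lines, correspond to simple text preprocessing steps. The sketch below shows the general idea in Python; the function `preprocess` is hypothetical, and how Astera applies these options internally is not described in this document.

```python
from typing import List


def preprocess(text: str, tab_size: int = 8,
               remove_blank_lines: bool = False) -> List[str]:
    """Expand tabs to spaces and optionally drop blank lines."""
    # Tabs are expanded to the next multiple of `tab_size` columns.
    lines = [line.expandtabs(tab_size) for line in text.splitlines()]
    if remove_blank_lines:
        lines = [line for line in lines if line.strip()]
    return lines


raw = "a\tb\n\nc"
kept = preprocess(raw)
cleaned = preprocess(raw, remove_blank_lines=True)
```

Because patterns and field positions are defined by line and column, changing the tab size or removing blank lines shifts where text lands on that grid.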
Region Type | Description | Can it be created manually? |
---|---|---|
Header and Footer | Extracts data occurring at the beginning/end of each page. | Yes |
Single-Instance | Extracts single set of data points with a one-to-one relationship between parent and child region. | Yes |
Collection | Extracts multiple sets of data points with a one-to-many relationship between parent and child region. | Yes |
Append | Extracts data which needs to be concatenated with the main data region. | Yes |
Name-Entity | Extracts area with key-value pairs. Clone of Single-Instance region. | No (AGL region) |
Table | Extracts area which contains a table based on headers detected. Similar to collection region. | No (AGL region) |
In this document, we will discuss the usage and various properties of Patterns in a report model.
A Pattern is the logic or rule used to define a data region. You specify a pattern that Astera matches against the unstructured data file to capture data into a structured format. A pattern can be a letter, a character, a number, a word, or a combination of these.
We specify a pattern in the pattern box right above the Report Model designer.
Astera has built-in wildcards to facilitate region selection by specifying flexible patterns. These wildcards are found in the toolbar located at the top of the Report Model designer.
A short description of each wildcard is given below.
You can also access the wildcards and additional features for patterns in Astera by right-clicking on the pattern box.
A context menu opens with the following options available:
The Pattern Properties panel allows users to specify and modify the properties of any pattern in a report model.
You can specify up to five patterns at a time in a report model by increasing the Pattern Count in the Region Properties group-box in the Pattern Properties panel.
Note: This option increases the number of patterns to capture lines within one particular data region.
You can learn more about how the Pattern Count works by clicking here.
The Apply Pattern to Line option is useful when the specified pattern does not fall on the first line of the desired data region. Simply put, when there is information above the line containing the pattern keyword, increase Apply Pattern to Line from 0.
You can learn more about how the Apply Pattern to Line option works by clicking here.
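One way to picture Apply Pattern to Line is as an offset between the line where the pattern matches and the line where the region begins. The sketch below assumes exactly that reading; the function `region_starts` is hypothetical and uses a plain substring test in place of Astera's pattern matching.

```python
from typing import List


def region_starts(lines: List[str], pattern: str, apply_to_line: int = 0) -> List[int]:
    """Return the start line of each region whose line `apply_to_line` matches."""
    starts = []
    for i, line in enumerate(lines):
        # The pattern sits `apply_to_line` lines below the region's first line.
        if pattern in line and i - apply_to_line >= 0:
            starts.append(i - apply_to_line)
    return starts


report = ["James Smith", "ACCOUNT: 101", "Anna Lee", "ACCOUNT: 102"]
default_starts = region_starts(report, "ACCOUNT:")
shifted_starts = region_starts(report, "ACCOUNT:", apply_to_line=1)
```

With the value left at 0, each region would begin on the ACCOUNT line and miss the name printed above it; setting it to 1 moves the start of each region up by one line.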
Let’s discuss the options available in the Pattern Properties group-box:
This concludes our discussion on the Pattern Properties panel in Report Models.
Name | Purpose |
---|---|
Case Sensitive Pattern Match | This option matches the data on a case-sensitive basis. For example, "Account" and "account" will be treated as two different patterns by Astera if this checkbox is enabled. |
Pattern is a Regular Expression | It directs Astera to read the specified pattern as a regular expression. A regular expression is a special text string used to describe a pattern. |
Floating Pattern | A floating field pertains to data points that appear in various locations throughout your report. The Floating Pattern option allows you to capture patterns within a report model even when the repeated phrases are misaligned vertically. Once the Floating Pattern option is selected, it will capture all the lines that match the specified pattern. |
Float Fields | The Float Fields option will automatically appear once the Floating Pattern option is selected. The Float Fields option ensures that the line spacing also floats and is based on the line used to capture the first field. This is checked by default but can also be unchecked if you'd like the field position to be fixed. |
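The pattern options above can be approximated with Python's `re` module. This is a sketch under stated assumptions rather than a description of Astera's matcher: the function `line_matches` is hypothetical, and it assumes that a non-floating pattern must begin at a fixed column while a floating pattern may appear anywhere in the line.

```python
import re


def line_matches(line: str, pattern: str, column: int = 0,
                 case_sensitive: bool = True, is_regex: bool = False,
                 floating: bool = False) -> bool:
    """Check whether `line` matches `pattern` under the selected options."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Plain patterns are escaped so they are matched literally.
    regex = re.compile(pattern if is_regex else re.escape(pattern), flags)
    if floating:
        # Floating: the pattern may occur at any position in the line.
        return regex.search(line) is not None
    # Fixed: the pattern must begin exactly at `column`.
    return regex.match(line, column) is not None


aligned = line_matches("Account: 101", "Account")
wrong_case = line_matches("Account: 101", "account")
ignore_case = line_matches("Account: 101", "account", case_sensitive=False)
shifted = line_matches("   Account: 101", "Account")
shifted_floating = line_matches("   Account: 101", "Account", floating=True)
as_regex = line_matches("Order 42", r"Order \d+", is_regex=True)
```

In this sketch the indented line is missed by the fixed-position match and caught by the floating one, which is the situation the Floating Pattern option is meant to handle.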