Astera lets users auto-create data regions and data fields with a single click, either for an entire document or for a single table. The Auto-Generate Layout (AGL) option in the toolbar at the top of the designer automatically creates an extraction template for your unstructured document. Two sub-features, Auto Create Table and Auto Create Fields (Single-Instance), are used specifically for creating a table region and capturing key-value pairs, respectively.
These features make the extraction process much more efficient because they reduce the effort of designing report models from scratch. To make the extraction template more robust and customized, users can further tweak the auto-generated layout to fit their business requirements.
You can install the dependencies necessary for these features by following the steps mentioned here.
This feature detects key-value pairs and tables in an unstructured document and creates clones of single-instance and collection data regions, respectively. It uses the same techniques of specifying patterns and defining region properties to capture data regions. Furthermore, these data regions have additional properties of their own. Then, it detects useful information within a region, whether it is key-value pairs or tables, and creates data fields.
After you load the unstructured document in a report model, the Auto-Generate Layout option will be enabled in the toolbar above the report designer.
By clicking on this icon, a user can initiate the process of automatically creating an extraction template.
Once the report model is created, an Output window will open automatically, showing the details of the data regions created.
Here, we can see that this window shows the following details:
Whether the region created is a Table or Data (key-value) region.
The lines on which the data region spans in the unstructured document.
The number of fields and records within a data region.
The total time duration for the entire extraction process.
Click on the Preview Data option in the secondary toolbar above the designer to check if the data has been extracted correctly.
A Data Preview window will open, displaying all the information extracted using the Auto Generate Layout feature.
Although the Auto-Generate Layout feature can help speed up the process of extraction, it does not completely detect which fields to extract and which ones to leave out according to the user’s requirement. Therefore, users must not exclusively rely on this feature for creating an end-to-end extraction template. The Auto Create Fields (Single-Instance) and Auto Create Table sub-components are specially designed to help the user tweak the report model that has been generated automatically.
Here are some pointers to keep in mind to increase the accuracy of the Auto Generate Layout feature:
Auto Generate Layout supports only PDF files written in English.
Files with a simple format, where the key-value pairs are at the top and the table is at the bottom, are ideal. Examples of such files include invoices, purchase orders, etc. The file used in this article serves as a perfect example.
PDF files with at most 50 pages are optimal.
It is recommended for key-value pairs to have a separator (“:”) for accurate detection.
Make sure the file does not have an alignment issue. If it is not completely aligned, the cause may be an incorrect scaling factor. You can change the Scaling Factor in the Report Options panel.
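To illustrate why a separator helps with key-value detection, here is a minimal Python sketch. It is not Astera's detection algorithm; the pattern and the function name split_key_value are assumptions made for illustration only. It shows that a line with a colon splits cleanly into a key and a value, while a line without one gives no reliable split point.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (not taken from Astera): a key, a colon, then a value.
# The key may not contain a colon, so the FIRST colon on the line is the separator.
KEY_VALUE = re.compile(r"^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>\S.*?)\s*$")


def split_key_value(line: str) -> Optional[Tuple[str, str]]:
    """Return (key, value) if the line looks like 'key: value', else None."""
    match = KEY_VALUE.match(line)
    if match is None:
        return None
    return match.group("key"), match.group("value")


if __name__ == "__main__":
    for sample in ["Invoice No: 10045", "Sales Order 5521"]:
        print(repr(sample), "->", split_key_value(sample))
```

In this sketch, a line such as "Sales Order 5521" returns None because nothing marks where the key ends and the value begins.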
The option to Auto Create Fields (Single Instance) in the Region Properties panel allows the user to create specific fields by selecting the relevant data on the report designer. This feature is specifically designed as a sub-component of the AGL feature to extract useful key-value pairs from the unstructured document. That is, if AGL fails to pick up some key-value pair, then the ACF (Single-Instance) option can be used to create those highlighted fields instantly.
Note: The existing Auto-Create Fields (Collection) option works best for collection regions as it accurately picks the value of the headers. The new Auto Create Fields (Single Instance) option is specifically designed for extracting key-value pairs and that can only be done in a single-instance data region.
This feature extracts the field data and determines the field names automatically.
Note: In order to use this option, make sure the correct data region is selected.
Users can auto-create a table by using this simple two-step process:
Select the data you want to extract.
Right-click on the selected data and select Auto Create Table.
This option automatically creates a table region and extracts the data fields within this region.
Here are some pointers to keep in mind to increase the accuracy of the Auto Create Table feature:
Auto Create Table supports only PDF files written in English.
For table detection, the best results are achieved with:
Single line rows and headers in the table.
Table with clear boundaries.
Uniform column spacing in tables without boundaries.
Consistent format (identical header name, header span, and field length) if the table spans multiple pages.
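The point about uniform column spacing can be illustrated with a short Python sketch. This is not how Astera detects tables; it is a simplified stand-in technique that treats any run of character positions that is blank in every row as a gap between columns. The names column_spans, split_row, and min_gap are hypothetical.

```python
from typing import List, Tuple


def column_spans(rows: List[str], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) character spans of columns in fixed-width text rows."""
    width = max(len(row) for row in rows)
    padded = [row.ljust(width) for row in rows]
    # A position counts as blank only if every row has a space there.
    blank = [all(row[i] == " " for row in padded) for i in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None  # start of the column currently being read
    i = 0
    while i < width:
        if not blank[i]:
            if start is None:
                start = i
            i += 1
            continue
        # Measure the run of blank positions starting at i.
        j = i
        while j < width and blank[j]:
            j += 1
        # Only a wide enough gap (or the end of the text) closes a column.
        if start is not None and (j - i >= min_gap or j == width):
            spans.append((start, i))
            start = None
        i = j
    if start is not None:
        spans.append((start, width))
    return spans


def split_row(row: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Cut one row into cell values using the detected spans."""
    return [row[a:b].strip() for a, b in spans]
```

When the gaps line up across all rows, the spans are unambiguous. If one row's text runs into the gap, the blank run shrinks or disappears and two columns merge, which is one reason uniform spacing tends to give better results.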
This concludes our discussion on the Auto Generate Layout feature. You can also go through this use case to better understand the feature and its usage. Also look at AI Powered Data Extraction in Astera for data extraction from files based on specific layouts.
This document is valid for Astera 10.1. For later versions, please refer to the Install Manager.
The Auto-Generate Layout (AGL) feature lets you auto-create all the data regions and data fields with the click of a button. This feature helps save time and effort spent on manually creating a report model from scratch, making the entire extraction process quicker and more efficient.
In this document, we will see how to set up Java and Python to run AGL in Astera.
1. Visit this link and proceed to download the Java 17 x64 installer.
2. Once downloaded, run the installer with the default settings to install Java.
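If you want to confirm that Java 17 is the version found on your PATH after installation, a small Python check along these lines may help. It is only a sketch: the function names are hypothetical, and it assumes the java command is reachable from the command line.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as "1.8", "1.7", and so on.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> Optional[int]:
    """Run `java -version` and return the major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None  # java is not on the PATH
    # `java -version` writes to stderr; stdout is included as a fallback.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    print("Java major version found:", major)
    print("Matches Java 17:", major == 17)
```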
For AGL, we require the 3.9.7 version of Python.
1. Visit this link and proceed to download the 3.9.7 Windows Installer (64-bit).
2. Run the installer and proceed with the default settings. Python will be installed on your system.
3. Next, launch the client application and go to Tools > Package Install.
4. Provide the path of the Python39 folder in the File Path textbox. With the default installation directory, the file path would be C:\Users\{username}\AppData\Local\Programs\Python\Python39.
Alternatively, you can also click on the folder icon on the right and select the Python39 folder from the directory.
Note: The AppData folder is normally hidden. To show it, go to View in File Explorer and enable Hidden items.
5. Click on the Run Py Package Installation option.
6. A green progress bar will show the installation progress. Wait till it is complete.
7. Once the installation is completed, a dialogue box will pop up, notifying you that the package installation was successful. Click OK.
8. Right-click on the Package Installation window and select Save and Close.
You have successfully set up Java and Python for AGL in the client application. After restarting the client, you can proceed to use the feature.
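As an optional sanity check on the Python side, the sketch below builds the default Python39 folder path for a given Windows user name and tests whether an interpreter reports version 3.9.7. The helper names are hypothetical, and the path assumes the installer's default per-user location described in step 4. Run the script with the interpreter you installed for AGL.

```python
import sys
from pathlib import PureWindowsPath
from typing import Sequence

REQUIRED_PYTHON = (3, 9, 7)  # version required for AGL, per this guide


def default_python_dir(username: str) -> PureWindowsPath:
    """Default per-user install folder (assumes the installer's default settings)."""
    return (
        PureWindowsPath("C:/Users")
        / username
        / "AppData" / "Local" / "Programs" / "Python" / "Python39"
    )


def is_required_version(version_info: Sequence = sys.version_info) -> bool:
    """True if the first three version components equal 3.9.7."""
    return tuple(version_info[:3]) == REQUIRED_PYTHON


if __name__ == "__main__":
    print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
    print("Matches 3.9.7:", is_required_version())
```

PureWindowsPath is used so that the path is formatted with backslashes even when the script is run on a non-Windows machine.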
This section talks about the Auto Generate Layout feature in report models and how to use it.
Astera’s AI capture process is the first step towards innovation in the traditional pattern-based template extraction via Artificial Intelligence. The Auto-Generate Layout feature lets you auto-create all the data regions and data fields with the click of a button. This feature helps save time and effort spent on manually creating a report model from scratch, making the entire extraction process quicker and more efficient.
The algorithm first identifies the tables and key-value pairs in the document and sends their metadata to the report designer in Astera. Information from this detection process is then reverse engineered to create a model layout. Note that this feature is a design-time convenience to jump-start the extraction process. Users can then fine-tune the automatically created layout to fit their business requirements.
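The flow described above, from detection metadata to a model layout, can be pictured with a small Python sketch. Everything in it is hypothetical: the Detection and Region classes, their fields, and the naming scheme are illustrative assumptions and do not reflect Astera's internal data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Detection:
    """Hypothetical metadata for one detected element in the document."""
    kind: str                 # "key_value" or "table" (assumed labels)
    start_line: int
    end_line: int
    field_names: List[str] = field(default_factory=list)


@dataclass
class Region:
    """Hypothetical layout region built from a detection."""
    name: str
    region_type: str          # "Data" for key-value pairs, "Table" for tables
    start_line: int
    line_count: int
    fields: List[str]


def build_layout(detections: List[Detection]) -> List[Region]:
    """Turn detections into named regions, ordered by position in the document."""
    counters: Dict[str, int] = {}
    regions: List[Region] = []
    for det in sorted(detections, key=lambda d: d.start_line):
        region_type = "Table" if det.kind == "table" else "Data"
        seen = counters.get(region_type, 0)
        counters[region_type] = seen + 1
        # First region of a type keeps the plain name; later ones get a suffix.
        name = region_type if seen == 0 else f"{region_type}_{seen}"
        line_count = det.end_line - det.start_line + 1
        regions.append(
            Region(name, region_type, det.start_line, line_count, list(det.field_names))
        )
    return regions
```

The sketch only shows the general idea of the layout being derived from what was detected, which is why the result still benefits from manual fine-tuning.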
AGL comes with two sub-components called Auto-Create Fields (Single Instance) and Auto Create Table. These drag-and-click options enable the user to create selected fields and selected tables respectively. In this article, we will see how we can use the AGL feature to create an extraction template within seconds and modify it using its sub-components.
In this case, we have a PDF invoice for Lumos State Solutions opened in a report model.
As we can see, the invoice contains some name-value pairs at the top, followed by a table below them.
Notice that the name-value pairs are separated by colons and the headers of the table are all on one line. This makes it an ideal file for extracting data with the Auto-Generate Layout option.
First, we need to create a report model. Go to File > New and select Report Model.
A pop-up window will open, asking you to specify the path of the source file you want to extract data from. Locate the directory containing the source file, select the file, and click Open.
You can now see the source file on the report designer. Click on the Auto Generate Layout option present in the toolbar to auto-create a layout.
A Progress window will pop up, showing the status of the process. You can stop the process by clicking Cancel if you wish. For now, wait a few seconds.
An Output window will appear at the bottom, showing the details of the regions extracted from the unstructured invoice.
Notice that it only took roughly 5 seconds to generate this layout, which is much faster than manually creating the layout from scratch.
The Model Layout panel shows the regions and data fields extracted in this report model. The Data node contains name-value pairs, whereas the Table region contains the information stored in the table.
The Data_1 node contains only one field, Total. Since we do not want to extract that information in this use-case, let’s delete that region. Right-click on the node and select Delete Region from the context menu.
The Bill Address field has not been captured correctly.
We can fix this by deleting extra fields (Bill_Address_TX and Ash) and increasing the length of the field in the Field Properties panel.
However, not all the information has been captured in the name-value data region. Some lines with useful information (lines 20-21) are not even part of the data region (highlighted in grey). This is because key-value pairs with no separator in between are not captured by the AGL algorithm. You may refer to the Best Practices section in the UI walkthrough document to understand how to get the most accurate results using the AGL feature.
1. To fix this issue, we can increase the Line Count of the region in the Region Properties panel.
2. To capture the Sales Order and Packing Slip fields, use either of the following options:
a. Select the data on the designer, right-click on the designer, and select the Auto Create Fields (Single-Instance) option from the context menu.
b. Alternatively, after selecting the data, click on the Auto Create Fields (Single-Instance) (Beta) option in the Region Properties panel.
Notice that there is another option, Auto Create Fields (Collection), in the Region Properties panel. You do not have to select the data on the designer to use the latter; this is how the two options differ in usability.
Note: Make sure the correct region is selected while using this option.
An Output window will open showing the status of the process.
We can check the Model Layout panel to see if two new fields have been created.
You can also preview the data using the Preview Data option from the toolbar. A Data Preview window will open showing the extracted regions and fields.
If you only want to extract the table from the unstructured document, you can use the Auto Create Table option.
Select the complete data which is a part of the table region. Make sure the first line of the selection contains the headers of the table.
Note: For best results, the headers must all be in one line. If they are not, you may have to specify the number of rows the header spans over in the Region Properties panel.
Note: In case the data for a record spans over multiple lines, increase the region line count and field height from the Region Properties and Field Properties panels.
Right-click on the designer and select the Auto Create Table option from the context menu.
Again, a Progress window will show the progress of the operation and the Output window will show the status of the newly created table region.
You can see all the fields created in the table region in the Model Layout panel.
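The earlier note about records that span multiple lines can be pictured with a short Python sketch. It is not Astera code: group_records and read_field are hypothetical helpers that mimic the idea of a region line count (how many lines make up one record) and a field height (how many of those lines a field reads from).

```python
from typing import List


def group_records(lines: List[str], line_count: int) -> List[List[str]]:
    """Split table lines into records of `line_count` lines each."""
    if line_count < 1:
        raise ValueError("line_count must be at least 1")
    return [lines[i:i + line_count] for i in range(0, len(lines), line_count)]


def read_field(record: List[str], start: int, length: int, height: int) -> str:
    """Read a field that occupies `height` lines of a record at a fixed position."""
    parts = [line[start:start + length].strip() for line in record[:height]]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    table_lines = [
        "Widget, large    2",
        "blue finish       ",
        "Gasket          10",
        "rubber            ",
    ]
    for record in group_records(table_lines, line_count=2):
        print(read_field(record, start=0, length=16, height=2))
```

With a height of 1, the description field in this sketch would capture only the first line of each record, which mirrors why the line count and field height need to be increased for multi-line records.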
This concludes the discussion on the Auto Generate Layout, Auto Create Fields, and Auto Create Table features in Astera.