Astera lets users auto-create data regions and data fields with a single click, either for an entire document or for a specific table. The Auto Generate Layout (AGL) option in the toolbar at the top of the designer automatically creates an extraction template for an unstructured document. AGL has two sub-features, Auto Create Table and Auto Create Fields (Single-Instance), which are used to create a table region and to capture key-value pairs, respectively.
These features make extraction much more efficient because they reduce the effort of designing report models from scratch. To make the extraction template more robust and customized, users can further tweak the auto-generated layout to fit their business requirements.
You can install the dependencies necessary for these features by following the steps mentioned here.
The Auto Generate Layout feature detects key-value pairs and tables in an unstructured document and creates corresponding single-instance and collection data regions, respectively. It uses the same techniques of specifying patterns and defining region properties to capture data regions, and these data regions also have additional properties of their own. It then detects the useful information within each region, whether key-value pairs or tables, and creates data fields for it.
After you load the unstructured document in a report model, the Auto Generate Layout option is enabled in the toolbar above the report designer.
Clicking this icon starts the automatic creation of an extraction template.
Once the report model is created, an Output window opens automatically, showing details of the data regions that were created.
This window shows the following details:
Whether the region created is a Table or Data (key-value) region.
The lines of the unstructured document that the data region spans.
The number of fields and records within a data region.
The total time taken by the extraction process.
Click Preview Data in the secondary toolbar above the designer to check whether the data has been extracted correctly.
A Data Preview window will open, displaying all the information extracted using the Auto Generate Layout feature.
Although the Auto Generate Layout feature can speed up extraction, it cannot determine exactly which fields a user needs to extract and which to leave out. Therefore, users should not rely on this feature alone to create an end-to-end extraction template. The Auto Create Fields (Single-Instance) and Auto Create Table sub-features are designed specifically to help users tweak the automatically generated report model.
Here are some pointers to keep in mind to increase the accuracy of the Auto Generate Layout feature:
Auto Generate Layout supports only PDF files written in English.
Files with a simple format, where the key-value pairs are at the top and the table is at the bottom, are ideal. Examples of such files include invoices, purchase orders, etc. The file used in this article is a good example.
PDF files of 50 pages or fewer give the best results.
For accurate detection, key-value pairs should be separated by a colon (“:”).
Make sure the file does not have an alignment issue. If the file is not properly aligned, the cause may be an incorrect scaling factor. You can change the Scaling Factor in the Report Options panel.
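To see why a separator helps, here is a minimal sketch of separator-based key-value detection. This is not Astera's detection algorithm, which this article does not describe; the function name detect_key_value_pairs, the rule of splitting each line at its first colon, and the sample lines are all illustrative assumptions.

```python
from typing import Dict, List


def detect_key_value_pairs(lines: List[str], separator: str = ":") -> Dict[str, str]:
    """Hypothetical sketch (not Astera's implementation): treat each line that
    contains the separator as a key-value pair, splitting at its first occurrence."""
    pairs: Dict[str, str] = {}
    for line in lines:
        if separator not in line:
            # No separator: there is no reliable boundary between key and value.
            continue
        key, value = line.split(separator, 1)
        key, value = key.strip(), value.strip()
        if key:  # ignore lines that merely start with the separator
            pairs[key] = value
    return pairs


# Made-up sample lines resembling an invoice header.
invoice_header = [
    "Invoice Number: INV-1001",
    "Order Date: 2023-04-12",
    "Ship To Acme Corp",   # no separator, so this line is skipped
    "Due Time: 10:30",     # only the first ':' splits key from value
]
print(detect_key_value_pairs(invoice_header))
# {'Invoice Number': 'INV-1001', 'Order Date': '2023-04-12', 'Due Time': '10:30'}
```

The line without a colon is dropped because nothing marks where its key ends, which mirrors why separators improve detection accuracy.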
The Auto Create Fields (Single-Instance) option in the Region Properties panel lets users create specific fields by selecting the relevant data on the report designer. It is designed as a sub-feature of AGL for extracting useful key-value pairs from an unstructured document: if AGL misses a key-value pair, users can select that data and use Auto Create Fields (Single-Instance) to create the fields instantly.
Note: The existing Auto Create Fields (Collection) option works best for collection regions because it accurately picks up header values. The new Auto Create Fields (Single-Instance) option is designed specifically for extracting key-value pairs, which can only be done in a single-instance data region.
This feature extracts the field data and determines the field names automatically.
Note: In order to use this option, make sure the correct data region is selected.
Users can auto-create a table in two steps:
Select the data you want to extract.
Right-click the selected data and select Auto Create Table.
This option automatically creates a table region and extracts the data fields within this region.
Here are some pointers to keep in mind to increase the accuracy of the Auto Create Table feature:
Auto Create Table supports only PDF files written in English.
For table detection, the best results are achieved with:
Single-line rows and headers.
Tables with clear boundaries.
Uniform column spacing in tables without boundaries.
A consistent format (identical header names, header spans, and field lengths) if the table spans multiple pages.
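The spacing recommendations can be illustrated with a simple whitespace heuristic for borderless tables. This is a minimal sketch, not how Auto Create Table actually works: it assumes single-line rows whose columns are separated by at least two spaces, and the functions split_columns and parse_borderless_table, as well as the sample rows, are hypothetical.

```python
import re
from typing import Dict, List


def split_columns(row: str) -> List[str]:
    """Hypothetical sketch: split a single-line text row into cells wherever
    two or more consecutive spaces appear."""
    return re.split(r" {2,}", row.strip())


def parse_borderless_table(rows: List[str]) -> List[Dict[str, str]]:
    """Use the first row as the header and map each later row onto it.
    Rows whose cell count differs from the header's are skipped, which is
    what happens when column spacing is not uniform."""
    header = split_columns(rows[0])
    records: List[Dict[str, str]] = []
    for row in rows[1:]:
        cells = split_columns(row)
        if len(cells) != len(header):
            continue  # spacing did not yield exactly one cell per column
        records.append(dict(zip(header, cells)))
    return records


# Made-up sample rows; the last one has irregular spacing.
table_text = [
    "Item        Quantity   Unit Price",
    "Widget A    10         4.50",
    "Widget B    2          12.00",
    "Widget C 5  3.25",  # single space merges the Item and Quantity cells
]
print(parse_borderless_table(table_text))
# [{'Item': 'Widget A', 'Quantity': '10', 'Unit Price': '4.50'}, {'Item': 'Widget B', 'Quantity': '2', 'Unit Price': '12.00'}]
```

The last row is lost because a single space between "Widget C" and "5" is indistinguishable from a space inside a cell value, which is why uniform column spacing matters when a table has no boundaries.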
This concludes our discussion of the Auto Generate Layout feature. You can also go through this use case to better understand the feature and its usage. For extracting data from files based on specific layouts, see AI Powered Data Extraction in Astera.