Smart Document Source
The Smart Document Source object in Astera is designed to parse and map data from documents with varying formats and field structures. It supports formats such as JSON, CSV, Excel, TXT, and delimited files. The object adapts to differences in document layouts at runtime, including changes in field names, data structures, or formats. It also supports exact matching, synonym dictionaries, and AI-based semantic matching to align extracted fields with target layouts.
Getting the Smart Document Source Object
To get a Smart Document Source object, go to Toolbox > Sources > Smart Document Source. If you cannot see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
Drag and drop the Smart Document Source object onto the designer.

Configuring the Smart Document Source Object
To configure the Smart Document Source object, right-click on its header and select Properties from the context menu.
The Layout Builder window will open. Here, you can create the output layout, which acts as the standard layout that all your final data should follow.
Once the layout is configured, click Next. The Properties window will open with the following configuration options:

Source File Path: Provide the file path for the source document to establish connectivity to the source data.
Map Options

Mark Unmapped Fields As Error: Marks any output layout fields that are not mapped to input fields as errors during the data preview. By default, unmapped fields are marked as warnings.
Write Source File Layout to File: Creates a layout file with the structure of the source file (including field names, headers, and data types).
Write Field Maps to File: Creates a mapping file that details the field names from the input and output layouts, along with the matching steps used.
When file paths are provided, these files are generated every time a document is processed through the Smart Document Source object. If no file paths are provided, the files are not generated. Once you have configured the layouts and mappings, you can convert them to a synonym dictionary for reuse.
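The effect of Mark Unmapped Fields As Error can be pictured with a minimal Python sketch. This is an illustration only, not Astera's implementation: the function `check_unmapped`, the field names, and the report format are all hypothetical.

```python
# Illustrative sketch only; not Astera's implementation.
# check_unmapped and all field names below are hypothetical.

def check_unmapped(output_layout, field_map, mark_as_error=False):
    """Report every output field that has no mapped input field.

    The severity is "warning" by default and becomes "error" when
    mark_as_error is set, mirroring the option described above.
    """
    severity = "error" if mark_as_error else "warning"
    return [(field, severity) for field in output_layout if field not in field_map]


# Hypothetical output layout and field map (output field -> input field).
output_layout = ["CustomerName", "InvoiceDate", "Amount"]
field_map = {"CustomerName": "Client", "Amount": "Total"}

default_report = check_unmapped(output_layout, field_map)
strict_report = check_unmapped(output_layout, field_map, mark_as_error=True)

print(default_report)
print(strict_report)
```

In this sketch, InvoiceDate has no input field mapped to it, so it is reported as a warning by default and as an error when the option is enabled.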
Smart Match Options
The Smart Document Source object uses several matching strategies to identify fields in the input document and match them with fields in the target layout:
All: Applies the available matching strategies in sequence until a match is found.
Exact: Searches for an exact, case-insensitive match between input and output fields.
SynonymDictionary: Searches for alternate field names in a predefined synonym dictionary. If selected, you must provide the file path for the synonym dictionary. To learn how to create a synonym dictionary, refer to the synonym dictionary documentation.
AiSemanticMatch: Uses AI to semantically match the input fields with the appropriate output fields (using the SemanticMatching LLM Template).
Template Name: Select the relevant LLM Template to use for AI matching.
Synonym Dictionary File Path: Provide the file path for the synonym dictionary to be used for matching purposes.
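As a rough illustration of how these strategies could be chained, the following Python sketch tries an exact, case-insensitive match first and then falls back to a synonym lookup. It is a sketch under stated assumptions, not Astera's implementation: the function names, the in-memory dictionary format, the field names, and the order of the strategies are all hypothetical, and the AI-based semantic step is left out because it depends on an LLM template.

```python
# Illustrative sketch only; not Astera's implementation.
# All names, the dictionary format, and the strategy order are assumptions.

def normalize(name):
    """Normalize a field name for case-insensitive comparison."""
    return name.strip().casefold()


def exact_match(source_field, target_fields):
    """Return the target field whose name equals source_field, ignoring case."""
    wanted = normalize(source_field)
    for target in target_fields:
        if normalize(target) == wanted:
            return target
    return None


def synonym_match(source_field, target_fields, synonyms):
    """Return the target field that lists source_field as an alternate name."""
    wanted = normalize(source_field)
    for target in target_fields:
        alternates = {normalize(s) for s in synonyms.get(target, [])}
        if wanted in alternates:
            return target
    return None


def match_field(source_field, target_fields, synonyms):
    """Try each strategy in turn; return (target, strategy) or (None, None)."""
    target = exact_match(source_field, target_fields)
    if target is not None:
        return target, "Exact"
    target = synonym_match(source_field, target_fields, synonyms)
    if target is not None:
        return target, "SynonymDictionary"
    # An AI-based semantic match would be attempted here; omitted in this sketch.
    return None, None


# Hypothetical target layout and synonym dictionary.
target_layout = ["CustomerName", "InvoiceDate", "Amount"]
synonyms = {
    "CustomerName": ["Client", "Buyer Name"],
    "Amount": ["Total", "Invoice Total"],
}

results = {
    field: match_field(field, target_layout, synonyms)
    for field in ["customername", "Total", "Ship Via"]
}
for field, (target, strategy) in results.items():
    print(field, "->", target, "via", strategy)
```

Recording the strategy alongside each match mirrors the idea behind the field map file, which lists the matching steps used for each field.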
Once the properties are configured, click Next. The Config Parameters window will open. Here, you can define parameters that make deployment easier and let you change configuration values without modifying the flow itself.
Note: Any parameters left blank will use the default values assigned in the properties page.
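The fallback behavior described in the note can be sketched in a few lines of Python. This is only an illustration of the idea, not how Astera resolves parameters internally: the function `resolve_settings`, the parameter names, and the file paths are hypothetical.

```python
# Illustrative sketch only; not Astera's implementation.
# resolve_settings, the parameter names, and the paths are hypothetical.

def resolve_settings(defaults, parameters):
    """Overlay parameters on defaults, ignoring parameters left blank."""
    resolved = dict(defaults)
    for key, value in parameters.items():
        if value is not None and str(value).strip() != "":
            resolved[key] = value
    return resolved


# Values assumed to have been set on the properties page.
defaults = {
    "SourceFilePath": "C:/data/invoice.csv",
    "SynonymDictionaryFilePath": "C:/data/synonyms.json",
}

# Values assumed to have been supplied as config parameters at deployment.
parameters = {
    "SourceFilePath": "D:/incoming/invoice_0425.xlsx",
    "SynonymDictionaryFilePath": "",  # left blank, so the default applies
}

resolved = resolve_settings(defaults, parameters)
print(resolved)
```

Here the source file path is overridden by the parameter, while the blank synonym dictionary parameter falls back to the value from the properties page.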
Once you've gone through all the configuration options, click OK to finalize the setup. The Smart Document Source object is now configured and ready to be used in your dataflow.
You can also use the Preview Output option to see the Data Preview.