Yes. Data from unstructured documents can be extracted and utilized in dataflows using the Report Source object. The Report Source object takes the file paths of the unstructured document and report model to deliver the extracted data in the output.
To use the Report Source object, you will need to create a report model first. Click here for more information on creating a Report Model in Astera.
To process files in a directory, the File System Items Source object is used. This object is available in dataflows, workflows and API flows. File System Items Source takes the path of a folder or directory and executes the object connected to its output for every file in the directory, when configured as a ‘loop’.
To process multiple files in a dataflow, a file source object such as Delimited File Source or Excel File Source can be used as a transformation object, and the File System Items Source object can then provide the input file path dynamically for every file in a folder.
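The looping behavior described above can be sketched outside of Astera in plain Python. This is only an illustration of the idea, not how Astera implements it: the function name `iter_rows_from_folder`, the `*.csv` pattern, and the `FilePath` column name are assumptions made for the example.

```python
import csv
from pathlib import Path
from typing import Iterator


def iter_rows_from_folder(folder: str, pattern: str = "*.csv") -> Iterator[dict]:
    """Yield every row of every matching file, tagged with the file it came from.

    Hypothetical helper: the directory listing plays the role of the
    File System Items Source, and csv.DictReader plays the role of a
    delimited file source that receives its path dynamically.
    """
    for path in sorted(Path(folder).glob(pattern)):
        if not path.is_file():
            continue  # skip sub-folders whose names happen to match the pattern
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["FilePath"] = str(path)  # keep the originating file path
                yield row
```

Each yielded row carries the path of its source file, which mirrors writing the file path to the destination alongside the data.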
Click to learn how File System Items Source is used in a dataflow and workflow.
The File System Items Source object allows reading from multiple files. In a dataflow, it passes the fullpath of each file as input to a source object that is used as a transformation, so that rows are read from every source file. These data rows can be transformed as needed and written to a destination.
The filepath can also be written to the destination if required, as shown in the above screenshot.
If we want to write other fields from the File System Items Source to the destination, that can be done by using a Join object.
We will map the fields from the source object, along with filepath, to the first input of the Join object, and the fields from the File System Items Source object, along with fullpath, to the second input of the Join object.
We will use filepath from the source and fullpath from the File System Items Source as the join fields, and enable the option to sort the right and left inputs in the Join properties.
We can then map the joined fields to the destination object. The destination receives the data from the source files together with the associated data from the File System Items Source object, such as the file name, extension, and last access time.
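As a rough analogy for the join described above, the following Python sketch looks up file-system attributes by full path and merges them into each data row. The function names and the output column names (`FileName`, `Extension`, `LastAccessTime`, `FilePath`) are assumptions chosen for illustration; they are not taken from Astera.

```python
import csv
from datetime import datetime
from pathlib import Path


def file_metadata(folder):
    """Map each file's full path to a few file-system attributes (hypothetical helper)."""
    meta = {}
    for path in Path(folder).iterdir():
        if path.is_file():
            meta[str(path)] = {
                "FileName": path.name,
                "Extension": path.suffix,
                "LastAccessTime": datetime.fromtimestamp(path.stat().st_atime),
            }
    return meta


def rows_with_metadata(folder, pattern="*.csv"):
    """Join data rows to file attributes on file path == full path."""
    meta = file_metadata(folder)
    joined = []
    for path in sorted(Path(folder).glob(pattern)):
        if not path.is_file():
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["FilePath"] = str(path)
                # The dictionary lookup stands in for the Join object.
                joined.append({**row, **meta[row["FilePath"]]})
    return joined
```

A dictionary lookup is used here instead of a sorted merge because it is the simplest way to express an equality join in Python; the result is the same set of combined records.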
Yes. It is possible to split a source file based on the maximum number of records you want in each file. You can use the Sequence Generator and the Write to Multiple Files option in your preferred file destination type to do this.
Create a dataflow as below. A default file path for the output file and the preferred batch size for files are provided as constants. An in-memory sequence generator, starting at 0 with a step size of 1, increments for each record passed to the Expression object.
The Expression object creates a file path for each record of the source file using the formula shown below:
This formula appends a number to the file name in the default output file path, and the number increases by one each time the record count passes another multiple of the batch size.
For example, the default output file path is
.........\OrderDetails.xls
For the first 100 records, the NextVal is 0 to 99 and the file path evaluates to be:
.........\OrderDetails0.xls
For the next 100 records, the NextVal is 100 to 199 and the file path evaluates to be:
........\OrderDetails1.xls
And so on.
If we run the dataflow, multiple output files will be created in the same folder as the default output file path, each carrying records up to the batch size.
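The numbering in the example above is consistent with integer division of the sequence value by the batch size. The sketch below reproduces that arithmetic in Python; it is an assumption about the logic, not the actual Astera expression, and the function name `batch_file_path` and the folder `C:\Output` are hypothetical.

```python
from pathlib import PureWindowsPath


def batch_file_path(default_path: str, next_val: int, batch_size: int) -> str:
    """Insert the batch number between the file name and its extension."""
    path = PureWindowsPath(default_path)
    batch_number = next_val // batch_size  # 0 for records 0-99, 1 for 100-199, ...
    return str(path.with_name(f"{path.stem}{batch_number}{path.suffix}"))


# Hypothetical default path, with a batch size of 100 as in the example above.
print(batch_file_path(r"C:\Output\OrderDetails.xls", 0, 100))    # C:\Output\OrderDetails0.xls
print(batch_file_path(r"C:\Output\OrderDetails.xls", 99, 100))   # C:\Output\OrderDetails0.xls
print(batch_file_path(r"C:\Output\OrderDetails.xls", 100, 100))  # C:\Output\OrderDetails1.xls
```

Because every record in a batch evaluates to the same path, a destination that writes to the path given per record ends up with one file per batch.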
Yes. Extraction from fillable PDF forms does not require designing a report model. The PDF Form Source object, available in the dataflow toolbox, extracts data from fillable PDFs and takes only the path of the PDF file as input.
Click here for more information on how to use the PDF Form Source object.
During a full load, the entire data set is processed on each run. During an incremental load, an Audit Field is specified, and the Database Source uses it to read only the incremental data, that is, the records not yet processed in an earlier run. The full and incremental load options can be found in the properties of the Database Source object. For more information on full and incremental loading options, click here.
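The difference between the two modes can be illustrated with a generic watermark query. The sketch below uses Python's built-in sqlite3 module; the table name `orders`, the audit column `modified_at`, and the function `read_rows` are assumptions for the example and do not describe how Astera stores or tracks the audit value.

```python
import sqlite3


def read_rows(conn, last_value=None):
    """Full load when last_value is None; otherwise read only rows past the watermark."""
    if last_value is None:
        cursor = conn.execute(
            "SELECT id, modified_at FROM orders ORDER BY modified_at"
        )
    else:
        cursor = conn.execute(
            "SELECT id, modified_at FROM orders "
            "WHERE modified_at > ? ORDER BY modified_at",
            (last_value,),
        )
    rows = cursor.fetchall()
    # Remember the highest audit value seen so the next run can start after it.
    new_watermark = rows[-1][1] if rows else last_value
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, modified_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-02")],
)
full_rows, watermark = read_rows(conn)                  # reads both rows
conn.execute("INSERT INTO orders VALUES (3, '2024-01-03')")
new_rows, watermark = read_rows(conn, watermark)        # reads only the new row
```

ISO-formatted date strings are used so that plain string comparison orders them chronologically.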
A File System Items Source points to a folder or directory containing multiple files that are to be processed in a flow. In a dataflow, it provides a dynamic file path to a source object that is used as a transformation. It can run either in singleton mode or in a loop. It requires the path of the directory where the source files reside. It also provides an option to include items in subdirectories; when checked, all the files present in the sub-folders are included.
To learn more about how the File System Items Source works in using sources as transformations, click here.
File System Items Source can be found in Toolbox > Sources > File System Items Source.
The PDF Form Source in Astera is used to extract data from PDF files only, whereas the Report Source in Astera extracts data from unstructured files (including .txt and PDF files) using a report model.
With PDF Form Source, you need to provide the file path in the properties to extract your data.
However, with Report Source, you first point to a source file and then to a corresponding report model, which serves as the extraction template.
Yes, Astera supports the extraction as well as integration of data from EDI files. There is a dedicated section for EDI data integration in the Toolbox that includes features such as EDI Source File, EDI Destination File, EDI Message Parser, and EDI Serializer. To use these features in a Dataflow, go to Toolbox > EDI.
Yes, Astera supports the extraction of data residing in online resources. There are various built-in features such as Cloud Storage Connection, which allows users to connect to supported cloud providers, such as Microsoft Azure Blob Storage, Amazon S3, Microsoft SharePoint and Google Cloud Storage. After connecting to the provider, the user will be able to browse files available on that cloud platform within the relevant source object.
Other features to extract online data include File Transfer Protocol (FTP) which enables users to access files from a remote directory, Email Source through which we can extract file attachments from emails, Electronic Data Interchange (EDI) allowing users to exchange data with partners over all X12 and EDIFACT formats, and REST API connector which is used for sourcing data from an API.
The Raw Text Filter option is available in the Fixed Length File Source and Delimited File Source properties.
Raw text filter is used to filter out the incoming records which you do not want to process in your Dataflow. This option is also useful in processing files that contain multiple record types. It also provides the flexibility to source selected data by applying filters at the point of extraction, without modifying the original source data in any way. You can choose from the following three options to filter data:
No filter. Process all records: This is the default option, and it processes all the records from the source without filtering them out.
Process if begins with: This option processes only the records that start with a specified letter, digit, character, word, numeric value, or phrase.
Process if matches the regular expression: This option is useful if you want to use a regular expression to extract matching records.
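The three filter options can be pictured with a small Python generator. This is a sketch of the concept only: the function name `raw_text_filter` and its parameters are hypothetical, and whether Astera matches the regular expression anywhere in the record (as `re.search` does here) or only from its start is an assumption.

```python
import re


def raw_text_filter(lines, begins_with=None, pattern=None):
    """Yield only the raw records that pass the chosen filter.

    With neither argument set, every record is processed (no filter).
    """
    regex = re.compile(pattern) if pattern is not None else None
    for line in lines:
        if begins_with is not None and not line.startswith(begins_with):
            continue  # "process if begins with" did not match
        if regex is not None and not regex.search(line):
            continue  # "process if matches the regular expression" did not match
        yield line


# Hypothetical file with two record types: header (H) and detail (D) rows.
records = ["H|2024-01-01|batch-7", "D|1001|widget", "D|1002|gadget"]
details = list(raw_text_filter(records, begins_with="D|"))
numbered = list(raw_text_filter(records, pattern=r"\|100[0-9]\|"))
```

Filtering the raw text before parsing is what lets a single file with several record types feed different layouts, since each layout only sees the lines meant for it.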
Yes, Astera supports data extraction from unstructured documents in formats such as text, Excel, PRN, and PDF files through ReportMiner.
To create a new Report Model, go to the menu bar and select File > New > Report Model.
Click here for more information on creating a Report Model in Astera.