In this section we will go through different scenarios that might arise when extracting data and how report models in Astera can cater to them.
Astera can process an input stream of name data and split it into its constituent elements, such as first name, middle name, and last name, using the built-in auto-parsing feature.
In this document, we will learn how to automatically parse name data fields using the auto-parsing feature in Astera.
In this case, we have some unstructured data stored in a PDF file.
Download the sample PDF file from here.
This file is a customer list report that contains information such as full contact name, full address and account details of customers.
If you look at it, the Account field contains the full name with a title. We want to extract and parse this information into Prefix, First Name, Middle Name, Last Name, and Suffix.
1. Load the unstructured source file in the report model designer.
2. Add a data region. Specify the pattern by typing "ACCOUNT:" in the pattern bar.
3. Highlight the data region after "ACCOUNT:". Right-click on it and select Add Name Field from the context menu.
You can see that the parsed name field components such as NamePrefix, NameFirst, NameMiddle and NameLast have been added to the report model in the Model Layout panel under the Report Browser panel.
4. Preview the extracted data to make sure everything, including the different fields and data, is in place.
This concludes working with the Auto-Parsing feature in Astera.
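To make the idea of name parsing concrete, here is a minimal Python sketch of splitting a full name into parts. It is an illustration only and does not reproduce Astera's parsing rules: the `parse_name` function, the `PREFIXES` and `SUFFIXES` lists, and the `NameSuffix` key are assumptions made for this example. The other keys mirror the field names shown in the Model Layout panel.

```python
# Hypothetical illustration: these lists and rules are assumptions,
# not Astera's actual auto-parsing logic.
PREFIXES = {"mr", "mrs", "ms", "miss", "dr", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def parse_name(full_name: str) -> dict:
    """Split a full name into prefix, first, middle, last, and suffix parts."""
    tokens = full_name.replace(",", " ").split()
    parts = {
        "NamePrefix": "",
        "NameFirst": "",
        "NameMiddle": "",
        "NameLast": "",
        "NameSuffix": "",
    }

    # A leading title such as "Mr." becomes the prefix.
    if tokens and tokens[0].rstrip(".").lower() in PREFIXES:
        parts["NamePrefix"] = tokens.pop(0)
    # A trailing token such as "Jr." becomes the suffix.
    if tokens and tokens[-1].rstrip(".").lower() in SUFFIXES:
        parts["NameSuffix"] = tokens.pop()

    if tokens:
        parts["NameFirst"] = tokens.pop(0)
    if tokens:
        parts["NameLast"] = tokens.pop()
    # Anything left between the first and last name is the middle name.
    parts["NameMiddle"] = " ".join(tokens)
    return parts


print(parse_name("Mr. John Michael Smith Jr."))
```

Real name data has many more edge cases (compound surnames, multiple titles), which is why a built-in parser is preferable to hand-written rules like these.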
Apply Pattern to Line is useful when the specified pattern does not capture the first line of the desired data region or when there is some information above the pattern keyword. In that case, we increase Apply Pattern to Line from 0.
1. Open a Report Model in Astera by going to File > New > Report Model.
2. Select your file in the window that opens.
Astera supports extraction of unstructured data from text, EDI, Excel, PRN, and PDF files. In this example, we are extracting data from a text file. Download the sample text file from here.
There are many options available on the Report Options panel to configure how you want Astera to read your file. The reading options depend on the file type. For example, if you have a PDF file, you can select the scaling factor, font, tab size, and passwords.
You can read about these options here.
3. Click Open. A text file containing contact information will open in Report Model designer.
Now that our file has been loaded to Astera, we will create an extraction template.
1. Right-click on the Record node in Model layout under the Report Browser panel. Select Add Data Region.
A pattern box, Region Properties panel, and Pattern Properties panel appear above the Report Model designer.
2. Specify a pattern that Astera can match on your file to capture data. In this example, we want to capture the data region highlighted in yellow.
For this, write "Contact Information" in the pattern box to match it on the file as shown below.
3. Notice that specifying a pattern alone is not enough to capture the entire data region in this example. Hence, we will increase its Line Count to 9.
Observe that the data in lines above the data region (highlighted in grey) is still not captured.
4. To capture data in these lines, increase the Apply Pattern to Line to 1 in the Pattern Properties panel as shown below.
Now, the region is completely captured.
5. Once our data region has been defined, the next step is to create fields. For that, highlight each field area, right-click and select Add Data Field.
Repeat the process to create more data fields and name them as shown below. You can see the layout of the extraction template in the Model Layout panel.
6. Preview data by clicking on the Preview Data icon placed in the toolbar at the top of the designer window. It will first ask you to save the file.
Once saved, a Data Preview window will open displaying a preview of the extracted data.
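The effect of Apply Pattern to Line can be sketched in a few lines of Python. This is a simplified model, not Astera's implementation: it assumes that the region starts `apply_pattern_to_line` lines above the line that matches the pattern and spans `line_count` lines in total. The `capture_regions` function and the sample `report` lines are hypothetical.

```python
def capture_regions(lines, pattern, line_count=1, apply_pattern_to_line=0):
    """Return one region (a list of lines) per line that contains `pattern`."""
    regions = []
    for i, line in enumerate(lines):
        if pattern in line:
            # Shift the start of the region upwards by the configured offset.
            start = max(i - apply_pattern_to_line, 0)
            regions.append(lines[start:start + line_count])
    return regions


# Hypothetical sample data: an ID line sits above the pattern keyword.
report = [
    "ID: 1001",
    "Contact Information",
    "Name: Jane Doe",
    "Phone: 555-0100",
    "ID: 1002",
    "Contact Information",
    "Name: John Roe",
    "Phone: 555-0101",
]

# With the default offset of 0, the ID line above the keyword is missed.
print(capture_regions(report, "Contact Information", line_count=3))

# With the pattern applied to line 1, the line above the keyword is included.
print(capture_regions(report, "Contact Information", line_count=4,
                      apply_pattern_to_line=1))
```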
As the name suggests, the "Pattern is a Regular Expression" feature in Astera reads the specified pattern as a regular expression. A regular expression is a special text string used to define a search pattern. You can think of regular expressions as a more powerful form of wildcards. For example, a wildcard notation such as *.txt is used to find all text files in a file manager.
In this case, we have some unstructured data stored in a .txt file.
Download the sample txt file from here.
This file contains contact details of the business dealers of the company.
Here, we want to capture the information including name, company, and state of these dealers along with their phone and fax numbers. Notice that the information in the phone and fax fields is written in a different format than the rest of the data on this file. In order to capture this information, we will use the Pattern is a Regular Expression feature.
Load the unstructured source file in the report model designer window.
Right-click on the Record node in the Model layout panel and select Add Data Region from the context menu.
A pattern-matching box, properties panel, and a Data node are added on the Report Model screen.
Select the Pattern is a Regular Expression icon present in the Pattern Properties panel. This is done so that Astera can capture phone and fax numbers that have been formatted differently, through a regular expression.
Specify the pattern in the form of a regular expression to capture the required data region.
In this case, the first symbol is \ (a backslash). In a regular expression, a backslash is an escape character: it tells Astera to read the symbol that follows it literally instead of as a regular expression operator.
You can see that the entire data has been highlighted.
Next, put ( in the pattern box after the backslash. The pattern \( now matches a literal open bracket, which indicates that the required data starts with an open bracket.
Notice that only the data lines with phone numbers starting with an open bracket have been highlighted. Since some phone numbers do not have an open bracket as the first character, put ? in the pattern box after (, which will indicate that the data may or may not start with an open bracket.
Notice that all the numbers use the x character. So, write x in the pattern box after the question mark. You can see that all the data lines with phone and fax numbers have been selected now.
To be more precise, you can put + (a plus sign) after x, indicating that the character x appears one or more times.
The pattern in the form of a regular expression has been specified.
To capture the rest of the data, increase the Line Count to 3 and Apply Pattern to Line to 1.
The required data region has been selected. Now, let's create data fields.
Highlight the information inside the data region, right-click on it and select Add Data Field from the context menu. In this case, let's rename it to Name.
Repeat the same process to create more data fields and name them as shown below. You can see the layout of the extraction template in the Model Layout panel.
Preview the data by clicking on Preview Data icon placed in the toolbar at the top of the designer window.
A Data Preview window will open displaying a preview of the extracted data.
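The regular expression built up in the steps above is `\(?x+`: an optional open bracket followed by one or more x characters. The short Python sketch below shows which lines such a pattern selects. It uses Python's `re` module purely for illustration (regex engines can differ in details), and the sample lines are hypothetical rather than taken from the sample file.

```python
import re

# Optional "(" (escaped with a backslash), then one or more "x" characters.
pattern = re.compile(r"\(?x+")

# Hypothetical sample lines: one record header and two phone/fax lines.
lines = [
    "John Smith      Acme Corp      NY",
    "Phone: (xxx) xxx-xxxx   Fax: xxx-xxx-xxxx",
    "Phone: xxx-xxx-xxxx     Fax: (xxx) xxx-xxxx",
]

# Keep only the lines where the pattern is found.
matching = [line for line in lines if pattern.search(line)]

for line in matching:
    first_hit = pattern.search(line).group()
    print(f"{first_hit!r} found in: {line}")
```

Both phone formats are matched because the `?` makes the open bracket optional, while the header line is skipped because it contains no x character.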
To be able to export data from report models, you first need to create data regions and data fields. One way to do this is through the Auto Create (Fields or Regions) option. This method takes two lines as a sample of the data that is to be extracted and creates the desired region or fields from them. This is the simplest and fastest method for creating data regions and fields in a report model.
1. To auto-create a data region, click on the grey column to the left of one of the lines that is to be included in your region.
A green marker will appear which means that the line is to be included.
Note: If you have placed the marker on the wrong line, click on it twice. It will first turn red and then be removed.
2. Using this method, specify the two lines that you wish to capture. Once the two lines are indicated, Astera creates a data region based on the selections.
Look at the pattern-matching bar above the report model designer. Astera has used the Match any digit wildcard to capture all the data fields. This is because the QUANTITY of an item is always a digit.
Your data region has been created. The data that can be captured is highlighted, and a data region node (with the default name “Data”) will appear underneath Record Node in the Model Layout panel on the left-hand side of your report model designer.
Data regions can be renamed by using the Region Name option in the Region Properties panel. It is recommended to rename every data region and field to avoid confusion.
Your data region has now been created and renamed.
Once your data region has been created, the next step is to create data fields. Data fields show exactly what data will be extracted from the source file. As with the data region, Astera can also automatically create your data fields.
Right-click on your data region and select Auto Create Fields option.
Your data fields have been automatically created and named.
In the figure below, the data fields are nested under the data region in the Model Layout panel. As you click the ITEM data field, you can see a grey highlight around ITEM in the model hierarchy. This is the field that has been selected.
The Line Count option in a report model in Astera enables users to specify the number of lines over which a data region spans. This feature is useful when transposing data that appears in rows in an unstructured file and converting it into vertical fields (columns) inside a report model.
In this document, we will explore how the Line Count feature helps with the selection of a data region in Astera.
In this case, we have unstructured data in a PDF file.
Download the sample PDF file from the following link:
This file contains a customer list report including their account name, contact, and address details.
You can see that a single record spans over 4 lines on this PDF file. In order to capture this data and place it into different fields, we will use the Line Count feature.
First, let's load this unstructured file onto the report model designer.
Go to File > New > Report Model and select the source file from your directory.
There are many options available in the Report Options panel to configure how Astera reads the unstructured PDF file such as specifying the scaling factor, font, tab size, and password.
You can read about these options here.
Click OK and the source file will open on the designer.
Now that we have loaded the source file in Astera, let's create an extraction template.
1. Right-click on the Record node under the Model layout panel and select Add Data Region from the context menu.
A pattern-matching bar and Region Properties panel will appear and a Data node will be added under the Record node in the Model Layout panel.
2. Specify a pattern with which Astera can match your source file to capture the desired data. You can use an alphabet, character, number, word, or a wild card or any combination of these to define your pattern.
In this case, write "ACCOUNT" in the pattern-matching bar as shown below.
You can see that only one line of information is captured by the matching pattern. In order to capture the data spanning 4 lines, we must increase the Line Count to 4.
3. Increase the Line Count value to 4 from the Region Properties panel.
Each data region block contains information of a single record. Let's create data fields.
4. Highlight the data region after "ACCOUNT:", right-click on it and select Add Data Field from the context menu. Rename it to Account.
5. Repeat the process to create more data fields. You can see the layout of the extraction template in the Model Layout panel.
6. You can preview data by clicking on the Preview Data icon placed in the toolbar at the top of the designer window.
A Data Preview window will open displaying a preview of the extracted data.
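As a rough sketch of what Line Count achieves, the Python snippet below groups each block of lines that starts with a pattern into a single record, turning rows of the report into columns. It is a simplification: Astera fields are defined by highlighting positions, whereas this example assumes every line has a "LABEL: value" shape. The `extract_records` function and the sample `customer_report` lines are hypothetical.

```python
def extract_records(lines, pattern, line_count):
    """Turn each block of `line_count` lines starting at `pattern` into a record."""
    records = []
    for i, line in enumerate(lines):
        if line.startswith(pattern):
            record = {}
            # Every line in the block contributes one field to the record.
            for row in lines[i:i + line_count]:
                label, _, value = row.partition(":")
                record[label.strip().title()] = value.strip()
            records.append(record)
    return records


# Hypothetical sample data: each customer record spans 4 lines.
customer_report = [
    "ACCOUNT: Mr. John Smith",
    "CONTACT: Jane Doe",
    "ADDRESS: 12 Main Street",
    "CITY: Austin, TX",
    "ACCOUNT: Dr. Mary Jones",
    "CONTACT: Sam Lee",
    "ADDRESS: 4 Oak Avenue",
    "CITY: Denver, CO",
]

# A line count of 1 captures only the ACCOUNT line of each record.
print(extract_records(customer_report, "ACCOUNT", line_count=1))

# A line count of 4 captures the full record.
print(extract_records(customer_report, "ACCOUNT", line_count=4))
```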
This concludes using the Line Count option while building an extraction template in Astera.
A floating field pattern in Astera refers to the data points that are scattered throughout the report. The Floating Pattern option in Astera allows you to capture each data field no matter the location. With this option you must first specify a pattern in the pattern-matching bar and then check the Floating Pattern option. When the Floating Pattern option is enabled, it will capture all the lines that match the specified pattern.
The Floating Field option will automatically appear when the Floating Pattern option is checked. This option will capture all the misaligned data fields. It is enabled by default; however, you can disable it if you want the field positions to be fixed.
To work with floating patterns and floating fields, follow the steps mentioned below:
Go to File > New > Report Model and select the report on which you want to apply the Floating Pattern option. Click Open. The source report will open in the designer window.
You can download the sample text file from here.
Right click on the Record node present in the Model Layout panel and click Add Data Region.
You will see a yellow bar, which is the pattern-matching bar, appear along with the Pattern Properties and Region Properties sections.
Write a pattern in the pattern-matching bar to capture the desired data region in your report. In this case, three match any alphabet and five match any digit wildcards are used to specify the pattern. All data regions and data fields except the misaligned ones will be captured.
(Read: To know more about using wildcards)
Now, select the Floating Pattern icon in the Pattern Properties section. You will see that all the misaligned patterns and fields will be captured.
Right-click on data that you want to add in your data field and click Add Data Field. This will create a sub-node in the model layout. This sub-node is a data field and contains data that you have selected.
In this example, a sub-node for ITEM has been created with all the ITEM data being captured.
Note: The ITEM column in the first data field contains null values under OFFICE CHAIRS. Astera allows you to use any value from “Value if Null” option in the Field Properties. In this case, we have selected the Use From Previous Record option.
Repeat these steps to create other fields.
You can now preview your data by clicking on the Preview Data option in toolbar. You will be prompted to save the report model first. Once saved, the Data Preview window will open.
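The difference between a fixed pattern and a floating pattern can be sketched in Python as follows. This is an illustration under assumptions, not Astera's matching logic: the fixed case is modelled as "the pattern must start at a given column", and the floating case as "the pattern may appear anywhere in the line". The `ITEM_CODE` expression stands in for the three-letter, five-digit wildcard pattern, and the sample lines are hypothetical.

```python
import re

# Three letters followed by five digits, standing in for the wildcard pattern.
ITEM_CODE = re.compile(r"[A-Za-z]{3}\d{5}")


def fixed_matches(lines, column):
    """Lines where the item code starts exactly at `column`."""
    return [ln for ln in lines if ITEM_CODE.match(ln, column)]


def floating_matches(lines):
    """Lines where the item code appears anywhere in the line."""
    return [ln for ln in lines if ITEM_CODE.search(ln)]


# Hypothetical sample data: the second item line is indented (misaligned).
lines = [
    "ABC12345  DESK LAMP      4",
    "     DEF67890  OFFICE CHAIR   2",
    "ORDER TOTAL              6",
]

print(fixed_matches(lines, 0))      # misses the misaligned line
print(floating_matches(lines))      # captures both item lines

# A floating field takes the value from wherever the pattern was found.
print([ITEM_CODE.search(ln).group() for ln in floating_matches(lines)])
```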
The Scaling Factor applies to your PDF’s spacing value. This option is located in the Report Options panel. The Scaling Factor depends on the PDF writer that was used to create the file. In many cases, PDFs contain different fonts, headers, and footers. This is why the scaling may vary considerably in different PDFs. These variations are converted into text form based on the software’s standard. This includes an equal size font for each character and equal space to allow for proportionate spacing. In many cases, these font and header variations used by the PDF writer cause the scaling to look off while converting. If this is the case, you can fix it by changing the Scaling Factor option.
Go to File > New > Report Model, select the PDF and click OK.
You can download the sample PDF file from here:
As you load the PDF file on to the Report Model designer, a Report Options panel will appear. Here, you can notice that the data is not properly aligned.
To adjust the scaling of the source file, there is a Scaling Factor option present in the Reading Options section of the Report Options panel.
By default, the Scaling Factor is set to 0. When trying to adjust the scaling of PDF files, you can set the Scaling Factor to any value between 0 and 9.
Note: There is no definite rule to follow while specifying the Scaling Factor. It is usually set by trial and error.
For this PDF source file, the scaling factor is adjusted to 3. You can see that the data in the PDF is now aligned.
This concludes using the Scaling Factor option in Astera.
In Astera, you can manually specify the start position of a selected data field in the Field Properties panel. There are three options available in the drop-down menu to specify the Start Position, as shown below.
The start position can be fixed or can follow a particular string in the current or previous line. To learn more about the Start Position options, click here.
In this document, we will see how we can define multiple strings using comma-separated values to specify the start position of a data field. This feature is useful when the document does not follow a fixed pattern or has an inconsistent format. The purpose of this option will be clear with the use-case given below.
In this case, we have an unstructured document that contains order information and the details of items against each order. As we can see below, the order number is preceded by different strings: “ORDER NUMBER:”, “ORDER ID:” and “ORDER:”.
Let’s see how we can capture the order number for all the orders by specifying comma-separated strings to define the start position.
1. Here we have loaded the unstructured document in a report model. To learn how to load a document in a report model, click here.
2. Create the data region to capture the field containing the order number by defining an appropriate pattern. Define the pattern as 'ORDER' and rename the data region to Orders in the Region Properties panel.
3. To create the data field, select the data, right-click on it and select the Add Data Field option from the context menu. Rename the field to Order_ID.
4. As we can see in the blue highlighted area, the data in the field is misaligned as it does not reoccur at the same position.
5. To account for this variation in our document, go to Field Properties > Size and Position > Start Position and select Follows String in Current Line option from the drop-down menu. This option will start reading the field data following a particular string.
6. Since our data field follows multiple different strings, specify the comma-separated values in the text box. You may adjust the length of the field to capture the data correctly.
Here, we have a comma as part of the string which precedes the data for a particular field. To cater to this case, we can specify a list of comma-separated strings with double quote text qualifiers.
Here, you can see that a comma character is extracted along with the data because the comma after 'SHIPdate' is read as a separator and not as a part of the string.
To resolve this, enclose the comma-separated strings with text qualifiers (as shown below) so that the comma is read as part of the string.
This is how you can extract data in a particular field if it follows multiple strings using comma separated values in Astera.
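The behavior described above can be approximated with Python's `csv` module, which also treats double quotes as text qualifiers. This is a sketch under assumptions rather than Astera's implementation: `parse_start_strings` and `value_after` are hypothetical helpers, and the sample lines are invented for the example.

```python
import csv


def parse_start_strings(spec: str) -> list:
    """Split a comma-separated list of strings, honouring double-quote qualifiers."""
    row = next(csv.reader([spec], skipinitialspace=True))
    # Drop empty entries produced by stray commas.
    return [item for item in row if item]


def value_after(line: str, start_strings: list, length: int) -> str:
    """Return up to `length` characters that follow the first start string found."""
    # Try longer strings first so a short string never shadows a longer one.
    for marker in sorted(start_strings, key=len, reverse=True):
        pos = line.find(marker)
        if pos != -1:
            begin = pos + len(marker)
            return line[begin:begin + length].strip()
    return ""


markers = parse_start_strings("ORDER NUMBER:, ORDER ID:, ORDER:")
for line in ["ORDER NUMBER: 10248", "ORDER ID: 10249", "ORDER: 10250"]:
    print(value_after(line, markers, 8))

# Without qualifiers, the comma is read as a separator and leaks into the data.
unquoted = parse_start_strings("SHIPdate,")
print(value_after("SHIPdate, 05/01/2023", unquoted, 12))

# With qualifiers, the comma is part of the start string.
quoted = parse_start_strings('"SHIPdate,"')
print(value_after("SHIPdate, 05/01/2023", quoted, 12))
```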
Region End Type options are useful for defining where to end a particular data region. This option appears in the Region Details group-box in the Region Properties panel. There are several options to define the end-type of a region.
In this article, we will discuss the use of Till Regular Expression and Till Specific Text options.
Click here to learn more about the rest of the options.
In this case, we have an invoice containing details of the dealer, Global Cars, and the list of vehicles available for purchase. This is what the data looks like:
Here, we want to extract the dealer’s information by defining end-type of data regions using the Till Regular Expression and Till Specific Text options.
1. Go to File > New > Report Model and load the unstructured document in a new report model. This is what the file looks like in the report model designer.
2. Add a new data region to the Model Layout panel by right-clicking on the Record node and selecting the first option from the menu.
3. Define a pattern in the orange bar above the designer to capture the first line of the region. In this case, we are using "DEALER NAME" as the pattern.
Note: Make sure the pattern is vertically aligned with the data on the canvas.
4. Here, only the line containing the pattern is a part of the region. Let’s use the Till Specific Text option to define the end-type of the region in the Region Properties Panel.
We have used "EMAIL" as the specific text to capture all the lines until line 14.
The Line Count option determines the minimum number of lines for the region. Astera looks for the specific text or regular expression after a set number of lines (defined by Line Count) from where the pattern is matched. Note that Line Count takes precedence and determines the end-point of the region when the specific text or regular expression is not found in the document.
5. Now, the entire data region, starting from where the pattern is matched till the specific text, highlighted by grey area, has been captured.
6. Alternatively, you can also specify a regular expression to define the end-type of the region. Here, we have selected Till Regular Expression from the Region End Type drop-down menu and specified a regular expression to define the format of the email.
7. Now, create data fields in this region to capture all the required information.
8. You have successfully created a data region and extracted relevant data fields containing the dealer's information.
This is how you can capture data regions using the Till Specific Text and Till Regular Expression options as the Region End Type.
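The interaction between the end marker and Line Count can be sketched in Python. The sketch makes two assumptions that the text above does not settle: the line containing the end marker is included in the region, and the search for the marker begins on the line after the minimum region defined by Line Count. The `capture_region` function, the email expression, and the sample invoice lines are hypothetical.

```python
import re


def capture_region(lines, start_pattern, end_marker, line_count=1, is_regex=False):
    """Capture from the first line containing `start_pattern` to the end marker."""
    for i, line in enumerate(lines):
        if start_pattern not in line:
            continue
        # Look for the end marker only after the minimum number of lines.
        for j in range(i + line_count, len(lines)):
            if is_regex:
                found = re.search(end_marker, lines[j]) is not None
            else:
                found = end_marker in lines[j]
            if found:
                return lines[i:j + 1]
        # End marker not found: Line Count decides where the region ends.
        return lines[i:i + line_count]
    return []


# Hypothetical sample data loosely modelled on the invoice described above.
invoice = [
    "GLOBAL CARS INVOICE",
    "DEALER NAME: Global Cars",
    "ADDRESS: 1 Motor Way",
    "PHONE: 555-0100",
    "EMAIL: sales@globalcars.example",
    "VEHICLE LIST",
]

# Till Specific Text
print(capture_region(invoice, "DEALER NAME", "EMAIL"))

# Till Regular Expression (a simple, non-exhaustive email pattern)
print(capture_region(invoice, "DEALER NAME", r"[\w.+-]+@[\w-]+\.\w+", is_regex=True))

# Marker missing from the document: the region falls back to Line Count.
print(capture_region(invoice, "DEALER NAME", "FAX", line_count=2))
```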
Pattern Count is the number of patterns that Astera matches on your file to capture a data region. This is useful if more than one pattern is required to identify the beginning of your data region. You can specify up to five patterns in a report model at a time.
In this document, we will explore how the Pattern Count feature helps with the selection of a data region.
Open a Report Model in Astera by going to File > New > Report Model.
Provide the File Path for the unstructured file from your directory.
Astera supports extraction of unstructured data from Excel, CSV, text, PRN, PDF, Word, RTF, and XLS files. In this case, we are extracting data from a text file.
Click Open. A text file containing information regarding orders to a fictitious furniture store will open in the report model.
Now that the file is open, we will create an extraction template.
1. Right-click on the Record node in Model layout under the Report Browser panel and select Add Data Region from the context menu.
A pattern-matching bar and Region Properties panel will appear, and a subnode "Data" is added to the Record node in the Model Layout tab.
2. Specify the pattern that the report model can look for and match in your file to capture data. You can use an alphabet, character, number, word, a wild card or any combination of these to define your pattern.
Astera has built-in wild cards to facilitate region selection.
In this example, we want to capture the data highlighted in yellow. Notice that each item has a specific item code, which we can use as a pattern to extract all the item details.
3. The pattern is a combination of three alphabets, a hyphen, and five digits. You can use the relevant wildcards to specify the pattern. In this case, notice that some item-codes are different from this pattern. The digits in the codes appear before the alphabets. As a result, RUGS has not been captured in the data region.
4. In this scenario, to capture the region completely, we'll specify another pattern. You can specify up to five patterns in a single data region. We'll go to the Pattern Properties panel and increase the Pattern Count to 2. Another pattern bar appears.
5. On the second pattern bar, we'll specify another pattern where the 5 digits come before the 3 alphabets, separated by a hyphen. Now, all the lines with item details have been captured completely in the data region.
6. Once our data region is defined, the next step is to create data fields. To do that, you can highlight each field area, right-click and select Add Data Field.
7. Repeat the process to create more data fields and name them as shown below.
8. Preview data by clicking on the Preview Data icon placed in the toolbar at the top of the designer window.
9. A window will open, asking you to save the file before proceeding. Save the report model at your required path.
10. Once saved, a Data Preview window will open, displaying a preview of the extracted data.
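The two-pattern setup in this example can be sketched in Python with one regular expression per pattern bar. Following the example above, a line is treated as part of the region when it matches any of the patterns. The expressions below stand in for the wildcard patterns, and the `in_region` helper and sample lines are hypothetical.

```python
import re

PATTERNS = [
    re.compile(r"[A-Za-z]{3}-\d{5}"),  # pattern 1: three letters, hyphen, five digits
    re.compile(r"\d{5}-[A-Za-z]{3}"),  # pattern 2: five digits, hyphen, three letters
]


def in_region(line, patterns):
    """A line belongs to the region if it matches any of the given patterns."""
    return any(p.search(line) for p in patterns)


# Hypothetical sample data: the RUGS item code has the digits first.
order_lines = [
    "CHR-10021   OFFICE CHAIRS   4",
    "20045-RUG   RUGS            2",
    "SUBTOTAL                    6",
]

# Pattern Count = 1: the RUGS line is missed.
print([ln for ln in order_lines if in_region(ln, PATTERNS[:1])])

# Pattern Count = 2: both item lines are captured.
print([ln for ln in order_lines if in_region(ln, PATTERNS)])
```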
This concludes our discussion on working with an increased Pattern Count in Astera.
Once data fields are created to extract data in Astera, the Field Verification option verifies if the data fields have been captured properly for all the data instances. It does so by checking if any non-blank character is present adjacent to the instances of data fields.
In this document, we will look at how to use Field Verification in a report model.
We have a report model with data regions and data fields defined.
With this sample file, we can see that from row 28 onwards for the Description and Quantity columns, and row 22 onwards for the Line_Total column, the data fields do not properly contain the data.
With files containing much larger amounts of data, it would not be practical to find such discrepancies manually.
With the data region selected, click on the Start Field Verification icon in the toolbar above the designer.
All the data fields having discrepancies in their data points are marked with a warning sign in the Model Layout panel for the selected data region. Field verification checks for any non-blank character adjacent to an instance for a data field to determine the discrepancy.
Select the data field Description and click on the Previous and Next icons in the toolbar above the designer to traverse among the discrepant instances for that particular data field.
Astera also specifies where the discrepancy has occurred for each instance of the data field, below the designer.
You can click on the Auto adjust all fields icon on the toolbar above the designer to fix the discrepancies in the data fields for the selected region.
This option has adjusted the lengths of the data fields in the selected data region to contain the complete data in each data point.
Note: In cases where two data fields are adjacent to one another, manually adjust the length of the data field to avoid inconsistencies.
Field Verification still gives warnings on row 29 for the Unit_Price and Line_Total fields.
It suggests that there is a non-blank character on the right of the Unit_Price field. In this case, the character is part of the Line_Total field.
We can uncheck the Verify Right Boundary option in the Field Verification section of the Field Properties panel for Unit_Price, and the Verify Left Boundary option for Line_Total, and Astera will ignore the adjacent non-blank characters.
These Verify Left/Right Boundary checkboxes allow the user to ignore the verification in cases when the adjacent character does not belong to the instance of a data field.
This concludes our discussion on how to use Field Verification in Astera.
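The boundary check described above (a non-blank character adjacent to a field instance) can be sketched in Python. This is a simplified model of the idea, not Astera's implementation: `verify_field` is a hypothetical helper that treats a field as a start position and a length on a single line, and the `check_left` and `check_right` flags play the role of the Verify Left/Right Boundary checkboxes.

```python
def verify_field(line, start, length, check_left=True, check_right=True):
    """Return warnings if a non-blank character touches the field's boundary."""
    warnings = []
    end = start + length  # index of the first character after the field
    left = line[start - 1:start] if start > 0 else ""
    right = line[end:end + 1]
    if check_left and left.strip():
        warnings.append("non-blank character on the left")
    if check_right and right.strip():
        warnings.append("non-blank character on the right")
    return warnings


# Hypothetical sample line: a 19-character description followed by a quantity.
row = "Widget assembly kit   12"

# The field is too short, so the description is cut off mid-word.
print(verify_field(row, 0, 10))

# Widening the field to 19 characters removes the warning.
print(verify_field(row, 0, 19))

# Disabling the right-boundary check suppresses the warning instead.
print(verify_field(row, 0, 10, check_right=False))
```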
In this article, we will learn how to create multi-column regions in a report model so that you can read data from multiple columns.
To create multi-column data regions in your report model, go to File > New > Report Model.
Now, specify the File Path of your report for which you want to build the report model.
Download the sample PDF file from here.
Your source file will be opened in the designer window where you can start building its extraction template.
Right-click on the Record node present in Model Layout panel on the left side of your report model designer and select Add Data Region.
When you click Add Data Region, you will see a subnode added to Record node in the Model Layout panel. The Region Properties panel and pattern-matching bar will also appear.
In the Region Properties section, you will see a check box for Multi-Column data region. Check this option.
You will see another bar appear just below the pattern-matching bar for specifying column boundaries. Click on the second bar above your report where your first column starts. This will create a black dotted line adjacent to your text on the left side. Repeat this for each column. If alignment of your dotted line is wrong, click on it again to delete it.
The arrows show the start point for each column.
You can also adjust the Number of Columns and Page Margin by clicking on the “..“ option next to Multi-Column option.
Starting Points box will show the points from where each column is starting. For instance, in this case, Column 1 is starting from “Point 0”, Column 2 from “Point 26” and Column 3 from “Point 52”.
The number of columns corresponds to the number of starting points that you will see on this screen. If there are 4 columns, you will see 4 starting points.
There is an option for automatically calculating columns – Automatically Calculate Columns. Here, you can specify:
- Page Margin – Where the text is starting from
- Number of Columns – Number of columns you want to have in your layout
Astera will automatically calculate and place the margins on your layout.
For instance, there are 3 columns in this report and the Page Margin is 0.
After specifying the Page Margin and Number of Columns, click on Calculate. Your starting points will be automatically calculated.
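One plausible way to arrive at starting points such as 0, 26, and 52 is to divide the usable page width evenly between the columns. The Python sketch below shows that arithmetic. How Astera actually calculates the points is not stated here, so the equal-width rule, the `page_width` of 78 characters, and both helper functions are assumptions made for illustration.

```python
def column_starting_points(page_width, number_of_columns, page_margin=0):
    """Evenly spaced starting points for each column (equal widths assumed)."""
    column_width = (page_width - page_margin) // number_of_columns
    return [page_margin + i * column_width for i in range(number_of_columns)]


def split_columns(line, starts):
    """Cut one line of the report into its column segments."""
    bounds = starts + [len(line)]
    return [line[a:b].rstrip() for a, b in zip(bounds, bounds[1:])]


# Assumed page width of 78 characters, 3 columns, and a page margin of 0.
starts = column_starting_points(78, 3, 0)
print(starts)

# Hypothetical line with one "Name:" entry per column.
sample = "Name: A".ljust(26) + "Name: B".ljust(26) + "Name: C"
print(split_columns(sample, starts))
```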
After creating multi-column data regions, give a suitable pattern in the pattern-matching bar to select your data region.
For this model shown in the screenshot below, write “Name:” in the pattern-matching bar. You can see that it has selected data regions where the pattern matches.
Increase the line count to 7 to capture all data fields.
Highlight the data you want to capture in a field, right click on it and select Add Data Field. You will see that the data in the highlighted data region has been selected and a new field is added to your extraction template. Rename your field using the Name option in the Field Properties panel.
Repeat the steps for all the fields to build your extraction template.
To preview your report model, click on the Preview Data option.
Your final output will look like this:
This concludes working with multi-column data regions in Astera.
In this article, we will see how a user can create a shared connection to access and store files from and to the cloud in Astera. To browse cloud files in a report model, the report model should be a part of a project containing the Shared Action Connection.
Note: This feature of browsing files from the cloud only works if the user has the Cloud Connector add-on.
To create a new cloud connection in a project, you need to add a shared connection within the scope of that project. Once that is done, you can browse for cloud files from the Report Options panel in a report model that is present within the same project.
To add a shared connection, we need to create a new project first. Here, we have named our project RMCloud.cprj.
Next, create a folder, Shared Connections, inside the project. We will save our shared connections (.sact files) in this folder.
Right-click on the folder, and select Add New Item…
Select SharedAction from the list of file types, give it a meaningful name, and click Add.
A .sact file will open. We have saved it with the name SharedAction.sact.
Go to Toolbox > Resources and drag-and-drop the Cloud Storage Connection object onto the designer.
Right-click on the header of the object and select Properties.
A Cloud Connection Properties window will open, where you can configure your Cloud Connection object by providing appropriate credentials. Select the Provider from the drop-down menu. In this case, we will select Amazon S3.
Provide appropriate credentials, Access Key ID and Access Key. Click OK.
Your SharedAction.sact is now configured. You can access files from this cloud connection in the project.
To create a new cloud connection through Report Options, we need to create/add a report model within our project first.
Open the report model. Go to the Report Options panel, and click on the arrow next to the folder icon in the Data File Location group-box. Select the Browse Cloud Files option.
A Browse Files window will open. We can see the Amazon S3 SharedAction connection we created above in this window.
Click on the Add New Connection icon.
Note: This is an alternative way for you to add a shared action (shared cloud connection) to the project you are working in. If you have already created a shared action for the cloud connection you want to use in your project, there is no need to add a new connection.
A window to Add a Cloud Connection will open.
Here, we will select Microsoft Azure Blob Storage as our Provider.
There are two ways to authenticate the connection: using an Access Key or using a Shared Access Signature. For now, we will use the Access Key.
Provide relevant credentials by specifying the Storage Account Name and Access Key. Click OK.
The newly created connection is now a part of the project.
From the cloud connection, select the file you want to extract data from and click Open.
Create a data region and relevant fields in the report model to extract relevant data from the source file.
Click on Preview Data to see if the report model is extracting data from the specified fields correctly.
The Data Preview window shows the data extracted from the file located on the cloud.
Here is how we can export our extracted data to an Excel or a Delimited file and save it on the cloud destination.
Go to Report Browser > Data Export Settings and select the Create New Export Setting and Run (to Excel) option.
A configuration window will open. Click on the arrow next to the folder icon and select the Browse Cloud Files option.
Locate the cloud directory where you want to save the Excel file, give it a meaningful name, and click Open.
The file path is now pointing towards the Amazon S3 cloud provider. Click OK.
The Job Progress window shows the job status and the cloud path where the destination file has been created.
If you want to create a Delimited File Destination, click on this icon in the Data Export Settings window and follow the same steps (step 2 onwards) as mentioned above.
We can access files from the cloud in a dataflow using the Report Source object. For that, the dataflow must be a part of the project or must contain a Cloud Storage Connection object pointing to the cloud location.
In this case, we will extract data from a file saved on a cloud location using a report model from the local directory.
Go to Toolbox > Sources and drag-and-drop the Report Source object onto the designer.
Right-click on the Report Source object header and select Properties from the menu.
Click on the arrow button next to the folder icon in the Report Location group-box to browse files from the cloud.
Locate the file in the cloud directory and click OK.
The File Path is now pointing towards the Amazon S3 cloud provider. Click on the folder icon to browse the report model from your local directory.
Note: The report model must be a part of your project.
Click OK to close the window.
The Report Source object is now configured with a file path coming in from the Amazon S3 cloud connection and the report model from the local directory.
Right-click on the header and select Preview Output.
The Data Preview window shows that the file is correctly being read by the report model.
We can access files from the cloud in a workflow as well, using the File System object. For that, the workflow must be a part of the same project as your shared connection.
In this case, we want to read all the files from a particular cloud directory (in a workflow), extract data from those files, and write it to an Excel Workbook Destination (in a dataflow).
We will do this by using the Report Source as a Transformation in a dataflow where we can parametrize the path of the file we want to extract data from using the Variables object. This is what the dataflow will look like:
Then, we will call the dataflow in a workflow using a Run Dataflow object and read the files from the cloud using a File System Items Source in a loop. We will send the cloud file path of the source file to the File Path variable in the dataflow as shown below.
Go to Toolbox > Sources and drag-and-drop the File System Items Source object onto the designer.
Right-click on the object header and select Properties.
Click on Browse Cloud Folders to access the cloud folder from where you want to read files.
Here, we have selected the folder containing the source files from our Amazon S3 cloud connection. Click Open.
Our File Path is now pointing towards a cloud file path. Click OK.
Our workflow is now complete. Click on Start Workflow to run the workflow.
The Job Progress window shows the status of the job and the path of the Excel destination.
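Conceptually, the workflow above is a loop that hands each file path to the dataflow as a variable. The sketch below only illustrates that idea in plain Python; it is not Astera's API. The names run_workflow, fake_dataflow, and the FilePath key are hypothetical, and the file paths are made-up placeholders standing in for the items returned by the File System Items Source.

```python
from typing import Callable, Dict, Iterable, List


def run_workflow(file_paths: Iterable[str],
                 run_dataflow: Callable[[Dict[str, str]], str]) -> List[str]:
    """Call the dataflow once per file, passing the path as a variable."""
    results = []
    for path in file_paths:
        # Each iteration mirrors one pass of the loop: the current file
        # path is sent to the dataflow's (hypothetical) FilePath variable.
        results.append(run_dataflow({"FilePath": path}))
    return results


def fake_dataflow(variables: Dict[str, str]) -> str:
    """Stand-in for the real dataflow; it only reports which file it got."""
    return f"extracted:{variables['FilePath']}"


# Placeholder paths, assumed for illustration only.
cloud_files = ["reports/jan.pdf", "reports/feb.pdf"]
outputs = run_workflow(cloud_files, fake_dataflow)
```

The point of the design is that the dataflow never hard-codes a source path, so the same extraction logic can be reused for every file in the folder.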
This is how you can create cloud connections and browse files from the cloud in Astera.
Start Position options are useful for defining the start position of a selected data field. They appear in the Size and Position group-box in the Field Properties panel. There are three options available in the drop-down menu of the Start Position option:
1. Fixed
2. Follows String in Current Line
3. Follows String in Previous Line
In this document, we will discuss how to work with the Follows String in Current Line and Follows String in Previous Line options to define the start position of a data field.
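The difference between the three options can be sketched in a few lines of Python. This is an illustration of the idea only, not how Astera implements it: the function names, the mode strings, and the sample report lines are all assumptions, and the sketch assumes the field runs to the end of the line and starts immediately after the marker string.

```python
from typing import List, Optional


def field_start(lines: List[str], row: int, mode: str,
                marker: str = "", fixed_col: int = 0) -> Optional[int]:
    """Return the column where the field starts on lines[row], or None."""
    if mode == "fixed":
        return fixed_col
    if mode == "current_line":
        search_line = lines[row]
    elif mode == "previous_line":
        if row == 0:
            return None  # there is no previous line to search
        search_line = lines[row - 1]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Case-insensitive search for the marker (assumed default behavior).
    pos = search_line.lower().find(marker.lower())
    return None if pos == -1 else pos + len(marker)


def extract(lines: List[str], row: int, mode: str,
            marker: str = "", fixed_col: int = 0) -> Optional[str]:
    """Return the field text from the start column to the end of the line."""
    start = field_start(lines, row, mode, marker, fixed_col)
    return None if start is None else lines[row][start:].strip()


# Hypothetical sample lines; the second record is shifted to the left.
aligned = ["ACCOUNT: 1001   CONTACT: Maria Anders",
           "ADDRESS:",
           "         Obere Str. 57, Berlin"]
shifted = ["ACCOUNT: 20   CONTACT: Ana Trujillo"]
```

With a fixed column tuned to the first record, the shifted record is cut off, while searching for the marker string finds the right start in both. This is the misalignment problem that the string-based options are meant to solve.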
Before creating an extraction template, we need to import into Astera the unstructured source file that we want to extract data from.
The Report Options panel provides the configuration options for loading the unstructured file. You can change the source file by specifying its Path in the Data File Location group-box.
There are also some other configuration options. To learn more about the Report Options, click here.
Add a new data region and specify an appropriate pattern to capture all the lines in that region.
Here, we have defined the pattern as ‘ACCOUNT:’ and the Region End Type is set to Another Region Starts, which means that the current data region will end when another one starts.
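As a rough mental model of the Another Region Starts setting, the sketch below splits a list of lines into regions, starting a new region at every line that contains the pattern. It is a simplified illustration under stated assumptions: split_regions is a hypothetical helper, the pattern is treated as a plain substring, and the sample lines are invented.

```python
from typing import List, Optional


def split_regions(lines: List[str], pattern: str) -> List[List[str]]:
    """Group lines into regions; each region begins at a line containing pattern."""
    regions: List[List[str]] = []
    current: Optional[List[str]] = None
    for line in lines:
        if pattern in line:
            # A new match ends the previous region and starts the next one.
            current = [line]
            regions.append(current)
        elif current is not None:
            current.append(line)
        # Lines before the first match belong to no region and are skipped.
    return regions


# Invented sample report lines.
sample = ["CUSTOMER LIST",
          "ACCOUNT: 1001",
          "ADDRESS:",
          "Obere Str. 57",
          "ACCOUNT: 1002",
          "ADDRESS:",
          "Avda. de la Constitucion 2222"]
regions = split_regions(sample, "ACCOUNT:")
```

Here the header line is ignored, and each of the two regions holds the ACCOUNT: line plus the lines that follow it up to the next match.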
We have captured the data region of interest in the report model. Let’s extract the relevant data points by adding data fields.
1. To create the data field, highlight the desired field area, right-click on it and select the Add Data Field option from the context menu.
2. As you can see below, the data is misaligned and therefore is not being captured correctly.
3. To solve this problem, we can use the Follows String in Current Line option from the Start Position drop-down menu in the Field Properties panel to specify a string which defines the start position of this field.
Here, we have entered ‘contact:’ as the string in the textbox to define the start position and set the Length to Till End Of Line. Now, all the data points in this field have been captured completely.
4. Notice that two checkboxes, Case Sensitive and Regular Expression, have appeared in the Field Properties panel.
Case Sensitive: Allows users to make the search for the specified string case-sensitive.
Regular Expression: Allows users to use a regular expression to match the string that precedes the data field.
5. Let’s go ahead and select the Case Sensitive option and see what happens.
The fields are no longer being captured, as you can tell from the fact that they are no longer highlighted in blue. This is because the comparison is now case-sensitive: the string ‘CONTACT:’ in the document does not match the string ‘contact:’ in the textbox because of the difference between upper and lower case. Uncheck the Case Sensitive option.
6. Select the Regular Expression checkbox and define a regular expression in the textbox to capture the data points.
The data is now being captured correctly.
7. Next, capture the data points for the Address field. This time, define the Start Position as Follows String in Previous Line because the address starts on the line after the string ‘ADDRESS:’.
Notice that we have matched the case of the string specified in the textbox with the case of the string in the document since the Case Sensitive checkbox is selected. Therefore, the data is being captured correctly.
8. Lastly, capture the data in the Account field by using the Follows String in Current Line option.
9. You can rename the fields in the General section of the Field Properties panel. This is what our Model Layout looks like:
10. Preview the data by clicking on the Preview Data icon to check if all the fields are being extracted correctly from the unstructured document.
The Data Preview window shows the data extracted from the unstructured document.
This is how we can extract data from an unstructured file by specifying Start Position options in Astera.
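The effect of the Case Sensitive and Regular Expression checkboxes on matching the preceding string can be approximated with Python's re module. This is a sketch under assumptions, not Astera's implementation: value_after and its parameters are hypothetical names, the regular expression shown is only an example, and Astera's own regex dialect may differ from Python's.

```python
import re
from typing import Optional


def value_after(line: str, marker: str, case_sensitive: bool = False,
                regular_expression: bool = False) -> Optional[str]:
    """Return the text that follows marker on the line, or None if no match."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # A plain string is escaped so that it is matched literally.
    pattern = marker if regular_expression else re.escape(marker)
    match = re.search(pattern, line, flags)
    if match is None:
        return None
    return line[match.end():].strip()


# Invented sample line with an upper-case label, as in the walkthrough.
line = "ACCOUNT: 1001   CONTACT: Maria Anders"

default_match = value_after(line, "contact:")
strict_match = value_after(line, "contact:", case_sensitive=True)
regex_match = value_after(line, r"CONTACT\s*:", case_sensitive=True,
                          regular_expression=True)
```

The first call finds the value because case is ignored, the second returns None because ‘contact:’ and ‘CONTACT:’ differ in case, and the third succeeds again because the expression is written in the same case as the document.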
Wild Cards
Description
Matches any alphabetic character in the file.
Matches any digit in the file.
Matches any alphabetic character or digit in the file.
Matches any non-blank character in the file.
Matches any blank character in the file, such as a new line, space, or tab.