In Astera, you can manually specify the start position of a selected data field in the Field Properties panel. There are three options available in the drop-down menu to specify the Start Position, as shown below.
The start position can be fixed or can follow a particular string in the current or previous line. To learn more about the Start Position options, click here.
In this document, we will see how to define multiple strings as comma-separated values to specify the start position of a data field. This feature is useful when the document does not follow a fixed pattern or has an inconsistent format. The use case below illustrates the purpose of this option.
In this case, we have an unstructured document that contains order information and the details of the items in each order. As we can see below, the order number is preceded by different strings: “ORDER NUMBER:”, “ORDER ID:” and “ORDER:”.
Let’s see how we can capture the order number for all the orders by specifying comma-separated strings to define the start position.
1. Here we have loaded the unstructured document in a report model. To learn how to load a document in a report model, click here.
2. Create the data region to capture the field containing the order number by defining an appropriate pattern. Define the pattern as 'ORDER' and rename the data region to Orders in the Region Properties panel.
3. To create the data field, select the data, right-click on it and select the Add Data Field option from the context menu. Rename the field to Order_ID.
4. As we can see in the blue highlighted area, the data in the field is misaligned because it does not start at the same position in every record.
5. To account for this variation in our document, go to Field Properties > Size and Position > Start Position and select the Follows String in Current Line option from the drop-down menu. With this option, the field data is read starting right after a particular string.
6. Since our data field follows several different strings, enter them as comma-separated values in the text box. If needed, adjust the length of the field so that the data is captured correctly.
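The idea behind steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration of the concept, not Astera's implementation: the function name `extract_after`, the `length` parameter, and the sample lines and order numbers are all hypothetical. Each line is searched for the listed strings in order, and the field is read from the position right after the first string that is found.

```python
def extract_after(line, markers, length=None):
    """Return the text following the first marker found in `line`, or None.

    Hypothetical helper: mimics a "follows string in current line" lookup
    with several candidate strings.
    """
    for marker in markers:
        idx = line.find(marker)
        if idx != -1:
            start = idx + len(marker)
            end = len(line) if length is None else start + length
            return line[start:end].strip()
    return None


# Comma-separated strings, as they would be typed into the text box.
markers = "ORDER NUMBER:,ORDER ID:,ORDER:".split(",")

# Made-up sample lines; the real document layout may differ.
lines = [
    "ORDER NUMBER: 10248",
    "ORDER ID: 10249",
    "ORDER: 10250",
]

order_ids = [extract_after(line, markers, length=10) for line in lines]
print(order_ids)  # ['10248', '10249', '10250']
```

The `length` argument plays the role of the field length mentioned in step 6: if it is too short, the value is cut off, and if it is too long, it may pick up neighboring text on the same line.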
Now consider a case where a comma is part of the string that precedes the data for a particular field. To handle this case, we can specify the list of comma-separated strings with double-quote text qualifiers.
Here, you can see that a comma character is extracted along with the data because the comma after 'SHIPdate' is read as a separator and not as a part of the string.
To resolve this, enclose the comma-separated strings with text qualifiers (as shown below) so that the comma is read as part of the string.
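The effect of text qualifiers can be reproduced with Python's standard `csv` module, which treats double quotes as text qualifiers by default. This is a sketch of the general behavior under assumed inputs, not a description of how Astera parses the list: the helper names `split_markers` and `text_after`, and the sample line with its date, are hypothetical.

```python
import csv


def split_markers(spec):
    """Split a comma-separated list, honoring double-quote text qualifiers."""
    return next(csv.reader([spec]))


def text_after(line, markers):
    """Return the text following the first non-empty marker found in `line`."""
    for marker in markers:
        if marker:  # ignore empty entries produced by a trailing comma
            idx = line.find(marker)
            if idx != -1:
                return line[idx + len(marker):].strip()
    return None


line = "SHIPdate, 07/04/2023"  # made-up sample line

# Without qualifiers, the comma acts as a separator, so the marker is
# just 'SHIPdate' and the comma ends up in the extracted data.
without_qualifiers = text_after(line, split_markers('SHIPdate,'))

# With qualifiers, the comma is part of the marker 'SHIPdate,'.
with_qualifiers = text_after(line, split_markers('"SHIPdate,"'))

print(repr(without_qualifiers))  # ', 07/04/2023'
print(repr(with_qualifiers))     # '07/04/2023'
```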
This is how you can extract the data for a field that follows multiple strings by using comma-separated values in Astera.