Text Splitter Transformation

Overview 

The Text Splitter object in Astera is designed to split text into multiple tokens.

Getting the Text Splitter Object 

  1. To get a Text Splitter object from the Toolbox, go to Toolbox > Transformations > Text Splitter. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.

  2. Drag and drop the Text Splitter object onto the designer.

Configuring the Text Splitter Object 

  1. To configure the Text Splitter object, right-click on its header and select Properties from the context menu.

Selecting Properties opens a dialog box where you can configure the properties of the Text Splitter object.

  2. Choose the splitter type from the Splitter Type dropdown. The options available are:

  • Character

  • Word

  • Sentence

  • Paragraph

Next, set the Token Length. For example, setting the Splitter Type to “Word” and the Token Length to 1 splits the text at the end of every word.
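To make the two settings concrete, here is a minimal Python sketch of the idea. It is not Astera's implementation. The function name split_text, the naive sentence and paragraph rules, and the reading of Token Length as "number of units per token" are all assumptions made for illustration.

```python
import re


def split_text(text, splitter_type="Word", token_length=1):
    """Hypothetical helper: break text into units, then group
    token_length units into each output token."""
    if token_length < 1:
        raise ValueError("token_length must be at least 1")

    if splitter_type == "Character":
        units, joiner = list(text), ""
    elif splitter_type == "Word":
        units, joiner = text.split(), " "
    elif splitter_type == "Sentence":
        # Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
        units, joiner = re.split(r"(?<=[.!?])\s+", text.strip()), " "
    elif splitter_type == "Paragraph":
        # Assumed rule: paragraphs are separated by one or more blank lines.
        units, joiner = re.split(r"\n\s*\n", text.strip()), "\n\n"
    else:
        raise ValueError(f"unknown splitter type: {splitter_type}")

    units = [u for u in units if u]  # drop empty pieces
    return [
        joiner.join(units[i:i + token_length])
        for i in range(0, len(units), token_length)
    ]


print(split_text("The quick brown fox", "Word", 1))
print(split_text("The quick brown fox", "Word", 2))
```

Under these assumptions, a Token Length of 1 with the “Word” type yields one token per word, and a Token Length of 2 yields tokens of two words each. The actual object may treat punctuation, whitespace, and the final partial token differently.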

  3. Once finished, click Next.

  • General Options window: This window contains options common to most objects in a dataflow:

  • Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.

  • Do Not Process Records with Errors: When this option is checked, the object does not output records with errors. When this option is unchecked, the object outputs records with errors and attaches a record message to each of them. This record message can then feed into downstream objects in the dataflow, for example a destination file that captures record messages, or a log that captures messages and collects statistics.

  • Enable Sort Optimization: This option lets users optimize ETL (Extract, Transform, and Load) performance and minimize job execution time. With sort optimization enabled, the system can choose the most suitable join order and algorithm, leading to faster query processing and better resource utilization.

  • The Comments input allows you to enter comments associated with this object.
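The effect of the first two options can be pictured with a small Python model. This is only an illustrative sketch, not how Astera represents records internally. The Record class, its has_error flag, and the apply_general_options function are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record: a value, attached messages, and an error flag."""
    value: str
    messages: list = field(default_factory=list)
    has_error: bool = False


def apply_general_options(records, clear_incoming_messages=False,
                          skip_error_records=False):
    """Model of 'Clear Incoming Record Messages' and
    'Do Not Process Records with Errors' (assumed semantics)."""
    output = []
    for rec in records:
        if skip_error_records and rec.has_error:
            continue  # the record with an error is not output
        # Either drop messages from upstream objects or pass them along.
        messages = [] if clear_incoming_messages else list(rec.messages)
        output.append(Record(rec.value, messages, rec.has_error))
    return output


incoming = [
    Record("alpha", ["warning from an upstream object"]),
    Record("beta", ["invalid value"], has_error=True),
]
for rec in apply_general_options(incoming, clear_incoming_messages=True,
                                 skip_error_records=True):
    print(rec)
```

With both options on, only the error-free record is passed along and its upstream message is removed. With both off, every record is passed along with its messages intact.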

  4. Once finished, click OK.

The Text Splitter object is now configured according to the changes made. 

For this scenario, we have used the Text Splitter object with the File to Text Converter object, because the Text Splitter object accepts only text as input.

Right-click on the Text Splitter object’s header and select Preview Output from the context menu. 

View the data through the Data Preview window. 

You have successfully configured your Text Splitter object. The fields from the source object can now be mapped to other objects in the dataflow. 

© Copyright 2023, Astera Software