Sort Transformation
Sort Transformation can be used to sort incoming data streams. In addition, the Sort Transformation distinct option can be used to remove duplicate values from the input.
Sort Transformation is a blocking transformation meaning that the input records are accumulated until the end of input. Blocking transformations affect the performance of overall dataflow because subsequent steps cannot execute until all the records have been received and processed by the blocking transformation.
Sort Transformation uses storage on the server for temporary data during sorting. The server must have enough capacity to store the entire data set and index.
Usage
To add a Sort transformation, drag the Sort object from the Transformations group in the Flow toolbox and drop it on the dataflow.
To configure the properties of a Sort object after it was added to the dataflow, right-click on it and select Properties from the context menu. The following properties are available:
Meta Object Builder screen: This allows you to add or remove fields in the field layout, as well as select their data type. The fields added in Meta Object Builder will show in the Input node inside the object box.
Sort Transformation Properties screen:
Return Distinct Values Only – check this option to remove duplicate values from the output.
Treat Null as the Lowest Value – turn this option on to have a Null value returned first in the Ascending sort order, and conversely, have the Null value returned last in the Descending sort order.
Case Sensitive – turn this option on if you require case-sensitive comparison for strings.
Note: Only the combination of key fields, that you select in the grid, is checked for duplicate values. If you specify a subset of key fields instead of the entire layout, you may lose non-duplicate records when the combination of key fields returns a duplicate.
Finally, select key fields in the grid and apply Ascending or Descending sort order to each field.
General Options screen: This screen shares the options common to most objects on the dataflow:
Clear Incoming Record Messages: When this option is on, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors: When this option is on, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then feed into downstream objects on the dataflow, for example, a destination file that will capture record messages or a log that will capture the messages and as well as collect their statistics.
The Comments input allows you to enter comments associated with this object.
Example
In this example, the Full_Name field is being parsed into First Name and Last Name. It is then added to the fields for the Sort Transformation. The LastName and FirstName fields can then be used on the Sort Transformation Properties to sort the incoming data from the Source.
Last updated