Source Diff Processor
The Source Diff Processor object is one of the Database Write Strategies offered in the Astera Data Stack. It works like the Database Diff Processor, however, unlike the Database Diff Processor, it is used to perform write actions (such as Insert, Update and Delete) on file destinations.
It stores a snapshot of your data processed in the first run in a CDC file. So, the next time you run it, it will only import the new records.
Use Case
We have a sample Employees dataset coming in from an Excel Workbook Source. Initially, we had records of 10 employees but later on, 2 more were added to the source dataset.
We wish to apply a database write strategy that can read the data incrementally from file sources. To achieve this, we will use the Source Diff Processor in Astera.
Working with Source Diff Processor
Drag and drop the Source Diff Processor object from Toolbox > Database Write Strategy > Source Diff Processor onto the dataflow designer and map the source data to it.
Right-click on the Source Diff Processor object’s header and select Properties.
A Layout Builder window will open where you can modify your layout. Click Next.
The next window is the Incremental Write Options window.
Here, you have to specify the Record Matching field. This field is used to match and compare the incoming and existing records. We will select EmployeeID as the Record Matching field.
Case Sensitive – Check this option if you want to compare records on a case-sensitive basis.
Sort Input – Check this option if you want to sort the incoming data.
Now, if the incoming dataset has a new record with a new EmployeeID i.e. the ID is not present in the existing file which is being compared against the incoming file, Astera will perform the INSERT action.
If the EmployeeID is already present in the existing file, Astera will compare the records against that ID and will perform an UPDATE action in the fields where the information has been updated.
If the EmployeeID is there in the existing file, but not present in the incoming file, it means that the particular record has been deleted. In this case, Astera will perform the DELETE action.
In the Output Options section, you can either select the Single Output option or One Port for Each Action.
Single Output:
The Single Output option is selected if you wish to load your data into the destination without modifying it further on the basis of individual write actions.
If you select Single Output, the database action such as INSERT, UPDATE, SKIP, or ERROR will be chosen by the database write strategy’s logic rather than being specified by the user. Using a Single Output is recommended when a database write strategy is applied.
One Port for Each Action:
One Port for Each Action is used when you want to further transform or log your data. If you select One Port for Each Action, you will get separate nodes for each Diff action in the Source Diff Processor’s object.
In this example, we will select Single Output.
The third section in the Incremental Write Options window is the Incremental Transfer Information File Path option. Here, you must specify the file path where you want to store information related to the last run.
Now, if you have worked with Excel Workbook and Database Table Sources in Astera, you would have noticed that the Database Table Source object gives you the option to read incremental changes.
However, no such option is available in Excel or other file source objects. This option in the Source Diff Processor enables you to read incrementally from different file formats such as Excel, Delimited, and Fixed Length.
Click OK.
Now, right-click on the Source Diff Processor object’s header and select Preview Output.
Output preview for Single Output:
Output preview if One Port for Each Action is selected:
You can now write your data to any destination or perform any transformation on the dataset.
This concludes using the Source Diff Processor write strategy in Astera.
Last updated