© Copyright 2023, Astera Software
The Incremental CDC option in Astera allows you to load the data stored in your database tables incrementally instead of complete loads at every run. This becomes useful with large databases where re-loading complete tables at every run becomes inefficient.
In this article, we’ll be looking at how you can activate and use the Incremental CDC option in Astera.
In this example, we have a database table containing the basic data of shipping companies. We want to store it in a new database table such that we can update the new table on any change in the source without having to completely load the source table again.
Here, we have a pre-configured database table source which contains data for a shipping company.
Firstly, to access the Properties for the Database Table Source object, right-click on its header and select Properties from the context menu.
After successful connection with the database, click on Next to proceed.
In the Pick Source Table and Reading Option window you can see the configured database table. On this screen, we have the Incremental Read Options section.
In this section, select the Read Strategy named Incremental Load Based On Audit Fields. Further options become available on this selection.
Audit Fields are fields that are updated when a record is created or modified. Examples of audit fields include created date time, modified date time, and auto number.
Incremental read can cater to both updates and inserts when the audit field is selected as date time or modified date time (it must be a timestamp field). Without a timestamp field as the audit field, the incremental read can only cater to inserts.
Incremental read works by keeping a track of the highest value for the specified audit field. On the next run, only the records with a value higher than the saved value are retrieved. This feature is useful in situations where two applications need to be kept in sync and the source table maintains audit field values for rows.
In case you require the complete database table to be loaded, select the checkbox titled Perform full load on next run.
Provide a file path for the Incremental Transfer Information File, this file is created by Astera to store information on the last entry of the database table. On each run, the file is compared with the database table to check if any new entry was performed in the table.
Next, set up the destination table for the data to be kept as it updates in the source table. Add a database destination object by going to Toolbox > Destinations > Database Table Destination.
Next, configure it by opening its properties. Click Next after selecting the database.
In our case, the table is already present in our database, and we can select it from the Pick Table window. We pick the Shippers_Stored table as our destination table.
Next in the Define Input Ports for Mapping section select the Individual Ports for Actions and then further select the Upsert checkbox as we require new entries to be stored in the destination. In the Select Fields for Matching Database Record select the ShipperID which is also the primary key and is unique for each record in our database table.
Now, map all the fields from the Database Table Source to the Database Table Destination object.
Now, you can check the destination table and find it to be empty. To check a database table’s contents in Astera, right-click on the object, in this case DatabaseDestination.
In the context menu, go to Database Table > View Table Data.
This opens a new tab with the SQL query run for viewing the data in the table.
Whereas the source table has three entries.
When the dataflow is executed, in the Job Progress window you can see that the three entries from the source table have been output and written to the destination table.
We can check this by previewing the destination table.
Now, if you run the dataflow again, you can see in the Job Progress window that there is no input at all. Meaning that the Database Table Source object did not load any data from the database table.
No records were loaded when the dataflow was executed, Astera checked the CDC file and found the last entry for the previous run to be still the last entry in the table currently. Hence, there was no need to load the data.
To check if the incremental CDC is working, let’s take an example where a fourth entry is inserted in the source table.
Run the dataflow again and see in the Job Progress window that there is one input and consequently one output written to the destination table. That is to say that our Database Table Source object only accessed the last entry of the source table.
Now, let’s take an example where the Phone is updated for the fourth entry.
Upon running the dataflow, you can see in the Job Progress that the database table source object has not registered any data and therefore the destination is not updated. The update is not considered, as the value of the Audit Field in this case, ShipperID stored in the CDC file is the same as it was in the last run.
The CDC file stored by Astera can be accessed, where it stores information on the last value for the Audit Field.
The Incremental CDC option in the Database Table Source object allows for new entries to be loaded into Astera and caters to any updates to the entries present as per the last run of the dataflow if the audit field is based on a time stamp.
This concludes our article on Incremental CDC in Astera Data Stack.