Trigger Based Change Data Capture (CDC) is a read strategy which allows you to load the data in your database tables incrementally instead of loading it completely at every run. This becomes useful with large databases where re-loading complete tables is not feasible every time. Trigger Based CDC uses triggers to identify any changes in the data source on being run and loads data accordingly.
In this article, we’ll be looking at how you can enable and use the Trigger Based CDC option in Astera.
In this example, we have a database table containing basic data of a company’s employees. We want to store it at a new database table such that we can update the new table on any change in the source without having to completely load the source table again.
Here, we have the pre-configured database table source containing basic data of a company’s employees.
You can enable CDC from the context menu of the Database Table Source object as shown below.
This opens a dialogue box titled Select Columns. Here, select the required fields on which you want to enable the CDC.
In this case, we’ll select all the fields to monitor changes, including Employee ID which is our primary key field. It is important to note here that it is mandatory to select the primary key field of a table. After the selection, click on OK.
A confirmation dialogue box appears, click on OK again. Trigger Based CDC has now been enabled.
Now, right-click on the source object and proceed to Change Data Capture.
Here, you can see that the Change Data Capture is enabled on the table.
In case you require the complete database table to be loaded regardless of any changes on the first run of the dataflow, select the checkbox titled Perform full load on first run from the properties of the source object.
Next, set up the destination table where a store of the data is to be kept as it updates in the source table.
Add a database destination object by going to Toolbox > Destinations > Database Table Destination.
Configure it by opening its properties. Click Next after selecting the database where you want the store to be.
In our case, the table is already present in our database, and can be selected from the Pick Table option.
In the Define Input Ports for Mapping section, it is preferable to select the Upsert checkbox with a CDC enabled source as the incoming data can contain new as well as updated records. In the Select Fields for Matching Database Record, choose the EmployeeID which is also the primary key and is unique for each record in our database table.
To further learn about the Database Table Destination object, click here.
Now, map all the fields from the Database Table Source to the Database Destination object.
Run the dataflow, in the job progress window you can see that the nine entries from the source table have been read and written to the destination table.
Now, run the dataflow again, you can see in the Job Progress window that there is no input at all. Meaning that the Database Table Source object did not load any data from the database table.
A new record has now been added to the Employees source table. On running the dataflow again, you can see in the job progress window that there is one input and consequently one output written to the destination table. That is to say that our database source object only accessed the new entry of the source table.
Similarly, if any records get updated, they will be automatically detected and processed.
In the Data Source Browser in the relevant database, after enabling the CDC in the relevant source you can see that Astera creates tables for the management of CDC. An info table and an Astera table for the Employees table were created for this particular use case.
The CDC_CDCTABLESINFO_CP shows the tables for which the CDC has been enabled in the database and also the last time and date at which the data in the table was processed.
The Astera table is an audit table which stores the changes that occur with the data in the table for the fields for which CDC was enabled, the time at which the change occurred and the action (insertion, deletion or update) for that change. In our use case, CDC_dbo_Employees_CP is the Astera table.
When the read strategy Trigger based CDC is enabled, Astera uses these two tables on every run to check for any changes and loads them accordingly.
This concludes our article on Trigger Based CDC in Astera Data Stack.