So, I've had a little downtime at work and I started messing around with "Change Data Capture" and "Change Tracking" in SQL Server 2008. Books Online says that Change Data Capture (CDC) is ideal for ETL applications but doesn't mention this in the Change Tracking (CT) sections. I'll assume Microsoft has done its research, but every ETL project I have been involved with only cared about having the current data. I don't care that a row has changed 100 times since the last time I ran the sync; I just want to know what it is now.
Now, in saying that, both of these features can accomplish that goal. However, CDC (by default) will record all columns for any row that has changed. So if you update just the name of an employee and the table has 30 columns, CDC will record all 30 columns for that single change. Not only does it store all of the columns, it stores them at least once for every insert/update/delete statement run against that table (it stores 2 rows per update: the before and after images).
Change Tracking (CT), on the other hand, only stores the primary key for a change in a monitored table, and it only stores it once. You could change the same row 100 times and CT will only have 1 record for it. So my take on these 2 features in SQL 2008 is that CDC will allow you to track each modification to the data, but CT is better suited to ETL projects. While I don't have any ETL scripts for CT, they wouldn't be hard to create.
Step 1) Enable "Change Tracking"
Step 2) Make some changes to the table
Step 3) SELECT * FROM CHANGETABLE(CHANGES dbo.Source_Table, 0) AS A LEFT OUTER JOIN dbo.Source_Table AS B ON A.Primary_Key = B.Primary_Key -- note CHANGETABLE takes the tracked table itself, and a LEFT JOIN is needed so deleted rows (which have no matching source row) still come back
Now the only thing to add to that is a way to keep track of the change version since the last sync and you are pretty much done.
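Putting those steps together, here is a minimal sketch of what that sync could look like. This assumes a hypothetical `dbo.Employee` table with an `EmployeeID` primary key in a database called `MyDatabase`; the sync-state storage is left to you.

```sql
-- Step 1: enable Change Tracking at the database and table level
ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Employee
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF);

-- Step 3: read everything changed since the last sync
DECLARE @last_sync_version bigint = 0;  -- in real life, load this from your own sync-state table
DECLARE @current_version   bigint = CHANGE_TRACKING_CURRENT_VERSION();  -- grab BEFORE reading

SELECT  CT.EmployeeID,
        CT.SYS_CHANGE_OPERATION,        -- I, U, or D
        E.*                             -- current values; NULLs mean the row was deleted
FROM    CHANGETABLE(CHANGES dbo.Employee, @last_sync_version) AS CT
LEFT OUTER JOIN dbo.Employee AS E
        ON CT.EmployeeID = E.EmployeeID;

-- Once the sync succeeds, persist @current_version as the next @last_sync_version.
```

Capturing `CHANGE_TRACKING_CURRENT_VERSION()` before reading the changes is what makes the next sync pick up anything that slips in while this one runs.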
Now back to CDC. Before this feature came out, really the only way to track data changes was triggers, aka "performance killers" if you were dealing with a heavily used database. The great thing about CDC is that it harvests changes asynchronously from the transaction log via the log reader, so nothing fires in-line with your DML (CT is synchronous with the DML, but since it only writes the primary key it stays cheap too). There is a good link here from Microsoft on how to tune CDC and their performance results. Now there could be a space issue, depending on how many tables you track and how many data modifications are occurring. The CDC tables are stored in the same database, but you can move them to a different filegroup. I have included a script, based on the AdventureWorks database, that will move a CDC table to a new filegroup. There is little error checking in this script, so use at your own risk.
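If you know about the space concern up front, you can skip the move entirely: `sys.sp_cdc_enable_table` accepts a `@filegroup_name` parameter that puts the change table on the target filegroup from the start. A sketch using AdventureWorks names (the `CDC_FG` filegroup is hypothetical and must already exist in the database):

```sql
USE AdventureWorks;

-- Enable CDC at the database level (once per database)
EXEC sys.sp_cdc_enable_db;

-- Enable CDC on a table, placing its change table on a dedicated filegroup
EXEC sys.sp_cdc_enable_table
     @source_schema  = N'HumanResources',
     @source_name    = N'Employee',
     @role_name      = NULL,          -- NULL = no gating role on the change data
     @filegroup_name = N'CDC_FG';     -- change table is created here from day one
```

That only helps for new captures, though; for tables already under CDC you are back to moving the data, which is what my script does.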
I am also including a script that will roll back the insert/update/delete statements against a table that has all columns included in the CDC. It is a little clunky and relies on a user-defined function (UDF), script also included. Basically, the "Rollback Script" creates a stored procedure for a given table. Once the stored procedure is created, the only arguments to it are "start time" and, optionally, "end time." If you just want to roll back the table to a specific point in time, the "end time" parameter is not needed; I included it in case only a specific transaction needs to be undone. In the comments of the newly created stored procedure there is some SQL that will let you view all of the transactions that have occurred.

One caveat: by default, CDC only keeps 3 days' worth of transactions. (This is cleaned up by a SQL Server Agent job.) The retention can be changed with sys.sp_cdc_change_job @job_type = 'cleanup', @retention = minutes, but the cleanup job only runs once a day, so you might also have to change the agent job's schedule to match your retention period. Anyway, that's how I have been keeping busy.
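For reference, here is roughly what adjusting the retention and finding a rollback point looks like. The 7-day figure is just an example; note that `@retention` is expressed in minutes.

```sql
-- Keep 7 days of change data instead of the 3-day default (retention is in minutes)
EXEC sys.sp_cdc_change_job
     @job_type  = N'cleanup',
     @retention = 10080;              -- 7 days * 24 hours * 60 minutes

-- Inspect the current capture/cleanup job settings to confirm
EXEC sys.sp_cdc_help_jobs;

-- Map LSNs to transaction times when picking a "start time"/"end time" to roll back
SELECT start_lsn, tran_begin_time, tran_end_time
FROM   cdc.lsn_time_mapping
ORDER BY tran_begin_time;
```

The `cdc.lsn_time_mapping` query is the same kind of lookup the generated stored procedure's comments point you at for browsing transactions.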