Detecting modified rows before to write them to a table - SQL Server 2012

Hi,

I've a table with more columns and 1 identifier. I need to write this table when a modified row is detecting respect to the columns not to the identifier.

So I've created a temporary table to put the potential rows to write on the real table, but I want to detect the modified rows. I've thought to use the checksum function, but I don't know how to use it and if it could be useful in this scenario.

Moreover, in the temporary table I've collected daily the rows to write: the first day a row could have a value respect to his columns, the next day a different value and the next one the same value respect to the first day.

Any suggests to me in order to manage this scenario, please?

Many thanks

April 17th, 2015 12:10pm

The trigger can be used with AFTER UPDATE. I am inserting the data before to be updated in to destination table. See the sample below:

CREATE TRIGGER [dbo].[trUpdate] ON [dbo].[User] AFTER UPDATE  
as
begin
	declare @sAMAccountName varchar(20)
	declare @displayName varchar(50)
	declare @Age int
	declare @Address varchar(100)
	declare @TelephoneNumber varchar(20)

	IF EXISTS(SELECT * FROM DELETED)
    BEGIN
		if EXISTS(SELECT * FROM INSERTED)  -- Set Action to Updated.
		begin
			if EXISTS (SELECT * FROM DELETED) -- Before Update
			begin
				SELECT 
					@sAMAccountName=sAMAccountName,
					@displayName=displayName,
					@Age=Age,
					@Address =[Address],
					@TelephoneNumber=[TelephoneNumber]
				FROM 
					deleted  
				insert into User_Destination(sAMAccountName,displayName,Age,[Address],TelephoneNumber,[Action]) 
				values (@sAMAccountName,@displayName,@Age,@Address,@TelephoneNumber,'Before')
			END
			
		END			
	END
END


Free Windows Admin Tool Kit Click here and download it now
April 17th, 2015 2:09pm

Hi Ricardo,

thanks for your reply, but using a trigger to feed a table of a dwh it seems a best practice.

April 17th, 2015 2:19pm

Dear Pscorca,

It's recommended the use of CDC - CHANGE DATA CAPTURE - SSIS for business intelligence scenarios.

"Source tables change over time. A data mart or data warehouse that is based on those tables needs to reflect these changes. However, a process that periodically copies a snapshot of the entire source consumes too much time and resources. Alternate approaches that include timestamp columns, triggers, or complex queries often hurt performance and increase complexity. What is needed is a reliable stream of change data that is structured so that it can easily be applied by consumers to target representations of the data. Change data capture in SQL Server provides this solution.

The change data capture feature of the Database Engine captures insert, update, and delete activity applied to SQL Server tables, and makes the details of the changes available in an easily-consumed, relational format. The change tables used by change data capture contain columns that mirror the column structure of the tracked source tables, along with the metadata needed to understand the changes that have occurred on a row by row basis..." font: Site Microsoft: https://msdn.microsoft.com/en-us/library/bb895315.aspx

I did an article that introduces this resource : http://social.technet.microsoft.com/wiki/contents/articles/30471.ssis-data-maintenance-with-microsoft-azure-sql-database.aspx

Free Windows Admin Tool Kit Click here and download it now
April 17th, 2015 4:01pm

Hi Ricardo,

using a SSIS pkg is a good thing, but before that I can establish that a row is changed respect to the previous one I need to perform some calculations for some column. So I need to operate at stored procedure level and not at SSIS level.

Thanks

April 17th, 2015 4:58pm

You don't need to work with SSIS. The SSIS is one way to get the change, but you can choose the better way . You get the data only enabling the Change Data Capture on the table and database. What matters to us is the __$operation attribute.

The following system stored procedure will enable change data capture for the
database:

USE WSS_Customs 
EXEC sys.sp_cdc_enable_db

The following system stored procedure will enable change data capture for the
USER_LOCAL table: 

execute sys.sp_cdc_enable_table
@source_schema='dbo',
@source_name ='USER_LOCAL',
@role_name='cdc_Admin',
@supports_net_changes=1

The table is named for the schema and table name of the source table as
follows:
CDC.DBO.USER_LOCAL_CT is found in the
System Tables folder in SQL Server Management Studio. 

The CDC.DBO.USER_LOCAL_CT has one
column that mirrors each column in the source table where changed data is being
captured. These columns have the same name and the same data type as the
corresponding column in the source table. 

In addition to the columns
that mirror the CDC.DBO.USER_LOCAL_CT table, it contains several columns of metadata, as
follows:



  • __$start_lsn : The LOG SEQUENCE NUMBER (LSN) assigned to this
    change. All changes committed within the same transaction will have the same
    LSN. The LSN shows the order in which transactions occurred.
  • __$end_lsn: This column may be utilized in future versions of SQL SERVER.
  • __$seqval: A sequence number used to determine the order of changes that are
    within the same transaction and thus have the same LSN.
  • __$operation: The operation contains a code
    identifying whether the row has been inserted, updated, or deleted as
    follows: 
     1- delete | 2- insert | 3-update
    (values prior to the update) | 4-update (values after the update)
  • __$update_mask: A bit map indicating which
    columns where affected by an update. 

Free Windows Admin Tool Kit Click here and download it now
April 17th, 2015 5:19pm

Hi Ricardo,

many thanks for your reply. Unfortunately I haven't ever used the CDC feature and I haven't enough time to learn it. With the CDC feature, can I retrieve the historical data related to facts and dimension data? Can I retrieve only the last daily changes for a table? In my case, I need to capture the daily changes for a table but only the last ones for the same day.

Many thanks

April 18th, 2015 1:30am

Do you have in your table a datetime column that tracks the changes on that record? By which you can get the latest or previous versions of it, so by every update you set its value to getdate(). Or you can add a version counter field and increment with ea
Free Windows Admin Tool Kit Click here and download it now
April 18th, 2015 4:30am

CDC requires Enterprise Edition, so if you only have Standard or BI edition, CDC is not an option for you.

Next, Ricard Lacerda should be very ashamed for posting that trigger. A trigger fires one per statement and must be able to handle multi-row operations.

Your actual requirements are somewhat vague to me. It seems that you have a fact table, and you want to track changes in it, and you want to be able to see the different values that a certain column had a certain point in time, or more precisely what values they had at the end of the day.

An important thing here is how the table is updated. Are changes written to the table continuously, and rows are typically only written when there are changes? Or is the table scratched and reloaded? Or is data loaded in batches and rows updated even if there are no actual changes?

April 18th, 2015 5:23am

Hi Erland, many thanks for your reply and interest.

My post is regarded to follow a good strategy to manage the modified rows for a fact table.

As a first thing, in order to feed my fact table I read a SQL Server staging area having data read from an Oracle database. I cannot act at Oracle source level, I can act at SQL Server staging area/data warehouse level.

My fact table of the SQL Server data warehouse now is composed of more 50 fields, other the surrogate key. Several of these 50 fields aren't simply read from the staging area but are derived from a transformation and/or calculation.

I need to:

a. import data from the staging area to the fact table each day if a fact row is changed (before to write a potential row in the fact I write it into a temporary table);

b. understand if a fact row is changed respect to the previous fact row (related to a previous day respect to the current day when the daily import is executed).

In order to understand if a fact row is changed I need to control if a change is occurred for the 50 fields above mentioned; so, I'd like to find a good manner to detect a change for all these fields avoiding to write many IF statements: using CHECKSUM function could be a good solution?

I've already created some procedures to retrieve the historical data for my fact table and for the daily import: I'd like to improve the written code in particular to manage better the historical data retrieval by simulating backwords the daily data import.

For my work I need to conform to the management of the Oracle data source and to his related data quality. It is clear for me to have for the fact table a surrogate key, and dates to indicate when a row is valid or not, and so on.

Do I need other informations about this issue? Thanks for your help.


  • Edited by pscorca 17 hours 51 minutes ago
Free Windows Admin Tool Kit Click here and download it now
April 18th, 2015 9:50am

b. understand if a fact row is changed respect to the previous fact row (related to a previous day respect to the current day when the daily import is executed).

And more exactly do you need to understand that? To know whether you should update it only? Or because if it has changed, you want to send it to an archive table?

In order to understand if a fact row is changed I need to control if a change is occurred for the 50 fields above mentioned; so, I'd like to find a good manner to detect a change for all these fields avoiding to write many IF statements: using CHECKSUM function could be a good solution?

No. CHECKSUM can tell you that two rows are different. But CHECKSUM cannot tell you whether two rows are the same. CHECKSUM returns a 32-bit value, and it is computed in a fairly simple-minded way, why the risk that two rows with different values produces the same checksum.

The hash_bytes function is better in that regard, but it has other limitations, such as the input being limited to 8000 bytes.

Possibly, you could write a CLR function to compute a hash for the rows, and you could make this a computed column that you persist.

The other, boring choice is to compare all columns.

Or, if all you want to know os whether you should update, use brute force and always update.

In the scenario you describe, I don't see a place for CDC.

April 18th, 2015 2:32pm

Hi Erland,

I need to detect a change value for at least one of the 50 fields in order to write the changed row (respect to the previous row) in the fact table: writing into the fact table is the unique operation to accomplish for at least one changed field respect to the all 50 fields.

I hoped to consider the CHECKSUM function more simple to use than HASHBYTE function, that requires to build an input string. So writing a long-time IF seems a good solution, also if I think to using a MERGE statement where the source is the set of the last rows inserted in the fact table to match with the potential changed rows saved in the temporary table.

Thanks

Free Windows Admin Tool Kit Click here and download it now
April 18th, 2015 5:53pm

I need to detect a change value for at least one of the 50 fields in order to write the changed row (respect to the previous row) in the fact table: writing into the fact table is the unique operation to accomplish for at least one changed field respect to the all 50 fields.

Are you inserting a new row or updating an existing? In the latter case, you could simply update anyhow, although if many rows are an unchanged this cause an unnecessarily high load.

April 18th, 2015 6:19pm

In the fact table I accomplish only insert operations no updates, so when I detect a modified row I need to insert it as a new row. I don't want to write unchanged rows day by day in order to save storage space for needless rows.

Thanks

Free Windows Admin Tool Kit Click here and download it now
April 18th, 2015 6:32pm

Ricardo, this is awesome, but I would be a little careful with CDC and at least would take into consideration whether his client will decide to use Azure or not for the database that he is developing, since it may be a trouble if CDC were not available in SQL Azure.

SQL Azure does not currently provide a method of change tracking data



April 18th, 2015 7:02pm

Hi Stanislav, I agree with you! The CDC Can't be enabled in azure environment yet. On the other hand, You can use this resource to fill the destination table in sql azure. I don't see any problem. Really you have to check the available versi
Free Windows Admin Tool Kit Click here and download it now
April 18th, 2015 9:38pm

Dear Erland, I am not ashamed because the statement wasn't very clear. I already recognized that the use the Triggers, In this case, is not re
April 18th, 2015 9:50pm

This is a good book about many techniques to use 

https://www.red-gate.com/community/books/defensive-database-programming

Free Windows Admin Tool Kit Click here and download it now
April 19th, 2015 1:59am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics