13

Is there any way in SQL Server 2012 to generate a hash of a set of rows and columns?

I want to generate a hash and store it on the parent record. Then, when an update comes in, I'll compare the incoming hash with the parent record's hash, and I'll know whether the data has changed.

So something like this would be nice:

SELECT GENERATEHASH(CONCATENATE(Name, Description, AnotherColumn))
FROM MyChildTable WHERE ParentId = 2 -- subset of data belonging to parent record 2

"CONCATENATE" would be an aggregate function that concatenates not only the columns but also the rows in the result set. Like MAX, but returning everything as one concatenated string.

Hopefully this helps you see what I mean anyway!

The fundamental problem I'm trying to solve is that my client's system performs imports of vast amounts of hierarchical data. If I can avoid processing unchanged data by comparing hashes, I would expect that to save a lot of time. At the moment, the stored procedure runs 300% slower when it has to process duplicate data.

Many thanks

4 Answers

12
select HashBytes('md5',convert(varbinary(max),(SELECT * FROM MyChildTable WHERE ParentId = 2 FOR XML AUTO)))

Note that HashBytes is limited to 8000 bytes of input, so for larger data you can write a function that computes the MD5 of each 8000-byte chunk.
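One way to read the chunking suggestion is as a chained hash: each step hashes the previous digest together with the next slice of data, so every HASHBYTES call stays within 8000 bytes. The following is only a sketch under that assumption. The variable names and the generated sample data are made up for illustration, and the result is not the same value as the MD5 of the whole input.

```sql
-- Sample payload larger than 8000 bytes (illustrative data only).
DECLARE @data varbinary(max) =
    CONVERT(varbinary(max), REPLICATE(CONVERT(varchar(max), 'x'), 20000));

DECLARE @hash varbinary(16) = 0x;          -- running digest, empty before the first chunk
DECLARE @pos  bigint = 1;
DECLARE @len  bigint = DATALENGTH(@data);

WHILE @pos <= @len
BEGIN
    -- 16 bytes of previous MD5 digest + up to 7984 bytes of data = at most 8000 bytes.
    SET @hash = HASHBYTES('MD5',
        CONVERT(varbinary(8000), @hash + SUBSTRING(@data, @pos, 7984)));
    SET @pos += 7984;
END;

-- Stays 0x if @data is empty, because the loop never runs.
SELECT @hash AS ChainedHash;
```

Because the digest depends on the chunk size, both sides of a comparison have to be computed with the same routine.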


2 Comments

If you are on SQL Server 2016 or above, which has some JSON support, I recommend using FOR JSON AUTO instead of FOR XML AUTO, as this seems to be about 2x faster in a few tests I did.
Note: the 8000-byte limit applies to SQL Server 2014 and earlier. learn.microsoft.com/en-us/sql/t-sql/functions/…
10

You can use the CHECKSUM_AGG aggregate. It is made for that purpose.
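As a minimal sketch of how this could look for the question's scenario: CHECKSUM_AGG takes an integer expression, so it is usually combined with CHECKSUM over the columns of interest. The temp table, its ChildId column and the sample rows below are assumptions made so the example runs on its own. Only the Name, Description, AnotherColumn and ParentId columns come from the question.

```sql
-- Illustrative stand-in for the question's child table.
CREATE TABLE #MyChildTable (
    ChildId       int PRIMARY KEY,
    ParentId      int NOT NULL,
    Name          varchar(50),
    Description   varchar(200),
    AnotherColumn varchar(50)
);

INSERT INTO #MyChildTable (ChildId, ParentId, Name, Description, AnotherColumn)
VALUES (1, 2, 'Widget', 'First child',  'A'),
       (2, 2, 'Gadget', 'Second child', 'B'),
       (3, 5, 'Other',  'Belongs to another parent', 'C');

-- One int per row from CHECKSUM, folded into one int per parent by CHECKSUM_AGG.
SELECT CHECKSUM_AGG(CHECKSUM(Name, Description, AnotherColumn)) AS ChildChecksum
FROM #MyChildTable
WHERE ParentId = 2;

DROP TABLE #MyChildTable;
```

The result is a single 32-bit integer, so two different sets of rows can produce the same value. It works as a cheap first check rather than as proof that nothing changed.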

1 Comment

Unfortunately CHECKSUM has known weaknesses (i.e. practical collisions), e.g. with the decimal type: sqlserverpains.blogspot.com.au/2008/06/checksum-pains.html so just beware.
2

For single row hashes:

select HASHBYTES('md5', Name + Description + AnotherColumn)
FROM MyChildTable WHERE ParentId = 2

For a table checksum:

select sum(checksum(Name + Description + AnotherColumn)*1.0)
FROM MyChildTable WHERE ParentId = 2
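Two caveats apply to plain `+` concatenation. With the default CONCAT_NULL_YIELDS_NULL setting, a NULL in any column makes the whole expression NULL. Different column values can also concatenate to the same string, for example 'ab' + 'c' and 'a' + 'bc'. A hedged sketch of one possible mitigation uses CONCAT (available from SQL Server 2012) with a separator. The temp table, the sample rows and the '|' separator are assumptions for illustration.

```sql
-- Illustrative stand-in for the question's child table.
CREATE TABLE #MyChildTable (
    ChildId       int PRIMARY KEY,
    ParentId      int NOT NULL,
    Name          varchar(50),
    Description   varchar(50),
    AnotherColumn varchar(50)
);

INSERT INTO #MyChildTable (ChildId, ParentId, Name, Description, AnotherColumn)
VALUES (1, 2, 'ab', 'c',  'x'),   -- concatenates to 'abcx'
       (2, 2, 'a',  'bc', 'x'),   -- also concatenates to 'abcx'
       (3, 2, 'a',  NULL, 'x');   -- plain + yields NULL here

SELECT ChildId,
       -- Rows 1 and 2 hash the same string; row 3 returns NULL.
       HASHBYTES('MD5', Name + Description + AnotherColumn) AS PlainHash,
       -- CONCAT treats NULL as an empty string; the separator keeps column boundaries apart.
       HASHBYTES('MD5', CONCAT(Name, '|', Description, '|', AnotherColumn)) AS DelimitedHash
FROM #MyChildTable
WHERE ParentId = 2
ORDER BY ChildId;

DROP TABLE #MyChildTable;
```

This still cannot tell NULL from an empty string, and a separator that occurs in the data reintroduces the ambiguity, so pick one that cannot appear in the columns.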

2 Comments

Does this produce a hash from the entire resultset? Or will it produce multiple hashes, one for each row in MyChildTable?
Thanks for the idea @juergend. Going to use the checksum solution a lot.
1

Another approach:

-- compute a single hash value for all rows of a table
begin

    set nocount on;

    -- init hash variable
    declare @tblhash varchar(40);
    set @tblhash = 'start';

    -- compute a single hash value
    select @tblhash = sys.fn_varbintohexsubstring(0, hashbytes('sha1',(convert(varbinary(max),@tblhash+
    (select sys.fn_varbintohexsubstring(0,hashbytes('sha1',(convert(varbinary(max),
    -- replace 'select *' if you want only specific columns to be included in the hash calculation
    -- [target_table] is the name of the table to calc the hash from
    -- [row_id] is the primary key column within the target table
    -- modify those in the next lines to suit your needs:
    (select * from [target_table] obj2 where obj2.[row_id]=obj1.[row_id] for xml raw)
    ))),1,0))
    ))),1,0)
    from [target_table] obj1;

    set nocount off;

    -- return result
    select @tblhash as hashvalue;

end;
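The script above assigns to @tblhash once per row inside a single SELECT, and the order in which SQL Server visits the rows there is not guaranteed. As a sketch only, the same chaining idea can be written with a cursor and an explicit ORDER BY, so that the result does not depend on the chosen plan. The temp table, its columns and the sample rows are assumptions for illustration, and a cursor is likely to be slower on large tables.

```sql
-- Illustrative stand-in for [target_table].
CREATE TABLE #target_table (row_id int PRIMARY KEY, Name varchar(50));
INSERT INTO #target_table (row_id, Name) VALUES (1, 'first'), (2, 'second');

DECLARE @tblhash varbinary(20) = 0x;   -- running SHA1 digest
DECLARE @rowxml  nvarchar(max);

DECLARE row_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT (SELECT t2.* FROM #target_table t2
            WHERE t2.row_id = t1.row_id FOR XML RAW)
    FROM #target_table t1
    ORDER BY t1.row_id;                -- fixed row order

OPEN row_cursor;
FETCH NEXT FROM row_cursor INTO @rowxml;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Hash the row, then hash previous digest + row digest (40 bytes in total).
    -- Caveat: the CONVERT silently truncates row XML longer than 8000 bytes.
    SET @tblhash = HASHBYTES('SHA1',
        @tblhash + HASHBYTES('SHA1', CONVERT(varbinary(8000), @rowxml)));
    FETCH NEXT FROM row_cursor INTO @rowxml;
END;

CLOSE row_cursor;
DEALLOCATE row_cursor;

SELECT @tblhash AS hashvalue;

DROP TABLE #target_table;
```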
