At one point I stored the result of the following Oracle SQL query:

SELECT col, TO_CHAR(LOWER(STANDARD_HASH(col, 'MD5'))) AS hash_col FROM MyTable;

A week later, I executed the same query on the same data (same values in column col).

I expected the resulting hash_col column to contain the same values as in the earlier run, but it did not.

Does the Oracle STANDARD_HASH function return the same result over time for identical input data? It does when the function is called twice on the same day.

  • Calling standard_hash without a method parameter computes a SHA1 hash, not an MD5 hash. If you pass the same parameters in, you'll get the same result out; that's a basic property of a hash. I would wager that either you didn't execute the same query (perhaps you previously computed the MD5 hash) or the data has actually changed in some way. Commented Apr 28, 2021 at 13:00
  • Right, I have corrected that. It does give the same result if the hash is computed twice on the same day, but not when computed on different days. It looks as if some seed has changed, but the Oracle STANDARD_HASH function has no seed argument. Commented Apr 28, 2021 at 13:03
  • Consider the possibility that the data is not, in fact, the same. Values composed of different bytes may look the same when displayed on screen. This can happen in various ways: (invisible) whitespace in strings, rounded numeric values. Commented Apr 28, 2021 at 13:13
  • The hashed data is stored in an Oracle table that was not modified between the two hash sessions. I am quite confused by what I observed. Commented Apr 28, 2021 at 13:24
  • When you have eliminated the impossible, whatever remains, however improbable, must be the truth. If the data truly has not changed (and how are you validating this?) and the hash isn't fundamentally wrongly implemented (which other users of Oracle would have noticed), then few possibilities remain, including but not limited to broken hardware. My money would still be on not having sufficiently excluded the possibility of changed data (possibly through a nondeterministic query). Try reproducing the hash values from values you enter yourself rather than from the table columns. Commented Apr 28, 2021 at 13:34
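
The last two suggestions in the comments (hash values you type in yourself, and look at the underlying bytes) can be tried without touching the table. A minimal sketch, assuming Oracle 12c or later, where STANDARD_HASH is available; the literals 'abc' and 'abc ' are made-up examples, not values from the question:

```sql
-- Two literals that look almost identical on screen:
-- the second one carries a trailing space.
SELECT DUMP('abc')                  AS bytes_a,  -- byte-level view of the first value
       STANDARD_HASH('abc', 'MD5')  AS hash_a,
       DUMP('abc ')                 AS bytes_b,  -- one extra byte (32 = space)
       STANDARD_HASH('abc ', 'MD5') AS hash_b
FROM dual;
```

Running this on different days should give the same hash_a and hash_b each time, while hash_a and hash_b differ from each other because the DUMP output differs. If the literals behave that way, the function is stable and the table data is the thing to inspect.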

1 Answer

All we have about the data changing (or not) and the hash changing (or not) is your assertion.

You could create and populate a log table:

create table hash_log (
   sample_time timestamp,
   hashed_string varchar2(200),
   hashed_string_dump varchar2(200),
   hash_value varchar2(200)
   );

Then on a daily basis:

insert into hash_log (sample_time, hashed_string, hashed_string_dump, hash_value)
   select systimestamp,
          source_column,
          dump(source_column),
          STANDARD_HASH(source_column , 'MD5' )
   from source_table;

Then, to spot changes:

select distinct      hashed_string ||
                     hashed_string_dump ||
                     hash_value 
from hash_log;
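
A narrower check is also possible once the log holds samples from several days. The following is only a sketch against the hash_log table defined above; it lists any byte-identical input (same DUMP output) that was ever logged with more than one hash value:

```sql
-- Byte-identical inputs that produced different hashes over time.
-- An empty result means identical bytes always hashed to the same value.
SELECT hashed_string_dump,
       COUNT(DISTINCT hash_value) AS distinct_hashes,
       MIN(sample_time)           AS first_seen,
       MAX(sample_time)           AS last_seen
FROM hash_log
GROUP BY hashed_string_dump
HAVING COUNT(DISTINCT hash_value) > 1;
```

If this returns no rows while the stored hashes still differ from an older run, the input bytes are what changed. Note that DUMP output is much longer than the string itself, so the varchar2(200) columns may need widening for anything but short values.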

3 Comments

Right. I have to admit I was surprised by the result myself. I will set up the experiment you suggest (but the source table and the hash log table will be the same). Possibly I am mistaken. I will report back later. Thanks.
How can you make the source table and the log table the same? As I see it, there would be a 1-to-many relationship between the two.
So, as expected, I got the same result today when hashing the column of my source table. TO_CHAR(LOWER(STANDARD_HASH(col, 'MD5'))) returns the same result on the same data. I thought I had hashed the same data, but that was not the case.
