I have two identical images that differ only in their image and file properties (e.g. CreationDate). When I calculate a hash of each file, I get different hashes. Is there any way to skip such properties so that both images produce the same hash?

Awaiting help. Thanks

  • Added one edge case that may or may not matter to your application. Commented Mar 9, 2016 at 1:34

1 Answer

You can read the image's decoded pixel data into a byte array and hash that byte array, rather than hashing the file itself.

That way, differences in metadata are not considered.

Since the 2D data is read into a 1D array, two images with different dimensions can end up with the same hash. For example, consider a 2x2 image and a 4x1 image, where R means a red pixel and B means a blue pixel (just to pick two colors):

RB
BR

and

RBBR

Read row by row, both flatten to the same sequence, RBBR, so both would have the same hash code. If that matters to you, prepend (or append) the width and height of the image to the byte array before hashing.
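
Here is a minimal Python sketch of that edge case and the fix. It assumes the pixels have already been decoded into a flat, row-major RGB byte string by whatever imaging library you use. The function names, the choice of SHA-256, and the 32-bit big-endian encoding of width and height are my own assumptions for illustration, not anything the question specifies.

```python
import hashlib
import struct


def hash_pixels(pixels: bytes) -> str:
    """Hash only the flattened pixel bytes (no metadata, no dimensions)."""
    return hashlib.sha256(pixels).hexdigest()


def hash_pixels_with_size(width: int, height: int, pixels: bytes) -> str:
    """Hash the width and height followed by the flattened pixel bytes."""
    # Two unsigned 32-bit big-endian integers; the encoding is an arbitrary choice.
    header = struct.pack(">II", width, height)
    return hashlib.sha256(header + pixels).hexdigest()


# One RGB pixel each: red and blue.
R = b"\xff\x00\x00"
B = b"\x00\x00\xff"

square = R + B + B + R  # 2x2 image, rows "RB" and "BR", flattened row by row
strip = R + B + B + R   # 4x1 image, single row "RBBR"

print(hash_pixels(square) == hash_pixels(strip))  # True
print(hash_pixels_with_size(2, 2, square) == hash_pixels_with_size(4, 1, strip))  # False
```

Prepending the dimensions is enough here because the two buffers only collide when the shape is ignored.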

2 Comments

Thank you! Is there any solution for video formats?
There's a lot more data involved, but the same basic approach applies. If performance is key, you could probably grab a few seconds of data from the middle of the video and hash only that. I would not grab the beginning or end, as some videos share the same lead-in (e.g. if the same company made them).
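
A rough Python sketch of the "hash the middle" idea, assuming the video has already been decoded into a sequence of raw frame byte strings by some decoder that is not shown here. The function name `hash_middle_frames`, the SHA-256 choice, and the length prefix per frame are hypothetical choices for illustration only.

```python
import hashlib
from typing import Sequence


def hash_middle_frames(frames: Sequence[bytes], count: int) -> str:
    """Hash `count` consecutive decoded frames taken from the middle of the video."""
    if count <= 0:
        raise ValueError("count must be positive")
    count = min(count, len(frames))
    start = (len(frames) - count) // 2
    digest = hashlib.sha256()
    for frame in frames[start:start + count]:
        # Length prefix so frame boundaries cannot shift without changing the hash.
        digest.update(len(frame).to_bytes(8, "big"))
        digest.update(frame)
    return digest.hexdigest()


# Two different videos that share the same lead-in and credits.
video_a = [b"logo", b"a1", b"a2", b"credits"]
video_b = [b"logo", b"b1", b"b2", b"credits"]

print(hash_middle_frames(video_a, 2) == hash_middle_frames(video_b, 2))  # False
```

Note that two copies of the same video only match under this scheme if they decode to the same number of identical frames, so the middle window lines up.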
