1

If my input length is less than the hash output length, are there any hashing algorithms that can guarantee no collisions.

I know by nature that a one way hash can have collisions across multiple inputs due to the lossy nature of the hashing, especially when considering input size is often greater than output size, but does that still apply with smaller input sizes?

4
  • I'd look at these posts link 1 link 2. Commented Jan 9, 2015 at 21:59
  • @Xedni, your links appear to target the same post Commented Jan 9, 2015 at 22:07
  • @Dave.Gugg One for each eye? Commented Jan 9, 2015 at 22:41
  • Must have been a copy paste fail. Regardless, it looks like a suitable answer was given :) Commented Jan 12, 2015 at 21:48

3 Answers 3

1

Use a symmetric block cipher with a randomly chosen static key. Encryption can never produce a duplicate because that would prevent unambiguous decryption.

This scheme will force a certain output length which is a multiple of the cipher block size. If you can make use a variable-length output you can use a stream cipher as well.

Sign up to request clarification or add additional context in comments.

Comments

0

Your question sounds like you're looking for a perfect hash function. The problem with perfect hash functions is they tend to be tailored towards a specific set of data.

The following assumes you're not trying to hide, secure or encrypt the data...

To think of it another way, the easiest way to "generate" a perfect hash function that will accept your inputs is to map the data you want to store to a table and associate those inputs with a surrogate primary key. You then create a unique constraint for the column (or columns) to ensure the input you're mapping only maps to a single surrogate value.

The surrogate key could be int, bigint or a guid. It all depends on how many rows you're looking to store.

Comments

0

If your input lengths are known to be small, such as 32 bits, then you could actually enumerate through all possible inputs and check the resulting hashes for collisions. That's only going to be 4294967296 possible inputs, and shouldn't take to terribly long to enumerate all of them. Essentially you'd be building a rainbow table to test for collisions.

If there is some security relying on this though, one of the issues is if an attacker knows your input lengths are constrained, it makes it easy for them to also perform the same enumeration to create a map/table that will map hashes back to the original values. "attacker" is a pretty terrible term here though because I have no context of how you are using these hashes and whether you are concerned about being able to reverse them.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.