1



I need to generate a unique integer id for a string.

Reason:
I have a database application that can run on different databases. These databases contain parameters whose parameter types are generated from external XML data. Currently I use the ordinal number of the enum, but when a parameter type is inserted or removed, the ordinals shift:
(FOOD = 0 , TOYS = 1) <--> (FOOD = 0, NONFOOD = 1, TOYS = 2)

The number of parameter types is between 200 and 2000, so I am a bit wary of using hashCode() on a string.

P.S.: I am using Java.

Thanks a lot

9
  • If the strings are expected to be short, you should just use the strings themselves. Commented Jun 14, 2011 at 15:06
  • The strings are up to 32 chars, but there is a table where the ids are combined (parameterGroup * 10000 + parameterType), so they need to have a numeric representation. This is part of a database index and should not be longer than 10 bytes, so concatenating strings won't work either. Commented Jun 14, 2011 at 15:10
  • I don't understand why you say you're scared about using String implementation of hashCode(). Can you explain? Commented Jun 14, 2011 at 15:15
  • 1
    Because of possible collisions. The hash code is not unique. Commented Jun 14, 2011 at 15:16
  • I don't think there would be a way to generically generate a number like this. If your Strings can be up to 32 characters, an int would not have a large enough range to be able to have a unique number for every possible String. You mentioned hashCode(), but this could return the same number for two different Strings, it could also return negative numbers, which you may not want. Commented Jun 14, 2011 at 15:20

6 Answers

5

I would use a mapping table in the database to map these strings to an auto-increment value. These mappings should then be cached in the application.
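A minimal sketch of the caching side, assuming Java 8 or later. The class name `ParameterTypeIdCache` and the `lookupOrInsert` callback are hypothetical; the callback stands in for whatever database call returns the existing id for a name or inserts a new row and returns its auto-increment value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToIntFunction;

/** Hypothetical in-memory cache in front of a name -> id mapping table. */
final class ParameterTypeIdCache {

    private final Map<String, Integer> idsByName = new ConcurrentHashMap<>();

    // Assumption: this callback queries the mapping table and, if the name is
    // not there yet, inserts it and returns the generated auto-increment id.
    private final ToIntFunction<String> lookupOrInsert;

    ParameterTypeIdCache(ToIntFunction<String> lookupOrInsert) {
        this.lookupOrInsert = lookupOrInsert;
    }

    /** Returns the cached id, hitting the database only on the first request per name. */
    int idFor(String name) {
        return idsByName.computeIfAbsent(name, lookupOrInsert::applyAsInt);
    }
}
```

Because ids are handed out by the table and never recomputed, adding NONFOOD later does not change the ids already assigned to FOOD and TOYS.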


1 Comment

Thanks for your answer. I also thought about that solution, and I think that's the way I will go if I don't find an id generation scheme. The thing I don't like is storing additional data.
2

Use a cryptographic hash. MD5 would probably be sufficient and relatively fast. Collisions are extremely unlikely for an input set of your size.

How can I generate an MD5 hash?

The only problem is that the hash is 128 bits, so a standard 64-bit integer won't hold it.
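One possible workaround, sketched here as an assumption rather than a recommendation, is to keep only the first 8 bytes of the digest so the value fits in a `long`. The class and method names are hypothetical. Truncating weakens the uniqueness guarantee, the result can be negative, and it still does not fit the 10-digit limit mentioned in the question's comments.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Ids {

    /** Returns the first 8 bytes of the MD5 digest of name, read big-endian as a signed long. */
    static long md5AsLong(String name) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // Fix the charset so the id does not depend on the platform default.
            byte[] digest = md5.digest(name.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

If the full 128 bits are needed, `new BigInteger(1, digest)` holds them without truncation.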

1 Comment

No you're not wrong. There are also integers (typed as long or long long) that are larger. In some cases, the integer type is 16 bit, 32, or 64. The types support numbers in the Integer set as opposed to supporting numbers in the Real set. The length just determines how many numbers you can represent in the Integer set.
1

If you need to be absolutely certain that the ids are unique (no collisions), your strings are up to 32 chars long, and your number must have no more than 10 digits (approx. 32 bits), you obviously cannot do it with a stateless function id = F(string): there are far more possible strings than available numbers.
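To get a feel for how likely a collision is, here is a rough birthday-bound estimate. It assumes the hash values behave like uniformly random draws, which real hash functions only approximate; the class and method names are hypothetical.

```java
final class CollisionEstimate {

    /**
     * Approximate probability that at least two of n uniformly random values
     * drawn from a space of the given size coincide (birthday approximation).
     */
    static double collisionProbability(long n, double space) {
        double pairs = n * (n - 1) / 2.0;
        // 1 - exp(-pairs / space), computed via expm1 for accuracy near zero.
        return -Math.expm1(-pairs / space);
    }
}
```

For 2000 strings hashed into 32 bits, `collisionProbability(2000, Math.pow(2, 32))` comes out at roughly 0.05%. With the composite key from the question's comments (parameterGroup * 10000 + parameterType), the parameter type has to stay below 10000, and `collisionProbability(2000, 10000)` is practically 1, so a hash alone cannot work there.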

The natural way is to keep some mapping of the string to unique numbers (typically a sequence), either in the DB or in the application.


0

If you know the shape of the string values (length, letter patterns), you can count the total number of strings in this set, and if that count fits within 32 bits, the position of a string in that enumeration is your integer value.

Otherwise, the string itself is your integer value (integer in math terms, not Java).
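As an illustration of the counting idea, here is a sketch under a deliberately narrow assumption: names consist only of the letters A to Z and are at most 6 characters long. Reading the name as a base-27 number with digits 1 to 26 then gives every name a distinct int. The class name and the limits are hypothetical, and the scheme does not stretch to the 32-character names from the question.

```java
final class NameRank {

    // 27^6 - 1 = 387,420,488 is the largest possible rank, which fits in an int.
    private static final int MAX_LENGTH = 6;

    /** Maps a name made of 1 to 6 letters A-Z to a unique positive int. */
    static int rank(String name) {
        if (name.isEmpty() || name.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("Name must have 1 to " + MAX_LENGTH + " characters: " + name);
        }
        int rank = 0;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("Only A-Z is supported: " + name);
            }
            // Digits run from 1 to 26, so "A" and "AA" get different values.
            rank = rank * 27 + (c - 'A' + 1);
        }
        return rank;
    }
}
```

Unlike a hash, this mapping is collision-free by construction, but only because the set of admissible strings is small enough to be numbered.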


0

By enum, do you mean a Java enum? Then you could give each enum value a unique int yourself instead of using its ordinal number:

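public enum MyEnum {

    FOOD(0),
    TOYS(1);  // the constant list must end with a semicolon

    private final int id;

    private MyEnum(int id)
    {
        this.id = id;
    }

    // Stable id to store in the database instead of ordinal()
    public int getId()
    {
        return id;
    }
}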
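When reading rows back from the database you also need the reverse direction, from id to enum value. A minimal sketch, assuming a hypothetical enum `ParameterType` in which NONFOOD was added later and simply received the next free id:

```java
import java.util.HashMap;
import java.util.Map;

enum ParameterType {
    FOOD(0),
    TOYS(1),
    NONFOOD(2); // added later; FOOD and TOYS keep their ids

    // Lookup table from id to constant, built once when the enum is loaded.
    private static final Map<Integer, ParameterType> BY_ID = new HashMap<>();

    static {
        for (ParameterType type : values()) {
            if (BY_ID.put(type.id, type) != null) {
                throw new IllegalStateException("Duplicate id: " + type.id);
            }
        }
    }

    private final int id;

    ParameterType(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }

    /** Returns the constant with the given id, or throws if the id is unknown. */
    static ParameterType fromId(int id) {
        ParameterType type = BY_ID.get(id);
        if (type == null) {
            throw new IllegalArgumentException("Unknown id: " + id);
        }
        return type;
    }
}
```

The duplicate check in the static initializer fails fast if two constants are ever given the same id by mistake.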

3 Comments

Yes, that is true, but where do I take this number from? It must somehow be unique for the string.
@Gerald: When you have to extend the enum, take the last id and increment it for the new value...
Nice idea, but the generator always regenerates the whole enum. It's a stupid script with no memory of what happened before.
0

I came across this post, which seems sensible: How to convert string to unique identifier in Java

In it the author describes his implementation:

public static long longHash(String string) {
  long h = 98764321261L; 
  int l = string.length();
  char[] chars = string.toCharArray();

  for (int i = 0; i < l; i++) {
    h = 31*h + chars[i];
  }
  return h;
}
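A 64-bit hash like this makes collisions unlikely but does not rule them out. Since the set of parameter names is known whenever the enum is generated, one option is to verify it up front. The sketch below repeats the longHash method from above and adds a hypothetical `checkUnique` helper; the class name is an assumption as well.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashCollisionCheck {

    // Same algorithm as the longHash method quoted above.
    static long longHash(String string) {
        long h = 98764321261L;
        for (int i = 0; i < string.length(); i++) {
            h = 31 * h + string.charAt(i);
        }
        return h;
    }

    /**
     * Returns a map from hash to name, or throws if two different names
     * produce the same hash. Repeated names are ignored.
     */
    static Map<Long, String> checkUnique(List<String> names) {
        Map<Long, String> nameByHash = new HashMap<>();
        for (String name : names) {
            String previous = nameByHash.putIfAbsent(longHash(name), name);
            if (previous != null && !previous.equals(name)) {
                throw new IllegalStateException("Collision between " + previous + " and " + name);
            }
        }
        return nameByHash;
    }
}
```

Collisions do exist: "Aa" and "BB" have the same length and the same String.hashCode(), so they also share the same longHash value, and `checkUnique` rejects a list containing both.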

