8

I'm trying to create an efficient look-up table in C.

I have an integer as a key and a variable length char* as the value.

I've looked at uthash, but this requires a fixed length char* value. If I make this a big number, then I'm using too much memory.

struct my_struct {
    int key;
    char value[10];             
    UT_hash_handle hh;
};

Has anyone got any pointers? Any insight greatly appreciated.


Thanks everyone for the answers. I've gone with uthash and defined my own custom struct to accommodate my data.

3
  • At a low level, I'd suggest using an array of linked-lists to back your hash table. Your hash function just needs to map key to a valid value in the array, and then you just append your value to the linked-list that exists there. Like any other hash implementation, this will perform efficiently so long as your hash function distributes keys relatively evenly within the array. Commented Jul 27, 2011 at 13:10
  • Hi, thanks for your message. But how do I efficiently find the correct key? Commented Jul 27, 2011 at 13:19
  • @Eamorr, finding the key is the responsibility of the hash function. The hash needs to be deterministic: always delivering the same result for the same input. Then whatever key was used to store the value will retrieve the same value later. Commented Jul 28, 2011 at 2:00
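
The array-of-linked-lists idea from the comments could be sketched roughly as follows. This is a minimal sketch, not uthash's API: the names `chain_node`, `chain_table`, `chain_put`, `chain_get` and the bucket count `NBUCKETS` are made up for illustration. Each value is copied into a buffer sized to the string, so memory use follows the actual string lengths.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64  /* hypothetical bucket count */

struct chain_node {
    int key;
    char *value;              /* heap copy, exactly strlen(value) + 1 bytes */
    struct chain_node *next;
};

struct chain_table {
    struct chain_node *buckets[NBUCKETS];  /* zero-initialise before use */
};

/* Map any int key, including negative ones, to a bucket index. */
static unsigned chain_bucket(int key) {
    return (unsigned)key % NBUCKETS;
}

/* Find the value stored under key, or NULL if the key is absent. */
const char *chain_get(const struct chain_table *t, int key) {
    const struct chain_node *n;
    for (n = t->buckets[chain_bucket(key)]; n != NULL; n = n->next) {
        if (n->key == key) {
            return n->value;
        }
    }
    return NULL;
}

/* Insert or overwrite. Returns 1 on success, 0 if allocation fails. */
int chain_put(struct chain_table *t, int key, const char *value) {
    unsigned b = chain_bucket(key);
    struct chain_node *n;
    char *copy = malloc(strlen(value) + 1);
    if (copy == NULL) {
        return 0;
    }
    strcpy(copy, value);
    for (n = t->buckets[b]; n != NULL; n = n->next) {
        if (n->key == key) {      /* key already present: replace value */
            free(n->value);
            n->value = copy;
            return 1;
        }
    }
    n = malloc(sizeof *n);
    if (n == NULL) {
        free(copy);
        return 0;
    }
    n->key = key;
    n->value = copy;
    n->next = t->buckets[b];      /* push onto the front of the chain */
    t->buckets[b] = n;
    return 1;
}
```

Lookup hashes the key to one bucket and walks only that bucket's chain, so it stays fast as long as the chains stay short.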

3 Answers

15

You first have to think of your collision strategy:

  1. Will you probe for another slot using a series of hash functions (open addressing)?
  2. Or will you keep a container, such as a linked list, in each slot of the hash table (chaining)?

We'll pick 1.

Then you have to choose a nicely distributed hash function. For the example, we'll pick

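int hash_fun(int key, int try, int max) {
    // Linear probing: slot to use on the given attempt. Unsigned
    // arithmetic keeps the result in [0, max) for negative keys too.
    return (int)(((unsigned)key + (unsigned)try) % (unsigned)max);
}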

If you need something better, maybe have a look at the middle-squared method.

Then you'll have to decide what a hash table is.

struct hash_table {
    int max;
    int number_of_elements;
    struct my_struct **elements;
};
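
The struct still has to be allocated with every slot empty before it can be used. A possible pair of helpers, as a sketch: `hash_table_create` and `hash_table_destroy` are names assumed here, not part of the original answer, and the struct is repeated only so the block compiles on its own.

```c
#include <stdlib.h>

struct my_struct;                 /* defined by the caller, as in the question */

struct hash_table {
    int max;
    int number_of_elements;
    struct my_struct **elements;
};

/* Allocate a table with `max` empty slots. Returns NULL on failure. */
struct hash_table *hash_table_create(int max) {
    struct hash_table *t;
    if (max <= 0) {
        return NULL;
    }
    t = malloc(sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    /* calloc zero-fills the slot array; on mainstream platforms an
       all-zero pointer is a null pointer, i.e. an empty slot. */
    t->elements = calloc((size_t)max, sizeof *t->elements);
    if (t->elements == NULL) {
        free(t);
        return NULL;
    }
    t->max = max;
    t->number_of_elements = 0;
    return t;
}

/* Free the table itself. The stored elements belong to the caller. */
void hash_table_destroy(struct hash_table *t) {
    if (t != NULL) {
        free(t->elements);
        free(t);
    }
}
```

The table only stores pointers, so whoever allocated each `struct my_struct` remains responsible for freeing it.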

Then, we'll have to define how to insert and to retrieve.

int hash_insert(struct my_struct *data, struct hash_table *hash_table) {
    int try, hash;
    if(hash_table->number_of_elements >= hash_table->max) {
        return 0; // FULL
    }
    // At least one slot is free, so at most max probes are needed.
    for(try = 0; try < hash_table->max; try++) {
        hash = hash_fun(data->key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) { // empty cell
            hash_table->elements[hash] = data;
            hash_table->number_of_elements++;
            return 1;
        }
    }
    return 0;
}

struct my_struct *hash_retrieve(int key, struct hash_table *hash_table) {
    int try, hash;
    // Stop after max probes so that a full table cannot loop forever.
    for(try = 0; try < hash_table->max; try++) {
        hash = hash_fun(key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) {
            return 0; // Nothing found
        }
        if(hash_table->elements[hash]->key == key) {
            return hash_table->elements[hash];
        }
    }
    return 0;
}

And lastly, a method to remove:

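int hash_delete(int key, struct hash_table *hash_table) {
    int try, hash, next;
    struct my_struct *moved;
    for(try = 0; try < hash_table->max; try++) {
        hash = hash_fun(key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) {
            return 0; // Nothing found
        }
        if(hash_table->elements[hash]->key == key) {
            hash_table->number_of_elements--;
            hash_table->elements[hash] = 0;
            // Re-insert the rest of the cluster so that entries which
            // probed past this slot stay reachable.
            next = (hash + 1) % hash_table->max;
            while(hash_table->elements[next] != 0) {
                moved = hash_table->elements[next];
                hash_table->elements[next] = 0;
                hash_table->number_of_elements--;
                hash_insert(moved, hash_table);
                next = (next + 1) % hash_table->max;
            }
            return 1; // Success
        }
    }
    return 0;
}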
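
Removal in an open-addressing table needs care, because clearing a slot can cut the probe sequence of keys that were stored after it. One common technique is to mark the slot as deleted (a "tombstone") rather than empty. The sketch below is self-contained and is not the answer's code: `oa_table`, `oa_put`, `oa_get`, `oa_remove` and the capacity `OA_SIZE` are invented names, and it stores `const char *` values directly without copying them.

```c
#include <stddef.h>

#define OA_SIZE 8   /* hypothetical fixed capacity */

enum oa_state { OA_EMPTY = 0, OA_USED, OA_DELETED };

struct oa_table {
    enum oa_state state[OA_SIZE];   /* zero-initialise: all slots empty */
    int key[OA_SIZE];
    const char *value[OA_SIZE];     /* not copied: caller keeps strings alive */
};

/* Linear probing: slot to inspect on the given attempt. */
static int oa_slot(int key, int try) {
    return (int)(((unsigned)key + (unsigned)try) % OA_SIZE);
}

/* Return the slot holding key, or -1. Tombstones do not end the search. */
static int oa_find(const struct oa_table *t, int key) {
    int try, s;
    for (try = 0; try < OA_SIZE; try++) {
        s = oa_slot(key, try);
        if (t->state[s] == OA_EMPTY) {
            return -1;
        }
        if (t->state[s] == OA_USED && t->key[s] == key) {
            return s;
        }
    }
    return -1;
}

/* Insert or overwrite. Returns 1 on success, 0 if the table is full. */
int oa_put(struct oa_table *t, int key, const char *value) {
    int try, s = oa_find(t, key);
    if (s >= 0) {
        t->value[s] = value;
        return 1;
    }
    for (try = 0; try < OA_SIZE; try++) {
        s = oa_slot(key, try);
        if (t->state[s] != OA_USED) {   /* empty slot or reusable tombstone */
            t->state[s] = OA_USED;
            t->key[s] = key;
            t->value[s] = value;
            return 1;
        }
    }
    return 0;
}

/* Return the value stored under key, or NULL if absent. */
const char *oa_get(const struct oa_table *t, int key) {
    int s = oa_find(t, key);
    return s >= 0 ? t->value[s] : NULL;
}

/* Mark the slot as deleted. Returns 1 if the key was present. */
int oa_remove(struct oa_table *t, int key) {
    int s = oa_find(t, key);
    if (s < 0) {
        return 0;
    }
    t->state[s] = OA_DELETED;
    return 1;
}
```

The trade-off is that tombstones accumulate and lengthen searches, so tables built this way are usually rebuilt from time to time.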

2 Comments

Thanks! In hash_insert function, should hash_table->number_of_elements < hash_table->max be hash_table->number_of_elements >= hash_table->max instead?
Correct. With open addressing, the hash function needs to involve the capacity obviously, and rehashing needs to occur with the new capacity. This is a minimally simple implementation and I can only encourage anyone to read up on more sophisticated techniques.
5

Declare the value field as void *value.

This way you can have any type of data as the value, but the responsibility for allocating and freeing it will be delegated to the client code.
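
Applied to the struct from the question, that could look like the sketch below, using `char *` rather than `void *` since the values are strings. `my_struct_new` and `my_struct_free` are hypothetical helper names, and the uthash handle is left out so the block compiles without the uthash header.

```c
#include <stdlib.h>
#include <string.h>

struct my_struct {
    int key;
    char *value;    /* points to a heap buffer sized to fit the string */
    /* UT_hash_handle hh;  would go here when used with uthash */
};

/* Allocate an entry and copy the string. Returns NULL on failure. */
struct my_struct *my_struct_new(int key, const char *value) {
    size_t len = strlen(value) + 1;
    struct my_struct *s = malloc(sizeof *s);
    if (s == NULL) {
        return NULL;
    }
    s->value = malloc(len);
    if (s->value == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->value, value, len);
    s->key = key;
    return s;
}

/* Release both the string buffer and the entry itself. */
void my_struct_free(struct my_struct *s) {
    if (s != NULL) {
        free(s->value);
        free(s);
    }
}
```

Each entry then costs one pointer plus exactly as many bytes as its string needs, instead of a fixed worst-case array.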

5

It really depends on the distribution of your key field. For example, if it's a unique value always between 0 and 255 inclusive, just use key % 256 to select the bucket and you have a perfect hash.
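
For that 0 to 255 case, the table can be a plain array indexed by the key, with no collision handling at all. A minimal sketch, assuming the values are strings owned by the caller; `direct_table`, `direct_put` and `direct_get` are names invented for this example.

```c
#include <stddef.h>

#define DIRECT_SLOTS 256

/* One slot per possible key; NULL means "no value stored". */
struct direct_table {
    const char *value[DIRECT_SLOTS];   /* zero-initialise before use */
};

/* Store value under key. Returns 0 if key is outside 0..255. */
int direct_put(struct direct_table *t, int key, const char *value) {
    if (key < 0 || key >= DIRECT_SLOTS) {
        return 0;
    }
    t->value[key % DIRECT_SLOTS] = value;
    return 1;
}

/* Return the value stored under key, or NULL if none or out of range. */
const char *direct_get(const struct direct_table *t, int key) {
    if (key < 0 || key >= DIRECT_SLOTS) {
        return NULL;
    }
    return t->value[key % DIRECT_SLOTS];
}
```

Because every key has its own slot, both operations are a single array access.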

If it's evenly distributed across all possible int values, any function which gives you an evenly distributed hash value will do (such as the aforementioned key % 256, computed on the unsigned value so that negative keys don't produce a negative index), albeit with multiple values in each bucket.

Without knowing the distribution, it's a little hard to talk about efficient hashes.
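
One way to check a candidate hash against real keys is to count how many of them land in the fullest bucket. The helper below is only an illustration of that idea: `max_bucket_load` and `BUCKETS` are made-up names, and it hard-codes the `key % 256` mapping discussed above.

```c
#include <stddef.h>

#define BUCKETS 256

/* Return the number of keys in the most crowded bucket under key % 256. */
size_t max_bucket_load(const int *keys, size_t n) {
    size_t counts[BUCKETS] = {0};
    size_t i, worst = 0;
    for (i = 0; i < n; i++) {
        /* Unsigned arithmetic keeps the index in range for negative keys. */
        size_t b = (unsigned)keys[i] % BUCKETS;
        counts[b]++;
        if (counts[b] > worst) {
            worst = counts[b];
        }
    }
    return worst;
}
```

Sequential keys spread out evenly under this mapping, while keys that are all multiples of 256 pile into a single bucket, which is the kind of skew that makes a hash inefficient.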
