Implement a hash table

Question

I'm trying to create an efficient look-up table in C.

I have an integer as a key and a variable-length char* as the value.

I've looked at uthash, but this requires a fixed-length char array as the value. If I make the length a big number, then I'm using too much memory.

struct my_struct {
    int key;
    char value[10];
    UT_hash_handle hh;
};

Has anyone got any pointers? Any insight greatly appreciated.

Thanks everyone for the answers. I've gone with uthash and defined my own custom struct to accommodate my data.

At a low level, I'd suggest using an array of linked lists to back your hash table. Your hash function just needs to map the key to a valid index in the array, and then you append your value to the linked list that lives there. Like any other hash implementation, this will perform efficiently so long as your hash function distributes keys relatively evenly within the array.
– aroth, Commented Jul 27, 2011 at 13:10
Hi, thanks for your message. But how do I efficiently find the correct key?
– Eamorr, Commented Jul 27, 2011 at 13:19
@Eamorr, finding the key is the responsibility of the hash function. The hash needs to be deterministic: always delivering the same result for the same input. Then whatever key was used to store the value will retrieve the same value later.
– luser droog, Commented Jul 28, 2011 at 2:00
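
The chaining idea from the comments above can be sketched roughly as follows. This is only an illustration, not code from any commenter: the names `NBUCKETS`, `struct node`, `bucket_of`, `chain_get` and `chain_put` are made up for the example, and the table size is an arbitrary assumption. Lookup hashes the key to one bucket and walks only that bucket's list, which is how the "correct key" is found without scanning everything.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64  /* hypothetical table size, chosen for the sketch */

/* One node in a bucket's linked list. */
struct node {
    int key;
    char *value;          /* heap copy, so the string can be any length */
    struct node *next;
};

static struct node *buckets[NBUCKETS];  /* static storage: all lists start empty */

/* Map any int key, including negative ones, to an index in [0, NBUCKETS). */
static unsigned bucket_of(int key) {
    return (unsigned)key % NBUCKETS;
}

/* Walk the single list the key can live in; return its value or NULL. */
const char *chain_get(int key) {
    struct node *n;
    for (n = buckets[bucket_of(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return n->value;
    return NULL;
}

/* Prepend a new node holding a copy of value.
 * Returns 1 on success, 0 on allocation failure.
 * Assumption: the key is not already present (no replace logic here). */
int chain_put(int key, const char *value) {
    unsigned b = bucket_of(key);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->value = malloc(strlen(value) + 1);
    if (n->value == NULL) {
        free(n);
        return 0;
    }
    strcpy(n->value, value);
    n->key = key;
    n->next = buckets[b];
    buckets[b] = n;
    return 1;
}
```

Keys 7 and 71 land in the same bucket with this table size, and both remain retrievable because each node stores its own key for comparison.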

michael_wu · Accepted Answer · 2015-06-16 14:11:54Z

15

You first have to think of your collision strategy:

1. Will you have multiple hash functions?
2. Or will you have to use containers inside of the hashtable?

We'll pick 1.

Then you have to choose a nicely distributed hash function. For this example, we'll pick:

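int hash_fun(int key, int try, int max) {
    int home = key % max;            // in C this is negative for negative keys
    if (home < 0) home += max;       // so shift it back into [0, max)
    return (home + try % max) % max; // linear probing: one slot further per retry
}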

If you need something better, maybe have a look at the middle-square method.

Then you'll have to decide what a hash table is.

struct hash_table {
    int max;
    int number_of_elements;
    struct my_struct **elements;
};
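
The answer does not show how such a table is allocated. A minimal sketch, assuming the slots are meant to start out as null pointers (the insert and retrieve code treats 0 as "empty"); `hash_table_create` and `hash_table_destroy` are hypothetical helper names, and `struct my_struct` is reduced here to a key plus a pointer value for illustration:

```c
#include <stdlib.h>

struct my_struct {
    int key;
    char *value;   /* assumption: a pointer instead of the fixed-size array */
};

struct hash_table {
    int max;
    int number_of_elements;
    struct my_struct **elements;
};

/* Hypothetical helper: allocate a table with max empty slots.
 * Returns NULL if max is not positive or allocation fails. */
struct hash_table *hash_table_create(int max) {
    struct hash_table *t;
    int i;
    if (max <= 0)
        return NULL;
    t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->elements = malloc((size_t)max * sizeof *t->elements);
    if (t->elements == NULL) {
        free(t);
        return NULL;
    }
    for (i = 0; i < max; i++)
        t->elements[i] = NULL;   /* every slot starts empty */
    t->max = max;
    t->number_of_elements = 0;
    return t;
}

/* Frees the table itself; the stored entries stay owned by the caller. */
void hash_table_destroy(struct hash_table *t) {
    if (t != NULL) {
        free(t->elements);
        free(t);
    }
}
```

The table only stores pointers, so whoever inserted an entry remains responsible for freeing it.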

Then we'll have to define how to insert and retrieve.

int hash_insert(struct my_struct *data, struct hash_table *hash_table) {
    int try, hash;
    if(hash_table->number_of_elements >= hash_table->max) {
        return 0; // FULL
    }
    // Probe at most max slots; the check above guarantees one of them is empty.
    for(try = 0; try < hash_table->max; try++) {
        hash = hash_fun(data->key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) { // empty cell
            hash_table->elements[hash] = data;
            hash_table->number_of_elements++;
            return 1;
        }
    }
    return 0;
}

struct my_struct *hash_retrieve(int key, struct hash_table *hash_table) {
    int try, hash;
    // Probe at most max slots so a full table without the key cannot loop forever.
    for(try = 0; try < hash_table->max; try++) {
        hash = hash_fun(key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) {
            return 0; // Nothing found
        }
        if(hash_table->elements[hash]->key == key) {
            return hash_table->elements[hash];
        }
    }
    return 0;
}

And lastly, a method to remove:

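int hash_delete(int key, struct hash_table *hash_table) {
    int try, hash, next;
    struct my_struct *moved;
    for(try = 0; try < hash_table->max; try++) {
        hash = hash_fun(key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) {
            return 0; // Nothing found
        }
        if(hash_table->elements[hash]->key == key) {
            hash_table->number_of_elements--;
            hash_table->elements[hash] = 0;
            // Re-insert the entries that follow, so that keys which probed
            // past this slot stay reachable now that it is empty.
            next = (hash + 1) % hash_table->max;
            while(hash_table->elements[next] != 0) {
                moved = hash_table->elements[next];
                hash_table->elements[next] = 0;
                hash_table->number_of_elements--;
                hash_insert(moved, hash_table);
                next = (next + 1) % hash_table->max;
            }
            return 1; // Success
        }
    }
    return 0;
}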
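
Deletion under open addressing has to keep probe sequences intact, because lookups stop at the first empty slot. One widely used way to handle this is a tombstone marker. The following is a self-contained sketch of that technique, not the answer's code: `CAP`, `struct item`, `probe`, `ts_insert`, `ts_find` and `ts_delete` are all hypothetical names, and the fixed capacity of 8 is an assumption made to keep the example short.

```c
#include <stddef.h>

#define CAP 8   /* hypothetical fixed capacity for the sketch */

struct item {
    int key;
    const char *value;
};

/* Sentinel object: a slot pointing here once held an item that was deleted. */
static struct item tombstone;
#define DELETED (&tombstone)

static struct item *slots[CAP];   /* NULL means the slot was never used */

/* Linear probing; the unsigned conversion keeps negative keys in range. */
static int probe(int key, int try) {
    return (int)(((unsigned)key + (unsigned)try) % CAP);
}

/* Store into the first never-used or deleted slot.
 * Returns 1 on success, 0 if no slot is free.
 * Assumption: the key is not already in the table. */
int ts_insert(struct item *it) {
    int try;
    for (try = 0; try < CAP; try++) {
        int h = probe(it->key, try);
        if (slots[h] == NULL || slots[h] == DELETED) {
            slots[h] = it;
            return 1;
        }
    }
    return 0;
}

/* Lookup skips tombstones and stops only at a never-used slot. */
struct item *ts_find(int key) {
    int try;
    for (try = 0; try < CAP; try++) {
        int h = probe(key, try);
        if (slots[h] == NULL)
            return NULL;
        if (slots[h] != DELETED && slots[h]->key == key)
            return slots[h];
    }
    return NULL;
}

/* Delete by leaving a tombstone, so later entries in the chain stay reachable.
 * Returns 1 if the key was found and removed, 0 otherwise. */
int ts_delete(int key) {
    int try;
    for (try = 0; try < CAP; try++) {
        int h = probe(key, try);
        if (slots[h] == NULL)
            return 0;
        if (slots[h] != DELETED && slots[h]->key == key) {
            slots[h] = DELETED;
            return 1;
        }
    }
    return 0;
}
```

With a capacity of 8, keys 1 and 9 collide. After deleting 1, key 9 is still found because the search walks past the tombstone instead of stopping there. The trade-off is that tombstones accumulate, so a table with many deletions usually needs an occasional rebuild.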

edited Jun 16, 2015 at 14:11

michael_wu


answered Jul 27, 2011 at 13:26

marc



2 Comments

Tim Over a year ago

Thanks! In hash_insert function, should hash_table->number_of_elements < hash_table->max be hash_table->number_of_elements >= hash_table->max instead?

marc Over a year ago

Correct. With open addressing, the hash function needs to involve the capacity obviously, and rehashing needs to occur with the new capacity. This is a minimally simple implementation and I can only encourage anyone to read up on more sophisticated techniques.

Blagovest Buyukliev · Accepted Answer · 2011-07-27 13:06:16Z

5

Declare the value field as void *value.

This way you can have any type of data as the value, but the responsibility for allocating and freeing it will be delegated to the client code.
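
As a rough illustration of that suggestion, here is a sketch in plain C without uthash. The struct name `entry` and the helpers `entry_new` and `entry_free` are invented for the example; the point is only that the value is allocated to the exact size needed and that the client code owns it.

```c
#include <stdlib.h>
#include <string.h>

struct entry {
    int key;
    void *value;    /* owned by the client; here it points to a heap string */
};

/* Hypothetical helper: build an entry holding a private copy of text.
 * Returns NULL on allocation failure. */
struct entry *entry_new(int key, const char *text) {
    size_t len = strlen(text) + 1;          /* include the terminator */
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->value = malloc(len);
    if (e->value == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->value, text, len);
    e->key = key;
    return e;
}

/* The client frees both the value and the entry itself. */
void entry_free(struct entry *e) {
    if (e != NULL) {
        free(e->value);
        free(e);
    }
}
```

Each entry then uses only as much memory as its string requires, rather than a worst-case fixed buffer.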

answered Jul 27, 2011 at 13:06

Blagovest Buyukliev



paxdiablo · Accepted Answer · 2011-07-27 13:06:13Z

5

It really depends on the distribution of your key field. For example, if it's a unique value always between 0 and 255 inclusive, just use key % 256 to select the bucket and you have a perfect hash.

If it's equally distributed across all possible int values, any function which gives you an equally distributed hash value will do (such as the aforementioned key % 256), albeit with multiple values in each bucket.

Without knowing the distribution, it's a little hard to talk about efficient hashes.
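
One practical detail when applying `key % 256` to keys spread across all int values: since C99 the `%` operator takes the sign of its left operand, so a negative key gives a negative result. A small sketch of a bucket function that avoids this, where `NUM_BUCKETS` and `bucket_index` are names assumed for the example:

```c
#define NUM_BUCKETS 256   /* matches the key % 256 example */

/* -1 % 256 is -1 in C, which is not a valid array index.
 * Converting to unsigned first keeps the result in [0, NUM_BUCKETS). */
unsigned bucket_index(int key) {
    return (unsigned)key % NUM_BUCKETS;
}
```

For keys between 0 and 255 this is the identity mapping, so the perfect-hash case described above is unchanged.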

answered Jul 27, 2011 at 13:06

paxdiablo

