1

I am trying to represent a hash table as a vector of pair<string, int>. A hash function returns the index in the vector where I want to place the pair. I can create a pair and hash its string successfully, but when I try to store the pair at that index, the program crashes with a segmentation fault. My hash function:

size_t hashfunction(const string& ident){
    unsigned hash = 0;
    for(int i = 0; i < ident.size(); ++i) {
       char c = ident[i];
       hash ^= c + 0x9e3779b9 + (hash<<6) + (hash>>2);
    }
    return hash;
}

My main function:

int main(){
    vector < pair < string, int > > hashtable;
    pair <string, int> testone ("bartering", 5);

    size_t testoneindex = hashfunction(testone.first);
    hashtable[testoneindex] = testone;
    return 0;

}

This section of code compiles but produces a segmentation fault at the line

hashtable[testoneindex] = testone;

What am I doing wrong?

  • You need to modulo your hash index down to the range of indices in your vector. For example, initialize your vector to have 1000 buckets, and use hashfunction(..) % 1000. That brings the next question: how do you plan to handle hash collisions? Commented Dec 7, 2013 at 20:00
  • Thanks for your quick answer. That change worked perfectly. I was planning on using linear probing to handle hash collisions. Commented Dec 7, 2013 at 20:12
  • I'll copy my comment to an answer, then. Commented Dec 7, 2013 at 20:15

3 Answers

1

You cannot realistically index the container directly by the raw hash value, because the vector would need as many elements as the hash has possible values, which takes far too much memory. Instead you'd want the container and insertion code to be closer to classic hash container design, something like this:

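typedef pair<string, int> value_t;
value_t val;  // the pair you want to insert

vector<list<value_t>> buckets;
buckets.resize(current_size);  // current_size: the number of buckets you chose

// Reduce the hash to a valid bucket index.
auto& bucket = buckets[hashfunction(val.first) % buckets.size()];
// Look for an existing entry with the same key in this bucket's list.
auto itr = find_if(bucket.begin(), bucket.end(), [&](value_t const& other) {
    return other.first == val.first;
});
// Insert only if the key is not already present.
if (itr == bucket.end()) bucket.push_back(val);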

1

You need to modulo your hash index down to the range of indices in your vector. For example, initialize your vector to have 1000 buckets, and use hashfunction(..) % 1000.
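
As a minimal sketch of that fix, the code below sizes the vector up front and reduces the hash with the modulo operator before indexing. The hash function is the one from the question; `entry_t`, `bucket_index`, and `place` are hypothetical names introduced only for illustration, and collisions are deliberately not handled yet.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> entry_t;  // hypothetical alias

// Hash function from the question (loop index changed to std::size_t).
std::size_t hashfunction(const std::string& ident) {
    unsigned hash = 0;
    for (std::size_t i = 0; i < ident.size(); ++i) {
        char c = ident[i];
        hash ^= c + 0x9e3779b9 + (hash << 6) + (hash >> 2);
    }
    return hash;
}

// Hypothetical helper: maps a key to a valid index in [0, bucket_count).
std::size_t bucket_index(const std::string& key, std::size_t bucket_count) {
    assert(bucket_count > 0);  // modulo by zero would be undefined
    return hashfunction(key) % bucket_count;
}

// Hypothetical helper: stores the entry in its bucket and returns the index.
// It overwrites whatever was in that bucket (no collision handling).
std::size_t place(std::vector<entry_t>& table, const entry_t& entry) {
    std::size_t index = bucket_index(entry.first, table.size());
    table[index] = entry;
    return index;
}
```

With this in place, the question's main function would construct the table as `std::vector<entry_t> hashtable(1000);` and call `place(hashtable, testone);` instead of indexing with the raw hash.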

0

The std::vector<...> you created is empty, so accessing any element of it with operator[] is out of bounds. That is undefined behavior, which here shows up as a segmentation fault. You need to give the hashtable object a suitable size, i.e., the number of buckets, e.g., using

std::size_t number_of_buckets = ...;
std::vector<std::pair<std::string, int> > hashtable(number_of_buckets);

Note that the approach you take for hashing is a bit too simplistic, though: especially for a smaller number of buckets there is a chance that two different hashes are keyed to the same bucket. That is, you'll need to deal with collisions. The two approaches for dealing with collisions I'm aware of are

  1. Determine a new bucket for a key if the first bucket found is already used (and keep searching for new buckets until an empty bucket is found). The main issue with this approach is that you can't simply remove objects, because other objects that were rebucketed past the removed one could then no longer be found.
  2. Use a list in each bucket for all the keys which use the same bucket. This approach has the additional advantage that you don't need to create any key or value until a bucket is actually used (you'd have the lists, though, but this can be made fairly cheap).
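
The first approach (probing for the next free bucket) could be sketched roughly as follows, using linear probing. Everything here is an assumption made for illustration: `Slot` and `ProbingTable` are hypothetical names, `std::hash` stands in for the question's hash function, the table has a fixed size, and removal is left out precisely because of the issue described above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One bucket of the table; `used` tells an empty bucket from a stored key.
struct Slot {
    std::string key;
    int value = 0;
    bool used = false;
};

// Hypothetical fixed-size table with linear probing (no resize, no removal).
class ProbingTable {
public:
    explicit ProbingTable(std::size_t bucket_count) : slots_(bucket_count) {}

    // Inserts the key or updates its value. Returns false if the table is full.
    bool insert(const std::string& key, int value) {
        if (slots_.empty()) return false;
        std::size_t start = std::hash<std::string>()(key) % slots_.size();
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            Slot& slot = slots_[(start + step) % slots_.size()];
            if (!slot.used || slot.key == key) {
                slot.key = key;
                slot.value = value;
                slot.used = true;
                return true;
            }
        }
        return false;  // every bucket is taken by another key
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        if (slots_.empty()) return nullptr;
        std::size_t start = std::hash<std::string>()(key) % slots_.size();
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            const Slot& slot = slots_[(start + step) % slots_.size()];
            if (!slot.used) return nullptr;  // an empty bucket ends the probe
            if (slot.key == key) return &slot.value;
        }
        return nullptr;
    }

private:
    std::vector<Slot> slots_;
};
```

The early return in `find` on an empty bucket is what makes removal awkward: clearing a bucket in the middle of a probe sequence would hide every key stored after it. A common workaround is to mark removed buckets with a separate "deleted" state instead of clearing them.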
