0

I'm currently using g++ on a Cygwin terminal as per my professor's request.

I'm supposed to take in an input file and read it word by word, then place all words inside a vector, sorted alphabetically and with no duplicates.

However, every time I try to manipulate my vector (i.e. - push_back) inside of certain loops, my program just segmentation faults.

Here's a snippet of my code:

void word_count(ifstream& input){
    string temp;
    vector<string> v;

    input >> temp; //set first variable
    v.push_back(temp);

    while (!input.eof()) { //I'm aware of the limitations while using !eof, this is just the way I am required to loop over a file
        input >> temp;

        for (vector<string>::iterator i = v.begin(); i != v.end(); i++) { //check entire vector for word
            if (*i == temp) { //just break and skip the word if it already exists
                break;
            }
            if (i == v.end() - 1) { //if the word doesn't exist yet
                for (vector<string>::iterator k = v.begin(); k != v.end(); k++) { //re-search the vector for the proper place
                    if (k == v.end() - 1) { //if at the end, just push_back the vector
                        v.push_back(temp); //Causes segmentation fault
                        break;
                    }
                    if ((*k < temp) && (*(k + 1) > temp)) { //find correct place and insert the word in the vector
                        v.insert(k, temp); //Also causes segmentation fault if execution even manages to get this far
                    }
                }
            }
        }
    }
}

The first push_back on line 5 is perfectly fine, I can copy and paste that multiple times without error. I can also push_back right after input >> temp (inside of the while loop) without error. However, if I try push_back under the 'k' loop, it segmentation faults. I'm totally stumped.

I've tried looking at the other vector related questions here on StackOverflow, but I don't really understand why I can (or cannot) use push_back in certain places.

Thanks for any help in advance!

Edit 1: I should mention that I tested it in VS 2019. The vector library file popped up, stating a "read-access-violation" exception was thrown. No segmentation faults (Or maybe that's VS's way of telling me a segmentation fault occurred?)

Edit 2: Modifying a vector invalidates iterators. I didn't know that, thanks for the help everyone!

Edit 3: I'm only allowed to use vectors, not sets or other containers. If I could use a set, I totally would.

5
  • If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated. This is from cppreference from push_back() method, and probably this is your problem Commented Feb 25, 2020 at 11:41
  • Are you allowed/supposed to use standard algorithms? If yes, this looks like a job for en.cppreference.com/w/cpp/algorithm/lower_bound to me Commented Feb 25, 2020 at 11:43
  • 1
    You should never modify a vector while iterating over it, as @Raffallo correctly mentioned (just to break it down a bit). Maybe think about using std::set to save you much work. It's ordered and contains only uniqe elements. It's like it's made for exactly this task. Commented Feb 25, 2020 at 11:44
  • Well, or you immediately stop iterating once you've modified it - break once you've inserted. Commented Feb 25, 2020 at 11:54
  • I did not know push_back invalidates iterators. Thank you guys! I'll have to double check the documentation on vectors. I'm not using <set> because I'm required to use <vector> as to "think outside the box" as said by my professor (even though, as you said, <set> is exactly perfect for this task). Commented Feb 25, 2020 at 22:13

3 Answers 3

2

When you modify vector iterators become invalid.

There are two reason:

  • when you push_back and std::vector::capacity is busted new block data is allocated and data are moved/copied to new buffer
  • when you add/remove items in middle old iterator can point to different item possibly not existing anymore.

There is quick way to fix it. When you do a modification you have to fetch updated value of iterator. poush_back doesn't have such functionality, but std::vector::insert returns iterator to new value and this iterator can be use to update for loop iterator.

I could fix you code, but it is so convoluted (to much indentation) that I wish to avoid that. You should first slice this code to smaller functions.

Instead salvage your code, here is my version:

template<typename Iter>
size_t count_unique_items(Iter begin, Iter end)
{
    using value_type = typename std::iterator_traits<Iter >::value_type;
    std::unordered_set<value_type> unique_items;

    std::copy(begin, end, std::inserter(unique_items, unique_items.end()));

    return unique_itmes.size();
}

size_t count_unique_words(std::istream& input)
{
    return count_unique_items(std::istream_iterator<std::string>{input}, {});
}

https://wandbox.org/permlink/bHji7JZoB7E9ZoLn

Sign up to request clarification or add additional context in comments.

1 Comment

Unfortunately, I'm not allowed to use <unordered_set>, just <vector>. Thank you for helping me though! I didn't know push_back invalidated iterators.
1

Modifying a vector as you're iterating over it may invalidate iterators, and then anything can happen.

But you're overcompicating things - since the vector is ordered, you don't need to first see if the string exists and then search for the correct position, you can look for the position directly.
(That you don't need to search twice is one of the discoveries you're supposed to make during this exercise.)

I would (since you're probably not supposed to use any functions from <algorithm> or such "advanced" features)

  • Break the loop when you reach the end or when you have found the item,
  • If you find an element greater than the item, you should insert before that position and stop.
    Luckily, insert takes an iterator to insert before, so you can use i.

Something like this:

for (vector<string>::iterator i = v.begin(); i != v.end() && *i != temp; ++i)
{
    if (*i > temp)
    {
        v.insert(i, temp);
        break;
    }
}

Note that the break means that i is not used for any comparisons after the insert, so the insertion is safe.

2 Comments

I didn't know modifying a vector invalidated vectors. Even though I break after push_back (under the k loop), is that still not allowed?
That only breaks out of the inner loop – the outer loop also has an iterator over the vector, and that's where the problem is.
0

As mentioned, you could use a std::set to store your unique words. You could populate it like this:

std::set<std::string> set_of_words(std::ifstream & input)
{
  std::set<std::string> words;

  std::string word;
  while (input >> word)
  {
    words.insert(word);
  }

  return words;
}

or you can use a std::vector as in your question. Using std::lower_bound from <algorithm> you could use it like this:

std::vector<std::string> vector_of_words(std::ifstream & input)
{
  std::vector<std::string> words;

  std::string word;
  while (input >> word)
  {
    auto pos = std::lower_bound(words.begin(), words.end(), word);
    if (pos == words.end())
    {
      words.push_back(word);
    }
    else
    {
      if (*pos != word)
      {
        words.insert(pos, word);
      }
    }
  }

  return words;
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.