
Suppose I have a string like

   Harry potter was written by J. K. Rowling

How can I split the string using was and by as delimiters and get the result in a vector in C++?

I know how to split on multiple characters, but not on multiple strings.

  • What about the regex token iterator? See en.cppreference.com/w/cpp/regex/regex_token_iterator Commented Apr 2, 2014 at 10:53
  • I am trying to avoid regex as far as possible, but a regex answer for the given example would also be helpful. Commented Apr 2, 2014 at 10:57

2 Answers


If you can use C++11 and Clang, there is a solution using a regex token iterator:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main()
{
   std::string text = "Harry potter was written by J. K. Rowling.";

   // Match either delimiter word. The submatch index -1 asks the iterator
   // for the text between matches instead of the matches themselves.
   std::regex delim_re("(was)|(by)");
   std::copy( std::sregex_token_iterator(text.begin(), text.end(), delim_re, -1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}

The output is:

Harry potter 
 written 
 J. K. Rowling.

Sadly, GCC 4.8 does not have <regex> fully implemented in its standard library (full support arrived with GCC 4.9). Clang, however, compiles and links this correctly.
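
Since the question asks for the result in a vector, the same token iterator can fill a std::vector directly. Below is a minimal sketch, assuming the delimiters are plain words without regex metacharacters. The name split_on_words is hypothetical, and the \b word-boundary anchors are an addition of this sketch so that, for example, "by" inside "baby" is not treated as a delimiter.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer above): split `text` on any of
// the given delimiter words and return the pieces in between.
// Assumption: the words contain no regex metacharacters such as '.' or '('.
std::vector<std::string> split_on_words(const std::string& text,
                                        const std::vector<std::string>& words)
{
    assert(!words.empty());  // an empty alternation would match everywhere

    // Build a pattern such as "\b(?:was|by)\b".
    std::string pattern = "\\b(?:";
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i != 0) pattern += '|';
        pattern += words[i];
    }
    pattern += ")\\b";
    const std::regex delim(pattern);

    // Submatch index -1 selects the text between matches.
    std::sregex_token_iterator first(text.begin(), text.end(), delim, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

Called with the example sentence and the words "was" and "by", this should return the three pieces "Harry potter ", " written " and " J. K. Rowling", with the surrounding spaces kept.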


1 Comment

Boost is available; if you can give an example using Boost, that would also be helpful.

Brute-force approach, no Boost, no C++11; optimizations more than welcome:

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

/** Split the string s by the delimiters, place the result in the 
    outgoing vector result */
void split(const std::string& s, const std::vector<std::string>& delims,
           std::vector<std::string>& result)
{
    // split the string into whitespace-separated words
    std::stringstream ss(s);
    std::istream_iterator<std::string> begin(ss);
    std::istream_iterator<std::string> end;
    std::vector<std::string> splits(begin, end);

    // then append the words together, except if they are delimiters
    std::string current;
    for(std::size_t i=0; i<splits.size(); i++)
    {
        if(std::find(delims.begin(), delims.end(), splits[i]) != delims.end())
        {
            result.push_back(current);
            current.clear();
        }
        else
        {
            current += splits[i] + " " ;
        }
    }

    // drop the trailing space of the last token, if there is one
    if(!current.empty())
        current.erase(current.size() - 1);
    result.push_back(current);
}

1 Comment

The whitespace at the beginning and end of the resulting tokens is not correct. For the example I get 'Harry potter ','written ','J. K. Rowling ' instead of 'Harry potter ',' written ',' J. K. Rowling'.
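
If the whitespace should be kept exactly as it appears in the input, and regex is to be avoided, one option is to search for the delimiters with std::string::find instead of splitting into words first. A rough sketch follows; split_any is a hypothetical helper name, it matches delimiters as plain substrings (so "by" would also match inside "baby"), and it assumes no delimiter is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split `s` on any of the delimiter strings, keeping the
// text between delimiters untouched (including its whitespace).
std::vector<std::string> split_any(const std::string& s,
                                   const std::vector<std::string>& delims)
{
    std::vector<std::string> result;
    std::size_t start = 0;

    while (true) {
        // Find the earliest delimiter occurrence at or after `start`.
        std::size_t best_pos = std::string::npos;
        std::size_t best_len = 0;
        for (std::size_t i = 0; i < delims.size(); ++i) {
            assert(!delims[i].empty());  // an empty delimiter matches everywhere
            const std::size_t pos = s.find(delims[i], start);
            if (pos < best_pos) {        // npos is never less than best_pos
                best_pos = pos;
                best_len = delims[i].size();
            }
        }
        if (best_pos == std::string::npos)
            break;                       // no delimiter left

        result.push_back(s.substr(start, best_pos - start));
        start = best_pos + best_len;     // continue after the delimiter
    }

    result.push_back(s.substr(start));   // remainder after the last delimiter
    return result;
}
```

With the delimiters "was" and "by" the surrounding spaces stay in the tokens; passing " was " and " by " instead (spaces included in the delimiters) should give trimmed tokens for the example sentence.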
