Parsing a complex string

Question

I need to read a string in the following order:

Read any number of integers separated by spaces, discard all but the last one, and save it to n
Read a space followed by n characters followed by a space, and save only the characters
Read two more numbers separated by spaces and save them as well

I thought of using a string stream to read the numbers and stop at the string, but I don't know how to detect the upcoming string and stop reading numbers without trying to read the string as a number and putting the string stream into a failed state.
How can I detect the string and stop reading numbers before it?
Is there a better way to read this whole pattern?
I use C++11.

Edit:
Example input:

1 2 3 4 6 abc de 7 8

Expected output:

The string: 'abc de'
Number 1: 7
Number 2: 8

Sounds like a nice case for regex to me :). An input / output sample would be nice.
– FailedDev, Commented Nov 30, 2011 at 7:14

Some programmer dude · Accepted Answer · 2011-11-30 08:08:18Z

3

There are a couple of options as I see it: either use regular expressions, or go through the input character by character using some kind of state machine.
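
For the regular-expression option, a minimal sketch might look like the following. It assumes that C++11 `std::regex` works on your toolchain and that the embedded string never starts with a digit; `parse_with_regex` is a name made up for this example. The regex only locates the last leading number. The n characters are then taken by position, because a regex cannot count up to a value it captured at run time.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns true on success and fills s, n1 and n2.
bool parse_with_regex(const std::string& input,
                      std::string& s, int& n1, int& n2)
{
    // Leading numbers: any count of "digits + whitespace", then the last
    // number (captured), followed by exactly one whitespace character.
    static const std::regex head(R"(^\s*(?:\d+\s+)*(\d+)\s)");
    // Tail: whitespace, number, whitespace, number, optional whitespace.
    static const std::regex tail(R"(^\s+(\d+)\s+(\d+)\s*$)");

    std::smatch m;
    if (!std::regex_search(input, m, head))
        return false;

    const std::size_t n = std::stoul(m[1].str());
    const std::size_t start =
        static_cast<std::size_t>(m.position(0) + m.length(0));
    if (n > input.size() - start)
        return false;                 // fewer than n characters are left

    s = input.substr(start, n);       // the n characters, spaces included

    const std::string rest = input.substr(start + n);
    if (!std::regex_search(rest, m, tail))
        return false;

    n1 = std::stoi(m[1].str());
    n2 = std::stoi(m[2].str());
    return true;
}
```

Called with the sample input `"1 2 3 4 6 abc de 7 8"`, this should fill `s` with `abc de`, `n1` with 7 and `n2` with 8.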

Edit

About that state-machine... Maybe something like this:

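// Pre-conditions: "str" is a std::string containing the whole string to be parsed,
// and <cctype>, <cstdlib> and <string> are included

enum class states
{
    GET_LENGTH,           // Looking for the embedded string length
    GET_LENGTH_OR_STRING, // Get the embedded string, or the length
    GET_STRING,           // Getting the embedded string
    GET_NUMBER_1,         // Look for the first number after the string
    GET_NUMBER_2,         // Look for the second number after the string
    DONE                  // Everything has been read
};

int         len = 0;        // Length of the embedded string
std::string tmp;            // Temporary string collecting the digits of a number
std::string text;           // The embedded string
int         n1 = 0, n2 = 0; // The numbers after the string
states      state = states::GET_LENGTH;

for (auto ci = str.begin(); ci != str.end() && state != states::DONE; )
{
    // Skip whitespace between tokens, without running past the end
    while (ci != str.end() && isspace(static_cast<unsigned char>(*ci)))
        ci++;
    if (ci == str.end())
        break;

    switch (state)
    {
    case states::GET_LENGTH:
        tmp.clear();
        while (ci != str.end() && isdigit(static_cast<unsigned char>(*ci)))
            tmp += *ci++;
        len = strtol(tmp.c_str(), nullptr, 10);

        state = states::GET_LENGTH_OR_STRING;
        break;

    case states::GET_LENGTH_OR_STRING:
        // Another digit means the previous number was not the length after all
        if (isdigit(static_cast<unsigned char>(*ci)))
            state = states::GET_LENGTH;
        else
            state = states::GET_STRING;
        break;

    case states::GET_STRING:
        // Never read past the end of the input
        if (len > str.end() - ci)
            len = static_cast<int>(str.end() - ci);
        text = std::string(ci, ci + len);
        ci += len;
        state = states::GET_NUMBER_1;
        break;

    case states::GET_NUMBER_1:
        tmp.clear();
        while (ci != str.end() && isdigit(static_cast<unsigned char>(*ci)))
            tmp += *ci++;
        n1 = strtol(tmp.c_str(), nullptr, 10);
        state = states::GET_NUMBER_2;
        break;

    case states::GET_NUMBER_2:
        tmp.clear();
        while (ci != str.end() && isdigit(static_cast<unsigned char>(*ci)))
            tmp += *ci++;
        n2 = strtol(tmp.c_str(), nullptr, 10);
        state = states::DONE;
        break;

    case states::DONE:
        break;
    }
}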

Disclaimer: This is not tested, just written "as is" directly in the browser.

The code can probably be improved. For example, the states for reading the length and the trailing numbers do basically the same thing, so that logic could be moved into a separate function and shared.
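
One way to share that logic is a small helper that consumes a run of digits and returns its value. This is only a sketch: `read_number` is a name invented here, and it assumes non-negative decimal numbers.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: consumes consecutive digits starting at 'it'
// (never going past 'end') and returns their value, or 0 if there are none.
// 'it' is left on the first character that is not a digit.
template <typename Iterator>
int read_number(Iterator& it, Iterator end)
{
    std::string digits;
    while (it != end && std::isdigit(static_cast<unsigned char>(*it)))
        digits += *it++;
    return static_cast<int>(std::strtol(digits.c_str(), nullptr, 10));
}
```

With such a helper, the bodies of GET_LENGTH, GET_NUMBER_1 and GET_NUMBER_2 would each come down to a single call such as `len = read_number(ci, str.end());` followed by the state change.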

edited Nov 30, 2011 at 8:08

answered Nov 30, 2011 at 7:24

Some programmer dude


2 Comments

Daniel Over a year ago

How would I use a regular expression to read n characters when n is a number that appears earlier in the string?

Some programmer dude Over a year ago

@Dani Just saw your comment on Jan's answer about spaces in the n-characters part. That makes it harder to use regexps, so I guess a state machine might be the way to go. Will think about how to do it over breakfast.

fjardon · Accepted Answer · 2011-12-01 11:00:18Z

1

You can do that without any regex, just by using standard C++ stream functionality. Here is an example using std::cin as the input stream, but you can use a string stream if you want to read from a string.

#include <iostream>
#include <string>
#include <vector>

int main() {

        int n = -1, tmp;

        /// read integers, discarding all but the last
        while(std::cin >> tmp)
                n = tmp;
        /// the loop ends with the 'fail' bit set when it reaches the string;
        /// 'bad' means a real I/O error, and n < 0 means no length was read
        if(std::cin.bad() || n < 0) {
                std::cout << "bad format 1" << std::endl;
                return -1;
        }

        /// reset the 'fail' bit first, then skip whitespaces
        std::cin.clear();
        std::cin >> std::ws;

        /// read a string of 'n' characters
        std::vector<char> buffer(n+1, '\0');
        if(! std::cin.read(buffer.data(), n) ) {
                std::cout << "bad format 2" << std::endl;
                return -1;
        }
        std::string s(buffer.data(), n);

        /// Read 2 numbers
        int nb1, nb2;
        if(! (std::cin >> nb1 >> nb2)) {
                std::cout << "bad format 3" << std::endl;
                return -1;
        }

        std::cout << "The string: '" << s << "'" << std::endl;
        std::cout << "Number 1: " << nb1 << std::endl;
        std::cout << "Number 2: " << nb2 << std::endl;
        return 0;
}
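
To parse from a string rather than std::cin, the same steps can be applied to a std::istringstream. The sketch below assumes the embedded string does not begin with a digit or a sign character; `parse_line` and the `Parsed` struct are names invented for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type for this sketch.
struct Parsed {
    std::string text;
    int n1;
    int n2;
};

// Returns true and fills 'out' if 'line' matches the expected pattern.
bool parse_line(const std::string& line, Parsed& out)
{
    std::istringstream in(line);

    // Read integers, keeping only the last one.
    int n = -1, tmp = 0;
    while (in >> tmp)
        n = tmp;
    // Reject I/O errors, input that ended with the numbers, or no length.
    if (in.bad() || in.eof() || n < 0)
        return false;

    // The failed extraction set failbit; clear it before reading on.
    in.clear();
    in >> std::ws;

    // Read exactly n characters, spaces included.
    std::string text(static_cast<std::string::size_type>(n), '\0');
    if (n > 0 && !in.read(&text[0], n))
        return false;

    // Read the two trailing numbers.
    int n1 = 0, n2 = 0;
    if (!(in >> n1 >> n2))
        return false;

    out.text = text;
    out.n1 = n1;
    out.n2 = n2;
    return true;
}
```

Wrapping the logic in a function that returns a bool, instead of printing and returning from main, is just one possible design; it makes the parser easier to reuse for several lines of input.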

edited Dec 1, 2011 at 11:00

answered Dec 1, 2011 at 10:21

fjardon


2 Comments

Daniel Over a year ago

The string doesn't contain digits. My question is: when you do cin >> someInt; while the input contains 'abc', wouldn't cin just raise the bad bit and die?

fjardon Over a year ago

No, in that case it raises the 'fail' bit.
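
The difference can be checked with a string stream. In this sketch (`extraction_state` and `StreamReport` are names invented here), a failed integer extraction leaves failbit set and badbit clear, and after clear() the stream can be read again:

```cpp
#include <sstream>
#include <string>

// Hypothetical report type: which flags were set after the failed read,
// and what could still be read once the flags were cleared.
struct StreamReport {
    bool fail;
    bool bad;
    std::string next_word;
};

StreamReport extraction_state(const std::string& input)
{
    std::istringstream in(input);
    int value = 0;
    in >> value;                 // fails if the next token is not a number

    StreamReport report;
    report.fail = in.fail();     // true if failbit or badbit is set
    report.bad  = in.bad();      // true only if badbit is set (I/O error)

    in.clear();                  // reset the flags; the stream is not "dead"
    in >> report.next_word;      // the unread text is still there
    return report;
}
```

For an input such as `"abc 7"`, `fail` should be true, `bad` false, and `next_word` should be `abc`.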

Daniel · Accepted Answer · 2011-11-30 07:15:24Z

1

I don't know C++ well enough, but can't you:

split the entire input on the space separator
go through that list:
- while the token is a number, store it in the same variable
- store the n chars (I'm assuming you mean there's a string there)
- store the last two numbers

edited Nov 30, 2011 at 7:15

Daniel


answered Nov 30, 2011 at 7:13

Jan Doggen


1 Comment

Daniel Over a year ago

There might be spaces in the n chars, so it will be problematic to split on spaces.
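
A small sketch of why splitting on whitespace loses information: tokenizing with operator>> collapses every run of spaces, so the original n characters cannot be rebuilt from the tokens alone. `split_on_spaces` is a name made up for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits 'input' into whitespace-separated tokens.
std::vector<std::string> split_on_spaces(const std::string& input)
{
    std::istringstream in(input);
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token)          // operator>> skips any amount of whitespace
        tokens.push_back(token);
    return tokens;
}
```

With the sample input, `abc de` arrives as the two tokens `abc` and `de`, and an input with two spaces between them (with n = 7) produces exactly the same two tokens.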

Gene Bushuyev · Accepted Answer · 2011-12-01 09:03:36Z

Since you are using a C++11 compiler, you can probably write your grammar in AXE:

// input text
std::string txt("1 2 3 4 6 abc de 7 8");

// assume spaces are ' ' and tabs
auto space = axe::r_any(" \t");

// create a number rule that stores matched decimal numbers in 'n'
int n = 0;
auto number_rule = axe::r_decimal(n) % +space;

// create a string rule, which stops when reaching 'n' characters
std::string s;
int count = 0;
auto string_rule = space & 
    *(axe::r_any() & axe::r_bool([&](...){ return n > count++; })) >> s;

// tail rule for two decimal values
int n1 = 0, n2 = 0;
auto tail_rule = +space & axe::r_decimal(n1) & +space & axe::r_decimal(n2);

// a rule for entire input text
auto rule = number_rule & string_rule & tail_rule;
// run parser
rule(txt.begin(), txt.end());
// dump results, you should see: n=6, s=abc de, n1=7, n2=8
std::cout << "\nn=" << n << ", s=" << s << ", n1=" << n1 << ", n2=" << n2;
