
I have a file which contains some "entity" data in Valve's format. It's basically a key-value deal, and it looks like this:

{
"world_maxs" "3432 4096 822"
"world_mins" "-2408 -4096 -571"
"skyname" "sky_alpinestorm_01"
"maxpropscreenwidth" "-1"
"detailvbsp" "detail_sawmill.vbsp"
"detailmaterial" "detail/detailsprites_sawmill"
"classname" "worldspawn"
"mapversion" "1371"
"hammerid" "1"
}
{
"origin" "553 -441 322"
"targetname" "tonemap_global"
"classname" "env_tonemap_controller"
"hammerid" "90580"
}

Each pair of {} counts as one entity, and the rows inside count as KeyValues. As you can see, it's fairly straightforward.

I want to process this data into a vector<map<string, string> > in C++. To do this, I've tried using regular expressions that come with Boost. Here is what I have so far:

static const boost::regex entityRegex("\\{(\\s*\"([A-Za-z0-9_]+)\"\\s*\"([^\"]+)\")+\\s*\\}");
boost::smatch what;
while (regex_search(entitiesString, what, entityRegex)) {
    cout << what[0] << endl;
    cout << what[1] << endl;
    cout << what[2] << endl;
    cout << what[3] << endl;
    break; // TODO
}

Easier-to-read regex:

\{(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)")+\s*\}

I'm not sure the regex is well-formed for my problem yet, but it seems to print the last key-value pair (hammerid, 1) at least.

My question is, how would I go about extracting the "nth" matched subexpression within an expression? Or is there not really a practical way to do this? Would it perhaps be better to write two nested while-loops, one which searches for the {} patterns, and then one which searches for the actual key-value pairs?

Thanks!

  • Don't use Regex. Use a grammar. I'll show you in a minute. Commented Jun 11, 2015 at 13:16
  • Boost regex will not perform well here. You'd first need to extract the {...} part, then inside it, use "([A-Za-z0-9_]+?)"\s*?"([^"]+?)". Like this. Commented Jun 11, 2015 at 13:24
  • Very interesting website! Thanks for that. I'll see what @sehe has to say about grammars. Commented Jun 11, 2015 at 13:25

3 Answers

Using a parser generator you can code a proper parser.

For example, using Boost Spirit you can define the rules of the grammar inline as C++ expressions:

    start  = *entity;
    entity = '{' >> *entry >> '}';
    entry  = text >> text;
    text   = '"' >> *~char_('"') >> '"';

Here's a full demo:


#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Entity    = std::map<std::string, std::string>;
using ValveData = std::vector<Entity>;

namespace qi = boost::spirit::qi;

template <typename It, typename Skipper = qi::space_type>
struct Grammar : qi::grammar<It, ValveData(), Skipper>
{
    Grammar() : Grammar::base_type(start) {
        using namespace qi;

        start  = *entity;
        entity = '{' >> *entry >> '}';
        entry  = text >> text;
        text   = '"' >> *~char_('"') >> '"';

        BOOST_SPIRIT_DEBUG_NODES((start)(entity)(entry)(text))
    }
  private:
    qi::rule<It, ValveData(),                           Skipper> start;
    qi::rule<It, Entity(),                              Skipper> entity;
    qi::rule<It, std::pair<std::string, std::string>(), Skipper> entry;
    qi::rule<It, std::string()>                                  text;
};

int main()
{
    using It = boost::spirit::istream_iterator;
    Grammar<It> parser;
    It f(std::cin >> std::noskipws), l;

    ValveData data;
    bool ok = qi::phrase_parse(f, l, parser, qi::space, data);

    if (ok) {
        std::cout << "Parsing success:\n";

        int count = 0;
        for(auto& entity : data)
        {
            ++count;
            for (auto& entry : entity)
                std::cout << "Entity " << count << ": [" << entry.first << "] -> [" << entry.second << "]\n";
        }
    } else {
        std::cout << "Parsing failed\n";
    }

    if (f!=l)
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}

Which prints (for the input shown):

Parsing success:
Entity 1: [classname] -> [worldspawn]
Entity 1: [detailmaterial] -> [detail/detailsprites_sawmill]
Entity 1: [detailvbsp] -> [detail_sawmill.vbsp]
Entity 1: [hammerid] -> [1]
Entity 1: [mapversion] -> [1371]
Entity 1: [maxpropscreenwidth] -> [-1]
Entity 1: [skyname] -> [sky_alpinestorm_01]
Entity 1: [world_maxs] -> [3432 4096 822]
Entity 1: [world_mins] -> [-2408 -4096 -571]
Entity 2: [classname] -> [env_tonemap_controller]
Entity 2: [hammerid] -> [90580]
Entity 2: [origin] -> [553 -441 322]
Entity 2: [targetname] -> [tonemap_global]

I think doing it all with one regular expression is hard because of the variable number of entries inside each entity {}. Personally, I would consider simply using std::getline to do the parsing.

#include <map>
#include <vector>
#include <string>
#include <sstream>
#include <iostream>

std::istringstream iss(R"~(
    {
    "world_maxs" "3432 4096 822"
    "world_mins" "-2408 -4096 -571"
    "skyname" "sky_alpinestorm_01"
    "maxpropscreenwidth" "-1"
    "detailvbsp" "detail_sawmill.vbsp"
    "detailmaterial" "detail/detailsprites_sawmill"
    "classname" "worldspawn"
    "mapversion" "1371"
    "hammerid" "1"
    }
    {
    "origin" "553 -441 322"
    "targetname" "tonemap_global"
    "classname" "env_tonemap_controller"
    "hammerid" "90580"
    }
)~");

int main()
{
    std::string skip;
    std::string entity;

    std::vector<std::map<std::string, std::string> > vm;

    // skip to open brace, read entity until close brace
    while(std::getline(iss, skip, '{') && std::getline(iss, entity, '}'))
    {
        // turn entity into its own input stream (named differently so it
        // does not shadow the outer iss)
        std::istringstream entityStream(entity);

        // temporary map
        std::map<std::string, std::string> m;

        std::string key, val;

        // skip to open quote, read key to close quote
        while(std::getline(entityStream, skip, '"') && std::getline(entityStream, key, '"'))
        {
            // skip to open quote, read val to close quote
            if(std::getline(entityStream, skip, '"') && std::getline(entityStream, val, '"'))
                m[key] = val;
        }

        // move map (no longer needed)
        vm.push_back(std::move(m));
    }

    for(auto& m: vm)
    {
        for(auto& p: m)
            std::cout << p.first << ": " << p.second << '\n';
        std::cout << '\n';
    }
}

Output:

classname: worldspawn
detailmaterial: detail/detailsprites_sawmill
detailvbsp: detail_sawmill.vbsp
hammerid: 1
mapversion: 1371
maxpropscreenwidth: -1
skyname: sky_alpinestorm_01
world_maxs: 3432 4096 822
world_mins: -2408 -4096 -571

classname: env_tonemap_controller
hammerid: 90580
origin: 553 -441 322
targetname: tonemap_global

3 Comments

I was on vacation and forgot to accept this as the right answer; it was perfect for my needs, after adjusting the iterators a bit I got the perfect result: Thanks!
@bombax FYI, you accepted a different answer. Still in the vacation mood?
Sorry was going through questions and thought I left this one unanswered, I must have marked the wrong one before...

I would have written it like this:

^\{(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)")+\s*\}$

Or split the regex into two. First match the curly braces, then loop through the content of the curly braces line by line.

Match curly braces: ^(\{[^\}]+\})$
Match the lines: ^(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)"\s*)$
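
For illustration, here is a minimal sketch of that two-pass idea. It uses std::regex rather than Boost.Regex so that it compiles without extra libraries, and it iterates over matches instead of anchoring on whole lines. The function name parseEntities and the Entity alias are made up for this example. It assumes no key or value contains a '}' character, and, unlike the patterns above, it accepts empty values ("[^"]*" instead of "[^"]+"). A repeated capture group only exposes its final repetition through the match results, which fits the question's observation that only the last pair (hammerid, 1) is printed. That is why the pairs are collected with a second, inner search.

```cpp
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

using Entity = std::map<std::string, std::string>;

// Two-pass parse: the outer regex finds each {...} block, the inner regex
// walks the "key" "value" pairs inside that block.
// Assumption: braces never occur inside a quoted key or value.
std::vector<Entity> parseEntities(const std::string& text)
{
    static const std::regex blockRegex(R"(\{([^}]*)\})");
    static const std::regex pairRegex(R"rx("([A-Za-z0-9_]+)"\s*"([^"]*)")rx");

    std::vector<Entity> entities;
    const std::sregex_iterator end;

    for (std::sregex_iterator block(text.begin(), text.end(), blockRegex);
         block != end; ++block) {
        // Capture group 1 is everything between the braces.
        const std::string body = (*block)[1].str();

        Entity entity;
        for (std::sregex_iterator kv(body.begin(), body.end(), pairRegex);
             kv != end; ++kv) {
            // Group 1 is the key, group 2 is the value.
            entity[(*kv)[1].str()] = (*kv)[2].str();
        }
        entities.push_back(std::move(entity));
    }
    return entities;
}
```

Calling parseEntities(entitiesString) on the data from the question should give a vector with two maps, one per {} block. The same structure should carry over to boost::regex and boost::sregex_iterator if you prefer to stay with Boost.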
