
I am working on a lexer. I have a Token struct, which looks like this:

#include <string_view>

struct Token {
    enum class Type { ... };

    Type type;
    std::string_view lexeme;
};

The Token's lexeme is just a view into a small slice of the full source code (which is itself a std::string_view).

The problem is that I need to remap special characters (for instance, '\n'), and storing them as-is isn't a good solution.

I've tried changing the lexeme's type to std::variant<std::string, std::string_view>, but it quickly turned into spaghetti code: every time I want to read the lexeme (for example, to check whether the type is Bool and the lexeme is "true"), it's a big pain.

Storing the lexeme as an owning string won't solve the problem.

I'm using C++20. Is there a clean solution?

  • Unfortunately, C++ does not have a reputation for being nice. There can be various solutions to this, but they would be highly context-dependent and tailored to the rest of the code. Commented Aug 14, 2023 at 19:03
  • As std::string and std::string_view have similar interfaces, it seems that the std::visit code would be simple... Commented Aug 14, 2023 at 19:08
  • Please show an example where std::variant<std::string, std::string_view> is causing you trouble. Assuming the owning/non-owning combination works out for your use case, I don't see why this should result in particularly convoluted code. Of course, working with std::variant is far from being as nice as in languages with proper sum types. Commented Aug 14, 2023 at 19:35
  • Is using std::string_view here worth the pain, in terms of performance (and maybe memory overhead, as well as, effectively, the risk of dangling pointers)? Maybe SSO (short string optimisation) will come to your aid. Commented Aug 14, 2023 at 19:55
  • I would solve this in another way: always have both a std::string_view and a std::string. The string_view can point to the original source text or to the Token's own std::string; the std::string in the Token is empty if unnecessary. Commented Aug 15, 2023 at 7:53

2 Answers


It seems to me that all you need is to encapsulate the variant so that it provides a uniform interface to both alternatives. Since converting a std::string to a std::string_view is dirt-cheap, and copying a std::string_view is equally cheap, you can write a single method that returns a std::string_view and access the content through it.

#include <iostream>
#include <string>
#include <string_view>
#include <variant>

struct OptOwnString
{
    using variant_t = std::variant<std::string, std::string_view>;
    variant_t value;

    std::string_view view() const noexcept
    {
        /**
         * Note: noexcept since it is effectively impossible to
         * make this particular variant valueless_by_exception
         */
        return std::visit([](auto const& v) {
              return std::string_view(v); }, value);
    }
};

int main()
{
    OptOwnString owning { std::string("foo") };
    std::cout << owning.view() << '\n';
    OptOwnString borrowed { owning.view() };
    std::cout << borrowed.view() << '\n';
}

2 Comments

view() could be const noexcept, and to reduce the emitted code size, you could use *std::get_if<std::string_view>(&value) (std::get_if takes a pointer to the variant).
Or, a much shorter implementation of view() (which is also more resilient to future changes): return std::visit([](auto const& v) { return std::string_view(v); }, value); If you happen to have a function object lying around that does a static_cast, this reads even nicer: std::visit(static_cast_<std::string_view>, value)

You could just use std::string

Firstly, a std::string could be used in a Token just as well as a std::string_view. This might not be as costly as you think, because all major C++ standard library implementations (libstdc++, libc++ and Microsoft's STL) implement SSO (small string optimization) for std::string.

This means that short tokens like "const" wouldn't be allocated on the heap; the characters would be stored directly inside the std::string object. Before bothering with std::string_view and std::variant, you might want to measure whether allocations are actually a performance issue. Otherwise, this is premature optimization.

If you insist on std::variant ...

User @Homer512 has provided a solid solution already. Rather than using the std::variant directly, you could create a wrapper around it which provides a string-like interface for both std::string and std::string_view.

This is easy to do, because the names and meanings of most member functions are identical for both classes. That also makes them easy to use through std::visit.

#include <string>
#include <string_view>
#include <variant>

struct MaybeOwningString
{
    using variant_type = std::variant<std::string, std::string_view>;
    using size_type = std::string_view::size_type;

    variant_type v;

    // main member function which grants access to either alternative as a view
    std::string_view view() const noexcept {
        return std::visit([](const auto& str) -> std::string_view {
            return str;
        }, v);
    }

    // various helper functions which expose commonly used member functions
    bool empty() const noexcept {
        // helper functions can be implemented with std::visit, but this is verbose
        return std::visit([](const auto& str) {
            return str.empty();
        }, v);
    }

    size_type size() const noexcept {
        // helper functions can also be implemented by using view()
        return view().size();
    }

    // ...
};
