
I have a long string containing attributes. To parse it, I am trying to extract the 'lists' from the string, but I'm having trouble with nested (multi-dimensional) lists.

An Example String:

'a="foo",c=[d="test",f="bar",g=[h="some",i="text"],j="over"],k="here",i=[j="baz"]'

I would like to extract

c=[d="test",f="bar",g=[h="some",i="text"],j="over"]

and

i=[j="baz"]

from this string.

Is this possible using regex?

I've tried numerous regular expressions; this is my most recent one:

([^\W0-9]\w*=\[.*\])
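For reference, a minimal check of what this pattern returns on the example string (the variable names `pattern`, `example` and `matches` are just for illustration). Because `.*` is greedy, the match runs from `c=[` to the last `]` in the string, so the two lists come back as one piece:

```python
import re

pattern = r'([^\W0-9]\w*=\[.*\])'
example = 'a="foo",c=[d="test",f="bar",g=[h="some",i="text"],j="over"],k="here",i=[j="baz"]'

# findall returns the text captured by the single group for each match
matches = re.findall(pattern, example)
print(len(matches))  # a single match, not two
print(matches[0])    # everything from 'c=[' through the final ']'
```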
  • Will the attributes (a, c, k, i) change? Commented Oct 26, 2022 at 16:27
  • @Bhargav Yes, these are just dummy variables Commented Oct 26, 2022 at 16:29
  • I remember seeing a very similar question a few days ago: stackoverflow.com/questions/74164066/… Commented Oct 26, 2022 at 16:30
  • @Swifty, This looks promising, thanks! Commented Oct 26, 2022 at 16:42

1 Answer

This string looks like a JSON object, with a few differences. My plan is to turn this into a JSON string, then parse it. After that, it is a matter of picking out what you want:

import json
import re

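def str2obj(the_string):
    # Quote each attribute name and swap "=" for ":", e.g. a= becomes "a":
    out = re.sub(r"(\w+)=", r'"\1":', the_string)
    # JSON objects use curly braces rather than square brackets
    out = out.replace("[", "{").replace("]", "}")
    # Wrap everything so the result parses as a single JSON object
    out = "{%s}" % out
    return json.loads(out)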


string_object = 'a="foo",c=[d="test",f="bar",g=[h="some",i="text"],j="over"],k="here",i=[j="baz"]'
json_object = str2obj(string_object)
print(json_object)
assert json_object["a"] == "foo"
assert json_object["c"] == {
    'd': 'test',
    'f': 'bar',
    'g': {'h': 'some', 'i': 'text'},
    'j': 'over'
}
assert json_object["k"] == "here"
assert json_object["i"] == {"j": "baz"}

Output:

{'a': 'foo', 'c': {'d': 'test', 'f': 'bar', 'g': {'h': 'some', 'i': 'text'}, 'j': 'over'}, 'k': 'here', 'i': {'j': 'baz'}}

Notes

  • The re.sub call replaces a= with "a":
  • The replace calls turn the square brackets into curly ones
  • There is no error checking in the code; I assume what you have is valid in terms of balanced brackets
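
If you only need to pull out the top-level lists, or want to check that the brackets balance before converting, counting bracket depth is one alternative to a regex. This is a minimal sketch rather than part of the code above: `extract_top_level_lists` is a hypothetical helper name, and it assumes that double quotes are never escaped inside values.

```python
def extract_top_level_lists(text):
    """Return every top-level name=[...] segment by tracking bracket depth."""
    segments = []
    depth = 0          # how many '[' are currently open
    in_quotes = False  # True while inside a "..." value
    attr_start = 0     # index where the current top-level attribute begins
    for pos, char in enumerate(text):
        if char == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # brackets and commas inside quoted values are ignored
        elif char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched ']' at index {pos}")
            if depth == 0:
                # A top-level list just closed; keep it together with its name
                segments.append(text[attr_start:pos + 1])
        elif char == "," and depth == 0:
            # A top-level comma starts the next attribute
            attr_start = pos + 1
    if depth != 0:
        raise ValueError("unbalanced brackets: missing ']'")
    return segments


example = 'a="foo",c=[d="test",f="bar",g=[h="some",i="text"],j="over"],k="here",i=[j="baz"]'
for segment in extract_top_level_lists(example):
    print(segment)
```

On the example string this prints the c=[...] segment and the i=[j="baz"] segment on separate lines. Each segment could then be passed to str2obj on its own.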

1 Comment

Unfortunately, the example I provided is a substring of the actual string. I attempted this approach before deciding to separate the string into parts and parse them individually. My end goal is to convert the string into JSON, but the actual string is formatted very strangely, so to parse it I need to separate the lists from the other attributes, which is what I'm trying to do here.
