0

I am given a raw string which is a path (or "direction") to a value in a JSON document. I need the following string converted to a list containing dictionaries.

st = """data/policy/line[Type="BusinessOwners"]/risk/coverage[Type="FuelHeldForSale"]/id"""

The list should look like this:

paths = ['data','policy','line',{'Type':'BusinessOwners'},'risk','coverage',{"Type":"FuelHeldForSale"},"id"]

I then iterate over this list to find the object in the JSON (which is in a Spark RDD)
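
For context, the lookup I have in mind can be sketched roughly as below. This is only an illustration on a plain Python dict: the sample document `doc`, the helper name `follow`, and the rule that a dictionary step selects the first list element whose fields match are all assumptions made for the example, and the Spark RDD part is left out.

```python
# Hypothetical sample document; the real data lives in a Spark RDD.
doc = {
    "data": {
        "policy": {
            "line": [
                {"Type": "Auto", "risk": {}},
                {
                    "Type": "BusinessOwners",
                    "risk": {
                        "coverage": [
                            {"Type": "Other", "id": "cov-2"},
                            {"Type": "FuelHeldForSale", "id": "cov-1"},
                        ]
                    },
                },
            ]
        }
    }
}

paths = ['data', 'policy', 'line', {'Type': 'BusinessOwners'},
         'risk', 'coverage', {'Type': 'FuelHeldForSale'}, 'id']


def follow(node, steps):
    """Walk `node` along `steps` (assumed semantics, not a fixed spec)."""
    for step in steps:
        if isinstance(step, dict):
            # Assumption: a dict step picks the first list element whose
            # fields equal every key/value pair in the filter.
            node = next(
                elem for elem in node
                if all(elem.get(k) == v for k, v in step.items())
            )
        else:
            # A string step is an ordinary key lookup.
            node = node[step]
    return node


result = follow(doc, paths)
```

With this sample data, `result` is the `id` of the matching coverage entry.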

I attempted st.split('/'), which gave me:

st.split('/')
Out[370]: 
['data',
 'policy',
 'line[Type="BusinessOwners"]',
 'risk',
 'coverage[Type="FuelHeldForSale"]',
 'id']

But how do I convert and split items like 'line[Type="BusinessOwners"]' into 'line', {'Type': 'BusinessOwners'}?

5
  • Hi. Did you try using eval()? Can you try this out: st_new = eval(st), then print st_new. I hope this works! Commented Mar 16, 2018 at 4:35
  • Hi! That did not work, @ShrinivasDeshmukh. It fails on data/policy/line[Type="BusinessOwners"]/risk/coverage[Type="FuelHeldForSale"]/id with SyntaxError: invalid syntax. Commented Mar 16, 2018 at 4:37
  • Please refer to this link, a similar problem has been discussed here: stackoverflow.com/questions/36068779/… Commented Mar 16, 2018 at 4:43
  • @mdeonte001 --- You should be a lot more specific as to what you want if you want people to use their time to solve your problem. If you want a dictionary in your list then state it instead of leaving others to read your mind! Commented Mar 16, 2018 at 5:25
  • @MichaelSwartz please see above: I state 'list containing dictionaries' and in my example I show a dictionary. Commented Mar 16, 2018 at 15:37

4 Answers

1
import json

# Rewrite each [Key="Value"] filter as a JSON object in its own path segment.
first_list = st.replace('[', '/{"').replace(']', '}').replace('="', '": "').split('/')
[json.loads(item) if "{" in item else item for item in first_list]

Or, using ast.literal_eval:

import ast

[ast.literal_eval(item) if "{" in item else item for item in first_list]


out:
['data',
 'policy',
 'line',
 {'Type': 'BusinessOwners'},
 'risk',
 'coverage',
 {'Type': 'FuelHeldForSale'},
 'id']

6 Comments

Hi, I ran this and I got the error AttributeError: 'str' object has no attribute 'literal_eval'. What is ast.literal_eval(item)? Should that be st.literal?
Sorry, you need to import ast first. Please check again.
Not a huge fan of using literal_eval, but this is much better.
This works! Thank you. @MadPhysicist care to share why you dislike literal_eval?
It certainly does work. I am wary of literal_eval because of things like this gist. I am not 100% sure if it can be done exactly with literal_eval, but I would rather not take a chance.
1

This would be more efficient if it weren't a one-liner (each matching word is split twice), but I'll let you figure it out from here. You'll probably want a more robust regex-based parser if your input varies more than your given schema. Or just use a standardized data model like JSON.

import re
[word if '=' not in word else {word.split('=')[0]: word.split('=')[1]} for word in re.split(r'[/\[]', st.replace(']', '').replace('"', ''))]

['data', 'policy', 'line', {'Type': 'BusinessOwners'}, 'risk', 'coverage', {'Type': 'FuelHeldForSale'}, 'id']
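
For reference, one way the one-liner might be unrolled is sketched below. It uses the same split-and-strip approach, but splits each `key=value` word only once. The function name `parse_path` is made up for this example, and the sketch still assumes the simple schema from the question (no `/`, `[` or `=` inside values).

```python
import re

st = 'data/policy/line[Type="BusinessOwners"]/risk/coverage[Type="FuelHeldForSale"]/id'


def parse_path(path):
    """Hypothetical helper: the one-liner above, unrolled into a loop."""
    # Drop closing brackets and quotes, then split on '/' and '['.
    cleaned = path.replace(']', '').replace('"', '')
    result = []
    for word in re.split(r'[/\[]', cleaned):
        if '=' in word:
            # Split once, so each word is scanned a single time.
            key, value = word.split('=', 1)
            result.append({key: value})
        else:
            result.append(word)
    return result


paths = parse_path(st)
```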

0

Let's do it in one line:

import re

pattern=r'(?<=Type=)\"(\w+)'
data="""data/policy/line[Type="BusinessOwners"]/risk/coverage[Type="FuelHeldForSale"]/id"""


print([{'Type': re.search(pattern, i).group(1)} if '=' in i else i for i in re.split(r'/|\[', data)])

output:

['data', 'policy', 'line', {'Type': 'BusinessOwners'}, 'risk', 'coverage', {'Type': 'FuelHeldForSale'}, 'id']

0

Regular expressions may be a good tool here. It looks like you want to transform elements that look like text1[text2="text3"] into text1, {text2: text3}. The regex would look something like this:

(\w+)\[(\w+)=\"(\w+)\"\]

You can modify this expression in any number of ways. For example, you could use something other than \w+ for the names, and insert \s* to allow optional whitespace wherever you want.
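
As a rough sketch of such a modification: the variant below allows optional whitespace inside the brackets and accepts any characters other than a double quote in the value. The name `loose_pattern` and the sample input are made up for illustration.

```python
import re

# Hypothetical variant of the expression above: optional whitespace inside the
# brackets, and any characters except a double quote allowed in the value.
loose_pattern = re.compile(r'(\w+)\[\s*(\w+)\s*=\s*"([^"]+)"\s*\]')

match = loose_pattern.fullmatch('coverage[ Type = "Fuel Held For Sale" ]')
name, key, value = match.groups()
```

Here `name`, `key` and `value` come out as `'coverage'`, `'Type'` and `'Fuel Held For Sale'`. Note that if the path is first split on `/`, a value that itself contains a slash would still be broken apart before the pattern ever sees it.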

The next thing to keep in mind is that when you do find a match, you need to expand your list. The easiest way to do that would be to just create a new list and append/extend it:

import re

paths = []
pattern = re.compile(r'(\w+)\[(\w+)=\"(\w+)\"\]')
for item in st.split('/'):
    match = pattern.fullmatch(item)
    if match:
        paths.append(match.group(1))
        paths.append({match.group(2): match.group(3)})
    else:
        paths.append(item)

This produces a paths list equal to

['data', 'policy', 'line', {'Type': 'BusinessOwners'}, 'risk', 'coverage', {'Type': 'FuelHeldForSale'}, 'id']

I personally like to split the functionality of my code into pipelines of functions. In this case, I would have the main loop accumulate the paths list based on a function that returned replacements for the split elements:

def get_replacement(item):
    match = pattern.fullmatch(item)
    if match:
        return match.group(1), {match.group(2): match.group(3)}
    return item,

paths = []
for item in st.split('/'):
    paths.extend(get_replacement(item))

The comma in return item, is very important. It makes the return value into a tuple, so you can use extend on whatever the function returns.
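
A small illustration of why the tuple matters: `list.extend` iterates over its argument, so a bare string would be added one character at a time. The variable names here are only for the demonstration.

```python
# extend() with a one-element tuple adds the string as a single item.
with_tuple = []
with_tuple.extend(('id',))

# extend() with a bare string iterates over its characters instead.
with_str = []
with_str.extend('id')
```

After this, `with_tuple` is `['id']` while `with_str` is `['i', 'd']`, which is why the function returns `item,` rather than `item`.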
