Python 3 - import variable into dictionary

Question

I am trying to get the output of the print command below into a dictionary (without success) so that I can subsequently export it to a CSV file.

How can I get parseddata (output of print below) into a dictionary?

Sample input file:

<html>
<body>
<p>{ success:true ,results:3,rows:[{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"14-Aug-2015 15:39",SeqNumber:"1001577"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"30-May-2015 14:37",SeqNumber:"129901"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"17-Feb-2015 14:57",SeqNumber:"126171"}]}</p>
</body>
</html>

My code:

import requests
import re
from bs4 import BeautifulSoup
url = requests.get("http://. . .")
soup = BeautifulSoup(url.text, "lxml")
parseddata = soup.string.split(':[', 1)[1].lstrip(']')
print(parseddata)

The output of print(parseddata) is:

{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"14-Aug-2015 15:39",SeqNumber:"1001577"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"30-May-2015 14:37",SeqNumber:"129901"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"17-Feb-2015 14:57",SeqNumber:"126171"}]}

yurib, I have edited the post to show what parseddata looks like. Thanks. – zs_python, Oct 7, 2015 at 22:57
@zs_python: can you provide a sample input file to process, so that people can run test cases against it? – willeM_ Van Onsem, Oct 7, 2015 at 23:00

ShadowRanger · Accepted Answer · 2015-10-08 00:23:54Z

2

Aside from the stray close brace/bracket at the end, ~~this is valid JSON~~ this is valid YAML (I made a mistake in my initial answer; JavaScript objects can be declared without quoting the properties, but JSON the portable format doesn't allow that; YAML does).

Use PyYAML to parse the data. The manual splitting and lstrip-ing are hurting you and making this harder than it needs to be. Just get the text, then parse it with yaml (PyYAML is a third-party module that must be installed separately, e.g. with pip install pyyaml):

import requests
import yaml
from bs4 import BeautifulSoup

url = requests.get("http://. . .")
soup = BeautifulSoup(url.text, "lxml")
# Use safe_load over load to avoid opening security holes; YAML can do
# a lot of unsafe things if the input isn't trusted, but handling JS
# object literals can be done safely with safe_load
response_object = yaml.safe_load(soup.string.strip())
data_rows = response_object['rows']

for row in data_rows:
    print(row)  # ... do stuff with each returned row ...

You can read more in the PyYAML tutorial.
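If you would rather avoid a third-party dependency, one possible standard-library alternative (a different technique from the PyYAML approach above, offered as a sketch) is to wrap the bare keys in double quotes with a regular expression and then parse the result with json.loads. The quote_keys helper and the shortened one-row sample are illustrative assumptions, not part of the original answer, and the regex assumes no quoted value contains a ",name:" or "{name:" sequence:

```python
import json
import re

# Hypothetical sample: the text inside the <p> element from the question,
# shortened to a single row for brevity (so "results:3" no longer matches).
raw = ('{ success:true ,results:3,rows:[{ISIN:"INE134E01011",Ind:"-",'
       'Audited:"Un-Audited",Cumulative:"Non-cumulative",'
       'Consolidated:"Non-Consolidated",FilingDate:"14-Aug-2015 15:39",'
       'SeqNumber:"1001577"}]}')


def quote_keys(text):
    # Put double quotes around identifiers that follow '{' or ',' and precede
    # ':', turning the JS object literal into JSON. Assumption: no quoted
    # value itself contains a ',name:' or '{name:' sequence.
    return re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', text)


response_object = json.loads(quote_keys(raw))
for row in response_object['rows']:
    print(row['SeqNumber'], row['FilingDate'])  # 1001577 14-Aug-2015 15:39
```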

edited Oct 8, 2015 at 0:23

answered Oct 7, 2015 at 23:00

ShadowRanger


9 Comments

zs_python Over a year ago

Thanks ShadowRanger, I guess the "stray close brace/bracket at the end" is the problem. How do I get rid of it, please?

ShadowRanger Over a year ago

@zs_python: Anticipated that and added an example before you asked. :-)

ShadowRanger Over a year ago

Odds are, the original data is valid JSON, just with the object you're interested in as the sole entry in an array attribute of an object with only one attribute (holding the one-element array). You could probably just json.loads the whole thing, then access and assign data_as_dict = whole_thing_as_dict['name_of_singleton_key'][0] and avoid your explicit splitting and lstrip-ing.

zs_python Over a year ago

Thanks for helping remove the strays, ShadowRanger. The above example throws an error: JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

zs_python Over a year ago

I have just posted the sample input file in the question so that it gives a clearer picture of what I am trying to parse.


Community · Accepted Answer · 2020-06-20 09:12:55Z

This looks like a key-value mapping, with ISIN a key and "INE134E01011" a value. But it is not JSON, because the keys are not quoted, nor is it YAML, because plain scalar keys (i.e. strings without quotes) have to be followed by a colon plus a space (": ").
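To see the JSON part concretely, here is a small standard-library check on a shortened two-key sample (an assumption for brevity): json.loads rejects the unquoted keys with the same JSONDecodeError reported in the comments above, while the same mapping with quoted keys parses fine:

```python
import json

unquoted = '{ISIN:"INE134E01011",Ind:"-"}'    # JS-style keys without quotes
quoted = '{"ISIN":"INE134E01011","Ind":"-"}'  # the same mapping as valid JSON

error_message = None
try:
    json.loads(unquoted)
except json.JSONDecodeError as exc:
    error_message = str(exc)

print(error_message)
# Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
print(json.loads(quoted))  # {'ISIN': 'INE134E01011', 'Ind': '-'}
```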

If you break the output string into parts¹:

test_str = (
    '{ISIN:"INE134E01011",Ind:"-",'
    'Audited:"Un-Audited",'
    'Cumulative:"Non-cumulative",'
    'Consolidated:"Non-Consolidated",'
    'FilingDate:"14-Aug-2015 15:39",'
    'SeqNumber:"1001577"},'
    '{ISIN:"INE134E01011",'  # new mapping starts
    'Ind:"-",'
    'Audited:"Un-Audited",'
    'Cumulative:"Non-cumulative",'
    'Consolidated:"Non-Consolidated",'
    'FilingDate:"30-May-2015 14:37",'
    'SeqNumber:"129901"},'
    '{ISIN:"INE134E01011",'    # new mapping starts
    'Ind:"-",'
    'Audited:"Un-Audited",'
    'Cumulative:"Non-cumulative",'
    'Consolidated:"Non-Consolidated",'
    'FilingDate:"17-Feb-2015 14:57",'
    'SeqNumber:"126171"}]}'
)

it tests equal to your input:

test_org = '{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"14-Aug-2015 15:39",SeqNumber:"1001577"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"30-May-2015 14:37",SeqNumber:"129901"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"17-Feb-2015 14:57",SeqNumber:"126171"}]}'
assert test_str == test_org

Splitting it up makes it clear that there are actually 3 mappings, followed by a trailing ] and }. The ] indicates a list, which is consistent with the 3 mappings being separated by commas. The matching [ went missing because it is part of the ':[' separator you split on; the lstrip(']') afterwards does not remove anything, since the remaining string starts with {.
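A shortened stand-in for soup.string (two tiny rows, an assumption for brevity) shows where the [ goes, and how slicing from the position of the separator keeps it instead:

```python
# Shortened stand-in for soup.string from the question
text = '{ success:true ,results:3,rows:[{ISIN:"A"},{ISIN:"B"}]}'

head, tail = text.split(':[', 1)
print(head)               # { success:true ,results:3,rows
print(tail)               # {ISIN:"A"},{ISIN:"B"}]}  (the '[' was consumed by the split)
print(tail.lstrip(']'))   # unchanged: tail starts with '{', not ']'

# Slice from just after the ':' so the '[' is kept, then drop the outer '}'
rows_text = text[text.index(':[') + 1:].rstrip('}')
print(rows_text)          # [{ISIN:"A"},{ISIN:"B"}]
```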

You can easily manipulate the string so that YAML can parse it, but the result is a list²:

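from ruamel.yaml import YAML

# re-add the '[', put a space after each key's colon, drop the trailing '}'
test_str = '[' + test_str.replace(':"', ': "').rstrip('}')

# the module-level ruamel.yaml.load() is deprecated in newer ruamel.yaml releases
yaml = YAML(typ='safe')
data = yaml.load(test_str)
print(type(data))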

prints:

<class 'list'>

And since the dicts in this list have keys in common, you cannot just combine them into one dict without losing information.
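A minimal illustration, with the three rows trimmed to three of their keys (trimmed for brevity): merging them into one dict silently keeps only the last row's values for every shared key:

```python
rows = [
    {'ISIN': 'INE134E01011', 'FilingDate': '14-Aug-2015 15:39', 'SeqNumber': '1001577'},
    {'ISIN': 'INE134E01011', 'FilingDate': '30-May-2015 14:37', 'SeqNumber': '129901'},
    {'ISIN': 'INE134E01011', 'FilingDate': '17-Feb-2015 14:57', 'SeqNumber': '126171'},
]

merged = {}
for row in rows:
    merged.update(row)  # later rows overwrite the values of keys already present

print(merged)
# {'ISIN': 'INE134E01011', 'FilingDate': '17-Feb-2015 14:57', 'SeqNumber': '126171'}
```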

You can either map this list to some key (the colon in your split pattern and the trailing } in the output indicate that this is how the original data is organized: the list is the value of the rows key), or you can take a key with unique values (SeqNumber) and promote its value to a key in a dict that replaces the list:

ddata = {}
for elem in data:
    k = elem.pop('SeqNumber')
    ddata[k] = elem
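Note that pop() removes SeqNumber from the dicts inside data itself, so any code that uses data afterwards no longer sees that key. If you want to keep the list intact, a dict comprehension can build the same kind of mapping without mutating anything (a sketch on trimmed sample rows; by_seq is just an illustrative name):

```python
rows = [
    {'ISIN': 'INE134E01011', 'FilingDate': '14-Aug-2015 15:39', 'SeqNumber': '1001577'},
    {'ISIN': 'INE134E01011', 'FilingDate': '30-May-2015 14:37', 'SeqNumber': '129901'},
    {'ISIN': 'INE134E01011', 'FilingDate': '17-Feb-2015 14:57', 'SeqNumber': '126171'},
]

# {SeqNumber: rest of the row}, built from copies so the original dicts stay as they are
by_seq = {
    row['SeqNumber']: {k: v for k, v in row.items() if k != 'SeqNumber'}
    for row in rows
}

print(by_seq['129901']['FilingDate'])  # 30-May-2015 14:37
print('SeqNumber' in rows[0])          # True: the list was not modified
```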

But I don't see a reason to go from a list to a dict if your final goal is a CSV file. If you take the list produced by the YAML parser, you can do:

import csv
with open('output.csv', 'w', newline='') as fp:
    csvwriter = csv.writer(fp)
    csvwriter.writerow(data[0].keys())  # header of common dict keys
    for elem in data:
        csvwriter.writerow(elem.values())  # values
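The writer above relies on every dict listing its keys in the same order. A sketch using csv.DictWriter instead matches values to columns by key; io.StringIO stands in for the output file here so the result is easy to inspect, and the trimmed rows are sample data:

```python
import csv
import io

rows = [
    {'ISIN': 'INE134E01011', 'FilingDate': '14-Aug-2015 15:39', 'SeqNumber': '1001577'},
    {'ISIN': 'INE134E01011', 'FilingDate': '30-May-2015 14:37', 'SeqNumber': '129901'},
    {'ISIN': 'INE134E01011', 'FilingDate': '17-Feb-2015 14:57', 'SeqNumber': '126171'},
]

# In real use: with open('output.csv', 'w', newline='') as fp: ...
fp = io.StringIO()
writer = csv.DictWriter(fp, fieldnames=list(rows[0]))  # header taken from the first row's keys
writer.writeheader()
writer.writerows(rows)  # each value goes to the column named by its key, not by position

csv_text = fp.getvalue()
print(csv_text)
```

By default DictWriter raises ValueError when a row contains a key that is not in fieldnames, which surfaces inconsistent rows instead of silently shifting columns.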

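to get a CSV file with the following content (SeqNumber is absent because the pop() loop above removed it from each dict, and the column order reflects the dict ordering of the Python version used at the time):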

ISIN,Ind,Consolidated,Cumulative,Audited,FilingDate
INE134E01011,-,Non-Consolidated,Non-cumulative,Un-Audited,14-Aug-2015 15:39
INE134E01011,-,Non-Consolidated,Non-cumulative,Un-Audited,30-May-2015 14:37
INE134E01011,-,Non-Consolidated,Non-cumulative,Un-Audited,17-Feb-2015 14:57

¹ Instead of escaping the newlines with \, I use parentheses to make the multi-line definition into one string; that allows me to put comments on the lines more easily.
² Instead of re-adding the '[', you should of course not remove it in the first place.

Thanks Anthon, that was perfect and did precisely what I needed! I really appreciate all the effort you took to explain it to me too. Thanks @ShadowRanger, your efforts have added to my Python learning and were really helpful too. This noob is overwhelmed by the effort you guys put in to help me learn. Thank you, onwards!
@zs_python If this solves your issue, please consider accepting the answer (by clicking the marker next to the top of this answer). That indicates to others that your problem has been solved (they might not read all the way down to your comment), and marks it as such in the database.
Thanks @Anthon for the hand-holding, I have accepted the answer as guided. See you guys around soon :)
