I could probably solve this myself if I had the time to investigate. I've been trying different things, but I can't get it to work! I am doing my master's in Marketing, and we are expected to be able to write very basic Python to parse a dataset (JSON) into an organised text file that can be used for further analysis.

We have a dataset with a lot of missing values. What I want to parse is this: artist, mbid (MusicBrainz artist ID), event date, venue name, city.

This is (part of) the script I have written for that:

for event in setlists:
    eventdate = event.get(u'@eventDate')
    venuename = event.get(u'venue').get(u'@name')
    mbid = event.get(u'artist').get(u'@mbid')
    artistname = event.get(u'artist').get(u'@name')
    city = event.get(u'venue').get(u'city').get(u'@name')

    f = open(parse_file, 'a')
    f.write(artistname+'\t'+mbid+'\t'+eventdate+'\t'+venuename+'\t'+city+'\n')
    f.close()

This script works like a charm, except that it leaves out entries with missing values, e.g. no city.

I want it to write a line of text for those entries anyway, and print "missing" for the info that is missing.

I can't get it to work and I don't know where to start either. I tried things like this:

f = open(parse_file, 'a')
try: f.write(artistname) except: continue try: f.write(mbid) except:     continue...
f.close()

Every line in the parsed file should look like this:

artistname mbid eventdate venuename location

I did try putting everything on different lines, but then the output for each event was vertical rather than horizontal.

  • Could you add a sample JSON file and the desired output? Commented May 22, 2016 at 12:09
  • You must put your try: except: statements on different lines. That has nothing to do with the formatting of the file output. Commented May 22, 2016 at 12:38
  • I tried doing that, Keozon: a try to get ..., then except: continue, and then another try and except. But when I then do f = open(parse_file, 'a') f.write(eventdate+'\t'+city+'\n') f.close(), it still only outputs the combinations where both variables are present... Commented May 22, 2016 at 12:54

2 Answers

So this is definitely not the right way to do this, but since you're in a hurry...

for event in setlists:
    eventdate = event.get(u'@eventDate', 'missing')
    venuename = event.get(u'venue', {u'@name': 'missing'}).get(u'@name', 'missing')
    mbid = event.get(u'artist', {u'@mbid': 'missing'}).get(u'@mbid', 'missing')
    artistname = event.get(u'artist', {u'@name': 'missing'}).get(u'@name', 'missing')
    city = event.get(u'venue', {}).get(u'city', {u'@name': 'missing'}).get(u'@name', 'missing')

<etc>

The idea is to supply the default arguments to the .get such that your nested .get methods have something to .get :P
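For a version you can run end to end, here is a minimal sketch of the same idea. The `lookup` helper, the `format_row` function and the sample events are names and data I made up for illustration, not part of your dataset. The helper walks the nested keys itself instead of chaining .get calls, so it also covers the case where a key is present but holds null, which a .get default alone would not catch. It assumes every value you want to print is a string.

```python
def lookup(record, keys, default=u'missing'):
    """Walk nested dicts along keys; return default if any step is absent."""
    current = record
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    # Treat JSON null (None) the same as a missing key.
    return default if current is None else current


def format_row(event):
    """Build one tab-separated output line for a single event."""
    fields = [
        lookup(event, [u'artist', u'@name']),
        lookup(event, [u'artist', u'@mbid']),
        lookup(event, [u'@eventDate']),
        lookup(event, [u'venue', u'@name']),
        lookup(event, [u'venue', u'city', u'@name']),
    ]
    return u'\t'.join(fields) + u'\n'


# Hypothetical sample events, made up to exercise the missing-value cases.
sample_setlists = [
    {u'@eventDate': u'01-01-2016',
     u'artist': {u'@name': u'Some Artist', u'@mbid': u'abc-123'},
     u'venue': {u'@name': u'Some Hall', u'city': {u'@name': u'Amsterdam'}}},
    {u'artist': {u'@name': u'Other Artist'},
     u'venue': {u'@name': u'Other Hall'}},
]

rows = [format_row(event) for event in sample_setlists]
```

In your script you would open parse_file once before the loop and call f.write(format_row(event)) for each event in setlists, so every event ends up on exactly one line.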

2 Comments

mbid = event.get(u'artist', {u'@mbid': 'missing'}).get(u'@mbid', 'missing') could be simplified to just mbid = event.get(u'artist', {}).get(u'@mbid', 'missing') - that is, the first call to get just needs to return an empty dict, since you give the default arg on the second call to get. Do this throughout and it cleans up the code quite a bit. And I don't know why you say this is not the right way to do this, I think it meets the requirements just fine, with a minimum of fuss.
That does clean it up quite a bit, thank you. And you're also right about it meeting the requirements just fine, I guess part of me leans towards preprocessing the data a bit more so that the .gets are unnecessary, but I'm probably just being overly fussy :P

Put the try/except around the first block, where you fetch the data. In your example, city = event.get(u'venue').get(u'city').get(u'@name') fails when a key is missing, so processing of that event stops there.
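If you want "missing" in the output rather than losing the whole row, each lookup needs its own try/except. A rough sketch of that, where the `safe` helper and the sample event are hypothetical names of mine and not from your data:

```python
def safe(getter, default=u'missing'):
    """Call getter(); fall back to default if a nested .get() hits None."""
    try:
        value = getter()
    except AttributeError:
        # Raised when .get() is called on None, i.e. a parent key is absent.
        return default
    # A key that is simply absent at the last step yields None.
    return default if value is None else value


# Hypothetical event with no city and no mbid.
event = {u'@eventDate': u'01-01-2016',
         u'artist': {u'@name': u'Some Artist'},
         u'venue': {u'@name': u'Some Hall'}}

city = safe(lambda: event.get(u'venue').get(u'city').get(u'@name'))
mbid = safe(lambda: event.get(u'artist').get(u'@mbid'))
venuename = safe(lambda: event.get(u'venue').get(u'@name'))
```

Because every field gets a value either way, the single f.write line at the end of your loop can stay as it is.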

UPDATE:

Given the data you provided, this is what works. Please note that the provided data is not a single JSON file. It is a set of rows, each of which is its own JSON document. That is why I use readlines and then process each row separately. It could be done in a more Pythonic, more memory-efficient way, but I wanted to show how to solve the problem. Hope it helps:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import json

# Each line of the input file is its own JSON document.
with open('sample.json.txt') as data_file:
    content = data_file.readlines()

f = open('out_ok.txt', 'a')
errors = open('out_errors.txt', 'a')

try:
    for ctx in content:
        try:
            # json.loads parses the string directly, so no StringIO is needed
            json_data = json.loads(ctx)
        except UnicodeDecodeError:
            errors.write('unicode: ' + ctx)
            continue
        event = json_data.get('setlists').get('setlist')
        try:
            eventdate = event.get(u'@eventDate')
            venuename = event.get(u'venue').get(u'@name')
            mbid = event.get(u'artist').get(u'@mbid')
            artistname = event.get(u'artist').get(u'@name')
            city = event.get(u'venue').get(u'city').get(u'@name')
            f.write(artistname+'\t'+mbid+'\t'+eventdate+'\t'+venuename+'\t'+city+'\n')
        except (AttributeError, TypeError):
            # AttributeError: a parent key is missing, so .get() ran on None.
            # TypeError: a value is None and cannot be concatenated to a string.
            errors.write('json: ' + json.dumps(event) + '\n')
finally:
    f.close()
    errors.close()
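As for the more memory-efficient way mentioned above, one possible sketch is a generator that parses one row at a time instead of calling readlines. The `iter_events` name and the sample rows are my own assumptions, and the 'setlists' / 'setlist' keys are taken from the code above:

```python
import json


def iter_events(lines):
    """Yield (event, raw_line) pairs lazily, one JSON document per line."""
    for raw_line in lines:
        raw_line = raw_line.strip()
        if not raw_line:
            continue  # skip blank lines
        try:
            document = json.loads(raw_line)
        except ValueError:
            # json.loads raises ValueError (or a subclass) on invalid JSON.
            yield None, raw_line
            continue
        if not isinstance(document, dict):
            yield None, raw_line
            continue
        setlists = document.get('setlists') or {}
        event = setlists.get('setlist') if isinstance(setlists, dict) else None
        yield event, raw_line


# Hypothetical rows, made up for illustration.
sample_lines = [
    '{"setlists": {"setlist": {"@eventDate": "01-01-2016"}}}\n',
    'not json\n',
    '\n',
]
events = list(iter_events(sample_lines))
```

A file object can be passed in directly, e.g. for event, raw_line in iter_events(data_file), since iterating over an open file reads it line by line. An event of None means the row could not be used and can go to the error file.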

2 Comments

First of all, thanks for replying! If I run that, I get no output at all; it gives an error on every line!
Well, that way I can't help you. Please provide sample events so that I can test it and update my answer...
