1

I'm trying to extract some data from a large batch of files and convert it to a specific (JSON) format for importing into a database using Django fixtures.

I've been able to get this far:

'{ {\n "pk":2,\n "model": trials.conditions,\n "fields": {\n "trial_id": NCT00109798,\n "keyword": Brain and Central Nervous System Tumors,\n }{\n "pk":3,\n "model": trials.conditions,\n "fields": {\n "trial_id": NCT00109798,\n "keyword": Lymphoma,\n }{\n "pk": 2,\n "model": trials.criteria,\n "fields": {\n "trial_id": NCT00109798,\n "gender": Both,\n "minimum_age": 18 Years,\n "maximum_age": N/A,\n "healthy_volunteers": No,\n "textblock": ,\n }\n\t\t"pk":2,\n\t\t"model": trials.keyword,\n\t\t"fields": {\n\t\t"trial_id": NCT00109798,\n\t\t"keyword": primary central nervous system non-Hodgkin lymphoma,\n\t\t}\n\t\t

...many lines later.....

After completion of study treatment, patients are followed every 3 months for 1 year, every\n 4 months for 1 year, and then every 6 months for 3 years.\n\n PROJECTED ACCRUAL: A total of 6-25 patients will be accrued for this study.\n ,\n "overall_status": Recruiting,\n "phase": Phase 2,\n "enrollment": 25,\n "study_type": Interventional,\n "condition": 2,3,\n "criteria": 1,\n "overall_contact": testdata,\n "location": 4,\n "lastchanged_date": March 31, 2010,\n "firstreceived_date": May 3, 2005,\n "keyword": 2,3,\n "condition_mesh": ,\n }\n \n {\n "pk": testdata,\n "model": trials.contact,\n "fields": {\n "trial_id": NCT00109798,\n "last_name": Pamela Z. New, MD,\n "phone": ,\n "email": ,\n }}'

The output actually needs to look like this:

{
    "pk": trial_id,
    "model": trials.trial,
    "fields": {
            "trial_id": trial_id,
            "brief_title": brief_title,
            "official_title": official_title,
            "brief_summary": brief_summary,
            "detailed_Description": detailed_description,
            "overall_status": overall_status,
            "phase": phase,
            "enrollment": enrollment,
            "study_type": study_type,
            "condition": _______________,
            "elligibility": elligibility,
            "criteria": ______________,
            "overall_contact": _______________,
            "location": ___________,
            "lastchanged_date": lastchanged_date,
            "firstreceived_date": firstreceived_date,
            "keyword": __________,
            "condition_mesh": condition_mesh,
    }

    "pk": null,
    "model": trials.locations,
    "fields": {
           "trials_id": trials_id,
           "facility": facility,
           "city": city,
           "state": state,
           "zip": zip,
           "country": country,
    }

Any advice would be much appreciated.

3
  • You might have a look at the code for the management command 'dumpdata' in the source (code.djangoproject.com/browser/django/trunk/django/core/…), because it has an option to indent output, which I gather is what you are trying to do. Commented Nov 6, 2011 at 2:21
  • Formatting and indenting is irrelevant in JSON. Commented Nov 6, 2011 at 12:33
  • @pastylegs Unfortunately the data aren't coming from the DB, so I can't use dumpdata. I'm taking an XML file, pulling out the relevant fields, and outputting them in the fixture JSON format using Python. I'm basically returning one giant string with line breaks. Commented Nov 6, 2011 at 18:16

2 Answers

3

Alternative to json.dumps indent parameter:

Python has a pretty printer, pprint: http://docs.python.org/library/pprint.html. It is extremely simple to use, but it only pretty-prints Python objects (you can't give it a JSON string and expect formatted output).

E.g.

pydict = {"name":"Chateau des Tours Brouilly","code":"chateau-des-tours-brouilly-2009-1","region":"France > Burgundy > Beaujolais > Brouilly","winery":"Chateau Des Tours","winery_id":"chateau-des-tours","varietal":"Gamay","price":"14.98","vintage":"2009","type":"Red Wine","link":"http://www.snooth.com/wine/chateau-des-tours-brouilly-2009-1/","tags":"colorful, mauve, intense, purple, floral, violet, lively, rich, raspberry, berry","image":"http://ei.isnooth.com/wine/b/7/8/wine_6316762_search.jpeg","snoothrank":3,"available":1,"num_merchants":10,"num_reviews":1}
from pprint import pprint
pprint(pydict)

The output is

{'available': 1,
 'code': 'chateau-des-tours-brouilly-2009-1',
 'image': 'http://ei.isnooth.com/wine/b/7/8/wine_6316762_search.jpeg',
 'link': 'http://www.snooth.com/wine/chateau-des-tours-brouilly-2009-1/',
 'name': 'Chateau des Tours Brouilly',
 'num_merchants': 10,
 'num_reviews': 1,
 'price': '14.98',
 'region': 'France > Burgundy > Beaujolais > Brouilly',
 'snoothrank': 3,
 'tags': 'colorful, mauve, intense, purple, floral, violet, lively, rich, raspberry, berry',
 'type': 'Red Wine',
 'varietal': 'Gamay',
 'vintage': '2009',
 'winery': 'Chateau Des Tours',
 'winery_id': 'chateau-des-tours'}
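
If the starting point is a JSON string rather than a Python object, one option is to parse it with json.loads first and then pretty-print the result. A minimal sketch, where the sample string is made up for illustration; note that pprint output is a Python repr with single quotes, so it is not itself valid JSON:

```python
import json
from pprint import pformat

# Hypothetical sample input: a JSON string, not a Python object.
json_text = '{"vintage": "2009", "name": "Chateau des Tours Brouilly", "available": 1}'

# Parse the string into a dict first; pprint only understands Python objects.
parsed = json.loads(json_text)

# pformat returns the same text that pprint would print.
pretty = pformat(parsed)
print(pretty)

# For comparison, json.dumps produces indented output that is still valid JSON.
print(json.dumps(parsed, indent=4, sort_keys=True))
```

For a fixture file that Django has to load back in, the json.dumps form is probably the one to write to disk, since the pprint form cannot be parsed as JSON.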

1

There is a pretty printer in the json module. Try something like this: print(json.dumps(s, indent=4)).

>>> import json
>>> s = {'pk': 5678, 'model': 'trial model', 'fields': {'brief_title': 'a short title', 'trial_id': 1234}}

>>> print(json.dumps(s, indent=4))
{
    "pk": 5678,
    "model": "trial model",
    "fields": {
        "brief_title": "a short title",
        "trial_id": 1234
    }
}
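
Applied to the fixture case in the question, the same call can serialize a whole list of records at once, which avoids assembling the JSON by hand as one big string. A rough sketch, assuming the values have already been pulled out of the XML: make_fixture_entry is a hypothetical helper, while the model labels, field names, and sample values are taken from the question. Django's JSON fixtures are a list of objects with "model", "pk", and "fields" keys:

```python
import json


def make_fixture_entry(model, pk, fields):
    """Build one fixture record as a plain dict (hypothetical helper)."""
    return {"model": model, "pk": pk, "fields": fields}


# Sample values copied from the question's output; in practice these would
# come from the parsed XML.
entries = [
    make_fixture_entry("trials.conditions", 2, {
        "trial_id": "NCT00109798",
        "keyword": "Brain and Central Nervous System Tumors",
    }),
    make_fixture_entry("trials.conditions", 3, {
        "trial_id": "NCT00109798",
        "keyword": "Lymphoma",
    }),
]

# json.dumps handles quoting, escaping, commas, and the enclosing list.
fixture_text = json.dumps(entries, indent=4)
print(fixture_text)
```

Because the quoting and commas come from json.dumps, string values such as the multi-line descriptions are escaped correctly, and a missing value can be passed as None, which is written as null.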
