Convert non-nested json to csv file?

Question

I am working with a non-nested JSON file; the data is from Reddit. I am trying to convert it to a CSV file using Python. The rows do not all have the same fields, and I keep getting this error:

JSONDecodeError: Extra data: line 2 column 1

Here is the code:

import csv
import json
import os

os.chdir('c:\\Users\\Desktop')
infile = open("data.json", "r")
outfile = open("outputfile.csv", "w")

writer = csv.writer(outfile)

for row in json.loads(infile.read()):
    writer.writerow(row)

Here are a few lines from the data:

{"author":"i_had_an_apostrophe","body":"\"It's not your fault.\"","author_flair_css_class":null,"link_id":"t3_5c0rn0","subreddit":"AskReddit","created_utc":1478736000,"subreddit_id":"t5_2qh1i","parent_id":"t1_d9t3q4d","author_flair_text":null,"id":"d9tlp0j"}
{"id":"d9tlp0k","author_flair_text":null,"parent_id":"t1_d9tame6","link_id":"t3_5c1efx","subreddit":"technology","created_utc":1478736000,"subreddit_id":"t5_2qh16","author":"willliam971","body":"9/11 inside job??","author_flair_css_class":null}
{"created_utc":1478736000,"subreddit_id":"t5_2qur2","link_id":"t3_5c44bz","subreddit":"excel","author":"excelevator","author_flair_css_class":"points","body":"Have you tried stepping through the code to analyse the values at each step?\n\n","author_flair_text":"442","id":"d9tlp0l","parent_id":"t3_5c44bz"}
{"created_utc":1478736000,"subreddit_id":"t5_2tycb","link_id":"t3_5c384j","subreddit":"OldSchoolCool","author":"10minutes_late","author_flair_css_class":null,"body":"**Thanks Hillary**","author_flair_text":null,"id":"d9tlp0m","parent_id":"t3_5c384j"}

I am thinking of collecting every field that appears in the data as the CSV header and, where a row has no data for a field, filling it with NA.

@DYZ My question is how to write the Python code so that it takes all the available fields from all rows and makes a CSV that has nulls wherever data is not available for a field.
– ash25, Commented Jan 27, 2017 at 1:09
@RoryDaulton I am not sure of that, so I was thinking of taking all the available fields from all rows, creating the headers in the CSV file from them, and putting nulls wherever a row has no data for a field.
– ash25, Commented Jan 27, 2017 at 1:11
Can you post your actual JSON data in a gist? The lines you quoted are not valid JSON as a whole (they're just four JSON objects, each on its own line). From the error it looks like the problem is in the read step, not the write step.
– sundance, Commented Jan 27, 2017 at 6:29

martineau · Accepted Answer · 2017-01-27 02:19:45Z

1

Your question is missing information about what you're trying to accomplish, so I'm guessing at some of the details. Note that csv files don't use "nulls" to represent missing fields; they just have delimiters with nothing between them, like 1,2,,4,5 which has no third field value.
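
To illustrate the point about empty fields, here is a minimal sketch using csv.DictWriter and its restval parameter, which supplies the value written for keys that are absent from a row. The helper name rows_to_csv_text and the sample rows are made up for this example. Note that a key that is present with a None value (like the null flair fields in the question's data) is still written as an empty field, not as restval.

```python
import csv
import io

def rows_to_csv_text(rows, fieldnames, restval=""):
    """Render dict rows as CSV text; keys absent from a row become restval."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval=restval)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical sample: the second row has no "b" key.
sample = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "c": 6}]

default_text = rows_to_csv_text(sample, ["a", "b", "c"])          # missing -> empty
na_text = rows_to_csv_text(sample, ["a", "b", "c"], restval="NA")  # missing -> NA
```

With the default restval the second data row has nothing between its delimiters; passing restval="NA" gives the NA filler the question asks about.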

Also, how you open csv files varies depending on whether you're using Python 2 or 3. The code below is for Python 3.

#!/usr/bin/env python3
import csv
import json
import os

os.chdir('c:\\Users\\Desktop')
with open('sampledata.json', 'r', newline='') as infile:
    data = json.loads(infile.read())

# determine all the keys present, which will each become csv fields
# (sorted so the column order is the same on every run)
fields = sorted({key for row in data for key in row})

with open('outputfile.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fields)
    writer.writeheader()
    # rows that lack a field get an empty value for it (DictWriter's default restval)
    writer.writerows(data)

edited Jan 27, 2017 at 2:19

answered Jan 27, 2017 at 2:14

martineau


2 Comments

ash25 Over a year ago

It is still not able to figure out all the fields, got the error: JSONDecodeError("Extra data", s, end)

martineau Over a year ago

That may be because the JSON data shown in your question isn't valid. JSON objects can't appear one-right-after-the-other like that, so the JSONDecoder is complaining. For testing purposes, I enclosed the group of them all in [] bracket characters and added a comma between each. If your data is actually in exactly the format you describe, one-object-per-line, you can work around the issue by calling json.loads() for each row of the input file and creating the data list that way.
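
A minimal sketch of the workaround described in this comment, assuming the input really is one JSON object per line: call json.loads() on each line, take the union of all keys as the header, and let csv.DictWriter fill the gaps. The function name ndjson_to_csv and the explicit utf-8 encoding are assumptions for illustration, not part of the original answer.

```python
import csv
import json

def ndjson_to_csv(json_path, csv_path, restval=""):
    """Convert a file holding one JSON object per line into a CSV file."""
    with open(json_path, "r", encoding="utf-8") as infile:
        # Parse each non-blank line as its own JSON document.
        data = [json.loads(line) for line in infile if line.strip()]

    # Union of every key seen in any row, sorted for a stable column order.
    fields = sorted({key for row in data for key in row})

    with open(csv_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fields, restval=restval)
        writer.writeheader()
        writer.writerows(data)
    return fields
```

Calling ndjson_to_csv("data.json", "outputfile.csv", restval="NA") would write NA for fields a row lacks; the default leaves them empty.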

slackmart · Accepted Answer · 2017-01-27 02:18:04Z

0

I suggest using the csv.DictWriter class. It needs a file to write to and a list of fieldnames (which I worked out from your data sample).

import csv
import json
import os

fieldnames = [
    "author", "author_flair_css_class", "author_flair_text", "body",
    "created_utc", "id", "link_id", "parent_id", "subreddit",
    "subreddit_id"
]

os.chdir('c:\\Users\\Desktop')
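# newline="" stops the csv module from emitting blank rows on Windows
with open("data.json", "r") as infile, \
        open("outputfile.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()

    # the input holds one JSON object per line, so parse it line by line
    for row in infile:
        row_dict = json.loads(row)
        writer.writerow(row_dict)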

answered Jan 27, 2017 at 2:18

slackmart


2 Comments

ash25 Over a year ago

This works fine for the four lines of data above, but when I run it on the whole data file, I get UnicodeEncodeError: 'charmap' codec can't encode character '\u03a9' in position 46: character maps to <undefined>

slackmart Over a year ago

That error is caused by the file encoding. I think specifying the encoding explicitly, e.g. with open('data.json', 'r', encoding='utf-8') as infile:, could fix it. (The encoding keyword is available in py3k.) docs.python.org/3/library/functions.html#open
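
One detail worth adding: a UnicodeEncodeError is raised while encoding text for output, so the file that most likely needs an explicit encoding is the CSV being written, which otherwise falls back to the platform default (the 'charmap' codec in the traceback points to a Windows code page). A minimal sketch, assuming utf-8 is acceptable for both files; the function name convert_utf8 is made up for this example.

```python
import csv
import json

def convert_utf8(json_path, csv_path, fieldnames):
    """Stream one-object-per-line JSON into a CSV, using utf-8 on both sides."""
    with open(json_path, "r", encoding="utf-8") as infile, \
            open(csv_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        for line in infile:
            if line.strip():  # skip blank lines
                writer.writerow(json.loads(line))
```

If the CSV is meant to be opened in Excel, encoding="utf-8-sig" on the output file may be a better choice, since it writes a byte order mark that Excel uses to detect UTF-8.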

Apollo2020 · Accepted Answer · 2017-01-27 01:27:15Z

0

You can write a little function to build the rows for you, extracting data only where it is available and inserting None if it is not. What you called header, I called schema. Get all the fields, remove duplicates and sort, then build records based on the full set of fields and insert those records into the csv.

import csv
import json

def build_record(row, schema):
    values = []
    for field in schema:
        if field in row:
            values.append(row[field])
        else:
            values.append(None)
    return tuple(values)

infile = open("data.json", "r").readlines()
outfile = open("outputfile.csv", "wb")
writer = csv.writer(outfile)

rows = [json.loads(row.strip()) for row in infile]
schema = tuple(sorted(list(set([k for r in rows for k in r.keys()]))))
records = [build_record(r, schema) for r in rows]

writer.writerow(schema)

for rec in records:
    writer.writerow(rec)
outfile.close()

answered Jan 27, 2017 at 1:27

Apollo2020


1 Comment

ash25 Over a year ago

I got the TypeError: a bytes-like object is required, not 'str'
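
This TypeError is what Python 3 raises when csv.writer is given a file opened in binary mode: "wb" is the Python 2 convention, whereas in Python 3 the csv module writes str and expects a text-mode file opened with newline="". A minimal Python 3 sketch of the same approach follows; write_records is a hypothetical helper, and build_record is shortened with dict.get(), which returns None for a missing key.

```python
import csv

def build_record(row, schema):
    """Return the row's values in schema order, None where a field is absent."""
    return tuple(row.get(field) for field in schema)

def write_records(csv_path, schema, records):
    # Python 3: text mode plus newline="" replaces the Python 2 "wb" mode.
    with open(csv_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(schema)
        writer.writerows(records)  # None values are written as empty fields
```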

sundance · Accepted Answer · 2017-01-27 01:37:06Z

0

You can use Pandas to fill in the blanks for you (you may need to pip install pandas first):

import pandas as pd
import os

# load json
os.chdir('c:\\Users\\Desktop')
with open("data.json", "r") as infile:

    # read data into a Pandas DataFrame
    df = pd.read_json(infile)

# use Pandas to write to CSV
df.to_csv("myfile.csv")

answered Jan 27, 2017 at 1:37

sundance


3 Comments

ash25 Over a year ago

Getting ValueError: Trailing data

sundance Over a year ago

Must be the form of the JSON. You can also just parse it separately and then read the dictionary: df = pd.DataFrame.from_dict(json.load(infile))

sundance Over a year ago

As I said in my comment above, we'd really need to see the actual JSON to help you fix JSON read errors.
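
If the file really is one JSON object per line, as the sample in the question suggests, "Trailing data" is the expected result of parsing it as a single document. pandas can read that layout directly with the lines=True argument of read_json. A minimal sketch under that assumption; the helper name and the two sample records are made up for illustration.

```python
import io

import pandas as pd

def ndjson_text_to_frame(text):
    """Parse one-JSON-object-per-line text into a DataFrame."""
    # lines=True tells pandas to treat each line as a separate JSON object.
    return pd.read_json(io.StringIO(text), lines=True)

# Hypothetical sample: the two records do not share all fields.
sample = '{"id": "a1", "body": "hi"}\n{"id": "b2", "subreddit": "excel"}\n'
df = ndjson_text_to_frame(sample)

# Fields missing from a record show up as NaN and are written as empty cells.
csv_text = df.to_csv(index=False)
```

For the file in the question, the equivalent call would be pd.read_json("data.json", lines=True), followed by df.to_csv("myfile.csv", index=False); passing na_rep="NA" to to_csv fills the gaps with NA instead.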
