
I have the following CSV:

id, o1, o2, o3
'jess', 1.0, 4, 0.3
'jill', 0, 5, 0.123
'jamie', -3, 0.2, 1.0

and would like it as nested JSON, with each column as an object keyed on the header name:

myjson = {
    "o1": {"jess": 1.0, "jill":   0, "jamie":  -3},
    "o2": {"jess":   4, "jill":   5, "jamie": 0.2},
    "o3": {"jess": 0.3, "jill": 0.123, "jamie": 1.0},
}

I'm not sure of the best (most pythonic) way to do this. Here is my first attempt:

import csv
with open(myfile, "r") as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    first = True
    for line in reader:
        if first:
            header = list(line)
            myjson = {key: dict() for key in header[1:]}  # skip the id column
            first = False
            continue  # don't treat the header row as data
        row_id = line[0]  # avoid shadowing the builtin id()
        for i in range(1, len(header)):
            myjson[header[i]][row_id] = line[i]

I assume there is a better way to do this.

Edit: Should've specified this earlier but I do NOT want to use anything like Pandas. This needs to be super fast with minimal package dependencies.
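For reference, here is a stdlib-only sketch of the transposition I'm after (the inline sample data and the float() conversion are illustrative assumptions, standing in for the real file and whatever type handling is needed):

```python
import csv
import io

# Inline stand-in for myfile; assumes the first row is the header and the
# first field of each data row is the id.
data = """id,o1,o2,o3
jess,1.0,4,0.3
jill,0,5,0.123
jamie,-3,0.2,1.0
"""

reader = csv.reader(io.StringIO(data))
header = next(reader)                     # ['id', 'o1', 'o2', 'o3']
myjson = {col: {} for col in header[1:]}  # one inner dict per value column
for row in reader:
    row_id = row[0]
    for col, value in zip(header[1:], row[1:]):
        myjson[col][row_id] = float(value)  # float() is an assumption
```

With the sample above, `myjson["o1"]` comes out as `{"jess": 1.0, "jill": 0.0, "jamie": -3.0}`.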

  • Please find the answer here, hope it helps. Commented Mar 8, 2017 at 20:56
  • @Bhargav sorry, that's not the answer. That answer is taking each row and making a dict keyed on column names. I'm trying to take each column and make a dict keyed on the first entry of the row. Commented Mar 9, 2017 at 0:08

3 Answers


This may be "cheating", but it has always worked for me - and when it hasn't, it's nothing a little code couldn't fix. I use the Pandas module; it takes care of a lot of my data needs. I read the CSV into a DataFrame and then write the DataFrame out as JSON (or any other format):

import pandas as pd

df1 = pd.read_csv('YOUR_PATH_HERE')

df1.to_json('PATH_HERE')

It's super simple and easily customizable, though you may need to fiddle with a few options. Here is the documentation: read_csv, to_json, and this is always a good read: 10 Minutes to Pandas.
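To get the exact nested shape in the question, one sketch (with an in-memory sample standing in for 'YOUR_PATH_HERE'): set the id column as the index, and the default orientation already keys the outer dict on column names:

```python
import io

import pandas as pd

# Inline sample data; an assumption standing in for the real file path.
csv_text = "id,o1,o2,o3\njess,1.0,4,0.3\njill,0,5,0.123\njamie,-3,0.2,1.0\n"

# index_col='id' makes the ids the row index. to_dict() then defaults to
# {column -> {id -> value}}, the nested shape in the question;
# df.to_json() uses the same 'columns' orientation by default.
df = pd.read_csv(io.StringIO(csv_text), index_col="id")
nested = df.to_dict()
```

With this sample, `nested["o1"]` is `{'jess': 1.0, 'jill': 0.0, 'jamie': -3.0}`.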


3 Comments

I'm sorry, I should've specified no pandas. I'm trying not to rely on additional packages, nor do I want to take the time to build up a DataFrame only to abandon it. Totally the easiest way of doing this, though!
@Sal, Understood. However, for your point about 'building the dataframe to abandon it' - creating the dataframe takes a minute. Don't feel bad about abandoning your data-baby.
The design specs actually say "no pandas". :)

I definitely think the following answer is too long, but if you still need an answer, this works. I have created a test.csv with your data.

I don't know why you want to avoid using pandas, but anyway:

import csv
import itertools
import json


def read_with_header():
    with open('/Users/bhargavsaidama/test.csv', 'r', newline='') as f:
        reader = csv.reader(f, delimiter=',', quotechar='|')
        row_count = 0
        keys = []
        for row in reader:
            row_count += 1
            keys.append(row)
        header = keys[0]
        return row_count, header


def reading_ignore_header():

    row_count, header = read_with_header()

    with open('/Users/bhargavsaidama/test.csv', 'r', newline='') as f:
        next(f)  # skip the header row
        reader = csv.reader(f, delimiter=',', quotechar='|')
        result = []

        for row in reader:
            # row_count includes the header row, so row_count - 1 is the
            # number of data rows. The first (row_count - 1) combinations
            # pair the id (first field) with each value in the row.
            values = tuple(itertools.combinations(row, 2))[:(row_count - 1)]
            for x, y in values:
                result.append({x: y})
        return result, header

# The following function is taken from here
# http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]


def main():

    result, header = reading_ignore_header()
    rows = list(chunks(result, len(header) - 1))  # one chunk of pairs per data row
    # transpose so the dicts are grouped per column, then merge each group
    columns = [dict(pair for d in group for pair in d.items())
               for group in zip(*rows)]
    header = header[1:]  # seems like you want to ignore the id column
    data_str = json.dumps(dict(zip(header, columns)))
    data_json = json.loads(data_str)
    print(data_str, data_json)
    return data_str, data_json


if __name__ == "__main__":
    main()

Hope it helps; if you can optimize it, go ahead and do it. I will also learn. :)

Thanks

Comments


Here is a simpler solution; you will need pyexcel and pyexcel-text:

>>> import pyexcel as p
>>> sheet=p.get_sheet(file_name='test.csv')
>>> sheet
test.csv:
+---------+-----+-----+-------+
| id      | o1  | o2  | o3    |
+---------+-----+-----+-------+
| 'jess'  | 1.0 | 4   | 0.3   |
+---------+-----+-----+-------+
| 'jill'  | 0   | 5   | 0.123 |
+---------+-----+-----+-------+
| 'jamie' | 3   | 0.2 | 1.0   |
+---------+-----+-----+-------+
>>> sheet.transpose()
>>> sheet.name_columns_by_row(0)
>>> sheet.name_rows_by_column(0)
>>> sheet
test.csv:
+----+--------+--------+---------+
|    | 'jess' | 'jill' | 'jamie' |
+====+========+========+=========+
| o1 | 1.0    | 0      | 3       |
+----+--------+--------+---------+
| o2 | 4      | 5      | 0.2     |
+----+--------+--------+---------+
| o3 | 0.3    | 0.123  | 1.0     |
+----+--------+--------+---------+
>>> sheet.get_json(write_title=False) # pip install pyexcel-text
'{"o1": {"\'jamie\'": 3, "\'jess\'": 1.0, "\'jill\'": 0}, "o2": {"\'jamie\'": "0.2", "\'jess\'": 4, "\'jill\'": 5}, "o3": {"\'jamie\'": 1.0, "\'jess\'": "0.3", "\'jill\'": "0.123"}}'

Comments
