In my pandas program I am reading a CSV file and converting some of its columns to JSON.
For example, my CSV looks like this:
id_4  col1  col2  ...  col100
1     43    56    ...  67
2     46    67    ...  78
What I want to achieve is:
id_4  json
1     {"col1":43,"col2":56,...,"col100":67}
2     {"col1":46,"col2":67,...,"col100":78}
The code I have tried is as follows:
import csv
import json
import pandas as pd

df = pd.read_csv('file.csv')

headers = ['id_4', 'json']
result = []
cols = ['col1', 'col2', ..., 'col100']  # full list of 100 column names, elided here

def func(group):
    # one dict per row, e.g. {"col1": "43", "col2": "56", ..., "col100": "67"}
    d = [dict(zip(cols, row)) for row in zip(*(group[c].astype(str) for c in cols))]
    format_data = json.dumps(d)
    # strip the surrounding [ ] and wrap the row objects in {"key": ...}
    format_data = format_data[1:len(format_data) - 1]
    json_data = '{"key":' + format_data + '}'
    result.append(pd.Series([group['id_4'].unique()[0], json_data], index=headers))
    return group

df.groupby('id_4').apply(func)
with open('output.csv', 'w') as b:
    writer = csv.writer(b)
    writer.writerow(headers)
    writer.writerows(result[1:])
The CSV contains some 100,000 records (the file is about 15 MB). When I execute this, the process is killed after running for a long time; I think it is a memory issue.
As I am a newbie to Python and pandas, is there any way to optimize the above code so that it works, or is increasing the memory the only way?
I am using a Linux system with 5 GB of RAM.
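One way to sidestep the memory problem is to stream the output while reading the CSV in chunks, so neither the full frame nor a result list has to live in memory at once. A rough sketch (the chunk size and file names are assumptions):

import csv
import pandas as pd

cols = ['col' + str(n) for n in range(1, 101)]

with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['id_4', 'json'])
    # read the input in pieces so the full file never sits in memory at once
    for chunk in pd.read_csv('file.csv', chunksize=10000):
        for _, row in chunk.iterrows():
            writer.writerow((row['id_4'], row[cols].to_json()))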
EDIT:
import csv, itertools, json
import pandas as pd

# id_4 as the index, so to_dict(orient='records') holds only col1..col100
df = pd.read_csv('Vill_inter.csv', index_col='id_4')
with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    for id_4, row in itertools.izip(df.index.values, df.to_dict(orient='records')):
        writer.writerow((id_4, json.dumps(row)))
['col1', 'col2', ..., 'col100'] is equivalent to ['col' + str(n) for n in range(1, 101)]. It can also help clean up your code.
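For example (assuming the columns really are named col1 through col100), the comprehension can stand in for the long literal wherever the columns are selected:

cols = ['col' + str(n) for n in range(1, 101)]  # ['col1', 'col2', ..., 'col100']
subset = df[cols]                               # same as df[['col1', 'col2', ..., 'col100']]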