
In my pandas program I am reading a CSV and converting some columns to JSON.

For example, my CSV looks like this:

id_4 col1  col2 .....................................col100
1     43    56  .....................................67
2     46    67   ....................................78

What I want to achieve is:

id_4 json

1  {"col1":43,"col2":56,.....................,"col100":67}
2  {"col1":46,"col2":67,.....................,"col100":78}

The code I have tried is as follows:

    df = pd.read_csv('file.csv')

    def func(df):
        d = [
            dict([
                (colname, row[i])
                for i, colname in enumerate(df[['col1', 'col2', ............, 'col100']])
            ])
            for row in zip(df['col1'].astype(str), df['col2'].astype(str), ..............., df['col100'].astype(str))
        ]
        format_data = json.dumps(d)
        format_data = format_data[1:len(format_data) - 1]
        json_data = '{"key":' + format_data + '}'
        result.append(pd.Series([df['id_4'].unique()[0], json_data], index=headers))
        return df

    df.groupby('id_4').apply(func)

b = open('output.csv', 'w')
writer = csv.writer(b)
writer.writerow(headers)
writer.writerows(result[1:len(result)])

The CSV contains some 100,000 rows (the file itself is about 15 MB). When I execute this, the process is killed automatically after a long time. I think it's a memory issue.

As I am a newbie to Python and pandas, is there any way to optimize the above code to work properly, or is increasing the memory the only way?

I am using a Linux system with 5 GB of RAM.

EDIT:

import csv
import json
import itertools

import pandas as pd

df = pd.read_csv('Vill_inter.csv')
with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    for id_4, row in itertools.izip(df.index.values, df.to_dict(orient='records')):
        writer.writerow((id_4, json.dumps(row)))
  • ['col1','col2',............,'col100'] is equivalent to ['col'+str(n) for n in range(1, 101)] Commented Aug 4, 2015 at 12:11
  • no error, just letting you know an alternative way to write it to express the same thing more concisely in real python without any "shorthand" notation such as ........... It can also help clean up your code. Commented Aug 4, 2015 at 12:18
  • As far as errors go, your code sample has mismatched parens and braces. Commented Aug 4, 2015 at 12:19
  • @Francis Usher I just pasted a snippet of the code; I didn't notice the parens and braces. Commented Aug 4, 2015 at 12:21
  • @Francis Usher Is there any solution for this, or any alternate way to achieve the result? Commented Aug 4, 2015 at 12:22
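The shorthand equivalence mentioned in the first comment can be checked directly; a quick sketch:

```python
# Build the 100 column names programmatically instead of typing
# 'col1', 'col2', ..., 'col100' out by hand.
cols = ['col' + str(n) for n in range(1, 101)]

print(len(cols))          # 100
print(cols[0], cols[-1])  # col1 col100
```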

1 Answer


A pandas DataFrame can be serialized directly to JSON with the to_json method.

Your output format is not entirely clear, but have a look at this:

import numpy as np
import pandas as pd

# Generate a sample dataframe
df = pd.DataFrame(np.random.randn(5, 100), columns=['col' + str(n) for n in xrange(1, 101)])
# Create the id_4 column from the index
df.index += 1
df.index.name = 'id_4'
# Reindex df to have the column id_4 in the output; remove this step if you
# only want col1 to col100
df.reset_index(drop=False, inplace=True)

# Dump data to disk, or to a buffer
path = 'out.json'
df.to_json(path, orient='records')

This will be much faster than your loops and will probably solve your memory error.
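As a small illustration of what orient='records' produces (with three columns standing in for the real hundred, and integer data so the output is predictable):

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real 100-column frame.
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['col1', 'col2', 'col3'])
df.index += 1
df.index.name = 'id_4'
df.reset_index(drop=False, inplace=True)

# orient='records' emits one JSON object per row.
out = df.to_json(orient='records')
print(out)
# [{"id_4":1,"col1":0,"col2":1,"col3":2},{"id_4":2,"col1":3,"col2":4,"col3":5}]
```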

EDIT:

Apparently the output should be a custom file format. In this case you can get the rows out of the dataframe with to_dict(orient='records'). The result is a list where each element is a dictionary representing one row. You can then serialize each dictionary with the dumps function of the built-in json module.

Something like this:

import csv
import json
import itertools

with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    for id, row in itertools.izip(df.index.values, df.to_dict(orient='records')):
        writer.writerow((id, json.dumps(row)))
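The snippet above is Python 2 (itertools.izip, like xrange, is gone in Python 3, where the built-in zip is already lazy). A Python 3 sketch of the same idea, using a small hypothetical frame in place of the real CSV and writing the id_4/json layout asked for in the question:

```python
import csv
import json

import pandas as pd

# Hypothetical stand-in for the real dataframe read from the CSV.
df = pd.DataFrame({'id_4': [1, 2], 'col1': [43, 46], 'col2': [56, 67]})

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id_4', 'json'])
    # to_dict(orient='records') yields one plain dict per row; pop the id
    # so only the data columns end up inside the JSON string.
    for row in df.to_dict(orient='records'):
        id_4 = row.pop('id_4')
        writer.writerow((id_4, json.dumps(row)))
```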

3 Comments

Also, is it possible to add a key to that JSON?
It depends whether you indexed your dataframe with the column id_4 or not. If you skip the line df.reset_index(..), it works.
Please do not change your requirements; try to understand the code, or ask another question. My answer should be accepted.
