In my pandas program I am reading a CSV file and converting some of its columns to JSON.
For example, my CSV looks like this:
id_4  col1  col2  ...  col100
1     43    56    ...  67
2     46    67    ...  78
What I want to achieve is:
id_4  json
1     {"col1":43,"col2":56,...,"col100":67}
2     {"col1":46,"col2":67,...,"col100":78}
The code I have tried is as follows:
import csv
import json
import pandas as pd

df = pd.read_csv('file.csv')

headers = ['id_4', 'json']
result = []
cols = ['col1', 'col2', ..., 'col100']  # full list of 100 column names, elided here

def func(group):
    # one dict per row, e.g. {"col1": "43", "col2": "56", ..., "col100": "67"}
    d = [dict(zip(cols, row)) for row in zip(*(group[c].astype(str) for c in cols))]
    format_data = json.dumps(d)
    # strip the surrounding [ ] and wrap the row objects in {"key": ...}
    format_data = format_data[1:len(format_data) - 1]
    json_data = '{"key":' + format_data + '}'
    result.append(pd.Series([group['id_4'].unique()[0], json_data], index=headers))
    return group

df.groupby('id_4').apply(func)
with open('output.csv', 'w') as b:
    writer = csv.writer(b)
    writer.writerow(headers)
    writer.writerows(result[1:])
The CSV contains some 100,000 records (the file is about 15 MB). When I execute this, the process is killed after running for a long time; I think it is a memory issue.
As I am a newbie to Python and pandas, is there any way to optimize the above code so that it works, or is increasing the memory the only way?
I am using a Linux system with 5 GB of RAM.
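One way to sidestep the memory problem is to stream the output while reading the CSV in chunks, so neither the full frame nor a result list has to live in memory at once. A rough sketch (the chunk size and file names are assumptions):

import csv
import pandas as pd

cols = ['col' + str(n) for n in range(1, 101)]

with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['id_4', 'json'])
    # read the input in pieces so the full file never sits in memory at once
    for chunk in pd.read_csv('file.csv', chunksize=10000):
        for _, row in chunk.iterrows():
            writer.writerow((row['id_4'], row[cols].to_json()))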
EDIT:
import csv, itertools, json
import pandas as pd

# id_4 as the index, so to_dict(orient='records') holds only col1..col100
df = pd.read_csv('Vill_inter.csv', index_col='id_4')
with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    for id_4, row in itertools.izip(df.index.values, df.to_dict(orient='records')):
        writer.writerow((id_4, json.dumps(row)))
['col1', 'col2', ..., 'col100'] is equivalent to ['col' + str(n) for n in range(1, 101)]. It can also help clean up your code.
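For example (assuming the columns really are named col1 through col100), the comprehension can stand in for the long literal wherever the columns are selected:

cols = ['col' + str(n) for n in range(1, 101)]  # ['col1', 'col2', ..., 'col100']
subset = df[cols]                               # same as df[['col1', 'col2', ..., 'col100']]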