What is the fastest way to process each line of a CSV and write the result to a new CSV? Is there a way to use the least memory while also being the fastest? Please see the following code. It requests a CSV from an API, but it takes very long to get through the for loop I commented, and I think it is using all the memory on my server.
from pandas import *
import csv
import requests

reportResult = requests.get(api, headers=header)
csvReader = csv.reader(utf_8_encoder(reportResult.text))
reportData = []
# for loop takes a long time
for row in csvReader:
    combinedDict = dict(zip(fields, row))
    combinedDict = cleanDict(combinedDict)
    reportData.append(combinedDict)
reportDF = DataFrame(reportData, columns=fields)
reportDF.to_csv('report.csv', sep=',', header=False, index=False)
def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')

def cleanDict(combinedDict):
    if combinedDict.get('a_id', None) is not None:
        combinedDict['a_id'] = int(float(combinedDict['a_id']))
        combinedDict['unique_a_id'] = '1_a_' + str(combinedDict['a_id'])
    if combinedDict.get('i_id', None) is not None:
        combinedDict['i_id'] = int(float(combinedDict['i_id']))
        combinedDict['unique_i_id'] = '1_i_' + str(combinedDict['i_id'])
    if combinedDict.get('pm', None) is not None:
        combinedDict['pm'] = "{0:.10f}".format(float(combinedDict['pm']))
    if combinedDict.get('s', None) is not None:
        combinedDict['s'] = "{0:.10f}".format(float(combinedDict['s']))
    return combinedDict
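For reference, here is roughly the streaming version I have in mind: clean and write each row as it arrives instead of accumulating reportData and building a DataFrame. This is only a sketch, assuming Python 3 (so csv.reader can consume text directly and utf_8_encoder is unnecessary) and an API that serves one CSV record per line:

# Sketch of the streaming idea: each row is cleaned and written out
# immediately, so no list or DataFrame is ever held in memory.
import csv
import requests

with requests.get(api, headers=header, stream=True) as reportResult, \
        open('report.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    # iter_lines yields the body line by line instead of materializing
    # the whole response as one huge string via .text
    rows = csv.reader(reportResult.iter_lines(decode_unicode=True))
    for row in rows:
        combinedDict = cleanDict(dict(zip(fields, row)))
        writer.writerow(combinedDict.get(f, '') for f in fields)

Is this the right direction, or is there something faster (e.g. letting pandas parse the whole response at once)?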
When I run the Python memory profiler, why does the line with the for loop show a memory increment? Is the for loop itself keeping something in memory, or is my utf-8 converter messing something up?
Line #    Mem usage    Increment   Line Contents
================================================
   162  1869.254 MiB 1205.824 MiB      for row in csvReader:
   163                                     #print row
   164  1869.254 MiB    0.000 MiB         combinedDict = dict(zip(fields, row))
When I put the @profile decorator on the utf_8_encoder function as well, I see the memory increment on the above for loop disappear:

   163                                     for row in csvReader:
But now there is a memory increment on the converter's for loop (I didn't let it run as long as last time, so it only got to 56 MB before I hit Ctrl+C):
Line #    Mem usage    Increment   Line Contents
================================================
   154   663.430 MiB    0.000 MiB  @profile
   155                             def utf_8_encoder(unicode_csv_data):
   156   722.496 MiB   59.066 MiB      for line in unicode_csv_data:
   157   722.496 MiB    0.000 MiB          yield line.encode('utf-8')
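From what I understand of memory_profiler, it records memory usage around each profiled line, so an allocation that happens inside a generator gets charged to whichever profiled line is executing when the generator advances (the caller's for line), unless the generator is decorated as well. Here is a tiny made-up repro of that suspicion (leaky_lines and consume are hypothetical names, not my real code):

# Hypothetical repro (made-up names): memory that grows inside an
# undecorated generator is attributed to the caller's `for` line,
# because that is the profiled line executing when next() runs.
from memory_profiler import profile

def leaky_lines(n):
    kept = []                          # the generator itself retains memory
    for i in range(n):
        s = 'x' * 1000
        kept.append(s)                 # growth happens during next()
        yield s

@profile
def consume():
    count = 0
    for line in leaky_lines(200000):   # increment is reported on this line
        count += 1
    return count

if __name__ == '__main__':
    consume()

Running this with python -m memory_profiler script.py should show the increment on the for line of consume, and decorating leaky_lines with @profile too should move it inside the generator, which matches what I saw with utf_8_encoder. Is that the correct explanation?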