The following code works correctly, but far too slowly:
import gf
import csv

cic = gf.ct    # input file 1
cii = gf.cit   # input file 2
li = gf.lt     # input file 3
oc = "Output.csv"

with open(cic, "rb") as input1:
    reader = csv.DictReader(input1, gf.ctih)
    with open(oc, "wb") as outfile:
        writer = csv.DictWriter(outfile, gf.ctoh)
        writer.writerow(dict((h, h) for h in gf.ctoh))  # write the output header
        next(reader)  # skip input file 1's header line
        for ci in reader:
            row = {}
            row["ci"] = ci["id"]
            row["cyf"] = ci["yf"]
            # Re-open and re-scan files 2 and 3 for every single row of file 1
            with open(cii, "rb") as ciif:
                reader2 = csv.DictReader(ciif, gf.citih)
                next(reader2)
                with open(li, "rb") as lif:
                    reader3 = csv.DictReader(lif, gf.lih)
                    next(reader3)
                    for row2 in reader2:
                        if ci["id"] == row2["id"]:
                            row["ci"] = row2["ca"]
                    for row3 in reader3:
                        if ci["id"] == row3["en_id"]:
                            row["cc"] = row3["c"]
            writer.writerow(row)
The reason I re-open reader2 and reader3 for every row in reader is that reader objects can only be iterated through once and are then exhausted. But there has to be a much more efficient way of doing this, and I would greatly appreciate any help you can provide!
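For concreteness, here is a minimal sketch of the kind of one-pass caching I suspect I should be doing instead (example.csv and its id column are hypothetical stand-ins for my real files):

import csv

# Read the file exactly once, indexing every row by its primary key.
with open("example.csv", "rb") as f:
    rows_by_id = dict((row["id"], row) for row in csv.DictReader(f))

# Repeated lookups are now O(1) and never touch the disk again.
print rows_by_id.get("42")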
If it helps, the intuition behind this code is the following: from input file 1, grab two cells; if input file 2 has the same primary key as input file 1, grab a cell from input file 2 and save it with the two cells already saved; if input file 3 has the same primary key as input file 1, grab a cell from input file 3 and save it too. Then output these four values. That is, I'm grabbing metadata from normalized tables and trying to denormalize it. There must be a way of doing this efficiently in Python. One problem with the current code is that I iterate through an entire reader object until I find the relevant ID, when there must be a simpler way of searching for a given ID in a reader object.
If you read files 2 and 3 into memory first (each keyed on its ID column in a dict), you should be able to get fast lookup of your index. Right now you're repeating work by reading in file 2 and file 3 on every loop.
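A minimal sketch of that dict-based approach, reusing the gf names and column names from the question (this assumes each ID appears at most once per lookup file and that both lookup files fit in memory):

import gf
import csv

# Build the in-memory indexes once: primary key -> the cell we want.
with open(gf.cit, "rb") as ciif:
    reader2 = csv.DictReader(ciif, gf.citih)
    next(reader2)  # skip the header line
    ca_by_id = dict((r["id"], r["ca"]) for r in reader2)

with open(gf.lt, "rb") as lif:
    reader3 = csv.DictReader(lif, gf.lih)
    next(reader3)  # skip the header line
    c_by_id = dict((r["en_id"], r["c"]) for r in reader3)

# Single pass over file 1, with O(1) lookups into the two indexes.
with open(gf.ct, "rb") as input1, open("Output.csv", "wb") as outfile:
    reader = csv.DictReader(input1, gf.ctih)
    writer = csv.DictWriter(outfile, gf.ctoh)
    writer.writerow(dict((h, h) for h in gf.ctoh))
    next(reader)  # skip the header line
    for ci in reader:
        row = {
            # Fall back to the raw id when file 2 has no match,
            # mirroring the original code's behavior.
            "ci": ca_by_id.get(ci["id"], ci["id"]),
            "cyf": ci["yf"],
            # A missing match yields None, which csv writes as an
            # empty cell, again matching the original output.
            "cc": c_by_id.get(ci["id"]),
        }
        writer.writerow(row)

This reads each input file exactly once, so the total work is proportional to len(file1) + len(file2) + len(file3), instead of the original len(file1) × (len(file2) + len(file3)).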