I need to import 30k rows of data from a CSV file into a Vertica database. The code I've tried is taking more than an hour, and I'm wondering if there's a faster way. I've tried importing with the csv module and also by looping through a DataFrame and inserting row by row, but both are far too slow. Could you please help me?
rownum = df.shape[0]
for x in range(0, rownum):
 a = df['AccountName'].values[x]
 b = df['ID'].values[x]
 ss = "INSERT INTO Table (AccountName,ID) VALUES (%s,%s)"
 val = (a, b)
 cur.execute(ss, val)
connection.commit()
Build `val` as a list of per-row tuples, e.g. `val = list(zip(df['AccountName'], df['ID']))`, and then use `cur.executemany(ss, val)` without the `for` loop. Note that `executemany` expects one parameter tuple per row, not a tuple of whole-column lists. Sending the batch in a single call should be much faster, though there may be further improvements. Also, 1 space of indentation makes this code difficult to read; are you sure `connection.commit()` is definitely not inside the `for` loop? It only takes one space to make that mistake. I suggest you follow PEP 8 and use 4 spaces.
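A minimal sketch of the batched version. The DataFrame here is a small stand-in built inline, and the actual `cur.executemany` / `connection.commit()` calls are shown commented out because they need a live Vertica connection (e.g. via the `vertica_python` package); the table and column names are taken from the question.

```python
import pandas as pd

# Stand-in for the real DataFrame loaded from the CSV.
df = pd.DataFrame({
    "AccountName": ["acme", "globex", "initech"],
    "ID": [1, 2, 3],
})

# executemany expects a sequence of per-row parameter tuples,
# so pair the two columns up rather than passing two column lists.
rows = list(zip(df["AccountName"], df["ID"]))

ss = "INSERT INTO Table (AccountName,ID) VALUES (%s,%s)"

# With a real connection (assumed names, not from the question):
# import vertica_python
# with vertica_python.connect(host=..., user=..., password=..., database=...) as connection:
#     cur = connection.cursor()
#     cur.executemany(ss, rows)   # one batched call instead of 30k round trips
#     connection.commit()         # commit once, after the whole batch

print(len(rows))
```

Batching this way removes the per-row statement overhead; committing once at the end (rather than per row) matters just as much, which is why `connection.commit()` belongs outside any loop.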