I have a Python script that reads a large (4 GB) CSV file into MySQL. It works as is, but it is extremely slow: the CSV file has over 4 million rows, and inserting all the records into the database takes forever.

Could I get an example of how I would use executemany in this situation?

Here is my code:

import csv
import os
from datetime import datetime

# cursor and mydb (the MySQL connection) are created earlier in the script.
source = os.path.join('source_files', 'aws_bills', 'march-bill-original-2019.csv')
try:
    with open(source) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        next(csv_reader)  # skip the header row
        insert_sql = """ INSERT INTO billing_info (InvoiceId, PayerAccountId, LinkedAccountId, RecordType, RecordId, ProductName, RateId, SubscriptionId, PricingPlanId, UsageType, Operation, AvailabilityZone, ReservedInstance, ItemDescription, UsageStartDate, UsageEndDate, UsageQuantity, BlendedRate, BlendedCost, UnBlendedRate, UnBlendedCost, ResourceId, Engagement, Name, Owner, Parent) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) """
        #for row in csv_reader:
        for row_idx, row in enumerate(csv_reader):
            try:
                # One execute and one commit per row: this is the slow part.
                cursor.execute(insert_sql, row)
                #cursor.executemany(insert_sql, 100)
                mydb.commit()
                print('row', row_idx, 'inserted with LinkedAccountId', row[2], 'at', datetime.now().isoformat())
            except Exception as e:
                print("MySQL Exception:", e)
        print("Done importing data.")
except OSError as e:
    # The outer try needs a handler to be valid Python; report file errors here.
    print("Could not read CSV file:", e)

Again, that code works to insert the records into the database. But I am hoping to speed this up with executemany if I can get an example of how to do that.

1 Answer

I see the question is a little old, so I don't know whether you still need this. I did something similar recently. First I converted the CSV into a list of rows, because executemany expects a sequence of parameter rows, and then I ran the request, passing the insert statement together with that list. In your case it would look like this:

import pandas as pd

# Load the whole CSV into memory; the header row becomes the column names.
df = pd.read_csv(r'path_your_csv')
# Convert every value to a string (note: empty cells become the string 'nan').
df = df.astype(str)
# executemany expects a list of rows, each row being a list of values.
List_Values = df.values.tolist()
insert_sql = """ INSERT INTO billing_info (InvoiceId, PayerAccountId, LinkedAccountId, RecordType, RecordId, ProductName, RateId, SubscriptionId, PricingPlanId, UsageType, Operation, AvailabilityZone, ReservedInstance, ItemDescription, UsageStartDate, UsageEndDate, UsageQuantity, BlendedRate, BlendedCost, UnBlendedRate, UnBlendedCost, ResourceId, Engagement, Name, Owner, Parent) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) """

cursor.executemany(insert_sql, List_Values)
# Commit once at the end (cursor and mydb as in the question; needed unless autocommit is on).
mydb.commit()
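
If holding the whole 4 GB file in memory at once is a concern, a batched variant may be worth trying. The following is only a rough sketch, not something I have run against your table: the helper names iter_batches and insert_csv_in_batches and the default batch size of 1000 are my own assumptions, and the code relies only on the standard DB-API methods cursor(), executemany() and commit(). It reads the file with csv.reader as in the question and sends one executemany call and one commit per batch instead of per row.

```python
import csv
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def insert_csv_in_batches(connection, insert_sql, csv_path, batch_size=1000):
    """Insert a CSV file using one executemany and one commit per batch.

    connection is any DB-API connection (hypothetical helper, not part of
    any library). Returns the number of data rows sent to the database.
    """
    cursor = connection.cursor()
    total = 0
    with open(csv_path, newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        next(csv_reader, None)  # skip the header row, if there is one
        for batch in iter_batches(csv_reader, batch_size):
            cursor.executemany(insert_sql, batch)
            connection.commit()
            total += len(batch)
    return total
```

With the names from the question, the call would be something like insert_csv_in_batches(mydb, insert_sql, source). The batch size is a tuning knob; larger batches mean fewer round trips but more memory per call.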