extremely slow to write sql query into csv file

Question

I have a sql query that containers approximately 500k rows and 47 columns and I want this query to be dumped into a csv file , so I can import afterwards the file into a table onto a new database hosted into another server. My code does not use any fancy library that would cause some overhead into the process , but nonetheless the writing takes around 15 minutes to complete. I believe there is something wrong with my code but can't define what would speed things up. The connection uses cx_oracle driver in python.

import config
from pathlib import WindowsPath
import csv

con = cx_Oracle.connect(f'{config.USER_ODS}/{config.PASS_ODS}@{config.HOST_ODS}:{config.PORT_ODS}/{config.SERVICENAME_ODS}')
sql = 'SELECT * FROM ods.v_hsbc_ucmdb_eim'
cur = con.cursor()


output = WindowsPath('result.csv')

with output.open('w',encoding="utf-8") as f:
    writer = csv.writer(f, lineterminator="\n")
    cur.execute(sql)
    col_names = [row[0] for row in cur.description]
    writer.writerow(col_names)
    for row in cur:
        writer.writerow(row)

for row in cur: is the slowest way you can do this. I would look at the other fetch methods in particular fetchmany(). See also Batch procesing in particular Loading CSV which you can reverse to export CSV. — Adrian Klaver
– Adrian Klaver, Commented Feb 9, 2023 at 0:36

J_H · Accepted Answer · 2023-02-09 01:24:19Z

1

You’re saying it takes approximately 1 ms per row to read from the database and write to the file system. Certainly there are databases and file systems that can process rows more quickly. Given a possibly busy multiuser Oracle DB, and finite network bandwidth, the throughput you report seems perfectly plausible. You didn't specify whether the 47 columns are mostly integers or BLOB binary large objects / text.

If you believe that higher throughput is feasible in your environment, show us timing results that isolate database performance versus available network bandwidth. RTT roundtrip times (ping) would be of interest, as well. A simple "CREATE TABLE eim_temp AS SELECT * FROM ods.v_hsbc_ucmdb_eim;" is a good way to focus on DB disk / CPU processing capacity, without network delays being part of the equation.

edited Feb 9, 2023 at 1:24

answered Feb 8, 2023 at 22:39

J_H

21.3k5 gold badges29 silver badges51 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

extremely slow to write sql query into csv file

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related