
I have a SQL query that returns approximately 500k rows and 47 columns, and I want to dump its result into a CSV file so that I can afterwards import the file into a table in a new database hosted on another server. My code does not use any fancy library that would add overhead to the process, but nonetheless the writing takes around 15 minutes to complete. I believe there is something wrong with my code, but I can't work out what would speed things up. The connection uses the cx_Oracle driver in Python.

import config
import cx_Oracle
from pathlib import WindowsPath
import csv

con = cx_Oracle.connect(f'{config.USER_ODS}/{config.PASS_ODS}@{config.HOST_ODS}:{config.PORT_ODS}/{config.SERVICENAME_ODS}')
sql = 'SELECT * FROM ods.v_hsbc_ucmdb_eim'
cur = con.cursor()


output = WindowsPath('result.csv')

with output.open('w',encoding="utf-8") as f:
    writer = csv.writer(f, lineterminator="\n")
    cur.execute(sql)
    col_names = [row[0] for row in cur.description]
    writer.writerow(col_names)
    for row in cur:
        writer.writerow(row)

Comment (Feb 9, 2023 at 0:36): for row in cur: is the slowest way you can do this. I would look at the other fetch methods, in particular fetchmany(). See also "Batch processing" in the driver documentation, in particular "Loading CSV", which you can reverse to export CSV.
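
The fetchmany() suggestion in the comment could be sketched roughly as follows. This is a minimal sketch, not a measured fix: the function name dump_query_to_csv and the batch size of 5000 are assumptions made for illustration, and whether it is faster than plain cursor iteration depends on the driver version, the network, and the row width. It sets the cursor's arraysize attribute (a standard DB-API attribute that cx_Oracle also exposes) before executing, then writes rows one batch at a time.

```python
import csv


def dump_query_to_csv(cur, sql, f, batch_size=5000):
    """Run sql on a DB-API cursor and write the result to file object f.

    dump_query_to_csv and batch_size=5000 are hypothetical choices for
    this sketch; tune the batch size against your own timings.
    """
    # Ask the driver to fetch this many rows per internal round trip.
    # Set before execute() so it applies to this query.
    cur.arraysize = batch_size
    cur.execute(sql)

    writer = csv.writer(f, lineterminator="\n")
    # Header row: first element of each description entry is the column name.
    writer.writerow([col[0] for col in cur.description])

    total = 0
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        # writerows() handles the whole batch in one call.
        writer.writerows(rows)
        total += len(rows)
    return total
```

With the code from the question, this would be called as dump_query_to_csv(cur, sql, f) inside the with output.open(...) block, replacing the execute call and the row loop.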

1 Answer

You’re saying it takes roughly 2 ms per row (about 900 seconds for 500k rows) to read from the database and write to the file system. Certainly there are databases and file systems that can process rows more quickly. Given a possibly busy multiuser Oracle DB and finite network bandwidth, the throughput you report seems perfectly plausible. You didn't specify whether the 47 columns are mostly integers or BLOBs (binary large objects) and long text.

If you believe that higher throughput is feasible in your environment, show us timing results that isolate database performance from available network bandwidth. Round-trip times (RTT, as reported by ping) would be of interest as well. A simple "CREATE TABLE eim_temp AS SELECT * FROM ods.v_hsbc_ucmdb_eim;" is a good way to focus on DB disk / CPU processing capacity, without network delays being part of the equation.
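
One rough way to produce such timing results from the Python side is to time the fetch calls separately from the CSV writes. The sketch below is only an illustration under assumptions: the name time_fetch_and_write and the default batch size are made up here, and the "fetch" figure lumps together database work and network transfer, so it does not replace the server-side CREATE TABLE test or a ping measurement.

```python
import csv
import time


def time_fetch_and_write(cur, sql, f, batch_size=5000):
    """Return (rows, fetch_seconds, write_seconds) for one export run.

    Hypothetical helper for this sketch. fetch_seconds covers execute()
    plus all fetchmany() calls (database + network); write_seconds covers
    only the csv writes to f.
    """
    cur.arraysize = batch_size
    fetch_s = 0.0
    write_s = 0.0

    t0 = time.perf_counter()
    cur.execute(sql)
    fetch_s += time.perf_counter() - t0

    writer = csv.writer(f, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])

    rows_seen = 0
    while True:
        t0 = time.perf_counter()
        rows = cur.fetchmany(batch_size)
        fetch_s += time.perf_counter() - t0
        if not rows:
            break

        t0 = time.perf_counter()
        writer.writerows(rows)
        write_s += time.perf_counter() - t0
        rows_seen += len(rows)

    return rows_seen, fetch_s, write_s
```

If fetch_seconds accounts for nearly all of the 15 minutes, the CSV writing code is not the bottleneck and the question becomes one of database load, row size, or network throughput.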
