
I'm looking to export the result set of a SQL query to CSV using Python. The code below works for a small query, but when trying a larger query (sample below) it gives the following error:

TypeError: 'NoneType' object is not iterable

SQL query - example code (very close to the actual code, but with sensitive info removed):

DECLARE @Chosen_Month DATE
SET @Chosen_Month = '2021-01-01';

IF OBJECT_ID('tempdb..#Base_Data') IS NOT NULL
DROP TABLE #Base_Data;

SELECT
     a.region
    ,a.customer_name
    ,SUM(b.transactions) AS transactions
    ,SUM(b.turnover) AS turnover
    ,SUM(b.revenue) AS revenue
INTO
    #Base_Data
FROM
    customer_table AS a
    INNER JOIN transaction_table AS b ON a.company_id = b.company_id
WHERE
    b.trans_date = @Chosen_Month
GROUP BY
    a.region
    ,a.customer_name

IF OBJECT_ID('tempdb..#Ranked_Data') IS NOT NULL
DROP TABLE #Ranked_Data;

SELECT
    *
    ,ROW_NUMBER() OVER(ORDER BY transactions DESC) AS trans_rank
    ,ROW_NUMBER() OVER(ORDER BY turnover DESC) AS turnover_rank
    ,ROW_NUMBER() OVER(ORDER BY revenue DESC) AS revenue_rank
INTO
    #Ranked_Data
FROM
    #Base_Data
    
SELECT
    *
FROM
    #Ranked_Data
WHERE
    revenue_rank <= 50
ORDER BY 
    revenue_rank ASC

I tried splitting the SQL query into multiple executes to avoid the statements that return no results, but couldn't get it to a working stage. How can I account for large queries that have things like scalar variables running throughout? I'm fairly new to Python and would appreciate any help on this! Python code below:

import pyodbc
import csv

new_file_path = r'S:\Andy\Python\testdump.csv'
query_path = r'S:\Andy\Python\testquery.sql'

def read(conn):
    cursor = conn.cursor()
    with open(query_path, 'r') as sql_query_file:
        raw_data = cursor.execute(sql_query_file.read())
        
    with open(new_file_path, 'w', newline='') as csv_file:
        csv_out = csv.writer(csv_file)
        csv_out.writerow([i[0] for i in raw_data.description])
        for row in raw_data:
            csv_out.writerow(row)
            print("Finished export")
            
conn = pyodbc.connect(
    "Driver={Driver_name_here};"
    "Server=server_name_here;"
    "Database=database_name_here;"
    "Trusted_Connection=yes;"
)

read(conn)
conn.close()
  • It'll be very hard for anyone to help you with this question without a reference to the query you're executing. The fact you mention the query "includes semicolons" suggests cursor.execute isn't returning any results for at least one of those statements (which is why the for loop fails). Can you provide a minimal reproducible example? Commented Mar 16, 2021 at 21:00
  • @crcvd I've added a close example to the SQL query that I was using, hope this helps Commented Mar 16, 2021 at 21:21
  • Also SQL Server has common table expressions, no need to swap something into temp tables manually. Let the optimizer decide when to swap the data. Commented Mar 16, 2021 at 21:54
  • Per PEP 249, cursor.execute does not return anything: "Return values are not defined." So it is not iterable. But the pyodbc docs describe it as returning the cursor itself, which can be confusing. You need fetchmany calls to get the data in chunks and write those chunks to the file. Commented Mar 16, 2021 at 22:20
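
The chunked fetchmany approach described in that last comment can be sketched as follows. This is a minimal demo using Python's built-in sqlite3 purely as a stand-in for pyodbc; the same loop works with any DB-API cursor:

```python
import csv
import sqlite3

def cursor_to_csv(cursor, csv_path, chunk_size=10000):
    # Stream rows to disk in chunks instead of iterating the cursor directly
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cursor.description])  # header row
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:            # fetchmany returns [] when exhausted
                break
            writer.writerows(rows)

# Demo with an in-memory sqlite3 database
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE t (region TEXT, revenue INT)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [("North", 100), ("South", 90)])
cur.execute("SELECT region, revenue FROM t ORDER BY revenue DESC")
cursor_to_csv(cur, 'out.csv', chunk_size=1)
```

Fetching in chunks keeps memory flat regardless of result-set size, which matters for the "larger query" case in the question.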

1 Answer


Consider pure SQL with CTEs and a parameterized date, avoiding temp tables entirely. Regarding the large amount of data, you might be experiencing timeout issues. See the pyodbc.connect docs on use of the timeout argument.

SQL

WITH Base_Data AS (
    SELECT
          a.region
        , a.customer_name
        , SUM(b.transactions) AS transactions
        , SUM(b.turnover) AS turnover
        , SUM(b.revenue) AS revenue
    FROM
        customer_table AS a
        INNER JOIN transaction_table AS b ON a.company_id = b.company_id
    WHERE
        b.trans_date = ?        -- PARAM PLACEHOLDER
    GROUP BY
          a.region
        , a.customer_name
), Ranked_Data AS (
    SELECT
        *
        , ROW_NUMBER() OVER(ORDER BY transactions DESC) AS trans_rank
        , ROW_NUMBER() OVER(ORDER BY turnover DESC) AS turnover_rank
        , ROW_NUMBER() OVER(ORDER BY revenue DESC) AS revenue_rank
    FROM
        Base_Data
)
    
SELECT
    *
FROM
    Ranked_Data
WHERE
    revenue_rank <= 50
ORDER BY 
    revenue_rank ASC

Python

def sql_to_csv(conn):
    # COMBINE FILE CONTEXT MANAGERS
    with open(query_path, 'r') as sql_query_file, \
         open(new_file_path, 'w', newline='') as csv_file:
       
        # BIND PARAM TO QUERY
        raw_data = cursor.execute(sql_query_file.read(), ['2020-01-01'])
        
        csv_out = csv.writer(csv_file)
        csv_out.writerow([i[0] for i in raw_data.description])

        for row in raw_data:
            csv_out.writerow(row)
        print("Finished export")      # DE-INDENTED STATUS PRINT, OUTSIDE LOOP

conn = pyodbc.connect(
    "Driver={Driver_name_here};"
    "Server=server_name_here;"
    "Database=database_name_here;"
    "Trusted_Connection=yes;",
    timeout = 3                       # ADJUST ACCORDINGLY
)
cursor = conn.cursor()

try:                                  # EXCEPTION HANDLING TO ALWAYS CLOSE DB OBJECTS
    sql_to_csv(conn)
except Exception as e:
    print(e)
finally:
    cursor.close()
    conn.close()
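
If you'd rather keep the original multi-statement script (DECLARE, temp tables, several SELECTs), an alternative is to advance past the statements that produce no result set. A sketch, assuming pyodbc: prepending SET NOCOUNT ON suppresses the row-count messages that T-SQL statements otherwise emit as extra result sets, and cursor.nextset() skips forward until cursor.description is populated.

```python
import csv

def multi_statement_to_csv(conn, query_path, csv_path):
    # Run a multi-statement T-SQL script and export its final result set.
    # SET NOCOUNT ON stops INSERT / SELECT INTO row counts from being
    # surfaced as empty result sets.
    cursor = conn.cursor()
    with open(query_path, 'r') as f:
        cursor.execute("SET NOCOUNT ON;\n" + f.read())

    # Advance until a statement that actually produced columns
    while cursor.description is None:
        if not cursor.nextset():
            raise RuntimeError("Script produced no result set")

    with open(csv_path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor.fetchall())
```

This is the direct fix for the 'NoneType' object is not iterable error: cursor.description is None whenever the current statement returned no rows, so the header comprehension blows up unless you skip ahead first.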

3 Comments

Thanks for this, the goal would be ultimately to have a template that acts as a process for running existing queries with multiple parameters. So changing the query_path and new_file_path variables rather than multiple different SQL variables in the code for each query would be preferable, is that possible?
Hmmm...are you asking a new question? Let's stay on topic with the current specific issue. A more illustrative example would be needed for your later question (too much for comments, so post a new question).
IIUC - why not pass query_path and new_file_path variables into sql_to_csv() method: sql_to_csv(query_path, new_file_path)?
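
That last suggestion can be sketched as a reusable template (assuming DB-API ? placeholders as pyodbc uses; sqlite3 appears below only as a runnable stand-in):

```python
import csv

def sql_to_csv(conn, query_path, new_file_path, params=()):
    # Reusable template: paths and bind parameters are arguments,
    # so one function serves many saved .sql files
    cursor = conn.cursor()
    with open(query_path, 'r') as f:
        cursor.execute(f.read(), params)
    with open(new_file_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor.fetchall())
```

Call it per query, e.g. sql_to_csv(conn, query_path, new_file_path, params=('2021-01-01',)), so nothing in the function body changes between queries.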
