Importing SQL query into Pandas results in only 1 column

Question

I'm trying to import the results of a complex SQL query into a pandas dataframe. My query requires me to create several temporary tables since the final result table I want includes some aggregates. My code looks like this:

    cnxn = pyodbc.connect(r'DRIVER=foo;SERVER=bar;etc')
    cursor = cnxn.cursor()
    cursor.execute('SQL QUERY HERE')
    cursor.execute('SECONDARY SQL QUERY HERE')
    ...
    df = pd.DataFrame(cursor.fetchall(),columns = [desc[0] for desc in cursor.description])

I get an error that tells me shapes aren't matching:

    ValueError: Shape of passed values is (1,900000),indices imply (5,900000)

And indeed, the result of all the SQL queries should be a table with 5 columns rather than 1. I've run the SQL query in Microsoft SQL Server Management Studio, where it works and returns the 5-column table that I want. When I don't pass any column names to the dataframe and print its head, I find that pandas has put the data from all 5 columns into 1. Each row's value is a list of 5 comma-separated values, but pandas treats the entire list as 1 column. Why is pandas doing this? I've also tried the pd.read_sql route, but I still get the same error.

EDIT:

I have done some more debugging, taking the comments into account. The issue doesn't appear to stem from the fact that my query is nested. I tried a simple (one line) query to return a 3 column table and I still got the same error. Printing out fetchall() looks like this:

    [(str1,str2,str3,datetime.date(stuff),datetime.date(stuff)), 
    (str1,str2,str3,datetime.date(stuff),datetime.date(stuff)),...]

Have you tried running the queries together in an anonymous code block, e.g., 'SET NOCOUNT ON; SQL_QUERY_1; SQL_QUERY_2;'?
– Gord Thompson, Commented Apr 9, 2018 at 18:23
Could you provide a sample of what cursor.fetchall() looks like?
– mcard, Commented Apr 9, 2018 at 18:33

mcard · Accepted Answer · 2018-04-09 18:46:09Z

10

Use pd.DataFrame.from_records instead:

    df = pd.DataFrame.from_records(cursor.fetchall(),
                                   columns=[desc[0] for desc in cursor.description])
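
For anyone who wants to try this without a SQL Server connection, here is a minimal sketch. It assumes that the rows behave like pyodbc's Row objects, meaning they are iterable but are not plain tuples. FakeRow, the sample values, and the column names are hypothetical stand-ins and do not come from the question.

```python
import datetime

import pandas as pd


class FakeRow:
    """Hypothetical stand-in for a pyodbc Row: iterable and indexable, but not a tuple."""

    def __init__(self, *values):
        self._values = values

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]


# Pretend these came from cursor.fetchall() and cursor.description.
rows = [
    FakeRow("a", "b", datetime.date(2018, 4, 9)),
    FakeRow("c", "d", datetime.date(2018, 4, 10)),
]
description = [("name",), ("code",), ("day",)]

# from_records treats every row as one record and spreads it over the columns.
df = pd.DataFrame.from_records(rows, columns=[desc[0] for desc in description])
print(df.shape)
```

With a real connection, rows would be cursor.fetchall() and description would be cursor.description, as in the snippet above.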

edited Apr 9, 2018 at 18:46

answered Apr 9, 2018 at 18:37


1 Comment

enumaris Over a year ago

Great, that did the trick! By the way, the command is cursor.description, I typo'd it when I typed the question. I will edit my question for that.

Parfait · Answer · 2018-04-12 18:16:53Z

2

Simply adjust the pd.DataFrame() call, since right now cursor.fetchall() returns a list of row objects that pandas does not unpack into separate columns. Use tuple() or list() to map the child elements into their own columns:

    df = pd.DataFrame([tuple(row) for row in cursor.fetchall()],
                      columns=[desc[0] for desc in cursor.description])
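
As a rough end-to-end illustration, the sketch below runs the same pattern against an in-memory SQLite database. Using sqlite3 in place of pyodbc is an assumption made only so the example runs anywhere, and the sales table and its values are made up. Note that sqlite3 already returns plain tuples, so tuple(row) changes nothing here; it is kept to mirror the call you would use with pyodbc.

```python
import sqlite3

import pandas as pd

# sqlite3 stands in for pyodbc here (assumption); both expose DB-API cursors.
cnxn = sqlite3.connect(":memory:")
cursor = cnxn.cursor()

# Several statements on one cursor, with a temporary table feeding an aggregate.
cursor.execute("CREATE TEMP TABLE sales (region TEXT, amount INTEGER)")
cursor.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 5), ("south", 7)],
)
cursor.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
)

# Convert every row to a plain tuple so each field lands in its own column.
df = pd.DataFrame(
    [tuple(row) for row in cursor.fetchall()],
    columns=[desc[0] for desc in cursor.description],
)
cnxn.close()
print(df)
```

Only the last statement's result set is read, which matches the question's flow of running several queries and fetching the final table.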

edited Apr 12, 2018 at 18:16

answered Apr 9, 2018 at 18:39

