I'm trying to import the results of a complex SQL query into a pandas dataframe. My query requires me to create several temporary tables since the final result table I want includes some aggregates. My code looks like this:

    cnxn = pyodbc.connect(r'DRIVER=foo;SERVER=bar;etc')
    cursor = cnxn.cursor()
    cursor.execute('SQL QUERY HERE')
    cursor.execute('SECONDARY SQL QUERY HERE')
    ...
    df = pd.DataFrame(cursor.fetchall(),columns = [desc[0] for desc in cursor.description])

I get an error that tells me shapes aren't matching:

    ValueError: Shape of passed values is (1,900000),indices imply (5,900000)

And indeed, the result of all the SQL queries should be a table with 5 columns rather than 1. I've run the SQL query in Microsoft SQL Server Management Studio, and it works and returns the 5-column table that I want. I also tried not passing any column names to the dataframe and printed its head, and found that pandas has put the information from all 5 columns into 1. The value in each row is a list of 5 values separated by commas, but pandas treats the entire list as 1 column. Why is pandas doing this? I've also tried going the pd.read_sql route, but I still get the same error.

EDIT:

I have done some more debugging, taking the comments into account. The issue doesn't appear to stem from the fact that my query is nested. I tried a simple (one line) query to return a 3 column table and I still got the same error. Printing out fetchall() looks like this:

    [(str1,str2,str3,datetime.date(stuff),datetime.date(stuff)), 
    (str1,str2,str3,datetime.date(stuff),datetime.date(stuff)),...]  
  • Have you tried running the queries together in an anonymous code block, e.g., 'SET NOCOUNT ON; SQL_QUERY_1; SQL_QUERY_2;'? Commented Apr 9, 2018 at 18:23
  • Just tried that and got the same exact error. Commented Apr 9, 2018 at 18:29
  • Can you UNION the two results? Commented Apr 9, 2018 at 18:31
  • Could you provide a sample of what cursor.fetchall() looks like? Commented Apr 9, 2018 at 18:33

2 Answers

10

Use pd.DataFrame.from_records instead:

    df = pd.DataFrame.from_records(cursor.fetchall(),
                                   columns=[desc[0] for desc in cursor.description])

1 Comment

Great, that did the trick! By the way, the command is cursor.description, I typo'd it when I typed the question. I will edit my question for that.
2

Simply adjust the pd.DataFrame() call. Right now cursor.fetchall() returns a list of pyodbc row objects, and pandas treats each row as a single value rather than as a sequence of column values. Use tuple() or list() to map the child elements into their own columns:

    df = pd.DataFrame([tuple(row) for row in cursor.fetchall()],
                      columns=[desc[0] for desc in cursor.description])
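
To try the conversion without a database connection, here is a minimal sketch. It assumes the driver's rows behave like sequences but are not real tuples; `FakeRow` is a hypothetical stand-in for such a row object (it is not part of pyodbc), and the data and column names are made up for illustration.

```python
import pandas as pd


class FakeRow:
    """Hypothetical stand-in for a driver row: indexable, but not a tuple."""

    def __init__(self, *values):
        self._values = values

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]


# Made-up data standing in for cursor.fetchall() and cursor.description.
rows = [FakeRow("a", "b", 1), FakeRow("c", "d", 2)]
columns = ["col1", "col2", "col3"]

# Converting each row to a plain tuple gives pandas one value per column.
df_tuples = pd.DataFrame([tuple(row) for row in rows], columns=columns)

# from_records is handed the same rows without any manual conversion.
df_records = pd.DataFrame.from_records(rows, columns=columns)

print(df_tuples)
print(df_records)
```

With a real cursor, `rows` would be `cursor.fetchall()` and `columns` would be `[desc[0] for desc in cursor.description]`.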
