19

Given:

CREATE PROCEDURE my_procedure
    @Param INT
AS
    SELECT Col1, Col2
    FROM Table
    WHERE Col2 = @Param

I would like to be able to use this as:

import pandas as pd
import pyodbc

query = 'EXEC my_procedure @Param = {0}'.format(my_param)
conn = pyodbc.connect(my_connection_string)

df = pd.read_sql(query, conn)

But this throws an error:

ValueError: Reading a table with read_sql is not supported for a DBAPI2 connection. Use an SQLAlchemy engine or specify an sql query

SQLAlchemy does not work either:

import sqlalchemy
engine = sqlalchemy.create_engine(my_connection_string)
df = pd.read_sql(query, engine)

Throws:

ValueError: Could not init table 'my_procedure'

I can in fact execute the statement using pyodbc directly:

cursor = conn.cursor()
cursor.execute(query)
results = cursor.fetchall()
df = pd.DataFrame.from_records(results)

Is there a way to send these procedure results directly to a DataFrame?

    What version of pandas are you using? And can you try to use pd.read_sql_query instead of read_sql? (there was a bug in read_sql regarding executing stored procedures) Commented Oct 1, 2014 at 8:06

4 Answers

11

Use read_sql_query() instead.

Looks like @joris (+1) already had this in a comment directly under the question but I didn't see it because it wasn't in the answers section.

Use the SQLAlchemy engine; apart from SQLAlchemy connectables, pandas only supports SQLite DBAPI connections. Then use read_sql_query() instead of read_sql(). The latter tries to auto-detect whether you're passing a table name or a fully-fledged query, but it doesn't handle the 'EXEC' keyword well. Using read_sql_query() skips the auto-detection and lets you explicitly indicate that you're running a query (there is also a read_sql_table()).

import pandas as pd
import sqlalchemy

query = 'EXEC my_procedure @Param = {0}'.format(my_param)
engine = sqlalchemy.create_engine(my_connection_string)
df = pd.read_sql_query(query, engine)
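Note that formatting a parameter into the query string is open to SQL injection; read_sql_query() also accepts bound parameters. A self-contained sketch using an in-memory SQLite database (SQLite has no stored procedures, so a plain parameterized query stands in for the EXEC call; with SQL Server you would pass params the same way):

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the SQL Server database; table name and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Col1 TEXT, Col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", 2)])

# Bound parameter instead of str.format(): the driver handles quoting safely.
df = pd.read_sql_query("SELECT Col1, Col2 FROM t WHERE Col2 = ?", conn, params=(2,))
print(df)
```

With pyodbc against SQL Server the query would be "EXEC my_procedure @Param = ?" with params=(my_param,).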

11

This worked for me after adding SET NOCOUNT ON, thanks @CRAFTY DBA:

sql_query = """SET NOCOUNT ON; EXEC db_name.dbo.StoreProc '{0}';""".format(input)

df = pandas.read_sql_query(sql_query , conn)

1 Comment

I had to do this as well. To get my query to work, I had to create the string with triple quotes, using single quotes around my stored procedure's variables. It looked like this: """EXEC sProcedure 'variable1', 'variable2'"""
7

https://code.google.com/p/pyodbc/wiki/StoredProcedures

I am not a python expert, but SQL Server sometimes returns row counts for statement executions. For instance, an update will report how many rows were updated.

Just put 'SET NOCOUNT ON;' at the front of your batch call. This suppresses the counts for inserts, updates, and deletes.
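For example, the fix can also live inside the procedure itself (a sketch based on the question's my_procedure):

```sql
CREATE PROCEDURE my_procedure
    @Param INT
AS
    SET NOCOUNT ON;  -- suppress "N rows affected" messages that confuse some clients
    SELECT Col1, Col2
    FROM Table
    WHERE Col2 = @Param
```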

Make sure you are using the correct native client module.

Take a look at this Stack Overflow example.

It has both an ad hoc SQL example and a stored procedure call example.

Calling a stored procedure python

Good luck

2 Comments

SET NOCOUNT ON unfortunately doesn't help in this case.
Adding SET NOCOUNT ON at the start of my stored procedure worked. Thanks. Also, I tried adding "SET NOCOUNT ON; " to the start of my query and that worked as well.
3

Using ODBC syntax for calling stored procedures (with parameters instead of string formatting) works for loading dataframes using pandas 0.14.1 and pyodbc 3.0.7. The following examples use the AdventureWorks2008R2 sample database.

First confirm expected results calling the stored procedure using pyodbc:

import pandas as pd
import pyodbc
connection = pyodbc.connect(driver='{SQL Server Native Client 11.0}', server='ServerInstance', database='AdventureWorks2008R2', trusted_connection='yes')
sql = "{call dbo.uspGetEmployeeManagers(?)}"
params = (3,)
cursor = connection.cursor()
rows = cursor.execute(sql, params).fetchall()
print(rows)

Should return:

[(0, 3, 'Roberto', 'Tamburello', '/1/1/', 'Terri', 'Duffy'), (1, 2, 'Terri', 'Duffy',
'/1/', 'Ken', 'Sánchez')]

Now use pandas to load the results into a dataframe:

df = pd.read_sql(sql=sql, con=connection, params=params)
print(df)

Should return:

   RecursionLevel  BusinessEntityID FirstName    LastName OrganizationNode  \
0               0                 3   Roberto  Tamburello            /1/1/
1               1                 2     Terri       Duffy              /1/

  ManagerFirstName ManagerLastName
0            Terri           Duffy
1              Ken         Sánchez

EDIT

Since you can't update to pandas 0.14.1, load the results from pyodbc using pandas.DataFrame.from_records:

# get column names from pyodbc results
columns = [column[0] for column in cursor.description]
df = pd.DataFrame.from_records(rows, columns=columns)
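As a self-contained illustration (the rows and column names below are hypothetical stand-ins for cursor.fetchall() and cursor.description), with coerce_float=True so numeric columns don't come back as strings:

```python
import pandas as pd

# Hypothetical rows, shaped like what a pyodbc cursor might return.
rows = [(0, 3, "Roberto", "Tamburello"), (1, 2, "Terri", "Duffy")]
columns = ["RecursionLevel", "BusinessEntityID", "FirstName", "LastName"]

# coerce_float=True converts Decimal-like values to float where possible.
df = pd.DataFrame.from_records(rows, columns=columns, coerce_float=True)
print(df)
```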

3 Comments

This does not work in pandas 0.14.0. I'm unable at present to upgrade to 0.14.1 to test there, but thanks for the tip.
@joeb1415 Updated on how to populate dataframe without using read_sql
Minor, you probably want to from_records(rows, columns=columns, coerce_float=True) so that you don't end up with stringed numeric columns, which is often the case with oracle/sql
