Running SQL in Python and applying parameters from a Python DataFrame

Question

I'm loading data from a SQL database into Python, but I need to apply filter criteria taken from a Python DataFrame. A simplified example:

    some_sql = """
               select column1,columns2 
               from table 
               where a between '{}' and '{}'
                    or a between '{}' and '{}'
                    or a between '{}' and '{}'
              """.format(date1,date2,date3,date4,date5,date6)

date1, date2, date3, date4, date5 and date6 come from a Python DataFrame. I could specify all 6 parameters manually, but in fact I have over 20.

     df = DataFrame({'col1':['date1','date3','date5'],
                     'col2':['date2','date4','date6']})

Is there a way to build this with a loop instead, so that it is more efficient?

DocZerø · Accepted Answer · 2017-08-07 12:20:09Z


Setup

import pandas as pd

# Create a dummy dataframe
df = pd.DataFrame({'col1':['date1','date3','date5'],
                   'col2':['date2','date4','date6']})

# Prepare the SQL (conditions will be added later)
some_sql = """
select column1,columns2 
from table 
where """

First approach

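conditions = []
for _, data in df.iterrows():
    # iterrows() yields (index, row) pairs; the index is not needed
    conditions.append(f"a between '{data['col1']}' and '{data['col2']}'")

# Put "or" between the conditions so the first one follows "where" directly
some_sql += '\nor '.join(conditions)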

By using iterrows() we can iterate through the dataframe row by row.

Alternative

some_sql += '\nor '.join(df.apply(lambda x: f"a between '{x['col1']}' and '{x['col2']}'", axis=1).tolist())

Using apply() should be faster than iterrows():

Although apply() also inherently loops through rows, it does so much more efficiently than iterrows() by taking advantage of a number of internal optimizations, such as using iterators in Cython.

source

Another alternative

some_sql += '\nor '.join([f"a between '{row['col1']}' and '{row['col2']}'" for row in df.to_dict('records')])

This converts the dataframe to a list of dicts, and then applies a list comprehension to create the conditions.

Result

select column1,columns2 
from table 
where a between 'date1' and 'date2'
or a between 'date3' and 'date4'
or a between 'date5' and 'date6'

edited Aug 7, 2017 at 12:20

answered Aug 7, 2017 at 7:58


1 Comment

Baiii Over a year ago

Thank you Kristof, further to my question: if I need to insert some params in the middle of SQL query, for instance, "select case when xxx then yyy", xxx and yyy are params. Shall I break sql code into pieces to apply conditions, or use iterators?

Chris Travers · Accepted Answer · 2017-08-07 08:05:25Z

As a secondary note to Kristof's answer above, I would note that even as an analyst one should probably be careful about things like SQL injection, so inlining data is something to be avoided.

If possible, you should define your query once with placeholders and then create a param list to go with the placeholders. This also saves you the string formatting.

So in your case your query looks like:

some_sql = """
           select column1,columns2 
           from table 
           where a between ? and ?
                or a between ? and ?
                or a between ? and ?
           """

And our param list generation is going to look like:

params = []
for _, data in df.iterrows():
    # Each row supplies two values: the start and the end of one "between"
    params.append(data['col1'])
    params.append(data['col2'])

Then execute the SQL, passing the query with its placeholders together with the param list, so the driver binds the values instead of having them inlined in the query text.
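The placeholder approach can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for the real database, a hypothetical table named demo (the word table itself is reserved in SQL), and a hypothetical helper build_between_query that is not part of any library. The date pairs are hard-coded here so the snippet has no dependencies.

```python
import sqlite3


def build_between_query(pairs):
    """Hypothetical helper: return (sql, params) with one
    'a between ? and ?' clause per (start, end) pair."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (start, end) pair is required")
    # One placeholder pair per row, joined with "or"
    where = "\n   or ".join("a between ? and ?" for _ in pairs)
    sql = f"select column1, columns2\nfrom demo\nwhere {where}"
    # Flatten [(s1, e1), (s2, e2)] into [s1, e1, s2, e2] to match the placeholders
    params = [value for pair in pairs for value in pair]
    return sql, params


# Assumption: an in-memory SQLite table standing in for the real database
conn = sqlite3.connect(":memory:")
conn.execute("create table demo (column1 text, columns2 text, a text)")
conn.executemany(
    "insert into demo values (?, ?, ?)",
    [("x", "1", "2017-01-15"), ("y", "2", "2017-03-10"), ("z", "3", "2017-06-01")],
)

# Assumption: these pairs play the role of col1/col2 in the dataframe
date_pairs = [("2017-01-01", "2017-01-31"), ("2017-05-15", "2017-06-30")]
sql, params = build_between_query(date_pairs)

# The driver binds the values; nothing is formatted into the SQL text
rows = conn.execute(sql, params).fetchall()
```

With the dataframe from the question, the pairs could be taken from df[['col1', 'col2']].itertuples(index=False, name=None), so the number of conditions follows the number of rows. Note that the placeholder marker depends on the driver: sqlite3 and pyodbc use ?, while psycopg2 uses %s.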
