0

I am trying to build a SQL that needs to be filtered by two parameters (2 columns), and the second column needs to match multiple values.

Given below is the SQL I have build thus far (thanks for the help of Martijn Pieters)

import psycopg2
import pandas as pd
import datetime

# Connecting to db

con = psycopg2.connect(db_details)
cur = con.cursor()
cur.execute("select * from sales limit 10")
rows = cur.fetchall()

params = {'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
      'store_name': 'store_1', 'store_2'}

df = pd.read_sql("""
     select store_name,count(*) from sales 
     where created_at >= %(earliest)s
     and store_name = %(store_name)s""",
 params=params, con=con)

The above SQL has one date parameter which is used in the where clause and I added one more parameter namely the store_name, where rows match either one of two values.

Would like to know how could I add in this additional parameter to the existing query.

I tried to create a the parameter (similar to the date filter) and pass that to the existing query but get a syntax error when I give it two values:

    'store_name': 'store_1', 'store_2'}
                                      ^
SyntaxError: invalid syntax

pointing to the params field.

4
  • You have a simple syntax error because you do need to at least get the basics of the language learned a bit better. You are trying to run when you first need to properly walk. It would also be helpful if you picked up more fundamentals about SQL too; you can't just pass in two values for a single parameter because SQL would use a different kind of operator to test for those values, instead of store_name = ... for a single value, you'd have to use store_name in (<value1>, <value2>, ..., <valueN>). I was hoping your question would really just focus on that latter problem. Commented Sep 13, 2018 at 11:15
  • In future, you do need to start your question with the error output included, because with that attempt to define store_name at least people trying to answer your question will see this is a columnname IN (value, value, value) test, not a straight single-value columnname = value test. A SQL IN test with parameters requires extra work from the Python programmer because now you have to generate SQL text for those parameters. Commented Sep 13, 2018 at 11:17
  • Had you just set 'store_name': 'store_1' (so a single string value), the code would have worked as is. So the real problem here is not defining parameters, it is defining parameters with more than one value. Commented Sep 13, 2018 at 11:22
  • Pretty close to being a dupe of Python/psycopg2 WHERE IN statement, at least as far as the goal of the whole excercise is concerned. Commented Sep 13, 2018 at 11:33

2 Answers 2

2

You have two problems:

  • You used invalid Python syntax; the comma in a dictionary separates key-value pairs, so the 'store_2' string would be another key-value pair but is missing the : value parts. If you want to define a value with more than one string, you'd have to use a tuple or a list there, were you explicitly use either (...) or [...] to separate that syntax from the key: value, key: value notation:

    params = {
        'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
        'store_name': ('store_1', 'store_2'),  # tuple with two values
    }
    
  • Generally speaking, SQL parameters can only work with single values. The store_name parameter can only be given a single value, not a sequence of values. That's because SQL parameters are a bridge between the SQL query and the dynamic values to be used in that query, with parameters designed to act as placeholder for each individual dynamic value.

    That said, the psycopg2 library specifically supports tuples, this is an exception to most Python database libraries however.

    Next, if you want to filter rows on matching either 'store_1' or 'store_2', the correct SQL syntax would be to use two store_name = ... tests with OR between them and parentheses around (to keep that part separate from the date test connected with AND to the store name test), or by using store_name IN ('store_1', 'store_2'). An IN test compares a column name against multiple values listed in the (...) parentheses.

Given that you are using psycopg2 here, you can get away with the store_name key referencing a tuple value, but you do need to use IN for your query:

params = {
    'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
    'store_name': ('store_1', 'store_2')
}

df = pd.read_sql("""
     SELECT store_name, count(*) FROM sales 
     WHERE created_at >= %(earliest)s
     AND store_name IN %(store_name)s""",
     params=params, con=con)

On a separate note: the pd.read_sql() function [explicitly states that only sqlite is supported when using a DBAPI connection](If a DBAPI2 object, only sqlite3 is supported):

If a DBAPI2 object, only sqlite3 is supported.

You are using such an object; most Python database adapters are DBAPI2 libraries; DBAPI2 is a Python standard for such libraries.

You should really use a SQLAlchemy connection string instead. Your code happens to work because you never attempt to write any data back to the database and the psycopg connection and cursor objects are largely compatible with the sqlite3 library versions, but you could run into problems down the road.

Sign up to request clarification or add additional context in comments.

4 Comments

In case of psycopg it is possible to pass multiple values with a single placeholder. Tuples and lists have special adaptations, of which tuples adaptation is useful with IN predicates and ROW constructors.
@IljaEverilä: That's actually news to me! And indeed, this is documented. That does help here.
Some of the MySQL drivers have similar features as well, though I think they were either vaguely documented or not at all.
@IljaEverilä: this exposes a different issue: passing in a DBAPI2 connection may work, but only SQLite3 is actually supported, and there may well be issues with type conversions.
-1

I don't see why this would not work :

params = {'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
          'store_name': '<put what you want here>'}

df = pd.read_sql("""
         select store_name,count(*) from sales 
         where created_at >= %(earliest)s
         and store_name = %(store_name)s""",
     params=params, con=con)

Because you want two stores this is a bit more complex.

This should work :

params = {'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
          'store_names': ','.join(('store_1', 'store_2'))}

df = pd.read_sql("""
         select store_name,count(*) from sales 
         where created_at >= %(earliest)s
         and store_name in (%(store_names)s)""",
     params=params, con=con)

7 Comments

thanks I am having issue passing in more than store_name in the params. Could you please assist.
@KevinNash Could you please edit your question and show the code that you've tried and the error message ?
I have updated my initial message with the change that I have done along with the syntax error that it throws. It works fine if I pass just one store name
You can't use a single parameter for IN. You need to generate separate parameters for each of the values. That's because SQL parameters can only handle one value, and you don't want that value to include the commas.
You are asking for rows where the store_name column has a value of 'store_1,store_2', not for rows where the store_name value is either 'store_1' or 'store_2'.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.