1

I am trying to parameterize some parts of a SQL Query using the below dictionary:

query_params = dict(
        {'target':'status',
         'date_from':'201712',
         'date_to':'201805',
         'drform_target':'NPA'
      })

sql_data_sample = str("""select *
                                 from table_name
                                     where dt = %(date_to)s
                                     and %(target)s in (%(drform_target)s)

                        ----------------------------------------------------
                        union all
                        ----------------------------------------------------

                        (select *,
                                 from table_name
                                     where dt  = %(date_from)s
                                     and %(target)s in ('ACT')
                                     order by random() limit 50000);""")

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)

However this returns a dataframe with no records at all. I am not sure what the error is since no error is being thrown.

df_data_sample.shape
Out[7]: (0, 1211)

The final PostgreSql query would be:

select *
        from table_name
            where dt = '201805'
            and status in ('NPA')

----------------------------------------------------
union all
----------------------------------------------------
(select *
        from table_name
            where dt  = '201712'
            and status in ('ACT')
            order by random() limit 50000);-- This part of random() is only for running it on my local and not on server.

Below is a small sample of data for replication. The original data has more than a million records and 1211 columns

service_change_3m   service_change_6m   dt  grp_m2          status
0                   -2                  201805  $50-$75     NPA
0                    0                  201805  < $25       NPA
0                   -1                  201805  $175-$200   ACT
0                    0                  201712  $150-$175   ACT
0                    0                  201712  $125-$150   ACT
-1                   1                  201805  $50-$75     NPA

Can someone please help me with this?

UPDATE: Based on suggestion by @shmee.. I am finally using :

target = 'status'
query_params = dict(
        {
         'date_from':'201712',
         'date_to':'201805',
         'drform_target':'NPA'
      })

sql_data_sample = str("""select *
                                 from table_name
                                     where dt = %(date_to)s
                                     and {0} in (%(drform_target)s)

                        ----------------------------------------------------
                        union all
                        ----------------------------------------------------

                        (select *,
                                 from table_name
                                     where dt  = %(date_from)s
                                     and {0} in ('ACT')
                                     order by random() limit 50000);""").format(target)

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)
12
  • your queries are different. the first query has status in ('ACTIVE') and your second query has status in ('ACT') Commented Jul 18, 2018 at 5:14
  • Hello @aydow.. Sorry, typo.. Updated the question, thanks!! Commented Jul 18, 2018 at 5:17
  • no worries. does the query work with a literal string using pandas? Commented Jul 18, 2018 at 5:22
  • nope. It is the same empty df even if I enclose the values in the dict in " ". Commented Jul 18, 2018 at 5:33
  • i mean have you tried using select * from table_name where dt = '201805' and status in ('NPA') ---------------------------------------------------- union all ---------------------------------------------------- (select * from table_name where dt = '201712' and status in ('ACT') order by random() limit 50000) as sql_data_sample and not included the query_params? Commented Jul 18, 2018 at 5:37

1 Answer 1

2

Yes, I am quite confident that your issue results from trying to set column names in your query via parameter binding (and %(target)s in ('ACT')) as mentioned in the comments.

This results in your query restricting the result set to records where 'status' in ('ACT') (i.e. Is the string 'status' an element of a list containing only the string 'ACT'?). This is, of course, false, hence no record gets selected and you get an empty result.

This should work as expected:

import psycopg2.sql

col_name = 'status'
table_name = 'public.churn_data'
query_params = {'date_from':'201712',
                'date_to':'201805',
                'drform_target':'NPA'
               }

sql_data_sample = """select * 
                     from {0} 
                     where dt = %(date_to)s 
                     and {1} in (%(drform_target)s)
                     ----------------------------------------------------
                     union all
                     ----------------------------------------------------
                     (select * 
                      from {0} 
                      where dt  = %(date_from)s 
                      and {1} in ('ACT') 
                      order by random() limit 50000);"""

sql_data_sample = sql.SQL(sql_data_sample).format(sql.Identifier(table_name), 
                                                  sql.Identifier(col_name))

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.