How to dynamically check postgresql constraints in psycopg2?

Question

I'm generating random data to fill a database whose structure I don't know before runtime. I can fill it if it has no constraints, but when it has CHECK constraints I can't differentiate between values that pass the check and values that don't.

Let's see an example. Table definition:

CREATE TABLE test (
    id INT,
    age INT CONSTRAINT adult CHECK (age > 18),
    PRIMARY KEY (id)
);

The data about that table that I have at runtime is:

Table and columns names
Columns types
Column UNIQUE, and NOT NULL
Column constraint definition as a string
Foreign keys

I can get more data from the PostgreSQL internal tables, preferably from the information schema.

I want to check the constraint before making an insert with that data. Checking it through the database or checking it in code are both fine for me.

Here is a short snippet. The goal is to detect that the check is False before the insert query is executed:

# Data you have access to:
t_name = 'test'
t_col_names = ['id', 'age']
col_constraints = {
    'id': '',
    'age': 'age > 18'}
# you can access more data, 
# but you have to query the database to do so
id_value = 1
#I want to check values HERE
age_value = 17
#I want to check values HERE
values = (id_value, age_value)
#I could want to check HERE

query = "INSERT INTO test (id, age) VALUES (%s, %s);"
db_cursor.execute(query, values)

db_cursor.close()

Because of how data is generated in my application, handling the error raised while or after executing the insert query is not an option: it would dramatically increase the cost of generating random data.

EDIT to explain why try: is not an option:

If I wait for the exception, the problematic element that triggers the error will already be in multiple queries.

Let's see in the previous example how this could happen. I generate a random data pool to pick from and generate tuples of insert values:

age_pool = (7, 19, 23, 48)
id_pool = (0,2,3,...,99) #It's not that random for better understanding

Now suppose I generate 100 insert queries and 25% of them contain a 7 (an age that fails the check). A single bad value gives me 25 invalid queries that will be executed against the database (a costly operation, by the way) only to fail. After that I would have to generate 25 more insert queries, which could have the same problem if I generate an 8, for example.

On the other hand, if I check each element right after generating it, I know it is a valid value, and a single valid element can be reused in multiple valid combinations of values.

I omit the creation of db_cursor to keep the attention on the problem. — Raulillo
– Raulillo, Commented Sep 14, 2019 at 15:27
I hope you're generating data for test purposes only; I will assume so. As @bfris points out, you'll need a parser for the constraints, and some will not fall into the simple category. How about: check (a>b and a>c and a<b+c) or check (concat(a,b) is distinct from (c,d))? Just an example. Additionally, what you described as "a costly operation" I would classify as "minimal acceptable reject level". Finally, for any given constraint, how will you know you have at least 1 value that passes and at least 1 that fails? — Belayer
– Belayer, Commented Sep 15, 2019 at 20:53
@Belayer Is there a way to reuse the PostgreSQL parser or the psycopg2 parser? About the "for any given constraint how..." question, I will assume the checks will not be too complex. There will always be at least 1 case that passes and 1 that fails. If I discover a way to generate both cases (pass and fail) without brute force, awesome, but it's not needed. I can brute force until a timeout and then leave it pending for manual review, where an administrator will be able to insert both manually, or discard the check if it's impossible or nonsensical. — Raulillo
– Raulillo, Commented Sep 15, 2019 at 21:28
I'm not familiar with psycopg2, so I cannot say anything about that. As for calling the Postgres constraint parser, I've never seen anything on it; I doubt it's possible. Good luck. — Belayer
– Belayer, Commented Sep 16, 2019 at 0:00

Raulillo · Accepted Answer · 2019-09-23 17:00:00Z

You could use eval():

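def constraint_check(constraints, keys, values):
    # Map each column name to its candidate value, e.g. {'id': 1, 'age': 17}.
    vals = dict(zip(keys, values))
    for expr in constraints.values():
        # Evaluate the expression with the column names bound as variables,
        # instead of substituting text (which would also hit substrings).
        # Builtins are disabled, but eval() is still unsafe on untrusted text.
        if expr and not eval(expr, {"__builtins__": {}}, vals):
            return False
    return True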

t_name = 'test'
t_col_names = ['id', 'age']
col_constraints = {
    'id': '',
    'age': 'age > 18'}

id_value = 1
age_value = 17

values = (id_value, age_value)

if constraint_check(col_constraints, ('id', 'age'), values):
    query = "INSERT INTO test (id, age) VALUES (%s, %s);"
    db_cursor.execute(query, values)

However, this works well only for very simple constraints. A Postgres check expression may include constructs that are specific to Postgres and unknown to Python. For example, the app fails with this obviously valid constraint:

create table test(
    id int primary key, 
    age int check(age between 18 and 60));

I do not think you can easily implement a complete Postgres expression parser in Python, and I doubt the effort would pay off for the intended effect.
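One possible way around the parser problem, offered as a sketch rather than a tested solution: let Postgres evaluate the expression itself by selecting it over a one-row VALUES list, before any INSERT is attempted. The helper names build_check_query and passes_check are hypothetical, the cursor is assumed to be a psycopg2 cursor, and the expression text is assumed to come from the catalog (it is pasted into the SQL verbatim, so it must be trusted). A CHECK constraint accepts a row when the expression is true or NULL, hence IS NOT FALSE.

```python
def build_check_query(col_names, check_expr):
    """Return SQL that evaluates check_expr against one candidate row."""
    placeholders = ", ".join(["%s"] * len(col_names))
    # Double embedded quotes so each column name is a valid quoted identifier.
    aliases = ", ".join('"' + c.replace('"', '""') + '"' for c in col_names)
    # Escape literal % so the driver does not treat it as a parameter marker.
    expr = check_expr.replace("%", "%%")
    return (
        f"SELECT ({expr}) IS NOT FALSE AS ok "
        f"FROM (VALUES ({placeholders})) AS candidate({aliases})"
    )


def passes_check(cursor, col_names, check_expr, values):
    """Ask the server whether values satisfy check_expr (hypothetical helper)."""
    cursor.execute(build_check_query(col_names, check_expr), values)
    return cursor.fetchone()[0]
```

This still costs one round trip per candidate row, so it fits best when each generated value is checked once and then reused. Values whose type is ambiguous may need explicit casts in the VALUES list.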

bfris · Accepted Answer · 2019-09-14 16:23:37Z


It's not clear why a try...except clause is not desired. You test for the precise exception and keep going.

How about:

problem_inserts = []
try:
    db_cursor.execute(query, values)
    db_cursor.close()
except <your exception here>:
    problem_inserts.append(query)

In this snippet, you keep a list of all queries that didn't go through properly. I don't know what else you can do. I don't think you want to change the data to make it fit into the table.
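A runnable stand-in for the idea above, using sqlite3 from the standard library purely so that the sketch runs without a server (that swap is an assumption of this example, not part of the original setup). With psycopg2 the matching exception would be psycopg2.errors.CheckViolation (available since psycopg2 2.8), and the failed transaction would also need a rollback or a savepoint before the next statement. The function name insert_collecting_failures is hypothetical.

```python
import sqlite3

# In-memory stand-in for the table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test ("
    "id INT PRIMARY KEY, "
    "age INT CONSTRAINT adult CHECK (age > 18))"
)


def insert_collecting_failures(conn, rows):
    """Insert each row; return the rows rejected by a constraint."""
    query = "INSERT INTO test (id, age) VALUES (?, ?)"
    problem_inserts = []
    for values in rows:
        try:
            conn.execute(query, values)
        except sqlite3.IntegrityError:
            # Catch only constraint violations; other errors still propagate.
            problem_inserts.append(values)
    conn.commit()
    return problem_inserts


rejected = insert_collecting_failures(conn, [(1, 17), (2, 19), (3, 7), (4, 48)])
```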


2 Comments

Raulillo Over a year ago

I have updated my question to explain it better. I didn't mean that I can't use a try statement. The point is that I can't wait until the query is executing, because that would increase the number of queries that fail.

bfris Over a year ago

Aha! I didn't fully appreciate that you are trying to generate random data for the INSERTs. I looked for ways to lower the cost on the Postgres side (by transactions or similar), but couldn't find anything. All that's left is to do some difficult metaprogramming. You'll have to make your own parser and logic for handling CONSTRAINTS on fields in the table. If you're lucky, maybe you can get away with only handling very simple comparisons (>, <, =, <=, >=).
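For the "very simple comparisons" case, a minimal sketch might look like the following. It assumes the constraint has the form column OP number; anything else (BETWEEN, multi-column checks, function calls) is reported as unsupported by returning None. The name compile_simple_check is hypothetical.

```python
import operator
import re

# SQL comparison operators mapped to Python functions.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "<>": operator.ne,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}
# Two-character operators are listed first so ">=" is not read as ">".
_SIMPLE = re.compile(
    r"^\s*\(?\s*(\w+)\s*(>=|<=|<>|!=|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*\)?\s*$"
)


def compile_simple_check(expr):
    """Return (column, predicate) for 'col OP number', or None if unsupported."""
    match = _SIMPLE.match(expr)
    if match is None:
        return None
    column, op, literal = match.groups()
    bound = float(literal) if "." in literal else int(literal)
    compare = _OPS[op]
    # A CHECK accepts NULL (the result is unknown), so None passes here too.
    return column, lambda value: value is None or compare(value, bound)


column, is_valid = compile_simple_check("age > 18")
valid_ages = [age for age in (7, 19, 23, 48) if is_valid(age)]
```

Filtering the pool once, right after it is generated, keeps the invalid 7 out of every later INSERT. Constraints the sketch cannot parse would still need another route, such as manual review.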
