I'm generating random data to fill a database whose schema I don't know before runtime. I can fill it when it has no constraints, but when it does, I can't tell apart the values that pass a check from the values that don't.

Let's see an example. Table definition:

CREATE TABLE test (
    id INT,
    age INT CONSTRAINT adult CHECK (age > 18),
    PRIMARY KEY (id)
);

The information I have about that table at runtime is:

  • Table and column names
  • Column types
  • Column UNIQUE and NOT NULL flags
  • Column constraint definitions, as strings
  • Foreign keys

I can get more data from PostgreSQL's internal tables, preferably from the information schema.

I want to check the constraint with that data before running an INSERT. Checking it through the database or checking it in code are both acceptable.

Here is a short snippet. The goal is to detect that the check evaluates to false before the insert query is executed:

# Data you have access to:
t_name = 'test'
t_col_names = ['id', 'age']
col_constraints = {
    'id': '',
    'age': 'age > 18'}
# you can access more data,
# but you have to query the database to do so
id_value = 1
# I want to check values HERE
age_value = 17
# I want to check values HERE
values = (id_value, age_value)
# I could also check HERE

query = "INSERT INTO test (id, age) VALUES (%s, %s);"
db_cursor.execute(query, values)

db_cursor.close()

Because of how data is generated in my application, handling the error raised while or after executing the insert query is not an option: it would dramatically increase the cost of generating random data.

EDIT to explain why try: is not an option:

If I wait for the exception, the problematic value that caused the error will already be in multiple queries.

Let's see in the previous example how this could happen. I generate a random data pool to pick from and generate tuples of insert values:

age_pool = (7, 19, 23, 48)
id_pool = (0,2,3,...,99) # Not that random, to make the example easier to follow

Now suppose I generate 100 insert queries and 25% of them contain a 7 (an age that fails the check). A single bad value gives me 25 invalid queries that will be sent to the database (a costly operation, by the way) only to fail. After that I would have to generate more random data, in this case 25 more insert queries, which could have the same problem if I generate an 8, for example.

On the other hand, if I check each element right after generating it, I know whether it's a valid value, and from a single valid element I can build multiple valid combinations of values.
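The idea in the last two paragraphs can be sketched as follows: validate each generated value once, then build rows only from the values that survive. This is a minimal illustration, not the real generator; `is_valid_age` is a hypothetical stand-in for whatever mechanism ends up evaluating the check, and the id pool is shortened.

```python
import random

age_pool = (7, 19, 23, 48)
id_pool = tuple(range(5))  # shortened, hypothetical pool for illustration


def is_valid_age(age):
    # Hypothetical stand-in for evaluating the 'age > 18' check.
    return age > 18


# Validate each candidate value once, right after it is generated.
valid_ages = tuple(a for a in age_pool if is_valid_age(a))

# Every row built from the filtered pool satisfies the check,
# so no insert is wasted on a value already known to be bad.
rows = [(i, random.choice(valid_ages)) for i in id_pool]
```

With this ordering, the 7 is rejected once instead of spoiling every query it would have appeared in.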

  • I omitted the creation of db_cursor to keep the focus on the problem. Commented Sep 14, 2019 at 15:27
  • I hope you're generating data for test purposes only; I will assume so. As @bfris points out, you'll need a parser for the constraints, and some will not fall into their simple category. How about check (a>b and a>c and a<b+c) or check (concat(a,b) is distinct from (c,d))? Just examples. Additionally, what you described as "a costly operation" I would classify as "minimal acceptable reject level". Finally, for any given constraint, how will you know you have at least 1 value that passes and at least 1 that fails? Commented Sep 15, 2019 at 20:53
  • @Belayer is there a way to reuse the PostgreSQL parser or the psycopg2 parser? About the "for any given constraint, how..." question: I will assume the checks will not be too complex. There will always be at least 1 case that passes and 1 that fails. If I discover a way to generate both cases (pass and fail) without brute force, awesome, but it's not needed. I can brute-force until a timeout and then leave it pending for manual review, where an administrator can insert both manually, or discard the check if it's impossible or nonsensical. Commented Sep 15, 2019 at 21:28
  • I'm not familiar with psycopg2, so I cannot say anything about that. As for calling the Postgres constraint parser, I've never seen anything on it; I doubt it's possible. Good luck. Commented Sep 16, 2019 at 0:00

2 Answers

You could use eval():

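def constraint_check(constraints, keys, values):
    # Map each column name to its candidate value.
    vals = dict(zip(keys, values))
    for k, v in constraints.items():
        # Evaluate the constraint with the column names bound as variables
        # (avoids replacing column names that occur inside other words).
        # An empty constraint string means the column is unconstrained.
        if v and not eval(v, {}, vals):
            return False
    return True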

t_name = 'test'
t_col_names = ['id', 'age']
col_constraints = {
    'id': '',
    'age': 'age > 18'}

id_value = 1
age_value = 17

values = (id_value, age_value)

if constraint_check(col_constraints, ('id', 'age'), values):
    query = "INSERT INTO test (id, age) VALUES (%s, %s);"
    db_cursor.execute(query, values)

However, this works well only for very simple constraints. A Postgres check expression may include constructs that are specific to Postgres and unknown to Python. For example, the function fails with this obviously valid constraint:

create table test(
    id int primary key, 
    age int check(age between 18 and 60));
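A quick way to see the failure: Python's eval() rejects the BETWEEN syntax before any value is compared. The snippet below only demonstrates that behaviour; the variable names are illustrative.

```python
expr = "age between 18 and 60"  # valid in Postgres, not valid Python

try:
    eval(expr, {}, {"age": 17})
    outcome = "evaluated"
except SyntaxError:
    # 'between' is not a Python operator, so parsing fails
    # before the value 17 is ever looked at.
    outcome = "SyntaxError"

print(outcome)
```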

I do not think you can easily implement a complete Postgres expression parser in Python, and I doubt the effort would be worthwhile for the intended effect.
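One alternative that sidesteps writing a parser is to let the database evaluate the check expression against a candidate row without inserting anything. The following is a minimal sketch under stated assumptions: `passes_check` is a hypothetical helper, the column names and the check expression are trusted (they are pasted into the SQL text), and the standard-library sqlite3 module stands in for Postgres so the example runs without a server. With psycopg2 the placeholder would be %s instead of ?, and Postgres may need explicit casts for values whose type it cannot infer.

```python
import sqlite3


def passes_check(conn, check_expr, row):
    """Return True if `row` (column name -> value) satisfies `check_expr`."""
    if not check_expr:
        return True  # no constraint on this column
    # Expose the candidate values under their column names in a subquery.
    cols = ", ".join(f"? AS {name}" for name in row)
    sql = f"SELECT ({check_expr}) FROM (SELECT {cols}) AS candidate"
    result = conn.execute(sql, tuple(row.values())).fetchone()[0]
    # A CHECK constraint is satisfied when the expression is true or NULL.
    return result is None or bool(result)


conn = sqlite3.connect(":memory:")  # stand-in for the real connection
print(passes_check(conn, "age between 18 and 60", {"age": 17}))
print(passes_check(conn, "age between 18 and 60", {"age": 19}))
```

This still costs one round trip per candidate, so it probably fits best when each generated value is checked once and then reused across many rows.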

It's not clear why a try...except clause is not desired. You test for the precise exception and keep going.

How about:

problem_inserts = []
try:
    db_cursor.execute(query, values)
    db_cursor.close()
except <your exception here>:
    problem_inserts.append(query)

In this snippet, you keep a list of all queries that didn't go through properly. I don't know what else you can do. I don't think you want to change the data to make it fit into the table.
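As a runnable illustration of catching only the precise exception, here is a sketch that uses the standard-library sqlite3 module as a stand-in for Postgres (an assumption made so it runs without a server). With psycopg2 2.8 or later, the analogous exception class would be psycopg2.errors.CheckViolation, and a failed statement would also require a rollback before the transaction can continue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE test (id INT PRIMARY KEY, age INT CHECK (age > 18))")

query = "INSERT INTO test (id, age) VALUES (?, ?)"
problem_inserts = []

for values in [(1, 17), (2, 19)]:
    try:
        conn.execute(query, values)
    except sqlite3.IntegrityError:
        # Only constraint violations are caught; other errors still propagate.
        problem_inserts.append((query, values))

print(problem_inserts)
```

Storing the values next to the query text is a small variation that makes the rejected rows easier to review later.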

2 Comments

I have updated my question to explain it better. I didn't mean that I can't use a try statement. The point is that I can't wait until the query is executing, because that would increase the number of queries that fail.
Aha! I didn't fully appreciate that you are trying to generate random data for the INSERTs. I looked for ways to lower the cost on the Postgres side (with transactions or similar), but couldn't find anything. All that's left is some difficult metaprogramming. You'll have to write your own parser and logic for handling CONSTRAINTs on the table's fields. If you're lucky, maybe you can get away with handling only very simple comparisons (>, <, =, <=, >=).
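A minimal sketch of the "very simple comparisons" case, assuming constraints of the form column, operator, integer literal, as in the question's col_constraints. The names `simple_check`, `OPS`, and `SIMPLE_CHECK` are hypothetical, and anything more complex is reported as unsupported rather than guessed at.

```python
import operator
import re

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq,
       "<=": operator.le, ">=": operator.ge}

# Matches only "<column> <op> <integer>", e.g. "age > 18".
SIMPLE_CHECK = re.compile(r"^\s*(\w+)\s*(<=|>=|<|>|=)\s*(-?\d+)\s*$")


def simple_check(expr, row):
    """Return True/False for a simple comparison, or None if unsupported."""
    match = SIMPLE_CHECK.match(expr)
    if match is None:
        return None  # too complex for this sketch; needs another strategy
    column, op, literal = match.groups()
    return OPS[op](row[column], int(literal))


print(simple_check("age > 18", {"id": 1, "age": 17}))
```

Returning None for unsupported expressions lets the caller fall back to another approach, such as leaving the constraint for manual review.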
