I'm generating random data to fill a database (without knowing how is the database before runtime). I can fill it if it has no constraints, but when it has i can't differenciate between values passing the check and values that don't.
Let's see an example. Table definition:
CREATE TABLE test (
id INT,
age INT CONSTRAINT adult CHECK (age > 18),
PRIMARY KEY (id)
);
The data of that table that i have during runtime is:
- Table and columns names
- Columns types
- Column
UNIQUE, andNOT NULL - Column constraint definition as a string
- Foreign keys
I can get more data from postgresql internal tables preferably from the information squema
I want to check the constraint before making an insert with that data. It's valid for me to do so using the database to check it, or to check it in code.
Here is a short snippet, try to detect when the check is False before the execution of the insert query:
# Data you have access to:
t_name = 'test'
t_col_names = ['id', 'age']
col_constraints = {
'id': '',
'age': 'age > 18'}
# you can access more data,
# but you have to query the database to do so
id_value = 1
#I want to check values HERE
age_value = 17
#I want to check values HERE
values = (id_value, age_value)
#I could want to check HERE
query = "INSERT INTO test (id, age) VALUES (%s, %s);"
db_cursor.execute(query, values)
db_cursor.close()
Because of how data is generated in my application, managing the error thrown is not an option if it's done while/after executing the insert query, it would increment the cost of generating random data dramatically.
EDIT to explain why try: is not an option:
If I wait for the exception, the problematic element that provoke a thrown error would already be in multiple queries.
Let's see in the previous example how this could happen. I generate a random data pool to pick from and generate tuples of insert values:
age_pool = (7, 19, 23, 48)
id_pool = (0,2,3,...,99) #It's not that random for better understanding
Now if I generate 100 insert queries and supposing 25% of them has a 7 in them (an age < 18). From a single value i have 25 invalid queries that will try to execute in the database (a costly operation by the way) to fail hopelessly. After that i would have to generate more random data in this case 25 more insert queries that could have the same problem if i generate a 8 for example.
On the other hand if i check just after generating the element, i check if it's a valid value and for one single element i have multiple valid combinations of values.