
TL;DR:

I am trying to pass the names and values of one or more columns into a function in Postgres, which is used in a check constraint. This seems to work properly until the column names need quoting (e.g., they contain upper-case letters), at which point I get 'column "x" does not exist' errors. If I quote the identifier, the behaviour of the function changes.

I can't seem to find a way to reference both the column name and its value in a function, called from a check constraint, if the column identifier needs quoting.


The full story:

In Postgres, I am trying to emulate a Unique constraint using a user-defined function and a Check constraint.

I want to do this because I need a "conditional" Unique constraint, where the uniqueness may not be enforced if other conditions in the Check are/aren't met.

(I appreciate that an obvious answer might be "You don't want to do this", or "This is a bad idea", but I would appreciate answers that instead resolve the issue I'm having more directly.)

Current attempt:

As there may be more than one column included as part of the unique constraint, I created a function that accepts a table, an array of columns and an array of values.

CREATE OR REPLACE FUNCTION is_unique(_table text, _columns text[], _values text[]) RETURNS boolean AS
$$
DECLARE 
    result integer;
    statement text = 'SELECT (EXISTS (SELECT 1 FROM ' || quote_ident(_table) || ' WHERE ';
    first boolean = true;
BEGIN
    FOR i IN array_lower(_columns,1)..array_upper(_columns,1) LOOP
        IF first THEN
            statement = statement || quote_ident(_columns[i]) || '=' || _values[i];
            first = false;
        ELSE
            statement = statement || ' AND '|| quote_ident(_columns[i]) || '=' || _values[i];
        END IF;
    END LOOP;
    statement = statement || '))::int';
    EXECUTE statement INTO result;
    RETURN NOT result::boolean;
END
$$
LANGUAGE 'plpgsql';

What I am trying to do in this function is form a statement of the form:

SELECT 1 FROM _table WHERE _column[i]=_value[i] AND ...

This can then be used as part of a Check constraint, such as:

CHECK (is_unique('sometable'::text,'{somecolumn}'::text[],ARRAY[somecolumn]::text[]))

What is happening:

This structure appears to work when used with columns that do not need to be quoted, but otherwise the behaviour seems to break. Once I insert a single row, I am not able to insert another row even if the value is unique. I believe the problem is the value of the column is possibly being compared to itself, or the identifiers are being compared.

Does anyone have any suggestions as to how I should change my code to resolve this issue? Unfortunately, coping with quoted identifiers is important in this case.

  • Your function (could be simpler, but) works: sqlfiddle.com/#!15/01204/8 -- have you heard about partial unique indexes? postgresql.org/docs/current/static/indexes-partial.html Commented Aug 12, 2014 at 12:06
  • A partial index might be useful, but to follow up on your suggestion a little further - while the test for uniqueness works, I can't seem to create a CHECK with this. If I try ALTER TABLE "test table" ADD CHECK (is_unique('test table', array['int col'], array['int col'])); there is a syntax error on the right half of the =. How can I resolve this? Commented Aug 12, 2014 at 13:00
  • @obfuscation and the exact text of the syntax error is ...? Commented Aug 12, 2014 at 13:34
  • I am still not convinced this function with dynamic SQL is the best solution. Have you tried solving it with a partial index? You mentioned you were preparing a more complete test case. Can you post a fiddle with it? Aside: do not quote the language name. LANGUAGE plpgsql is correct. Commented Aug 12, 2014 at 14:44

1 Answer


I think you might really be looking for partial unique indexes or exclusion constraints. The description is a bit too vague to really tell - there's no sample data, no example of "this should be allowed, this shouldn't", etc.

Consider:

CREATE UNIQUE INDEX some_idx_name
ON some_table (col1, col2, col3) WHERE (col1 != 4 AND col5 IS NOT NULL);
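
To make the effect concrete, here's a minimal sketch of which rows that index accepts and rejects. The table definition and the column types are assumptions made up for illustration; only the index itself comes from the example above.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE some_table (
    col1 integer,
    col2 integer,
    col3 integer,
    col5 text
);

CREATE UNIQUE INDEX some_idx_name
ON some_table (col1, col2, col3) WHERE (col1 != 4 AND col5 IS NOT NULL);

-- Both rows satisfy the index predicate and share the key (1, 2, 3),
-- so the second INSERT fails with a unique violation.
INSERT INTO some_table VALUES (1, 2, 3, 'a');
INSERT INTO some_table VALUES (1, 2, 3, 'b');

-- These rows fall outside the predicate (col1 = 4, or col5 IS NULL),
-- so they are not indexed and duplicates are accepted.
INSERT INTO some_table VALUES (4, 2, 3, 'a');
INSERT INTO some_table VALUES (4, 2, 3, 'a');
INSERT INTO some_table VALUES (1, 2, 3, NULL);
INSERT INTO some_table VALUES (1, 2, 3, NULL);
```

Run the statements one at a time (autocommit); inside a single transaction the failing INSERT would abort the rest.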

Attempts to emulate a unique index with a check constraint and a function are doomed to failure. It's not even "you don't want to do this", it's "this fundamentally cannot work".

Unique constraints and indexes are partially exempt from transactional visibility rules. An attempt to insert a duplicate into a unique index where the transaction that created the first copy hasn't committed yet will block until the first transaction commits or rolls back. That's why unique constraints work even though transactions can't see each other's uncommitted data. You cannot emulate this because PostgreSQL does not offer dirty-read isolation to transactions; there's simply no way to do it. (OK, so you could kind-of do it if you wrote your check constraint function in C, but it'd have nasty race conditions.)
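
A rough two-session sketch of that blocking behaviour, assuming a hypothetical, initially empty table t and the default READ COMMITTED isolation level. The comments mark which connection runs each statement; it's meant to be typed into two psql sessions, not run as one script.

```sql
-- Setup (hypothetical table, for illustration only):
-- CREATE TABLE t (id integer UNIQUE);

-- Session 1
BEGIN;
INSERT INTO t VALUES (1);              -- succeeds, but is not yet committed

-- Session 2 (a separate connection)
BEGIN;
SELECT count(*) FROM t WHERE id = 1;   -- sees nothing: session 1's row is invisible
INSERT INTO t VALUES (1);              -- blocks, waiting for session 1 to finish

-- Session 1
COMMIT;                                -- session 2's INSERT now fails with a unique violation
-- Had session 1 issued ROLLBACK instead, session 2's INSERT would succeed.
```

A function called from a check constraint is in the same position as the SELECT in session 2: it can't see the uncommitted row, so it would report "unique" and both transactions could commit a duplicate.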

The only way it's possible to do what you want is if you LOCK TABLE ... IN EXCLUSIVE MODE in your function before doing anything. Failure to do so is guaranteed to result in concurrency-related bugs. However, if you do take an exclusive lock, then all writes will have to proceed serially, with only one transaction at a time having uncommitted changes to the table. Worse, attempts at concurrent writes will usually result in transactions being aborted due to deadlocks caused by lock upgrades.

So the only way you can do it reliably is have the application LOCK TABLE ... IN EXCLUSIVE MODE at the start of the transaction, before taking other locks, if it thinks it might need to write to it. I'm sure you can imagine how much fun that is for performance.
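
Roughly, every writing transaction would have to look like this sketch (some_table and the inserted values are placeholders, not anything from your schema):

```sql
BEGIN;
-- Take the lock first, before any other statement touches the table.
-- EXCLUSIVE mode still allows plain SELECTs from other sessions, but
-- blocks all other writers until this transaction ends.
LOCK TABLE some_table IN EXCLUSIVE MODE;

INSERT INTO some_table (col1, col2, col3, col5) VALUES (1, 2, 3, 'a');

COMMIT;  -- the lock is released at commit or rollback
```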

(BTW, functions called in check constraints are strictly supposed to be IMMUTABLE and not access data other than the arguments they're passed. PostgreSQL won't currently stop you breaking that rule because it's really handy to access nearly-always-unchanged lookup tables etc - but it does mean you might get unexpected results from the check constraint if you look at data that might change.)


Also, the function is quite inefficient - you're looping when you really don't need to and can just use some simple SQL. (Also, please indent your code for the sake of those coming after you).

This block:

FOR i IN array_lower(_columns,1)..array_upper(_columns,1) LOOP
  IF first THEN
    statement = statement || quote_ident(_columns[i]) || '=' || _values[i];
    first = false;
  ELSE
    statement = statement || ' AND '|| quote_ident(_columns[i]) || '=' || _values[i];
  END IF;
END LOOP;

is just a slow way of writing:

SELECT
  string_agg( 
    format('%I = %L', _columns[i], _values[i]),
    ' AND '
    ORDER BY i
  )
FROM generate_subscripts(_columns, 1) i;

but even then, there's still a bug: if the user passes NULL, you'll generate = NULL, which is plain wrong because it never evaluates to true. You need to special-case NULL values, or use IS NOT DISTINCT FROM, e.g.

format('%I IS NOT DISTINCT FROM %L', _columns[i], _values[i])

however IS NOT DISTINCT FROM can't use an index, so CASE might be more appropriate:

CASE
  WHEN _values[i] IS NOT NULL THEN
    format('%I = %L', _columns[i], _values[i])
  ELSE
    format('%I IS NULL', _columns[i])
END
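
Putting those pieces together, the whole function might look something like the sketch below. It keeps the signature from the question; the variable names conditions and row_exists are my own, and it assumes _columns contains at least one entry (an empty array would produce an empty WHERE clause and a syntax error). It does nothing about the concurrency problems described above.

```sql
CREATE OR REPLACE FUNCTION is_unique(_table text, _columns text[], _values text[])
RETURNS boolean AS
$$
DECLARE
    conditions text;
    row_exists boolean;
BEGIN
    -- Build "col1 = 'v1' AND col2 IS NULL AND ..." in one pass.
    -- %I quotes identifiers only where needed; %L quotes the value as a literal.
    SELECT string_agg(
               CASE
                   WHEN _values[i] IS NOT NULL THEN
                       format('%I = %L', _columns[i], _values[i])
                   ELSE
                       format('%I IS NULL', _columns[i])
               END,
               ' AND '
               ORDER BY i
           )
    INTO conditions
    FROM generate_subscripts(_columns, 1) AS i;

    EXECUTE format('SELECT EXISTS (SELECT 1 FROM %I WHERE %s)', _table, conditions)
    INTO row_exists;

    RETURN NOT row_exists;
END
$$
LANGUAGE plpgsql;
```

In the constraint, the column names go in as string literals and the column values as (double-quoted) identifiers. Using the table and column names from the comments on the question, that would be something like:

```sql
ALTER TABLE "test table"
    ADD CHECK (is_unique('test table', ARRAY['int col'], ARRAY["int col"]::text[]));
```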

6 Comments

Thanks for your comments – I appreciate that this circumstance is an incredibly unusual one and that I'm trying to do something that I shouldn't, as well as poorly explaining my motivation (this would likely make it more confusing). Is there anything that you can see I am doing wrong in the CHECK part of my solution? Given @pozs has shown the uniqueness condition works, I am still unsure why the function doesn't work when using capitalised column names.
@obfuscation Well, your username is proving pretty accurate ;-) as you haven't provided anything close to complete and runnable enough to really answer that usefully. Try showing a complete example - CREATE TABLE, INSERTs, etc.
@obfuscation Do you understand the issue with concurrency though? That not only is your attempt to abuse a check constraint logically wrong, but it's guaranteed not to work unless you take an exclusive lock?
I do appreciate the concurrency issue. In this case, I can guarantee only a single connection will be made, however. In the process of producing a more complete runnable example, I found a small typo which I believe was causing the problems, where array['colname'] should have been array["colname"]. However, given your patience and significant input on the general problem, I have accepted your answer. Thanks for the help and advice.
Oh, and with regards to the NULL behaviour, this is something I had previously considered, and handle as part of a condition expression in the CHECK constraint (i.e., either one or more columns are null, or the values are unique). However, I had omitted this detail to try and create a more minimal working example.
