14

I have a column containing an array of authors. How can I use the ~* operator to check if any of its values match a given regular expression?

The ~* operator takes the string to check on the left and the regular expression to match on the right. The documentation says the ANY construct has to be on the right-hand side, so, obviously,

SELECT '^p' ~* ANY(authors) FROM book;

does not work, as PostgreSQL tries to match the string ^p against the expressions contained in the array.
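To make the direction problem concrete, here is a minimal sketch using literal arrays instead of the real book table (the sample values are made up for illustration). With ANY on the right, each array element is treated as the pattern and '^p' as the string being tested:

```sql
-- Each array element is used as the regular expression; '^p' is the string.
-- 'pika' and 'chu' as patterns do not match the two-character string '^p'.
SELECT '^p' ~* ANY(ARRAY['pika', 'chu']) AS elements_used_as_patterns;  -- false

-- The element 'p$' is a pattern that the string '^p' happens to satisfy.
SELECT '^p' ~* ANY(ARRAY['p$']) AS accidental_match;                    -- true

-- What is actually wanted: the element on the left, the pattern on the right.
SELECT 'pika' ~* '^p' AS wanted_direction;                              -- true
```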

Any idea?

6
  • You should normalize your data, so you don't have "an array" in a single column. Commented Feb 28, 2014 at 15:14
  • Debatable... I do not need / want to maintain a table for authors. It could be telephone numbers though. Commented Feb 28, 2014 at 15:36
  • Well, whether or not you want to maintain a separate table, this is precisely why everyone says to do so. Searching a list within a single column always gets messy, sooner or later. Commented Feb 28, 2014 at 15:38
  • Well, it works with symmetrical operators like =, and it does a great job. Commented Feb 28, 2014 at 15:43
  • That's fine; knock yourself out. Just a friendly tip from someone who has also had tables that worked fine for this kind of thing, until they didn't. :) I don't know the answer to your question, or I'd try to help that way, as well. Commented Feb 28, 2014 at 15:45

7 Answers

13

The first obvious idea is to use your own regexp-matching operator with commuted arguments:

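-- Same as ~*, but with the pattern on the left and the string on the right.
create function commuted_regexp_match(text,text) returns bool as
'select $2 ~* $1;'
language sql;

-- The operator name is arbitrary; procedure takes the bare function name.
create operator ~!@# (
 procedure=commuted_regexp_match,
 leftarg=text, rightarg=text
);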

Then you may use it like this:

SELECT '^p' ~!@# ANY(authors) FROM book;

Another way of looking at it is to unnest the array and formulate in SQL the equivalent of the ANY construct:

select bool_or(r) from 
  (select author ~* '^j' as r
    from (select unnest(authors) as author from book) s1 
  ) s2;
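Note that the query above aggregates over every author of every book, so it yields a single boolean for the whole table. If one result per book is wanted, a correlated subquery is one possible variation. This is only a sketch: the book_demo table, its title column, and the sample rows are hypothetical stand-ins for the real book table.

```sql
-- Hypothetical sample data standing in for the asker's "book" table.
create temp table book_demo (title text, authors text[]);
insert into book_demo values
  ('A', array['Paul', 'Anna']),
  ('B', array['John', 'Mary']);

-- One boolean per book: bool_or only sees that book's own authors.
-- coalesce turns the NULL produced by an empty array into false.
select b.title,
       (select coalesce(bool_or(author ~* '^p'), false)
          from unnest(b.authors) as author) as has_match
from book_demo b;
```

The same subquery can be moved into a WHERE clause to filter books instead of labelling them.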

7
SELECT * FROM book WHERE EXISTS (SELECT * FROM unnest(authors) AS x WHERE x ~* '^p');


4

Here's an idea if you can make reasonable assumptions about the data. Just concatenate the array into a string and do a regex-search against the whole string.

select array_to_string(ARRAY['foo bar', 'moo cow'], ',') ~ 'foo'

Off the cuff, and without any measurements to back me up, I would say that most performance issues related to the regex stuff could be dealt with by smart uses of regex, and maybe some special delimiter characters. Creating the string may be a performance issue, but I wouldn't even dare to speculate on that.
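As an illustration of the delimiter idea, an anchored pattern such as ^p can be rewritten so that it matches at the start of any element rather than only at the start of the concatenated string. This is a sketch that assumes no element contains the delimiter (a comma here); the sample values are made up.

```sql
-- Assumption: no array element contains the delimiter ','.
-- '(^|,)p' means "p at the very start, or right after a delimiter",
-- which is the start of some element.
select array_to_string(array['foo bar', 'pat'], ',')     ~* '(^|,)p' as some_element_starts_with_p,
       array_to_string(array['foo bar', 'moo cap'], ',') ~* '(^|,)p' as no_element_starts_with_p;
```

If the data may contain commas, a delimiter that cannot occur in the values would have to be chosen instead.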


2

I use this:

create or replace function regexp_match_array(a text[], regexp text)
returns boolean
strict immutable
language sql as $_$
select exists (select * from unnest(a) as x where x ~ regexp);
$_$;

comment on function regexp_match_array(text[], text) is
  'returns TRUE if any element of a matches regexp';

create operator ~ (
 procedure=regexp_match_array,
 leftarg=text[], rightarg=text
);

comment on operator ~(text[], text) is
  'returns TRUE if any element of ARRAY (left) matches REGEXP (right); think ANY(ARRAY) ~ REGEXP';

Then use it much like you'd use ~ with text scalars:

=> select distinct gl from x where gl ~ 'SH' and array_length(gl,1) < 7;
┌──────────────────────────────────────┐
│                  gl                  │
├──────────────────────────────────────┤
│ {MSH6}                               │
│ {EPCAM,MLH1,MSH2,MSH6,PMS2}          │
│ {SH3TC2}                             │
│ {SHOC2}                              │
│ {BRAF,KRAS,MAP2K1,MAP2K2,SHOC2,SOS1} │
│ {MSH2}                               │
└──────────────────────────────────────┘
(6 rows)


1

You can define your own operator to do what you want.

Reverse the order of the arguments and call the appropriate function:

create function revreg (text, text) returns boolean 
language sql immutable 
as $$ select texticregexeq($2,$1); $$;

(revreg ... please choose your favorite name).

Add a new operator using our revreg() function:

CREATE OPERATOR ### (
    PROCEDURE = revreg,
    LEFTARG = text,
    RIGHTARG = text
 );

Test:

 test=# SELECT '^p' ### ANY(ARRAY['ika', 'pchu']);
  t
 test=# SELECT '^p' ### ANY(ARRAY['ika', 'chu']);
  f
 test=# SELECT '^p' ### ANY(ARRAY['pika', 'pchu']);
  t
 test=# SELECT '^p' ### ANY(ARRAY['pika', 'chu']);
  t

Note that you may want to set the JOIN and RESTRICT clauses on the new operator to help the planner.


1

My solution

SELECT a.* FROM books a
CROSS JOIN LATERAL (
   SELECT author
   FROM unnest(authors) author
   WHERE author ~ E'p$'
   LIMIT 1
)b;

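This uses CROSS JOIN LATERAL: the subquery is evaluated for every row of the table "books". If any of the rows returned by unnest meets the condition, the subquery returns exactly one row (because of the LIMIT), so the book appears once in the result; otherwise it returns no rows and the book is dropped.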


1

I use a generalization of Reece's approach:

select format($$
    create function %1$s(a text[], regexp text) returns boolean
    strict immutable language sql as
    %2$L;
    create operator %3$s (procedure=%1$s, leftarg=text[], rightarg=text);
    $$, /*1*/nameprefix||'_array_'||oname, /*2*/q1||o||q2, /*3*/oprefix||o
  )
from (values
        ('tilde'      , '~'  ), ('bang_tilde'      , '!~'  ),
        ('tilde_star' , '~*' ), ('bang_tilde_star' , '!~*' ),
        ('dtilde'     , '~~' ), ('bang_dtilde'     , '!~~' ),
        ('dtilde_star', '~~*'), ('bang_dtilde_star', '!~~*')
     ) as _(oname, o),
     (values
        ('any', '',  'select exists (select * from unnest(a) as x where x ', ' regexp);'),
        ('all', '@', 'select true = all (select x ', ' regexp from unnest(a) as x);')
     ) as _2(nameprefix, oprefix, q1, q2)
\gexec

Executing this in psql creates 16 functions and 16 operators: array versions of the 8 applicable matching operators with ANY semantics, plus 8 variants prefixed with @ that implement the ALL equivalent.

Very handy!
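For reference, here is what the script above should generate for the ~* row, written out by hand from the format template (so treat it as a sketch rather than captured output), followed by a small usage query with made-up values:

```sql
-- ANY variant: true if at least one element matches.
create function any_array_tilde_star(a text[], regexp text) returns boolean
strict immutable language sql as
'select exists (select * from unnest(a) as x where x ~* regexp);';
create operator ~* (procedure=any_array_tilde_star, leftarg=text[], rightarg=text);

-- ALL variant: true only if every element matches.
create function all_array_tilde_star(a text[], regexp text) returns boolean
strict immutable language sql as
'select true = all (select x ~* regexp from unnest(a) as x);';
create operator @~* (procedure=all_array_tilde_star, leftarg=text[], rightarg=text);

-- 'pika' starts with p but 'chu' does not.
select array['pika', 'chu'] ~*  '^P' as any_match,
       array['pika', 'chu'] @~* '^P' as all_match;
```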
