I am trying to extract a substring from a text column using a regular expression, but in some cases, there are multiple instances of that substring in the string.

In those cases, I am finding that the query does not return the first occurrence of the substring. Does anyone know what I am doing wrong?

For example:

If I have this data:

create table data1
(full_text text, name text);

insert into data1 (full_text)
values ('I 56, donkey, moon, I 92');

I am using

UPDATE data1
SET name = substring(full_text from '%#"I ([0-9]{1,3})#"%' for '#');

and I want to get 'I 56', not 'I 92'.

  • Please add the definition of the table (as create table) , some sample data (ideally as insert statements) and the expected output based on that data (formatted text please no screen shots) Commented Feb 12, 2016 at 20:52
  • Edit your question. Do not post code in comments. Commented Feb 12, 2016 at 20:57
  • split_part(yourColumn, 'delimiter', 1) Commented Feb 12, 2016 at 20:59

3 Answers

You can use regexp_matches() instead:

update data1
  set name = (regexp_matches(full_text, 'I [0-9]{1,3}'))[1];

As no additional flag is passed, regexp_matches() returns only the first match. It returns an array, though, so you need to pick the first (and only) element from the result (that's the [1] part).

It is probably a good idea to limit the update to rows that match the regex in the first place:

update data1
  set name = (regexp_matches(full_text, 'I [0-9]{1,3}'))[1]
where full_text ~ 'I [0-9]{1,3}';

4 Comments

I find it a little weird that regexp_matches() is implemented so that, by default, it returns an array with a single item... they should have added a regexp_match() function.
In my encounter with this issue I was extremely confused about why my array access wasn't working. I find this to be a related question/answer: Why is it necessary to add parenthesis when accessing an array returned by a fun
I have a big performance issue looking for a pattern in a narrative, because regex_match goes through the whole text searching for possible matches. I only need the first match, which is usually at the beginning of the narrative, and then to exit right away, so that it can go through thousands of records more efficiently.
... perhaps regexp_substr() in PostgreSQL version 15 can do the job of getting only the first one.
Try the following expression. It will return the first occurrence:

SUBSTRING(full_text, 'I [0-9]{1,3}')
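
As a minimal sketch against the sample table from the question (assuming the goal is to fill the name column, as in the asker's UPDATE): the two-argument form treats the pattern as a POSIX regular expression and returns the text of the first match, or NULL when nothing matches.

```sql
-- First match only: yields 'I 56' for the sample string.
select substring('I 56, donkey, moon, I 92' from 'I [0-9]{1,3}');

-- The same expression used in the UPDATE from the question.
update data1
  set name = substring(full_text from 'I [0-9]{1,3}')
where full_text ~ 'I [0-9]{1,3}';
```

Unlike regexp_matches(), this returns plain text rather than an array, so no [1] subscript is needed.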

You can use regexp_match() in PostgreSQL 10+:

select regexp_match('I 56, donkey, moon, I 92', 'I [0-9]{1,3}');

Quote from documentation:

In most cases regexp_matches() should be used with the g flag, since if you only want the first match, it's easier and more efficient to use regexp_match(). However, regexp_match() only exists in PostgreSQL version 10 and up. When working in older versions, a common trick is to place a regexp_matches() call in a sub-select...
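
A sketch of how this might look for the table in the question. regexp_match() returns a text array (or NULL when there is no match), so the element still has to be picked out with [1]. The second statement illustrates the sub-select trick for older versions; wrapping the call this way is my reading of the documentation's hint and is not part of the quoted text.

```sql
-- PostgreSQL 10+: regexp_match() returns text[], so take element 1.
update data1
  set name = (regexp_match(full_text, 'I [0-9]{1,3}'))[1]
where full_text ~ 'I [0-9]{1,3}';

-- Older versions: place regexp_matches() in a sub-select so that
-- rows without a match yield NULL instead of disappearing.
select full_text,
       (select (regexp_matches(full_text, 'I [0-9]{1,3}'))[1]) as name
from data1;
```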
