84

How can I count the number of occurrences of a substring within a string in PostgreSQL?


Example:

I have a table

CREATE TABLE test."user"
(
  uid integer NOT NULL,
  name text,
  result integer,
  CONSTRAINT pkey PRIMARY KEY (uid)
)

I want to write a query so that the result contains column how many occurrences of the substring o the column name contains. For instance, if in one row, name is hello world, the column result should contain 2, since there are two o in the string hello world.

In other words, I'm trying to write a query that would take as input:

enter image description here

and update the result column:

enter image description here


I am aware of the function regexp_matches and its g option, which indicates that the full (g = global) string needs to be scanned for the presence of all occurrences of the substring).

Example:

SELECT * FROM regexp_matches('hello world', 'o', 'g');

returns

{o}
{o}

and

SELECT COUNT(*)  FROM regexp_matches('hello world', 'o', 'g');

returns

2

But I don't see how to write an UPDATE query that would update the result column in such a way that it would contain how many occurrences of the substring o the column name contains.

1

6 Answers 6

96

A common solution is based on this logic: replace the search string with an empty string and divide the difference between old and new length by the length of the search string

(CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, 'substring', ''))) 
/ CHAR_LENGTH('substring')

Hence:

UPDATE test."user"
SET result = 
    (CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, 'o', ''))) 
    / CHAR_LENGTH('o');
Sign up to request clarification or add additional context in comments.

3 Comments

This is a solid answer, and it's right. You may be interested on my write up all the methods of doing this
Thanks! Does anyone know, why there is no simpler way? I mean, REPLACE already goes through the trouble of scanning the whole string for all the occurrences, why not have something that does half of the work of REPLACE - just count the occurrences
@AleksandrLevchuk: Well, you can write your own User Defined Function doing this calculation, e.g. there's Oracle's REGEXP_COUNT in enterprisedb.com/docs/en/9.5/eeguide/….
78

A Postgres'y way of doing this converts the string to an array and counts the length of the array (and then subtracts 1):

select array_length(string_to_array(name, 'o'), 1) - 1

Note that this works with longer substrings as well.

Hence:

update test."user"
    set result = array_length(string_to_array(name, 'o'), 1) - 1;

5 Comments

If someone needs regexp, this solution with "regexp_split_to_array" instead of "string_to_array" works too.
This solution is substantially slower than @dnoeth's suggestion. I don't think it's more-Postgres-y. When things are faster and more portable in a different method, I think we call that better. =)
@EvanCarroll Unfortunately, dnoeth's answer won't work for regex matches, since you may not know the length of the match. This answer will work for both regex matches and raw string matches. I think what we call better is the solution that works for everything you are trying to do :)
upvoted but i don't see what's "postgres'y" about it; let's not borrow pythonic characterizations :)
@user2297550 . . . Most databases don't support arrays natively.
12

Other way:

UPDATE test."user" SET result = length(regexp_replace(name, '[^o]', '', 'g'));

1 Comment

This is the simplest way.
9

Return count of character,

 SELECT (LENGTH('1.1.1.1') - LENGTH(REPLACE('1.1.1.1','.',''))) AS count
--RETURN COUNT OF CHARACTER '.'

3 Comments

This was the most readable solution for me.
This works fine for just one character, but the division is needed in the accepted answer for substrings longer than one character
@StonedTensor - you could try with the TRANSLATE() function!
7
Occcurence_Count = LENGTH(REPLACE(string_to_search,string_to_find,'~'))-LENGTH(REPLACE(string_to_search,string_to_find,''))

This solution is a bit cleaner than many that I have seen, especially with no divisor. You can turn this into a function or use within a Select.
No variables required. I use tilde as a replacement character, but any character that is not in the dataset will work.

Comments

1
SELECT array_length(string_to_array('a long name here', 'o'),1)
  • 1 is for 1-dimension array
  • 'o' is the occurrence you want to count

1 Comment

Clever :) To be exact, you need to subtract one at the end. E.g. on one occurance the array will have the length == 2

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.