How to remove delimited sections from text in PostgreSQL?

Question

I want to remove certain parameters from a string. The string is pipe-delimited, and the parameters do not always appear next to each other or in the same order.

This is my string

TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3

I want to eliminate TType=SEND and URL=min://j?_a=3&ver=1.1

Therefore my final result should be

Status=OK|day=3

What I have tried (not working in PostgreSQL):

select REGEXP_REPLACE('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3', 
'(TType=.*?(\||$))|(URL=.*?(\||$))', '')

@WiktorStribiżew It has failed to eliminate the URL parameter within its delimiter.
– Omari Victor Omosa, Commented Feb 15, 2021 at 15:11
The string is dynamic, not always with the same content, so TType does not always occur at the start.
– Omari Victor Omosa, Commented Feb 15, 2021 at 15:12

theodim · Accepted Answer · 2021-02-24 13:17:56Z

Answer:

SELECT 
REGEXP_REPLACE(
 REGEXP_REPLACE('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3',
  '(TType|URL)=[^|]*\|?', '','g'),
'\|$', '');

Explanation:

The .*? part in your pattern, although not greedy, consumes pipes as well, so it doesn't behave as intended. This is fixed by [^|]*, which consumes any non-pipe character, zero or more times.
Then you would also need to add the global flag 'g' in order to replace all occurrences of the pattern, as described in the documentation.
Finally, in case a parameter you need to eliminate occurs last (since the parameters can appear in any order), you need to add an extra replacement step to eliminate a residual pipe at the end of the string.

For example, without the extra step, the following

SELECT
REGEXP_REPLACE('Status=OK|URL=min://j?_a=3&ver=1.1|day=3|TType=SEND',
  '(TType|URL)=[^|]*\|?', '','g');

produces

Status=OK|day=3|

while, adding the extra step, the following

SELECT 
REGEXP_REPLACE(
 REGEXP_REPLACE('Status=OK|URL=min://j?_a=3&ver=1.1|day=3|TType=SEND',
  '(TType|URL)=[^|]*\|?', '','g'),
'\|$', '');

produces the desired

Status=OK|day=3
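
A possible single-pass variant, offered only as a sketch and not part of the answer above: matching the delimiter in front of each parameter (or the start of the string) rather than the one behind it leaves at most a leading pipe, which ltrim can strip. It also keeps the pattern from firing inside a longer key. The second and third sample strings, including the MyURL key, are made up for illustration.

```sql
-- Sketch: remove the TType and URL parameters by matching the pipe
-- (or the start of the string) in front of each one, then strip a leading pipe.
SELECT s AS input,
       ltrim(regexp_replace(s, '(^|\|)(TType|URL)=[^|]*', '', 'g'), '|') AS cleaned
FROM (VALUES
        ('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3'),  -- cleaned: Status=OK|day=3
        ('Status=OK|URL=min://j?_a=3&ver=1.1|day=3|TType=SEND'),  -- cleaned: Status=OK|day=3
        ('MyURL=x|Status=OK')                                     -- cleaned: MyURL=x|Status=OK (unchanged)
     ) AS v(s);
```

With the unanchored pattern (TType|URL)=[^|]*\|? the last row would lose the "URL=x|" part of its first parameter, because nothing requires the key to start at a parameter boundary.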

S-Man · Accepted Answer · 2021-02-15 15:08:09Z

Step-by-step demo: db<>fiddle

SELECT
    string_agg(elements,'|')                                                 -- 3
FROM mytable,
    regexp_split_to_table(mystring, '\|') as elements                        -- 1
WHERE split_part(elements, '=', 1) = ANY(ARRAY['TType', 'URL']) IS NOT TRUE  -- 2

1. Split the string into parameters of the form A=B and move each one into a separate record.
2. Split each element at the = character and keep only the elements whose key is neither TType nor URL.
3. Finally, aggregate the remaining elements back into a single pipe-delimited string.
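
As a self-contained sketch of these three steps: the sample rows, the id column, and the aliases t and e below are assumptions made for illustration, while mytable and mystring are the names used in the query above. This variant also adds WITH ORDINALITY (PostgreSQL 9.4 or later) so that the remaining parameters are reassembled in their original order, and groups by row so that each string is handled separately.

```sql
-- Hypothetical sample data standing in for mytable(mystring).
WITH mytable(id, mystring) AS (
    VALUES
        (1, 'TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3'),
        (2, 'Status=OK|URL=min://j?_a=3&ver=1.1|day=3|TType=SEND')
)
SELECT
    t.id,
    string_agg(e.element, '|' ORDER BY e.ord) AS cleaned      -- 3: rejoin in original order
FROM mytable AS t
CROSS JOIN LATERAL regexp_split_to_table(t.mystring, '\|')
           WITH ORDINALITY AS e(element, ord)                 -- 1: one row per parameter
WHERE (split_part(e.element, '=', 1) = ANY (ARRAY['TType', 'URL'])) IS NOT TRUE  -- 2: drop unwanted keys
GROUP BY t.id
ORDER BY t.id;
-- Both rows yield: Status=OK|day=3
```

A row whose parameters are all filtered out disappears from this result, because no elements are left to group.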

answered Feb 15, 2021 at 15:08

9 Comments

Omari Victor Omosa Over a year ago

If only I could have this in the SELECT statement, before the FROM.

S-Man Over a year ago

Which part? The WHERE or the regexp_split_to_table?

S-Man Over a year ago

Works with subquery: dbfiddle.uk/…

S-Man Over a year ago

There you can use a FILTER clause instead of the WHERE clause: dbfiddle.uk/…

Omari Victor Omosa Over a year ago

The solution is okay; however, I would want to have it between SELECT and FROM, i.e. select (the solution here) from table.

Lukasz Szozda · Accepted Answer · 2021-02-23 20:43:02Z

S-Man's answer is a working one 👍

The asker commented: "Sure, upvoted. The solution is okay; however, it does not fully satisfy my question, since I would want the solution to be between SELECT and FROM."

If this is a "mandatory" requirement then I see the following options:

- create a function
- use a LATERAL JOIN to keep all the logic in one place (related: "PostgreSQL: using a calculated column in the same query")

The final query may look like:

SELECT t.*, s.result
FROM t
LEFT JOIN LATERAL (
   SELECT string_agg(elements,'|') AS result
   FROM regexp_split_to_table(t.col, '\|') as elements
   WHERE split_part(elements, '=', 1) = ANY(ARRAY['TType', 'URL']) IS NOT TRUE) s ON TRUE

db<>fiddle demo

Alternatively by using subquery in SELECT list:

SELECT t.*, 
(
   SELECT string_agg(elements,'|') AS result
   FROM regexp_split_to_table(t.col, '\|') as elements
   WHERE split_part(elements, '=', 1) = ANY(ARRAY['TType', 'URL']) IS NOT TRUE
) AS result
FROM t

db<>fiddle demo 2
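
The first option, creating a function, is not shown above. A minimal sketch, assuming a hypothetical function name remove_params and hypothetical parameter names p_input and p_keys, could wrap the same split, filter, and aggregate logic:

```sql
-- Hypothetical helper; the name, signature, and parameter names are illustrative only.
CREATE OR REPLACE FUNCTION remove_params(p_input text, p_keys text[])
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT string_agg(e.element, '|' ORDER BY e.ord)
    FROM regexp_split_to_table(p_input, '\|') WITH ORDINALITY AS e(element, ord)
    WHERE (split_part(e.element, '=', 1) = ANY (p_keys)) IS NOT TRUE
$$;

SELECT remove_params('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3',
                     ARRAY['TType', 'URL']);
-- Status=OK|day=3
```

With this in place the call sits between SELECT and FROM, for example SELECT t.*, remove_params(t.col, ARRAY['TType', 'URL']) AS result FROM t. The function returns NULL when every parameter is removed, since string_agg then has no rows to aggregate.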

Haleemur Ali · Accepted Answer · 2021-02-24 14:38:24Z

The following regex based solution should do the trick:

SELECT TRIM(REGEXP_REPLACE(
         'TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3', 
         '(TType|URL)=[^|]*(\||$)', '', 'g'), '|')
-- outputs:
-- Status=OK|day=3

How the pattern works:

(TType|URL)=[^|]*(\||$)
|-----------|----|-----
1           2    3

1. the pattern starts consuming at any substring that begins with either TType or URL followed by =
2. the pattern consumes any character that is not |
3. the pattern consumes either the | or the end of the string

The g flag is described in the documentation as

flag g specifies replacement of each matching substring rather than only the first one.

It is necessary here as we want to replace all substrings that match our pattern.

Finally, when a removed parameter was the last one in the string, a single | character remains at the end. TRIM with '|' as its second argument removes it (it strips | characters from both ends of the result).
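
To see how the order of the parameters affects the result, the same expression can be run over a few orderings. This is only a sketch; the second and third sample strings are made up for illustration.

```sql
SELECT s AS input,
       TRIM(REGEXP_REPLACE(s, '(TType|URL)=[^|]*(\||$)', '', 'g'), '|') AS cleaned
FROM (VALUES
        ('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3'),  -- cleaned: Status=OK|day=3
        ('Status=OK|day=3|URL=min://j?_a=3&ver=1.1'),             -- removed parameter is last; cleaned: Status=OK|day=3
        ('URL=min://j?_a=3&ver=1.1|TType=SEND')                   -- everything is removed; cleaned: empty string
     ) AS v(s);
```

In the second row the replacement alone leaves Status=OK|day=3| and it is the TRIM call that drops the final pipe.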

edited Feb 24, 2021 at 14:38

answered Feb 24, 2021 at 4:09

2 Comments

Omari Victor Omosa Over a year ago

Thanks, I have tried (Status|TType|day)=[^|]*(\||$) and it is OK. Is it possible to remove the pipe at the end with the above regex? This is the result: URL=min://j?_a=3&ver=1.1|

Haleemur Ali Over a year ago

@OmariVictorOmosa, see the updated answer. Using the TRIM function, the trailing | character can be removed.

Steve Chambers · Accepted Answer · 2021-02-24 16:27:58Z

There were a few issues with your attempted regular expression:

- Even though a non-greedy .*? match was used, this could still include pipe symbols. This can be rectified by using a matcher that allows anything except a pipe symbol (this can be greedy): [^|]*
- It should use the 'g' flag to replace all occurrences, not just the first.
- It only looks for the pipe at the end, not at the beginning. This means it will leave the last pipe intact at the end if it matches the string after the last pipe (i.e. URL=... in your example).

From addressing the points above, here is a working version:

Rextester demo: https://rextester.com/CYBP40923
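
The query for this answer is not reproduced above. Going by the pattern the author gives in the comments below for the keys Status, TType and day, a sketch adapted to the TType and URL keys from the question might look like the following. The adaptation is an assumption, not necessarily the author's exact code.

```sql
-- Match a parameter together with ONE adjacent pipe: either the pipe that
-- follows it or the pipe that precedes it, but never both.
SELECT REGEXP_REPLACE(
         'TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3',
         '((TType|URL)=[^|]*[|]|[|](TType|URL)=[^|]*)', '', 'g');
-- Status=OK|day=3
```

Because every match must include a pipe, a string that consists of a single parameter, such as 'TType=SEND', is left unchanged by this pattern.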

edited Feb 24, 2021 at 16:27

answered Feb 24, 2021 at 12:22

2 Comments

Omari Victor Omosa Over a year ago

Great solution. Why does (Status|TType|day)=[^|]*([|]|$) leave a pipe at the end of the result?

Steve Chambers Over a year ago

It's because the regex is only looking for pipe symbols (or alternatively the end of the text) at the end of each match not at the beginning. If you need to deal with that then unfortunately you will need some repetition as you need to be looking for the pipe at either the start or the end but not both: ((Status|TType|day)=[^|]*[|]|[|](Status|TType|day)=[^|]*). (Have now edited my answer accordingly).
