
I want to remove certain parameters from a string. The string is pipe-delimited, and the parameters I want to remove are not always next to each other.

This is my string

TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3

I want to eliminate TType=SEND and URL=min://j?_a=3&ver=1.1

Therefore my final result should be

Status=OK|day=3

Here is what I have tried; it does not work in PostgreSQL:

select REGEXP_REPLACE('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3', 
'(TType=.*?(\||$))|(URL=.*?(\||$))', '')
  • @WiktorStribiżew It has failed to eliminate the URL parameter within its delimiter Commented Feb 15, 2021 at 15:11
  • The string is dynamic not always with same content, so TType does not always occur at the start Commented Feb 15, 2021 at 15:12
  • Ah, ok, I see, these params are consecutive. Commented Feb 15, 2021 at 15:12
  • @WiktorStribiżew any luck? Commented Feb 15, 2021 at 18:46

5 Answers


Answer:

SELECT 
REGEXP_REPLACE(
 REGEXP_REPLACE('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3',
  '(TType|URL)=[^|]*\|?', '','g'),
'\|$', '');

Explanation:

  1. The .*? part in your pattern, although written as non-greedy, can still consume pipes, so it doesn't behave as intended. This is fixed by [^|]*, which consumes any non-pipe character, zero or more times.

  2. You also need to add the global flag 'g' to replace all occurrences of the pattern, as described in the documentation.

  3. Finally, in case a parameter you need to eliminate occurs last (since the parameters can appear in any order), you need an extra replacement step to eliminate a residual pipe at the end of the string.

For example without the extra step, the following

SELECT
REGEXP_REPLACE('Status=OK|URL=min://j?_a=3&ver=1.1|day=3|TType=SEND',
  '(TType|URL)=[^|]*\|?', '','g');

produces

Status=OK|day=3|

while, adding the extra step, the following

SELECT 
REGEXP_REPLACE(
 REGEXP_REPLACE('Status=OK|URL=min://j?_a=3&ver=1.1|day=3|TType=SEND',
  '(TType|URL)=[^|]*\|?', '','g'),
'\|$', '');

produces the desired

Status=OK|day=3
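
Since the parameters usually live in a table column, the same two-step replacement can be applied per row inside the SELECT list. The following is a minimal sketch, assuming a hypothetical table t with a text column col (both names are illustrative and not from the question) and standard_conforming_strings left at its default of on:

```sql
-- t and col are hypothetical names; the VALUES list stands in for real data.
WITH t(col) AS (
    VALUES ('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3'),
           ('Status=OK|URL=min://j?_a=3&ver=1.1|day=3|TType=SEND')
)
SELECT col,
       REGEXP_REPLACE(
           -- inner call: drop every TType/URL parameter and the pipe after it
           REGEXP_REPLACE(col, '(TType|URL)=[^|]*\|?', '', 'g'),
           -- outer call: drop a pipe left at the end of the string
           '\|$', '') AS cleaned
FROM t;
-- cleaned is Status=OK|day=3 for both rows
```

The second row covers the case where a removed parameter comes last, which is the case the outer REGEXP_REPLACE exists for.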

Step-by-step demo: db<>fiddle

SELECT
    string_agg(elements,'|')                                                 -- 3
FROM mytable,
    regexp_split_to_table(mystring, '\|') as elements                        -- 1
WHERE split_part(elements, '=', 1) = ANY(ARRAY['TType', 'URL']) IS NOT TRUE  -- 2
  1. Split the string at the | character into params like A=B, and move each one into a separate row.
  2. Split each element at the = character and keep only the elements whose key is neither TType nor URL.
  3. Finally, aggregate the remaining elements back into a single |-delimited string.
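
To see what steps 1 and 2 produce before the aggregation, the intermediate rows can be listed directly. The sketch below is an illustration and not part of the original answer: the CTE stands in for mytable, the alias names are made up, and WITH ORDINALITY plus ORDER BY are added assumptions to make the element order explicit, since string_agg without ORDER BY does not guarantee one.

```sql
-- Steps 1 and 2: one row per parameter, with its key extracted.
WITH mytable(mystring) AS (
    VALUES ('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3')
)
SELECT e.ord,
       e.element,
       split_part(e.element, '=', 1) AS param_key
FROM mytable,
     regexp_split_to_table(mystring, '\|') WITH ORDINALITY AS e(element, ord)
ORDER BY e.ord;
-- ord | element                  | param_key
--   1 | TType=SEND               | TType
--   2 | Status=OK                | Status
--   3 | URL=min://j?_a=3&ver=1.1 | URL
--   4 | day=3                    | day

-- Step 3, with the original order made explicit in the aggregate.
SELECT string_agg(e.element, '|' ORDER BY e.ord) AS result
FROM regexp_split_to_table(
         'TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3', '\|')
     WITH ORDINALITY AS e(element, ord)
WHERE split_part(e.element, '=', 1) <> ALL (ARRAY['TType', 'URL']);
-- Status=OK|day=3
```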

9 Comments

I would like to have this in the SELECT list, before the FROM.
Which part? The WHERE or the regexp_split_to_table?
Works with a subquery: dbfiddle.uk/…
There you can use a FILTER clause instead of the WHERE clause: dbfiddle.uk/…
The solution is okay; however, I want the solution to sit between SELECT and FROM, i.e. select (the solution here) from table

The S-Man's answer is a working one 👍

Sure, upvoted. The solution is okay; however, it does not fully satisfy my question, since I want the solution to sit between SELECT and FROM.

If this is a "mandatory" requirement then I see the following options:

  1. create a function
  2. use a LATERAL JOIN to keep all the logic in one place (related: PostgreSQL: using a calculated column in the same query)

The final query may look like:

SELECT t.*, s.result
FROM t
LEFT JOIN LATERAL (
   SELECT string_agg(elements,'|') AS result
   FROM regexp_split_to_table(t.col, '\|') as elements
   WHERE split_part(elements, '=', 1) = ANY(ARRAY['TType', 'URL']) IS NOT TRUE) s ON TRUE

db<>fiddle demo

Alternatively by using subquery in SELECT list:

SELECT t.*, 
(
   SELECT string_agg(elements,'|') AS result
   FROM regexp_split_to_table(t.col, '\|') as elements
   WHERE split_part(elements, '=', 1) = ANY(ARRAY['TType', 'URL']) IS NOT TRUE
) AS result
FROM t

db<>fiddle demo 2
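
Option 1 from the list above (create a function) could be sketched as follows. The function name remove_params, its parameter names, and the IMMUTABLE marking are illustrative assumptions and not part of the original answer; WITH ORDINALITY is added so that the remaining parameters keep their original order.

```sql
-- Hypothetical helper: remove_params, p_params and p_keys are made-up names.
CREATE OR REPLACE FUNCTION remove_params(p_params text, p_keys text[])
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT string_agg(e.element, '|' ORDER BY e.ord)
    FROM regexp_split_to_table(p_params, '\|') WITH ORDINALITY AS e(element, ord)
    -- keep only the elements whose key is not in the list of keys to remove
    WHERE split_part(e.element, '=', 1) <> ALL (p_keys)
$$;

SELECT remove_params('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3',
                     ARRAY['TType', 'URL']);
-- Status=OK|day=3
```

With such a function the call fits between SELECT and FROM, for example SELECT remove_params(t.col, ARRAY['TType', 'URL']) FROM t. Note that it returns NULL when every parameter is removed, because string_agg over zero rows yields NULL.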


The following regex based solution should do the trick:

SELECT TRIM(REGEXP_REPLACE(
         'TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3', 
         '(TType|URL)=[^|]*(\||$)', '', 'g'), '|')
-- outputs:
-- Status=OK|day=3

How the pattern works:

(TType|URL)=[^|]*(\||$)
|-----------|----|-----
1           2    3
  1. the pattern starts consuming if any substring starts with either TType or URL followed by =
  2. the pattern consumes any character that is not |
  3. the pattern consumes either the | or the end of the string

The g flag is described in the documentation as

flag g specifies replacement of each matching substring rather than only the first one.

It is necessary here as we want to replace all substrings that match our pattern.

Finally, a single | character might sometimes remain at the end of the string. Any leading or trailing | character is trimmed from the result using TRIM.
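
One caveat with the pattern above: it is not anchored to the start of a parameter, so it would also match the URL=... part of a longer key such as BaseURL. A possible variation, offered here as a sketch and not as part of the original answer, anchors the match at the start of the string or at the preceding pipe. The BaseURL=x parameter is a made-up addition to the sample string for illustration.

```sql
-- (^|\|) requires the key to start at the beginning of the string or right after a pipe.
SELECT TRIM(REGEXP_REPLACE(
         'TType=SEND|Status=OK|BaseURL=x|URL=min://j?_a=3&ver=1.1|day=3',
         '(^|\|)(TType|URL)=[^|]*', '', 'g'), '|');
-- Status=OK|BaseURL=x|day=3
```

Because each match now includes the pipe in front of the parameter, the only leftover is a leading pipe when the first parameter is removed, and TRIM takes care of that.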

2 Comments

Thanks, I have tried (Status|TType|day)=[^|]*(\||$) and it is OK. Is it possible to remove the pipe at the end with the above regex? This is the result: URL=min://j?_a=3&ver=1.1|
@OmariVictorOmosa, see the updated answer. Using the TRIM function, the trailing | character can be removed.

There were a few issues with your attempted regular expression:

  1. Even though a non-greedy .*? match was used, this could still include pipe symbols. This can be rectified by using a matcher that allows anything except a pipe symbol (this can be greedy): [^|]*
  2. It should use the 'g' flag to replace all occurrences, not just the first.
  3. It only looks for the pipe at the end of each match, not at the beginning. This means a trailing pipe is left behind when the removed parameter is the last one in the string.

From addressing the points above, here is a working version:

SELECT REGEXP_REPLACE('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3', '((Status|TType)=[^|]*[|]|[|](Status|TType)=[^|]*)', '', 'g')

Rextester demo: https://rextester.com/CYBP40923
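
To check how the two-branch pattern behaves depending on where the removed parameter sits, it can be run against a few sample strings. This is a sketch under assumptions: the samples CTE and its column s are made up, and the keys are switched to the question's TType and URL instead of the Status and TType used above.

```sql
-- samples and s are hypothetical names used only for this illustration.
WITH samples(s) AS (
    VALUES ('TType=SEND|Status=OK|URL=min://j?_a=3&ver=1.1|day=3'),  -- first and middle
           ('Status=OK|day=3|TType=SEND'),                           -- last
           ('TType=SEND')                                            -- only parameter
)
SELECT s,
       REGEXP_REPLACE(s,
           '((TType|URL)=[^|]*[|]|[|](TType|URL)=[^|]*)', '', 'g') AS cleaned
FROM samples;
-- cleaned, row by row:
--   Status=OK|day=3
--   Status=OK|day=3
--   TType=SEND   (unchanged: both branches require a pipe next to the parameter)
```

The last row shows an edge case: a string that consists of a single parameter is left as-is, so that case needs separate handling if it can occur.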

2 Comments

Great solution. Why does (Status|TType|day)=[^|]*([|]|$) leave a pipe at the end of the result?
It's because the regex is only looking for pipe symbols (or alternatively the end of the text) at the end of each match not at the beginning. If you need to deal with that then unfortunately you will need some repetition as you need to be looking for the pipe at either the start or the end but not both: ((Status|TType|day)=[^|]*[|]|[|](Status|TType|day)=[^|]*). (Have now edited my answer accordingly).
