I'd like to use a processing function instead of replacement text in the regexp_replace function. Is it possible to use a function that takes the matches as arguments, like:

SELECT REGEXP_REPLACE( description, '\[([is]?:?)(\d+)\]', some_custom_function(\1, \2), 'gi' ) 
  FROM some_table
  WHERE id = 123;

Of course, backreferences \1 and \2 do not work here, but is there some way to use variables instead of them?

Background: description may contain more than one string with a pattern like [12345], [i:12345] or [s:12345], and I'd like to process such matches into more elaborate strings, so I thought the best way was to write a custom function and call it with the matches from regexp_replace. I just could not find in the docs how to use the matches as variables/arguments. Is this possible, or what is a better way to accomplish my goal?

Example description field:

Lorem ipsum dolor sit amet [123], consectetur adipiscing elit, 
sed do eiusmod tempor incididunt ut labore et [234] magna aliqua.

Desired output:

Lorem ipsum dolor sit amet "one-two-three", consectetur adipiscing elit, 
sed do eiusmod tempor incididunt ut labore et "two-three-four" magna aliqua.

3 Answers

Create a function with text argument(s), e.g.:

create or replace function the_function(text, text)
returns text language sql as $$
    select format('=replacement of %s and %s=', $1, $2)
$$;

and use it in regexp_replace():

select regexp_replace(
    description, 
    '\[([is]?:?)(\d+)\]', 
    the_function('\1', '\2'), 
    'gi') 
from some_table
where id = 123;

Comments

Thank you, it is exactly what I was looking for. I did not realize that it is possible to quote the two arguments separately. Somehow it did not click...
I don't see how this answers the question. Isn't the_function called with literally \1 and \2?
@xehpuk - the syntax is important here. The lesson from this question is that you can use back references as arguments of a function in the form of text literals.
@klin But then the function never sees the matched values, only the string literals. What's the difference between the_function('\1', '\2') and '=replacement of \1 and \2='? How do you achieve the desired output?
@w.k I agree. But with this solution, you don't call the_function with the matches. I'm surprised it worked for you.
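The objection raised in these comments can be checked directly. The following is a minimal sketch, assuming PostgreSQL with standard_conforming_strings = on (the default), so that '\1' is a two-character string. spell_number is a hypothetical lookup function added only for illustration; it is not part of the answer.

```sql
-- The answer's function, repeated so the example is self-contained.
CREATE OR REPLACE FUNCTION the_function(text, text)
RETURNS text LANGUAGE sql AS $$
    SELECT format('=replacement of %s and %s=', $1, $2)
$$;

-- The call is evaluated once, before regexp_replace looks at any match:
-- its arguments are the literal strings '\1' and '\2'.
SELECT the_function('\1', '\2');
-- =replacement of \1 and \2=

-- Hypothetical lookup function that inspects its argument.
CREATE OR REPLACE FUNCTION spell_number(n text)
RETURNS text LANGUAGE sql AS $$
    SELECT CASE n WHEN '123' THEN 'one-two-three' ELSE 'no match for ' || n END
$$;

-- spell_number receives '\1', not '123', so the CASE falls through to ELSE;
-- regexp_replace then substitutes the \1 that is left in the returned text.
SELECT regexp_replace('amet [123].', '\[(\d+)\]', spell_number('\1'), 'g');
-- amet no match for 123.
```

The matched value only shows up in the output because the_function copies its arguments into its result unchanged. A function that needs the actual matched value (a lookup, a CASE, arithmetic) has to be applied to each match separately.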

This may be what you're trying to do:

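CREATE OR REPLACE FUNCTION my_replace(description TEXT)
RETURNS TEXT AS $$
  -- Pair each piece of text between matches (l) with the following match (u),
  -- map the match, and concatenate everything in its original order (ord).
  SELECT string_agg(concat(l, some_custom_function(u[1], u[2])), '' ORDER BY ord)
  FROM ROWS FROM (
    regexp_split_to_table(description, '\[(?:([is]):)?(\d+)\]', 'i'),
    regexp_matches(description, '\[(?:([is]):)?(\d+)\]', 'gi')
  ) WITH ORDINALITY AS m(l, u, ord)
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;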

I have changed your regex from \[([is]?:?)(\d+)\] to \[(?:([is]):)?(\d+)\] so that it doesn't match [i123] or [:123]. This also makes the "prefix" NULL instead of '' for matches like [123]. Revert to your original if this was not your intent.
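A quick way to compare the two patterns, sketched with built-in operators only (regexp_match requires PostgreSQL 10 or later):

```sql
-- Which bracketed forms does each pattern accept? (~* is a case-insensitive match)
SELECT s,
       s ~* '\[([is]?:?)(\d+)\]'    AS original_matches,
       s ~* '\[(?:([is]):)?(\d+)\]' AS revised_matches
FROM unnest(ARRAY['[123]', '[i:123]', '[S:234]', '[i123]', '[:123]']) AS s;
-- [i123] and [:123] match only the original pattern.

-- First capture group for [123]: '' with the original pattern, NULL with the revised one.
SELECT (regexp_match('[123]', '\[([is]?:?)(\d+)\]'))[1]    AS original_prefix,
       (regexp_match('[123]', '\[(?:([is]):)?(\d+)\]'))[1] AS revised_prefix;
```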

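An example of the custom function (create it before my_replace: with the default check_function_bodies = on, PostgreSQL validates an SQL function's body when it is created, so some_custom_function must already exist):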

CREATE OR REPLACE FUNCTION some_custom_function(prefix TEXT, number TEXT)
RETURNS TEXT AS $$
  SELECT concat((upper(prefix) || '='), CASE number
    WHEN '123' THEN '"one-two-three"'
    WHEN '234' THEN '"two-three-four"'
    ELSE CASE
      WHEN number IS NOT NULL THEN '#foo'
    END
  END)
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;

Test with no prefix, a lowercase prefix and an uppercase prefix:

SELECT unnest, my_replace(unnest)
FROM unnest(array[$$Lorem ipsum dolor sit amet [123], consectetur adipiscing elit [42],
sed do eiusmod [i:123] tempor incididunt ut labore et [S:234] magna aliqua.$$]);

The result is:

Lorem ipsum dolor sit amet "one-two-three", consectetur adipiscing elit #foo,
sed do eiusmod I="one-two-three" tempor incididunt ut labore et S="two-three-four" magna aliqua.

It basically works like this:

  1. Split/unzip the text into non-matches and matches.
  2. Map each of them.
  3. Aggregate/zip them back into a text.
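The three steps can be seen on a short sample string. This is a sketch; the sample text and the angle-bracket mapping are illustrative only. The first query shows step 1: the split returns one more piece than there are matches, so the last row has a NULL u, which the mapping has to turn into an empty string. The second query applies a trivial mapping (step 2) and concatenates the pieces (step 3), ordering by the row number from WITH ORDINALITY instead of relying on the order in which rows happen to arrive.

```sql
-- Step 1: non-matches (l) and matches (u) side by side.
SELECT m.ord, m.l, m.u
FROM ROWS FROM (
  regexp_split_to_table('a [1] b [i:2] c', '\[(?:([is]):)?(\d+)\]', 'i'),
  regexp_matches('a [1] b [i:2] c', '\[(?:([is]):)?(\d+)\]', 'gi')
) WITH ORDINALITY AS m(l, u, ord);
-- ord 1: l = 'a ',  u = {NULL,1}
-- ord 2: l = ' b ', u = {i,2}
-- ord 3: l = ' c',  u = NULL

-- Steps 2 and 3: wrap each number in <...>, then join the pieces in order.
-- For the last row u is NULL, so '<' || NULL is NULL and concat() ignores it.
SELECT string_agg(concat(l, '<' || u[2] || '>'), '' ORDER BY ord)
FROM ROWS FROM (
  regexp_split_to_table('a [1] b [i:2] c', '\[(?:([is]):)?(\d+)\]', 'i'),
  regexp_matches('a [1] b [i:2] c', '\[(?:([is]):)?(\d+)\]', 'gi')
) WITH ORDINALITY AS m(l, u, ord);
-- a <1> b <2> c
```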

Documentation for regexp_split_to_table and regexp_matches: see the "POSIX Regular Expressions" section of the PostgreSQL manual.

Comments

Is this possible or what is a better way to accomplish my goal?

I'd recommend just using a user-defined function for this and iterating through the characters with a simple parser rather than trying to incorporate it into a regular expression.

Below is a proof of concept to start off with. There is scope for improving this by using a table of values and their replacements instead of hard-coding them, which would be cleaner and certainly necessary if there are a lot more values - please let me know in the comments if this is required.

SQL

CREATE OR REPLACE FUNCTION replaceDescription(description TEXT)
   RETURNS TEXT
   LANGUAGE PLPGSQL
AS
$$
DECLARE 
   pos INT := 1;             -- current (1-based) character position
   chr CHAR;                 -- character at pos
   parsing BOOLEAN := FALSE; -- TRUE while between '[' and ']'
BEGIN
   -- CHAR_LENGTH is re-evaluated on every iteration because the string grows.
   WHILE pos <= CHAR_LENGTH(description) LOOP
      chr := SUBSTRING(description, pos, 1);
      IF chr = '[' THEN
          parsing := TRUE;
      ELSIF chr = ']' THEN
          parsing := FALSE;
      ELSIF parsing THEN
          -- Replace the digit with its word plus '-' and move pos to the end of
          -- the inserted text. Only the digits 1-4 are handled; other characters
          -- (such as an i: or s: prefix) are left unchanged.
          IF chr = '1' THEN
              description := CONCAT(LEFT(description, pos - 1),
                                    'one-',
                                    RIGHT(description, CHAR_LENGTH(description) - pos));
              pos := pos + 3;
          ELSIF chr = '2' THEN
              description := CONCAT(LEFT(description, pos - 1),
                                    'two-',
                                    RIGHT(description, CHAR_LENGTH(description) - pos));
              pos := pos + 3;
          ELSIF chr = '3' THEN
              description := CONCAT(LEFT(description, pos - 1),
                                    'three-',
                                    RIGHT(description, CHAR_LENGTH(description) - pos));
              pos := pos + 5;
          ELSIF chr = '4' THEN
              description := CONCAT(LEFT(description, pos - 1),
                                    'four-',
                                    RIGHT(description, CHAR_LENGTH(description) - pos));
              pos := pos + 4;
          END IF;
      END IF;
      pos := pos + 1;
   END LOOP;
   
   -- Remove the hyphen left before each ']'. The brackets themselves are kept,
   -- so [123] becomes [one-two-three].
   RETURN REPLACE(description, '-]', ']');
END;
$$;

Demo

Rextester demo: https://rextester.com/IBGE47544
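The answer mentions a table of values and their replacements as a cleaner alternative to hard-coding them. Below is a minimal sketch of that idea; the table digit_words and the function digits_to_words are hypothetical names, not from the answer. It returns bare words, so adding quotes or brackets around the result is left to the caller (for example a custom function like some_custom_function in the previous answer).

```sql
-- Hypothetical lookup table: one row per digit with its spelled-out word.
CREATE TABLE IF NOT EXISTS digit_words (
    digit text PRIMARY KEY,
    word  text NOT NULL
);
INSERT INTO digit_words (digit, word) VALUES
    ('0', 'zero'), ('1', 'one'), ('2', 'two'), ('3', 'three'), ('4', 'four'),
    ('5', 'five'), ('6', 'six'), ('7', 'seven'), ('8', 'eight'), ('9', 'nine')
ON CONFLICT (digit) DO NOTHING;

-- Hypothetical helper: spell a digit string such as '123' as 'one-two-three'.
-- An empty pattern makes regexp_split_to_table return single characters;
-- characters without a row in digit_words are dropped by the join.
CREATE OR REPLACE FUNCTION digits_to_words(number text)
RETURNS text LANGUAGE sql STABLE AS $$
    SELECT string_agg(w.word, '-' ORDER BY d.ord)
    FROM regexp_split_to_table(number, '') WITH ORDINALITY AS d(digit, ord)
    JOIN digit_words AS w ON w.digit = d.digit
$$;

SELECT digits_to_words('123');  -- one-two-three
SELECT digits_to_words('234');  -- two-three-four
```

The function is declared STABLE rather than IMMUTABLE because its result depends on the contents of a table.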
