0

I have the following rows (sample) in my postgresql from which I am trying to extracts words.

subset:|actuals_source:|comments:February_2021_FFRPrice_RegModelsOnly_Demand Sales Volume*CSOPprice_Commit_a57f193aede
subset:|actuals_source:|comments:china_market_CSOP_Demand Sales Volume_PFJan2021
subset:|actuals_source:zeros|comments:Functional_test_2_Cores_FWD_facing_regressor_leak_fix_commit_c596a521e07

The words I am trying to extract starts from comments: and are before Commit_c596a521e07 if the Commit word exists or it's till the end of line. (Also note that Commit can also be of lowercase as in commit)

So the extracted column should look like:

February_2021_FFRPrice_RegModelsOnly_Demand Sales Volume*CSOPprice
china_market_CSOP_Demand Sales Volume_PFJan2021
Functional_test_2_Cores_FWD_facing_regressor_leak_fix

I have tried the following pattern, but unfortunately it returns everything after Commit

substring(col_name from '.*comments:(.*?)(?:Commit|$)')

Is there a way to limit the selection, such that it returns words only before Commit (wherever Commit exists) or till the end of line.

1
  • Your data is in capture group 1 Commented May 6, 2021 at 16:27

1 Answer 1

2

You can match c in Commit in a case insensitive way using [cC] and make sure you match a _ before Commit and use

substring(col_name from 'comments:(.*?)(_[Cc]ommit|$)')

Note that the substring function returns the Group 1 value here:

If the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned

The pattern matches

  • comments: - a literal string
  • (.*?) - Group 1: any zero or more chars as few as possible
  • (_[Cc]ommit|$) - Group 2 (this value won't be returned): _Commit or _commit or end of string.

See a PostgreSQL demo:

WITH demo(col_name) AS (
   VALUES
      ('subset:|actuals_source:|comments:February_2021_FFRPrice_RegModelsOnly_Demand Sales Volume*CSOPprice_Commit_a57f193aede'),
      ('subset:|actuals_source:|comments:china_market_CSOP_Demand Sales Volume_PFJan2021'),
      ('subset:|actuals_source:zeros|comments:Functional_test_2_Cores_FWD_facing_regressor_leak_fix_commit_c596a521e07')
   )
SELECT substring(col_name from 'comments:(.*?)(_[Cc]ommit|$)') FROM demo;

Output:

enter image description here

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.