PostgreSQL regex_replace substitution for 2 groups

Question

I have a column in a Postgres database with the character varying data type. The text includes a URI that contains a file name and looks like this:

  The file is a file of \\88-77-99-666.abc.example.com\Folder1\Folder2\Folder3\Folder4\20221122\12345678.PDF [9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file.

I am trying to get the file name 12345678.PDF and the date 20221122 from the text content. However, regexp_replace gives me either everything up to and including the file name or the file name and everything after it. I want only the file name.

1>> Regexp_replace(data, '.+\\', '')

Yields filename and everything after it

2>> Regexp_replace(data, '\[.*', '')

Yields everything up to and including the filename

If I combine the two patterns as below, I get the same result as 1.

Regexp_replace(data, '.+\\|\[', '')

How can I substitute 2 groups and get only the filename? Or is there a better way to achieve this? I also need the date value, but if I can figure this out I may be able to apply what I learn to extract the date. Thanks for your time.
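One way to read the third attempt, as a minimal sketch: it assumes standard_conforming_strings is on (the PostgreSQL default), so the backslashes in the literal reach the regex engine unchanged. Without the 'g' flag, regexp_replace replaces only the first match, so the second alternative is never applied. The CTE name `sample` and the widened second alternative `\s*\[.*` are illustrative and not from the question.

```sql
-- Assumes standard_conforming_strings = on (the default).
WITH sample(data) AS (
  VALUES ('The file is a file of \\88-77-99-666.abc.example.com\Folder1\Folder2\Folder3\Folder4\20221122\12345678.PDF [9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file.')
)
SELECT
  -- No flags: only the first match (everything through the last backslash) is removed.
  regexp_replace(data, '.+\\|\[', '') AS first_match_only,
  -- 'g' flag: the leading path and the bracketed tail are both removed.
  regexp_replace(data, '.+\\|\s*\[.*', '', 'g') AS file_name
FROM sample;
```

With the sample text, the second column should contain only 12345678.PDF, because the greedy `.+\\` consumes everything through the last backslash and the second alternative then removes the space, the bracketed hash, and the rest of the sentence.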

You're running a replace function, so you'll need to capture the part that you want to keep and replace the rest of the string with it. Try something like Regexp_replace(data, '.+\\(.+)`.*', '\1')
– CAustin, Commented Nov 23, 2022 at 1:54
I tried it but I am getting the full string back. I tried substring(data from '\w*.PDF'), which returns the desired result, but if the extension is not PDF I get nothing. I could use \w*\.[aA-zZ], but the string contains a domain such as example.vpc.com, which produces an undesired result. I am trying to figure out how to further qualify the substring to handle extensions such as Pdf, pdf, DOC, doc and the like.
– Alsheik, Commented Nov 23, 2022 at 5:44
@WiktorStribiżew I tried your suggestion and am getting null results.
– Alsheik, Commented Nov 23, 2022 at 18:43
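The capture-and-replace idea from the first comment can be sketched as follows. This is an illustrative pattern, not the commenter's exact one: it anchors the whole string, captures the run of non-backslash, non-whitespace characters that sits just before the bracketed hash, and replaces everything with that capture. It assumes standard_conforming_strings is on; the CTE name `sample` is hypothetical.

```sql
-- Assumes standard_conforming_strings = on (the default).
WITH sample(data) AS (
  VALUES ('The file is a file of \\88-77-99-666.abc.example.com\Folder1\Folder2\Folder3\Folder4\20221122\12345678.PDF [9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file.')
)
SELECT regexp_replace(
         data,
         -- ^.*\\            everything through the last backslash
         -- ([^\\[:space:]]+) group 1: the file name (no backslash, no whitespace)
         -- \s+\[.*$         whitespace, the opening bracket, and the rest of the text
         '^.*\\([^\\[:space:]]+)\s+\[.*$',
         '\1'
       ) AS file_name
FROM sample;
```

Two points are worth keeping in mind. regexp_replace returns the input unchanged when the pattern does not match, which would be consistent with getting the full string back. The pattern also does not mention the extension, so it should behave the same for pdf, DOC, doc and similar names, as long as the file name is followed by whitespace and an opening bracket.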

Wiktor Stribiżew · Accepted Answer · 2022-11-25 12:08:29Z


You can use

SELECT REGEXP_MATCHES(
  'The file is a file of \\88-77-99-666.abc.example.com\Folder1\Folder2\Folder3\Folder4\20221122\2779780.PDF [9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file.',
  '([^[:space:]\\/]+)\s+\[([^][]+)') AS Result;

See the DB fiddle.

Details:

([^[:space:]\\/]+) - Group 1: one or more chars other than \, / and whitespace
\s+ - one or more whitespaces
\[ - a [ char
([^][]+) - Group 2: one or more chars other than [ and ].
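regexp_matches returns a text array per match, so the two groups above come back together, displayed as {2779780.PDF,9bc8rer55c655f4cb5df763c61862d3fdde9557b0}. A minimal sketch of unpacking that array into separate columns follows; the aliases `t`, `m`, `file_name` and `sha1` are illustrative, and standard_conforming_strings is assumed to be on.

```sql
-- Assumes standard_conforming_strings = on (the default).
SELECT m[1] AS file_name,  -- group 1: 2779780.PDF
       m[2] AS sha1        -- group 2: 9bc8rer55c655f4cb5df763c61862d3fdde9557b0
FROM regexp_matches(
       'The file is a file of \\88-77-99-666.abc.example.com\Folder1\Folder2\Folder3\Folder4\20221122\2779780.PDF [9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file.',
       '([^[:space:]\\/]+)\s+\[([^][]+)'
     ) AS t(m);
```

Because regexp_matches is a set-returning function, a row with no match produces no output row at all when the function is used in FROM like this.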


3 Comments

Alsheik Over a year ago

Thanks @WiktorStribiżew, your answer got me what I wanted with a slight change: ([^[:space:]\\/]+)\s+\[ gets the required filename.pdf portion. I decided to go with substring instead of regexp_matches to avoid the curly braces (SQLize). Thanks again for the help. Now on to figuring out how to extract the date folder before \filename.pdf; I will update the thread once I find the answer.

Alsheik Over a year ago

To find the date folder, '`\([0-9]{8})\`' gives me the desired result, as it is in YYYYMMDD format. SQLize. Hope this helps someone.

Alsheik Over a year ago

I was unable to get the double backslashes to render in my previous comment; refer to the SQLize link for the answer.
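The substring approach described in these comments might look like the sketch below. The SQLize content is not available here, so the date pattern is a reconstruction: it takes the comment's pattern and restores the double backslashes the commenter says were lost, giving '\\([0-9]{8})\\'. The CTE name `sample` and the column aliases are illustrative, and standard_conforming_strings is assumed to be on.

```sql
-- Assumes standard_conforming_strings = on (the default).
WITH sample(data) AS (
  VALUES ('The file is a file of \\88-77-99-666.abc.example.com\Folder1\Folder2\Folder3\Folder4\20221122\12345678.PDF [9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file.')
)
SELECT
  -- substring returns the first parenthesized group as plain text (no array braces).
  substring(data from '([^[:space:]\\/]+)\s+\[') AS file_name,
  -- Eight digits enclosed by backslashes: the YYYYMMDD folder.
  substring(data from '\\([0-9]{8})\\') AS date_folder,
  -- Optional: convert the folder name to a date value.
  to_date(substring(data from '\\([0-9]{8})\\'), 'YYYYMMDD') AS folder_date
FROM sample;
```

For the sample text this should give 12345678.PDF and 20221122. substring returns NULL when the pattern does not match, so rows without a date folder would come back as NULL rather than raising an error.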
