Hello, I am trying to use BigQuery to extract the 7-digit numbers 2670782 and 2670788 from the data below.

Contents of the desc field:

is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type 8888888 specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing 8888888 software like Aldus PageMaker including versions of Lorem Ipsum.

>> https://hello.com/pudding/answer/2670782?hl=en&ref_topic=7072943
>> https://hello.com/pudding/answer/2670788?hl=en&ref_topic=7072943

I have a query, but the data also contains 7-digit numbers other than 2670782 and 2670788. So I first want to check that a line starts with ">>" and includes "hello.com", and only then extract the number.

Here is the query I have so far. It also grabs 8888888, which it is not supposed to do.

-- desc is a reserved keyword in BigQuery, so the column name is quoted with backticks
SELECT
 `desc`,
 REGEXP_EXTRACT_ALL(`desc`, r"\/(\d{7})") AS num
FROM
 `table`
WHERE
 REGEXP_CONTAINS(`desc`, r"(>> )")
 AND REGEXP_CONTAINS(`desc`, r"(hello.com)")

I believe I need a single regex that checks that the line starts with ">>" and contains hello.com, and then extracts the 7-digit number after the "/". I am stuck, so any help would be much appreciated!

2 Comments
  • I'm lost. Is the sample data one row or multiple rows? What do you mean by "if the line starts with >>"? Commented Jun 29, 2021 at 1:33
  • In KaBoom's answer, try adding (?m) at the beginning of the regex so that ^ matches at the start of each line as well as at the start of the string. Commented Jun 29, 2021 at 5:16

1 Answer

You can use this regex if each of your inputs is a single line (the dot in hello.com is escaped so that it only matches a literal dot):

^>>.+hello\.com.+\/(\d{7})

I tested this regex on regex101.com with your input, assuming each input is a single line.
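If the whole text sits in one multi-line string instead, the same pattern may still work once the multi-line flag (?m) is switched on, as suggested in the comments on the question. Below is a minimal sketch, assuming BigQuery's RE2-based regex functions; the sample CTE, its txt column, and the shortened filler text are made up for illustration.

```sql
WITH
  sample AS (
  SELECT
    '''some text 8888888 more text
>> https://hello.com/pudding/answer/2670782?hl=en&ref_topic=7072943
>> https://hello.com/pudding/answer/2670788?hl=en&ref_topic=7072943
''' AS txt)
SELECT
  -- (?m) makes ^ match at the start of every line, not only at the start of the string.
  -- Only lines that begin with ">>" and contain "hello.com" can produce a match.
  REGEXP_EXTRACT_ALL(txt, r"(?m)^>>.+hello\.com.+\/(\d{7})") AS num
FROM
  sample;
```

Because "." does not match a newline by default, each match stays within a single line, so the 8888888 on the first line should be skipped.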

UPDATE: You can replace each ">>" with a newline character and then use the regex below to extract the numbers:

hello\.com.+\/(\d{7})

Here is the example:

WITH
  sample AS (
  SELECT
    '''start here not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing 8888888 software like Aldus PageMaker including versions of Lorem Ipsum. >> hello.com/pudding/answer/2670782?hl=en&ref_topic=7072943 >> hello.com/pudding/answer/2670788?hl=en&ref_topic=7072943
''' AS txt
  UNION ALL
  SELECT
    '''
is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type 8888888 specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing 8888888 software like Aldus PageMaker including versions of Lorem Ipsum.

>> https://hello.com/pudding/answer/2670786?hl=en&ref_topic=7072943
>> https://hello.com/pudding/answer/2670785?hl=en&ref_topic=7072943
'''),
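  sample_new_line AS (
  SELECT
    -- Replace each ">>" with a newline so that every URL starts on its own line.
    REGEXP_REPLACE(txt, '>>', '\n') AS txt
  FROM
    sample)
SELECT
  -- "." does not match a newline, so each match stays within one URL line.
  REGEXP_EXTRACT_ALL(txt, r"hello\.com.+\/(\d{7})") AS num
FROM
  sample_new_line;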
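
The check described in the question (the line starts with ">>" and contains hello.com) could also be written without a combined regex, by splitting the text into lines first. This is only a sketch under the assumption that every URL sits on its own line; the sample CTE, the txt column, and the line alias are hypothetical names.

```sql
WITH
  sample AS (
  SELECT
    '''some text 8888888 more text
>> https://hello.com/pudding/answer/2670786?hl=en&ref_topic=7072943
>> https://hello.com/pudding/answer/2670785?hl=en&ref_topic=7072943
''' AS txt)
SELECT
  -- Take the first 7-digit run that directly follows a "/" on each kept line.
  REGEXP_EXTRACT(line, r"\/(\d{7})") AS num
FROM
  sample,
  UNNEST(SPLIT(txt, '\n')) AS line
WHERE
  -- Keep only lines that start with ">>" and mention hello.com.
  STARTS_WITH(line, '>>')
  AND REGEXP_CONTAINS(line, r"hello\.com");
```

Unlike REGEXP_EXTRACT_ALL, this variant returns one output row per matching line rather than one array per input row, so it may need an ARRAY_AGG if the array shape is required.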

3 Comments

It does not work when the input spans multiple lines. Here is the input: start here not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing 8888888 software like Aldus PageMaker including versions of Lorem Ipsum. >> hello.com/pudding/answer/2670782?hl=en&ref_topic=7072943 >> hello.com/pudding/answer/2670788?hl=en&ref_topic=7072943
Add (?m) to the start of the regex string. This lets the caret (^) match at the start of each line as well as at the start of the string.
@lipo I updated my answer with your new test.
