0

I am trying to pull a product code from a long set of string formatted like a URL address. The pattern is always 3 letters followed by 3 or 4 numbers (ex. ???### or ???####). I have tried using REGEXP and LIKE syntax, but my results are off for both/I am not sure which operators to use.

The first select statement is close to trimming the URL to show just the code, but oftentimes will show a random string of numbers it may find in the URL string.

The second select statement is more rudimentary, but I am unsure which operators to use.

Which would be the quickest solution?

SELECT columnName, SUBSTR(columnName, LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", columnName), LENGTH(columnName) - LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", REVERSE(columnName))) AS extractedData FROM tableName

SELECT columnName FROM tableName WHERE columnName LIKE '%___###%' OR columnName LIKE '%___####%' 
-- Will take a substring of this result as well

Example Data:

randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc

In this case, the desired string is "xyz123" and the location of said pattern is variable based on each entry.

EDIT

SELECT column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), SUBSTR(column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), LENGTH(column) - LOCATE(column REGEXP "^.*[a-zA-Z]{3}[0-9]{3,4}", REVERSE(column))) AS extractData From mainTable

This expression is still not grabbing the right data, but I feel like it may get me closer.

4
  • stackoverflow.com/questions/16646686/mysql-regexp-vs-like Commented Oct 7, 2021 at 8:07
  • What is your MySQL version? Commented Oct 7, 2021 at 8:29
  • @WiktorStribiżew I am running My SQL 8.0+ Commented Oct 7, 2021 at 8:50
  • Then try REGEXP_SUBSTR(column, '(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])') Commented Oct 7, 2021 at 8:57

2 Answers 2

0

I suggest using

REGEXP_SUBSTR(column, '(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])')

Details:

  • (?<=[&?]random_code=[^&#]{0,256}-) - immediately on the left, there must be & or &, random_code=, and then zero to 256 chars other than & and # followed with a - char
  • [a-zA-Z]{3} - three ASCII letters
  • [0-9]{3,4} - three to four ASCII digits
  • (?![^&#]) - that are followed either with &, # or end of string.

See the online demo:

WITH cte AS ( SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc' val
              UNION ALL
              SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz4567&hello_world=us&etc_etc'
              UNION ALL
              SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz89&hello_world=us&etc_etc'
              UNION ALL
              SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz00000&hello_world=us&etc_etc'
              UNION ALL
              SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-aaaaa11111&hello_world=us&etc_etc')
SELECT REGEXP_SUBSTR(val,'(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])') output
FROM cte

Output:

enter image description here

Sign up to request clarification or add additional context in comments.

Comments

0

I'd make use of capture groups: (?<=[=\-\\])([a-zA-Z]{3}[\d]{3,4})(?=[&]) I assume with [^=\-] you wanted to capture string with "-","\" or "=" in front but not include those chars in the result. To do that use "positive lookbehind" (?<=. I also added a lookahead (?= for "&". If you'd like to fidget more with regex I recommend RegExr

4 Comments

Will select starting part if the number of digits is more than 4... dbfiddle.uk/…
@Revuimar I do not need to capture any "-" or "=" characters; I was hoping it would help with testing purposes to see where my match began.
@Akina in your 4th output example there are five digits, so no match should be detected I would hope
@Gordon This is true, and this is the problem of the provided solution. I hope that the answer author will fix this issue.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.