My string is as follows:

'NAME NAME NAME 400ML NAME CODE'

I need to identify ML, go left to catch all digits before that and stop on first space to get:

400ML

Current code I have:

 SUBSTR(FIELD,CHARINDEX('ML',FIELD), 2)
  • Why not just regexp_substr(field, '\\d+ML')? Are there any non-digit chars you need to match before ML? Commented Jul 17, 2020 at 18:09
  • No, only digits come before ML, but a word containing the string ML may appear before the number, in which case the number-ML pair is not the first thing found, e.g. NAMEML CODE MLNAME 400 ML CODE CODE. This can help: regexp_substr(REPLACE(field, ' ', ''), '\\d+ML') Commented Jul 17, 2020 at 18:16
  • If there may be spaces between the number and ML, you can match them with \s*. Also, what if ML is part of a longer word? Try regexp_substr(field, '\\d+\\s*ML\\b') Commented Jul 17, 2020 at 18:31
  • Yes, @WiktorStribiżew - I forgot to mention the case with a space between the number and ML. Your solution works perfectly! Commented Jul 17, 2020 at 18:35
  • Great, I posted a full answer with an explanation and demo. Commented Jul 17, 2020 at 18:44

3 Answers

I suggest using

regexp_substr(field, '\\d+\\s*ML\\b')

This regex makes sure that ML is not followed by further word characters (so it is not the start of a longer word), and it also matches any whitespace between the number and ML.

See the regex demo.

Regex details

  • \d+ - 1 or more digits
  • \s* - 0 or more whitespace characters
  • ML - a string ML
  • \b - a word boundary.
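
To check how the pattern behaves on the strings from the question and comments, here is a minimal sketch using Python's re module as a stand-in. This is an assumption made for easy local testing: Snowflake's regex engine may differ in details, and the helper name extract_volume is hypothetical. In a Python raw string the backslashes are written once, whereas the Snowflake single-quoted literal above doubles them.

```python
import re

# Same pattern as in the answer: digits, optional whitespace, ML, word boundary.
PATTERN = re.compile(r"\d+\s*ML\b")

def extract_volume(text):
    """Return the first 'number + ML' match, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

# String from the question.
print(extract_volume("NAME NAME NAME 400ML NAME CODE"))       # 400ML
# String from the comments: ML inside other words, space before ML.
print(extract_volume("NAMEML CODE MLNAME 400 ML CODE CODE"))  # 400 ML
# ML followed by another letter: \b rejects the match.
print(extract_volume("NAME 400MLX CODE"))                     # None
```

Note that when a space separates the number and ML, the space is part of the returned match.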

1 Comment

Nice example of character class usage, especially the double backslashes, +1.

You can use regexp_substr():

select regexp_substr(field, '[^ ]+ML')

Or for specifically alphanumeric characters:

select regexp_substr(field, '[a-zA-Z0-9]+ML')

If Snowflake's regex matching is not greedy (which seems unlikely but is possible), then you can do:

select trim(regexp_substr(' ' || field, ' [a-zA-Z0-9]*ML'))
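
The patterns above differ in what they return when an earlier word ends in ML. The sketch below compares them in Python's re module on a string containing WORD-ML, as a rough stand-in for Snowflake (an assumption; the two engines may not agree in every case). The helper first_match and the `*` variant of the second pattern are hypothetical and included only for comparison.

```python
import re

SAMPLE = "WORD-ML 400ML CODE"

def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Any run of non-space characters before ML: picks up the earlier word.
non_space = first_match(r"[^ ]+ML", SAMPLE)
# At least one alphanumeric character before ML: skips WORD-ML.
alnum_plus = first_match(r"[a-zA-Z0-9]+ML", SAMPLE)
# Zero or more alphanumerics before ML: a bare ML is enough to match.
alnum_star = first_match(r"[a-zA-Z0-9]*ML", SAMPLE)
# Leading-space variant, mirroring trim(regexp_substr(' ' || field, ...)).
padded = first_match(r" [a-zA-Z0-9]*ML", " " + SAMPLE).strip()

print(non_space)   # WORD-ML
print(alnum_plus)  # 400ML
print(alnum_star)  # ML
print(padded)      # 400ML
```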

4 Comments

I noticed that WORD-M 400ML CODE returns ML
@marcin2x4 . . . Usually regular expression matching is greedy by default. I am surprised this is not the case in Snowflake. Here is an example of the code working: dbfiddle.uk/….
try with 'WORD-ML 400ML CODE' :)
@marcin2x4 . . . I would argue that it is doing the right thing. But if you want to require at least one character then use + in the second method.

To extract a number with ML as a suffix, use

select regexp_substr(field, '[0-9]+ML')

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  ML                       'ML'
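
As a quick check of what this pattern does and does not cover, here is a small sketch in Python's re module (a stand-in for Snowflake, which is an assumption; the helper name digits_ml is hypothetical). It suggests the pattern handles the original string but not the two cases raised in the question's comments: a space before ML, and ML as the start of a longer word.

```python
import re

def digits_ml(text):
    """Return the first run of digits immediately followed by ML, or None."""
    m = re.search(r"[0-9]+ML", text)
    return m.group(0) if m else None

print(digits_ml("NAME NAME NAME 400ML NAME CODE"))       # 400ML
# A space between the number and ML prevents the match.
print(digits_ml("NAMEML CODE MLNAME 400 ML CODE CODE"))  # None
# Without a word boundary, ML inside a longer word still matches.
print(digits_ml("NAME 400MLX CODE"))                     # 400ML
```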
