Isolate string starting before specific string

Question

My string is as follows:

'NAME NAME NAME 400ML NAME CODE'

I need to find ML, then move left to capture all the digits before it, stopping at the first space, to get:

400ML

Current code I have:

 SUBSTR(FIELD,CHARINDEX('ML',FIELD), 2)

Why not just regexp_substr(field, '\\d+ML')? Are there any non-digit chars you need to match before ML?
– Wiktor Stribiżew, Commented Jul 17, 2020 at 18:09
No, only digits come before ML, but a word earlier in the string may contain ML, so the number followed by ML will not be the first occurrence. Example: NAMEML CODE MLNAME 400 ML CODE CODE. This can help: regexp_substr(REPLACE(field, ' ', ''), '\\d+ML')
– marcin2x4, Commented Jul 17, 2020 at 18:16
If there may be spaces between the number and ML, you can match them with \s*. Also, what if ML is part of a longer word? Try regexp_substr(field, '\\d+\\s*ML\\b')
– Wiktor Stribiżew, Commented Jul 17, 2020 at 18:31
Yes, @WiktorStribiżew - I forgot to mention the case with a space between the number and ML. Your solution works perfectly!
– marcin2x4, Commented Jul 17, 2020 at 18:35

Wiktor Stribiżew · Accepted Answer · 2020-07-17 18:43:08Z

2

I suggest using

regexp_substr(field, '\\d+\\s*ML\\b')

This regex makes sure that ML is not followed by another word character (so it is not the start of a longer word), and it also matches any whitespace between the number and ML.

See the regex demo.

Regex details

\d+ - 1 or more digits
\s* - 0 or more whitespaces
ML - a string ML
\b - a word boundary.
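
As a rough illustration of the breakdown above, the following sketch runs the pattern against the sample strings quoted in the question and comments, plus one hypothetical row ('NAME 400MLX CODE') added to exercise the word boundary. The inline VALUES table, the alias t(field), and the output column names are assumptions made for the example. The last column uses the optional position, occurrence, 'e' and group-number arguments of REGEXP_SUBSTR to return only the digits.

```sql
-- Illustrative only: an inline table stands in for the real table,
-- and the third row is a made-up case for the \b check.
select
    field,
    -- number, optional whitespace, then ML not followed by a word character
    regexp_substr(field, '\\d+\\s*ML\\b') as volume,
    -- same match with any inner whitespace removed, e.g. '400 ML' -> '400ML'
    regexp_replace(regexp_substr(field, '\\d+\\s*ML\\b'), '\\s', '') as volume_compact,
    -- 'e' with group number 1 returns only the captured digits
    regexp_substr(field, '(\\d+)\\s*ML\\b', 1, 1, 'e', 1) as digits_only
from (values
    ('NAME NAME NAME 400ML NAME CODE'),
    ('NAMEML CODE MLNAME 400 ML CODE CODE'),
    ('NAME 400MLX CODE')
) as t(field);
```

With these rows, the volume column should come back as 400ML, 400 ML, and NULL respectively; the third row has X directly after ML, so \b does not match there.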

answered Jul 17, 2020 at 18:43

Wiktor Stribiżew


1 Comment

Ryszard Czech Over a year ago

Nice example of character class usage, especially the double backslashes, +1.

Gordon Linoff · Answer · 2020-07-17 17:58:21Z

2

You can use regexp_substr():

select regexp_substr(field, '[^ ]+ML')

Or for specifically alphanumeric characters:

select regexp_substr(field, '[a-zA-Z0-9]+ML')

If Snowflake's regex matching is not greedy (which seems unlikely but is possible), then you can do:

select trim(regexp_substr(' ' || field, ' [a-zA-Z0-9]*ML'))
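
To compare the three patterns above, this sketch applies each of them to the string suggested in the comments on this answer, 'WORD-ML 400ML CODE'. The inline VALUES row and the column aliases are assumptions for the example only.

```sql
-- Illustrative comparison; the single inline row is the test string
-- from the comments, not data from a real table.
select
    -- any run of non-space characters ending in ML
    regexp_substr(field, '[^ ]+ML') as non_space,
    -- letters and digits only, so the hyphen cannot be part of the match
    regexp_substr(field, '[a-zA-Z0-9]+ML') as alphanumeric,
    -- a leading space acts as the anchor; * would also allow a bare ' ML'
    trim(regexp_substr(' ' || field, ' [a-zA-Z0-9]*ML')) as space_anchored
from (values ('WORD-ML 400ML CODE')) as t(field);
```

Because REGEXP_SUBSTR returns the first match, the first pattern should return WORD-ML here, while the other two should return 400ML. None of the three requires digits before ML, so an earlier word such as NAMEML would still be matched first.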

edited Jul 17, 2020 at 17:58

answered Jul 17, 2020 at 17:31

Gordon Linoff


4 Comments

marcin2x4 Over a year ago

I noticed that WORD-M 400ML CODE returns ML

Gordon Linoff Over a year ago

@marcin2x4 . . . Usually regular expression matching is greedy by default. I am surprised this is not the case in Snowflake. Here is an example of the code working: dbfiddle.uk/….

marcin2x4 Over a year ago

try with 'WORD-ML 400ML CODE' :)

Gordon Linoff Over a year ago

@marcin2x4 . . . I would argue that it is doing the right thing. But if you want to require at least one character then use + in the second method.

Ryszard Czech · Answer · 2020-07-17 18:25:44Z

2

To extract a number with ML as a suffix, use:

select regexp_substr(field, '[0-9]+ML')

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  ML                       'ML'
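
A short sketch of where this simpler pattern applies and where it does not, using the two strings from the question and its comments. The inline VALUES table and the aliases are assumptions for illustration.

```sql
-- Illustrative only: [0-9]+ML needs the digits to sit directly before ML.
select
    field,
    regexp_substr(field, '[0-9]+ML') as volume
from (values
    ('NAME NAME NAME 400ML NAME CODE'),      -- digits directly before ML
    ('NAMEML CODE MLNAME 400 ML CODE CODE')  -- space between 400 and ML
) as t(field);
```

The first row should return 400ML. The second should return NULL, since no digit in that string is immediately followed by ML.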

answered Jul 17, 2020 at 18:25

Ryszard Czech
