1

As Power Query does not provide support for RegEx how can I extract a number of 8 digits from a string.

The number can appear anywhere in the string.

I have tried to use M but didn't find any complete solution. I also tried the option "Column From Examples" but the complexity of the cases does not let me have a reliable result either.

I googled it but it seems there are no examples suitable for PowerQuery/M

abc123456 -->null

abc12345678 -->12345678

abc 12345678 -->12345678

abc 12345678aaaa -->12345678

abc 1111 ddd12345678aaaa 3467-->12345678

abc 1111 ddd123456aaaa 3467-->null

etc...

EDIT

I have added the following code as suggested by Alexis Olson

= Table.AddColumn(
    Sheet1_Sheet,
    "PatternMatched",
    (r) =>
        List.First(
            List.Select(
                List.Transform(
                    {0..Text.Length(r[String]) - 8},
                    each Text.Range(r[String], _, 8)
                ),
                each _ = Text.Select(_, {"0".."9"})
            )
        )
)

the code differs only for Sheet1_Sheet instead of #"Previous Step" because this is the field holding the data but I get the following error message "Expression.Error: The field 'String' of the record wasn't found. Details: Column1=abc123456" What am I doing wrong ?

EDIT2

I have found the error in my formula. Here the correct one

Table.AddColumn(
    #"Renamed Columns",
    "PatternMatched",
    (r) =>
        List.First(
            List.Select(
                List.Transform(
                    {0..Text.Length(r[Column]) - 8},
                    each Text.Range(r[Column], _, 8)
                ),
                each _ = Text.Select(_, {"0".."9"})
            )
        )
)

I had simply to replace from Alexis Olson's answer:

  • #"Previous Step" with #"Renamed Columns" because that is the previous step name in the "Applied Steps" section of the "Query Settings" in PQ
  • r[String] with r[Column] because "Column" is the name of the column containing the data I want to find the pattern in my PQ

1 Answer 1

3

It's not simple, but you could add a custom column that does this:

Table.AddColumn(
    #"Previous Step",
    "PatternMatched",
    (r) =>
        List.First(
            List.Select(
                List.Transform(
                    {0..Text.Length(r[String]) - 8},
                    each Text.Range(r[String], _, 8)
                ),
                each _ = Text.Select(_, {"0".."9"})
            )
        )
)

Let's take a look at what's happening here with the example r[String] = abc12345678.

  1. Since Text.Length(r[String]) = 11, we can look at substring ranges of length 8 starting at index 0 through index 11 - 8 = 3.
  2. So List.Transform({0,1,2,3}, each Text.Range("abc12345678", _, 8)) transforms the list {0,1,2,3} into a list of 8-long substring ranges: {"abc12345", "bc123456", "c1234567", "12345678"}.
  3. Now for each of these substrings, check if it consists of only digits by comparing each string to a version of itself containing only digits using Text.Select(_, {"0".."9"}) to strip all but the digit characters. Then List.Select each substring where this condition is true.
  4. The result is a list of substrings that are a length of 8 and contain only digits (empty if none exist). Use List.First to return the first string from this list.

Result Screenshot

Sign up to request clarification or add additional context in comments.

2 Comments

I have made a slight adjustment to your code in order to adapt it my sheet but now I have an error message. I have edited my original question with my code.
I have found the mistake in my formula and added a further edit with the explanation of the solution.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.