
I have URLs from an access log. Examples:

/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w

/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen

I cannot make any assumptions about the service name or the function name.

I'm trying to find a regex that matches only the following in the first URL:

67814
alloy%20nudge%20w

and in the second:

asdNmasdf423-asd342e
FS443GH
front%20parking%20sen

As a heuristic, I tried [a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,} to match only long strings, but the function names (getPersonFromAllAccessoriesByDescription, getDealerFromSomethingSomething) were matched as well.

I was thinking of a regex that does the same as [a-zA-Z0-9_%-]{15,} but with the condition that the match must contain at least one digit, so that the function names are skipped.

Thank you

1 Answer

Your heuristic is fine. Use

\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}

See proof.

Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [a-zA-Z_%-]*             any character of: 'a' to 'z', 'A' to
                             'Z', '_', '%', '-' (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    [0-9]                    any character of: '0' to '9'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [a-zA-Z0-9_%-]{5,}       any character of: 'a' to 'z', 'A' to 'Z',
                           '0' to '9', '_', '%', '-' (at least 5
                           times (matching the most amount possible))
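
As a quick check, here is a minimal Python sketch that runs the pattern against the two URLs from the question. The helper name `extract_params` is made up for illustration, and the pattern is assumed to behave the same in Python's `re` module as in other engines that support lookahead.

```python
import re

# Pattern from the answer: start at a word boundary, require a digit
# somewhere in the token (lookahead), then match 5+ allowed characters.
PATTERN = re.compile(r"\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}")


def extract_params(url):
    """Return every path token that contains at least one digit (hypothetical helper)."""
    return PATTERN.findall(url)


urls = [
    "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w",
    "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen",
]

for url in urls:
    print(extract_params(url))
# ['67814', 'alloy%20nudge%20w']
# ['asdNmasdf423-asd342e', 'FS443GH', 'front%20parking%20sen']
```

Keep in mind that this is still a heuristic: a function name that happens to contain a digit would be matched too, and a parameter shorter than 5 characters or without any digit would be skipped.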

2 Comments

Works perfectly, I tried with lookahead but without \b. Thank you for help
@Alex Please kindly accept the answer by clicking the grey tick on the left.
