
I'm trying to make a regex to capture a string, but I don't understand why the output puts the last character before the separator in a separate group.

This is the regex I'm using:

(\w|\d|\s)*

This is the string I'm using for test:

Eleccion Nacional 2017

So in one group I get: Eleccion Nacional 2017

And in another I get: 7

Could anyone please explain to me why this is happening?
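A minimal reproduction of the behavior, assuming Python's re module (the question doesn't say which regex engine or tool was used). Group 0 is the whole match; group 1 is the repeated capturing group:

```python
import re

# The pattern from the question: a capturing group repeated with *.
pattern = re.compile(r"(\w|\d|\s)*")
m = pattern.match("Eleccion Nacional 2017")

print(m.group(0))  # whole match: Eleccion Nacional 2017
print(m.group(1))  # capture group 1: only its last repetition, 7
```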

  • You are alternating between a word character (\w), a digit (\d), or a whitespace character (\s). Commented May 20, 2019 at 16:30
  • You are repeating the capturing group, so the group holds only the value of the last iteration. You could also use \w+(?:\s+\w+)* to avoid matching leading and trailing spaces; note that \w already matches \d. Commented May 20, 2019 at 16:41
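A quick sketch of the pattern from the second comment, assuming Python's re module. The padded test string is an illustrative assumption, chosen to show that the surrounding spaces are left out of the match:

```python
import re

# Pattern from the comment: one or more word characters, then any number of
# (whitespace + word characters) pairs, so leading/trailing spaces are excluded.
# \w already covers digits, so \d is unnecessary.
pattern = re.compile(r"\w+(?:\s+\w+)*")

text = "  Eleccion Nacional 2017  "  # hypothetical padded input
m = pattern.search(text)
print(repr(m.group(0)))  # 'Eleccion Nacional 2017'
```

Since the pattern has no capturing group, the whole match (group 0) is the result to use.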

2 Answers


Welcome!

Here we might simply put a character class inside a single capturing group:

([A-Za-z0-9\s]+)
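A minimal check of this pattern, assuming Python's re module. Because the + quantifier sits inside the group, group 1 captures the whole run of characters rather than just the last one:

```python
import re

# Character class inside ONE capturing group; the repetition happens
# inside the group, so the group captures the entire run.
pattern = re.compile(r"([A-Za-z0-9\s]+)")
m = pattern.match("Eleccion Nacional 2017")
print(m.group(1))  # Eleccion Nacional 2017
```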

Of course, we can add more constraints if necessary, such as:

([A-Za-z\s]+[0-9]{4})

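A sketch of the stricter variant, assuming Python's re module. The use of fullmatch (so the whole string must fit the pattern) is an illustrative choice, not part of the answer's pattern:

```python
import re

# Letters/whitespace followed by exactly four digits (e.g. a year).
pattern = re.compile(r"([A-Za-z\s]+[0-9]{4})")

results = {}
for s in ["Eleccion Nacional 2017", "Eleccion Nacional 17"]:
    m = pattern.fullmatch(s)  # require the entire string to match
    results[s] = m.group(1) if m else None
    print(s, "->", results[s])
```

The second string has only two trailing digits, so [0-9]{4} cannot match and the result is None.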

Or we can keep the characters from your original expression, but put them in one character class with the + inside the group:

([\w\d\s]+)

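A small check, assuming Python's re module, that this version captures the full string and that the \d in the class is redundant, since digits are already word characters in Python's re (as in most engines):

```python
import re

text = "Eleccion Nacional 2017"

# The original characters, in one class inside one capturing group.
with_d = re.match(r"([\w\d\s]+)", text).group(1)
# \d is a subset of \w, so dropping it gives the same capture.
without_d = re.match(r"([\w\s]+)", text).group(1)

print(with_d)                # Eleccion Nacional 2017
print(with_d == without_d)   # True
```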

RegEx Circuit

jex.im also helps visualize regular expressions.

We can check in the visualizer how your original expression works.


The first result is the whole match: the pattern (\w|\d|\s)* matches all of the input, Eleccion Nacional 2017. This first result isn't the value of any capture group; it is the entire character sequence matched by the whole pattern.

The second result is the value of capture group 1. Because the * is applied to the parenthesized group, the group is overwritten on every repetition, so it retains only its final match: the last character, 7.
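A sketch of this explanation and the usual fix, assuming Python's re module. The span of group 1 shows that it covers only the final position, and moving the repetition inside a single capturing group (with a non-capturing inner group) makes group 1 span every iteration:

```python
import re

text = "Eleccion Nacional 2017"

# Original pattern: group 1 is re-captured on every repetition of *.
orig = re.match(r"(\w|\d|\s)*", text)
print(orig.span(0), orig.span(1))  # (0, 22) (21, 22)

# Fix: repeat a NON-capturing group and wrap the whole repetition in one
# capturing group, so group 1 spans all iterations.
fixed = re.match(r"((?:\w|\s)*)", text)
print(fixed.group(1))  # Eleccion Nacional 2017
```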
