0

I need to extract the last number from a URL, the one that follows a dash.

Example:

http://www.example.com/p-test-test1-a-12345.html

I need to extract 12345 using a regex.

I tried -\d(.*?).html, which gives me 2345. I'm not sure why it drops the 1. Any idea?

2
  • 1
    What do you think \d would match? Commented Dec 4, 2013 at 8:57
  • Which technique are you using: grouping, or Pattern and Matcher? Commented Dec 4, 2013 at 9:31

4 Answers

4

It drops the first digit because the \d sits outside the capture group: the pattern matches a dash and one digit, then captures only what comes after that digit.

-\d(.*?).html

-\d - matches a dash followed by a digit

(.*?) - captures any character (except new line) 0 or more times, as few as possible, until the next token is satisfied

. - matches any character (except new line)

html - matches html
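To make the breakdown above concrete, here is a minimal Java sketch that runs the question's pattern against the example URL. The class and method names are hypothetical, and using Pattern and Matcher is an assumption, since the question does not say how the regex is applied.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalPatternDemo {
    // Hypothetical helper: returns what group 1 of the question's pattern
    // captures, or null when the pattern does not match at all.
    static String captureWithOriginalPattern(String url) {
        // "-\d" consumes the dash and the first digit outside the group,
        // so group 1 starts at the second digit.
        Matcher m = Pattern.compile("-\\d(.*?).html").matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/p-test-test1-a-12345.html";
        System.out.println(captureWithOriginalPattern(url)); // prints 2345
    }
}
```

The leading 1 is still part of the overall match; it is only missing from the captured group.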


Try this pattern:

PATTERN

(?<=-)\d+(?=\.html)
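As a rough sketch of how this lookaround pattern could be used from Java (the class and method names are made up for illustration): because the dash and .html are only asserted, not consumed, the whole match is already the number and no capture group is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // (?<=-) asserts a dash just before the digits,
    // (?=\.html) asserts a literal ".html" just after them.
    private static final Pattern LAST_NUMBER =
            Pattern.compile("(?<=-)\\d+(?=\\.html)");

    // Hypothetical helper: returns the matched digits, or null if none found.
    static String lastNumber(String url) {
        Matcher m = LAST_NUMBER.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/p-test-test1-a-12345.html";
        System.out.println(lastNumber(url)); // prints 12345
    }
}
```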


2

You must move \d into the group: -(\d.*?).html

If it must be only digits, then -(\d+)\.html is better.
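A minimal Java sketch of the digits-only version, assuming the regex is applied with Pattern and Matcher (the class and method names are hypothetical). Escaping the dot means only a literal ".html" is accepted after the digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // Dash, one or more digits captured in group 1, then a literal ".html".
    private static final Pattern DIGITS_BEFORE_HTML =
            Pattern.compile("-(\\d+)\\.html");

    // Hypothetical helper: returns the captured digits, or null if no match.
    static String extract(String input) {
        Matcher m = DIGITS_BEFORE_HTML.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("http://www.example.com/p-test-test1-a-12345.html")); // prints 12345
        System.out.println(extract("p-test-12345!html")); // prints null
    }
}
```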

1 Comment

It will also match any number followed by !html, since the unescaped . matches any character.
1

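Try this:

String s = "http://www.example.com/p-test-test1-a-12345.html";
// Replace the whole URL with the digits captured just before ".html"
String pattern2 = ".*?(\\d+)\\.html";
System.out.println(s.replaceAll(pattern2, "$1"));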


1

You're looking for a dash, then a digit, then capturing all characters before ".html", which is why the 1 was not captured.

Try this instead:

-(\d+)\.html

2 Comments

There's no need to make the + ungreedy.
@M42 You're quite right. I suppose that's what happens when I edit existing code without thinking it through properly.
