0

I have a bunch of urls that share the following pattern:

http://www.ebay.com/itm/Crosman-Pumpmaster-760-Pump-177-Pellet-4-5-mm-BB-Air-Rifle-Black-760B-/251635693266?pt=LH_DefaultDomain_0&hash=item3a96a7f6d2

I want to extract item3a96a7f6d2. The http://www.ebay.com/itm/ and &hash= are fixed patterns while the string in between can change. I wrote:

                String prodPatternString = "(http://www.ebay.com/itm/)(.*?)(hash=)(.*?)";
                Pattern prodPattern = Pattern.compile(prodPatternString);
                Matcher prodMatcher = prodPattern.matcher(prodUrl);
                while(prodMatcher.find()){
                    String pid = matcher.group(4);
                }

But it gives me an error saying "No match found". Any help will be greatly appreciated. Thanks.

3 Answers

1

Change the line matcher.group(4); to prodMatcher.group(4); since prodMatcher is the Matcher that find() was called on. Then remove the ? inside the last capturing group: .*? is a non-greedy match of zero or more characters, and because nothing follows it in the pattern, it is satisfied by the empty string even though more characters are available.

String prodUrl = "http://www.ebay.com/itm/Crosman-Pumpmaster-760-Pump-177-Pellet-4-5-mm-BB-Air-Rifle-Black-760B-/251635693266?pt=LH_DefaultDomain_0&hash=item3a96a7f6d2";
String prodPatternString = "(http://www.ebay.com/itm/)(.*?)(hash=)(.*)";
Pattern prodPattern = Pattern.compile(prodPatternString);
Matcher prodMatcher = prodPattern.matcher(prodUrl);
while (prodMatcher.find()) {
    String pid = prodMatcher.group(4);
    System.out.println(pid);
}

Output:

item3a96a7f6d2

3 Comments

Thanks a lot! But I still don't understand why the ? needs to be removed in the last capturing group while it is needed in the second capturing group? I think these two patterns are the same. Thanks
No, the two are different; it depends on what follows in the pattern. The .*? in http://www.ebay.com/itm/.*?hash= matches all the characters after http://www.ebay.com/itm/ up to hash=. But the .*? in hash=.*? matches the empty string, because no pattern follows it and .*? makes the shortest possible match. Here the shortest possible match is the empty string, since * repeats the preceding token zero or more times.
You can check this with (hash=)(.*?)$
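
To see the difference concretely, here is a small sketch that runs the three variants discussed above against the URL from the question. The class name LazyVsGreedy and the helper lastGroup are made up for illustration; only java.util.regex is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Returns what the last capturing group captured in the first match,
    // or null if the regex does not match at all.
    static String lastGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(m.groupCount()) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.ebay.com/itm/Crosman-Pumpmaster-760-Pump-177-Pellet-4-5-mm-BB-Air-Rifle-Black-760B-/251635693266?pt=LH_DefaultDomain_0&hash=item3a96a7f6d2";

        // Lazy with nothing after it: the shortest match is the empty string.
        System.out.println("[" + lastGroup("(hash=)(.*?)", url) + "]");   // prints []
        // Greedy: takes everything up to the end of the input.
        System.out.println("[" + lastGroup("(hash=)(.*)", url) + "]");    // prints [item3a96a7f6d2]
        // Lazy, but anchored: $ forces the match to extend to the end.
        System.out.println("[" + lastGroup("(hash=)(.*?)$", url) + "]");  // prints [item3a96a7f6d2]
    }
}
```

The lazy quantifier only grows when something after it in the pattern forces it to, which is why adding $ (or any trailing pattern) changes the result.
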
0

You could also look at the lastIndexOf method. Take the substring of the URL that starts just after the last occurrence of '&hash=' (its index plus the length of '&hash=') and runs to the end of the string. That gives you the item ID, item3a96a7f6d2 in your example.
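
A minimal sketch of that approach, assuming the hash value is always the last thing in the URL. The class name HashExtractor and the method extractHash are hypothetical names chosen for this example.

```java
class HashExtractor {
    // Returns everything after the last "&hash=" in the URL,
    // or null if the marker is not present.
    static String extractHash(String url) {
        String marker = "&hash=";
        int start = url.lastIndexOf(marker);
        if (start < 0) {
            return null;
        }
        // Skip past the marker itself so only the value is returned.
        return url.substring(start + marker.length());
    }

    public static void main(String[] args) {
        String prodUrl = "http://www.ebay.com/itm/Crosman-Pumpmaster-760-Pump-177-Pellet-4-5-mm-BB-Air-Rifle-Black-760B-/251635693266?pt=LH_DefaultDomain_0&hash=item3a96a7f6d2";
        System.out.println(extractHash(prodUrl)); // prints item3a96a7f6d2
    }
}
```

Note that if another parameter ever follows the hash, this returns that trailing text too, so it only fits URLs shaped like the one in the question.
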

0

You can use this regex:

(http://www.ebay.com/itm/)(.*?)(hash=)([^&]*)

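.*? matches too little in the 4th capturing group of your regex: with nothing after it, the lazy quantifier stops at the empty string. [^&]* is greedy but stops at the next &, so it captures the whole hash value and nothing beyond it.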
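
Here is a short sketch of the regex above in use. The class name NegatedClassDemo, the method extractHash, and the extra &foo=bar parameter in the second URL are made up for illustration; the second URL is not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Same regex as above; group 4 captures everything after hash= up to the next '&'.
    static final Pattern PROD =
            Pattern.compile("(http://www.ebay.com/itm/)(.*?)(hash=)([^&]*)");

    // Returns the hash value, or null if the URL does not match.
    static String extractHash(String url) {
        Matcher m = PROD.matcher(url);
        return m.find() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        String prodUrl = "http://www.ebay.com/itm/Crosman-Pumpmaster-760-Pump-177-Pellet-4-5-mm-BB-Air-Rifle-Black-760B-/251635693266?pt=LH_DefaultDomain_0&hash=item3a96a7f6d2";
        System.out.println(extractHash(prodUrl));              // prints item3a96a7f6d2
        // Hypothetical URL with a parameter after the hash: the capture stops at '&'.
        System.out.println(extractHash(prodUrl + "&foo=bar")); // prints item3a96a7f6d2
    }
}
```
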
