
I am trying to fetch the content between the tags, so I wrote a regex for that.

    final String REGEX_BOLD_END = ".*[<][/][B|b][>].*";
    String input = "<B>Contetnt here</B>";
    Pattern pattern_start = Pattern.compile(".*[<][B|b][>].*");
    Matcher matcher_start = pattern_start.matcher(input);
    Pattern pattern_end = Pattern.compile(REGEX_BOLD_END);
    Matcher matcher_end = pattern_end.matcher(input);
    System.out.println("Tag open");
    if (matcher_start.matches()) {
        System.out.println("At:" + matcher_start.start() + "\tTo:" + matcher_start.end());
        System.out.println(matcher_start.group(0));
    } else {
        System.out.println("Not matched");
    }
    System.out.println("Tag Close");
    if (matcher_end.matches()) {
        System.out.print("At:" + matcher_end.start() + "\tTo:" + matcher_end.end());
    } else {
        System.out.println("Not matched");
    }

My aim is to get the text between the tags ("Content here"). My plan was to get the start and end indexes of each tag and then take the substring from the original input, but the output is not what I expected.

Output:

    Tag open
    At:0    To:20
    <B>Contetnt here</B>
    Tag Close
    At:0    To:20

Please point out where I am making a mistake.

  • You seem to be parsing HTML. In this case, why not use an HTML parser instead? Commented Feb 21, 2014 at 12:29
  • Actually, there are more custom tags, this is just an example tag. Commented Feb 21, 2014 at 12:34
  • Which is all the more a reason to use an HTML parser... Commented Feb 21, 2014 at 12:34
  • I can't use an HTML parser, which is why I switched to pattern matching. Commented Feb 21, 2014 at 12:39
  • Out of pure curiosity: why can't you use an HTML parser? Commented Feb 21, 2014 at 12:40

1 Answer


If you're thinking of using substring together with regexes, you're doing it wrong. The whole point of regular expressions is that you don't have to bother with indexes or substring: a capture group hands you the text directly.

Try this instead:

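    // [bB] matches either case; inside a character class, | would be a literal pipe
    Pattern p = Pattern.compile("<[bB]>(.*)</[bB]>");
    Matcher m = p.matcher(textToMatch);
    if (m.find()) {
        String firstMatch = m.group(1); // text captured between the tags
    }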
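As for why the question's code prints At:0 To:20 for both tags: matches() succeeds only when the pattern covers the entire input, and the leading and trailing .* absorb everything around the tag, so start() and end() describe the whole string rather than the tag. If the indexes are still wanted, a capture group can report them. This is a minimal sketch under that assumption; the class name GroupIndexDemo and the helper contentSpan are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Returns {start, end} of the text between the tags, or null when nothing matches.
    static int[] contentSpan(String input) {
        Matcher m = Pattern.compile("<[bB]>(.*)</[bB]>").matcher(input);
        // start(1) and end(1) refer to capture group 1, not to the whole match
        return m.find() ? new int[] { m.start(1), m.end(1) } : null;
    }

    public static void main(String[] args) {
        String input = "<B>Content here</B>";

        // The question's approach: matches() spans the entire input.
        Matcher whole = Pattern.compile(".*<[bB]>.*").matcher(input);
        if (whole.matches()) {
            System.out.println("Whole match: " + whole.start() + " to " + whole.end());
        }

        // Group-based approach: indexes of the content only.
        int[] span = contentSpan(input);
        if (span != null) {
            System.out.println("Group 1: " + span[0] + " to " + span[1]
                    + " -> " + input.substring(span[0], span[1]));
        }
    }
}
```

In practice m.group(1) already returns the same text, so the indexes are only useful when the position itself matters.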

Edit: a complete, compilable command-line program that prints "yay!" when the input is "<b>yay!</b>", as required.

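    import java.util.regex.*;

    class Test {
        public static void main(String[] args) {
            if (args.length == 0) {
                System.out.println("Usage: java Test \"<b>text</b>\"");
                return;
            }
            // [bB] accepts either case; group 1 captures the text between the tags
            Pattern p = Pattern.compile("<[bB]>(.*)</[bB]>");
            Matcher m = p.matcher(args[0]);
            if (m.find()) {
                System.out.println(m.group(1));
            } else {
                System.out.println("No match");
            }
        }
    }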
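Since the comments mention that there are more custom tags, the same idea can be generalized. The following is only a sketch, not a replacement for a real parser: the class TagContentExtractor and the method extractAll are hypothetical names, and it assumes tags have no attributes and are not nested. It uses a reluctant quantifier (.*?) so that each match stops at the nearest closing tag, and Pattern.CASE_INSENSITIVE instead of spelling out both cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagContentExtractor {
    // Collects the text inside every <tagName>...</tagName> pair, in order of appearance.
    static List<String> extractAll(String tagName, String input) {
        String quoted = Pattern.quote(tagName); // treat the tag name as literal text
        Pattern p = Pattern.compile(
                "<" + quoted + ">(.*?)</" + quoted + ">",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL); // DOTALL lets content span lines
        Matcher m = p.matcher(input);
        List<String> contents = new ArrayList<>();
        while (m.find()) {
            contents.add(m.group(1));
        }
        return contents;
    }

    public static void main(String[] args) {
        System.out.println(extractAll("b", "<B>one</B> and <b>two</b>"));
    }
}
```

With a greedy (.*) the sample input above would produce a single match running from the first opening tag to the last closing tag; the reluctant form yields "one" and "two" separately.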