
Can someone help me out with a regex that matches a string beginning with a specific prefix? The string can begin with any HTML opening tag, e.g. <span> or <p>. Basically, I want a regex that checks whether a string begins with any opening HTML tag <...> followed by [apple videoID=

For example:

<span>[apple videoID= 

Here's what I've tried:

static String pattern =  "^<[^>]+>[apple videoID=";
static Pattern pattern1 = Pattern.compile(pattern);

What is wrong with the above?

  • Obligatory link Commented Apr 11, 2014 at 16:30
  • You would have to implement a grammar of some sort to detect matching pairs of characters in a string. Regular expressions are called regular for a reason: they can only recognize languages that can be described by finite-state machines (automata), and matching pairs would require an unbounded number of states. Commented Apr 11, 2014 at 16:34
  • What do you mean? It's a pattern that I'm compiling with Pattern.compile(). Commented Apr 11, 2014 at 16:35
  • It seems to work for me, but it can be cleaner: ^<[^>]+><apple videoID= Commented Apr 11, 2014 at 16:37
  • [A-za-z] This range seems to be wrong. It should be [A-Za-z] Commented Apr 11, 2014 at 16:46
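The [A-za-z] comment is easy to check. In ASCII, 'Z' is 0x5A and 'a' is 0x61, so the range A-z also covers the six characters between them: [ \ ] ^ _ and the backtick. A minimal sketch (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows why [A-za-z] is probably a typo for [A-Za-z].
public class RangeTypoDemo {
    // A-z spans U+0041..U+007A, which includes [ \ ] ^ _ and the backtick.
    static boolean typoRangeMatches(String s) {
        return Pattern.matches("[A-za-z]+", s);
    }

    // A-Z plus a-z: letters only.
    static boolean intendedRangeMatches(String s) {
        return Pattern.matches("[A-Za-z]+", s);
    }

    public static void main(String[] args) {
        String tagName = "my_tag";
        System.out.println(typoRangeMatches(tagName));     // true: '_' falls inside A-z
        System.out.println(intendedRangeMatches(tagName)); // false: '_' is not a letter
    }
}
```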

4 Answers


You have a typo in the following line.

static String pattern = "^<[^>]+>[apple videoID=";

This string is not a valid regular expression because you have an unclosed [ right before the word apple, hence the "Unclosed character class" PatternSyntaxException. You either meant to type

static String pattern = "^<[^>]+><apple videoID=";

assuming that apple is an HTML tag, or

static String pattern = "^<[^>]+>\\[apple videoID=";

if you really did want the [ in front of apple. [ is a special character in regular expressions and must be escaped with a \ to be matched literally, and \ is itself a special character in Java string literals and must be escaped with another \. Therefore you write \\[.
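A small sketch that puts this reasoning into runnable form: the original string fails in Pattern.compile(), while both suggested fixes compile and match at the start of the input. Pattern.quote() is shown as a standard-library alternative to escaping the bracket by hand. The class and method names and the sample inputs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class for the answer's two fixes.
public class OpeningTagPrefix {
    // Original attempt: the bare '[' opens a character class that is never closed.
    static final String BROKEN = "^<[^>]+>[apple videoID=";
    // Fix 1: apple is a tag, so match a literal '<'.
    static final String TAG_FORM = "^<[^>]+><apple videoID=";
    // Fix 2: a literal '[' written as \[ in the regex, i.e. "\\[" in Java source.
    static final String BRACKET_FORM = "^<[^>]+>\\[apple videoID=";
    // Alternative: Pattern.quote() escapes the literal part for you.
    static final String QUOTED_FORM = "^<[^>]+>" + Pattern.quote("[apple videoID=");

    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println(e.getDescription()); // e.g. "Unclosed character class"
            return false;
        }
    }

    // With find(), the ^ anchor restricts the match to the beginning of the input;
    // whatever follows "videoID=" is ignored.
    static boolean startsWith(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(compiles(BROKEN)); // false
        System.out.println(startsWith(TAG_FORM, "<span><apple videoID=42>"));
        System.out.println(startsWith(BRACKET_FORM, "<span>[apple videoID=42]"));
        System.out.println(startsWith(QUOTED_FORM, "<p>[apple videoID=42]"));
    }
}
```

Note that <[^>]+> accepts anything between the angle brackets, including attributes and closing tags such as </div>.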


Simple as this:

<[.]+><apple videoID=[.]*

2 Comments

I think it should be <.*><apple videoID=, as the tag can contain anything, not just chars.
@Sam I've updated the post. Please check. That's not working for me; it says "unclosed character class".
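One likely reason this answer fails: inside a character class, . loses its special meaning, so [.]+ matches only a run of literal dots and <[.]+> never matches <span>. The comment's <.*> does match a tag, but .* is greedy and can run past the first >, whereas [^>]+ stops at it. A hypothetical sketch comparing the three (class and method names are made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: three ways of matching the opening tag.
public class DotClassDemo {
    // [.] is a character class containing only a literal dot.
    static boolean bracketDot(String s) {
        return Pattern.compile("^<[.]+>").matcher(s).find();
    }

    // [^>]+ means "one or more characters other than '>'", so it stops at the first '>'.
    static boolean negatedClass(String s) {
        return Pattern.compile("^<[^>]+>").matcher(s).find();
    }

    // .* is greedy: on a single line the match extends to the LAST '>'.
    static String greedyTag(String s) {
        Matcher m = Pattern.compile("^<.*>").matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(bracketDot("<span><apple videoID="));   // false
        System.out.println(bracketDot("<...><apple videoID="));    // true
        System.out.println(negatedClass("<span><apple videoID=")); // true
        System.out.println(greedyTag("<span><apple videoID=1>"));  // <span><apple videoID=1>
    }
}
```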

Try this pattern:

"^<[A-Za-z]+>\\[apple videoID=$"

This pattern will match a tag such as <span> followed by [apple videoID=

Hope this helps!

1 Comment

Not a chance. The ^ and $ mean this must be the complete line, so the pattern fails as soon as anything follows videoID=.
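To illustrate the comment: with find(), the trailing $ forces the input to end right after videoID=, so anything after it (such as an ID value) makes the match fail. Dropping $ turns the pattern into a prefix check. Note also that [A-Za-z]+ rejects tag names with digits or attributes, such as <h1> or <p class="x">. Class names and inputs below are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the answer's anchored pattern vs. a prefix-only version.
public class AnchorDemo {
    static final String FULLY_ANCHORED = "^<[A-Za-z]+>\\[apple videoID=$";
    static final String PREFIX_ONLY = "^<[A-Za-z]+>\\[apple videoID=";

    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "<span>[apple videoID=123]";
        // $ requires the input to end right after "=", so "123]" breaks the match.
        System.out.println(found(FULLY_ANCHORED, input));                  // false
        System.out.println(found(PREFIX_ONLY, input));                     // true
        System.out.println(found(FULLY_ANCHORED, "<span>[apple videoID=")); // true
    }
}
```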

Here is a solution.

Pattern.CASE_INSENSITIVE makes the pattern match whether the text is in upper case or lower case.

Tested and executed.

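    package sireesh.yarlagadda;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Not named "Pattern": that would clash with the imported java.util.regex.Pattern.
    public class PatternTest {

        public static void main(String[] args) {
            String text = "<span><apple videoID=";

            // One opening tag made of letters, then the literal "<apple videoID=".
            // '+' rather than '*' so that an empty "<>" is not accepted as a tag.
            String patternString = "<[a-zA-Z]+><apple videoID=";

            // CASE_INSENSITIVE lets the text be in upper or lower case.
            Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(text);

            // lookingAt(): does the input start with a match?
            System.out.println("lookingAt = " + matcher.lookingAt());
            // matches(): does the whole input match?
            System.out.println("matches   = " + matcher.matches());
        }

    }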

5 Comments

Oh, I guess you got me wrong. I don't want to allow arbitrary HTML before the apple tag. I want to match a string that begins with exactly one HTML tag (it could be span, p, etc.) followed by "<apple videoID=". That's what I want.
All the second asterisk (>*) does is match zero or more > characters. As you can see, that will cause unintended matches.
Changed the solution to fit the requirement: removed the asterisk and added the case-insensitive flag. @Sam
What's the point of [a-zA-Z] if you are using CASE_INSENSITIVE?
You are right. It also works with [a-z]; I kept [a-zA-Z] just to play it safe. @njzk2
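A quick check of the last two comments, as a hypothetical sketch: with Pattern.CASE_INSENSITIVE, [a-z] already matches upper-case tag names, so [a-zA-Z] is redundant. Be aware that the flag applies to the whole pattern, so the literal apple videoID= part becomes case-insensitive too. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the [a-z] vs [a-zA-Z] discussion.
public class CaseInsensitiveDemo {
    static final String TAG_THEN_APPLE = "<[a-z]+><apple videoID=";

    // lookingAt() checks whether the input starts with a match.
    static boolean startsWithMatch(int flags, String input) {
        return Pattern.compile(TAG_THEN_APPLE, flags).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        String input = "<SPAN><apple videoID=7>";
        // CASE_INSENSITIVE lets [a-z] match "SPAN".
        System.out.println(startsWithMatch(Pattern.CASE_INSENSITIVE, input)); // true
        // Without the flag, [a-z] rejects upper-case letters.
        System.out.println(startsWithMatch(0, input));                        // false
    }
}
```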
