I want to extract a certain kind of string using a regex in Java. I currently have this pattern:

pattern = "^\\a.+\\sed$\n";

It is supposed to match a string that starts with "a" and ends with "sed". This is not working. Did I miss something?

I removed the \n at the end of the pattern so that it now ends with "$", but I still don't get a match. The regex looks correct to me.

What I want to extract is the "a sed" from the temp string.

String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
pattern = "(?s)^a.*sed$";
pr = Pattern.compile(pattern);

math = pr.matcher(temp);
  • Is "a sed" exactly what you are looking for? Commented Dec 16, 2015 at 11:33

2 Answers

UPDATE

You want to match a sed, so you can use a\\s+sed if there is only whitespace between a and sed:

String s = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
Pattern pattern = Pattern.compile("a\\s+sed");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
    System.out.println(matcher.group(0)); 
} 

See IDEONE demo

Now, if there can be anything between a and sed, use a tempered greedy token:

Pattern pattern = Pattern.compile("(?s)a(?:(?!a|sed).)*sed");
                                         ^^^^^^^^^^^^^  

See another IDEONE demo.
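
To see what the tempered greedy token does on the string from the question, here is a minimal sketch. The class name TemperedTokenDemo and the helper findAll are made up for illustration; only the regex itself comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the class and method names are illustrative only.
final class TemperedTokenDemo {
    // (?s) lets . match line breaks too.
    // (?:(?!a|sed).)* consumes one character at a time, but only while the
    // text at that position does not start with "a" or "sed".
    private static final Pattern TEMPERED =
            Pattern.compile("(?s)a(?:(?!a|sed).)*sed");

    // Collects every non-overlapping match found in the input.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = TEMPERED.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
        System.out.println(findAll(temp)); // prints [a sed]
    }
}
```

Because the token refuses to step over another a, each match starts at the a closest to the following sed, which is why the leading "afsgdhgd ..." part is not included.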

ORIGINAL ANSWER

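The main problem with your regex is the \n at the end. $ asserts the end of the string, so the pattern then demands one more character (a line break) after it, which fails for your input because it has no trailing line break. Also, \\s matches a whitespace character, but you need a literal s, and \\a is the escape for the bell character (\u0007), not a literal a.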

You need to remove \\s and \n and make . match a newline. It is also advisable to use the * quantifier to allow zero characters in between:

pattern = "(?s)^a.*sed$";

See the regex demo

The regex matches:

  • ^ - start of string
  • a - a literal a
  • .* - zero or more of any character (the (?s) modifier makes . match any character, including a newline)
  • sed - a literal letter sequence sed
  • $ - end of string
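
As a quick check of the breakdown above, the following sketch runs the anchored pattern against a few inputs. The class name AnchoredPatternDemo, the helper wholeStringMatches, and the sample strings other than temp are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are illustrative only.
final class AnchoredPatternDemo {
    private static final Pattern ANCHORED = Pattern.compile("(?s)^a.*sed$");

    // True only when the entire input starts with "a" and ends with "sed".
    static boolean wholeStringMatches(String input) {
        return ANCHORED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
        System.out.println(wholeStringMatches("ased"));                  // true: .* may match nothing
        System.out.println(wholeStringMatches("a line\nthat is parsed")); // true: (?s) lets . cross the newline
        System.out.println(wholeStringMatches(temp));                    // false: the string ends with "hh"
    }
}
```

The last call shows why the pattern cannot be used on its own to pull "a sed" out of a longer string: the anchors tie the match to the whole input.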

3 Comments

Please check my update. I think one of the solutions should work for you.
Will this work if there is a tab or only spaces? Thank you! Doesn't a general regex to match "beginning of character" and "end of character" exist?
No idea what solution you refer to. \s matches any whitespace including tabs, spaces, newlines. Beginning and end of character? No, nothing like that exists, but I suspect you are talking about \\b, a word boundary: \\bsed\\b will match sed in a sed tool, but won't match it in seduce. As for the tempered greedy token solution, . will match any character (including newline symbols) since I have used (?s) DOTALL modifier.
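
The word-boundary behaviour described in the comment above can be checked with a short sketch. The class name WordBoundaryDemo and the helper containsWord are hypothetical; the pattern \\bsed\\b and the first two sample inputs are taken from the comment.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class WordBoundaryDemo {
    // \b matches between a word character and a non-word character
    // (or the start/end of the input).
    private static final Pattern WORD_SED = Pattern.compile("\\bsed\\b");

    // True when "sed" occurs as a standalone word anywhere in the input.
    static boolean containsWord(String input) {
        return WORD_SED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("a sed tool")); // true
        System.out.println(containsWord("seduce"));     // false: "u" follows "sed", so no boundary
        System.out.println(containsWord("used"));       // false: "u" precedes "sed", so no boundary
    }
}
```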

Your temp string cannot match the pattern (?s)^a.*sed$, because this pattern says that your temp string must begin with the character a and end with the sequence sed, which is not the case. Your string has trailing characters after the "sed" sequence. If you only want to extract that a...sed portion of the whole string, try using the unanchored pattern "a.*sed" and use the find() method of the Matcher class:

Pattern pattern = Pattern.compile("a.*sed");
Matcher m = pattern.matcher(temp);
if (m.find())
{
    System.out.println("Found string "+m.group());
    System.out.println("From "+m.start()+" to "+m.end());
}
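
One thing worth checking with this approach: .* is greedy and the first a in temp is at index 0, so on the question's string the match spans more than just "a sed". The sketch below compares it with the a\\s+sed pattern from the other answer. The class name GreedyFindDemo and the helper firstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class GreedyFindDemo {
    // Returns the first match of regex in input, or null when there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
        // Greedy: runs from the first "a" to the last "sed".
        System.out.println(firstMatch("a.*sed", temp));   // prints afsgdhgd gfgshfdgadh a sed
        // Whitespace only between "a" and "sed".
        System.out.println(firstMatch("a\\s+sed", temp)); // prints a sed
    }
}
```

So the unanchored pattern does find a match, but if only "a sed" is wanted, a more restrictive pattern is needed.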
