If I understand you correctly, you
- want to split your text on the keywords FIRST, NOW and THEN while preserving them in the resulting parts,
- but don't want to split on those keywords if they appear inside quotes.
If my guess is correct, then instead of the split method you can use find to iterate over all
- quotes,
- words which are not inside quotes,
- whitespaces.
This lets you add all quotes and whitespaces to the result unchanged and focus only on the words outside quotations when deciding whether to split on them or not.
A regex representing such parts can look like Pattern.compile("\"[^\"]*\"|\\S+|\\s+");
IMPORTANT: we need to search for ".." first, otherwise \\S+ would match "NOW CLICK" as two separate parts, "NOW and CLICK", which would prevent it from being seen as a single quotation. This is why we place the "[^"]*" regex (which represents quotations) at the start of the subregex1|subregex2|subregex3 series.
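To see why the order of the alternatives matters, here is a small sketch that runs both orderings over the same fragment. The class name AlternationOrderDemo and the helper tokens are made up for this illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrderDemo {
    // Hypothetical helper: collects every match of the regex in the text, in order.
    static List<String> tokens(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "button \"NOW CLICK\" very";
        // Quotation alternative first: the quoted phrase stays in one piece.
        System.out.println(tokens("\"[^\"]*\"|\\S+|\\s+", text));
        // \S+ first: the quotation is broken at its inner space.
        System.out.println(tokens("\\S+|\\s+|\"[^\"]*\"", text));
    }
}
```

With the quotation alternative first, "NOW CLICK" comes back as one token; with \\S+ first, it comes back as "NOW and CLICK" separated by a whitespace token.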
This regex will allow us to iterate over text
FIRST i go to the homepage NOW i click on button "NOW CLICK" very quick THEN i will become a text result.
as the following tokens (the whitespace tokens between them are not shown)
FIRST
i
go
to
the
homepage
NOW
i
click
on
button
"NOW CLICK"
very
quick
THEN
i
will
become
a
text
result.
Notice that "NOW CLICK" is treated as a single token. Because of that, even if it contains a keyword on which you want to split, the token itself will never be equal to that keyword (it also contains other characters like ", or simply other words from the quote). This prevents it from being treated as a delimiter on which the text should be split.
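The equality check described above can be sketched in isolation. The class name QuotedTokenDemo and the method isSplitPoint are hypothetical names used only for this illustration.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTokenDemo {
    static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\S+|\\s+");
    static final List<String> KEYWORDS = List.of("FIRST", "NOW", "THEN");

    // Hypothetical helper: true only when the whole token is exactly one of the keywords.
    static boolean isSplitPoint(String token) {
        return KEYWORDS.contains(token);
    }

    public static void main(String[] args) {
        Matcher m = TOKEN.matcher("NOW i click on \"NOW CLICK\"");
        while (m.find()) {
            String token = m.group();
            if (!token.trim().isEmpty()) { // skip whitespace-only tokens in the printout
                System.out.println("[" + token + "] split point: " + isSplitPoint(token));
            }
        }
    }
}
```

Only the bare NOW is reported as a split point. The quoted token fails the check because its surrounding quote characters are part of the token.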
Using this idea we can create code like:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "FIRST i go to the homepage NOW i click on button \"NOW CLICK\" very quick THEN i will become a text result.";
List<String> keywordsToSplitOn = List.of("FIRST", "NOW", "THEN"); // List.of requires Java 9+

// let's search for quotes ".." | words | whitespaces
Pattern p = Pattern.compile("\"[^\"]*\"|\\S+|\\s+");
Matcher m = p.matcher(text);

StringBuilder sb = new StringBuilder();
List<String> result = new ArrayList<>();

while (m.find()) {
    String token = m.group();
    // a keyword outside quotes starts a new part, so flush what was collected so far
    if (keywordsToSplitOn.contains(token) && sb.length() != 0) {
        result.add(sb.toString());
        sb.setLength(0); // clear sb
    }
    sb.append(token);
}
if (sb.length() != 0) { // include the rest of the text after the last keyword
    result.add(sb.toString());
}
result.forEach(System.out::println);
Output:
FIRST i go to the homepage
NOW i click on button "NOW CLICK" very quick
THEN i will become a text result.
String.split() removes the matched text that is the split delimiter, so you'll never get "FIRST i go to the homepage"; you would get "i go to the homepage" instead.
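A quick way to check this behaviour is to call split with the keywords as the delimiter regex. This is only an illustrative sketch; the class name SplitDropsDelimiterDemo and the method naiveSplit are invented for it.

```java
import java.util.Arrays;

public class SplitDropsDelimiterDemo {
    // Hypothetical helper: plain String.split on the keywords; the matched keyword is discarded.
    static String[] naiveSplit(String text) {
        return text.split("FIRST|NOW|THEN");
    }

    public static void main(String[] args) {
        String text = "FIRST i go to the homepage NOW i click";
        // None of the resulting parts contains FIRST or NOW any more.
        System.out.println(Arrays.toString(naiveSplit(text)));
    }
}
```

The keywords are gone from every part, and because the text starts with a delimiter match, the first element of the array is an empty string.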