How to find char pattern offset in a String

Question

I have a text file containing text with newline characters, like the one below. I read that text file into a String.

random Text
State v. USA
some more text
USA v.
NY
Some more text
USA
v.LA ,  MN v. ND
USA vs. MN

I want to know the offset (i.e. the starting and ending character index) of patterns like [word starting with a capital letter] v. [word starting with a capital letter]

or [word starting with a capital letter] vs. [word starting with a capital letter].

For the above example, "State v. USA" => Start=11 and End=22

"USA v. NY" => Start=36 and End=45

I started with something like this: http://rubular.com/r/T7Ii2WDADw, but it does not cover all cases.

So the program could return a Map whose key is Start+","+End and whose value is the actual text, like "State v. USA".
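The Map described above could be built with a single Matcher loop. This is a minimal sketch, assuming that a "word starting with a capital letter" means one starting with an ASCII letter A-Z, and that any amount of whitespace (including newlines, or none at all, as in "v.LA") may surround "v." or "vs."; the class name CaseNameOffsets and the method findOffsets are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseNameOffsets {

    // Assumed pattern: a capitalized word, optional whitespace (newlines included),
    // "v." or "vs." with the dot escaped, optional whitespace, another capitalized word.
    private static final Pattern CASE_NAME =
            Pattern.compile("[A-Z]\\w*\\s*vs?\\.\\s*[A-Z]\\w*");

    // Maps "start,end" to the matched text. end is exclusive, as returned by Matcher.end().
    // LinkedHashMap keeps the entries in the order they occur in the text.
    public static Map<String, String> findOffsets(String text) {
        Map<String, String> offsets = new LinkedHashMap<>();
        Matcher m = CASE_NAME.matcher(text);
        while (m.find()) {
            offsets.put(m.start() + "," + m.end(), m.group());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "random Text\nState v. USA\nsome more text\nUSA v.\nNY\n"
                + "Some more text\nUSA\nv.LA ,  MN v. ND\nUSA vs. MN";
        findOffsets(text).forEach((key, value) -> System.out.println(key + " => " + value));
    }
}
```

Because Matcher.end() is exclusive, an inclusive End like the one used in the examples above would be m.end() - 1. Note also that the offsets count every newline character in the String.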

John Eipe · Accepted Answer · 2012-09-04 17:40:26Z


To cover both cases, you can use this regex (the dots are escaped so that they match only a literal "."):

\w+\s((v\.)|(vs\.))\s\w+

In Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Testapp {

    public static void main(String[] args) {
        String text = "USA v. Russia \n Some other text \n India vs. Aus";
        // An unescaped "." matches any character, so the dots in "v." and "vs." are escaped.
        String regex = "\\w+\\s((v\\.)|(vs\\.))\\s\\w+";
        Pattern p = Pattern.compile(regex);
        Matcher matcher = p.matcher(text);

        // find() advances to each successive match; start() is inclusive, end() is exclusive.
        while (matcher.find()) {
            System.out.println(matcher.group() + ": start = " + matcher.start() + " end = " + matcher.end());
        }
    }
}

Output:

USA v. Russia: start = 0 end = 13
India vs. Aus: start = 34 end = 47

edited Sep 4, 2012 at 17:40

answered Sep 4, 2012 at 17:24

John Eipe


jtahlborn · Accepted Answer · 2012-09-04 17:33:21Z


This would be a working regex: \w+\s+vs?[.]\s+\w+

Then, using Matcher.find(), you could get the beginning and end of each match using Matcher.start(0) and Matcher.end(0).
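A minimal sketch of that loop, using the regex from this answer; the class GroupZeroOffsets, the method matches, and the sample text are hypothetical. Group 0 is the whole match, so start(0) and end(0) return the same values as start() and end().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupZeroOffsets {

    // The regex from this answer: \w+\s+vs?[.]\s+\w+
    private static final Pattern VERSUS = Pattern.compile("\\w+\\s+vs?[.]\\s+\\w+");

    // Collects "text start-end" for each match; end(0) is exclusive.
    static List<String> matches(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = VERSUS.matcher(text);
        while (m.find()) {
            out.add(m.group(0) + " " + m.start(0) + "-" + m.end(0));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints "State v. USA 0-12" and then "Iraq vs. USA 17-29"
        matches("State v. USA and Iraq vs. USA").forEach(System.out::println);
    }
}
```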

edited Sep 4, 2012 at 17:33

answered Sep 4, 2012 at 17:28

jtahlborn


4 Comments

Watt Over a year ago

Thanks! But I just tested it, and it doesn't cover cases where there is a newline. Please see rubular.com/r/6xA0SBCLy0

jtahlborn Over a year ago

You didn't indicate that you wanted any/multiple whitespace. Updated.

jtahlborn Over a year ago

Your example also includes "v.State". If you intend to match that as well, change the '\s+' to '\s*'.
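The difference between the two quantifiers can be checked on a small input. This is a sketch with a hypothetical class WhitespaceVariants, helper findAll, and sample text: with \s+, a newline still counts as the required whitespace, so "USA v.\nNY" matches, but "USA\nv.LA" does not because nothing separates "v." from "LA"; with \s*, both match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceVariants {

    // Returns the text of every match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "USA v.\nNY, USA\nv.LA";
        // \s+ requires at least one whitespace character (a newline counts) on each side of "v."
        System.out.println(findAll("\\w+\\s+vs?[.]\\s+\\w+", text));
        // \s* also accepts zero whitespace, so "v.LA" is matched as well
        System.out.println(findAll("\\w+\\s*vs?[.]\\s*\\w+", text));
    }
}
```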

Watt Over a year ago

I thought my example illustrated any/multiple whitespace. Thanks for the regex and the Java code; this is what I needed.

AlexR · Accepted Answer · 2012-09-04 17:05:51Z


The method String.indexOf(String) does exactly what you need.

answered Sep 4, 2012 at 17:05

AlexR


5 Comments

Watt Over a year ago

I might have oversimplified the question and made you think indexOf() would work. I don't know the actual string to find beforehand; as the question shows, I am working with a regex. I need a solution using a Matcher and its find() method. If you can, please elaborate on how to find the offset of the pattern "USA v. State" mentioned above using String.indexOf(String). Thanks!

Baz Over a year ago

@S.Singh int start = string.indexOf("USA v. State"); gives you the start, and int end = start + "USA v. State".length(); gives you the end.
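This indexOf approach extends to every occurrence by passing a starting position to String.indexOf(String, int). A minimal sketch, assuming the exact literal is known in advance (which, as the following comments point out, is not the case in the question); the class IndexOfOffsets and the method allOffsets are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfOffsets {

    // Returns {start, end} pairs (end exclusive) for every non-overlapping occurrence of literal.
    static List<int[]> allOffsets(String text, String literal) {
        List<int[]> result = new ArrayList<>();
        if (literal.isEmpty()) {
            return result; // an empty literal would be "found" at the same index forever
        }
        int start = text.indexOf(literal);
        while (start >= 0) {
            int end = start + literal.length();
            result.add(new int[] {start, end});
            start = text.indexOf(literal, end); // resume searching after this occurrence
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "USA v. State; Iraq v. USA; USA v. State";
        for (int[] span : allOffsets(text, "USA v. State")) {
            System.out.println("start = " + span[0] + " end = " + span[1]);
        }
    }
}
```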

Watt Over a year ago

I don't know whether it is "USA v. State" or something else. It could be "Iraq v. USA" or anything. The only thing I know is that it will contain "v." or "vs.". Also, I need the offsets of ALL occurrences, not just the first one. That is why I mentioned a Map as the return value. Let me know if this is not clear.

Baz Over a year ago

@S.Singh Well, then you should have said so in your question ;)

Watt Over a year ago

@Baz My bad, I thought that would be obvious to regex experts when they saw the rubular link :)
