2

Given the following text, I'm trying to parse out the string "TestFile" that follows "Address:" in the OFFICE INFORMATION block:

File: TestFile
Branch


        OFFICE INFORMATION
            Address: TestFile
            City: L.A.
            District.: 43
            State: California
            Zip Code: 90210

        DISTRICT INFORMATION
            Address: TestFile2
            ....

I understand that lookbehinds must have a fixed (or at least bounded) length, so unbounded quantifiers such as * are not allowed inside them, meaning this won't work:

(?<=OFFICE INFORMATION\n\s*Address:).*(?=\n)

I could use this

(?<=OFFICE INFORMATION\n            Address:).* 

but it depends on an exact amount of whitespace before Address:, which makes it brittle and thus not ideal.

How do I reliably parse out "TestFile" and not "TestFile2", as shown in my example above? Note that Address appears twice, but I only need the first value.

Thank you

4 Comments
  • Why not use String.split(":") to create an array of everything whose elements are separated by :. You can then just iterate through that array and get whatever information you need. Commented Nov 30, 2015 at 3:27
  • I agree, @JaskaranbirSingh - @Casey, why don't you just split() on newlines, trim() whitespaces, and split(":")? Commented Nov 30, 2015 at 3:32
  • @JaskaranbirSingh, that sounds much better. Would the results be returned in an array such that first row, first column would be Address and first row, second column would be TestFile? @anubhava, your solution seems to work, but if an array is more efficient I'd rather use that. Can anybody provide a link to a good example of using String.split(":")? Thanks. Commented Nov 30, 2015 at 3:52
  • Added an example on how you could use split method. Commented Nov 30, 2015 at 4:19

3 Answers

1

You don't really need a lookbehind here. Match the leading text normally and get the value from a capturing group:

(?:\bOFFICE INFORMATION\s+Address:\s*)(\S+)

RegEx Demo

Capturing group #1 will hold the value TestFile.

JS Code:

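// Assumes `input` already holds the text from the question.
var re = /(?:\bOFFICE INFORMATION\s+Address:\s*)(\S+)/;
var matches = [];
var m = re.exec(input);
if (m !== null) {
    // m[0] is the whole match; m[1] is capturing group #1, the address value.
    // The regex has no g flag, so no lastIndex bookkeeping is needed.
    matches.push(m[1]);
}
console.log(matches); // logs an array containing "TestFile"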
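For readers working in Java, the same capturing-group idea might look like the sketch below. The class name OfficeAddressDemo and the helper officeAddress are made up for illustration, and the sample input is abbreviated from the question; only the regex itself comes from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the original answer.
class OfficeAddressDemo {
    // Same pattern as above, with backslashes doubled for a Java string literal.
    static final Pattern OFFICE_ADDRESS =
            Pattern.compile("(?:\\bOFFICE INFORMATION\\s+Address:\\s*)(\\S+)");

    // Returns the first token after "Address:" in the OFFICE INFORMATION block,
    // or null when the block is not present.
    static String officeAddress(String input) {
        Matcher m = OFFICE_ADDRESS.matcher(input);
        // group(0) would be the whole match; group(1) is only the captured value.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
                "File: TestFile",
                "Branch",
                "",
                "        OFFICE INFORMATION",
                "            Address: TestFile",
                "            City: L.A.",
                "",
                "        DISTRICT INFORMATION",
                "            Address: TestFile2");
        System.out.println(officeAddress(input)); // prints TestFile
    }
}
```

Note that \S+ stops at the first whitespace character, so this sketch assumes the value is a single token; an address containing spaces would need something like a match up to the end of the line instead.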

5 Comments

your solution seems to work but if array is more efficient I'd rather that. I'll be sure to award you the checkmark if I go with your option. Any opinion on whether array is superior? Thank you
Your solution eventually proved most useful for my situation, as I had trouble implementing the other solutions. So thank you.
It seems that your RegEx demo is using PHP regex. Are you able to provide an example using Java regex? Currently your expression returns "OFFICE INFORMATION Address: TestFile", not just TestFile.
My mistake, it seems RegEx101 shows 1 match even with JavaScript, but my JavaScript returns "OFFICE INFORMATION Address: TestFile", not just TestFile. Any idea why?
As my answer notes, use capturing group #1 to get your matched text.
1

Working with Array:

// Requires: import java.util.ArrayList; import java.util.List;

// A sample String
String questions = "File: TestFile Branch OFFICE INFORMATION Address: TestFile  City: L.A.   District.: 43       State: California     Zip Code: 90210       DISTRICT INFORMATION           Address: TestFile2";

// A typed list to store the split elements
List<String> arr = new ArrayList<>();

// Split on a colon or on any single whitespace character.
// Treating whitespace as a delimiter also takes care of newlines.
for (String x : questions.split(":|\\s")) {
    // Ignore blank elements, so we get a clean list
    if (!x.trim().isEmpty()) {
        arr.add(x);
    }
}

This will give you an array which is:

[File, TestFile, Branch, OFFICE, INFORMATION, Address, TestFile, City, L.A., District., 43, State, California, Zip, Code, 90210, DISTRICT, INFORMATION, Address, TestFile2]

Now let's analyze. Suppose you want the value that belongs to Address. The element Address is at position 5 in the array (counting from 0), which means element 6 is the one you want.

So you would do this:

String address = arr.get(6);

This returns TestFile.
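Instead of hard-coding index 6, the position of the label can be looked up. The sketch below assumes the same colon-or-whitespace tokenization as above; the class name ValueAfterLabel and the helpers tokenize and valueAfter are hypothetical. List.indexOf returns the first occurrence, so for Address it finds the one under OFFICE INFORMATION in this sample.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class ValueAfterLabel {
    // Splits on a colon or a single whitespace character and drops blank tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String x : text.split(":|\\s")) {
            if (!x.trim().isEmpty()) {
                tokens.add(x);
            }
        }
        return tokens;
    }

    // Returns the token right after the FIRST occurrence of label, or null if
    // the label is missing or is the last token.
    static String valueAfter(List<String> tokens, String label) {
        int i = tokens.indexOf(label);
        return (i >= 0 && i + 1 < tokens.size()) ? tokens.get(i + 1) : null;
    }

    public static void main(String[] args) {
        List<String> arr = tokenize(
                "File: TestFile Branch OFFICE INFORMATION Address: TestFile  City: L.A. "
                + "DISTRICT INFORMATION   Address: TestFile2");
        System.out.println(valueAfter(arr, "Address")); // prints TestFile
        System.out.println(valueAfter(arr, "City"));    // prints L.A.
    }
}
```

This only ever sees the first Address, and it still breaks labels that contain a space, such as Zip Code, into two tokens.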

Similarly, for City, element 8 is what you want. The count starts from 0. You can of course modify my matching pattern, or write a loop, to find better ways of doing this. This is just a hint.

Here is one such example loop:

// Skip the first 6 elements (indexes 0-5); they are of no real use to us.
// From index 6 on, every second element is a property value,
// and the element just before it is the property name.
for (int i = 6; i < (arr.size() / 2) + 6; i += 2)
    System.out.println(arr.get(i));

This gives following output:

TestFile
L.A.
43
California
Code

Of course, this loop is unrefined: because the label "Zip Code" contains a space, it is split into two elements, so the last line printed is Code rather than 90210. Refine the loop a little and you will get every element correctly, even the last one. Better yet, use ZipCode instead of Zip Code (no space in the label) and the loop works with little extra effort.

The advantage over using a regex directly: you won't have to write a separate regex for every single element. Iterating over the list lets you pull out all the values in one pass.
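The comments under the question suggest a line-based variant: split on newlines, trim each line, then split on the colon. Below is a rough sketch of that idea, under the assumption (not stated in the question) that any non-empty line without a colon starts a new section. The class name SectionParser and the method parse are invented for illustration. Grouping keys by section keeps the two Address entries apart and leaves "Zip Code" in one piece.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser, for illustration only.
class SectionParser {
    // Maps section name -> (label -> value). Lines seen before any section
    // header are stored under the empty-string section.
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        String current = "";
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                // Assumption: a line without a colon is a section header.
                current = line;
                continue;
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            sections.computeIfAbsent(current, k -> new LinkedHashMap<>()).put(key, value);
        }
        return sections;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "File: TestFile",
                "Branch",
                "",
                "        OFFICE INFORMATION",
                "            Address: TestFile",
                "            City: L.A.",
                "            Zip Code: 90210",
                "",
                "        DISTRICT INFORMATION",
                "            Address: TestFile2");
        Map<String, Map<String, String>> sections = parse(text);
        System.out.println(sections.get("OFFICE INFORMATION").get("Address"));   // prints TestFile
        System.out.println(sections.get("OFFICE INFORMATION").get("Zip Code"));  // prints 90210
        System.out.println(sections.get("DISTRICT INFORMATION").get("Address")); // prints TestFile2
    }
}
```

Because lookups go through the section name first, it does not matter whether Address appears once, twice, or three times in the file.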

2 Comments

I really appreciate you taking the time to explain how to use an array. The thing is, with my text file I need to determine (for instance) if "Address:" exists, then parse out its value to the right. The problem occurs with your method since there's more than one Address, and sometimes there might only be one, or three! So I can't isolate whether the currently iterated array value containing "Address" is the one I'm looking for. RegEx proves most useful, despite having to write them all.
+1 for clearly explaining the process, and you can bet this helps me on other aspects of my projects if not for this specific question. Thanks mate, truly. Take care.
0

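Here is a Java version that reads the input from a file and extracts the value with a named capturing group: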

// Read the input from a file (needs imports from java.io and java.util.regex)
StringBuilder string = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("D:/tests/sample.txt")))) {
    String line;
    while ((line = reader.readLine()) != null) {
        string.append(line);
        string.append("\n");
    }
}
//now string will contain the input as
/*File: TestFile
Branch


        OFFICE INFORMATION
            Address: TestFile
            City: L.A.
            District.: 43
            State: California
            Zip Code: 90210

        DISTRICT INFORMATION
            Address: TestFile2
            ....*/
// [ \\t]* after "Address:" keeps the leading space out of the captured group
Pattern regex = Pattern.compile("OFFICE INFORMATION.*\\r?\\n.*Address:[ \\t]*(?<officeAddress>.*)\\r?\\n");
Matcher regexMatcher = regex.matcher(string.toString());
while (regexMatcher.find()) {
    System.out.println(regexMatcher.group("officeAddress")); // prints TestFile
}

The named group officeAddress in the pattern captures the value to be extracted.

1 Comment

Appreciate the feedback, I learned a little bit from you, thank you.
