I have a corpus of data which is full of instances of the form:

'be in'('force', 'the closed area').
'advise'('coxswains', 'mr mak').
'be'('a good', 'restricted area').
'establish from'('person \'s id', 'the other').

I want to read in this data from a .txt file and populate a 2D array with only the information inside the single quotes, i.e.

be in          [0][0], force         [0][1], the closed area [0][2]
advise         [1][0], coxswains     [1][1], mr mak          [1][2]
be             [2][0], a good        [2][1], restricted area [2][2]
establish from [3][0], person \'s id [3][1], the other       [3][2]

The array indexes above are only a conceptual reference. As I said, only the text inside the single quotes is wanted, e.g. index [0][0] would be the string be in, and index [3][1] would be the string person \'s id

As index [3][1] shows, some single quotes are preceded by a backslash, and those should not be interpreted as delimiters.

This is what I have thus far:

BufferedReader br_0 = new BufferedReader(new FileReader("/home/matthias/Workbench/SUTD/2_January/Prolog/horn_data_test.pl"));
    String line_0;
    while ((line_0 = br_0.readLine()) != null) 
    {

        String[] items = line_0.split("'");
        String[][] dataArray = new String [3][262978];
        int i;
        for (String item : items) 
        {
            for (i = 0; i<items.length; i++)
            {
                if (i == 0) 
                {
                    System.out.println("first arg: " + items[i]);
                } 
                if (i == 1) 
                {
                    System.out.println("first arg: " + items[i]);
                }
                if (i == 2)
                {
                    System.out.println("second arg: " + items[i]);
                }
            }
        }           
    }
    br_0.close();

I know I need something like:

if (the character under consideration == ' && the one before it is not \)
put it into first index, etc. etc. 

But how do I make it stop before the next delimiter character? What's the best way to populate that array? The input file is quite large, so I'm trying to optimize for efficiency.

2 Answers

You can use a regex with Pattern and Matcher like this:

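import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {

        // Note: inside a Java string literal, \' is just a plain ',
        // so the last sample line contains no backslash character.
        String[] stringArr = { "'be in'('force', 'the closed area').",
                "'advise'('coxswains', 'mr mak').",
                "'be'('a good', 'restricted area').",
                "'establish from'('person \'s id', 'the other')." };

        // Lazily match text between quotes; the closing quote must not be
        // followed by a letter (so the quote in "'s" is not a delimiter).
        Pattern p = Pattern.compile("'(.*?)'(?![a-zA-Z])");
        String[][] arr = new String[4][3];
        for (int count = 0; count < stringArr.length; count++) {
            Matcher m = p.matcher(stringArr[count]);
            int j = 0;
            // Each match fills the next column of the current row.
            while (j < arr[count].length && m.find()) {
                arr[count][j++] = m.group(1);
            }
        }

        // Print every cell together with its indexes.
        for (int k = 0; k < arr.length; k++) {
            for (int j = 0; j < arr[k].length; j++) {
                System.out.println("arr[" + k + "][" + j + "] " + arr[k][j]);
            }
        }
    }
}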

Output:

arr[0][0] be in
arr[0][1] force
arr[0][2] the closed area
arr[1][0] advise
arr[1][1] coxswains
arr[1][2] mr mak
arr[2][0] be
arr[2][1] a good
arr[2][2] restricted area
arr[3][0] establish from
arr[3][1] person 's id
arr[3][2] the other
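
When the number of lines is not known in advance, the fixed-size String[4][3] can be swapped for a List<String[]> that grows as lines are read. The following is only a sketch under that assumption: the class name QuotedFieldReader and the helpers parseLine and readRows are made up for illustration, and the sample is read from a StringReader where a FileReader on the real file would go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldReader {
    // Same pattern as in the answer above, compiled once and reused for every line.
    private static final Pattern QUOTED = Pattern.compile("'(.*?)'(?![a-zA-Z])");

    // Hypothetical helper: returns the three quoted fields of one line,
    // or null if the line does not contain three of them.
    static String[] parseLine(String line) {
        Matcher m = QUOTED.matcher(line);
        String[] row = new String[3];
        int j = 0;
        while (j < row.length && m.find()) {
            row[j++] = m.group(1);
        }
        return j == row.length ? row : null;
    }

    // Hypothetical helper: reads all lines and collects one row per well-formed line.
    static List<String[]> readRows(BufferedReader reader) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] row = parseLine(line);
            if (row != null) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String sample = "'be in'('force', 'the closed area').\n"
                + "'advise'('coxswains', 'mr mak').\n";
        // Replace the StringReader with a FileReader to read the real file.
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            List<String[]> rows = readRows(reader);
            // A List is iterated with size() and get(), not .length.
            for (int k = 0; k < rows.size(); k++) {
                System.out.println(k + ": " + String.join(" | ", rows.get(k)));
            }
        }
    }
}
```

If a true 2D array is needed afterwards, rows.toArray(new String[0][]) converts the list in one call.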

5 Comments

that works flawlessly for grabbing the data. do you know how to put the info to an array using java? one thing i'm not sure about is, the size of the input file is variable.
Wouldn't it create a problem for the string 'person \'- id'? I thought the requirement is: if (the character under consideration == ' && the one before it is not \). This doesn't test the character before, it tests the one after.
@anubhava, yeah but actually that's also ok, I suppose the question was sort of ill formed
@anubhava -Yes. It will be a problem if the input is like that. I just looked at what the OP had posted and tried to answer it
@TheLostMind but if it's a dynamic array the .length() won't work. How else can I iterate through?

You can use this regex to match a single-quoted string, with support for escaped quotes:

'(.*?)(?<!\\)'

Use matcher.group(1) for the string inside the quote.
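
In Java the pattern has to be written as a string literal, so each of the two regex backslashes is doubled again, giving four. A minimal sketch of that, assuming the line really contains a backslash before the inner quote as it does in the file (the class name EscapedQuoteDemo and the helper extract are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedQuoteDemo {
    // The regex '(.*?)(?<!\\)' as a Java string literal:
    // the lookbehind rejects a closing quote that directly follows a backslash.
    private static final Pattern QUOTED = Pattern.compile("'(.*?)(?<!\\\\)'");

    // Hypothetical helper: collects the contents of every quoted field in a line.
    static List<String> extract(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // "\\'" in Java source is a backslash followed by a quote,
        // which is what a line read from the file would contain.
        String line = "'establish from'('person \\'s id', 'the other').";
        System.out.println(extract(line));
    }
}
```

The captured text keeps the backslash (person \'s id), which matches what the question asks for. One limitation of the single-character lookbehind: a field ending in an escaped backslash right before its closing quote would be misread, so this assumes the data has no such fields.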

RegEx Demo

5 Comments

\\) will escape the following ). So we will get an error - Unclosed group near index 13 :) . We will have to use 4 back slashes
@TheLostMind: What I showed here is pure regex NOT Java compatible string representation. Which would be: "'(.*?)(?<!\\\\)'"
I know. Since the question is tagged under java, I thought it would be better if the OP is told that he / she will need 4 backslashes. Otherwise, you might have a comment here saying - Not working :)
isn't regex 101 for javascript? i tried for a while to use it with output i wanted to manipulate using sed and finally i didn't get it
@user3787253: No regex101 supports PCRE and Python as well.
