I have a corpus of data made up of lines of the form:
'be in'('force', 'the closed area').
'advise'('coxswains', 'mr mak').
'be'('a good', 'restricted area').
'establish from'('person \'s id', 'the other').
I want to read this data in from a text file and populate a 2D array with only the text inside the single quotes, i.e.
be in [0][0], force [0][1], the closed area [0][2]
advise [1][0], coxswains [1][1], mr mak [1][2]
be [2][0], a good [2][1], restricted area [2][2]
establish from [3][0], person \'s id [3][1], the other [3][2]
The array indices above are only a conceptual reference. Only the text inside the single quotes should be stored, e.g. index [0][0] would hold be in and index [3][1] would hold person \'s id
However, as index [3][1] shows, a single quote may be preceded by a backslash, and such escaped quotes should not be interpreted as delimiters.
This is what I have thus far:
BufferedReader br_0 = new BufferedReader(new FileReader("/home/matthias/Workbench/SUTD/2_January/Prolog/horn_data_test.pl"));
// One row per input line, three columns: predicate, first arg, second arg (not populated yet).
String[][] dataArray = new String[262978][3];
String line_0;
while ((line_0 = br_0.readLine()) != null)
{
    // Naive split: this also splits on escaped quotes (\'), which is the problem.
    // The quoted text ends up at the odd indices: items[1], items[3], items[5].
    String[] items = line_0.split("'");
    for (int i = 0; i < items.length; i++)
    {
        if (i == 1)
        {
            System.out.println("predicate: " + items[i]);
        }
        if (i == 3)
        {
            System.out.println("first arg: " + items[i]);
        }
        if (i == 5)
        {
            System.out.println("second arg: " + items[i]);
        }
    }
}
br_0.close();
I know I need something like:
if (the character under consideration is ' && the one before it is not \)
    put what follows into the first index, etc.
But how do I make it stop at the next delimiter character? What is the best way to populate the array? The input file is quite large, so I'm trying to optimize for efficiency.
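To make the idea above concrete, here is a minimal sketch of a single-pass character scan. It toggles an "inside quotes" flag on each unescaped single quote and copies a backslash together with the character after it, so \' never ends a field. The class name QuotedFieldParser and the methods parseLine and readAll are hypothetical names chosen for illustration. The sketch also assumes the file is UTF-8 and that lines without exactly three quoted fields can be skipped; neither assumption is stated in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class QuotedFieldParser {

    // Returns the text of every single-quoted field in the line.
    // Escaped characters such as \' are kept verbatim and do not end a field.
    // A field whose closing quote is missing is dropped.
    static String[] parseLine(String line) {
        List<String> fields = new ArrayList<>(3);
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < line.length()) {
                // Backslash inside a field: copy it and the next character as-is.
                current.append(c).append(line.charAt(++i));
            } else if (c == '\'') {
                if (inQuotes) {
                    // Unescaped closing quote: the field is complete.
                    fields.add(current.toString());
                    current.setLength(0);
                }
                inQuotes = !inQuotes;
            } else if (inQuotes) {
                current.append(c);
            }
            // Characters outside quotes, such as ( , ) and . are ignored.
        }
        return fields.toArray(new String[0]);
    }

    // Reads the whole file into rows of [predicate, first arg, second arg].
    // Assumption: UTF-8 input; lines without exactly three fields are skipped.
    static String[][] readAll(Path file) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = parseLine(line);
                if (fields.length == 3) {
                    rows.add(fields);
                }
            }
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        String[] row = parseLine("'establish from'('person \\'s id', 'the other').");
        for (int col = 0; col < row.length; col++) {
            System.out.println("[" + col + "] " + row[col]);
        }
    }
}
```

Collecting rows in a list avoids hard-coding the number of lines, and each line is scanned only once, which should matter more for a large file than the choice of container. If the row count is known in advance, filling a preallocated String[rows][3] inside the read loop would work the same way.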