I want to parse a string in R and obtain a list of objects. The brackets, spaces and commas in the string dictate the structure of the final list:
1) each pair of brackets is separated from the next by a space, and the words inside each pair of brackets form a new object of the list;
2) words inside brackets are separated by commas and should form different elements of that listed object;
3) the same structure can also appear nested within a pair of brackets.
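All three rules depend on only four kinds of token: an opening or closing bracket, a comma, a space, and a word. A minimal sketch of splitting the string into those tokens follows; `tokenize` is a hypothetical helper name, and it assumes that words never contain brackets, commas or spaces and that groups are separated by single spaces.

```r
# Hypothetical helper: split a bracket string into its tokens.
# Assumes words contain no brackets, commas or spaces.
tokenize <- function(s) {
  # First alternative: a single "(", ")", "," or " ";
  # second alternative: a run of any other characters (a word)
  regmatches(s, gregexpr("[(), ]|[^(), ]+", s))[[1]]
}

print(tokenize("(A,B) ((C) (D),E)"))
```

Once the string is a flat token vector, brackets, commas and spaces can be handled one token at a time instead of with a single regular expression.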
Here is an example of the string:
x <- "(K01596,K01610) (K01689) (K01834,K15633,K15634,K15635) (K00927) (K00134,K00150) (K01803) ((K01623,K01624,K11645) (K03841,K02446,K11532,K01086,K04041),K01622)"
The desired output should look like this:
list(c("K01596","K01610"), "K01689", c("K01834","K15633","K15634","K15635"), "K00927", c("K00134","K00150"), "K01803", list(list(c("K01623","K01624","K11645"), c("K03841","K02446","K11532","K01086","K04041")), "K01622"))
I managed to handle the parsing for case 1) with a recursive regular expression:
# (?R) recurses the whole pattern, so each match is one top-level balanced (...) group
m <- gregexpr("\\((?>[^()]|(?R))*\\)", x, perl = TRUE)
x2 <- as.list(regmatches(x, m)[[1]])
Case 2) is also easy: I can remove the brackets with gsub and split the words with strsplit. The problem is parsing case 3), where there is a nested level such as:
((K01623,K01624,K11645) (K03841,K02446,K11532,K01086,K04041),K01622)
and I need to obtain a list element that is itself a list:
list(list(c("K01623","K01624","K11645"), c("K03841","K02446","K11532","K01086","K04041")), "K01622")
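Because case 3) lets the structure recurse, one way to handle all three cases at once is a small recursive-descent parser over the tokens rather than a regex per case. This is a sketch under stated assumptions, not the only approach: `parse_brackets` and its inner helpers are hypothetical names, and the grammar in the comments is my reading of the three rules, not a formal specification.

```r
# Sketch only: a recursive-descent parser for the bracket notation.
# Assumed grammar (a reading of the three rules, not an official spec):
#   sequence := group (" " group)*        -> list of group values
#   group    := "(" element ("," element)* ")"
#   element  := word | sequence
# A group whose elements are all words becomes a character vector;
# otherwise it becomes a list of its elements.
parse_brackets <- function(s) {
  # Tokens: "(", ")", ",", " " or a word (any other run of characters)
  tokens <- regmatches(s, gregexpr("[(), ]|[^(), ]+", s))[[1]]
  pos <- 1

  # Current token, or "" once the input is exhausted
  peek <- function() if (pos <= length(tokens)) tokens[pos] else ""

  # Consume a specific token or fail, reporting its position
  expect <- function(tok) {
    if (peek() != tok) stop("expected '", tok, "' at token ", pos)
    pos <<- pos + 1
  }

  # Space-separated run of groups -> list of group values
  parse_sequence <- function() {
    items <- list(parse_group())
    while (peek() == " ") {
      pos <<- pos + 1
      items[[length(items) + 1]] <- parse_group()
    }
    items
  }

  # "(" comma-separated elements ")"
  parse_group <- function() {
    expect("(")
    elems <- list(parse_element())
    while (peek() == ",") {
      pos <<- pos + 1
      elems[[length(elems) + 1]] <- parse_element()
    }
    expect(")")
    all_words <- all(vapply(elems, is.character, logical(1)))
    if (all_words) unlist(elems) else elems
  }

  # Either a nested run of groups or a single word
  parse_element <- function() {
    tok <- peek()
    if (tok == "(") return(parse_sequence())
    if (tok %in% c(")", ",", " ", "")) stop("unexpected '", tok, "' at token ", pos)
    pos <<- pos + 1
    tok
  }

  result <- parse_sequence()
  if (pos <= length(tokens)) stop("unexpected trailing input at token ", pos)
  result
}

x <- "(K01596,K01610) (K01689) (K01834,K15633,K15634,K15635) (K00927) (K00134,K00150) (K01803) ((K01623,K01624,K11645) (K03841,K02446,K11532,K01086,K04041),K01622)"
desired <- list(c("K01596","K01610"), "K01689", c("K01834","K15633","K15634","K15635"),
                "K00927", c("K00134","K00150"), "K01803",
                list(list(c("K01623","K01624","K11645"),
                          c("K03841","K02446","K11532","K01086","K04041")), "K01622"))
res <- parse_brackets(x)
print(identical(res, desired))  # TRUE
```

One design choice worth noting: a comma-separated element that is a space-separated run of groups always becomes a list, even when the run holds a single group, so the group `((A,B),C)` becomes `list(list(c("A","B")), "C")`. If a bare vector is preferred in that situation, `parse_element` is the place to unwrap it.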