I have data in a table in which one cell in every row is a multiline string, formatted a bit like a document with references at the end. For example, one of those strings looks like:
item A...1
item B...2
item C...3
item D...2
1=foo
2=bar
3=baz
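For reproducibility, here is a minimal sketch of a table shaped like mine. The `id` column and the second row's item names are made up for illustration, and this toy table is not guaranteed to trigger the error described below; my real table has many more rows.

```r
library(data.table)

# Hypothetical stand-in for my real table: one multiline string per row.
# Row 1 is the example string above; row 2 is invented for illustration.
encounter_alerts <- data.table(
  id = 1:2,
  text = c(
    "item A...1\nitem B...2\nitem C...3\nitem D...2\n1=foo\n2=bar\n3=baz",
    "item E...1\nitem F...2\n1=foo\n2=bar"
  )
)

# Print the first string the way it appears in the cell
cat(encounter_alerts$text[1], "\n")
```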
My eventual goal is to extract foo/bar/baz into columns and count the matching items. So for the above, I'd end up with a row including:
foo | bar | baz
----+-----+----
1 | 2 | 1
I tried to start by extracting the "reference" mappings, as a nested data.table looking like this:
code | reason
-----+-------
1 | foo
2 | bar
3 | baz
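To be concrete about what I mean by "nested", this is the value I'm after for the example row, built by hand. The names `wanted` and `nested_example` are only for illustration, and I'm assuming an integer `code` column is what I'd want.

```r
library(data.table)

# Hand-built version of the per-row mapping for the example string
wanted <- data.table(code = 1:3, reason = c("foo", "bar", "baz"))

# Stored in a list column, so each outer row holds its own small table
nested_example <- data.table(id = 1L)
nested_example[, whys := list(list(wanted))]

# The nested table for the first row
nested_example$whys[[1]]
```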
Here's how I tried to do it, using data.table and stringr.
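library(data.table)
library(stringr)

# For each row, pull out the lines that start with a digit and parse them as "code=reason"
encounter_alerts[, whys := lapply(
  str_extract_all(text, regex('^[0-9].*$', multiline = TRUE)),
  FUN = function (s) { fread(text = s, sep = '=', header = FALSE, col.names = c('code', 'reason')) }
)]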
I am very confused by the error message I get when I try to do this:
Error in fread(text = s, sep = "=", header = FALSE, col.names = c("code", :
file not found: 1=foo
I am explicitly passing the text argument rather than file, so I don't understand why fread is trying to interpret the line of text as a filename!
When I test this with a single row, it seems to work fine:
> fread(text = str_extract_all(encounter_alerts[989]$text, regex('^[0-9].*$', multiline = TRUE))[[1]], sep = '=', header = FALSE, col.names = c('code', 'reason'))
code reason
1: 1 foo
2: 2 bar
What am I doing wrong? Is there a better way to do this?
Thanks!