I have a TSV file in which some fields contain escaped double quotation marks (DQM). Those are fine on their own, but the problem comes when the closing DQM delimiter immediately follows an escaped one: when the file is opened, the closing DQM isn't recognized, so the following field gets merged into the previous one.
For example, the line:
"9812" "tt0167609" "tvSeries" "L'homme du \"Picardie\"" "L'homme du \"Picardie\"" 0 "1968" "\N" "13" "Drama"
when opened in a spreadsheet (SS) program, places the first 3 fields fine. But the fourth field runs all the way up to 1968, when it should contain only the first "L'homme du \"Picardie\"", with the second copy of the same in the next field, and so on. So the problem appears to be that it's not recognizing the " after the \". I tried different import options when opening it in SS programs, but nothing fixes it.
Now I've found out that I can fix this before opening it in a SS program by replacing \"" with \""" in a text editor, but I'd like to be able to do it in R when the file is generated.
I've tried several combinations of strings, but I just can't figure it out and I'm hoping someone can point me in the right direction. The following are just some of what I tried.
tv.Subset <- str_replace(tv.Subset, "\\\"\"", "\\\"\"\"") - one of my first attempts, simply escape each character in the string
tv.Subset <- str_replace(tv.Subset, '\\\"\"', '\\\"\"\"') - I wondered if single quotation marks for delimiters might be the trick
tv.Subset <- str_replace(tv.Subset, "\\\\\\"\\"", "\\\\\\"\\"\\"") - I read that you need to do double backslashes to respect both R and regex
Thanks.
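For reference, here is a minimal sketch of the literal replacement described above, using base R's gsub with fixed = TRUE so that neither the pattern nor the replacement is treated as a regex. The variable names and the sample field are only illustrative, and it assumes the text being patched really contains the backslashes, which may only be true of the file on disk and not of the data in memory.

```r
# Hypothetical sample: one field exactly as it appears in the file on disk.
# In R source, \\ is one backslash and \' is an apostrophe.
field <- '"L\'homme du \\"Picardie\\""'

# fixed = TRUE makes both the pattern (\"") and the replacement (\""")
# literal text, so no regex escaping is needed.
patched <- gsub('\\""', '\\"""', field, fixed = TRUE)

cat(field, "\n")    # the field before patching
cat(patched, "\n")  # the same field with one extra closing quote
```

The same call could probably be applied to the result of readLines() on the written file and saved again with writeLines(). If stringr is preferred, wrapping the pattern in fixed() plays a similar role, and note that str_replace only changes the first match in each string while str_replace_all changes every match.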
cat(str_replace("\"\"", r"("")", r"(""")"), "\n")
read.table("file.tsv") if there are really TABs between the fields. It escapes the backslash and reads all fields correctly.
read.delim(filename, ...) and playing with the argument quote= "\"" should fix your problem
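Another angle, sketched here under the assumption that the file is produced with write.table (the question does not say how it is written): write.table escapes embedded quotes with a backslash by default (qmethod = "escape"), which would explain why the \" sequences exist in the file but cannot be matched in the data in memory. Setting qmethod = "double" writes them as doubled quotes instead, the convention spreadsheet programs generally expect. The data frame and column names below are made up for illustration.

```r
# Hypothetical one-row data frame; in memory the title holds plain quotes,
# with no backslashes.
tv <- data.frame(
  title = 'L\'homme du "Picardie"',
  startYear = "1968",
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".tsv")

# qmethod = "double" writes each embedded quote as "" instead of \".
write.table(tv, out, sep = "\t", quote = TRUE,
            qmethod = "double", row.names = FALSE)

# Show the raw lines that were written.
written <- readLines(out)
cat(written, sep = "\n")

# Reading the file back with read.delim should recover the original title.
back <- read.delim(out, quote = "\"", stringsAsFactors = FALSE)
```

If this assumption holds, no string replacement would be needed at all, since the quotes would already be written in the doubled form.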