I want to extract the substring that lies between two delimiters. One delimiter is a carriage return; the other comes in several near-identical variants. Here is a sample of the data:
dput(head(decisions$Title))
c("Zinaida Shumilina et al. v. Belarus \r\n
CCPR/C/120/D/2142/2012",
"K.E.R. vs. Canada \r\n
CCPR/C/120/D/2196/2012",
"Lounis Khelifati v Algeria \r\n
CCPR/C/120/D/2267/2013",
"Hibaq Said Hash v. Denmark \r\n
CCPR/C/120/D/2470/2014",
"Anton Batanov v. Russian Federation \r\n
CCPR/C/120/D/2532/2015",
"S. Z. v. Denmark \r\n
CCPR/C/120/D/2625/2015"
)
Essentially, I want to extract the country name between "v." and the carriage return \r. However, "v." sometimes appears as "v", "vs.", "vs" or "v:".
Based on the answer from a related SO question, I tried the following:
res <- str_match(decisions$Title, "(v\\.|vs\\.|v)(.*?)\\r")
res[,3]
Unfortunately, this doesn't catch all of the variations, and in some cases it returns too much. For example, when extracting the country name from "Navruz Tahirovich Nasyrlayev v. Turkmenistan CCPR/C/117/D/2219/2012", it returns "ruz Tahirovich Nasyrlayev v. Turkmenistan".
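To narrow this down, here is a minimal sketch in base R (regexec/regmatches with perl = TRUE, standing in for stringr's str_match so that it runs without extra packages). The sample titles and the helper name extract_group are made up for illustration, and I am assuming the Nasyrlayev title contains \r\n like the others. It suggests that the bare "v" alternative matches the first "v" inside "Navruz", and that requiring whitespace around the separator might avoid this, although I have only checked it against these few strings:

```r
# Hypothetical sample; the second title is assumed to contain "\r\n" like the rest.
titles <- c(
  "Zinaida Shumilina et al. v. Belarus \r\nCCPR/C/120/D/2142/2012",
  "Navruz Tahirovich Nasyrlayev v. Turkmenistan \r\nCCPR/C/117/D/2219/2012",
  "K.E.R. vs. Canada \r\nCCPR/C/120/D/2196/2012",
  "Lounis Khelifati v Algeria \r\nCCPR/C/120/D/2267/2013"
)

# Hypothetical helper: return one capture group per string, NA when nothing matches.
extract_group <- function(x, pattern, group) {
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(
    m,
    function(g) if (length(g) > group) g[group + 1] else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

# Pattern from my attempt: the bare "v" alternative can match inside a name,
# so the second title yields "ruz Tahirovich Nasyrlayev v. Turkmenistan ".
original <- extract_group(titles, "(v\\.|vs\\.|v)(.*?)\\r", 2)

# Candidate pattern (an assumption, not a verified fix): the separator must be a
# whole token, i.e. whitespace, "v" or "vs", an optional "." or ":", whitespace.
candidate <- extract_group(titles, "\\svs?[.:]?\\s+(.*?)\\s*\\r", 1)

print(original)
print(candidate)
```

The candidate pattern is case-sensitive and returns NA for any title that lacks a carriage return, so it may well miss cases in the full data.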
Is there another way to achieve this?