Suppose I have the following strings:
string <- c(
"DATE_OF_BIRTH_B1",
"HEIGHT_BABY2",
"WEIGHT_BABY_3",
"OTHER_CONDITION_4",
"OTHER_OPERATION_5"
)
How can I use regex in gsub() to extract:
- From the first three strings, everything before the numeric suffix, without the underscore that may precede it;
- Nothing from the last two strings.
In other words, my expected gsub() output is:
"DATE_OF_BIRTH_B", "HEIGHT_BABY", "WEIGHT_BABY"
I managed to use gsub("(.+_B[A-Z]*)_?[0-9]", "\\1", string) to extract the desired substrings from the first three strings, but it failed to exclude the last two strings.
Could anyone help to correct and improve my regex, with a bit of explanation? Many thanks!
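Add .+ as a second alternative, so that any string the first alternative cannot match is consumed whole and replaced by an empty group 1:
sub("(.+_B[A-Z]*)_?[0-9]|.+", "\\1", string)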
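For reference, here is a minimal sketch of how the alternation behaves on the sample vector. The variable names (pattern, extracted, keep, alt) are only illustrative, and perl = TRUE is my own addition, on the assumption that you want the alternatives tried strictly left to right. Unmatched strings come back as "", so they still need to be dropped afterwards. The second half shows one possible alternative that filters with grepl() first, which may be easier to read.

```r
string <- c(
  "DATE_OF_BIRTH_B1",
  "HEIGHT_BABY2",
  "WEIGHT_BABY_3",
  "OTHER_CONDITION_4",
  "OTHER_OPERATION_5"
)

# Alternative 1: capture everything up to "_B" plus any capital letters,
#                then consume an optional underscore and a digit.
# Alternative 2: ".+" swallows any string that alternative 1 cannot match,
#                leaving capture group 1 empty.
pattern <- "(.+_B[A-Z]*)_?[0-9]|.+"
extracted <- sub(pattern, "\\1", string, perl = TRUE)

# Non-matching strings are now "", so drop them.
extracted[nzchar(extracted)]
# "DATE_OF_BIRTH_B" "HEIGHT_BABY" "WEIGHT_BABY"

# Possible alternative (an assumption about what reads best, not the only way):
# select the matching strings first, then strip the numeric suffix.
keep <- grepl("_B[A-Z]*_?[0-9]+$", string)
alt <- sub("_?[0-9]+$", "", string[keep])
```

The reason the original gsub() call kept the last two strings is that sub() and gsub() return an element unchanged when the pattern does not match it; the extra .+ branch guarantees a match for every non-empty element.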