i'm using R, and i have a df with a column that has bins that follow one of these two formats:
(x.xx,y.yy] or (x.xx,y.yy)
they are all positive numbers with several decimal places
i want to split them into
lower upper
x.xx y.yy
i first filter all NAs out of the bin column (there are a few across multiple dfs):
filter(!is.na(bin))
i'm currently using this regex:
mutate(
lower = as.numeric(sub("^[\\(\\[]([0-9.-]+),", "\\1", bin)),
upper = as.numeric(sub(",([0-9.-]+)[\\)\\]]$", "\\1", bin))
)
but it produces all NAs
I haven't tried many alternatives. Any help would be appreciated, and thank you in advance.
here is a test data example:
> test_bins <- c("[0.15,0.273]", "(0.273,0.397]", "(0.397,0.52]", "[0.52,0.643]")
> lower_values <- sapply(test_bins, function(x) as.numeric(sub("^[\\[\\(]([0-9.]+),", "\\1", x)))
Warning messages:
1: In FUN(X[[i]], ...) : NAs introduced by coercion
2: In FUN(X[[i]], ...) : NAs introduced by coercion
3: In FUN(X[[i]], ...) : NAs introduced by coercion
4: In FUN(X[[i]], ...) : NAs introduced by coercion
> upper_values <- sapply(test_bins, function(x) as.numeric(sub(",([0-9.]+)[\\)\\]]$", "\\1", x)))
Warning messages:
1: In FUN(X[[i]], ...) : NAs introduced by coercion
2: In FUN(X[[i]], ...) : NAs introduced by coercion
3: In FUN(X[[i]], ...) : NAs introduced by coercion
4: In FUN(X[[i]], ...) : NAs introduced by coercion
> data.frame(test_bins, lower_values, upper_values)
                  test_bins lower_values upper_values
[0.15,0.273]   [0.15,0.273]           NA           NA
(0.273,0.397] (0.273,0.397]           NA           NA
(0.397,0.52]   (0.397,0.52]           NA           NA
[0.52,0.643]   [0.52,0.643]           NA           NA
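sub() replaces only the part of the string that the pattern matches, so anything the pattern does not cover stays in the result and as.numeric() returns NA. Each pattern has to consume the whole string, leaving only the captured number.
The first one must be ^[([]([0-9.-]+),.*
The second one must be .*,([0-9.-]+)[])]$

Alternatively, strip the brackets and parse what is left as CSV:
read.csv(text = c('lower,upper', gsub('[][()]', '', test_bins)))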
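The two whole-string patterns and the read.csv alternative can be checked against the test data from the question. This is a minimal base R sketch, not a definitive solution. The names `lower`, `upper`, `parsed` and `result` are only illustrative, and it assumes every bin has the form of an opening bracket, a number, a comma, a number and a closing bracket.

```r
test_bins <- c("[0.15,0.273]", "(0.273,0.397]", "(0.397,0.52]", "[0.52,0.643]")

# Each pattern matches the entire string, so sub() leaves only the captured number.
lower <- as.numeric(sub("^[([]([0-9.-]+),.*", "\\1", test_bins))
upper <- as.numeric(sub(".*,([0-9.-]+)[])]$", "\\1", test_bins))

# Alternative: remove all brackets, then read the remaining "x,y" strings as CSV rows.
parsed <- read.csv(text = c("lower,upper", gsub("[][()]", "", test_bins)))

# Combine the bins with the extracted bounds for inspection.
result <- data.frame(bin = test_bins, lower = lower, upper = upper)
print(result)
```

Inside the dplyr pipeline from the question, the same two sub() calls should work unchanged in mutate() with `bin` in place of `test_bins`.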