I have the following two strings:
```r
x <- "chr1:625000-635000.BB_162.Adipose"
y <- "chr1:625000-635000.BB_162.combined.HMSC-ad"
```
With this regex I can capture the parts of `x` without any problem:
```
> stringr::str_match(x,"(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)")
     [,1]                                [,2]   [,3]     [,4]     [,5]     [,6]
[1,] "chr1:625000-635000.BB_162.Adipose" "chr1" "625000" "635000" "BB_162" "Adipose"
```
For `y`, I want to obtain this:
```
     [,1]                                          [,2]   [,3]     [,4]     [,5]     [,6]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "HMSC-ad"
```
When I apply my current regex to `y`, I get this instead:
```
     [,1]                                 [,2]   [,3]     [,4]     [,5]     [,6]
[1,] "chr1:625000-635000.BB_162.combined" "chr1" "625000" "635000" "BB_162" "combined"
```
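A quick diagnostic sketch of why the match stops at `combined`, written in base R (`regexec` with `perl = TRUE`) so it has no package dependency; the variable names `old_pat`, `full_match` and `hyphen_ok` are illustrative only. The pattern is unanchored, so the leftmost successful match is returned as soon as `(\\w+)` has consumed `combined`, and `\\w` does not include `-` in any case:

```r
y <- "chr1:625000-635000.BB_162.combined.HMSC-ad"
old_pat <- "(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)"

# The first (leftmost) successful match ends at "combined": nothing in the
# pattern forces it to continue to the end of the string.
full_match <- regmatches(y, regexec(old_pat, y, perl = TRUE))[[1]][1]
print(full_match)

# Anchoring alone would not help: \w covers letters, digits and '_',
# but not '-', so "HMSC-ad" is not a run of word characters.
hyphen_ok <- grepl("^\\w+$", "HMSC-ad", perl = TRUE)
print(hyphen_ok)
```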
How can I generalize my regex so that it handles both `x` and `y`?
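One possible approach, sketched under the assumption that the wanted last column is always the final dot-separated field and that this field contains no `.` (it may contain `-`): anchor the pattern with `^...$`, skip any intermediate fields with an optional non-capturing group `(?:.*\\.)?`, and capture the tail with `([^.]+)`. The pattern name `pat` is hypothetical, and base R `regexec` is used here only to avoid a package dependency; the same pattern string should also work with `stringr::str_match`, since ICU supports these constructs.

```r
# Anchored pattern: the optional non-capturing group swallows any extra
# ".field" segments, so the last capture is always the final field.
pat <- "^(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(?:.*\\.)?([^.]+)$"

x <- "chr1:625000-635000.BB_162.Adipose"
y <- "chr1:625000-635000.BB_162.combined.HMSC-ad"

# Base-R equivalent of stringr::str_match(c(x, y), pat); perl = TRUE is
# needed for the non-capturing group.
m <- do.call(rbind, regmatches(c(x, y), regexec(pat, c(x, y), perl = TRUE)))
print(m)
```

Because `.*` is greedy, the optional group backtracks only to the last `.`, so for `y` the final capture is `HMSC-ad`, while for `x` the group matches nothing and the final capture is `Adipose`.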
Update
S.Kalbar, your regex gave this:
```
> stringr::str_match(y,"(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)(?:\\.([A-Za-z-]+))?")
     [,1]                                          [,2]   [,3]     [,4]     [,5]     [,6]       [,7]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "combined" "HMSC-ad"
> stringr::str_match(x,"(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)(?:\\.([A-Za-z-]+))?")
     [,1]                                [,2]   [,3]     [,4]     [,5]     [,6]      [,7]
[1,] "chr1:625000-635000.BB_162.Adipose" "chr1" "625000" "635000" "BB_162" "Adipose" NA
```
What I'd like to get for `y` is this:
```
     [,1]                                          [,2]   [,3]     [,4]     [,5]     [,6]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "HMSC-ad"
```
And this for `x`:
```
     [,1]                                [,2]   [,3]     [,4]     [,5]     [,6]
[1,] "chr1:625000-635000.BB_162.Adipose" "chr1" "625000" "635000" "BB_162" "Adipose"
```
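If you would rather keep S.Kalbar's pattern, a minimal post-processing sketch can fold its 7-column result into the 6 columns shown above by taking column 7 when it is not `NA` and column 6 otherwise. The helper name `collapse_last_field` is hypothetical, and it assumes its input is exactly the 7-column matrix returned by `stringr::str_match` with that pattern:

```r
# Hypothetical helper: keep the last non-NA captured field as column 6.
collapse_last_field <- function(m) {
  stopifnot(is.matrix(m), ncol(m) == 7)
  last <- ifelse(is.na(m[, 7]), m[, 6], m[, 7])
  # Drop the dimnames added by cbind so the result prints like str_match's.
  unname(cbind(m[, 1:5, drop = FALSE], last))
}
```

For example, `collapse_last_field(stringr::str_match(c(x, y), pat7))`, where `pat7` holds S.Kalbar's pattern. Note that column 1 still holds the full match, which already spans the whole string for both inputs.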