4

How can I split a column separated by multiple delimiters into separate columns in a data frame?

df1 <- read.table(text = " Chr  Nm1 Nm2 Nm3
    chr10_100064111-100064134+Nfif   20  20 20
    chr10_100064115-100064138-Kitl   30 19 40
    chr10_100076865-100076888+Tert   60 440 18
    chr10_100079974-100079997-Itg    50 11 23
    chr10_100466221-100466244+Tmtc3  55 24 53", header = TRUE)


Desired output:

                          Chr  gene Nm1 Nm2 Nm3
    chr10_100064111-100064134  Nfif  20  20  20
    chr10_100064115-100064138  Kitl  30  19  40
    chr10_100076865-100076888  Tert  60 440  18
    chr10_100079974-100079997   Itg  50  11  23
    chr10_100466221-100466244 Tmtc3  55  24  53

I used:

library(stringr)
df2 <- str_split_fixed(df1$Chr, "\\+", 2)

I would like to know how to split on both the + and - delimiters.

Use a regex character class in str_split, "[+-]", or alternation with the plus sign escaped, "\\+|-". Also, I'm not sure you're gaining anything here from stringr that base strsplit doesn't already do well. Commented Jun 16, 2016 at 14:57
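A minimal base R sketch of the two patterns mentioned in the comment above, using two identifiers from the question (the vector name `ids` is made up for illustration). In the alternation form the plus sign has to be escaped, because a bare `+` is a regex quantifier.

```r
# Two sample identifiers from the question (the name `ids` is illustrative)
ids <- c("chr10_100064111-100064134+Nfif",
         "chr10_100064115-100064138-Kitl")

# Character class: split on either "+" or "-"
by_class <- strsplit(ids, "[+-]")

# Alternation: the "+" must be escaped, the "-" needs no escape here
by_pipe <- strsplit(ids, "\\+|-")

# Both patterns give the same pieces
identical(by_class, by_pipe)
by_class[[1]]
```

Both calls return a list with one character vector per input string, so a further step (for example `do.call(rbind, ...)`) is needed to get columns.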

3 Answers

9

If you're trying to split one column into multiple, tidyr::separate is handy:

library(tidyr)

dat %>% separate(Chr, into = paste0('Chr', 1:3), sep = '[+-]')

#              Chr1      Chr2  Chr3 Nm1 Nm2 Nm3
# 1 chr10_100064111 100064134  Nfif  20  20  20
# 2 chr10_100064115 100064138  Kitl  30  19  40
# 3 chr10_100076865 100076888  Tert  60 440  18
# 4 chr10_100079974 100079997   Itg  50  11  23
# 5 chr10_100466221 100466244 Tmtc3  55  24  53
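
The three-way split above differs slightly from the layout asked for in the question, which keeps the coordinate range together and only peels off the gene name. One possible way to get that shape is tidyr::extract with two capture groups. This is only a sketch: it assumes the data frame is called dat, as in the answer, and that the gene name never contains + or -.

```r
library(tidyr)

# First three sample rows from the question, named `dat` as in the answer
dat <- read.table(text = "Chr Nm1 Nm2 Nm3
chr10_100064111-100064134+Nfif 20 20 20
chr10_100064115-100064138-Kitl 30 19 40
chr10_100076865-100076888+Tert 60 440 18",
                  header = TRUE, stringsAsFactors = FALSE)

# The greedy first group runs up to the last "+" or "-";
# the second group captures the gene name that follows it
res <- tidyr::extract(dat, Chr, into = c("Chr", "gene"),
                      regex = "^(.*)[+-]([^+-]+)$")
res
```

Rows that do not match the regex would come back as NA in both new columns, which makes malformed identifiers easy to spot.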

3

This should work:

str_split_fixed(df1$Chr, "[-+]", 3)

3 Comments

You need to use str_split (or strsplit) if you're using regex, not str_split_fixed.
From the str_split_fixed documentation: "The default interpretation is a regular expression, as described in stringi-search-regex. Control options with regex()". Also, I tested it and it seems to work.
Hmm, you are correct! ...though that's a very confusingly named set of functions given base R regex fixed = TRUE behavior.
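
To illustrate the point settled in the comments above: stringr treats the pattern as a regular expression by default, and the "fixed" in str_split_fixed refers to the fixed number of pieces, not to literal matching. A small sketch, using the first identifier from the question (the variable names are made up):

```r
library(stringr)

x <- "chr10_100064111-100064134+Nfif"

# The pattern is a regex by default; n = 3 returns a 1 x 3 character matrix
regex_split <- str_split_fixed(x, "[+-]", 3)

# Wrap the pattern in fixed() to match a literal "+" instead
literal_split <- str_split_fixed(x, fixed("+"), 2)

regex_split
literal_split
```

With n = 2 and the "[+-]" pattern, the second piece would still contain the remaining delimiter, so n has to match the number of pieces you expect.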
2

Here is a way to do this in base R with strsplit:

# split Chr into a list
tempList <- strsplit(as.character(df$Chr), split="[+-]")

# replace Chr with desired values
df$Chr <- sapply(tempList, function(i) paste(i[[1]], i[[2]], sep="-"))

# get Gene variable
df$gene <- sapply(tempList, "[[", 3)

2 Comments

When I tried df$gene <- sapply(tempList, "[[", 3), I got: Error in FUN(X[[i]], ...) : subscript out of bounds
@beginner I just copied and pasted your example data.frame, naming it df, and then copied and pasted my suggested solution and did not receive this error.
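
One plausible cause of the "subscript out of bounds" error reported above is that some rows in the real data split into fewer than three pieces, for example an identifier with no gene suffix. This is an assumption, since the asker's full data is not shown. A sketch for spotting such rows, using a made-up vector `ids` with one deliberately malformed entry:

```r
# Illustrative input: the second entry is malformed on purpose (no gene part)
ids <- c("chr10_100064111-100064134+Nfif",
         "chr10_100064115-100064138")

pieces <- strsplit(ids, "[+-]")

# Number of pieces per row; anything below 3 breaks sapply(pieces, "[[", 3)
n_pieces <- lengths(pieces)
which(n_pieces < 3)

# Safer extraction: return NA when the third piece is missing
gene <- vapply(pieces,
               function(p) if (length(p) >= 3) p[[3]] else NA_character_,
               character(1))
gene
```

Using vapply with character(1) also guarantees a character vector of the same length as the input, so the result can be assigned straight back to a data frame column.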
