Splitting rows with uneven string length into columns in R using tidyr [duplicate]

Question

Edit: This was marked as a duplicate. It is not. The question here is not only about splitting a single column into multiple ones; if it were, my separate code would have worked. The main point of my question is splitting the column when the row strings yield varying numbers of output columns.

I'm trying to turn this:

data <- c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
          "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
          "Place1-Place1-Place1-Place1-Place3-Place5",
          "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
          "Place6-Place6",
          "Place1-Place2-Place3-Place4")

Into this:

      X1     X2     X3     X4     X5     X6     X7     X8
1 Place1 Place2 Place2 Place4 Place2 Place3 Place5 
2 Place7 Place7 Place7 Place7 Place7 Place7 Place7 Place7
3 Place1 Place1 Place1 Place1 Place3 Place5 
4 Place1 Place4 Place2 Place3 Place3 Place5 Place5 
5 Place6 Place6 
6 Place1 Place2 Place3 Place4

I tried to use tidyr's separate function using this code:

library(data.table)
data <- as.data.table(data)
data_table <- tidyr::separate(data,
                            data,
                            sep="-",
                            into = strsplit(data$data, "-"),
                            fill = "right")

Sadly I'm getting this error:

Warning message:
Too many values at 3 locations: 1, 2, 4

What do I need to change to make it work?

What exactly do you mean by uneven string lengths? If you want to select the stuff in between dashes, then try [^-]+ as your regex.
– elldur, Commented Mar 3, 2016 at 12:32
@Someone Yes, I was referring to the output columns. I tried your suggestion and the warning became "Warning message: Too many values at 1 locations: 2".
– JnrfL, Commented Mar 3, 2016 at 12:38

Jaap · Accepted Answer · 2016-03-03 15:57:12Z

You need to specify the target columns correctly, as a character vector of column names:

library(tidyr)
separate(DF, V1, paste0("X",1:8), sep="-")

which gives:

      X1     X2     X3     X4     X5     X6     X7     X8
1 Place1 Place2 Place2 Place4 Place2 Place3 Place5   <NA>
2 Place7 Place7 Place7 Place7 Place7 Place7 Place7 Place7
3 Place1 Place1 Place1 Place1 Place3 Place5   <NA>   <NA>
4 Place1 Place4 Place2 Place3 Place3 Place5 Place5   <NA>
5 Place6 Place6   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>
6 Place1 Place2 Place3 Place4   <NA>   <NA>   <NA>   <NA>

If you don't know how many target columns you need beforehand, you can use:

> max(sapply(strsplit(as.character(DF$V1),'-'),length))
[1] 8

to extract the maximum number of parts (which is thus the number of columns you need).
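The two steps above can be combined so that the column count is never hard-coded. The following is a minimal sketch, not part of the original answer: the names n_cols and res are illustrative, and it assumes the column is stored as character (hence stringsAsFactors = FALSE). The fill = "right" argument is the one the question already used; it pads short rows with NA on the right.

```r
library(tidyr)

# Same sample data as in the "Used data" section of this answer
DF <- data.frame(V1 = c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
                        "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
                        "Place1-Place1-Place1-Place1-Place3-Place5",
                        "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
                        "Place6-Place6",
                        "Place1-Place2-Place3-Place4"),
                 stringsAsFactors = FALSE)

# Count the pieces in each row and keep the largest count
n_cols <- max(lengths(strsplit(DF$V1, "-", fixed = TRUE)))

# Build the names X1..Xn from that count, then split;
# fill = "right" pads rows that have fewer pieces with NA on the right
res <- separate(DF, V1, into = paste0("X", seq_len(n_cols)),
                sep = "-", fill = "right")
res
```

Computing the names from the data means the same call keeps working if a longer row is added later.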

Several other methods:

splitstackshape :

library(splitstackshape)
cSplit(DF, "V1", sep="-", direction = "wide")

stringi :

library(stringi)
as.data.frame(stri_list2matrix(stri_split_fixed(DF$V1, "-"), byrow = TRUE))

data.table :

library(data.table)
setDT(DF)[, paste0("v", 1:8) := tstrsplit(V1, "-")][, V1 := NULL][]

stringr :

library(stringr)
as.data.frame(str_split_fixed(DF$V1, "-",8))

which all give a similar result.

Used data:

DF <- data.frame(V1=c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
                      "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
                      "Place1-Place1-Place1-Place1-Place3-Place5",
                      "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
                      "Place6-Place6",
                      "Place1-Place2-Place3-Place4"))

edited Mar 3, 2016 at 15:57

answered Mar 3, 2016 at 12:36

Jaap

2 Comments

JnrfL Over a year ago

Thanks for the answer! This works fine, though it would have been nice if the solution used tidyr.

Jaap Over a year ago

@JnrfL see the updated answer, HTH
