1

I have a dataframe in long table format like this one below:

mydf <- data.frame(id = c(123, 123, 123, 123, 123),
                    name =c("test_2001", "test_2002", "test_2003", "test_2004", "test_2005"),
                    value = c(15, 20, 25, 30, 35))
mydf
#>    id      name value
#> 1 123 test_2001    15
#> 2 123 test_2002    20
#> 3 123 test_2003    25
#> 4 123 test_2004    30
#> 5 123 test_2005    35

Now, I want to split attributes from column name in such a way that I could remove those numeric values and store them in another column named year, output like this:

desired <- data.frame(id = c(123, 123, 123, 123, 123),
                      name =c("test", "test", "test", "test", "test"),
                      year =c(2001:2005),
                      value = c(15, 20, 25, 30, 35))
desired
#>    id name year value
#> 1 123 test 2001    15
#> 2 123 test 2002    20
#> 3 123 test 2003    25
#> 4 123 test 2004    30
#> 5 123 test 2005    35

How can I automate this step considering I have to apply this change to a lot of data?

1 Answer 1

2

base R

cbind(mydf, strcapture("(.*)_(.*)", mydf$name, list(name="", year=1L)))
#    id      name value name year
# 1 123 test_2001    15 test 2001
# 2 123 test_2002    20 test 2002
# 3 123 test_2003    25 test 2003
# 4 123 test_2004    30 test 2004
# 5 123 test_2005    35 test 2005

dplyr/tidyr

library(dplyr)
library(tidyr)
mydf %>%
  separate(name, into = c("name", "year"), sep = "_")
#    id name year value
# 1 123 test 2001    15
# 2 123 test 2002    20
# 3 123 test 2003    25
# 4 123 test 2004    30
# 5 123 test 2005    35

(Note that with this, year is a string, can be converted to ints with as.integer.)

data.table

library(data.table)
as.data.table(mydf)[, c("name", "year") := tstrsplit(name, "_")][]
#       id   name value   year
#    <num> <char> <num> <char>
# 1:   123   test    15   2001
# 2:   123   test    20   2002
# 3:   123   test    25   2003
# 4:   123   test    30   2004
# 5:   123   test    35   2005

(Again, year here is still a string.)

Sign up to request clarification or add additional context in comments.

4 Comments

Thank you @r2evans
For this specific case, the default sep will do the trick, so separate(name, into = c("name", "year")) is enough
It works great for cases like this test_2001 but I just find out that if the name is something like this test_area_sqkm_2001, then it doesn't work. Any idea how to solve this case?
There are at least a few ways that that string could be interpreted, you need to better define what you expect.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.