I have this dataframe:

df <- data.frame(group=c("A", "A", "B", "B"), year=c(1980, 1986, 1990, 1992))
  group year
1     A 1980
2     A 1986
3     B 1990
4     B 1992

I'd like to modify it in the following way:

  • add rows for each existing row with the two preceding years
  • add a new column, pre, that labels each new row with the year it precedes (e.g. pre1980)
  • delete the existing rows

This would be the outcome:

   group  year     pre
1      A  1978 pre1980
2      A  1979 pre1980
3      A  1984 pre1986
4      A  1985 pre1986
5      B  1988 pre1990
6      B  1989 pre1990
7      B  1990 pre1992
8      B  1991 pre1992

Adding the new column would be easy:

df$pre <- paste("pre", df$year, sep="")

But I am stuck on how to add the new rows with the respective years (of course creating a whole new data frame would be just as good). Any hints?

    do.call('rbind', lapply(1:nrow(df), function(x) {x <- df[x, ]; data.frame(group = x$group, year = x$year - 2:1, pre = paste0('pre', x$year))}))
    Commented Feb 17, 2016 at 15:07

5 Answers

base R ftw:

data.frame(group = rep(df$group, each=2),
           year = df[rep(1:nrow(df), each=2),]$year-2:1,
           pre = paste0("pre",rep(df$year,each=2)))
#   group year     pre
# 1     A 1978 pre1980
# 2     A 1979 pre1980
# 3     A 1984 pre1986
# 4     A 1985 pre1986
# 5     B 1988 pre1990
# 6     B 1989 pre1990
# 7     B 1990 pre1992
# 8     B 1991 pre1992
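
The one-liner above relies on -2:1 being recycled across the repeated rows. As a rough sketch of the same idea for any number of preceding years, something like the following could work. Note that add_preceding and its parameter n are names made up for illustration, not part of the answer.

```r
# Hypothetical helper (not from the answer): expand each row into the
# n years preceding it, labelled with the original year.
add_preceding <- function(df, n = 2) {
  idx <- rep(seq_len(nrow(df)), each = n)   # each source row repeated n times
  offsets <- rep(n:1, times = nrow(df))     # n, n-1, ..., 1 for every source row
  data.frame(
    group = df$group[idx],
    year  = df$year[idx] - offsets,
    pre   = paste0("pre", df$year[idx])
  )
}

df <- data.frame(group = c("A", "A", "B", "B"), year = c(1980, 1986, 1990, 1992))
res <- add_preceding(df, n = 2)
```

Writing the offsets out with rep makes the recycling explicit, so the pairing of rows and offsets does not depend on the vector lengths happening to line up.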

Here is one approach using the data.table package. With the given data, I use year as the grouping variable. For each year, I compute the two preceding years and build the pre**** label from that year. This produces two year columns, so I drop the first one at the end.

library(data.table)

# group by year, then drop the grouping column (the first year column)
setDT(df)[, list(group = group,
                 year = c((year - 2), (year - 1)),
                 pre = paste0("pre", year, collapse = "")), by = "year"][, -1, with = FALSE][]

#   group year     pre
#1:     A 1978 pre1980
#2:     A 1979 pre1980
#3:     A 1984 pre1986
#4:     A 1985 pre1986
#5:     B 1988 pre1990
#6:     B 1989 pre1990
#7:     B 1990 pre1992
#8:     B 1991 pre1992

If the same year appears more than once, group by row instead, as in the following. This new data frame has 1992 appearing twice.

df <- data.frame(group=c("A", "A", "B", "B"), year=c(1980, 1986, 1992, 1992))


setDT(df)[, list(group = group,
                 year = c((year - 2), (year - 1)),
                 pre = paste0("pre", year, collapse = "")), by = rownames(df)][, -1, with = FALSE]


#   group year     pre
#1:     A 1978 pre1980
#2:     A 1979 pre1980
#3:     A 1984 pre1986
#4:     A 1985 pre1986
#5:     B 1990 pre1992
#6:     B 1991 pre1992
#7:     B 1990 pre1992
#8:     B 1991 pre1992

4 Comments

Thanks. On my real data the first option gives me a warning I don't understand: "Column 2 of result for group 2 is length 3 but the longest column in this result is 4. Recycled leaving remainder of 1 items. This warning is once only for the first group with this issue." The 2nd option works well, I think!
@beetroot Since I do not have your data, I am afraid I cannot see what is happening on your side. If the second option works for you, that is great. You have another answer at the moment. So you have options to solve your problem. :)
Too many options ;) I hope you don't mind me accepting Pierre Lafortune's answer. I've just never really used the data.table package.
@beetroot no worries. Pleasure to help you.

Here is another option with Map

do.call(rbind,Map(function(x,y,z) 
   data.frame(group=x, year=y:z, pre=paste0('pre', z+1)), 
    df$group, df$year-2, df$year-1))
#  group year     pre
#1     A 1978 pre1980
#2     A 1979 pre1980
#3     A 1984 pre1986
#4     A 1985 pre1986
#5     B 1988 pre1990
#6     B 1989 pre1990
#7     B 1990 pre1992
#8     B 1991 pre1992

Or a modification with rep

`row.names<-`(transform(df[rep(1:nrow(df),each=2),],
      year = year-2:1, pre = paste0('pre', year) ), NULL)
#  group year     pre
#1     A 1978 pre1980
#2     A 1979 pre1980
#3     A 1984 pre1986
#4     A 1985 pre1986
#5     B 1988 pre1990
#6     B 1989 pre1990
#7     B 1990 pre1992
#8     B 1991 pre1992

If you don't mind the final order, without extra libraries you can use

gap = function(df, y) transform(df, year=year-y, pre = sprintf("pre%d", year))
rbind(gap(df,2), gap(df,1))
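
If the order does matter, one possible follow-up is to remember which source row each stacked row came from and sort on that. This is only a sketch; origin, stacked and sorted are illustrative names, not part of the answer.

```r
df <- data.frame(group = c("A", "A", "B", "B"), year = c(1980, 1986, 1990, 1992))

gap <- function(df, y) transform(df, year = year - y, pre = sprintf("pre%d", year))
stacked <- rbind(gap(df, 2), gap(df, 1))

# origin (illustrative name): the row of df that each stacked row came from;
# rbind stacked two copies of df, so the index 1..nrow(df) simply repeats twice
origin <- rep(seq_len(nrow(df)), times = 2)

# sort by source row first, then by year within each source row
sorted <- stacked[order(origin, stacked$year), ]
rownames(sorted) <- NULL
```

Sorting on the source row rather than on year alone keeps the two new rows of each original row together even when the year ranges of different rows overlap.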

Here is a simple solution with no packages:

Your data frame:

df <- data.frame(group=c("A", "A", "B", "B"), year=c(1980, 1986, 1990, 1992))

  group year
1     A 1980
2     A 1986
3     B 1990
4     B 1992

Subtract two years and add column pre:

df1<-cbind(group=as.character(df$group),year=df$year-2, pre=paste("pre",df$year,sep=""))

     group year   pre      
[1,] "A"   "1978" "pre1980"
[2,] "A"   "1984" "pre1986"
[3,] "B"   "1988" "pre1990"
[4,] "B"   "1990" "pre1992"

Next subtract 1 year and add column pre:

df2<-cbind(group=as.character(df$group),year=df$year-1,pre=paste("pre",df$year,sep=""))

    group year   pre      
[1,] "A"   "1979" "pre1980"
[2,] "A"   "1985" "pre1986"
[3,] "B"   "1989" "pre1990"
[4,] "B"   "1991" "pre1992"

Now rbind the two together:

ndf<-data.frame(rbind(df1,df2))

  group year     pre
1     A 1978 pre1980
2     A 1984 pre1986
3     B 1988 pre1990
4     B 1990 pre1992
5     A 1979 pre1980
6     A 1985 pre1986
7     B 1989 pre1990
8     B 1991 pre1992

Sort it according to year. This is your output.

Lastdf <- ndf[order(ndf$year),] 

  group year     pre
1     A 1978 pre1980
5     A 1979 pre1980
2     A 1984 pre1986
6     A 1985 pre1986
3     B 1988 pre1990
7     B 1989 pre1990
4     B 1990 pre1992
8     B 1991 pre1992
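
One thing to be aware of: cbind on a mix of character and numeric vectors gives a character matrix (hence the quotes in the printed output), so year in ndf is stored as text and order() sorts it as text. That happens to work for four-digit years. A possible variant that keeps year numeric is to build each half with data.frame instead; minus2, minus1 and lastdf below are just illustrative names.

```r
df <- data.frame(group = c("A", "A", "B", "B"), year = c(1980, 1986, 1990, 1992))

# Build each half as a data frame so that year stays numeric
minus2 <- data.frame(group = df$group, year = df$year - 2, pre = paste0("pre", df$year))
minus1 <- data.frame(group = df$group, year = df$year - 1, pre = paste0("pre", df$year))

# Stack the halves and sort numerically by year
ndf <- rbind(minus2, minus1)
lastdf <- ndf[order(ndf$year), ]
```

As in the steps above, sorting on year alone reproduces the requested row order for this data only because the year ranges of the original rows do not interleave.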
