I am new to R and would greatly appreciate some help with this. I am trying to merge the following two data frames (labourproductivity and Depressiondframe) on Time:

     Time LabourProductivity
1 2004 Q1               96.6
2      Q2               96.9
3      Q3               96.9
4      Q4               97.1
5 2005 Q1               97.6
6      Q2               99.0

and

     Time DepressionCount
1    2004             875
2 2004.25             820
3  2004.5             785
4 2004.75             857
5    2005             844
6 2005.25             841

Since the two data frames store Time in different formats, I do not know how to merge them. Ideally the result would look like:

  Time DepressionCount LabourProductivity
1 2004             875               96.6
2 2004             820               96.9
3 2004             785               96.9
4 2004             857               97.1
5 2005             844               97.6
6 2005             841               99.0

1 Answer

If "df1" and "df2" are the first and second datasets, create a grouping index ("indx") from the "Time" column of "df1", starting a new group at every entry that begins with a year. Then convert the "Time" column to the same numeric format as in "df2" using ave and as.yearqtr:

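library(zoo)
# start a new group at each entry that begins with a year
indx <- cumsum(grepl('^\\d+', df1$Time))
df1$Time <- with(df1, as.numeric(ave(Time, indx, FUN = function(x) {
        # prepend the group's year to the entries that only carry a quarter
        x[-1] <- paste(sub(' .*', '', x[1]), x[-1])
        as.yearqtr(x) })))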
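
If zoo is not available, the same conversion can probably be done in base R alone. The following is only a sketch, not the method used above. It assumes every entry ends in "Q" followed by a quarter number and that the first entry carries a year. The names time_raw, has_year, years, quarter and time_num are made up for illustration.

```r
# Time values copied from the "data" section of this answer
time_raw <- c("2004 Q1", "Q2", "Q3", "Q4", "2005 Q1", "Q2")

# TRUE where an entry starts with a year; cumsum() turns that into a group id
has_year <- grepl("^\\d+", time_raw)
indx <- cumsum(has_year)

# Year of each group, taken from the entry that opens the group
# (assumes the very first entry has a year, so indx starts at 1)
years <- as.numeric(sub(" .*", "", time_raw[has_year]))[indx]

# Quarter number is whatever follows the "Q"
quarter <- as.numeric(sub(".*Q", "", time_raw))

# Same numeric encoding as df2$Time: year + (quarter - 1) / 4
time_num <- years + (quarter - 1) / 4
time_num
```

Assigning time_num to df1$Time would then make the column comparable with df2$Time.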

Merge the datasets, then truncate the "Time" column to the year if needed:

transform(merge(df1, df2), Time=trunc(Time))
#    Time LabourProductivity DepressionCount
#1 2004               96.6             875
#2 2004               96.9             820
#3 2004               96.9             785
#4 2004               97.1             857
#5 2005               97.6             844
#6 2005               99.0             841
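
Note that trunc(Time) discards the quarter, so rows within a year are no longer distinguishable by "Time". If that matters, one option is to keep the year and the quarter in separate columns. This is a sketch under the assumption that df1$Time has already been converted to numeric as described above; the column names Year and Quarter are illustrative, not part of the original data.

```r
# Sample data from this answer, with df1$Time already converted to numeric
df1 <- data.frame(Time = c(2004, 2004.25, 2004.5, 2004.75, 2005, 2005.25),
                  LabourProductivity = c(96.6, 96.9, 96.9, 97.1, 97.6, 99))
df2 <- data.frame(Time = c(2004, 2004.25, 2004.5, 2004.75, 2005, 2005.25),
                  DepressionCount = c(875L, 820L, 785L, 857L, 844L, 841L))

# Join on the shared numeric Time column
merged <- merge(df1, df2, by = "Time")

# Split the numeric time into a year and a quarter (1 to 4)
merged$Year <- trunc(merged$Time)
merged$Quarter <- (merged$Time - merged$Year) * 4 + 1
merged
```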

Or using data.table

library(data.table)
 setDT(df1)[, TimeN:= as.numeric(as.yearqtr(c(Time[1L],
    paste(sub(' .*', '', Time[1L]), Time[-1L])))), 
      list(Grp=cumsum(grepl('^\\d+', Time)))][,
            Time:= TimeN][, TimeN:=NULL][]

 setkey(df1, Time)[df2][, Time:=trunc(Time)][]
 #   Time LabourProductivity DepressionCount
 #1: 2004               96.6             875
 #2: 2004               96.9             820
 #3: 2004               96.9             785
 #4: 2004               97.1             857
 #5: 2005               97.6             844
 #6: 2005               99.0             841

data

df1 <- structure(list(Time = c("2004 Q1", "Q2", "Q3", "Q4", "2005 Q1", 
"Q2"), LabourProductivity = c(96.6, 96.9, 96.9, 97.1, 97.6, 99
)), .Names = c("Time", "LabourProductivity"), class = "data.frame", 
row.names = c("1", "2", "3", "4", "5", "6"))

df2 <- structure(list(Time = c(2004, 2004.25, 2004.5, 2004.75, 2005, 
2005.25), DepressionCount = c(875L, 820L, 785L, 857L, 844L, 841L
 )), .Names = c("Time", "DepressionCount"), class = "data.frame", 
 row.names = c("1", "2", "3", "4", "5", "6"))

5 Comments

I got the following error: unexpected symbol in "labourproductivity$Time <- with(labourproductivity, as.numeric(ave(Time, indx, FUN= function(x) {x[-1] <- paste (sub(' .*', '', x[1]), x[-1])as.yearqtr"
@DavidResch Can you try it on the data shown in my post?
Yes, fantastic, it worked. My data for both series runs until the end of 2013, so should I just add the rest of the values, or is there a quicker way? Thank you so much for your help!
@DavidResch ave is pretty quick. For the merge part, you can also use dplyr or data.table.
@DavidResch Updated with a possible data.table method.
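
On the follow-up about series that run to the end of 2013: the conversion code does not depend on the number of rows, so it should apply unchanged to the longer data. If a numeric quarterly "Time" column ever has to be built from scratch, a sketch like the one below might do. It assumes quarterly observations with no gaps from 2004 Q1 to 2013 Q4; time_full is a made-up name.

```r
# Assumption: one observation per quarter, no gaps, 2004 Q1 through 2013 Q4
time_full <- seq(2004, 2013.75, by = 0.25)

length(time_full)  # 10 years x 4 quarters
head(time_full)
```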
