I am trying to create a new variable using ifelse by combining data from two data.frames (similar to this question but without factors).
My problem is that df1 features yearly data, whereas vars in df2 are temporally aggregated: e.g. df1 has multiple obs (1997,1998,...,2005) and df2 only has a range (1900-2001).
For illustration, a 2x2 example would look like:
# years are stored as numbers so that >= and < compare numerically
df1 <- data.frame(id = c("2", "20"), year = c(1960, 1870))
df2 <- data.frame(id = df1$id, styear = c(1800, 1900), endyear = c(2001, 1950))
I want to combine the two so that the new variable is 1 when the id (the same variable exists in both) matches and the year in df1 falls within the corresponding range in df2, and 0 otherwise. I tried the following:
df1$new.var <- ifelse(df1$id==df2$id & df1$year>=df2$styear &
df1$year<df2$endyear,1,0)
Ideally, this should return 1 and 0, respectively.
But on my real data I get these warning messages instead:
1: In df1$id == df2$id : longer object length is not a multiple of shorter object length
2: In df1$year >= df2$styear : longer object length is not a multiple of shorter object length
3: In df1$year < df2$endyear : longer object length is not a multiple of shorter object length
For the record, the 'real' df1 has 500 obs and df2 has 14. How can I make this work?
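For what it is worth, the warnings look like R's vector recycling at work: `==`, `>=` and `<` compare element by element and recycle the shorter vector, and 500 is not a multiple of 14. A minimal sketch of the same effect, where `a` and `b` are hypothetical stand-ins with lengths 5 and 2 rather than the real columns:

```r
# Hypothetical stand-ins for df1$id (longer) and df2$id (shorter).
a <- c("1", "2", "3", "1", "2")
b <- c("1", "2")

# b is recycled to c("1", "2", "1", "2", "1") before the comparison,
# so rows are compared by position, not matched by id.
# Without suppressWarnings() this emits the same
# "longer object length is not a multiple of shorter object length" warning.
res <- suppressWarnings(a == b)
print(res)
```

So even when the lengths happen to line up and no warning appears, the comparison is positional and does not look up the matching id.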
Edit: I realised that some ids in df2 are repeated, each with a different period, e.g.
id styear endyear
1 1800 1915
1 1950 2002
2 1912 1988
3 1817 2000
So, I believe what I need is something like a double-ifelse:
df1$new.var <- ifelse(df1$id==df2$id & df1$year>=df2$styear &
df1$year<df2$endyear | df1$year>=df2$styear &
df1$year<df2$endyear,1,0)
Obviously, this wouldn't work as written, but it illustrates the logic I need to get around the duplicates problem.
For example, if a row in df1 has id=1 and year=1801, it passes the first year-range test (1801 is within 1800-1915) but fails the second (1801 is not within 1950-2002), so it should be coded 1 only once, with no extra rows added (currently the duplicates add extra rows).
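To make the intended result concrete, here is a minimal sketch of the "passes any period for that id" logic in base R. It is only one possible way to express it: the df2 values come from the table above, while the df1 rows and the helper name `in_any_period` are made up for illustration.

```r
# df2 as in the table above; the df1 rows are hypothetical test cases.
df1 <- data.frame(id = c(1, 1, 2, 3), year = c(1801, 1930, 1990, 1900))
df2 <- data.frame(id      = c(1, 1, 2, 3),
                  styear  = c(1800, 1950, 1912, 1817),
                  endyear = c(1915, 2002, 1988, 2000))

# Hypothetical helper: TRUE if at least one period with the same id
# satisfies styear <= year < endyear.
in_any_period <- function(id, year, periods) {
  any(periods$id == id & year >= periods$styear & year < periods$endyear)
}

# Evaluate the helper once per row of df1, so no rows are added or duplicated.
df1$new.var <- as.integer(mapply(in_any_period, df1$id, df1$year,
                                 MoreArgs = list(periods = df2)))
print(df1)
```

Because the check runs once per row of df1 and collapses all periods for an id with `any()`, repeated ids in df2 cannot add rows; the row with id=1 and year=1801 is coded 1 exactly once.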