Using data in one data.frame to generate values for a new column in another data.frame in R

Question

I have two data frames. One contains a timestamp and an air temperature:

air_temp  time_stamp
85.1      1396335600
85.4      1396335860

And another, which contains startTime, endTime, location coordinates, and a canonical name.

startTime    endTime       location.lat    location.lon    name
1396334278   1396374621    37.77638        -122.4176       Work
1396375256   1396376369    37.78391        -122.4054       Work

For each row in the first data frame, I want to identify which time range in the second data frame it lies in. For example, if the timestamp 1396335600 is between the startTime 1396334278 and the endTime 1396374621, add the location and name values to that row in the first data frame.

The start and end times in the second data frame don't overlap and are strictly increasing. However, they are not perfectly contiguous, so if a timestamp falls between two time bands, I need to mark the location as NA. If it does fit between a start and end time, I want to add the location.lat, location.lon, and name columns to the first data frame.

Appreciate your help.

Look at which(), cbind(), and rbind() for quick-and-dirty base R solutions. Otherwise, make factors out of your times.
– Toby, commented Apr 23, 2014 at 6:12

CCurtis · Accepted Answer · 2014-04-23 06:38:18Z

1

Try this. Not tested.

# Note: these comparisons are element-wise (row i of data1 against row i of data2)
keep <- data1$time_stamp >= data2$startTime & data1$time_stamp <= data2$endTime
newdata <- data2[keep, 3:5]
data1 <- cbind(data1[keep, ], newdata)

This won't return any rows where time_stamp isn't between startTime and endTime, so in theory the returned dataset could be shorter than the original. Just in case, I subset data1 with the same TRUE/FALSE vector as data2 so that they end up the same length.

edited Apr 23, 2014 at 6:38

answered Apr 23, 2014 at 6:22


2 Comments

user3563018 Over a year ago

Thanks for the help! I am getting NAs for all rows except for the first one.

CCurtis Over a year ago

I looked over this in a little more detail, and it turns out I jumped the gun. My code returns NA because it's only comparing data in the same rows. You need a loop with which(), or which() inside sapply() as the others have done, so that each row in DF1 is compared against every row in DF2. I could fix this code, but it would look nearly identical to the others, so just go with one of them. Because you have to compare all rows against all rows, I expect the code might slow down with bigger data. If that happens, you might look into sorting the data and starting each search where the previous one stopped.
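
The sorted-data idea in the comment above can be sketched with base R's findInterval(), which does a binary search instead of comparing every row against every row. This is only an illustration, not code from any of the answers. It assumes the ranges are sorted by startTime and don't overlap, as the question states, and the third row of DF1 (85.9 at 1396375000) is made up to show a timestamp that falls in a gap.

```r
# Sample data from the question, plus one hypothetical extra row whose
# time stamp (1396375000) falls in the gap between the two ranges.
DF1 <- data.frame(air_temp   = c(85.1, 85.4, 85.9),
                  time_stamp = c(1396335600, 1396335860, 1396375000))
DF2 <- data.frame(startTime    = c(1396334278, 1396375256),
                  endTime      = c(1396374621, 1396376369),
                  location.lat = c(37.77638, 37.78391),
                  location.lon = c(-122.4176, -122.4054),
                  name         = c("Work", "Work"))

# For each time stamp, findInterval() returns the index of the last
# startTime that is <= it (0 if the time stamp precedes every range).
# DF2$startTime must be sorted in non-decreasing order.
idx <- findInterval(DF1$time_stamp, DF2$startTime)
idx[idx == 0] <- NA

# The candidate range only counts if the time stamp is also <= its endTime;
# otherwise the time stamp sits in a gap and is marked NA.
idx[!is.na(idx) & DF1$time_stamp > DF2$endTime[idx]] <- NA

# Indexing with NA yields a row of NAs, which is what the question asks for.
result <- cbind(DF1, DF2[idx, c("location.lat", "location.lon", "name")])
row.names(result) <- NULL
result
```

The first two rows pick up the columns of the first range, and the made-up third row gets NA for location.lat, location.lon, and name.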

Shambho · Accepted Answer · 2014-04-23 08:29:31Z

Interesting problem... Turned out to be more complicated than I originally thought!!

Step1: Set up the data:

DF1 <- read.table(text="air_temp  time_stamp
85.1      1396335600
85.4      1396335860",header=TRUE)

DF2 <- read.table(text="startTime    endTime       location.lat    location.lon    name
1396334278   1396374621    37.77638        -122.4176       Work
1396375256   1396376369    37.78391        -122.4054       Work",header=TRUE)

Step2: For each time_stamp in DF1 compute appropriate index in DF2:

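index <- sapply(DF1$time_stamp,
       function(i) {
         # rows of DF2 whose [startTime, endTime] range contains time stamp i
         dec <- which(i >= DF2$startTime & i <= DF2$endTime)
         # NA when i falls in a gap; otherwise the matching row number
         if (length(dec) == 0) NA else dec[1]
         }
       )
index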

Step3: Merge the two data frames:

DF1 <- cbind(DF1,DF2[index,3:5])
row.names(DF1) <- 1:nrow(DF1)
DF1
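
Both sample timestamps fall inside the first range, so the NA branch is never exercised by the data above. Here is a small self-contained check of that case. The helper name match_interval and the timestamps 1396375000 and 1396376000 are made up for illustration; only 1396335600 and the two ranges come from the question.

```r
# Hypothetical helper wrapping the lookup: returns the row of `ranges` whose
# [startTime, endTime] interval contains `t`, or NA if no interval does.
match_interval <- function(t, ranges) {
  hit <- which(t >= ranges$startTime & t <= ranges$endTime)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

ranges <- data.frame(startTime = c(1396334278, 1396375256),
                     endTime   = c(1396374621, 1396376369),
                     name      = c("Work", "Work"))

# 1396335600 is from the question; 1396375000 is a made-up value in the gap
# between the two ranges; 1396376000 is a made-up value inside the second range.
stamps <- c(1396335600, 1396375000, 1396376000)

# vapply() guarantees an integer vector even when some look-ups find nothing.
index <- vapply(stamps, match_interval, integer(1), ranges = ranges)
index

# Indexing with NA propagates NA into the looked-up column.
ranges$name[index]
```

The middle timestamp matches no range, so its index is NA, and so is the value looked up with it.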

Hope this helps!!

IRTFM · Accepted Answer · 2014-04-23 06:26:04Z

0

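# For each time stamp, find the row of dfrm2 whose range contains it (NA if none)
rowidx <- sapply(dfrm1$time_stamp, function(x) which(dfrm2$startTime <= x & dfrm2$endTime >= x)[1])
cbind(time_stamp = dfrm1$time_stamp, dfrm2[rowidx, c("location.lat", "location.lon", "name")])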

Mine's not tested either and looks substantially similar to CCurtis's, so give him the check if it works.

answered Apr 23, 2014 at 6:26


1 Comment

CCurtis Over a year ago

Thanks for your courtesy. Wow you're nearly up to 100K.
