
I have two dataframes, one which contains a timestamp and air_temperature

air_temp  time_stamp
85.1      1396335600
85.4      1396335860

And another, which contains startTime, endTime, location coordinates, and a canonical name.

startTime    endTime       location.lat    location.lon    name
1396334278   1396374621    37.77638        -122.4176       Work
1396375256   1396376369    37.78391        -122.4054       Work

For each row in the first data frame, I want to identify which time range in the second data frame it lies in. For example, if the timestamp 1396335600 is between the startTime 1396334278 and the endTime 1396374621, add the location and name values to that row in the first data frame.

The start and end time in the second data frame don't overlap, and are linearly increasing. However they are not perfectly continuous, so if the timestamp falls between two time bands, I need to mark the location as NA. If it does fit between the start and end times, I want to add the location.lat, location.lon, and name columns to the first data frame.

Appreciate your help.

  • Look at which(), cbind(), and rbind() for quick-and-dirty base solutions. Otherwise, make factors out of your times. Commented Apr 23, 2014 at 6:12

3 Answers


Try this. Not tested.

newdata <- data2[data1$timestamp>=data2$startTime & data1$timestamp<=data2$endTime  ,3:5]
data1 <- cbind(data1[data1$timestamp>=data2$startTime & data1$timestamp<=data2$endTime,],newdata)

This won't return any rows when timestamp isn't between startTime and endTime, so in theory the returned dataset could be shorter than the original. Just in case, I subset data1 with the same TRUE/FALSE vector as data2 so they will be the same length.


2 Comments

Thanks for the help! I am getting NAs for all rows except for the first one.
Looked over this in a little more detail. Turns out I jumped the gun. My code returns NA because it's only comparing data in the same rows. You need a loop with which(), or which() inside sapply() like the others have done, so that each row in DF1 is compared against every row in DF2. I could fix this code, but it would look nearly identical to the others, so just go with one of them. Because you have to compare all rows against all rows, I expect the code to slow down with bigger data. If that happens, you might look into sorting the data and starting each search where the previous one stopped.
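The sorting idea in the comment above can be sketched with base R's findInterval(), which does a binary search instead of comparing every row against every row. This is only a sketch: it assumes DF2 is sorted by startTime and that its bands don't overlap, as the question states. The third time stamp (1396375000) is a made-up value that falls in the gap between the two bands, added here only to show the NA case.

```r
# Data from the question; the third DF1 row is hypothetical (it falls in the gap)
DF1 <- data.frame(air_temp   = c(85.1, 85.4, 70.0),
                  time_stamp = c(1396335600, 1396335860, 1396375000))
DF2 <- data.frame(startTime    = c(1396334278, 1396375256),
                  endTime      = c(1396374621, 1396376369),
                  location.lat = c(37.77638, 37.78391),
                  location.lon = c(-122.4176, -122.4054),
                  name         = c("Work", "Work"))

# For each time stamp, index of the last band whose startTime is <= the stamp
# (0 means the stamp is earlier than every band)
idx <- findInterval(DF1$time_stamp, DF2$startTime)
idx[idx == 0] <- NA

# A stamp past its candidate band's endTime lies in a gap between bands
idx[!is.na(idx) & DF1$time_stamp > DF2$endTime[idx]] <- NA

# Indexing DF2 with NA yields an all-NA row, which is the wanted result for gaps
result <- cbind(DF1, DF2[idx, c("location.lat", "location.lon", "name")])
row.names(result) <- NULL
result
```

Because findInterval() requires its second argument to be sorted in non-decreasing order, an unsorted DF2 would need DF2 <- DF2[order(DF2$startTime), ] first.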

Interesting problem... Turned out to be more complicated than I originally thought!! Step1: Set up the data!

DF1 <- read.table(text="air_temp  time_stamp
85.1      1396335600
85.4      1396335860",header=TRUE)

DF2 <- read.table(text="startTime    endTime       location.lat    location.lon    name
1396334278   1396374621    37.77638        -122.4176       Work
1396375256   1396376369    37.78391        -122.4054       Work",header=TRUE)

Step2: For each time_stamp in DF1 compute appropriate index in DF2:

index <- sapply(DF1$time_stamp, 
       function(i) {
         dec <- which(i >= DF2$startTime & i <= DF2$endTime)
         ifelse(length(dec) == 0, NA, dec)
         }
       )
index

Step3: Merge the two data frames:

DF1 <- cbind(DF1,DF2[index,3:5])
row.names(DF1) <- 1:nrow(DF1)
DF1
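As a quick check of the gap case, here is a self-contained sketch that reruns the same index logic with one extra time stamp. The extra row (air_temp 70.0 at 1396375000) is invented for illustration and is not from the question; it lies between the two bands, so its location columns should come out as NA. The names index and merged are just for this example.

```r
# Question data plus one hypothetical row that falls between the two bands
DF1 <- data.frame(air_temp   = c(85.1, 85.4, 70.0),
                  time_stamp = c(1396335600, 1396335860, 1396375000))
DF2 <- data.frame(startTime    = c(1396334278, 1396375256),
                  endTime      = c(1396374621, 1396376369),
                  location.lat = c(37.77638, 37.78391),
                  location.lon = c(-122.4176, -122.4054),
                  name         = c("Work", "Work"))

# Same idea as Step 2: the matching DF2 row for each time stamp, NA if none
index <- sapply(DF1$time_stamp, function(i) {
  dec <- which(i >= DF2$startTime & i <= DF2$endTime)
  if (length(dec) == 0) NA_integer_ else dec[1]
})

# Same idea as Step 3: an NA index pulls an all-NA row from DF2
merged <- cbind(DF1, DF2[index, c("location.lat", "location.lon", "name")])
row.names(merged) <- NULL
merged
```

Using if/else rather than ifelse() here is a stylistic choice for a scalar test; both give the same index for this data.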

Hope this helps!!

# which() returns integer(0) when nothing matches; [1] turns that into NA
rowidx <- sapply(dfrm1$time_stamp, function(x) which(dfrm2$startTime <= x & dfrm2$endTime >= x)[1])
cbind(dfrm1, dfrm2[rowidx, c("location.lat", "location.lon", "name")])

Mine's not tested either and looks substantially similar to CCurtis's, so give him the check if it works.

1 Comment

Thanks for your courtesy. Wow you're nearly up to 100K.
