I want to subset a data frame (with millions of rows) thousands of times, using values in two columns of another data frame. I am currently using the example provided by Akrun:

     subset(df1, (Latitude >= (df2$Lat - 0.01)) & (Latitude <= (df2$Lat + 0.01)))

However, this seems to return all of the data that matches any of the rows in the second data frame. How can I adjust this so that it takes a third column from the second data frame as a name for each row subset pair?

Reference: Subset data frame based on range of values in second data frame

1 Answer

     # Subsetted data
     df_sub <- subset(df1, (Latitude >= (df2$Lat - 0.01)) & (Latitude <= (df2$Lat + 0.01)))
     # Names of third column
     towns <- df2$Town[(df1$Latitude >= (df2$Lat - 0.01)) & (df1$Latitude <= (df2$Lat + 0.01))]

     df_out <- cbind(df_sub, towns)

10 Comments

Thanks for your advice @Julien. However, this returns the error "arguments imply differing number of rows: 449, 2335850".
@DylanEgan To find out which part of the code produces this error, run only (df1$Latitude >= (df2$Lat - 0.01)) & (df1$Latitude <= (df2$Lat + 0.01))
I tried that and it returns a vector with FALSE up to 384 (so the same number as the number of subsets), and then NA, so it looks like it's not going through the whole data frame. It also gives the warning "longer object length is not a multiple of shorter object length".
@DylanEgan This line of code was copied from the answer to the question that you linked in your OP: stackoverflow.com/a/67305902/8806649
I'm really sorry to be bothering you
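
The warning quoted above arises because df1$Latitude >= (df2$Lat - 0.01) compares the two vectors element by element and recycles the shorter one, so it never tests every row of df1 against each row of df2. Below is a minimal sketch of one way to get a separate, named subset for each row of df2. It assumes the column names Latitude, Lat and Town used in the posts above; the sample data, the Value column and the names tol and subsets are made up for illustration.

```r
# Hypothetical sample data; only the column names Latitude, Lat and Town
# come from the thread, the values are invented.
df1 <- data.frame(Latitude = c(10.000, 10.005, 20.000, 20.009, 30.000),
                  Value    = 1:5)
df2 <- data.frame(Lat  = c(10, 20),
                  Town = c("A", "B"))

tol <- 0.01  # half-width of the latitude window, as in the question

# Build one subset of df1 per row of df2, comparing against that row's Lat only.
subsets <- lapply(seq_len(nrow(df2)), function(i) {
  keep <- df1$Latitude >= df2$Lat[i] - tol &
          df1$Latitude <= df2$Lat[i] + tol
  df1[keep, , drop = FALSE]
})
names(subsets) <- as.character(df2$Town)  # name each subset after its Town

# Optionally stack the subsets into one data frame with a Town column.
df_out <- do.call(rbind, Map(function(d, town) {
  if (nrow(d) == 0) return(NULL)  # skip towns with no matching rows
  cbind(d, Town = town)
}, subsets, names(subsets)))
```

Each subset can then be retrieved by name, for example subsets[["A"]]. With millions of rows and thousands of windows this loop may be slow; a non-equi join (for example with the data.table package) might be worth trying, although that is not tested here.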