I'm trying to match the values of two columns across two different data frames. For every subc-year pair in df1 (e.g. 14X-1991), I'd like to search df2 and collect, in a list or vector, the df2$pat.id values of all rows with the same combination (for the example above, US18 and US20).

As a sample:

df1:

pat.id subc year
US1    14X  1991
US3    15R  1992
US5    10R  1990

df2:

pat.id subc year
US18   14X  1991
US20   14X  1991
US33   15R  1992
US34   15R  1992
US37   15R  1992
US50   10R  1990

Data:

df1 <- data.frame(cbind(c("US1", "US3", "US5"), c("14X", "15R", "10R"), c("1991", "1992", "1990")))
colnames(df1) <- c("pat.id", "subc", "year")
df2 <- data.frame(cbind(c("US18", "US20", "US33", "US34", "US37", "US50"), c("14X", "14X", "15R", "15R", "15R", "10R"), c("1991", "1991", "1992", "1992", "1992", "1990")))
colnames(df2) <- c("pat.id", "subc", "year")

With concrete values plugged in, this works for me: df2$pat.id[which(df2$year==1991 & df2$subc=="14X")]. Now I'd like to do the same for every row in df1.

Thank you!

1 Answer

This is just a merge operation as far as I can tell:

vars <- c("subc","year")
merge(df1[vars], df2[c(vars,"pat.id")], by=vars)

#  subc year pat.id
#1  10R 1990   US50
#2  14X 1991   US18
#3  14X 1991   US20
#4  15R 1992   US33
#5  15R 1992   US34
#6  15R 1992   US37
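
If you would rather have one vector of matching pat.ids per row of df1, as the question asks, something along these lines might work. It is only a sketch: the names key1, key2, by_key and matches are made up for illustration, and the sample data is rebuilt with plain character columns.

```r
# Sample data from the question, built directly with data.frame()
df1 <- data.frame(
  pat.id = c("US1", "US3", "US5"),
  subc   = c("14X", "15R", "10R"),
  year   = c("1991", "1992", "1990"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  pat.id = c("US18", "US20", "US33", "US34", "US37", "US50"),
  subc   = c("14X", "14X", "15R", "15R", "15R", "10R"),
  year   = c("1991", "1991", "1992", "1992", "1992", "1990"),
  stringsAsFactors = FALSE
)

# Build one key per row from the two matching columns
key1 <- paste(df1$subc, df1$year, sep = "-")
key2 <- paste(df2$subc, df2$year, sep = "-")

# Group df2$pat.id by key, then look up the key of each df1 row
by_key  <- split(df2$pat.id, key2)
matches <- setNames(by_key[key1], df1$pat.id)

matches$US1   # the pat.ids in df2 that share subc and year with US1
```

The result is a list with one element per row of df1, named after df1$pat.id. A df1 row whose combination does not occur in df2 ends up with a NULL entry.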

If you only want to pick one row per subc-year combination, sample randomly from df2 before merging (the pat.id you get will vary between runs):

merge(
 df1[vars],
 aggregate(pat.id ~ ., data=df2[c("pat.id",vars)], FUN=sample, 1), by=vars
)
#  subc year pat.id
#1  14X 1991   US20
#2  15R 1992   US33
#3  10R 1990   US50
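
To append the sampled id to df1 itself, a variant like the following could be used. Treat it as a sketch under assumptions: pick_one and the new column name match.id are hypothetical, the seed is arbitrary and only there to make the draw repeatable, and the data is rebuilt with character columns.

```r
df1 <- data.frame(
  pat.id = c("US1", "US3", "US5"),
  subc   = c("14X", "15R", "10R"),
  year   = c("1991", "1992", "1990"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  pat.id = c("US18", "US20", "US33", "US34", "US37", "US50"),
  subc   = c("14X", "14X", "15R", "15R", "15R", "10R"),
  year   = c("1991", "1991", "1992", "1992", "1992", "1990"),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed, only so the example is repeatable

# Hypothetical helper: draw one element by position, NA if there is nothing to draw
pick_one <- function(x) {
  if (length(x) == 0) return(NA_character_)
  x[sample.int(length(x), 1)]
}

key1 <- paste(df1$subc, df1$year, sep = "-")
key2 <- paste(df2$subc, df2$year, sep = "-")

# One independent draw per row of df1, stored as a new column
candidates   <- split(df2$pat.id, key2)[key1]
df1$match.id <- unname(vapply(candidates, pick_one, character(1)))

df1
```

Drawing by position with sample.int() sidesteps a quirk of sample(): called on a single number n, it samples from 1:n instead of returning n. That does not bite with character ids such as "US50", but it would if pat.id were numeric.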

4 Comments

Thank you, that makes the first step easier! Now I'd like to randomly select (with sample()) one of the output's pat.ids for each subc-year combination (e.g. one of US33, US34, US37) and append it to the matching row of df1. That's where I'm stuck again.
Thank you again. It works on the sample, but running it on a much larger data frame with the same structure yields: [1] subc year pat.id <0 rows> (or 0-length row.names). Have you encountered this before?
@user5835099 - where does it fail? Does the aggregate work, or does the merge fail?
Both seem to run successfully with no errors (and surprisingly quickly, given that there are hundreds of thousands of rows). When I print the resulting object, it shows "[1] subc year pat.id <0 rows> (or 0-length row.names)".
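
The thread does not say what caused the empty result. One thing that might be worth checking is whether the key columns really contain identical values in both data frames, for example whether one of them carries stray whitespace. The sketch below uses hypothetical toy frames a and b and a hypothetical helper norm_keys; it is a guess at a possible cause, not a diagnosis of the actual data.

```r
# Hypothetical toy frames: 'a' has trailing whitespace in subc
a <- data.frame(subc = c("14X ", "15R "), year = c(1991, 1992),
                stringsAsFactors = FALSE)
b <- data.frame(subc = c("14X", "15R"), year = c("1991", "1992"),
                pat.id = c("US18", "US33"), stringsAsFactors = FALSE)
vars <- c("subc", "year")

# Inspect the column types on both sides
sapply(a[vars], class)
sapply(b[vars], class)

# The keys differ by whitespace, so nothing matches
n_before <- nrow(merge(a, b, by = vars))

# Hypothetical helper: coerce the key columns to trimmed character vectors
norm_keys <- function(d, cols) {
  d[cols] <- lapply(d[cols], function(v) trimws(as.character(v)))
  d
}
a2 <- norm_keys(a, vars)
b2 <- norm_keys(b, vars)

# Count the key combinations present in both frames, then merge again
shared  <- intersect(paste(a2$subc, a2$year), paste(b2$subc, b2$year))
n_after <- nrow(merge(a2, b2, by = vars))
```

If the number of shared keys is zero even after cleaning, the two data frames simply have no subc-year combination in common.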
