
TL;DR: I'm trying to merge by row index with merge.data.table(), and the approach suggested in the R documentation for merge() is not working.

My data is roughly:

library(data.table)
library(quantreg)
library(purrr)

foo <- expand.grid(seq(60001, 60050, by = 1),
                   c("18-30", "31-60", "61+"),
                   c("pre", "during", "after"))
foo <- as.data.table(foo)
setnames(foo, names(foo), c("zip", "agegroup", "period"))
foo <- cbind(foo, 
             quartile = floor(runif(n = nrow(foo), 1, 4)),
             times = runif(n = nrow(foo), 18, 25))

I ran several quantile regressions on the data, subsetting by age group (at someone else's request).

v_tau <- c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)

mq_age1 <- map(v_tau, ~rq(data = foo[agegroup == "18-30",], 
               times ~ quartile + period + quartile*period,
               tau = .x))  

I'm trying to merge a vector of fitted values from the rq() object with the original data table (I could also convert it to a data frame; it doesn't have to be a data table). This vector is shorter than the number of rows in the data table, so I've been trying to apply the answer given here for a plm() object, modified to account for the fact that my fitted values do not have multiple index attributes.

So, what I have been trying to do is join them by row index. I realize I can make another column with an explicit index, but I would like to avoid that because the fitted values are from a subset of the data and I am joining them to a subset of the data; adding an explicit index is possible, but not uniform or parsimonious, and will end up generating a lot of NAs that I don't want to deal with.

fitted <- mq_age1[[10]]$fitted.values
d_fitted <- cbind(attr(fitted, "index"),
                    fitted = fitted)

foo2 <- merge(foo[agegroup == "18-30",], d_fitted, by = 0, all.x = TRUE) 

Looking at the merge() documentation, it says: "Columns to merge on can be specified by name, number or by a logical vector: the name "row.names" or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input."

However, when I try this, it gives me the following error message:

Error in merge.data.table(foo[agegroup == "18-30", ], d_fitted, by = 0,  : 
  A non-empty vector of column names for `by` is required.

Similarly, when I try using "row.names":

foo2 <- merge(foo[agegroup == "18-30",], d_fitted, by = "row.names", all.x = TRUE)
Error in merge.data.table(foo[agegroup == "18-30", ], d_fitted, by = "row.names",  : 
  Elements listed in `by` must be valid column names in x and y

What is going on? Why can't I do this?

  • I can't verify this code (don't have quantreg), but while base::merge mentions the use of by=0, the S3 method data.table::merge does not. Perhaps you can try with merge(as.data.frame(foo[...,]), ...)? Commented Feb 1, 2023 at 1:49

1 Answer


Found the answer: @r2evans kindly pointed out that base::merge has this functionality, while data.table::merge does not.

foo <- as.data.frame(foo)

before

foo2 <- merge(foo[foo$agegroup == "18-30",], d_fitted, by = 0, all.x = TRUE)

did the trick. Thanks!
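For anyone who wants to see the by = 0 behaviour in isolation, here is a minimal sketch using only base R. The data frames x and y and their values are made up for illustration; they are not the data from the question.

```r
# Toy data frames (hypothetical values) that share some row names
x <- data.frame(times = c(20.1, 22.4, 19.8),
                row.names = c("1", "2", "3"))
y <- data.frame(fitted = c(20.5, 21.9),
                row.names = c("1", "3"))

# by = 0 tells merge.data.frame() to match on row names;
# all.x = TRUE keeps rows of x that have no partner in y (filled with NA)
m <- merge(x, y, by = 0, all.x = TRUE)

# The row names come back as an ordinary column called "Row.names"
print(m)
```

One caveat that seems worth checking before relying on this: a subset of a data frame keeps the row names of the original rows, so the match is only meaningful if the other object carries those same row names.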
