I am trying to apply a function across a number of dataframes using lapply. The function works when I invoke it on each of the dataframes individually, but lapply throws an error. The error doesn't seem relevant. I can't work out what the issue is. Here is an example:

a <- data.frame('country' = factor(c(rep(1, 5), rep(2, 5))), 
           'variable' = factor(c(rep('A', 5), rep('B', 5))), 
           'value' = runif(10, 0, 1), 
           'year' = runif(10, 0, 1))

library(dplyr)

slope <- function(dat) {
  dat %>%
    filter(!value %in% c(-66, -77, -88) & !is.na(value)) %>%
    group_by(country, variable) %>%
    do(data.frame(slope = coef(lm(value ~ year, .))[2])) %>%
    ungroup()
}

This function works:

> slope(a)
# A tibble: 2 x 3
  country variable  slope
  <fct>   <fct>     <dbl>
1 1       A         0.140
2 2       B        -0.150

But lapply doesn't:

> lapply(a, slope)
Error in UseMethod("filter_") : 
  no applicable method for 'filter_' applied to an object of class "factor"

I don't understand the error because value, which is filtered, is numeric (not a factor).

> str(a)
'data.frame':   10 obs. of  4 variables:
 $ country : Factor w/ 2 levels "1","2": 1 1 1 1 1 2 2 2 2 2
 $ variable: Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
 $ value   : num  0.884 0.513 0.835 0.83 0.694 ...
 $ year    : num  0.4288 0.2874 0.0531 0.7793 0.0496 ...

Obviously, when using lapply in practice, I would be applying it to a number of dataframes. I don't think that makes a difference to the example: I get the same error when I try this on a number of dataframes. I assume I am missing something obvious.

  • When you loop over a, lapply goes through its columns, so each element is a vector and not a data.frame. Commented Dec 15, 2019 at 22:34
  • Do you need split(a, a$country) %>% lapply(slope)? Commented Dec 15, 2019 at 22:35
  • @akrun, thanks for this. Seeing your first comment made me realize that I had used lapply on c('a', 'b', 'c') rather than on list('a', 'b', 'c'), where 'a', 'b' and 'c' are all dataframes. When I use list, it all works. Commented Dec 15, 2019 at 22:43
  • You don't need to quote the object names Commented Dec 15, 2019 at 22:43
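
The difference between c() and list() noted in the comments can be seen in a minimal base R sketch. The data frames df1 and df2 are made up for illustration, and the sketch assumes the objects are passed unquoted:

```r
# Two small, made-up data frames (hypothetical stand-ins for the real ones)
df1 <- data.frame(x = 1:3, y = c(2, 4, 6))
df2 <- data.frame(x = 4:6, y = c(1, 3, 5))

# c() treats each data frame as a list of columns and concatenates them,
# so the result is a flat list of column vectors
flat <- c(df1, df2)
length(flat)
vapply(flat, is.data.frame, logical(1))

# list() keeps each data frame intact as a single element
kept <- list(df1, df2)
length(kept)
vapply(kept, is.data.frame, logical(1))
```

With the flat list, lapply hands the function one column at a time, which is the same situation as calling lapply on a single data frame.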

1 Answer

The issue is that lapply on a data.frame loops over its columns, because a data.frame is a list whose elements are the columns. Each call to slope therefore receives a single vector rather than a data.frame, while slope expects a data.frame with columns to act upon. The first column, country, is a factor, which is why the error complains about an object of class "factor".
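
As a minimal base R sketch of this behaviour (the data frame below is a smaller stand-in with fixed values, not the OP's exact data):

```r
# A small stand-in for the question's data frame
a <- data.frame(
  country  = factor(c(1, 1, 2, 2)),
  variable = factor(c("A", "A", "B", "B")),
  value    = c(0.1, 0.4, 0.3, 0.9),
  year     = c(1, 2, 1, 2)
)

# lapply() over a data.frame visits one column at a time,
# so the function receives a vector, never the whole data.frame
seen <- lapply(a, function(x) class(x))
str(seen)

# Wrapping in list() makes the data.frame itself the single element
seen_df <- lapply(list(a), function(x) class(x))
str(seen_df)
```

The first call reports one class per column, starting with the factor column country. The second reports a single "data.frame".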

The OP also mentioned applying the function to a number of data.frames. In that case, place the datasets in a list and loop over it with lapply, i.e.

list(a, a) %>%
   lapply(slope)

Or, with a single dataset, wrap it in list:

list(a) %>%
   lapply(slope)

Or with purrr from the tidyverse:

library(purrr)
list(a) %>%
    map(slope)
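
When several data.frames are involved, naming the list elements keeps each result labelled. The sketch below is an illustration only: fit_slope is a hypothetical base R stand-in for slope (one regression per data frame, no grouping), and the two data frames are made up with exactly linear values.

```r
# Hypothetical base R stand-in for slope(): a single slope of value on year
fit_slope <- function(dat) {
  unname(coef(lm(value ~ year, data = dat))[2])
}

# Two made-up data frames with exactly linear data
dfs <- list(
  first  = data.frame(year = 1:4, value = c(2, 4, 6, 8)),
  second = data.frame(year = 1:4, value = c(9, 6, 3, 0))
)

# lapply() carries the list names over to the result
slopes <- lapply(dfs, fit_slope)
str(slopes)
```

The same pattern should apply to the dplyr-based slope function: lapply(list(first = a, second = b), slope) would return a named list of tibbles, assuming a and b are data frames with the expected columns.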

1 Comment

Can we pass column names to the slope function? If so, how?
