Vectorizing with lapply instead of using For loop

Question

I am trying to get away from loops in R and was looking to both vectorize and speed up a section of my code.

I am trying to convert the for loop to lapply, but I am getting an error.

Reproducible example:

library(dplyr)

# This works using a For loop -----------------------------------

# create sample data frame
df <- data.frame(Date  = rep(c("Jan1", "Jan2", "Jan3"), 3),
                 Item  = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                 Value = 10:18)


diff <- numeric() # initialize

# Loop through each item and take difference of latest value from earlier values
for (myitem in unique(df$Item)) {

    y = df[df$Date == last(df$Date) & df$Item == myitem, "Value"]  # Latest value for an item

    x = df[df$Item == myitem, "Value"]                             # Every value for an item

    diff <- c(diff, y-x)

}

df_final <- mutate(df, Difference = diff)
df_final

I found related questions here (lapply), here (lapply), and here ($ operator) but none really helped me with my question.

Here is how I tried to vectorize using lapply:

# Same thing using vectorized approach ----------------------------------

mylist <- list(unique(df$Item))

myfunction <- function(df = df, diff = numeric()) {

    y = df[df$Date == last(df$Date) & df$Item == mylist, "Value"]  # Latest value for an item

    x = df[df$Item == mylist, "Value"]                             # Every value for an item

    diff <- c(diff, y-x)

}

# throws error: $ operator is invalid for atomic vectors
diff_vector <- unlist(lapply(mylist, myfunction))

df_final2 <- mutate(df, Difference = diff_vector)
df_final2

My real data set has hundreds of thousands of rows. If someone could point me in the right direction on how to vectorize this and get the same output as the for loop, I would appreciate it.

Thanks!

lapply is a loop. It generally won't make your code faster. It's just nicer and more convenient syntax. – Roland, commented Jun 27, 2018 at 5:52
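To make Roland's point concrete, here is a minimal base-R sketch using ave(), which applies a function within each Item group and returns a vector already aligned with the original rows. ave() still splits the data and loops over the groups internally (it calls lapply() under the hood), so any gain comes from doing the per-group arithmetic as one vectorized operation and from not growing diff with c() on every iteration, not from avoiding iteration altogether. The sketch assumes rows within each Item are already in date order, so the last row of a group holds the latest value.

```r
df <- data.frame(Date  = rep(c("Jan1", "Jan2", "Jan3"), 3),
                 Item  = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                 Value = 10:18)

# For each Item group, subtract every value from the group's last value.
# ave() returns a vector in the original row order, so it can be assigned directly.
df$Difference <- ave(df$Value, df$Item, FUN = function(v) v[length(v)] - v)

df$Difference  # 2 1 0 2 1 0 2 1 0
```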

LachlanO · Accepted Answer · 2018-06-27 23:17:51Z

5

So lapply isn't being used quite right here, that's all!

lapply applies a function to each element of a list. To be explicit, it takes each element of a list, and applies the function to that element.
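A small sketch of what lapply() actually received in the question's attempt. It assumes list(unique(df$Item)) evaluates to a one-element list holding c("A", "B", "C"), which is what it produces for the example data; mylist_question, received and msg are illustrative names.

```r
# What list(unique(df$Item)) produces for the example data: a list with ONE element
mylist_question <- list(c("A", "B", "C"))
length(mylist_question)   # 1, so lapply() calls the function exactly once

# lapply() passes each list element, in turn, as the function's first argument
received <- lapply(mylist_question, function(el) el)
received[[1]]             # "A" "B" "C": a character vector, not a data frame

# This is why df$Date fails inside myfunction(): $ does not work on atomic vectors
msg <- tryCatch(received[[1]]$Date, error = function(e) conditionMessage(e))
msg                       # "$ operator is invalid for atomic vectors"
```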

So if you want it to apply a function to several subsets of a data frame, you need to give it a list whose elements are those subsets. So let's create that list first.

We can do this with the split function: it splits your data frame into several data frames based on a column and stores them in a list. A list of subsets of a data frame. Perfect!
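A quick look at what split() returns for the example data (base R only; sapply() is used here just to inspect the result):

```r
df <- data.frame(Date  = rep(c("Jan1", "Jan2", "Jan3"), 3),
                 Item  = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                 Value = 10:18)

# split() returns a named list: one data frame per distinct Item
mylist <- split(df, df$Item)

names(mylist)          # "A" "B" "C"
mylist$B               # the three rows for item B (Values 13, 14, 15)
sapply(mylist, nrow)   # A = 3, B = 3, C = 3
```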

So let's replace the line where you create mylist with this line instead.

mylist <- split(df,df[,c("Item")])

Now we just need to make some changes to myfunction. Remember that we're now passing in data that is already subsetted, so we can remove the conditions that check whether Item matches what we'd expect. This function will be applied to each of these data frames in its entirety.

myfunction <- function(df) {
    y <- df[df$Date == last(df$Date), "Value"]  # Latest value for this item

    x <- df[, "Value"]                          # Every value for this item

    y - x                                       # Return the differences for this item
}

And the rest, my friend, is exactly as you have it :)
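Putting the pieces together, a self-contained sketch in base R only: tail(x, 1) stands in for dplyr's last(x), and df$Difference <- replaces mutate(). One caveat worth hedging on: unlist() concatenates the results in the order of names(mylist), i.e. sorted by Item, so it only lines up with the rows of df when df is already sorted by Item (as in the example). Base R's unsplit() is an alternative that puts each piece back at its original row positions.

```r
df <- data.frame(Date  = rep(c("Jan1", "Jan2", "Jan3"), 3),
                 Item  = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                 Value = 10:18)

mylist <- split(df, df$Item)

# Same logic as myfunction() above; tail(x, 1) replaces dplyr::last(x)
myfunction <- function(d) {
  y <- d[d$Date == tail(d$Date, 1), "Value"]  # latest value for this item
  x <- d[, "Value"]                            # every value for this item
  y - x
}

results <- lapply(mylist, myfunction)   # named list with elements A, B, C

# Concatenated in Item order; matches df only because df is sorted by Item
diff_vector <- unlist(results, use.names = FALSE)

# unsplit() restores the original row order regardless of how df is sorted
diff_aligned <- unsplit(results, df$Item)

df$Difference <- diff_aligned
df
```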

edited Jun 27, 2018 at 23:17

answered Jun 27, 2018 at 5:29

LachlanO


2 Comments

DaveM Over a year ago

Got it. So my code was effectively giving the function a character vector rather than a list of data frames. As a result, when it reached df$Date, it threw an error because lapply never supplied a data frame. Is that correct?

LachlanO Over a year ago

Bingo! It couldn't find a Date column in the character vector it was given, so it was getting cranky with you :) My apologies, I didn't end up addressing that in the original answer! I'll take solace in the fact that it was clear enough for you to work it out, though!

Melissa Key · Accepted Answer · 2018-06-27 05:17:21Z

1

I'm not sure lapply is the right approach to take. I'd stick with a grouped mutate, which you already seem to be using:

library(dplyr)
df <- data.frame(Date  = rep(c("Jan1", "Jan2", "Jan3"), 3),
  Item  = c(rep("A", 3), rep("B", 3), rep("C", 3)),
  Value = 10:18)

df <- df %>%
  group_by(Item) %>%
  mutate(diff = last(Value) - Value)

df
#> # A tibble: 9 x 4
#> # Groups:   Item [3]
#>   Date  Item  Value  diff
#>   <fct> <fct> <int> <int>
#> 1 Jan1  A        10     2
#> 2 Jan2  A        11     1
#> 3 Jan3  A        12     0
#> 4 Jan1  B        13     2
#> 5 Jan2  B        14     1
#> 6 Jan3  B        15     0
#> 7 Jan1  C        16     2
#> 8 Jan2  C        17     1
#> 9 Jan3  C        18     0

Created on 2018-06-27 by the reprex package (v0.2.0).

This does assume that the observations (at least within each Item group) are already in date order. If not, add arrange(Date) %>% as a step after group_by.
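A sketch of why the order matters, assuming real Date values (via as.Date) and a deliberately shuffled copy of the example data; both are illustrative assumptions, not part of the question. Here arrange(Date, .by_group = TRUE) sorts by Item first and then by Date; the plain arrange(Date) suggested above gives the same diff values, because mutate() still works within each Item group. Real dates are also safer than strings like "Jan1", since strings sort alphabetically ("Jan10" comes before "Jan2").

```r
library(dplyr)

# Hypothetical variant of the example data: real dates, rows deliberately shuffled
df <- data.frame(Date  = as.Date(rep(c("2018-01-01", "2018-01-02", "2018-01-03"), 3)),
                 Item  = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                 Value = 10:18)
df_shuffled <- df[c(3, 1, 2, 6, 4, 5, 9, 7, 8), ]

# Without sorting, last(Value) is just the last row of each group, not the latest date
df_wrong <- df_shuffled %>%
  group_by(Item) %>%
  mutate(diff = last(Value) - Value) %>%
  ungroup()

# Sorting by Date within each Item first makes last(Value) the latest value
df_right <- df_shuffled %>%
  group_by(Item) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(diff = last(Value) - Value) %>%
  ungroup()

df_wrong$diff  # -1 1 0 -1 1 0 -1 1 0
df_right$diff  #  2 1 0  2 1 0  2 1 0
```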

answered Jun 27, 2018 at 5:17

Melissa Key


1 Comment

DaveM Over a year ago

Melissa, good point! Thank you for pointing this out; this would also work with the rest of my code, and it is simpler.

SatZ · Accepted Answer · 2018-06-27 05:18:15Z

1

You could create a table with the latest value for each item, join it with the original table, and take the difference. Or use data.table to create an additional column with the latest value:

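library(data.table)
df <- data.frame(Date  = rep(c("Jan1", "Jan2", "Jan3"), 3),
                 Item  = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                 Value = 10:18)

setDT(df)  # convert df to a data.table in place

df[, latestVal := last(Value), by = .(Item)]  # latest value for each Item
df[, diff := latestVal - Value]               # difference from the latest value
df[, .(Date, Item, Value, diff)]              # result without the helper column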
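The join route mentioned at the start of this answer isn't shown above. A minimal sketch of it with dplyr (the package the question already uses); latest and df_joined are illustrative names, and, like the other answers, it assumes rows are already in date order within each Item. left_join() keeps the rows of df in their original order, so the result lines up with the input.

```r
library(dplyr)

df <- data.frame(Date  = rep(c("Jan1", "Jan2", "Jan3"), 3),
                 Item  = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                 Value = 10:18)

# One row per Item holding its latest value
latest <- df %>%
  group_by(Item) %>%
  summarise(latestVal = last(Value))

# Join the latest value back onto every row, then take the difference
df_joined <- df %>%
  left_join(latest, by = "Item") %>%
  mutate(diff = latestVal - Value) %>%
  select(Date, Item, Value, diff)

df_joined
```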

answered Jun 27, 2018 at 5:18

SatZ
