
I often use loops in my code. I was told that rather than using loops I should be using functions, and that a loop can be rewritten using a function from the R package purrr.

As an example, the code below counts how many flowers of each species in the iris dataset have Sepal.Width <= 3:

library(dplyr)

# data frame to hold the output
sepaltable <- data.frame(Species = character(),
                         Total = numeric(),
                         stringsAsFactors = FALSE)

# list of species to iterate over
specieslist <- unique(iris$Species)

# loop that fills the data frame with the name of each species
# and the number of flowers of that species with Sepal.Width <= 3
for (i in seq_along(specieslist)) {
  a <- as.character(specieslist[i])
  b <- filter(iris, Species == a & Sepal.Width <= 3)
  n <- nrow(b)
  sepaltable[i, "Species"] <- a
  sepaltable[i, "Total"] <- n
}

The loop fills the sepaltable data frame with the name of each species and how many flowers of that species have Sepal.Width <= 3. I want to reproduce the effect of this loop using a function from the R package purrr, without writing a loop. Can anyone help?

2 Answers


We can use a grouped sum of a logical expression in dplyr:

library(dplyr)
iris %>% 
   group_by(Species) %>%
   summarise(Total = sum(Sepal.Width <=3))
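The trick in that summarise() call is that sum() on a logical vector counts the TRUE values, because TRUE is coerced to 1 and FALSE to 0. A minimal base-R sketch of the same idea; tapply() is not part of the answer, it is just a dependency-free way to show the grouping:

```r
# sum() on a logical vector counts the TRUE values (TRUE -> 1, FALSE -> 0)
flags <- c(TRUE, FALSE, TRUE, TRUE)
sum(flags)  # 3

# Base-R analogue of
#   iris %>% group_by(Species) %>% summarise(Total = sum(Sepal.Width <= 3))
# tapply() splits the logical vector by Species and applies sum() to each piece
totals <- tapply(iris$Sepal.Width <= 3, iris$Species, sum)
totals
```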

Or, if purrr is needed:

library(dplyr)
library(purrr)

specieslist <- unique(iris$Species)
map_dfr(specieslist, ~ iris %>%
          summarise(Total = sum(Species == .x & Sepal.Width <= 3),
                    Species = .x)) %>%
  select(Species, Total)

NOTE: map() and the apply-family functions (lapply/sapply/vapply/rapply/mapply/Map/apply) are all loops; they just hide the iteration behind a function call.
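To make the note concrete, here is a bare-bones map written with an ordinary for loop. The helper my_map() and the function count_narrow() are hypothetical names used only for illustration; the point is that my_map() returns exactly what lapply() returns:

```r
# Hypothetical helper, for illustration only: a bare-bones map written as a for loop
my_map <- function(x, f, ...) {
  out <- vector("list", length(x))   # pre-allocate one slot per element
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)       # call f on each element in turn
  }
  names(out) <- names(x)             # keep names, as lapply() does
  out
}

species <- levels(iris$Species)

# Hypothetical per-species function: count of rows with Sepal.Width <= 3
count_narrow <- function(s) sum(iris$Species == s & iris$Sepal.Width <= 3)

identical(my_map(species, count_narrow), lapply(species, count_narrow))  # TRUE
```

The real lapply() is implemented in C and map() adds extra conveniences, but conceptually both do what my_map() does.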


1 Comment

Agree with akrun: there is no reason to use purrr here.

For the type of example you provided, akrun's answer is the most straightforward approach, especially since you are already using dplyr. The dplyr package is designed for basic data table summaries, especially group statistics like the ones in your example.
But in more complicated cases, most of the time when you would write a loop, you could accomplish the same thing with a function and the apply family.

Using your example:

# write a function that does the work from the body of your loop
summSpecies <- function(a) {
  b <- filter(iris, Species == a & Sepal.Width <= 3)
  n <- nrow(b)
  return(n)
}

# apply the function over your list
sapply(specieslist, summSpecies)  # sapply simplifies the output to a vector (in this case)
#[1]  8 42 33

# You can build this into a data frame
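# as.character() because specieslist is a factor, which stringsAsFactors = FALSE does not convert
sepaltable <- data.frame(Species = as.character(specieslist),
                         Total = sapply(specieslist, summSpecies),
                         stringsAsFactors = FALSE)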
sepaltable
#      Species Total
# 1     setosa     8
# 2 versicolor    42
# 3  virginica    33
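A stricter alternative to sapply() here is vapply(), where you declare the type and length of each result up front. In this sketch summSpecies() is re-implemented in base R so the block runs without dplyr; that is an assumption for illustration, but it returns the same single integer count as the filter() + nrow() version above:

```r
specieslist <- unique(iris$Species)

# Base-R stand-in for the answer's summSpecies() (the original uses dplyr::filter + nrow);
# it returns the same single integer count
summSpecies <- function(a) {
  sum(iris$Species == a & iris$Sepal.Width <= 3)
}

# vapply() is told to expect exactly one integer per call (integer(1));
# a result of the wrong length, or of a type that can't be promoted to integer
# (a double, say), raises an error instead of silently changing the output shape
totals <- vapply(specieslist, summSpecies, integer(1))
totals
```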

For what it's worth, I benchmarked the methods proposed on this page:

# Unit: microseconds
#         expr      min        lq      mean    median        uq       max neval
#   ForLoop.OP 2548.519 2725.9020  3107.153  2819.837 3006.5915 11654.194   100
#  Apply.Brian 2385.638 2534.2390  2810.854  2625.050 2822.5145  9641.172   100
#  dplyr.akrun  721.136  837.6065  1180.244   864.604  902.9815 13440.076   100
#  purrr.akrun 3572.656 3783.2845  4147.900  3874.095 4073.5690 10517.602   100
# purrr.Axeman 2440.973  2527.322 2866.7686 2586.8960  2774.097  9577.360   100

It should be no surprise that dplyr's grouped summary, which is optimized for exactly this kind of task, is the clear winner. The for loop lags slightly behind the apply-family approach, and akrun's map_dfr() version was the slowest in this run.
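The table has the layout printed by the microbenchmark package, but the exact benchmarking code isn't shown. As a dependency-free sketch, base R's system.time() can compare a loop against an apply-family call; loop_version() and apply_version() are hypothetical base-R stand-ins, not the code that produced the numbers above:

```r
# Hypothetical base-R stand-ins for the two approaches being compared
loop_version <- function() {
  species <- levels(iris$Species)
  totals <- integer(length(species))          # pre-allocated result
  for (i in seq_along(species)) {
    totals[i] <- sum(iris$Species == species[i] & iris$Sepal.Width <= 3)
  }
  totals
}

apply_version <- function() {
  vapply(levels(iris$Species),
         function(s) sum(iris$Species == s & iris$Sepal.Width <= 3),
         integer(1), USE.NAMES = FALSE)
}

# Check that both give the same answer before timing them
stopifnot(identical(loop_version(), apply_version()))

# Repeat each many times so the elapsed time is measurable
n <- 1000
t_loop  <- system.time(for (k in seq_len(n)) loop_version())[["elapsed"]]
t_apply <- system.time(for (k in seq_len(n)) apply_version())[["elapsed"]]
cat(sprintf("loop: %.3f s, apply: %.3f s for %d runs\n", t_loop, t_apply, n))
```

Timings like these vary from machine to machine and run to run, so small differences between the loop and the apply version shouldn't be over-interpreted.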

3 Comments

And replace sapply with map_int if you want to use purrr.
In the loop example, the empty data frame is created first and then populated by the loop. Is it possible to do the same thing with the function, i.e. have the function populate the empty data frame rather than incorporating the results as a vector at the end?
@Basil Sure. You could create the data frame first, with its Species column already filled in, and then populate Total with e.g. sepaltable$Total <- sapply(specieslist, summSpecies). This works whether or not sepaltable already has a column named Total; it just needs one row per species, since a zero-row data frame won't accept a length-3 column. You could also use a column of the data frame as the input, i.e. sepaltable$Total <- sapply(sepaltable$Species, summSpecies).
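Combining the two suggestions in these comments (map_int() instead of sapply(), and filling a data frame that already exists), a minimal sketch might look like this. summSpecies() is again a base-R stand-in for the answer's dplyr version, which is an assumption made so the block runs on its own:

```r
library(purrr)

# Create the data frame first, with one row per species
sepaltable <- data.frame(Species = levels(iris$Species), stringsAsFactors = FALSE)

# Base-R stand-in for the answer's summSpecies(): count of rows for species `a`
# with Sepal.Width <= 3, as a single integer
summSpecies <- function(a) sum(iris$Species == a & iris$Sepal.Width <= 3)

# map_int() iterates over the existing Species column and must get one integer
# back per element; the results fill the Total column
sepaltable$Total <- map_int(sepaltable$Species, summSpecies)
sepaltable
```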
