Make a function to filter and summarize using R

Question

I have these two tables:

   <A>                       <B>
a1    a2                     b1   
ABC   CAFE                   AB
ABD   DRINK                  BF
ABF   CAFE                   ..
ABFF  DRINK
..     ..

For each string in B$b1, I would like a summary table of the rows in table A whose a1 contains that string, like this:

library(dplyr)
library(stringr)

A1 <- A %>%
  filter(str_detect(a1, "AB")) %>%
  group_by(a2) %>%
  summarize(n())

A2 <- A %>%
  filter(str_detect(a1, "BF")) %>%
  group_by(a2) %>%
  summarize(n())

However, I would have to repeat this code several times, so I would like a function that passes each value of table B into str_detect. How do I write such a function?

lapply(B$b1, function(x) A %>% filter(str_detect(a1, x)) %>% group_by(a2) %>% summarize(n()))
– Onyambu, Commented Dec 29, 2017 at 2:04
Why not? A is not a parameter; it will be looked up in the global environment. Try it out. If it doesn't work, I'm sure someone will give you a correct method: lapply(B$b1, function(x) A %>% filter(str_detect(a1, x)) %>% group_by(a2) %>% summarize(n()))
– Onyambu, Commented Dec 29, 2017 at 2:10

www · Accepted Answer · 2017-12-29 02:37:14Z

1

Here I designed a function called count_fun, which takes four arguments: dat is a data frame like A, Scol is the column containing the strings to search, Gcol is the grouping column, and String is the pattern to look for. Scol and Gcol are passed as bare column names, which is why the function captures them with enquo() and unquotes them with !!. See https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html to learn how to write functions with dplyr.

library(dplyr)
library(stringr)

count_fun <- function(dat, Scol, Gcol, String){

  Scol <- enquo(Scol)
  Gcol <- enquo(Gcol)

  dat2 <- dat %>%
    filter(str_detect(!!Scol, String)) %>%
    group_by(!!Gcol) %>%
    summarize(n())
  return(dat2)
}

count_fun(A, a1, a2, "AB")
# # A tibble: 2 x 2
#   a2    `n()`
#   <chr> <int>
# 1 CAFE      2
# 2 DRINK     2

count_fun(A, a1, a2, "BF")
# # A tibble: 2 x 2
#   a2    `n()`
#   <chr> <int>
# 1 CAFE      1
# 2 DRINK     1
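
As a side note, the enquo()/!! pair can also be written with the embrace operator {{ }}, assuming rlang 0.4.0 or later is installed. The sketch below is a variant, not the code from this answer: the name count_fun2 and the output column name n are illustrative choices, and the data frame repeats the example data so the block runs on its own.

```r
library(dplyr)
library(stringr)

# Example data, same values as the DATA section of this answer
A <- data.frame(
  a1 = c("ABC", "ABD", "ABF", "ABFF"),
  a2 = c("CAFE", "DRINK", "CAFE", "DRINK"),
  stringsAsFactors = FALSE
)

# count_fun2 is a hypothetical name; {{ }} assumes rlang >= 0.4.0
count_fun2 <- function(dat, Scol, Gcol, String) {
  dat %>%
    filter(str_detect({{ Scol }}, String)) %>%  # keep rows whose Scol matches String
    group_by({{ Gcol }}) %>%
    summarize(n = n())                          # name the count column n
}

res_ab <- count_fun2(A, a1, a2, "AB")
res_bf <- count_fun2(A, a1, a2, "BF")
```

Naming the count column (n = n()) avoids the backticked `n()` header and makes the result easier to use in later steps.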

We can then apply count_fun with lapply to loop through every element in B.

lapply(B$b1, function(x){
  count_fun(A, a1, a2, x)
})

# [[1]]
# # A tibble: 2 x 2
#   a2    `n()`
#   <chr> <int>
# 1 CAFE      2
# 2 DRINK     2
# 
# [[2]]
# # A tibble: 2 x 2
#   a2    `n()`
#   <chr> <int>
# 1 CAFE      1
# 2 DRINK     1
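
If a single table is more convenient than a list, one possible follow-up is to name each list element after its pattern and stack the results with bind_rows(). This is a sketch under assumptions: the column name pattern and the object names res_list and combined are made up for illustration, and the inner pipeline simply repeats what count_fun does so the block is self-contained.

```r
library(dplyr)
library(stringr)

# Example data, same values as the DATA section
A <- data.frame(
  a1 = c("ABC", "ABD", "ABF", "ABFF"),
  a2 = c("CAFE", "DRINK", "CAFE", "DRINK"),
  stringsAsFactors = FALSE
)
B <- data.frame(b1 = c("AB", "BF"), stringsAsFactors = FALSE)

# One summary table per pattern in B$b1
res_list <- lapply(B$b1, function(x) {
  A %>%
    filter(str_detect(a1, x)) %>%
    group_by(a2) %>%
    summarize(n = n())
})

# Name each element after the pattern that produced it
names(res_list) <- B$b1

# Stack the tables; .id stores the list names in a new column
combined <- bind_rows(res_list, .id = "pattern")
```

Each row of combined then records which pattern it belongs to, so the result can be filtered or reshaped like any other data frame.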

DATA

A <- read.table(text = "a1    a2
ABC   CAFE
ABD   DRINK
ABF   CAFE
ABFF  DRINK",
                header = TRUE, stringsAsFactors = FALSE)

B <- data.frame(b1 = c("AB", "BF"), stringsAsFactors = FALSE)

edited Dec 29, 2017 at 2:37

answered Dec 29, 2017 at 2:17


2 Comments

SATH Over a year ago

Excuse me, I would now like to get the proportion of each type in the summarized table rather than the count. I changed the count_fun function accordingly, but it does not work. How do I get the proportion (%) of the types inside the function?

www Over a year ago

This could be a new question, but before you ask a new question, please search on SO to see if there are posts talking about how to calculate percentage using dplyr.
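
For reference, one common dplyr pattern for proportions is to count first and then divide each count by the total. The sketch below is an assumption about what the commenter wants, not code from either answer; prop_fun and the column names n and prop are hypothetical.

```r
library(dplyr)
library(stringr)

# Example data, same values as the DATA section
A <- data.frame(
  a1 = c("ABC", "ABD", "ABF", "ABFF"),
  a2 = c("CAFE", "DRINK", "CAFE", "DRINK"),
  stringsAsFactors = FALSE
)

# prop_fun is a hypothetical variant of count_fun
prop_fun <- function(dat, Scol, Gcol, String) {
  Scol <- enquo(Scol)
  Gcol <- enquo(Gcol)

  dat %>%
    filter(str_detect(!!Scol, String)) %>%
    group_by(!!Gcol) %>%
    summarize(n = n()) %>%       # count per group; result is no longer grouped
    mutate(prop = n / sum(n))    # share of each group among the matched rows
}

res <- prop_fun(A, a1, a2, "AB")
```

Multiplying prop by 100 gives a percentage. The division works because summarize() drops the single grouping level, so sum(n) is the total over all matched rows.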

Onyambu · Accepted Answer · 2017-12-29 02:14:55Z

1

I guess this solves your issue:

 lapply(B$b1,function(x)A%>%filter(str_detect(a1, x)) %>% group_by(a2) %>% summarize(n()))

answered Dec 29, 2017 at 2:14
