R dplyr - Summarize from two distinct columns containing the same value (single step)

Question

Sample Dataframe:

      ID RNG_BEFORE_WDL RNG_AFTER_WDL
     <int> <chr>          <chr>        
 1 3490115 Less than 10K  Less than 10K
 2 3564671 Less than 1K   0            
 3 3914214 Less than 30K  0            
 4 3971472 More than 60K  More than 60K
 5 3971472 More than 60K  More than 60K
 6 4138130 Less than 1K   0            
 7 4143893 Less than 10K  Less than 10K
 8 4145280 Less than 10K  Less than 100
 9 4146908 Less than 30K  Less than 30K
10 4146929 Less than 10K  Less than 1K 
11 4146958 Less than 30K  Less than 10K
12 4147813 Less than 10K  Less than 1K 
13 4148128 Less than 1K   0            
14 4148446 Less than 60K  Less than 60K
15 4148446 Less than 60K  Less than 60K

2 columns (RNG_BEFORE_WDL & RNG_AFTER_WDL) has different name but share unique value:

[1] "Less than 10K" "Less than 1K"  "Less than 30K" "More than 60K"
[5] "Less than 60K" "Less than 100"

Expected outcome:

  SV_RNG         BF_WDL  AF_WDL
  <chr>           <int>   <int>
1 0                  NA  926339
2 Less than 100   15165   78106
3 Less than 1K   289663  588669
4 Less than 10K 1225950  558488
5 Less than 30K  848153  469356
6 Less than 60K  464382  354119
7 More than 60K 4531080 4399316

Two Step Solution:

# CREATE SEPARATE ENTITY_SET
es_01 <- df |> distinct(ID, .keep_all = TRUE) |> group_by(RNG_BEFORE_WDL)|> summarise(BF_WDL = n()) |> rename(SV_RNG = RNG_BEFORE_WDL)
es_02 <- df |> distinct(ID, .keep_all = TRUE) |> group_by(RNG_AFTER_WDL) |> summarise(AF_WDL = n()) |> rename(SV_RNG = RNG_AFTER_WDL)
  
# PRE-DEFINE CUSTOM SORT ORDER
ls_odr <- list("0", "Less than 100", "Less than 1K", "Less than 10K", "Less than 30K", "Less than 60K", "More than 60K") 

# SUMMARIZE ENTITY_SET
s_es00 <- es_01 |> full_join(es_02) |> arrange(`SV_RNG` |> factor(ls_odr))

How to do in a single step with R dplyr?

Thanks!

Please post your code and sample data as copy/pasteable text, not images. And please DON'T USE ALL CAPS, it seems like shouting. You can use ## in the question to format lines as headings if you like. — Gregor Thomas
– Gregor Thomas, Commented Oct 20, 2022 at 14:45

akrun · Accepted Answer · 2022-10-20 15:19:42Z

1

If we need two datasets or keep it in a list, loop over the grouping column name as a string, and the the summarisation column

library(dplyr)
library(purrr)
lst1 <- map2(c("RNG_BEFORE_WDL", "RNG_AFTER_WD"), c("BF_WDL", "AF_WDL"),
     ~   df %>%
           distinct(ID, .keep_all = TRUE) %>%
           group_by(across(all_of(.x))) %>%
           summarise(!! .y := n(), .groups = 'drop') %>%
           rename(SV_RNG = 1)) %>%
     reduce(full_join)

edited Oct 20, 2022 at 15:19

answered Oct 20, 2022 at 14:46

akrun

891k38 gold badges590 silver badges700 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

muzyace Over a year ago

Good idea. But I need to keep it in a single df.

Collectives™ on Stack Overflow

R dplyr - Summarize from two distinct columns containing the same value (single step)

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related