1

Sample Dataframe:

      ID RNG_BEFORE_WDL RNG_AFTER_WDL
     <int> <chr>          <chr>        
 1 3490115 Less than 10K  Less than 10K
 2 3564671 Less than 1K   0            
 3 3914214 Less than 30K  0            
 4 3971472 More than 60K  More than 60K
 5 3971472 More than 60K  More than 60K
 6 4138130 Less than 1K   0            
 7 4143893 Less than 10K  Less than 10K
 8 4145280 Less than 10K  Less than 100
 9 4146908 Less than 30K  Less than 30K
10 4146929 Less than 10K  Less than 1K 
11 4146958 Less than 30K  Less than 10K
12 4147813 Less than 10K  Less than 1K 
13 4148128 Less than 1K   0            
14 4148446 Less than 60K  Less than 60K
15 4148446 Less than 60K  Less than 60K

2 columns (RNG_BEFORE_WDL & RNG_AFTER_WDL) has different name but share unique value:

[1] "Less than 10K" "Less than 1K"  "Less than 30K" "More than 60K"
[5] "Less than 60K" "Less than 100"

Expected outcome:

  SV_RNG         BF_WDL  AF_WDL
  <chr>           <int>   <int>
1 0                  NA  926339
2 Less than 100   15165   78106
3 Less than 1K   289663  588669
4 Less than 10K 1225950  558488
5 Less than 30K  848153  469356
6 Less than 60K  464382  354119
7 More than 60K 4531080 4399316

Two Step Solution:

# CREATE SEPARATE ENTITY_SET
es_01 <- df |> distinct(ID, .keep_all = TRUE) |> group_by(RNG_BEFORE_WDL)|> summarise(BF_WDL = n()) |> rename(SV_RNG = RNG_BEFORE_WDL)
es_02 <- df |> distinct(ID, .keep_all = TRUE) |> group_by(RNG_AFTER_WDL) |> summarise(AF_WDL = n()) |> rename(SV_RNG = RNG_AFTER_WDL)
  
# PRE-DEFINE CUSTOM SORT ORDER
ls_odr <- list("0", "Less than 100", "Less than 1K", "Less than 10K", "Less than 30K", "Less than 60K", "More than 60K") 

# SUMMARIZE ENTITY_SET
s_es00 <- es_01 |> full_join(es_02) |> arrange(`SV_RNG` |> factor(ls_odr))

How to do in a single step with R dplyr?

Thanks!

1
  • 4
    Please post your code and sample data as copy/pasteable text, not images. And please DON'T USE ALL CAPS, it seems like shouting. You can use ## in the question to format lines as headings if you like. Commented Oct 20, 2022 at 14:45

1 Answer 1

1

If we need two datasets or keep it in a list, loop over the grouping column name as a string, and the the summarisation column

library(dplyr)
library(purrr)
lst1 <- map2(c("RNG_BEFORE_WDL", "RNG_AFTER_WD"), c("BF_WDL", "AF_WDL"),
     ~   df %>%
           distinct(ID, .keep_all = TRUE) %>%
           group_by(across(all_of(.x))) %>%
           summarise(!! .y := n(), .groups = 'drop') %>%
           rename(SV_RNG = 1)) %>%
     reduce(full_join)
Sign up to request clarification or add additional context in comments.

1 Comment

Good idea. But I need to keep it in a single df.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.