filter() piped with distinct() to select strings in R

Question

Let's assume that I have data in a df called data_full. From data_full I get

data_filtered <- data_full %>% filter(ua %in% c('a', 'b', 'c'))

Where,

data_filtered <- data.frame(ua = c(rep('a', 3), rep('b', 4), rep('c', 3)),
                        sp = c(rep('sp1', 3), rep('sp2', 3), rep('sp3', 2), rep('sp4',2)))

Now, I want to select the unique terms that occur in data_filtered$sp without breaking the pipe in the first code (data_filtered <- data_full %>%). Without a pipe I can simply use unique(data_filtered$sp), but how can I keep it in {dplyr} language? distinct works in my example above, but not in my real dataset, where it keeps the values unique only within each ua. I tried to write replication code that reproduces the "error" but I couldn't, so I'll print a section of the data (sorry).

This is after I pipe all the way from data_full into data_filtered. In my example it would be:

data_filtered <- data_full %>%
     filter(ua %in% c('a', 'b', 'c')) %>% distinct(sp)

Is this because of "Select only unique/distinct rows from a data frame." in the function description? If so, how can I get the results I want? For example, only one "Alsophila setosa" in my print. I want the final result to be a vector of species names.

EDIT:

As requested:

structure(list(`Unidade Amostral` = c("1000", "1000", "1000", 
"1000", "1000", "1000", "1000", "1001", "1001", "1001", "1001", 
"1001", "1001", "1001", "1001", "1003", "1003", "1003", "1003", 
"1003"), Espécie = c("Aspidosperma australe", "Cupania vernalis", 
"Matayba elaeagnoides", "Nectandra megapotamica", "Ocotea puberula", 
"Ocotea pulchella", "Parapiptadenia rigida", "Allophylus edulis", 
"Araucaria angustifolia", "Hovenia dulcis", "Machaerium paraguariense", 
"Matayba elaeagnoides", "Muellera campestris", "Nectandra megapotamica", 
"Parapiptadenia rigida", "Clethra scabra", "Ilex brevicuspis", 
"Ilex paraguariensis", "Matayba elaeagnoides", "Myrsine coriacea"
), n = c(4, 7, 14, 6, 9, 4, 5, 4, 8, 3, 4, 16, 10, 6, 4, 4, 13, 
3, 42, 12)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -20L), groups = structure(list(`Unidade Amostral` = c("1000", 
"1001", "1003"), .rows = structure(list(1:7, 8:15, 16:20), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))

So what can I use to avoid the repetition I get in my print? Is it returning each unique value based on each ua?
– hiperhiper, Commented Aug 2, 2022 at 17:10
unique(data_filtered$sp) gives the same output as %>% distinct(sp) (except that distinct returns a data.frame with a single column, whereas unique on the vector returns a vector).
– akrun, Commented Aug 2, 2022 at 17:12
I am guessing that your original data values may have leading/trailing spaces, so distinct returns everything because "a" and " a" are different. You may use %>% mutate(sp = trimws(sp)) %>% distinct(sp)
– akrun, Commented Aug 2, 2022 at 17:12
You may have a group attribute. Try dat %>% ungroup %>% distinct(Espécie)
– akrun, Commented Aug 2, 2022 at 17:23

akrun · Accepted Answer · 2022-08-02 17:24:57Z

Based on the data shown, there is a group attribute, which prevents distinct from looking over the whole dataset. We need to ungroup first:

library(dplyr)
dat %>%
   ungroup() %>%        # drop the grouping on `Unidade Amostral`
   distinct(Espécie)    # unique species across the whole dataset

With unique on the column extracted as a vector, there is no group attribute, because $ or [[ extracts the whole column. Within the tidyverse, by contrast, if there is a group attribute, the functions are applied within each group.
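To make the difference concrete, here is a minimal sketch using the toy data from the question. The group_by(ua) step is an assumption added to mimic the grouping found in the real data, and the object names grouped, per_group and species are only illustrative. The final pull() is one way to get a plain vector of species names, as asked for in the question.

```r
library(dplyr)

# Toy data from the question
data_full <- data.frame(
  ua = c(rep("a", 3), rep("b", 4), rep("c", 3)),
  sp = c(rep("sp1", 3), rep("sp2", 3), rep("sp3", 2), rep("sp4", 2))
)

# Assumption: somewhere upstream the data were grouped by ua
grouped <- data_full %>%
  filter(ua %in% c("a", "b", "c")) %>%
  group_by(ua)

# On grouped data the grouping column is kept, so uniqueness is per (ua, sp) pair
per_group <- grouped %>% distinct(sp)

# Ungroup first, then take distinct values and extract them as a vector
species <- grouped %>%
  ungroup() %>%
  distinct(sp) %>%
  pull(sp)
```

Here per_group still contains "sp3" twice (once for ua "b" and once for ua "c"), while species holds each name only once.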


3 Comments

hiperhiper Over a year ago

Thanks! I will take this into consideration in further dplyr usage

akrun Over a year ago

@hiperhiper I would check str(data) first, as it gives a lot of information before we even have to look for other issues

hiperhiper Over a year ago

I'll keep that in mind for a next time
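
Besides str(), dplyr has helpers that report grouping directly. A small sketch, assuming a hypothetical grouped data frame dat built only for illustration:

```r
library(dplyr)

# Hypothetical example data, grouped by ua
dat <- data.frame(ua = c("a", "a", "b"), sp = c("sp1", "sp1", "sp2")) %>%
  group_by(ua)

grouped_flag <- is_grouped_df(dat)  # is there a group attribute?
grouping_cols <- group_vars(dat)    # names of the grouping columns
dat_classes <- class(dat)           # includes "grouped_df" when grouped
```

If grouping_cols is not empty, calling ungroup() before distinct() is likely what is needed.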
