
I'm totally new to R and am trying to solve this using the dplyr package. I want to keep only the countries that have both Import and Export values and view them separately. I have tried several approaches, such as select and filter, but have been unable to do so.

Country Year    Quantity    Description Import/Export
A   2001    10  Frozen  Export
B   2001    50  Fresh   Import
B   2004    20  Frozen  Export
C   2003    30  Frozen  Import
C   2005    40  Fresh   Export
C   2006    60  Frozen  Import
D   2007    290 Fresh   Import

Ideally, the end result should be this:

Country Year    Quantity    Description Import/Export
B   2001    50  Fresh   Import
B   2004    20  Frozen  Export
C   2003    30  Frozen  Import
C   2005    40  Fresh   Export
C   2006    60  Frozen  Import

4 Answers

We can group_by() Country, then keep only the groups in which any `Import/Export` == 'Import' and any `Import/Export` == 'Export'.

library(dplyr)

df %>% group_by(Country) %>%
        filter(any(`Import/Export`=='Import') & 
               any(`Import/Export`=='Export')) %>%
        ungroup()

# A tibble: 5 x 5
  Country  Year Quantity Description `Import/Export`
  <chr>   <dbl>    <dbl> <chr>       <chr>          
1 B        2001       50 Fresh       Import         
2 B        2004       20 Frozen      Export         
3 C        2003       30 Frozen      Import         
4 C        2005       40 Fresh       Export         
5 C        2006       60 Frozen      Import 

data

df <- structure(list(Country = c("A", "B", "B", "C", "C", "C", "D"), 
    Year = c(2001, 2001, 2004, 2003, 2005, 2006, 2007), Quantity = c(10, 
    50, 20, 30, 40, 60, 290), Description = c("Frozen", "Fresh", 
    "Frozen", "Frozen", "Fresh", "Frozen", "Fresh"), `Import/Export` = c("Export", 
    "Import", "Export", "Import", "Export", "Import", "Import"
    )), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"
))
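
The question also asks to view the results "separately". If that means one table per trade direction, a minimal sketch building on the grouped filter above could look like this. The names `both` and `by_direction` are illustrative only, and the data frame is rebuilt here with base R so the block runs on its own.

```r
library(dplyr)

# Sample data copied from the question; check.names = FALSE keeps the
# non-syntactic column name `Import/Export` unchanged
df <- data.frame(
  Country = c("A", "B", "B", "C", "C", "C", "D"),
  Year = c(2001, 2001, 2004, 2003, 2005, 2006, 2007),
  Quantity = c(10, 50, 20, 30, 40, 60, 290),
  Description = c("Frozen", "Fresh", "Frozen", "Frozen", "Fresh", "Frozen", "Fresh"),
  `Import/Export` = c("Export", "Import", "Export", "Import", "Export", "Import", "Import"),
  check.names = FALSE
)

# Keep only countries that have at least one Import row and one Export row
both <- df %>%
  group_by(Country) %>%
  filter(any(`Import/Export` == "Import") & any(`Import/Export` == "Export")) %>%
  ungroup()

# Split the kept rows into a named list with one table per trade direction
by_direction <- split(both, both$`Import/Export`)

by_direction$Import
by_direction$Export
```

Using `split()` keeps both tables in one list, which is handy if more categories are added later; two separate `filter()` calls would work just as well.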

5 Comments

Amazing!! What would be the benefit of presenting the data as shown in the 2nd part?
What 2nd part?
structure(list(Country = c("A", "B", "B", "C", "C", "C", "D"), Year = c(2001, 2001, 2004, 2003, 2005, 2006, 2007), Quantity = c(10, 50, 20, 30, 40, 60, 290), Description = c("Frozen", "Fresh", "Frozen", "Frozen", "Fresh", "Frozen", "Fresh"), `Import/Export` = c("Export", "Import", "Export", "Import", "Export", "Import", "Import" )), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame" ))
This is your data in a reproducible form. If you run that code, it recreates your data frame, which makes the data easier to share. You can generate the same code yourself with dput(data).
You should always share your data this way, it makes it much easier for others to test the answers and manipulate your data.
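
To illustrate the dput() advice in the comments above, here is a small sketch of the round trip. The data frame `small` and the variable names are made up for the example; the point is only that the text printed by dput() is itself R code that rebuilds the object.

```r
# A tiny made-up data frame standing in for your real data
small <- data.frame(Country = c("A", "B"), Quantity = c(10, 50))

# Prints a structure(...) call that you can paste into a question
dput(small)

# Round trip: capture that printed code as text, then evaluate it
code <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = code))

# Check whether the rebuilt object matches the original
identical(small, rebuilt)
```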

Using the data from @GuedesBF's answer, here is another dplyr way to keep the groups that have both 'Import' and 'Export'.

library(dplyr)

df %>%
  group_by(Country) %>%
  filter(all(c('Import', 'Export') %in% `Import/Export`)) %>%
  ungroup()

# Country  Year Quantity Description `Import/Export`
#  <chr>   <dbl>    <dbl> <chr>       <chr>          
#1 B        2001       50 Fresh       Import         
#2 B        2004       20 Frozen      Export         
#3 C        2003       30 Frozen      Import         
#4 C        2005       40 Fresh       Export         
#5 C        2006       60 Frozen      Import         
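
As a rough illustration of why the `all(... %in% ...)` condition works, the sketch below applies it to two plain character vectors standing in for the `Import/Export` values of a single group. The vector names are hypothetical.

```r
# Stand-ins for the `Import/Export` column within one Country group
imports_only <- c("Import", "Import")
both_kinds   <- c("Import", "Export", "Import")

# %in% returns one logical per required label: is it present in the group?
c("Import", "Export") %in% imports_only
# [1]  TRUE FALSE

# all() collapses that to a single TRUE/FALSE for the whole group,
# which filter() then applies to every row of the group
all(c("Import", "Export") %in% imports_only)
# [1] FALSE
all(c("Import", "Export") %in% both_kinds)
# [1] TRUE
```

Because the condition is a single value per group, filter() keeps or drops each group as a whole.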

Using data.table

library(data.table)
setDT(df)[df[, .I[all(c('Import', 'Export') %in% `Import/Export`)], Country]$V1]
   Country Year Quantity Description Import/Export
1:       B 2001       50       Fresh        Import
2:       B 2004       20      Frozen        Export
3:       C 2003       30      Frozen        Import
4:       C 2005       40       Fresh        Export
5:       C 2006       60      Frozen        Import
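
The one-liner above is dense, so here is a sketch that splits it into two steps. It assumes the data.table package is installed and uses a cut-down copy of the question's data; `dt` and `idx` are illustrative names.

```r
library(data.table)

# Cut-down copy of the question's data (only the columns the test needs)
dt <- data.table(
  Country = c("A", "B", "B", "C", "C", "C", "D"),
  `Import/Export` = c("Export", "Import", "Export", "Import", "Export", "Import", "Import")
)

# Step 1: within each Country, .I holds that group's row numbers in dt.
# Indexing .I with a single TRUE keeps all of them; a single FALSE keeps none.
idx <- dt[, .I[all(c("Import", "Export") %in% `Import/Export`)], by = Country]$V1

# Step 2: subset the table by the collected row numbers
dt[idx]
```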

library(tidyverse)

data <- tribble(
  ~Country, ~Year, ~Quantity, ~Description, ~`Import/Export`,
  "A", 2001, 10, "Frozen", "Export",
  "B", 2001, 50, "Fresh", "Import",
  "B", 2004, 20, "Frozen", "Export",
  "C", 2003, 30, "Frozen", "Import",
  "C", 2005, 40, "Fresh", "Export",
  "C", 2006, 60, "Frozen", "Import",
  "D", 2007, 290, "Fresh", "Import"
)
data
#> # A tibble: 7 x 5
#>   Country  Year Quantity Description `Import/Export`
#>   <chr>   <dbl>    <dbl> <chr>       <chr>          
#> 1 A        2001       10 Frozen      Export         
#> 2 B        2001       50 Fresh       Import         
#> 3 B        2004       20 Frozen      Export         
#> 4 C        2003       30 Frozen      Import         
#> 5 C        2005       40 Fresh       Export         
#> 6 C        2006       60 Frozen      Import         
#> 7 D        2007      290 Fresh       Import

selected_countries <-
  data %>%
  mutate(is_there = TRUE) %>%
  distinct(Country, `Import/Export`, is_there) %>%
  pivot_wider(names_from = "Import/Export", values_from = is_there) %>%
  filter(!is.na(Export) & !is.na(Import)) %>%
  pull(Country) %>%
  unique()
selected_countries
#> [1] "B" "C"

data %>% filter(Country %in% selected_countries)
#> # A tibble: 5 x 5
#>   Country  Year Quantity Description `Import/Export`
#>   <chr>   <dbl>    <dbl> <chr>       <chr>          
#> 1 B        2001       50 Fresh       Import         
#> 2 B        2004       20 Frozen      Export         
#> 3 C        2003       30 Frozen      Import         
#> 4 C        2005       40 Fresh       Export         
#> 5 C        2006       60 Frozen      Import

Created on 2021-09-11 by the reprex package (v2.0.1)

2 Comments

Love it! What is the purpose of adding a column?
It's to indicate that the property is "there". pivot_wider() requires a names column and a values column. After the pivot, we have all the info needed in one row per country.
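
To make the comment above concrete, this sketch stops the pipeline right after the pivot so the intermediate table can be inspected. It assumes dplyr and tidyr are installed; `wide` is an illustrative name and the data is a cut-down copy of the question's.

```r
library(dplyr)
library(tidyr)

# Cut-down copy of the question's data
data <- data.frame(
  Country = c("A", "B", "B", "C", "C", "C", "D"),
  `Import/Export` = c("Export", "Import", "Export", "Import", "Export", "Import", "Import"),
  check.names = FALSE
)

# One row per country, with an Export and an Import column:
# TRUE where the country has that kind of row, NA where it does not
wide <- data %>%
  mutate(is_there = TRUE) %>%
  distinct(Country, `Import/Export`, is_there) %>%
  pivot_wider(names_from = "Import/Export", values_from = is_there)

wide

# Countries with no NA in either column have both kinds of rows
wide %>% filter(!is.na(Export) & !is.na(Import)) %>% pull(Country)
```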
