
Is it possible to remove rows of data by matching specific character strings or factor levels in two or more columns? For small datasets this is easy, because I can scroll through the data frame and remove the row I want, but how can I do this for larger datasets without endlessly scrolling to see which rows match my criteria?

Fake data:

df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  # num is random (no set.seed), so its values differ between runs
                  num = sample(x = 0:2, size = 20, replace = TRUE))

For example, how do I remove only site "1" in March 2019, in one line of code and without looking up which row it is in?

3 Answers

You can use subset():

df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1), 
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size  = 20, replace = TRUE))

subset(df1, !(site == "1" & year == 2019 & month == "March"))
#>    year   month site common_name num
#> 2  2019 October    1       shark   0
#> 3  2019   March    2        Tuna   1
#> 4  2019 October    2       shark   0
#> 5  2019   March    3        Tuna   0
#> 6  2019 October    3       shark   0
#> 7  2019   March    4        Tuna   2
#> 8  2019 October    4       shark   2
#> 9  2019   March    5        Tuna   0
#> 10 2019 October    5       shark   2
#> 11 2020   March    1        Tuna   1
#> 12 2020 October    1       shark   1
#> 13 2020   March    2        Tuna   2
#> 14 2020 October    2       shark   2
#> 15 2020   March    3        Tuna   1
#> 16 2020 October    3       shark   0
#> 17 2020   March    4        Tuna   1
#> 18 2020 October    4       shark   0
#> 19 2020   March    5        Tuna   0
#> 20 2020 October    5       shark   2

Created on 2022-05-31 by the reprex package (v2.0.1)
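Since the question is about not scrolling to find the matching rows, a small sketch (assuming the df1 from the question) may help: store the condition as a logical vector first, use it to count and locate the matches, and only then drop them with subset(). The name to_drop is just an illustrative variable.

```r
df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))

# Logical vector: TRUE for rows that meet all three criteria at once
to_drop <- with(df1, site == "1" & year == 2019 & month == "March")

sum(to_drop)    # how many rows match: 1
which(to_drop)  # their row numbers, found without scrolling: 1
df1[to_drop, ]  # inspect the matching rows before removing them

# Keep every row that does NOT match
df_kept <- subset(df1, !to_drop)
```

Checking sum(to_drop) first is a cheap guard against a typo in a level name (for example "march" instead of "March"), which would silently match zero rows.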


We could also use paste():

subset(df1, paste(year, month, site) != '2019 March 1')

Output:

   year   month site common_name num
2  2019 October    1       shark   1
3  2019   March    2        Tuna   1
4  2019 October    2       shark   2
5  2019   March    3        Tuna   0
6  2019 October    3       shark   0
7  2019   March    4        Tuna   2
8  2019 October    4       shark   1
9  2019   March    5        Tuna   1
10 2019 October    5       shark   1
11 2020   March    1        Tuna   1
12 2020 October    1       shark   1
13 2020   March    2        Tuna   1
14 2020 October    2       shark   2
15 2020   March    3        Tuna   1
16 2020 October    3       shark   0
17 2020   March    4        Tuna   1
18 2020 October    4       shark   1
19 2020   March    5        Tuna   1
20 2020 October    5       shark   2
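The paste() key extends naturally to removing several combinations at once by testing each row's key with %in%. A minimal sketch, assuming the df1 from the question; drop_keys and the second combination ("2020 October 3") are hypothetical examples.

```r
df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))

# Hypothetical list of "year month site" combinations to remove
drop_keys <- c("2019 March 1", "2020 October 3")

# paste() builds one key per row; %in% flags rows whose key is in drop_keys
df2 <- subset(df1, !(paste(year, month, site) %in% drop_keys))
nrow(df2)  # 18
```

paste() joins with a single space by default, so a key is only ambiguous if the values themselves contain spaces; passing a different sep (e.g. sep = "|") to paste() avoids that, as long as drop_keys is built with the same separator.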

2 Comments

Why does filter not work here? dplyr::filter(df1, site != "1" | month != "March" | year != 2019)
@TarJae It should work: dplyr::filter(df1, paste(year, month, site) != '2019 March 1')
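The condition in the first comment is De Morgan's law applied to the subset() answer: !(A & B & C) is the same as !A | !B | !C, so it should select the same rows. A base R sketch (no dplyr needed, assuming the df1 from the question) that checks the two forms agree:

```r
df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))

# Form 1: negate the conjunction, as in the subset() answer
keep_a <- with(df1, !(site == "1" & month == "March" & year == 2019))

# Form 2: De Morgan's law turns it into an OR of negated comparisons,
# which is the condition used in the dplyr::filter() comment
keep_b <- with(df1, site != "1" | month != "March" | year != 2019)

identical(keep_a, keep_b)  # TRUE
subset(df1, keep_b)
```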

A one-line alternative to subset() or dplyr::filter() using base R bracket notation:

df2 <- df1[!(df1$site=="1" & df1$year==2019 & df1$month=="March"),]
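For larger datasets where the columns to match change from case to case, the bracket approach can be wrapped in a small function that takes the criteria as a named list. This is a sketch only: drop_matching is a hypothetical helper, not a function from base R or any package, and it assumes the df1 from the question.

```r
df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))

# Hypothetical helper: drop rows where every column named in `criteria`
# equals the corresponding value
drop_matching <- function(df, criteria) {
  # One logical vector per criterion, e.g. df[["site"]] == "1"
  tests <- Map(function(col, val) df[[col]] == val, names(criteria), criteria)
  # Combine with AND: a row matches only if it meets every criterion
  hit <- Reduce(`&`, tests)
  # %in% TRUE treats NA comparisons as "no match", so rows with missing
  # values are kept instead of turning into all-NA rows in the bracket subset
  df[!(hit %in% TRUE), , drop = FALSE]
}

df2 <- drop_matching(df1, list(site = "1", year = 2019, month = "March"))
nrow(df2)  # 19
```

The NA handling is a design choice: plain df1[!hit, ] would return a row of NAs wherever a compared column is NA, whereas subset() and dplyr::filter() simply drop such rows.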
