
I'm having some trouble cleaning up a compiled dataset. Here's what the data look like:

   site unique_id      date latitude longitude depth name    count
1  L012    L012_1   no data 18.17606 -65.10571    40 dat1        0
2  L012    L012_1   no data 18.17606 -65.10571    40 dat2        5
3  L012    L012_1   no data 18.17606 -65.10571    40 dat3        4
4  B197    B197_1   no data 18.21543 -65.04415    43 dat2        5
5   S56     S56_1 9/16/2016 18.24459 -65.11549   999 dat4        5
6 N9040   N9040_1 7/16/2013 18.26385 -64.90385    25 dat5        1
7    SC      SC_1 7/19/2006 18.26267 -64.87237    24 dat6        0
8    SC      SC_2 7/19/2006 18.26267 -64.87237    24 dat6        0

I need to remove duplicate rows, where duplicates are defined by the latitude and longitude columns, but only those duplicates whose count is greater than 0. Within each set of rows sharing a lat/long, the rows that remain should be the ones with a 0 in the count column. That is the case for the first three rows in this df: rows 2 and 3 should be dropped and row 1 kept.

At the same time, I need to keep any rows whose lat/long is unique (rows 4, 5, 6), even though their count is greater than 0. I also need to keep duplicate rows that share a lat/long but have a 0 in the count column (rows 7 and 8).
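Stated as code, the rule above might look like the following base R sketch. It is only an illustration under some assumptions: the toy `df` copies just the columns the rule needs from the rows shown above, `group_size` and `keep` are names made up for this example, and coordinates are treated as duplicates only when their printed values match exactly.

```r
# Toy copy of the example rows (only the columns the rule needs).
df <- data.frame(
  unique_id = c("L012_1", "L012_1", "L012_1", "B197_1",
                "S56_1", "N9040_1", "SC_1", "SC_2"),
  latitude  = c(18.17606, 18.17606, 18.17606, 18.21543,
                18.24459, 18.26385, 18.26267, 18.26267),
  longitude = c(-65.10571, -65.10571, -65.10571, -65.04415,
                -65.11549, -64.90385, -64.87237, -64.87237),
  count     = c(0, 5, 4, 5, 5, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Number of rows sharing each latitude/longitude pair.
group_size <- ave(seq_len(nrow(df)), df$latitude, df$longitude, FUN = length)

# Keep a row if its coordinates are unique, or if its count is 0.
keep <- group_size == 1 | df$count == 0
result <- df[keep, ]
```

Here `result` retains the original row names, so it can be compared directly against the row numbers in the desired output.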

Ideally, I want the resulting data frame to look like this:

   site unique_id      date latitude longitude depth name    count
1  L012    L012_1   no data 18.17606 -65.10571    40 dat1        0
4  B197    B197_1   no data 18.21543 -65.04415    43 dat2        5
5   S56     S56_1 9/16/2016 18.24459 -65.11549   999 dat4        5
6 N9040   N9040_1 7/16/2013 18.26385 -64.90385    25 dat5        1
7    SC      SC_1 7/19/2006 18.26267 -64.87237    24 dat6        0
8    SC      SC_2 7/19/2006 18.26267 -64.87237    24 dat6        0

The original data frame is much larger than this and contains other rows with a 4 in the count column that need to stay, so I can't simply remove every row where count is 4.

1 Answer

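What about this? Group by latitude and longitude, then keep a row if it is the only one in its group or if its count is 0: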

library(dplyr)
df %>% group_by(latitude, longitude) %>% filter(n() == 1 | count == 0)
Source: local data frame [6 x 8]
Groups: latitude, longitude [5]

   site unique_id      date latitude longitude depth  name count
  <chr>     <chr>     <chr>    <dbl>     <dbl> <int> <chr> <int>
1  L012    L012_1    nodata 18.17606 -65.10571    40  dat1     0
2  B197    B197_1    nodata 18.21543 -65.04415    43  dat2     5
3   S56     S56_1 9/16/2016 18.24459 -65.11549   999  dat4     5
4 N9040   N9040_1 7/16/2013 18.26385 -64.90385    25  dat5     1
5    SC      SC_1 7/19/2006 18.26267 -64.87237    24  dat6     0
6    SC      SC_2 7/19/2006 18.26267 -64.87237    24  dat6     0
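One case the example data does not cover is a set of duplicated coordinates where every count is greater than 0. The sketch below, which assumes dplyr is installed, adds a hypothetical site `X1` (not in the original data) to show what the same filter does there: both `X1` rows are dropped, which may or may not be what is wanted. It also adds `ungroup()` so later steps do not inherit the grouping.

```r
library(dplyr)

# Subset of the example rows plus a hypothetical site X1 whose two rows
# share coordinates and both have count > 0.
df <- data.frame(
  unique_id = c("L012_1", "L012_1", "L012_1", "B197_1",
                "SC_1", "SC_2", "X1_1", "X1_2"),
  latitude  = c(18.17606, 18.17606, 18.17606, 18.21543,
                18.26267, 18.26267, 18.30000, 18.30000),
  longitude = c(-65.10571, -65.10571, -65.10571, -65.04415,
                -64.87237, -64.87237, -65.00000, -65.00000),
  count     = c(0, 5, 4, 5, 0, 0, 3, 2),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(latitude, longitude) %>%
  # Keep rows with unique coordinates, or rows whose count is 0.
  filter(n() == 1 | count == 0) %>%
  # Drop the grouping so the result behaves like an ordinary data frame.
  ungroup()
```

If such all-nonzero groups should instead keep one row, the filter condition would need an extra clause; the question does not say which behaviour is intended.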