I'm having some trouble cleaning up a compiled dataset. Here's what the data look like:
site unique_id date latitude longitude depth name count
1 L012 L012_1 no data 18.17606 -65.10571 40 dat1 0
2 L012 L012_1 no data 18.17606 -65.10571 40 dat2 5
3 L012 L012_1 no data 18.17606 -65.10571 40 dat3 4
4 B197 B197_1 no data 18.21543 -65.04415 43 dat2 5
5 S56 S56_1 9/16/2016 18.24459 -65.11549 999 dat4 5
6 N9040 N9040_1 7/16/2013 18.26385 -64.90385 25 dat5 1
7 SC SC_1 7/19/2006 18.26267 -64.87237 24 dat6 0
8 SC SC_2 7/19/2006 18.26267 -64.87237 24 dat6 0
I need to remove duplicate rows, identified by the latitude and longitude columns, whenever those duplicates have a count greater than 0. The row that should remain is the one with a 0 in the count column, leaving a single row per lat/long. That is the case with the first three rows of this df: rows 2 and 3 should be dropped and row 1 kept.
At the same time, I need to keep any rows with a unique lat/long (rows 4, 5, and 6), even though their count values are greater than 0. I also need to keep duplicate rows that share a lat/long but all have a 0 in the count column (rows 7 and 8).
Ideally, I want the resulting data frame to look like this:
site unique_id date latitude longitude depth name count
1 L012 L012_1 no data 18.17606 -65.10571 40 dat1 0
4 B197 B197_1 no data 18.21543 -65.04415 43 dat2 5
5 S56 S56_1 9/16/2016 18.24459 -65.11549 999 dat4 5
6 N9040 N9040_1 7/16/2013 18.26385 -64.90385 25 dat5 1
7 SC SC_1 7/19/2006 18.26267 -64.87237 24 dat6 0
8 SC SC_2 7/19/2006 18.26267 -64.87237 24 dat6 0
The original data frame is much larger than this and contains other rows with a 4 in the count column, so I can't simply remove every row whose count is 4.
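The rule described above amounts to: keep a row if its lat/long is unique, or if its count is 0. The post doesn't name a language, so here is a minimal sketch of that rule in pandas, with the example data reconstructed from the printed table (the column names and values are copied from the post; the pandas approach itself is an assumption):

```python
import pandas as pd

# Reconstruction of the example data from the post.
df = pd.DataFrame(
    {
        "site": ["L012", "L012", "L012", "B197", "S56", "N9040", "SC", "SC"],
        "unique_id": ["L012_1", "L012_1", "L012_1", "B197_1",
                      "S56_1", "N9040_1", "SC_1", "SC_2"],
        "date": ["no data", "no data", "no data", "no data",
                 "9/16/2016", "7/16/2013", "7/19/2006", "7/19/2006"],
        "latitude": [18.17606, 18.17606, 18.17606, 18.21543,
                     18.24459, 18.26385, 18.26267, 18.26267],
        "longitude": [-65.10571, -65.10571, -65.10571, -65.04415,
                      -65.11549, -64.90385, -64.87237, -64.87237],
        "depth": [40, 40, 40, 43, 999, 25, 24, 24],
        "name": ["dat1", "dat2", "dat3", "dat2", "dat4", "dat5", "dat6", "dat6"],
        "count": [0, 5, 4, 5, 5, 1, 0, 0],
    },
    index=range(1, 9),
)

# Number of rows sharing each lat/long pair, aligned back to every row.
group_size = df.groupby(["latitude", "longitude"])["count"].transform("size")

# Keep a row if its lat/long is unique, or if its count is 0.
keep = (group_size == 1) | (df["count"] == 0)
result = df[keep]
print(result)
# Rows 2 and 3 are dropped; rows 1, 4, 5, 6, 7, 8 remain.
```

Filtering on a per-group size rather than on the count value alone is what preserves the other 4s elsewhere in the data: a count of 4 is only dropped when its lat/long pair also appears on a 0-count row.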