
I have a data frame with columns

    shipment_id     created_at    picked_at   packed_at   shipped_at
    CSDJKH231BN     2019-02-03    2019-02-03    
    CSDJKH231BN     2019-02-03    2019-02-03  2019-02-04  2019-02-05
    CSDJKH2KFJ3     2019-02-01    2019-02-04  2019-02-07  

The database is uploaded to rServer via Google Drive, and it is constantly being updated.

    library(RCurl)  # provides getURL()

    u1 <- "https://docs.google.com/spreadsheets/d/e/<link>"
    tc1 <- getURL(u1, ssl.verifypeer = FALSE)
    x <- read.csv(textConnection(tc1))

Suppose in the first update shipment_id CSDJKH231BN had only reached picked_at, and in the second update from Google Drive we get CSDJKH231BN up to shipped_at. How do I keep only the rows that go up to shipped_at, while also keeping shipment_ids like CSDJKH2KFJ3 that are still being processed and have not shipped yet?

Basically I just want to delete the duplicate entries, but this code is not working for me.

    df <- df[!duplicated(df), ]
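For what it's worth, a small sketch (reconstructing the sample rows above as a hypothetical data frame) of why `duplicated(df)` alone doesn't help: the two CSDJKH231BN rows differ in packed_at and shipped_at, so neither row is an exact duplicate of the other.

```r
# Hypothetical reconstruction of the sample data shown above
df <- data.frame(
  shipment_id = c("CSDJKH231BN", "CSDJKH231BN", "CSDJKH2KFJ3"),
  created_at  = c("2019-02-03", "2019-02-03", "2019-02-01"),
  picked_at   = c("2019-02-03", "2019-02-03", "2019-02-04"),
  packed_at   = c(NA, "2019-02-04", "2019-02-07"),
  shipped_at  = c(NA, "2019-02-05", NA),
  stringsAsFactors = FALSE
)

# The two CSDJKH231BN rows differ in packed_at/shipped_at,
# so duplicated() finds no fully identical rows:
sum(duplicated(df))  # 0, so nothing is removed
```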

Any help would be appreciated.

1 Answer
I think you just need to specify that you're looking for duplicates in shipment_id. However, `duplicated` keeps only the first occurrence of each id, which in your case would be the row with nothing in the shipped_at column. So you should first sort by the shipped_at and packed_at columns (in decreasing order, so that the most complete rows come first and NA values fall to the bottom). Does this work?

    # Sort so rows with later shipped_at/packed_at dates come first
    # (NAs are placed last by default)
    df <- df[order(df[,'shipped_at'], df[,'packed_at'], decreasing=TRUE), ]
    # Keep the first (most complete) row for each shipment_id
    df <- df[!duplicated(df$shipment_id), ]
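On a hypothetical reconstruction of the sample data from the question, this reduces the frame to one row per shipment_id, keeping the fully shipped CSDJKH231BN record as well as the still-unshipped CSDJKH2KFJ3 record:

```r
# Hypothetical sample data matching the question
df <- data.frame(
  shipment_id = c("CSDJKH231BN", "CSDJKH231BN", "CSDJKH2KFJ3"),
  created_at  = c("2019-02-03", "2019-02-03", "2019-02-01"),
  picked_at   = c("2019-02-03", "2019-02-03", "2019-02-04"),
  packed_at   = c(NA, "2019-02-04", "2019-02-07"),
  shipped_at  = c(NA, "2019-02-05", NA),
  stringsAsFactors = FALSE
)

# Sort so the row with a shipped_at date comes first for each id,
# then keep the first row per shipment_id
df <- df[order(df[,'shipped_at'], df[,'packed_at'], decreasing = TRUE), ]
df <- df[!duplicated(df$shipment_id), ]

# Two rows remain: the shipped CSDJKH231BN record and the
# in-progress CSDJKH2KFJ3 record
df
```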
