Get list of unique values across multiple columns using data.table

Question

I want to get a list of the unique numeric ID values across multiple numeric ID columns. The goal is to help summarize how changes flow through a database as users modify multiple tables; in my example, changes go from table A to table B and then back to table A.

I know I could do this by appending all the columns into a single vector, but I would like to use data.table internals to improve efficiency if possible.

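library(data.table)

set.seed(1)
# Table A: who created and who last updated each row
# (the 2 creator IDs are recycled across the 4 rows, as the printed output shows)
dt <- data.table(tbl_A_create_uid = sample(1:2),
                 tbl_A_update_uid = sample(1:4))
# Table B rows are created by whoever last updated the matching table A row
dt[, tbl_B_create_uid := tbl_A_update_uid]
dt[, tbl_B_update_uid := sample(1:4)]
# Changes flow back to table A: table B's updaters create new table A rows
dt_after_update <- rbind(dt,
                         data.table(tbl_A_create_uid = dt[, tbl_B_update_uid]),
                         use.names = TRUE,
                         fill = TRUE)
dt_after_update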
# > dt_after_update
#    tbl_A_create_uid tbl_A_update_uid tbl_B_create_uid tbl_B_update_uid
# 1:                1                3                3                4
# 2:                2                4                4                2
# 3:                1                1                1                3
# 4:                2                2                2                1
# 5:                4               NA               NA               NA
# 6:                2               NA               NA               NA
# 7:                3               NA               NA               NA
# 8:                1               NA               NA               NA

Wanted: a vector or data.table of the unique values, e.g., c(1, 2, 3, 4).
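For comparison, the "append every column" baseline mentioned in the question needs only base R: a data.table is a list of columns, so `unlist()` concatenates them into one vector. This is a minimal sketch; the table is rebuilt by hand from the printed `dt_after_update` output (assuming integer IDs) so the snippet runs on its own, and the name `all_ids` is purely illustrative.

```r
library(data.table)

# Assumption: rebuilt from the printed dt_after_update output, integer IDs
dt_after_update <- data.table(
  tbl_A_create_uid = c(1L, 2L, 1L, 2L, 4L, 2L, 3L, 1L),
  tbl_A_update_uid = c(3L, 4L, 1L, 2L, NA, NA, NA, NA),
  tbl_B_create_uid = c(3L, 4L, 1L, 2L, NA, NA, NA, NA),
  tbl_B_update_uid = c(4L, 2L, 3L, 1L, NA, NA, NA, NA)
)

# A data.table is a list of columns, so unlist() appends them into one vector
all_ids <- unlist(dt_after_update, use.names = FALSE)

# unique() keeps first-seen order; sort() also drops NA by default (na.last = NA)
sort(unique(all_ids))
#> [1] 1 2 3 4
```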

PavoDive · Accepted Answer · 2019-06-14 16:50:02Z

Would this work?

melt(dt_after_update)[, unique(value)] # melt() warns that it guessed the id/measure columns; harmless here

If you don't want the NAs:

melt(dt_after_update)[!is.na(value), unique(value)] # same harmless warning as above
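If the warning is a concern, naming the columns to stack explicitly should avoid it, since melt() then has nothing to guess. A sketch, assuming all ID columns share a compatible type (integer here, rebuilt from the printed output); `na.rm = TRUE` drops the NA cells during the melt instead of filtering them afterwards, and `long` is just an illustrative name.

```r
library(data.table)

# Assumption: rebuilt from the printed dt_after_update output, integer IDs
dt_after_update <- data.table(
  tbl_A_create_uid = c(1L, 2L, 1L, 2L, 4L, 2L, 3L, 1L),
  tbl_A_update_uid = c(3L, 4L, 1L, 2L, NA, NA, NA, NA),
  tbl_B_create_uid = c(3L, 4L, 1L, 2L, NA, NA, NA, NA),
  tbl_B_update_uid = c(4L, 2L, 3L, 1L, NA, NA, NA, NA)
)

# Every column is a measure variable, so no id/measure guessing is needed;
# this assumes the columns can be stacked into one value column (same type)
long <- melt(dt_after_update,
             measure.vars = names(dt_after_update),
             na.rm = TRUE)  # drop NA cells while melting

long[, sort(unique(value))]
#> [1] 1 2 3 4
```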


3 Comments

datocrats.org Over a year ago

Thanks, yes. I didn't realize you could melt without giving column names that way. Great solution.

chinsoon12 Over a year ago

another option without melting: dt_after_update[, unique(unlist(lapply(.SD, unique)))]
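The one-liner above keeps `NA` in its result. A sketch of the same idea that also drops missing values: inside `[`, `.SD` holds every column, `lapply(.SD, unique)` deduplicates each column first so `unlist()` has less to concatenate, and `sort()` removes `NA` by default. The table is rebuilt from the question's printed output (assuming integer IDs), and `ids` is an illustrative name.

```r
library(data.table)

# Assumption: rebuilt from the printed dt_after_update output, integer IDs
dt_after_update <- data.table(
  tbl_A_create_uid = c(1L, 2L, 1L, 2L, 4L, 2L, 3L, 1L),
  tbl_A_update_uid = c(3L, 4L, 1L, 2L, NA, NA, NA, NA),
  tbl_B_create_uid = c(3L, 4L, 1L, 2L, NA, NA, NA, NA),
  tbl_B_update_uid = c(4L, 2L, 3L, 1L, NA, NA, NA, NA)
)

# Deduplicate per column, concatenate without names, deduplicate again,
# then sort(), which also drops NA (na.last = NA by default)
ids <- dt_after_update[, sort(unique(unlist(lapply(.SD, unique), use.names = FALSE)))]
ids
#> [1] 1 2 3 4
```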

jeromeResearch Over a year ago

Great idea to use melt(), but without explicit column names it only works as expected if every column is numeric, integer, or logical. When both id.vars and measure.vars are NULL, melt() warns that "all non-numeric/integer/logical type columns are considered id.vars", so such columns are left out of value. The suggestion by @chinsoon12 works regardless of data type.
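To illustrate the type caveat, a sketch with a hypothetical table `mixed` holding one integer ID column and one character ID column (both columns and their values are made up for illustration). `unlist()` coerces the columns to their common type, character here, so the `lapply()` approach still returns every distinct ID, just as strings. By contrast, per the warning text quoted above, a bare melt() would treat `chr_uid` as an id column, so its values would not appear in `value`.

```r
library(data.table)

# Hypothetical table: one integer ID column and one character ID column
mixed <- data.table(num_uid = c(1L, 2L, 2L),
                    chr_uid = c("a", "b", NA))

# unlist() coerces everything to character, the common type of both columns;
# sort() drops the NA by default
ids <- mixed[, sort(unique(unlist(lapply(.SD, unique), use.names = FALSE)))]
ids
```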
