3

I've found many pages about finding duplicated elements in a list or duplicated rows in a data frame. However, I want to search for duplicated elements throughout the entire data frame. Take this as an example:

df
     coupon1    coupon2    coupon3
1         10         11         12
2         13         16         15
3         16         17         18
4         19         20         21
5         22         23         24
6         25         26         27

You'll notice that df[2,2] and df[3,1] have the same element (16). When I run

duplicated(df)

it returns six FALSE values because no entire row is duplicated, only a single element. How can I check for duplicated values across the entire data frame? I would like to know both that a duplicate exists and what its value is (and likewise if there are multiple duplicates).

2
  • Is it enough for your purposes to map to a vector: duplicated(matrix(df, ncol=1))? Commented Jul 7, 2015 at 18:32
  • The only thing is this matrix can be thousands of lines long, so I'm looking for a solution that deals with it as a data frame. Commented Jul 7, 2015 at 18:40

2 Answers

2

This finds duplicates across the whole data frame, but it scans column by column. So (3,1) is still FALSE because it is the first occurrence of 16 in column-major order; only the later occurrence at (2,2) is flagged.

# Flatten the columns into one vector, flag repeats, then reshape to df's dimensions
m <- matrix(duplicated(unlist(df)), ncol=ncol(df))
m
#      [,1]  [,2]  [,3]
#[1,] FALSE FALSE FALSE
#[2,] FALSE  TRUE FALSE
#[3,] FALSE FALSE FALSE
#[4,] FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE
#[6,] FALSE FALSE FALSE

You can then use it however you'd like, for example:

df[m]
#[1] 16
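
If you also want the first occurrence flagged and the position of every match, one possible extension is sketched below. It rebuilds the question's data frame so that it runs on its own; the names v, all_dupes, dupe_values and dupe_cells are illustrative and not part of the original answer, and the approach assumes all columns share a comparable type.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  coupon1 = c(10, 13, 16, 19, 22, 25),
  coupon2 = c(11, 16, 17, 20, 23, 26),
  coupon3 = c(12, 15, 18, 21, 24, 27)
)

# Column-major vector holding every cell of the data frame
v <- unlist(df)

# TRUE for every cell whose value appears more than once anywhere,
# including the first occurrence (scan forwards and backwards)
all_dupes <- matrix(duplicated(v) | duplicated(v, fromLast = TRUE),
                    ncol = ncol(df))

# Distinct values that are duplicated somewhere in the data frame
dupe_values <- unique(unname(v[duplicated(v)]))

# Row and column index of every occurrence of a duplicated value
dupe_cells <- which(all_dupes, arr.ind = TRUE)

print(dupe_values)
print(dupe_cells)
```

For the example data this reports the value 16 and the two cells (3,1) and (2,2). Everything is still computed on a single flattened vector, so it should scale to thousands of rows.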

1
# Index of the duplicate within the stacked (column-major) values
which(duplicated(stack(yourdf)[,1]))
#[1] 8
# The duplicated value itself
stack(yourdf)[,1][which(duplicated(stack(yourdf)[,1]))]
#[1] 16
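
The stacked index (8) can be mapped back to a row and a column name, because stack() also returns an ind column recording which column each value came from. The following is a minimal sketch under that assumption; it uses df in place of yourdf, and the names s, idx and hits are hypothetical.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  coupon1 = c(10, 13, 16, 19, 22, 25),
  coupon2 = c(11, 16, 17, 20, 23, 26),
  coupon3 = c(12, 15, 18, 21, 24, 27)
)

# stack() returns two columns: values (the cells) and ind (source column)
s <- stack(df)

# Positions of repeated values within the stacked vector
idx <- which(duplicated(s$values))

# Translate each stacked position into value, column name and row number
hits <- data.frame(
  value  = s$values[idx],
  column = as.character(s$ind[idx]),
  row    = (idx - 1) %% nrow(df) + 1
)

print(hits)
```

As with the matrix approach, only the second and later occurrences are listed, so the first 16 at row 3 of coupon1 does not appear in hits.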
