
How can I remove duplicate rows on the basis of specific columns while keeping the rest of the dataset intact? I tried the approaches from link1 and link2.

I want to detect duplicates on the basis of columns 3 to 6. If two rows have the same values in those columns, the processed dataset should keep only one of them, as shown in the example below.

I used this code, but it gave me only half the result:

Data <- unique(Data[, 3:6])

Let's suppose my dataset looks like this:

 A  B  C  D  E  F  G  H  I  J  K  L  M
 1  2  2  1  5  4  12 A  3  5  6  2  1
 1  2  2  1  5  4  12 A  2 35  36 22 21
 1  22 32 31 5 34  12 A  3  5  6  2  1

What I want in my output is:

 A  B  C  D  E  F  G  H  I  J  K  L  M
 1  2  2  1  5  4  12 A  3  5  6  2  1
 1  22 32 31 5 34  12 A  3  5  6  2  1    

2 Answers


Another option is unique from data.table, which has a by argument. We convert the 'data.frame' to a 'data.table' with setDT(df1), then call unique and specify the columns in by:

 library(data.table)
 unique(setDT(df1), by= names(df1)[3:6])
 #   A  B  C  D E  F  G H I J K L M
 #1: 1  2  2  1 5  4 12 A 3 5 6 2 1
 #2: 1 22 32 31 5 34 12 A 3 5 6 2 1

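unique returns a data.table with the duplicated rows removed, keeping the first row for each combination of values in the by columns. All other columns are retained.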
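For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same call. The object names key_cols and result are my own and only for illustration; df1 follows the naming used in this answer.

```r
library(data.table)

# Sample data copied from the question.
df1 <- data.frame(
  A = c(1, 1, 1),
  B = c(2, 2, 22),
  C = c(2, 2, 32),
  D = c(1, 1, 31),
  E = c(5, 5, 5),
  F = c(4, 4, 34),
  G = c(12, 12, 12),
  H = c("A", "A", "A"),
  I = c(3, 2, 3),
  J = c(5, 35, 5),
  K = c(6, 36, 6),
  L = c(2, 22, 2),
  M = c(1, 21, 1)
)

# Columns 3 to 6 (C, D, E, F) define what counts as a duplicate.
key_cols <- names(df1)[3:6]

# setDT() converts df1 to a data.table in place (no copy is made).
setDT(df1)

# Keep the first row for each distinct combination of the key columns.
result <- unique(df1, by = key_cols)
print(result)
```

Note that setDT() changes df1 itself. If the original data.frame has to stay untouched, as.data.table(df1) should work instead, at the cost of a copy.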


3 Comments

@ayush What is the other question?
I already have the dummy solution, but it isn't working on my original dataset. I tried every possible permutation in my code, but it won't work. Can I mail you the question? Or you can ping me over mail so that I can do that.
@ayush I am using a SIM to connect to the net, so downloading big datasets is costly for me. Can't you provide a dummy example that mimics your original dataset as a new post?

Assuming that your data is stored as a data frame, you could try:

Data <- Data[!duplicated(Data[,3:6]),]
#> Data
#  A  B  C  D E  F  G H I J K L M
#1 1  2  2  1 5  4 12 A 3 5 6 2 1
#3 1 22 32 31 5 34 12 A 3 5 6 2 1

The function duplicated() returns a logical vector with one entry per row. In this case, each entry says whether the combination of values in columns 3 to 6 has already appeared in an earlier row, so the first occurrence of each combination is FALSE and every later repeat is TRUE. The negation ! of this vector is used to select rows from your dataset, which keeps all columns and leaves only the first row for each unique combination of the entries in columns 3 to 6.
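To make the logical vector visible, here is a small sketch on the sample data from the question. The names dup, kept_first and kept_last are hypothetical and only for illustration. It also shows why unique(Data[, 3:6]) looked like "half the result": it returns only the four subsetted columns.

```r
# Sample data copied from the question.
Data <- read.table(header = TRUE, text = "
 A  B  C  D  E  F  G  H  I  J  K  L  M
 1  2  2  1  5  4  12 A  3  5  6  2  1
 1  2  2  1  5  4  12 A  2 35  36 22 21
 1  22 32 31 5 34  12 A  3  5  6  2  1
")

# The attempt from the question drops every column outside 3:6.
ncol(unique(Data[, 3:6]))   # 4

# TRUE marks a row whose columns 3:6 repeat an earlier row.
dup <- duplicated(Data[, 3:6])
dup                         # FALSE  TRUE FALSE

# Keep the first occurrence of each combination (rows 1 and 3).
kept_first <- Data[!dup, ]

# fromLast = TRUE scans from the bottom, so the last occurrence
# is kept instead (rows 2 and 3).
kept_last <- Data[!duplicated(Data[, 3:6], fromLast = TRUE), ]
```

Which occurrence to keep depends on the data. The question's expected output keeps the first one, which is the default behaviour.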

Thanks to @thelatemail for pointing out a mistake in my previous post.
