I get CSVs with hundreds of different columns and would like to output a new file with the duplicate values removed from each column. Everything that I have seen and tried deduplicates on one specific column. I need every column reduced to its own unique values, independently of the other columns.

For example, my data:

df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = c(1, 0, 1, 0, 0, 1), C = c("Mr.","Mr.","Mrs.","Miss","Mr.","Mrs."))
df
    A B    C
  1 1 1  Mr.
  2 2 0  Mr.
  3 3 1 Mrs.
  4 4 0 Miss
  5 5 0  Mr.
  6 6 1 Mrs.

I would like:

    A B    C
  1 1 1  Mr.
  2 2 0 Mrs.
  3 3   Miss
  4 4   
  5 5    
  6 6   

Then I can:

write.csv(df, file = "df_No_Dupes.csv", na = "")

So I can use it as a reference for my next task.

3 Answers

1

read.csv and write.csv work best with tabular data. Your desired output is not a good example of this, because the columns end up with different numbers of values, so not every row is complete.

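You can easily get all the unique values for your columns with

vals <- lapply(df, unique)

(lapply always returns a list; sapply would collapse the result to a matrix whenever every column happens to have the same number of unique values.) You'd then be better off saving this object with save() and restoring it with load(), which preserves the list as an R object.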
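As a minimal sketch of that round trip, the snippet below stores the per-column unique values and reads them back. It uses saveRDS() and readRDS() rather than save() and load(), which is a swapped-in choice so the restored object can be given any name; the temporary file path is purely illustrative.

```r
# Example data from the question
df <- data.frame(A = c(1, 2, 3, 4, 5, 6),
                 B = c(1, 0, 1, 0, 0, 1),
                 C = c("Mr.", "Mr.", "Mrs.", "Miss", "Mr.", "Mrs."))

# One element per column, each holding that column's unique values
vals <- lapply(df, unique)

# Illustrative location; replace with a real path in practice
rds_path <- tempfile(fileext = ".rds")

# Write the list to disk and read it back as an R object
saveRDS(vals, rds_path)
restored <- readRDS(rds_path)

# Number of unique values found in each column
lengths(restored)
```

Because the result is a list, columns with different numbers of unique values need no padding.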


1

Code snippet that works with any number of columns, removes the duplicate values from each column, and preserves the column names:

require(rowr)

df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = c(1, 0, 1, 0, 0, 1), C = c("Mr.","Mr.","Mrs.","Miss","Mr.","Mrs."))

# number of columns in the original data frame
n <- ncol(df)

# each pass moves the current first column to the end, deduplicated
for (i in seq_len(n)) {

  # append the unique values of column 1, filling the shorter side with NA
  df <- cbind.fill(df, unique(df[, 1]), fill = NA)
  # give the new (last) column the name of the column it replaces
  colnames(df)[n + 1] <- colnames(df)[1]
  # drop the old first column, so the next column moves into position 1
  df[, 1] <- NULL
}
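If installing rowr is not an option, the same result can be sketched in base R alone. This is an alternative to the loop above, not the answerer's method: it takes the unique values of each column, pads every vector with NA to the length of the longest one, and writes the result with blanks in place of NA. The names uniq, max_len, df_unique, and out_file are illustrative.

```r
# Example data from the question
df <- data.frame(A = c(1, 2, 3, 4, 5, 6),
                 B = c(1, 0, 1, 0, 0, 1),
                 C = c("Mr.", "Mr.", "Mrs.", "Miss", "Mr.", "Mrs."))

# Unique values of every column, kept as a named list
uniq <- lapply(df, unique)

# Length of the longest deduplicated column
max_len <- max(lengths(uniq))

# Indexing past the end of a vector yields NA, which pads the short columns
padded <- lapply(uniq, function(x) x[seq_len(max_len)])

# Equal-length vectors can be combined back into a data frame
df_unique <- as.data.frame(padded, stringsAsFactors = FALSE)

# Illustrative output path; na = "" writes the padding as empty cells
out_file <- tempfile(fileext = ".csv")
write.csv(df_unique, out_file, na = "", row.names = FALSE)
```

Since the list returned by lapply() keeps the column names, no renaming step is needed, and the code does not depend on how many columns the input has.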

0
df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = c(1, 0, 1, 0, 0, 1), C = c("Mr.","Mr.","Mrs.","Miss","Mr.","Mrs."))


for(i in 1:ncol(df)){
  assign(paste("df_",i,sep=""), unique(df[,i]))
}

require(rowr)
df <- cbind.fill(df_1,df_2,df_3, fill = NA)
  V1 V1   V1
1  1  1  Mr.
2  2  0 Mrs.
3  3 NA Miss
4  4 NA <NA>
5  5 NA <NA>
6  6 NA <NA>

or you could do

require(rowr)
df <- cbind.fill(df_1,df_2,df_3, fill = "")
df
  V1 V1   V1
1  1  1  Mr.
2  2  0 Mrs.
3  3    Miss
4  4        
5  5        
6  6

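If you want to avoid typing the name of each intermediate object, you can get the names with ls(pattern="df_") and fetch the objects named in that vector, or use another loop. Be aware that ls() returns names in alphabetical order, so with ten or more columns df_10 sorts before df_2.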
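A minimal sketch of collecting the intermediate objects by name follows. Instead of relying on ls(), it rebuilds the names in column order with paste0() and fetches them in one call with mget(); obj_names and pieces are illustrative names, and the code assumes it runs in the same environment where the objects were assigned.

```r
# Example data from the question
df <- data.frame(A = c(1, 2, 3, 4, 5, 6),
                 B = c(1, 0, 1, 0, 0, 1),
                 C = c("Mr.", "Mr.", "Mrs.", "Miss", "Mr.", "Mrs."))

# Create df_1, df_2, ... holding the unique values of each column
for (i in seq_len(ncol(df))) {
  assign(paste0("df_", i), unique(df[[i]]))
}

# Build the names in column order, so df_2 comes before df_10
obj_names <- paste0("df_", seq_len(ncol(df)))

# Fetch all objects at once as a list, then restore the original names
pieces <- mget(obj_names)
names(pieces) <- colnames(df)

# Number of unique values per original column
lengths(pieces)
```

Generating the names from the column count keeps the pieces aligned with colnames(df) no matter how many columns the file has.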

If you want to change the column names back to their original values you can use:

colnames(output_df) <- colnames(input_df)

Then you can save the results however you like, for example with

saveRDS()

save()

or write it to a file.

Putting it all together:

df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = c(1, 0, 1, 0, 0, 1), C = c("Mr.","Mr.","Mrs.","Miss","Mr.","Mrs."))


for(i in 1:ncol(df)){
  assign(paste("df_",i,sep=""), unique(df[,i]))
}

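require(rowr)
# build the names in column order; ls(pattern="df_") sorts alphabetically,
# which puts df_10 before df_2 once there are ten or more columns
files     <- paste0("df_", seq_len(ncol(df)))

df_output <- data.frame()
for(i in files){
  df_output <- cbind.fill(df_output, get(i), fill = "")
}

# drop the extra first column left over from initializing with data.frame()
df_output <- df_output[, -1, drop = FALSE]
colnames(df_output) <- colnames(df)
write.csv(df_output, "df_out.csv", row.names = FALSE)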

verify_it_worked <- read.csv("df_out.csv")
verify_it_worked
  A  B    C
1 1  1  Mr.
2 2  0 Mrs.
3 3    Miss
4 4      
5 5      
6 6 

4 Comments

This works for the current data set; however, I sometimes have 100 or more columns, so typing out df_1, df_2, ... would not work. After the for loop has written each column's unique values to its own object, can I run another loop to grab every object whose name starts with df_ and combine them into one file? Also, if the headers could be the original names, that would be perfect.
@Trigs Yes, sure. You can also use ls() to list the objects in your environment whose names match a pattern, e.g. ls(pattern="df_"). If you want to change the column names, it's just colnames(output_df) <- colnames(input_df).
It looks like it is adding another column of NA, so everything is shifted right and column A is all NA. Also, this is a huge help, so thank you!
@Trigs Oh, I see what you're saying. It's a weird result of how I created the object. I will add one line to fix it right now. You're welcome for the help. Please feel free to upvote if it was helpful.
