Add a column to a dataframe using (extracting unique values) from existing columns

Question

I am new to R, and was not able to search answers for the specific problem I have encountered.

If my dataframe looks like below:

d <- data.frame(Name = c("Jon", "Jon", "Jon", "Kel", "Kel", "Kel", "Don", "Don", "Don"),
             No1 = c(1,2,3,1,1,1,3,3,3),
             No2 = c(1,1,1,2,2,2,3,3,3))

Name No1 No2
Jon   1   1
Jon   2   1
Jon   3   1
Kel   1   2
Kel   1   2
Kel   1   2
Don   3   3
Don   3   3
Don   3   3 
...

How would I add be able to add new columns to the dataframe, where the columns would indicate the unique values in column No1 and No2: which would be (1,2,3), (1,2), (3) for John, Kelly, Don, respectively

So, if the new columns are named ID#, The desired results should be

d2 <- data.frame(Name = c("Jon", "Jon", "Jon", "Kel", "Kel", "Kel", "Don", "Don", "Don"),
          No1 = c(1,2,3,1,1,1,3,3,3),
          No2 = c(1,1,1,2,2,2,3,3,3),
          ID1 = c(1,1,1,1,1,1,3,3,3),
          ID2 = c(2,2,2,2,2,2,NA,NA,NA),
          ID3 = c(3,3,3,NA,NA,NA,NA,NA,NA))

Name No1 No2 ID1 ID2 ID3
Jon   1   1   1   2   3 
Jon   2   1   1   2   3
Jon   3   1   1   2   3 
Kel   1   2   1   2   NA
Kel   1   2   1   2   NA
Kel   1   2   1   2   NA
Don   3   3   3   NA  NA
Don   3   3   3   NA  NA
Don   3   3   3   NA  NA

alistaire · Accepted Answer · 2016-07-18 04:35:59Z

A tidyverse approach:

library(dplyr)
library(tidyr)

      # evaluate separately for each name
d %>% group_by(Name) %>% 
    # add a column of the unique values pasted together into a string
    mutate(ID = paste(unique(c(No1, No2)), collapse = ' ')) %>% 
    # separate the string into individual columns, filling with NA and converting to numbers
    separate(ID, into = paste0('ID', 1:3), fill = 'right', convert = TRUE)

## Source: local data frame [9 x 6]
## Groups: Name [3]
## 
##     Name   No1   No2   ID1   ID2   ID3
## * <fctr> <dbl> <dbl> <int> <int> <int>
## 1    Jon     1     1     1     2     3
## 2    Jon     2     1     1     2     3
## 3    Jon     3     1     1     2     3
## 4    Kel     1     2     1     2    NA
## 5    Kel     1     2     1     2    NA
## 6    Kel     1     2     1     2    NA
## 7    Don     3     3     3    NA    NA
## 8    Don     3     3     3    NA    NA
## 9    Don     3     3     3    NA    NA

Here's a nice base version with a basic split-apply-combine approach:

# store distinct values in No1 and No2
cols <- unique(unlist(d[,-1]))
                           # split No1 and No2 by Name,
ids <- data.frame(t(sapply(split(d[,-1], d$Name), 
                           # find unique values for each split,
                           function(x){y <- unique(unlist(x))
                                       # pad with NAs,
                                       c(y, rep(NA, length(cols) - length(y)))
                           # and return a data.frame
                           }))) 
# fix column names
names(ids) <- paste0('ID', cols)
# turn rownames into column
ids$Name <- rownames(ids)
# join two data.frames on Name columns
merge(d, ids, sort = FALSE)

##   Name No1 No2 ID1 ID2 ID3
## 1  Jon   1   1   1   2   3
## 2  Jon   2   1   1   2   3
## 3  Jon   3   1   1   2   3
## 4  Kel   1   2   1   2  NA
## 5  Kel   1   2   1   2  NA
## 6  Kel   1   2   1   2  NA
## 7  Don   3   3   3  NA  NA
## 8  Don   3   3   3  NA  NA
## 9  Don   3   3   3  NA  NA

And just for kicks, here's a creative alternate base version that leverages table instead of splitting/grouping:

# copy d so as not to distort original with factor columns
d_f <- d
# make No* columns factors to ensure similar table structure
d_f[, -1] <- lapply(d[,-1], factor, levels = unique(unlist(d[, -1])))
# make tables of cols, sum to aggregate occurrences, and set as boolean mask for > 0
tab <- Reduce(`+`, lapply(d_f[, -1], table, d_f$Name)) > 0
# replace all TRUE values with values they tabulated
tab <- tab * matrix(as.integer(rownames(tab)), nrow = nrow(tab), ncol = ncol(tab))
# replace 0s with NAs
tab[tab == 0] <- NA
# store column names
cols <- paste0('ID', rownames(tab))
# sort each row, keeping NAs
tab <- data.frame(t(apply(tab, 2, sort, na.last = T)))
# apply stored column names
names(tab) <- cols
# turn rownames into column
tab$Name <- rownames(tab)
# join two data.frames on Name columns
merge(d, tab, sort = FALSE)

Results are identical.

Maiasaura · Accepted Answer · 2016-07-18 02:35:18Z

4

library(dplyr)
library(tidyr)
d %>% 
  group_by(Name) %>%
  mutate(unique_id = paste0(unique(c(No1, No2)), collapse = ",")) %>%
  separate(., unique_id, paste0("id_", 1:max(c(.$No1, .$No2))), fill = "right")

edited Jul 18, 2016 at 2:35

answered Jul 18, 2016 at 2:22

Maiasaura

33.1k29 gold badges108 silver badges112 bronze badges

Comments

akrun · Accepted Answer · 2016-07-18 04:50:51Z

We can use a single external package i.e. data.table and get the output. Convert the 'data.frame' to 'data.table' (setDT(d)), grouped by 'Name', we unlist the columns mentioned in the .SDcols, get the unique values, and dcast from 'long' to 'wide' format, do a join with the original dataset on the "Name" column.

library(data.table)
dcast(setDT(d)[, unique(unlist(.SD)) , Name, .SDcols = No1:No2],
      Name~paste0("ID", rowid(Name)), value.var="V1")[d, on = "Name"]
#   Name ID1 ID2 ID3 No1 No2
#1:  Jon   1   2   3   1   1
#2:  Jon   1   2   3   2   1
#3:  Jon   1   2   3   3   1
#4:  Kel   1   2  NA   1   2
#5:  Kel   1   2  NA   1   2
#6:  Kel   1   2  NA   1   2
#7:  Don   3  NA  NA   3   3
#8:  Don   3  NA  NA   3   3
#9:  Don   3  NA  NA   3   3

Or this can be done in one-line by first pasteing the unique elements in 'No1' and 'No2', grouped by 'Name', and then split it to three columns by using cSplit from splitstackshape.

library(splitstackshape)
cSplit(setDT(d)[, ID:= paste(unique(c(No1, No2)), collapse=" ") , Name], "ID", " ")
#   Name No1 No2 ID_1 ID_2 ID_3
#1:  Jon   1   1    1    2    3
#2:  Jon   2   1    1    2    3
#3:  Jon   3   1    1    2    3
#4:  Kel   1   2    1    2   NA
#5:  Kel   1   2    1    2   NA
#6:  Kel   1   2    1    2   NA
#7:  Don   3   3    3   NA   NA
#8:  Don   3   3    3   NA   NA
#9:  Don   3   3    3   NA   NA

Or using the baseVerse just for kicks

d1 <- read.table(text=ave(unlist(d[-1]), rep(d$Name, 2), 
      FUN = function(x) paste(unique(x), collapse=" "))[1:nrow(d)], 
      header=FALSE, fill=TRUE, col.names= paste0("ID", 1:3))
cbind(d, d1)
#  Name No1 No2 ID1 ID2 ID3
#1  Jon   1   1   1   2   3
#2  Jon   2   1   1   2   3
#3  Jon   3   1   1   2   3
#4  Kel   1   2   1   2  NA
#5  Kel   1   2   1   2  NA
#6  Kel   1   2   1   2  NA
#7  Don   3   3   3  NA  NA
#8  Don   3   3   3  NA  NA
#9  Don   3   3   3  NA  NA

NOTE: No packages used and without much effort in splitting.

Collectives™ on Stack Overflow

Add a column to a dataframe using (extracting unique values) from existing columns

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related