4

I am new to R, and was not able to search answers for the specific problem I have encountered.

If my dataframe looks like below:

d <- data.frame(Name = c("Jon", "Jon", "Jon", "Kel", "Kel", "Kel", "Don", "Don", "Don"),
             No1 = c(1,2,3,1,1,1,3,3,3),
             No2 = c(1,1,1,2,2,2,3,3,3))

Name No1 No2
Jon   1   1
Jon   2   1
Jon   3   1
Kel   1   2
Kel   1   2
Kel   1   2
Don   3   3
Don   3   3
Don   3   3 
...

How would I add be able to add new columns to the dataframe, where the columns would indicate the unique values in column No1 and No2: which would be (1,2,3), (1,2), (3) for John, Kelly, Don, respectively

So, if the new columns are named ID#, The desired results should be

d2 <- data.frame(Name = c("Jon", "Jon", "Jon", "Kel", "Kel", "Kel", "Don", "Don", "Don"),
          No1 = c(1,2,3,1,1,1,3,3,3),
          No2 = c(1,1,1,2,2,2,3,3,3),
          ID1 = c(1,1,1,1,1,1,3,3,3),
          ID2 = c(2,2,2,2,2,2,NA,NA,NA),
          ID3 = c(3,3,3,NA,NA,NA,NA,NA,NA))

Name No1 No2 ID1 ID2 ID3
Jon   1   1   1   2   3 
Jon   2   1   1   2   3
Jon   3   1   1   2   3 
Kel   1   2   1   2   NA
Kel   1   2   1   2   NA
Kel   1   2   1   2   NA
Don   3   3   3   NA  NA
Don   3   3   3   NA  NA
Don   3   3   3   NA  NA
0

3 Answers 3

5

A tidyverse approach:

library(dplyr)
library(tidyr)

      # evaluate separately for each name
d %>% group_by(Name) %>% 
    # add a column of the unique values pasted together into a string
    mutate(ID = paste(unique(c(No1, No2)), collapse = ' ')) %>% 
    # separate the string into individual columns, filling with NA and converting to numbers
    separate(ID, into = paste0('ID', 1:3), fill = 'right', convert = TRUE)

## Source: local data frame [9 x 6]
## Groups: Name [3]
## 
##     Name   No1   No2   ID1   ID2   ID3
## * <fctr> <dbl> <dbl> <int> <int> <int>
## 1    Jon     1     1     1     2     3
## 2    Jon     2     1     1     2     3
## 3    Jon     3     1     1     2     3
## 4    Kel     1     2     1     2    NA
## 5    Kel     1     2     1     2    NA
## 6    Kel     1     2     1     2    NA
## 7    Don     3     3     3    NA    NA
## 8    Don     3     3     3    NA    NA
## 9    Don     3     3     3    NA    NA

Here's a nice base version with a basic split-apply-combine approach:

# store distinct values in No1 and No2
cols <- unique(unlist(d[,-1]))
                           # split No1 and No2 by Name,
ids <- data.frame(t(sapply(split(d[,-1], d$Name), 
                           # find unique values for each split,
                           function(x){y <- unique(unlist(x))
                                       # pad with NAs,
                                       c(y, rep(NA, length(cols) - length(y)))
                           # and return a data.frame
                           }))) 
# fix column names
names(ids) <- paste0('ID', cols)
# turn rownames into column
ids$Name <- rownames(ids)
# join two data.frames on Name columns
merge(d, ids, sort = FALSE)

##   Name No1 No2 ID1 ID2 ID3
## 1  Jon   1   1   1   2   3
## 2  Jon   2   1   1   2   3
## 3  Jon   3   1   1   2   3
## 4  Kel   1   2   1   2  NA
## 5  Kel   1   2   1   2  NA
## 6  Kel   1   2   1   2  NA
## 7  Don   3   3   3  NA  NA
## 8  Don   3   3   3  NA  NA
## 9  Don   3   3   3  NA  NA

And just for kicks, here's a creative alternate base version that leverages table instead of splitting/grouping:

# copy d so as not to distort original with factor columns
d_f <- d
# make No* columns factors to ensure similar table structure
d_f[, -1] <- lapply(d[,-1], factor, levels = unique(unlist(d[, -1])))
# make tables of cols, sum to aggregate occurrences, and set as boolean mask for > 0
tab <- Reduce(`+`, lapply(d_f[, -1], table, d_f$Name)) > 0
# replace all TRUE values with values they tabulated
tab <- tab * matrix(as.integer(rownames(tab)), nrow = nrow(tab), ncol = ncol(tab))
# replace 0s with NAs
tab[tab == 0] <- NA
# store column names
cols <- paste0('ID', rownames(tab))
# sort each row, keeping NAs
tab <- data.frame(t(apply(tab, 2, sort, na.last = T)))
# apply stored column names
names(tab) <- cols
# turn rownames into column
tab$Name <- rownames(tab)
# join two data.frames on Name columns
merge(d, tab, sort = FALSE)

Results are identical.

Sign up to request clarification or add additional context in comments.

Comments

4
library(dplyr)
library(tidyr)
d %>% 
  group_by(Name) %>%
  mutate(unique_id = paste0(unique(c(No1, No2)), collapse = ",")) %>%
  separate(., unique_id, paste0("id_", 1:max(c(.$No1, .$No2))), fill = "right")

Comments

2

We can use a single external package i.e. data.table and get the output. Convert the 'data.frame' to 'data.table' (setDT(d)), grouped by 'Name', we unlist the columns mentioned in the .SDcols, get the unique values, and dcast from 'long' to 'wide' format, do a join with the original dataset on the "Name" column.

library(data.table)
dcast(setDT(d)[, unique(unlist(.SD)) , Name, .SDcols = No1:No2],
      Name~paste0("ID", rowid(Name)), value.var="V1")[d, on = "Name"]
#   Name ID1 ID2 ID3 No1 No2
#1:  Jon   1   2   3   1   1
#2:  Jon   1   2   3   2   1
#3:  Jon   1   2   3   3   1
#4:  Kel   1   2  NA   1   2
#5:  Kel   1   2  NA   1   2
#6:  Kel   1   2  NA   1   2
#7:  Don   3  NA  NA   3   3
#8:  Don   3  NA  NA   3   3
#9:  Don   3  NA  NA   3   3

Or this can be done in one-line by first pasteing the unique elements in 'No1' and 'No2', grouped by 'Name', and then split it to three columns by using cSplit from splitstackshape.

library(splitstackshape)
cSplit(setDT(d)[, ID:= paste(unique(c(No1, No2)), collapse=" ") , Name], "ID", " ")
#   Name No1 No2 ID_1 ID_2 ID_3
#1:  Jon   1   1    1    2    3
#2:  Jon   2   1    1    2    3
#3:  Jon   3   1    1    2    3
#4:  Kel   1   2    1    2   NA
#5:  Kel   1   2    1    2   NA
#6:  Kel   1   2    1    2   NA
#7:  Don   3   3    3   NA   NA
#8:  Don   3   3    3   NA   NA
#9:  Don   3   3    3   NA   NA

Or using the baseVerse just for kicks

d1 <- read.table(text=ave(unlist(d[-1]), rep(d$Name, 2), 
      FUN = function(x) paste(unique(x), collapse=" "))[1:nrow(d)], 
      header=FALSE, fill=TRUE, col.names= paste0("ID", 1:3))
cbind(d, d1)
#  Name No1 No2 ID1 ID2 ID3
#1  Jon   1   1   1   2   3
#2  Jon   2   1   1   2   3
#3  Jon   3   1   1   2   3
#4  Kel   1   2   1   2  NA
#5  Kel   1   2   1   2  NA
#6  Kel   1   2   1   2  NA
#7  Don   3   3   3  NA  NA
#8  Don   3   3   3  NA  NA
#9  Don   3   3   3  NA  NA

NOTE: No packages used and without much effort in splitting.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.