R: Parse data.frame string variable into multiple variables

Question

I have data gathered through Amazon's Mechanical Turk with a column called "LifetimeApprovalRate". The column contains values like these:

head(ES$LifetimeApprovalRate)
[1] "100% (32/32)" "50% (16/32)" "100% (11/11)" "100% (4/4)"

I would like to create three new variables using this information:

 ES$rate: "100%" "50%" "100%" "100%" 
 ES$approve: "32" "16" "11" "4"
 ES$total: "32" "32" "11" "4"

Just about everything I try produces monstrous nested lists that are difficult to turn into anything useful.

joran · Accepted Answer · 2015-06-24 14:41:00Z

tidyr's separate is also handy for this sort of thing:

library(tidyr)
dat <- data.frame(x = 1:4, y = c("100% (32/32)", "50% (16/32)", "100% (11/11)", "100% (4/4)"))
separate(dat, y, c("rate", "approve", "total"), sep = "[()/ ]+", extra = "drop")
#   x rate approve total
# 1 1 100%      32    32
# 2 2  50%      16    32
# 3 3 100%      11    11
# 4 4 100%       4     4
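
If numeric columns are wanted directly, separate() also has a convert argument. The following is only a sketch under assumptions: stringsAsFactors = FALSE is set so the example behaves the same on older R versions, and rate_pct is a made-up column name for illustration, not part of the original answer.

```r
library(tidyr)

dat <- data.frame(
  x = 1:4,
  y = c("100% (32/32)", "50% (16/32)", "100% (11/11)", "100% (4/4)"),
  stringsAsFactors = FALSE
)

# convert = TRUE asks separate() to run type conversion on the new columns,
# so approve and total should come back as integers
out <- separate(dat, y, c("rate", "approve", "total"),
                sep = "[()/ ]+", extra = "drop", convert = TRUE)

# rate still carries its "%" sign and therefore stays character;
# rate_pct (hypothetical name) strips the sign to get a number
out$rate_pct <- as.numeric(sub("%", "", out$rate, fixed = TRUE))

str(out)
```

The extra = "drop" argument is still needed here, because the closing parenthesis leaves an empty fourth piece after the split.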

akrun · Accepted Answer · 2015-06-24 14:47:19Z

You can try strsplit:

  nm1 <- c('rate', 'approve', 'total')
  ES[nm1] <- do.call(rbind,
             strsplit(as.character(ES$LifetimeApprovalRate),'[()/ ]+'))

  ES[nm1[-1]] <- lapply(ES[nm1[-1]], as.numeric) 
  ES
  #    LifetimeApprovalRate rate approve total
  #1         100% (32/32) 100%      32    32
  #2          50% (16/32)  50%      16    32
  #3         100% (11/11) 100%      11    11
  #4           100% (4/4) 100%       4     4
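
To see why this works, it may help to look at the intermediate objects. The sketch below uses a small stand-in vector x (an assumption, not the real ES data). The character class [()/ ]+ matches any run of parentheses, slashes, and spaces, so each string breaks into three pieces, and do.call(rbind, ...) stacks those pieces into a character matrix with one row per input string.

```r
# Stand-in input; the real data lives in ES$LifetimeApprovalRate
x <- c("100% (32/32)", "50% (16/32)")

# One character vector of pieces per input string
pieces <- strsplit(x, "[()/ ]+")
pieces[[1]]

# Stack the per-string pieces row by row into a character matrix
mat <- do.call(rbind, pieces)
dim(mat)

# Everything is still character at this point, which is why the
# count columns are passed through as.numeric() afterwards
storage.mode(mat)
```

Note that this approach assumes every string yields the same number of pieces; rows with a different layout would need to be handled before the rbind step.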

A similar option uses the devel version of data.table, i.e. v1.9.5 (see the data.table installation instructions for the devel version). Here, tstrsplit splits the column 'LifetimeApprovalRate', and the resulting columns are assigned to the new columns named in 'nm1'. There is also the option type.convert=TRUE to convert the column classes.

 library(data.table)#v1.9.5+
 setDT(ES)[, (nm1):=tstrsplit(LifetimeApprovalRate,'[()/ ]+', type.convert=TRUE)]
 #   LifetimeApprovalRate rate approve total
 #1:         100% (32/32) 100%      32    32
 #2:          50% (16/32)  50%      16    32
 #3:         100% (11/11) 100%      11    11
 #4:           100% (4/4) 100%       4     4

data

 ES <-  structure(list(LifetimeApprovalRate = structure(c(2L, 4L, 1L, 
 3L), .Label = c("100% (11/11)", "100% (32/32)", "100% (4/4)", 
 "50% (16/32)"), class = "factor")), .Names = "LifetimeApprovalRate",
 row.names = c(NA, -4L), class = "data.frame")
