I have data gathered through Amazon's Mechanical Turk that has a column called "LifetimeApprovalRate". The column contains values like these:

head(ES$LifetimeApprovalRate)
[1] "100% (32/32)" "50% (16/32)" "100% (11/11)" "100% (4/4)"

I would like to create three new variables using this information:

 ES$rate: "100%" "50%" "100%" "100%" 
 ES$approve: "32" "16" "11" "4"
 ES$total: "32" "32" "11" "4"

I am afraid just about everything I try produces monstrous lists that are difficult to turn into anything useful.

2 Answers

tidyr's separate is also handy for this sort of thing:

> library(tidyr)
> dat <- data.frame(x = 1:4,y = c("100% (32/32)", "50% (16/32)", "100% (11/11)", "100% (4/4)"))
> separate(dat,y,c("rate","approve","total"),sep = "[()/ ]+",extra = "drop")
  x rate approve total
1 1 100%      32    32
2 2  50%      16    32
3 3 100%      11    11
4 4 100%       4     4
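
As a small variation on the call above, separate also accepts convert = TRUE, which should turn the count columns into integers while leaving rate as a character column. This is only a sketch: the data frame dat is rebuilt here from the question's example values, and stringsAsFactors = FALSE is an assumption added so the behavior is the same across R versions.

```r
library(tidyr)

# Example data rebuilt from the question (stringsAsFactors is an added assumption)
dat <- data.frame(
  x = 1:4,
  y = c("100% (32/32)", "50% (16/32)", "100% (11/11)", "100% (4/4)"),
  stringsAsFactors = FALSE
)

# Split on runs of "(", ")", "/" or space; drop the empty trailing piece;
# convert = TRUE asks separate to convert column types where it can
out <- separate(dat, y, c("rate", "approve", "total"),
                sep = "[()/ ]+", extra = "drop", convert = TRUE)

str(out)
```

With convert = TRUE there is no need for a separate as.numeric step on approve and total afterwards.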

You can try strsplit, splitting on runs of parentheses, slashes, and spaces:

  nm1 <- c('rate', 'approve', 'total')
  ES[nm1] <- do.call(rbind,
             strsplit(as.character(ES$LifetimeApprovalRate),'[()/ ]+'))

  ES[nm1[-1]] <- lapply(ES[nm1[-1]], as.numeric) 
  ES
  #    LifetimeApprovalRate rate approve total
  #1         100% (32/32) 100%      32    32
  #2          50% (16/32)  50%      16    32
  #3         100% (11/11) 100%      11    11
  #4           100% (4/4) 100%       4     4
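
To see why this avoids the unwieldy lists mentioned in the question, here is a minimal base R sketch of the intermediate steps. It uses a plain character vector x copied from the question instead of the ES data frame, and the rate_num column is a hypothetical extra that is not part of the original answer.

```r
# Example values copied from the question
x <- c("100% (32/32)", "50% (16/32)", "100% (11/11)", "100% (4/4)")

# strsplit returns a list with one character vector per input string
pieces <- strsplit(x, "[()/ ]+")
pieces[[1]]

# rbind the list elements into a 4 x 3 character matrix
parts <- do.call(rbind, pieces)

# Build the three requested columns, converting the counts to numeric
res <- data.frame(
  rate    = parts[, 1],
  approve = as.numeric(parts[, 2]),
  total   = as.numeric(parts[, 3]),
  stringsAsFactors = FALSE
)

# Hypothetical extra: the rate as a numeric proportion instead of a string
res$rate_num <- as.numeric(sub("%", "", res$rate, fixed = TRUE)) / 100
res
```

The do.call(rbind, ...) step is what collapses the list into a rectangular shape; it relies on every string splitting into the same number of pieces.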

A similar option using the devel version of data.table (v1.9.5) is below. Instructions to install the devel version are here. We use tstrsplit to split the column 'LifetimeApprovalRate' and assign the resulting columns to the new column names in 'nm1'. The option type.convert=TRUE converts the column classes.

 library(data.table)#v1.9.5+
 setDT(ES)[, (nm1):=tstrsplit(LifetimeApprovalRate,'[()/ ]+', type.convert=TRUE)]
 #   LifetimeApprovalRate rate approve total
 #1:         100% (32/32) 100%      32    32
 #2:          50% (16/32)  50%      16    32
 #3:         100% (11/11) 100%      11    11
 #4:           100% (4/4) 100%       4     4

data

 ES <-  structure(list(LifetimeApprovalRate = structure(c(2L, 4L, 1L, 
 3L), .Label = c("100% (11/11)", "100% (32/32)", "100% (4/4)", 
 "50% (16/32)"), class = "factor")), .Names = "LifetimeApprovalRate",
 row.names = c(NA, -4L), class = "data.frame")
