Creating and populating new columns in a data frame based on values from existing columns

Question

I have a csv in this format:

Col1_Status Col1_Value  Col2_Status Col2_Value Col3_Status  Col3__Value
LOW             5           HIGH         5         LOW           5
LOW             8           HIGH         8         LOW           8
HIGH            82          HIGH         8         LOW           7
HIGH            83          NORMAL       8         LOW           7
HIGH            82          NORMAL       8         LOW           7

I want to create a new dataframe with the high and low as columns, for example:

Col1_High  Col1_Low Col2_High Col2_Low Col3_High Col3_Low
    82         5        5        NA        NA        5
    83         8        8        NA        NA        8
    82         NA       8        NA        NA        7
    NA         NA       NA       NA        NA        7
    NA         NA       NA       NA        NA        7

What is the best way to go about this?

So far I think:

#extract the Status Columns from original file into DataFrame
  statusDF <- ret[grepl("Status", colnames(ret))]

  #extract the Value Columns from original file into DataFrame
  originalValueDF <- ret[grepl("Value", colnames(ret))]

  #create new columns attribute_high and attribute_low
  for(i in names(originalValueDF)){
    newValueDF <- originalValueDF[[paste(i, 'High', sep = "_")]]
    newValueDF <- originalValueDF[[paste(i, 'Low', sep = "_")]]
  }

 #populate both columns based on value in attribute status column
 for(i in names(originalValueDF)){
    if (originalValueDF$i == "High"){
      temp <-  # stuck here
    }
  }

Any advice is appreciated.

Col3_Low = c(5, 8) ... where is 7? What are your criteria?
– Sotos, Commented Apr 18, 2017 at 11:20
Sorry, I just gave the first two tuples as the desired output. The criterion is to look at the status column and extract the value into a new High or Low column.
– ukbaz, Commented Apr 18, 2017 at 11:22

Sotos · Accepted Answer · 2017-04-18 13:13:53Z


Here is an attempt with a lot of lapply (df is the data frame read from the csv). We first create a list (l1) that holds, for each Status/Value column pair, the values whose Status is 'HIGH' and the values whose Status is 'LOW'. Those vectors have different lengths, so we pad them all with NA to the length of the longest one (ind). We then convert each pair of vectors to a matrix with 2 columns (High and Low) and use do.call with cbind to get the final data frame.

l1 <- lapply(seq(1, ncol(df), by = 2), function(i) list(HIGH = df[i+1][df[i] == 'HIGH'],
                                                         LOW = df[i+1][df[i] == 'LOW']))
names(l1) <- paste0('Col', seq(length(l1)))

ind <- max(unlist(lapply(l1, function(i) lengths(i))))

do.call(cbind, lapply(lapply(l1, function(i) lapply(i, `length<-`, ind)), function(j)
                    setNames(data.frame(matrix(unlist(j), ncol = 2)), c('High', 'Low'))))

#  Col1.High Col1.Low Col2.High Col2.Low Col3.High Col3.Low
#1        82        5         5       NA        NA        5
#2        83        8         8       NA        NA        8
#3        82       NA         8       NA        NA        7
#4        NA       NA        NA       NA        NA        7
#5        NA       NA        NA       NA        NA        7
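
The nested lapply calls above can be unpacked into separate steps. The following is only a sketch of the same idea under the assumption that the columns alternate Status, Value, Status, Value as in the question; the intermediate name padded is introduced here for illustration and is not part of the original answer.

```r
# Sample data from the question (Status/Value columns alternate)
df <- read.table(text = "
Col1_Status Col1_Value Col2_Status Col2_Value Col3_Status Col3_Value
LOW   5 HIGH   5 LOW 5
LOW   8 HIGH   8 LOW 8
HIGH 82 HIGH   8 LOW 7
HIGH 83 NORMAL 8 LOW 7
HIGH 82 NORMAL 8 LOW 7
", header = TRUE, stringsAsFactors = FALSE)

# Step 1: for every Status/Value pair, collect the HIGH and the LOW values
l1 <- lapply(seq(1, ncol(df), by = 2), function(i) {
  status <- df[[i]]
  value  <- df[[i + 1]]
  list(High = value[status == "HIGH"], Low = value[status == "LOW"])
})
names(l1) <- paste0("Col", seq_along(l1))

# Step 2: find the length of the longest vector
ind <- max(unlist(lapply(l1, lengths)))

# Step 3: `length<-` extends a vector and fills the new slots with NA,
# so every High/Low vector ends up with exactly `ind` elements
padded <- lapply(l1, function(p) lapply(p, `length<-`, ind))

# Step 4: turn each High/Low pair into a two-column data frame and bind
# them side by side; cbind prefixes the list names (Col1.High, ...)
result <- do.call(cbind, lapply(padded, as.data.frame))
result
```

Each step can be inspected on its own (for example l1$Col1 or padded$Col2), which makes it easier to see where the NA padding comes from.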

edited Apr 18, 2017 at 13:13

answered Apr 18, 2017 at 12:56

Sotos


ukbaz Over a year ago

Great, thank you. Would you mind explaining it? It seems quite complex.

Andrew Lavers · Accepted Answer · 2017-04-18 15:24:50Z

ret <- read.table(text="
Col1_Status Col1_Value  Col2_Status Col2_Value Col3_Status  Col3__Value
LOW             5           HIGH         5         LOW           5
LOW             8           HIGH         8         LOW           8
HIGH            82          HIGH         8         LOW           7
HIGH            83          NORMAL       8         LOW           7
HIGH            82          NORMAL       8         LOW           7
", header = TRUE, stringsAsFactors = FALSE)

# fix column headers (collapse repeated underscores, e.g. Col3__Value)
names(ret) <- gsub("(_+)", "_", names(ret))

# extract the column prefixes and the names of the Value and Status columns
prefixes <- unique(gsub("_.+", "", names(ret)))
value_names  <- names(ret[grepl("_Value",  names(ret))])
status_names <- names(ret[grepl("_Status", names(ret))])

# get the high values - extract the highs, pad with NA's and set the name to _High
high_values  <- sapply(1:length(prefixes),
                       function(i) {
                         result <- ret[which(ret[, status_names][i] == "HIGH"), value_names][[i]]
                         length(result) <- nrow(ret)  # pad with NA up to nrow(ret)
                         setNames(list(foo = result[1:nrow(ret)]), paste0(prefixes[i], "_High"))})

# get the low values - extract the lows, pad with NA's and set the name to _Low
low_values  <- sapply(1:length(prefixes),
                      function(i) {
                        result <- ret[which(ret[, status_names][i] == "LOW"), value_names][[i]]
                        length(result) <- nrow(ret)  # pad with NA up to nrow(ret)
                        setNames(list(foo = result[1:nrow(ret)]), paste0(prefixes[i], "_Low"))})

# combine
output <- cbind(data.frame(low_values), data.frame(high_values))

output

#   Col1_Low Col2_Low Col3_Low Col1_High Col2_High Col3_High
# 1        5       NA        5        82         5        NA
# 2        8       NA        8        83         8        NA
# 3       NA       NA        7        82         8        NA
# 4       NA       NA        7        NA        NA        NA
# 5       NA       NA        7        NA        NA        NA
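
If the column order requested in the question (Col1_High, Col1_Low, Col2_High, ...) matters, the same extract-and-pad idea can be written with one small helper. This is a sketch only: pad_values is a hypothetical helper name, and it assumes every prefix has both a _Status and a _Value column.

```r
ret <- read.table(text = "
Col1_Status Col1_Value  Col2_Status Col2_Value Col3_Status  Col3__Value
LOW             5           HIGH         5         LOW           5
LOW             8           HIGH         8         LOW           8
HIGH            82          HIGH         8         LOW           7
HIGH            83          NORMAL       8         LOW           7
HIGH            82          NORMAL       8         LOW           7
", header = TRUE, stringsAsFactors = FALSE)

# collapse repeated underscores so Col3__Value becomes Col3_Value
names(ret) <- gsub("_+", "_", names(ret))
prefixes <- unique(sub("_.*", "", names(ret)))

# hypothetical helper: values of one prefix whose status equals `level`,
# padded with NA to the number of rows of `ret`
pad_values <- function(prefix, level) {
  status <- ret[[paste0(prefix, "_Status")]]
  value  <- ret[[paste0(prefix, "_Value")]]
  out <- value[status == level]
  length(out) <- nrow(ret)
  out
}

# build the columns pair by pair so High and Low stay next to each other
cols <- list()
for (p in prefixes) {
  cols[[paste0(p, "_High")]] <- pad_values(p, "HIGH")
  cols[[paste0(p, "_Low")]]  <- pad_values(p, "LOW")
}
output <- as.data.frame(cols)
output
```

Padding to nrow(ret) rather than to the longest extracted vector is a design choice made here so that the result always has as many rows as the input.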
