I have a csv in this format:

Col1_Status Col1_Value  Col2_Status Col2_Value Col3_Status  Col3__Value
LOW             5           HIGH         5         LOW           5
LOW             8           HIGH         8         LOW           8
HIGH            82          HIGH         8         LOW           7
HIGH            83          NORMAL       8         LOW           7
HIGH            82          NORMAL       8         LOW           7

I want to create a new dataframe with the high and low as columns, for example:

Col1_High  Col1_Low Col2_High Col2_Low Col3_High Col3_Low
    82         5        5        NA        NA        5
    83         8        8        NA        NA        8
    82         NA       8        NA        NA        7
    NA         NA       NA       NA        NA        7
    NA         NA       NA       NA        NA        7

What is the best way to go about this?

So far I think:

#extract the Status Columns from original file into DataFrame
  statusDF <- ret[grepl("Status", colnames(ret))]

  #extract the Value Columns from original file into DataFrame
  originalValueDF <- ret[grepl("Value", colnames(ret))]

  #create new columns attribute_high and attribute_low
  for(i in names(originalValueDF)){
    newValueDF <- originalValueDF[[paste(i, 'High', sep = "_")]]
    newValueDF <- originalValueDF[[paste(i, 'Low', sep = "_")]]
  }

 #populate both columns based on value in attribute status column
 for(i in names(originalValueDF)){
    if (originalValueDF$i == "High"){
      temp <-  # stuck here
    }
  }

Any advice is appreciated.

  • Col3_Low = c(5, 8) ... where is 7? What are your criteria? Commented Apr 18, 2017 at 11:20
  • Sorry, I only gave the first two tuples as the desired output. The criterion is to look at the status column and extract the value into a new high or low column. Commented Apr 18, 2017 at 11:22
  • have updated the output dataframe Commented Apr 18, 2017 at 11:25

2 Answers

Here is an attempt with a lot of lapply (df is the data frame read from the CSV). We first create a list (l1) that holds, for each Status/Value column pair, the values whose Status is 'HIGH' and the values whose Status is 'LOW'. Those vectors have different lengths, so we pad each one with NA up to the length of the longest (ind, which is 5 here). We then convert each pair of vectors to a matrix with 2 columns (High and Low) and use do.call with cbind to get the final data frame.

l1 <- lapply(seq(1, ncol(df), by = 2), function(i) list(HIGH = df[i+1][df[i] == 'HIGH'],
                                                         LOW = df[i+1][df[i] == 'LOW']))
names(l1) <- paste0('Col', seq(length(l1)))

ind <- max(unlist(lapply(l1, function(i) lengths(i))))

do.call(cbind, lapply(lapply(l1, function(i) lapply(i, `length<-`, ind)), function(j)
                    setNames(data.frame(matrix(unlist(j), ncol = 2)), c('High', 'Low'))))

#  Col1.High Col1.Low Col2.High Col2.Low Col3.High Col3.Low
#1        82        5         5       NA        NA        5
#2        83        8         8       NA        NA        8
#3        82       NA         8       NA        NA        7
#4        NA       NA        NA       NA        NA        7
#5        NA       NA        NA       NA        NA        7
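Since the nested lapply calls are dense, the same steps can be unpacked into named intermediate results. This is only a sketch of the approach described above, not a different method; the names status_idx, padded and result are illustrative, and it assumes the columns alternate Status, Value as in the question.

```r
# Sample data from the question; df stands for the data frame read from the CSV
df <- read.table(text = "
Col1_Status Col1_Value Col2_Status Col2_Value Col3_Status Col3__Value
LOW   5 HIGH   5 LOW 5
LOW   8 HIGH   8 LOW 8
HIGH 82 HIGH   8 LOW 7
HIGH 83 NORMAL 8 LOW 7
HIGH 82 NORMAL 8 LOW 7
", header = TRUE, stringsAsFactors = FALSE)

# Step 1: for each (Status, Value) pair, keep the values flagged HIGH and LOW.
# Assumes Status columns sit at positions 1, 3, 5, ... with Value right after.
status_idx <- seq(1, ncol(df), by = 2)
l1 <- lapply(status_idx, function(i) {
  status <- df[[i]]
  value  <- df[[i + 1]]
  list(High = value[status == "HIGH"], Low = value[status == "LOW"])
})
names(l1) <- paste0("Col", seq_along(l1))

# Step 2: the longest extracted vector decides how many rows the result has
ind <- max(unlist(lapply(l1, lengths)))

# Step 3: `length<-` extends each shorter vector to length ind, filling with NA
padded <- lapply(l1, function(pair) lapply(pair, `length<-`, ind))

# Step 4: turn each pair into a two-column data frame and bind them side by side;
# cbind prefixes the list names, giving Col1.High, Col1.Low, ...
result <- do.call(cbind, lapply(padded, as.data.frame))
result
```

The padding in step 3 is what lets vectors of different lengths sit in one data frame: a vector of three HIGH values becomes those three values followed by two NAs.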
1 Comment

great thank you, would you mind explaining it - that seems quite complex
ret <- read.table(text="
Col1_Status Col1_Value  Col2_Status Col2_Value Col3_Status  Col3__Value
LOW             5           HIGH         5         LOW           5
LOW             8           HIGH         8         LOW           8
HIGH            82          HIGH         8         LOW           7
HIGH            83          NORMAL       8         LOW           7
HIGH            82          NORMAL       8         LOW           7
", header = TRUE, stringsAsFactors = FALSE)

# fix column headers
names(ret) <- gsub("(_+)", "_", names(ret))

library(stats)

# extract the column prefixes
prefixes <- unique(gsub("_.+", "", names(ret)))
value_names  <- names(ret[grepl("_Value",  names(ret))])
status_names <- names(ret[grepl("_Status", names(ret))])

# get the high values - extract the highs, pad with NAs and set the name to _High
high_values  <- sapply(seq_along(prefixes),
                       function(i) {
                         result <- ret[which(ret[, status_names][i] == "HIGH"), value_names][[i]]
                         length(result) <- nrow(ret)  # pad with NA up to the row count
                         setNames(list(result), paste0(prefixes[i], "_High"))})

# get the low values - extract the lows, pad with NAs and set the name to _Low
low_values  <- sapply(seq_along(prefixes),
                      function(i) {
                        result <- ret[which(ret[, status_names][i] == "LOW"), value_names][[i]]
                        length(result) <- nrow(ret)  # pad with NA up to the row count
                        setNames(list(result), paste0(prefixes[i], "_Low"))})

# combine
output <- cbind(data.frame(low_values), data.frame(high_values))

output

#   Col1_Low Col2_Low Col3_Low Col1_High Col2_High Col3_High
# 1        5       NA        5        82         5        NA
# 2        8       NA        8        83         8        NA
# 3       NA       NA        7        82         8        NA
# 4       NA       NA        7        NA        NA        NA
# 5       NA       NA        7        NA        NA        NA
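The output above groups all the Low columns before the High ones, whereas the question asks for High and Low side by side per column. A minimal sketch of one way to get that order is to loop over the prefixes and fill both columns in turn. The helper name split_by_status and its levels argument are hypothetical, not part of either answer, and the sketch assumes every prefix has a matching _Status and _Value column.

```r
# Sample data from the question
ret <- read.table(text = "
Col1_Status Col1_Value Col2_Status Col2_Value Col3_Status Col3__Value
LOW   5 HIGH   5 LOW 5
LOW   8 HIGH   8 LOW 8
HIGH 82 HIGH   8 LOW 7
HIGH 83 NORMAL 8 LOW 7
HIGH 82 NORMAL 8 LOW 7
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: for each prefix, keep the values whose status matches
# each level and pad with NA up to the original number of rows.
split_by_status <- function(data, levels = c(High = "HIGH", Low = "LOW")) {
  names(data) <- gsub("_+", "_", names(data))        # normalise Col3__Value
  prefixes <- unique(sub("_.*", "", names(data)))
  cols <- list()
  for (p in prefixes) {
    status <- data[[paste0(p, "_Status")]]
    value  <- data[[paste0(p, "_Value")]]
    for (lv in names(levels)) {
      picked <- value[which(status == levels[[lv]])]  # which() skips NA statuses
      length(picked) <- nrow(data)                    # pad with NA
      cols[[paste0(p, "_", lv)]] <- picked
    }
  }
  as.data.frame(cols)
}

out <- split_by_status(ret)
out
```

Matching columns by name rather than by position means the helper does not depend on the Status and Value columns alternating in the input.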
