I have the following for loop script:

# Create example data
dataKM <- data.frame(x1 = 1:5,    
                     x2 = 6:10,
                     x3 = 11:15)
# Duplicate dataframe
datatest <- dataKM[c(1:3)]

# for loop
for(i in colnames(dataKM[,2:ncol(dataKM)])) {
  # median of each single column of dataframe
  median <- median(dataKM[,i])
  # add column in duplicated dataframe with 'High' or 'low' based on median for each column
  datatest$median[dataKM[,i] <= median ] <- "Low"
  datatest$median[dataKM[,i] > median ] <- "High"
}

I'm trying to repeat the for loop for each column of the dataKM dataframe and save the results as columns in the datatest dataframe. My script saves only the last iteration, probably because I overwrite the previous value on each pass through the loop. How can I save the output of every iteration in its own column? Can anyone help me? Thank you so much; I hope this is useful for someone else trying to do something similar.

3 Answers

We can just use the lapply function:

datatest <- dataKM[c(2:3)]
datatest[] <- lapply(dataKM[-1] , function(x) ifelse(x <= median(x) , "Low" , "High"))

colnames(datatest) <- c("x2Median" , "x3Median")

cbind(dataKM , datatest)

Output:

  x1 x2 x3 x2Median x3Median
1  1  6 11      Low      Low
2  2  7 12      Low      Low
3  3  8 13      Low      Low
4  4  9 14     High     High
5  5 10 15     High     High
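
The lapply answer above types the new column names by hand. As a minimal sketch, assuming the same example data and that every column except the first should be classified, the names can be derived from the original columns instead. The helper names `cols`, `labels`, and `result` are made up for illustration:

```r
# Example data from the question
dataKM <- data.frame(x1 = 1:5, x2 = 6:10, x3 = 11:15)

# Columns to classify: everything except the first
cols <- names(dataKM)[-1]

# One "Low"/"High" vector per column, compared against that column's own median
labels <- lapply(dataKM[cols], function(x) ifelse(x <= median(x), "Low", "High"))

# Derive the new column names from the originals instead of typing them out
names(labels) <- paste0(cols, "Median")

# stringsAsFactors = FALSE keeps the labels as character on older R versions too
result <- cbind(dataKM, as.data.frame(labels, stringsAsFactors = FALSE))
print(result)
```

This should scale to any number of columns without editing the colnames line.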

If you insist on using a for loop, try this:

datatest <- dataKM[c(1:3)]

for(i in colnames(dataKM[-1])) {
    median <- median(dataKM[,i])
    datatest[[paste0(i,"median")]][dataKM[,i] <= median ] <- "Low"
    datatest[[paste0(i,"median")]][dataKM[,i] > median ] <- "High"
}

8 Comments

First column also?
I think OP said "for loop for each column of dataKM dataframe" !
I think, colnames(dataKM[,2:ncol(dataKM)]) means 2 to last column!
Yes, colnames (dataKM [, 2: ncol (dataKM)]) means 2 up to the last column. The first column in my real dataset is a numeric column, but I don't need to calculate the median of it.
The dplyr approach is @TarJae's solution; please add this comment under his answer.

I am not sure what is compared with what, but here is an example where each x2 or x3 value is compared with its own column median:

Here is a dplyr approach:

library(dplyr)

dataKM %>% 
  mutate(across(-1, ~case_when(. <= median(., na.rm=TRUE) ~ "Low",
                               . > median(., na.rm=TRUE) ~ "High"), .names = "Median_{.col}"))
  x1 x2 x3 Median_x2 Median_x3
1  1  6 11       Low       Low
2  2  7 12       Low       Low
3  3  8 13       Low       Low
4  4  9 14      High      High
5  5 10 15      High      High
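
The `na.rm=TRUE` argument only matters when a column contains missing values. A small base R sketch of the effect, using a made-up vector `x` that is not part of the original data:

```r
# Hypothetical column with one missing value (not from the question's data)
x <- c(6, NA, 8, 9, 10)

# Without na.rm, a single NA makes the median itself NA
median(x)

# With na.rm = TRUE the median is computed from the non-missing values
m <- median(x, na.rm = TRUE)

# Comparisons against NA stay NA, so the missing row gets no label
labels <- ifelse(x <= m, "Low", "High")
print(labels)
```

Rows with a missing value still end up as NA rather than "Low" or "High", which is usually the desired behaviour.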

3 Comments

Your dplyr approach is exactly what I need for my analysis. Thank you very much for your help.
May I ask why you then unaccepted the answer? It is fine with me, I am just wondering.
Sorry, I accidentally took it off

Currently, you are updating a single new column, median, on every pass. Simply adjust the loop to create a new column on each iteration, naming it by concatenating the current column name and "_median".

# for loop
for(col in colnames(dataKM[,2:ncol(dataKM)])) {
  curr_col <- dataKM[[col]]
  # median of each single column of dataframe
  col_median <- median(curr_col)

  # add column in duplicated dataframe with 'High' or 'low' based on median for each column
  datatest[[paste0(col, "_median")]][curr_col <= col_median] <- "Low"
  datatest[[paste0(col, "_median")]][curr_col > col_median] <- "High"
}

Alternatively, with ifelse:

for(col in colnames(dataKM[,2:ncol(dataKM)])) {
  curr_col <- dataKM[[col]]
  col_median <- median(curr_col)

  datatest[[paste0(col, "_median")]] <- ifelse(
    curr_col <= col_median, "Low", "High"
  )
}
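
The reason the original loop kept hitting one column is that `$` takes the name after it literally, while `[[` evaluates its argument. A minimal sketch with a throwaway data frame; `df` and `nm` are illustrative names only:

```r
df <- data.frame(a = 1:3)
nm <- "b"

# `$` uses the name literally: this creates a column called "nm"
df$nm <- c("x", "y", "z")

# `[[` evaluates its argument: this creates a column called "b"
df[[nm]] <- c("p", "q", "r")

print(names(df))
```

So `datatest$median` always refers to one column named median, whereas `datatest[[paste0(col, "_median")]]` refers to a different column on each iteration.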
