I need to recode some data. Firstly, iImagine that the the original data looks something like this
A data.frame: 6 × 5
col1 col2 col3 col4 col5
<chr> <chr> <chr> <chr> <chr>
s1 414234 244575 539645 436236
s2 NA 512342 644252 835325
s3 NA NA 816747 475295
s4 NA NA NA 125429
s5 NA NA NA NA
s6 617465 844526 NA 194262
which, secondly, is transformed into
A data.frame: 6 × 5
col1 col2 col3 col4 col5
<chr> <int> <int> <int> <int>
s1 4 2 5 4
s2 NA 5 6 8
s3 NA NA 8 4
s4 NA NA NA 1
s5 NA NA NA NA
s6 6 8 NA 1
because I am going to recode everything according to the first digit. When, thirdly, recoded (see recoding pattern in MWE below) it should look like this
A data.frame: 6 × 5
col1 col2 col3 col4 col5
<chr> <int> <int> <int> <int>
s1 3 1 3 3
s2 NA 3 4 5
s3 NA NA 5 3
s4 NA NA NA 1
s5 NA NA NA NA
s6 4 5 NA 1
and, fourthly, entire rows should be removed if all columns except the first one is empty, that is
A data.frame: 6 × 5
col1 col2 col3 col4 col5
<chr> <int> <int> <int> <int>
s1 3 1 3 3
s2 NA 3 4 5
s3 NA NA 5 3
s4 NA NA NA 1
s6 4 5 NA 1
which is the ultimate data.
The first and second step were easily implemented but I struggle with the third and fourth step since I am new to R (see MWE below). For the third step, I tried to use mutate over multiple columns but Error in UseMethod("mutate"): no applicable method for 'mutate' applied to an object of class "c('integer', 'numeric')" appeared. The fourth step is easily implemented in Python with thresh but I am not sure if there is an equivalent in R.
How is this possible? Also, I work with huge data, so time-efficient solutions would also be highly appreciated.
library(dplyr)
df <- data.frame(
col1 = c("s1", "s2", "s3", "s4", "s5", "s6"),
col2 = c("414234", NA, NA, NA, NA, "617465"),
col3 = c("244575", "512342", NA, NA, NA, "844526"),
col4 = c("539645", "644252", "816747", NA, NA, NA),
col5 = c("436236", "835325", "475295", "125429", NA, "194262")
)
n = ncol(df)
for (i in colnames(df[2:n])) {
df[, i] = strtoi(substr(df[, i], 1, 1))
}
for (i in colnames(df[2:n])) {
df[, i] %>% mutate(i=recode(i, "0": 1, "1": 1, "2": 1, "3": 2, "4": 3, "5": 3, "6": 4, "7": 5, "8": 5))
}