I have the following data frame (example):

myfile <- data.frame(C1=c(1,3,4,5),
                     C2=c(5,4,6,7),
                     C3=c(0,1,3,2),
                     C1_A=c(NA,NA,1,2),
                     C2_A=c(NA,9,8,7),
                     C3_A=c(NA,NA,NA,1))

I would like to replace all NA values in the last three "_A" columns with the value from the same row of the corresponding column C1 to C3. For example, C1_A should become 1, 3, 1, 2.

I tried the following line

myfile <- myfile %>% mutate(across(c(C1_A:C3_A), ~ if_else(is.na(.)==TRUE, eval(parse(text=str_replace(., "_A", ""))), .)))

but it does not work: it returns the bottom-row value of the _A columns. I also tried it with dplyr's rowwise() option, still without success.

My real dataset has many columns like those in the example, so it doesn't make sense to mutate each one individually. What is the best way to resolve this?

Does the first set of complete columns have a matching set of incomplete columns, with nothing else in the frame? In other words, can we ignore column names and work only with positions, as the included example might suggest? Commented Oct 8 at 10:43

2 Answers

An option with tidyverse:

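library(dplyr)
library(stringr)

# For each "_A" column, take the value from the same-named column without the suffix wherever it is NA
myfile %>%
 mutate(across(ends_with("_A"), ~ if_else(is.na(.), get(str_remove(cur_column(), "_A$")), .)))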

  C1 C2 C3 C1_A C2_A C3_A
1  1  5  0    1    5    0
2  3  4  1    3    9    1
3  4  6  3    1    8    3
4  5  7  2    2    7    1
3 Comments

Great, thanks for the answer, this works. It was actually the cur_column() bit I was getting wrong; my line with eval/parse also works once cur_column() is added.
Would pick()/pull() be more tidyverse-esque?
Since this is solely about NA, I think coalesce may be a more natural fit than if_else, using mutate(myfile, across(ends_with("_A"), ~ coalesce(.x, pick(sub("_A$", "", cur_column()))[[1]])))
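
The coalesce() idea from the comment above can be written out as a complete pipeline. This is only a sketch: it looks up the source column through the `.data` pronoun instead of pick(), which is a substituted technique rather than the commenter's exact code, and it assumes every "_A" column has a same-typed column with the same name minus the suffix.

```r
library(dplyr)

myfile <- data.frame(C1 = c(1, 3, 4, 5),
                     C2 = c(5, 4, 6, 7),
                     C3 = c(0, 1, 3, 2),
                     C1_A = c(NA, NA, 1, 2),
                     C2_A = c(NA, 9, 8, 7),
                     C3_A = c(NA, NA, NA, 1))

# For each "_A" column, keep its own value where present and otherwise
# fall back to the column whose name is the same without the "_A" suffix.
filled <- myfile %>%
  mutate(across(
    ends_with("_A"),
    ~ coalesce(.x, .data[[sub("_A$", "", cur_column())]])
  ))

filled
```

coalesce() expresses "first non-missing value" directly, so no explicit is.na() test is needed.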
If there's a set of complete columns followed by a matching set of incomplete columns, we could naively locate the NA indices (1), get the matching source (patch value) indices by subtracting the number of columns in a set from the column index (2), and update the NA locations (3):

myfile <- data.frame(C1=c(1,3,4,5),
                     C2=c(5,4,6,7),
                     C3=c(0,1,3,2),
                     C1_A=c(NA,NA,1,2),
                     C2_A=c(NA,9,8,7),
                     C3_A=c(NA,NA,NA,1))
# 1 - get NA locations
( na_idx <- src_idx <- which(is.na(myfile), arr.ind = TRUE) )
#>      row col
#> [1,]   1   4
#> [2,]   2   4
#> [3,]   1   5
#> [4,]   1   6
#> [5,]   2   6
#> [6,]   3   6

# 2 - update index col
src_idx[,2] <- src_idx[,2] - 3
src_idx
#>      row col
#> [1,]   1   1
#> [2,]   2   1
#> [3,]   1   2
#> [4,]   1   3
#> [5,]   2   3
#> [6,]   3   3

# 3 - update values
myfile[na_idx] <- myfile[src_idx]
myfile
#>   C1 C2 C3 C1_A C2_A C3_A
#> 1  1  5  0    1    5    0
#> 2  3  4  1    3    9    1
#> 3  4  6  3    1    8    3
#> 4  5  7  2    2    7    1

Created on 2025-10-08 with reprex v2.1.1
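
The same positional idea can be sketched without hard-coding the offset of 3. The version below assumes the frame contains nothing but the complete columns followed by an equally sized, identically ordered block of "_A" columns, and that NAs occur only in that second block; the two stopifnot() guards are additions for illustration, not part of the answer above.

```r
myfile <- data.frame(C1 = c(1, 3, 4, 5),
                     C2 = c(5, 4, 6, 7),
                     C3 = c(0, 1, 3, 2),
                     C1_A = c(NA, NA, 1, 2),
                     C2_A = c(NA, 9, 8, 7),
                     C3_A = c(NA, NA, NA, 1))

# Assumption: first half = complete columns, second half = matching "_A" columns
offset <- ncol(myfile) / 2
stopifnot(offset == floor(offset))

# Row/column positions of every NA in the frame
na_idx <- which(is.na(myfile), arr.ind = TRUE)

# Assumption: NAs only occur in the second half, so each has a source column
stopifnot(all(na_idx[, "col"] > offset))

# Shift the column index left by one block to point at the source values
src_idx <- na_idx
src_idx[, "col"] <- src_idx[, "col"] - offset

# Patch the NA cells with the values from the matching source cells
myfile[na_idx] <- myfile[src_idx]
myfile
```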

1 Comment

It might be fragile to always assume that the order and offset of the columns are fixed; I tend to recommend using names instead. While not code-golf shorter, this is more robust to the order of columns, all in base R: Acols <- grep("_A$", names(myfile), value=TRUE); myfile[Acols] <- Map(function(a, b) { a[is.na(a)] <- b[is.na(a)]; a }, myfile[Acols], myfile[sub("_A$", "", Acols)])
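
The name-based matching described in this comment can also be spelled out as a small helper. This is a minimal sketch, not the commenter's code: `fill_from_base` and its `suffix` argument are hypothetical names, and the function assumes each suffixed column has a counterpart of the same type whose name is identical apart from the suffix.

```r
# Hypothetical helper: fill NAs in "<name><suffix>" columns from "<name>"
fill_from_base <- function(df, suffix = "_A") {
  # Columns whose names end with the suffix, and their expected sources
  a_cols <- names(df)[endsWith(names(df), suffix)]
  base_cols <- substr(a_cols, 1, nchar(a_cols) - nchar(suffix))

  # Fail early if a suffixed column has no matching source column
  missing_src <- setdiff(base_cols, names(df))
  if (length(missing_src) > 0) {
    stop("No source column for: ", paste(missing_src, collapse = ", "))
  }

  # Patch each suffixed column row by row where it is NA
  for (i in seq_along(a_cols)) {
    na_rows <- is.na(df[[a_cols[i]]])
    df[[a_cols[i]]][na_rows] <- df[[base_cols[i]]][na_rows]
  }
  df
}

myfile <- data.frame(C1 = c(1, 3, 4, 5),
                     C2 = c(5, 4, 6, 7),
                     C3 = c(0, 1, 3, 2),
                     C1_A = c(NA, NA, 1, 2),
                     C2_A = c(NA, 9, 8, 7),
                     C3_A = c(NA, NA, NA, 1))

# Column order does not matter because matching is done by name
shuffled <- myfile[c("C3_A", "C1", "C2_A", "C3", "C1_A", "C2")]
result <- fill_from_base(shuffled)
result
```

Because the lookup goes through names, extra unrelated columns or a different column order do not affect the result, which is the robustness the comment argues for.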