R apply a user-defined function to all rows of a dataframe

Question

I am struggling to loop through the rows of a column in a dataframe and use the current row to define the arguments passed to a function. Here is the sample dataframe:

df <- 
structure(list(child = c("A268", "A268497", "A268497BOX", "A268497BOX2", 
"A268497BOX218", "A277", "A277A79", "A277A79091", "A277A790911", 
"A277A79091144", "A492", "A492586", "A492586BOX", "A492586BOX1", 
"A492586BOX144", "A492A69", "A492A69027", "A492A690271", "A492A69027144", 
"A492A6902715K", "A492A6902719Y", "A492A690271BH", "A492A690271BI", 
"A492A690271CQ", "A492A690271CS", "A492A690271CT", "A492A690271CU", 
"A492A690271CV", "A492A690271CW", "A492A690271CX", "A492A690271CY", 
"A492A690271DA", "A492A69028", "A492A690281", "A492A69028144", 
"A492A69402", "A492A694021", "A492A69402144", "A492A70", "A492A70033", 
"A492A700331", "A492A70033144", "A492A700332", "A492A70033244", 
"A492A70034", "A492A700341", "A492A70034144", "A492A70035", "A492A700351", 
"A492A70035144"), clvl = c(2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 
4, 5, 6, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 
5, 6, 4, 5, 6, 3, 4, 5, 6, 5, 6, 4, 5, 6, 4, 5, 6), parent = c("A", 
"A268", "A268497", "A268497BOX", "A268497BOX2", "A", "A277", 
"A277A79", "A277A79091", "A277A790911", "A", "A492", "A492586", 
"A492586BOX", "A492586BOX1", "A492", "A492A69", "A492A69027", 
"A492A690271", "A492A690271", "A492A690271", "A492A690271", "A492A690271", 
"A492A690271", "A492A690271", "A492A690271", "A492A690271", "A492A690271", 
"A492A690271", "A492A690271", "A492A690271", "A492A690271", "A492A69", 
"A492A69028", "A492A690281", "A492A69", "A492A69402", "A492A694021", 
"A492", "A492A70", "A492A70033", "A492A700331", "A492A70033", 
"A492A700332", "A492A70", "A492A70034", "A492A700341", "A492A70", 
"A492A70035", "A492A700351"), plvl = c(1, 2, 3, 4, 5, 1, 2, 3, 
4, 5, 1, 2, 3, 4, 5, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 5, 5, 3, 4, 5, 3, 4, 5, 2, 3, 4, 5, 4, 5, 3, 4, 5, 3, 4, 5
)), row.names = c(NA, 50L), class = "data.frame")

My goal is to generate this:

I tried to do this with a loop, using different versions of the apply function inside the loop, but I could not get it right. On each iteration, x and y must be the child and pathString values of the current row. Is there a clean and easy way to do this?

# fails: apply() passes each row as a single character vector x, and y is never supplied
df[] <- apply(df,1,function(x,y) sub(x,y,x))
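For rows that can be processed independently, a common base-R way to feed two columns of the current row into a two-argument function is mapply(), which walks the columns in parallel. A minimal sketch; `toy` and `my_fun` are hypothetical names, not from the question. It does not carry results from one row to the next, which is what the answers below handle.

```r
# Hypothetical toy data: two columns that supply the two arguments
toy <- data.frame(child = c("A268", "A268497"),
                  pathString = c("A/268", "A268/497"),
                  stringsAsFactors = FALSE)

# A user-defined function of two scalars (hypothetical)
my_fun <- function(x, y) paste0(x, " -> ", y)

# mapply() calls my_fun(child[i], pathString[i]) for every row i
toy$out <- mapply(my_fun, toy$child, toy$pathString, USE.NAMES = FALSE)
print(toy)
```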

What is the logic used to create the pathString variable? – YOLO, Commented Jan 26, 2020 at 9:13

Ronak Shah · Accepted Answer · 2020-01-26 09:29:32Z

1

Assuming the number of characters in child (or pathString) keeps increasing within each group, as in the data shared, one way is to use purrr::accumulate, which feeds the previous output into the next step, and apply it by group.

library(dplyr)

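df %>%
  # start a new group whenever child gets shorter than in the previous row
  group_by(gr = cumsum(c(TRUE, diff(nchar(child)) < 0))) %>%
  # keep the last "/segment" of each pathString and prefix it with the previous result
  mutate(ans = purrr::accumulate(pathString, ~sub(".*(/.*)", paste0(.x, "\\1"), .y)))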

#   child         pathString        gr ans               
#   <chr>         <chr>          <int> <chr>             
# 1 A268          A/268              1 A/268             
# 2 A268497       A268/497           1 A/268/497         
# 3 A268497BOX    A268497/BOX        1 A/268/497/BOX     
# 4 A268497BOX2   A268497BOX/2       1 A/268/497/BOX/2   
# 5 A268497BOX218 A268497BOX2/18     1 A/268/497/BOX/2/18
# 6 A277          A/277              2 A/277             
# 7 A277A79       A277/A79           2 A/277/A79         
# 8 A277A79091    A277A79/091        2 A/277/A79/091     
# 9 A277A790911   A277A79091/1       2 A/277/A79/091/1   
#10 A277A79091144 A277A790911/44     2 A/277/A79/091/1/44

I kept the grouping column gr in the output to show how the groups are formed.
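To see how gr is built, here is a small sketch of the grouping expression on the first ten child values from the question's data. A new group starts whenever a child is shorter than the one before it, which is why the approach assumes the rows are ordered.

```r
# First ten child values from the question's sample data
child <- c("A268", "A268497", "A268497BOX", "A268497BOX2", "A268497BOX218",
           "A277", "A277A79", "A277A79091", "A277A790911", "A277A79091144")

# TRUE marks the first row and every row where nchar() drops; cumsum() turns that into group ids
gr <- cumsum(c(TRUE, diff(nchar(child)) < 0))
print(gr)
```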

We can implement the same logic in base R as well, using Reduce:

apply_fun <- function(x, y) sub(".*(/.*)", paste0(x, "\\1"), y)

# ave() runs Reduce(..., accumulate = TRUE) separately within each group
df$ans <- with(df, ave(pathString, cumsum(c(TRUE, diff(nchar(child)) < 0)),
                       FUN = function(x) Reduce(apply_fun, x, accumulate = TRUE)))
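A quick way to check what Reduce(accumulate = TRUE) does is to run it on a single group. This sketch uses the pathString values of group 1 from the dplyr output above; each step keeps the last "/segment" of the current value and prefixes it with the previous result.

```r
# The helper from the answer: keep the last "/segment" of y and prefix it with x
apply_fun <- function(x, y) sub(".*(/.*)", paste0(x, "\\1"), y)

# pathString values of group 1, as shown in the dplyr output
path_group1 <- c("A/268", "A268/497", "A268497/BOX", "A268497BOX/2", "A268497BOX2/18")

# accumulate = TRUE returns every intermediate result, one per row
ans <- Reduce(apply_fun, path_group1, accumulate = TRUE)
print(ans)
```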

edited Jan 26, 2020 at 9:29

answered Jan 26, 2020 at 9:24

Ronak Shah



5 Comments

Ibo Over a year ago

So df must be sorted? The real df has more than 35k rows; I will check your answer tomorrow and get back to you.

Ronak Shah Over a year ago

@Ibo Yes. This is what I came up with by looking at the expected output. No logic was shared on how to reach the output.

Ibo Over a year ago

I tried it with a more extensive data sample and it did not generate the right output. I went one step back and edited the data sample so that you have access to both the child and parent values along with their levels (not sure it helps). If you apply your answer, you will see that wherever gr resets at a level other than 2, the forward slashes are not created properly. In some cases it also adds segments from rows above, while we are only allowed to add forward slashes to the values. This is to create a path so that I can build a data.tree.

Ibo Over a year ago

This is the original post, which nobody answered. Maybe there is a better way to get to the final answer, but this is as far as I got: stackoverflow.com/questions/59870536/…

Ibo Over a year ago

I managed to find a solution, but I am sure there is a smarter way!

Ibo · Accepted Answer · 2020-01-27 00:56:41Z

0

I managed to get it done with the following code block, but the loop takes 75-80 seconds; I suspect there is a faster way:

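for (row in seq_len(nrow(df5))) {
  x <- df5$child[row]       # current child (read after earlier rows updated it)
  y <- df5$pathString[row]  # current pathString
  in_group <- df5$gr == df5$gr[row]

  # replace the child prefix with its slashed path in every row of the same group
  df5$pathString[in_group] <- sub(x, y, df5$pathString[in_group], fixed = TRUE)
  df5$child[in_group] <- sub(x, y, df5$child[in_group], fixed = TRUE)
}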

Note that gr is populated from the rows where clvl == 2:

library(zoo)
# label each level-2 row with its own child, then carry that label down to the rows below it
df4$gr <- ifelse(df4$clvl == 2, df4$child, NA)
df4$gr <- na.locf(df4$gr)
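The same carry-forward can be sketched in base R without zoo, assuming the first row has clvl == 2 (otherwise the first index would be 0). This is an alternative idiom, not the one used above; the vectors are a hypothetical excerpt.

```r
# Hypothetical excerpt of the child and clvl columns
child <- c("A268", "A268497", "A277", "A277A79", "A277A79091")
clvl  <- c(2, 3, 2, 3, 4)

# Each row takes the child of the most recent row with clvl == 2
is_top <- clvl == 2
gr <- child[is_top][cumsum(is_top)]
print(gr)
```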

And this is how df4 is made:

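library(sqldf)
# pathString = parent followed by child with every occurrence of parent replaced by "/"
df4 <- sqldf("SELECT *, parent || replace(child, parent, '/') AS pathString FROM df ORDER BY child")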
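Since the edited data carries a parent for every child, one possible way to avoid the per-row loop is to build each path from its parent's path, processing one level at a time so that each step is vectorised. This is a sketch of a different technique (parent lookup by level), not the method from either answer. It assumes every parent is a prefix of its child and that a parent without its own row (here "A") is a root used as-is; `rel` and `build_paths` are hypothetical names.

```r
# Hypothetical excerpt of the question's child/parent/clvl columns
rel <- data.frame(
  child  = c("A268", "A268497", "A268497BOX", "A492", "A492A69", "A492A69027"),
  parent = c("A", "A268", "A268497", "A", "A492", "A492A69"),
  clvl   = c(2, 3, 4, 2, 3, 4),
  stringsAsFactors = FALSE
)

build_paths <- function(child, parent, clvl) {
  path <- rep(NA_character_, length(child))
  # the part of child that follows its parent prefix
  suffix <- substring(child, nchar(parent) + 1)
  # lower levels first, so a parent's path exists before its children need it
  for (lvl in sort(unique(clvl))) {
    i <- which(clvl == lvl)
    p <- match(parent[i], child)  # row of each parent, NA if it has no row
    parent_path <- ifelse(is.na(p), parent[i], path[p])
    path[i] <- paste0(parent_path, "/", suffix[i])
  }
  path
}

rel$pathString <- build_paths(rel$child, rel$parent, rel$clvl)
print(rel)
```

Unlike the SQLite replace(), substring() only strips the parent as a prefix, so a repeated parent string later in child is left untouched.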

answered Jan 27, 2020 at 0:56

Ibo

