I have the following tibble in R:

df <- tibble(desc=c("test1", "test2", "test3", "test4","test1"), code=c("X00.2", "Y10", "X20.234", "Z10", "Q23.2"))

I want to create a new dataframe as:

df <- tibble(desc=c("test1", "test1", "test2", "test3", "test3", "test3", "test3", "test4", "test1", "test1"), code=c("X00", "X00.2", "Y10", "X20", "X20.2", "X20.23", "X20.234", "Z10", "Q23", "Q23.2"))

How would I do this? I think it can be done with separate_rows (from tidyr, not dplyr) by manipulating the separator, but I'm not sure exactly how.

Thank you in advance.

  • Of course I misspecified exactly what I wanted. Editing the data frames. Commented Aug 29, 2020 at 15:46

1 Answer

Here is one way using tidyverse functions.

library(tidyverse)

df %>%
  #n is the number of rows each code expands to (characters after "." plus one)
  mutate(n = nchar(sub('.*\\.', '', code)) + 1, 
         #l is location of "."
         l = str_locate(code, '\\.')[, 1], 
         #replace NA with 1
         n = replace(n, is.na(l), 1),
         l = ifelse(is.na(l), nchar(code), l), 
         r = row_number()) %>%
  #Repeat each row n times
  uncount(n) %>%
  #For each original row (r), since desc values can repeat
  group_by(r) %>%
  #Create code value incrementing one character at a time
  mutate(code = map_chr(row_number(), ~substr(first(code), 1, l + .x - 1)), 
         #Remove "." which is present at the end of string
         code = sub('\\.$', '', code)) %>%
  ungroup %>%
  select(-l, -r)

This returns

# A tibble: 10 x 2
#   desc  code   
#   <chr> <chr>  
# 1 test1 X00    
# 2 test1 X00.2  
# 3 test2 Y10    
# 4 test3 X20    
# 5 test3 X20.2  
# 6 test3 X20.23 
# 7 test3 X20.234
# 8 test4 Z10    
# 9 test1 Q23    
#10 test1 Q23.2  
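
For comparison, the same expansion can be sketched in base R without the intermediate n and l columns. This is not the approach used in the answer above: expand_code is a hypothetical helper name, and the sketch assumes only the first "." in a code matters. It returns the part before the "." followed by one progressively longer prefix per character after it.

```r
# Hypothetical helper (not part of the answer above): expand one code into
# its base plus one prefix per character after the first ".".
expand_code <- function(code) {
  dot <- regexpr(".", code, fixed = TRUE)[1]  # position of first ".", or -1
  if (dot == -1) return(code)                 # no "." present: keep the code as-is
  base <- substr(code, 1, dot - 1)            # part before the "."
  n_after <- nchar(code) - dot                # number of characters after the "."
  if (n_after == 0) return(base)              # trailing "." only: just the base
  c(base, substring(code, 1, dot + seq_len(n_after)))
}

df <- data.frame(
  desc = c("test1", "test2", "test3", "test4", "test1"),
  code = c("X00.2", "Y10", "X20.234", "Z10", "Q23.2"),
  stringsAsFactors = FALSE
)

# Expand every code, then repeat each desc once per expanded piece.
pieces <- lapply(df$code, expand_code)
result <- data.frame(
  desc = rep(df$desc, lengths(pieces)),
  code = unlist(pieces),
  stringsAsFactors = FALSE
)
print(result)
```

Because desc is repeated by position rather than by grouping, duplicated desc values such as test1 need no extra row-number column here.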

5 Comments

Thanks Ronak! I would never have come up with your solution. I tried it on my problem and realized I hadn't specified what I wanted clearly enough. I updated the question if you have time to take a crack at it. I gave it a shot but was getting tripped up by the sub regex.
Check the updated answer, which works for the new data you provided.
I'm back. The iteration that builds the string gets confused when desc has repeated, non-unique values. I updated the example. I tried rowwise but that didn't work.
@yindalon You can add a row number column. See updated answer.
Thank you. I was messing around with the .id argument of uncount. The row_number is exactly what I needed. Thanks so much.
