I am working with a data frame that comes from the database in the following form:

username    elements
username1   """interfaces"".""dual()"""
username1   """interfaces"".""f_capitalaccrualcurrentyear"""
username2   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username4   """interfaces"".""dnow_s_downtime_stat_with_lat_long"""
username3   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""

The frame has two columns, "username" and "elements". A user may have used one element or several in a single transaction; when there are several, they are separated by commas. I need the elements split out, one per row, with each row still tagged with its user name. The result should look like this:

username    elements
username1   """interfaces"".""dual()"""
username1   """interfaces"".""f_capitalaccrualcurrentyear"""
username2   """interfaces"".""dnow_completion""
username2   ""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion""
username2   ""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion""
username2   ""interfaces"".""dnow_s_daily_prod_ta"""
username4   """interfaces"".""dnow_s_downtime_stat_with_lat_long"""
username3   """interfaces"".""dnow_completion""
username3   ""interfaces"".""dnow_s_daily_prod_ta"""
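For anyone who wants to reproduce this, here is a minimal sketch of the sample data as an R data frame. It assumes both columns are plain character vectors (the real frame comes from the database and may differ); the name `n_elements` is only for illustration. Single quotes are used so the embedded double quotes need no escaping.

```r
# Hypothetical reconstruction of the sample shown above; the real data
# comes from the database and may use different column types.
input <- data.frame(
  username = c("username1", "username1", "username2", "username2",
               "username2", "username4", "username3"),
  elements = c(
    '"""interfaces"".""dual()"""',
    '"""interfaces"".""f_capitalaccrualcurrentyear"""',
    '"""interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""',
    '"""interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""',
    '"""interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""',
    '"""interfaces"".""dnow_s_downtime_stat_with_lat_long"""',
    '"""interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""'
  ),
  stringsAsFactors = FALSE
)

# Count the comma-separated pieces in each row; their sum is the number
# of rows the reshaped frame should have.
n_elements <- lengths(strsplit(input$elements, ",", fixed = TRUE))
sum(n_elements)
```

The sum should match the number of rows in the desired output above.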

I have been trying to iterate through the data frame, split the elements that have commas and then put them back together with the respective user name.

I have been trying the code below, but it is very inefficient. I am new to R, so I assume there is a more efficient way to do this.

interface.data <- data.frame(
    username = character(),
    elements = character()
)
for (row in seq_len(nrow(input))) { ## input is the frame that comes from the database
     myrowbrk <- as.character(input[row, "elements"])
     ## split on commas so that each element becomes its own value
     myrowelements <- strsplit(myrowbrk, ",", fixed = TRUE)[[1]]
     user <- as.character(input[row, "username"])
     interface.newdata <- data.frame(
         username = user,
         elements = myrowelements
     )
     ## accumulate into interface.data so that earlier rows are kept
     interface.data <- rbind(interface.data, interface.newdata)
}

output <- interface.data
  • Try tidyr::separate_rows(input, elements, sep = ",")? Commented Jan 3, 2020 at 0:14
  • Awesome! This solved it. Thank you! Commented Jan 3, 2020 at 13:39

1 Answer

You could use the tidyr package to do that. My solution uses two steps to get the data into the desired format: 1) separate the elements column into several columns at the comma character, and 2) change the format from wide to long.

library(tidyr)
library(dplyr)  # provides select(), used in the last step

#Separate the 'elements' column from your 'df' data frame using the comma character
#Set the new variable names as a sequence of 1 to the max number of expected columns
df2 <- separate(data = df, 
                   col = elements, 
                   into = as.character(seq(1,2,1)),
                   sep = ",")
#This code gives a warning because not every row has a string with a comma. 
#Empty entries are filled with NA

#Then change from wide to long format, dropping NA entries
#Drop the column that indicates the name of the column from which the elements entry was obtained (i.e., 1 or 2)
df2 <- df2 %>%
  pivot_longer(cols = "1":"2",
               values_to = "elements",
               values_drop_na = TRUE) %>%
  select(-name)
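
The two steps can also be collapsed into one with `tidyr::separate_rows()`, as suggested in the comment on the question. The sketch below uses a small stand-in data frame with made-up rows of the same shape as the question's data; `df_long` is just an illustrative name.

```r
library(tidyr)

# Small stand-in for the question's data (hypothetical rows, same shape).
df <- data.frame(
  username = c("username1", "username2"),
  elements = c(
    '"""interfaces"".""dual()"""',
    '"""interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""'
  ),
  stringsAsFactors = FALSE
)

# Split 'elements' at each comma and emit one row per piece;
# 'username' is repeated for every piece that came from the same row.
df_long <- separate_rows(df, elements, sep = ",")
df_long
```

This avoids having to know the maximum number of elements per row in advance. In tidyr 1.3.0 and later, `separate_rows()` is marked as superseded by `separate_longer_delim(df, elements, delim = ",")`, which should behave the same way for this case.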
1 Comment

Thank you for your response! Much appreciated!
