I'm having trouble outputting the results of a for loop to a list/vector in R. The loop is running over a df structured as below, where each unique ID is represented by 1 to n rows:

id <- c(1, 2, 2, 2, 3, 4, 5, 6)
string <-c("apple", "grape", "orange", "blueberry", "plum", "tomato", "pear", "plum")
df <- data.frame(id, string)

For each unique ID, I want to write a list collapsing the n rows into a single row containing a concatenated character string based on the information in column "string". So I have:

#write a function to concatenate strings where d = dataframe, n = column name, and s = character to act as separator
concat <- function(d, n, s) {
   list_value = paste0(d[[n]], sep = s)
   return(list_value)
}

#create two empty lists
string_list <- list()
item_list <- list()

#loop the concatenate function over each unique id in the df
for (i in unique(df$id)) {

   item <- filter(df, id == i)
   print(item)
   item_list[i] <- item

   strings <- concat(item, "string", ";") 
   print(strings)
   string_list[i] <- strings

   }

I can see from the print statements that the loop is running "correctly" (I'm getting the output I want printed to the console) but I get warnings that "number of items to replace is not a multiple of replacement length" and string_list and item_list are impossibly large objects (a df of ~2000 rows becomes a list of ~10M elements).

If at the beginning of the loop I instead say:

 for (i in 1:length(df$id))

I get a list that is the same length as the number of rows in the original df; but it's empty (it returns integer [0] or character [1] for all). There are no NAs in the original df (checked with table(is.na(df$col_name)) for all columns). Same warnings.

Using string_list <- c() instead of string_list <- list() does not seem to help.

I'm missing something simple. What is it? Thanks

EDIT: I think I see part of the problem. The object "item" is a (small) df, and appending a series of dfs to a list would result in a large object. But replacing item_list <- list() with

item_data <- data.frame(Col1 = integer(), Col2 = character(), stringsAsFactors = FALSE)

gives the error "new columns would leave holes after existing columns".

  • Try converting your item_list[ and string_list[ assignments to [[. Commented Aug 31, 2021 at 12:56
  • What should be in the string_list and item_list in the end? Commented Aug 31, 2021 at 13:17
  • Do you mean item_list[[i]] <- item and string_list[[i]] <- strings inside the loop? With for (i in unique(df$id)) {, that gives the same size 10M element lists. Interestingly, with for (i in 1:length(df$id)), the output now becomes large; two columns - an empty list of 0x3 and a df of 0x3 Commented Aug 31, 2021 at 13:19
  • @denisafonin the goal is for string_list to be ("apple;", "grape;", "orange;", "blueberry;", "plum;", "tomato;", "pear;", "plum;") and item_list to be (1, 2, 3, 4, 5, 6) so that they can be rejoined df2 <- data.frame(item_list, string_list), collapsing the n rows into a single row. I can't just drop duplicates on the original df because I need the information in the other rows. Commented Aug 31, 2021 at 13:24

1 Answer

Would this achieve the result you are looking for?

library(dplyr)

id <- c(1, 2, 2, 2, 3, 4, 5, 6)
string <-c("apple", "grape", "orange", "blueberry", "plum", "tomato", "pear", "plum")
df <- data.frame(id, string)

df2 <- df %>%
  group_by(id) %>%
  summarize(string = paste0(string, collapse = '; '))

Output:

> df2
# A tibble: 6 x 2
     id string                  
  <dbl> <chr>                   
1     1 apple                   
2     2 grape; orange; blueberry
3     3 plum                    
4     4 tomato                  
5     5 pear                    
6     6 plum 
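
If you would rather keep the explicit loop, here is a minimal base-R sketch of one way to repair it. It rests on two assumptions about the original code rather than confirmed facts. First, indexing by the id value itself (string_list[i]) probably makes the list grow to the largest id, which would explain the huge objects if your real ids are large numbers. Second, paste0(..., sep = s) appends the separator to every element instead of joining them, so each group still returns several strings; that is the likely source of the "not a multiple of replacement length" warning. The names ids, string_vec, and k are introduced here for illustration only.

```r
# Same toy data as in the question
id <- c(1, 2, 2, 2, 3, 4, 5, 6)
string <- c("apple", "grape", "orange", "blueberry", "plum", "tomato", "pear", "plum")
df <- data.frame(id, string)

ids <- unique(df$id)

# Pre-allocate one slot per unique id, indexed by position (1..n),
# not by the id value itself
string_vec <- character(length(ids))

for (k in seq_along(ids)) {
  # Base-R row subsetting, so dplyr::filter() is not needed here
  item <- df[df$id == ids[k], ]
  # collapse = joins all strings of the group into ONE string,
  # so exactly one value is assigned per slot
  string_vec[k] <- paste(item$string, collapse = ";")
}

df2 <- data.frame(id = ids, string = string_vec)
print(df2)
```

This should give one row per id, with the strings for id 2 joined into a single value. The group_by/summarize version above is shorter and is probably the better choice on a larger data frame; the loop is only meant to show where the original went wrong.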