I have six different dataframes, each holding the cosine similarity between a set of documents. The cosine similarity is already calculated; I just need to pull the right variable out of each of the six and save it. The code to do this looks like this:

# first I have to convert the model (which holds the cosine similarity) into a dataframe. The model is a "Formal class textstat_simil" from quanteda
y_V_2 <- as.data.frame(as.matrix(y_V_2))

# then I add the reference documents (the row names) as a column called "id"
y_V_2 <- cbind(id = rownames(y_V_2), y_V_2)

# because my variables of interest are columns ("DF", "EL", etc. are all political parties),
# I convert the dataframe from wide to long
y_V_2 <- y_V_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")

# lastly, I keep only the rows for the relevant party and save them in the final dataframe, cos_sim_V_2
cos_sim_V_2 <- y_V_2 %>% 
  filter(party == "V")

Not sure if this makes sense. The bottom line is that I do this with six different dataframes (each one represents a political party). The full code looks like this:

y_V_2 <- as.data.frame(as.matrix(y_V_2))
y_S_2 <- as.data.frame(as.matrix(y_S_2))
y_EL_2 <- as.data.frame(as.matrix(y_EL_2))
y_SF_2 <- as.data.frame(as.matrix(y_SF_2))
y_DF_2 <- as.data.frame(as.matrix(y_DF_2))
y_KF_2 <- as.data.frame(as.matrix(y_KF_2))
y_RV_2 <- as.data.frame(as.matrix(y_RV_2))


y_V_2 <- cbind(id = rownames(y_V_2), y_V_2)
y_S_2 <- cbind(id = rownames(y_S_2), y_S_2)
y_EL_2 <- cbind(id = rownames(y_EL_2), y_EL_2)
y_SF_2 <- cbind(id = rownames(y_SF_2), y_SF_2)
y_DF_2 <- cbind(id = rownames(y_DF_2), y_DF_2)
y_KF_2 <- cbind(id = rownames(y_KF_2), y_KF_2)
y_RV_2 <- cbind(id = rownames(y_RV_2), y_RV_2)

y_V_2 <- y_V_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_S_2 <- y_S_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_EL_2 <- y_EL_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_SF_2 <- y_SF_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_DF_2 <- y_DF_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_KF_2 <- y_KF_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_RV_2 <- y_RV_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")

cos_sim_V_2 <- y_V_2 %>% 
  filter(party == "V")
cos_sim_S_2 <- y_S_2 %>% 
  filter(party == "S")
cos_sim_EL_2 <- y_EL_2 %>% 
  filter(party == "EL")
cos_sim_SF_2 <- y_SF_2 %>% 
  filter(party == "SF")
cos_sim_DF_2 <- y_DF_2 %>% 
  filter(party == "DF")
cos_sim_KF_2 <- y_KF_2 %>% 
  filter(party == "KF")
cos_sim_RV_2 <- y_RV_2 %>% 
  filter(party == "RV")

NOW, what I actually want to do is the following: these six dataframes are for year "2" (hence the 2 at the end of each name). I actually have 22 years of interest, so I need to do this entire thing 22 times for 6 parties (for party 1: y_V_2, y_V_3, y_V_4, and so on). Is there any way I can loop through this?

I have tried the following:

time <- 2:22

for (i in time){
  
  y_V_[[i]] <- as.data.frame(as.matrix(y_V_[[i]]))
  
  
  
  y_V_[[i]] <- cbind(id = rownames(y_V_[[i]]), y_V_[[i]])
  
  
  y_V_[[i]] <- y_V_[[i]] %>% 
    pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
  
  y_V_[[i]] <- y_V_[[i]] %>% 
    filter(party == "V")
  
}

But it does not work. What is the correct way of doing this?

If it helps, this is the structure of the dataframe once I convert the "Formal class textstat_simil" object to a dataframe with y_V_2 <- as.data.frame(as.matrix(y_V_2)):

dput(head(y_V_2))

structure(list(DF = c(0.23499916674957, 0.16697708727056, 0.26998882552819, 
0.11989777626359, 0.28145930377199, 0.15959668959184), EL = c(0.23595981215221, 
0.18359709428329, 0.28810481269376, 0.13263861987521, 0.25331537435773, 
0.18167733395369), KF = c(0.20936950007655, 0.18252467175417, 
0.26042704505428, 0.14266913827392, 0.20023284784432, 0.18992935664409
), RV = c(0.2046697473122, 0.24951432279883, 0.24766480258903, 
0.11242986749057, 0.23958714529124, 0.16084468614859), S = c(0.24270069472492, 
0.18741729570808, 0.29014329186024, 0.14733535217516, 0.27150818619494, 
0.18979023415197), SF = c(0.23561869890038, 0.17927679461636, 
0.29403349472473, 0.15269893065285, 0.2559026802251, 0.17742356519735
), V = c(0.31302795687125, 0.2765158096593, 0.41588664999413, 
0.21090507950169, 0.34787076982177, 0.2583219375177)), row.names = c("Anders Fogh Rasmussen", 
"Anders Mølgaard", "Birthe Rønn Hornbech", "Bodil Thrane", "Charlotte Antonsen", 
"Christian Mejdahl"), class = "data.frame")

Extra question (not absolutely necessary): can I combine looping through the 22 years with looping through the different parties, so that I only have to write the original lines of code once? The loop would then run over the parties (V, S, EL, SF, DF, KF, RV) as well as the 22 years for each party.

  • Put your data.frames into a list when you create them! It's very easy to iterate over a list. Commented Dec 2, 2021 at 13:49
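
A minimal sketch of what this comment suggests, assuming the similarity objects for one year can be stored in a named list when they are created. The toy matrices and the names sims_year_2 and cos_sim_year_2 are hypothetical stand-ins, not objects from the question.

```r
library(dplyr)
library(tidyr)

parties <- c("DF", "EL", "KF", "RV", "S", "SF", "V")

# Hypothetical toy matrices, one per reference party, standing in for the
# real quanteda similarity objects of a single year
set.seed(42)
sims_year_2 <- lapply(parties, function(p) {
  matrix(runif(2 * length(parties)), nrow = 2,
         dimnames = list(c("doc_a", "doc_b"), parties))
})
names(sims_year_2) <- parties

# Apply the same four steps to every element of the list
cos_sim_year_2 <- lapply(names(sims_year_2), function(p) {
  df <- as.data.frame(as.matrix(sims_year_2[[p]]))
  df <- cbind(id = rownames(df), df)
  df %>%
    pivot_longer(-id, names_to = "party", values_to = "cos_sim") %>%
    filter(party == p)
})
names(cos_sim_year_2) <- parties
```

With this layout, cos_sim_year_2$V plays the role of cos_sim_V_2, and no object names have to be pasted together.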

1 Answer

You can use get(object_name) to fetch an object by its name:

for (i in time) {
  df <- get(paste0("y_V_", i))
}

This fetches the dataframe y_V_{i}, where i is the time index. You can loop over the party letters as well (letter_vector being a character vector of the party codes):

for (i in time) {
  for (l in letter_vector) {
    df <- get(paste0("y_", l, "_", i))
  }
}

This assigns y_{l}_{i} to df, provided all of those objects exist; making sure they do is up to you.


Edit: use assign to write to a pasted object name:

for (i in time) {
  for (l in letter_vector) {
    df <- get(paste0("y_", l, "_", i))
    assign(paste0("df_", l, "_", i), df)
  }
}
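
As a rough sketch of how the get/assign pattern could be combined with the four steps from the question: the helper extract_cos_sim, the shortened years vector, and the toy matrices below are hypothetical and only serve to make the example runnable on its own.

```r
library(dplyr)
library(tidyr)

parties <- c("DF", "EL", "KF", "RV", "S", "SF", "V")
years <- 2:3  # shortened for illustration

# Hypothetical toy inputs named y_{party}_{year}, standing in for the real objects
set.seed(1)
for (p in parties) {
  for (i in years) {
    m <- matrix(runif(3 * length(parties)), nrow = 3,
                dimnames = list(c("doc_a", "doc_b", "doc_c"), parties))
    assign(paste0("y_", p, "_", i), m)
  }
}

# Hypothetical helper: the four steps from the question for a single object
extract_cos_sim <- function(x, party_name) {
  df <- as.data.frame(as.matrix(x))
  df <- cbind(id = rownames(df), df)
  df %>%
    pivot_longer(-id, names_to = "party", values_to = "cos_sim") %>%
    filter(party == party_name)
}

# Loop over years and parties, fetching by name and writing by name
for (i in years) {
  for (p in parties) {
    result <- extract_cos_sim(get(paste0("y_", p, "_", i)), p)
    assign(paste0("cos_sim_", p, "_", i), result)
  }
}
```

After the loop, objects such as cos_sim_V_2 and cos_sim_RV_3 exist, each holding only the rows for its own party.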

Second edit. You can write the dataframes to a list:

# first initialize the list
list_with_dfs <- list()

for (i in time) {
  for (l in letter_vector) {
    df <- get(paste0("y_", l, "_", i))
    assign(paste0("df_", l, "_", i), df)

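    # Then append to the list by fetching the assigned object again
    list_with_dfs[[length(list_with_dfs) + 1]] <- get(paste0("df_", l, "_", i))

    # Or, equivalently, append the temporary df directly. Use only one of the
    # two, otherwise every dataframe ends up in the list twice:
    # list_with_dfs[[length(list_with_dfs) + 1]] <- df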
  }
}
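
If the per-party, per-year dataframes already exist as separate objects, one possible way to collect them and stack them into a single long dataframe is base R's mget together with dplyr's bind_rows. This is only a sketch: the toy dataframes, the shortened years vector, and the source and year columns are assumptions made for illustration.

```r
library(dplyr)

parties <- c("DF", "EL", "KF", "RV", "S", "SF", "V")
years <- 2:3  # shortened for illustration

# Hypothetical toy results standing in for the real cos_sim_{party}_{year} objects
for (i in years) {
  for (p in parties) {
    toy <- data.frame(id = c("doc_a", "doc_b"), party = p,
                      cos_sim = c(0.1, 0.2))
    assign(paste0("cos_sim_", p, "_", i), toy)
  }
}

# Build every object name, then fetch all objects at once into a named list
grid <- expand.grid(party = parties, year = years, stringsAsFactors = FALSE)
object_names <- paste0("cos_sim_", grid$party, "_", grid$year)
list_with_dfs <- mget(object_names)

# Stack the list into one long dataframe; .id stores the source object name
cos_sim_all <- bind_rows(list_with_dfs, .id = "source")

# Recover the year from the end of the object name
cos_sim_all$year <- as.integer(sub(".*_", "", cos_sim_all$source))
```

Because the list is named, the source column records which object each row came from, so the party and year stay identifiable after stacking.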

4 Comments

This can work I think. But how do I (in the end) save the 22*6 instances of df into its own unique df? so e.g. with the label cos_sim_l_i? I tried to end it with: paste0("cos_sim_", l, "_", i) <- df %>% filter(party == l) but it does not work. Essentially how do I save each iteration in its own unique dataframe, that has the character (from "letter_vector") and the index (from "time")?
use assign(pasted_object_name, data_to_write)
This is actually really good! I have one last follow-up question (sorry): it worked with assign, but I just realized that I now have around 100 dataframes. Do you know how I can save them into a list instead, so that in the end I can rbind them all and get one long dataframe? So far I have tried creating a list with listofdfs <- list() and then ending the loop with listofdfs[[j]] <- assign(paste0("cos_sim_", l, "_", i), df). However, it yields an error. (j is a vector of 1:132)
Sure! Just realize that assign relates values to a variable, where the variable name can be dynamic. So for that, you need to get the object again. You are on the right track, but just need to use get instead of assign. Or you could just use the "temporary" df as in the example. I'll edit the answer once more
