I have a data frame `df` with two columns, Term and Freq. I also have a list of terms with given IDs stored in a vector called `indices`. To illustrate, here is what the two look like:
    > head(indices)
         Term
    1   hello
    256     i
    33    the
And here is the data frame:
    > head(df)
       Term Freq
    1     i   24
    2 hello   12
    3   the   28
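Because the thread later turns on what kind of objects these are, here is a minimal reproducible sketch of the sample data. It is an assumption reconstructed from the printed output: `indices` is taken to be a one-column data frame whose Term column is a factor (as reported in the comments), and the numbers 1, 256 and 33 are treated as its row names.

```r
# Hypothetical reconstruction of the two objects from the printed head()
# output; the real data has more rows. Term in indices is stored as a
# factor, as reported in the comments, and the row names mirror the IDs
# printed on the left (1, 256, 33).
indices <- data.frame(
  Term = factor(c("hello", "i", "the")),
  row.names = c("1", "256", "33")
)

df <- data.frame(
  Term = c("i", "hello", "the"),
  Freq = c(24, 12, 28),
  stringsAsFactors = FALSE
)

# dput() prints a copy-pasteable definition of each object, which removes
# any doubt about column classes (factor vs. character).
dput(indices)
dput(df)
```

Posting `dput()` output like this is what lets others run the question's code on exactly the same structures.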
I want to add a column to `df` called TermID that holds the index of each term in `indices`. I have tried using `dplyr::mutate`, but to no avail. Here is my code:
    library(dplyr)

    # intended: return the position of `term` in indices
    whichindex <- function(term) {
        ind <- which(indices == as.character(term))
        ind
    }

    mutate(df, TermID = whichindex(Term))
What I get as output is `df` with a new column called TermID, but every value in TermID is the same.
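A minimal sketch of why this happens, assuming `indices` is a one-column data frame with a factor Term column (as a comment further down reports). Inside `mutate()`, `Term` refers to the whole column, so `whichindex()` is called once on all terms rather than once per row; the comparison with the data frame then lines terms up element by element, and whatever `which()` returns is recycled by `mutate()`. The per-term helper `whichindex_one()` below is a hypothetical name used only for illustration.

```r
# Sample data (hypothetical reconstruction): indices is assumed to be a
# one-column data frame whose Term column is a factor.
indices <- data.frame(Term = factor(c("hello", "i", "the")))
df <- data.frame(Term = c("i", "hello", "the"), Freq = c(24, 12, 28),
                 stringsAsFactors = FALSE)

whichindex <- function(term) {
  ind <- which(indices == as.character(term))
  ind
}

# Inside mutate(df, TermID = whichindex(Term)), Term is the whole column,
# so whichindex() runs once on all three terms. `indices == ...` compares
# row k of indices with term k, giving a one-column logical matrix, and
# which() returns the positions of the TRUE cells. With this sample data
# only row 3 ("the" == "the") matches, so the result has length 1 and
# mutate() recycles it to every row.
whichindex(df$Term)

# Hypothetical per-term version: compare against the Term column itself
# and apply the function to one term at a time. [1] keeps the first hit
# and turns "no match" into NA.
whichindex_one <- function(term) {
  which(as.character(indices$Term) == term)[1]
}
df$TermID <- vapply(df$Term, whichindex_one, integer(1), USE.NAMES = FALSE)
df
```

The per-term loop gives the intended result but scans `indices` once per term, so it is a correctness illustration rather than the efficient approach.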
Can someone help me figure out what I am doing wrong? I would also appreciate a recommendation for a more efficient way to do this in R. I implemented the same thing in Python and did not run into this issue.
Thanks in advance.
Have you tried `merge` (from base) or `join`?

Please post the output of `dput(head(indices))` and `dput(head(df))` so that there is no ambiguity about what data structures you are working with.

Both `df` and `indices` have `class = "data.frame"`. However, I noticed that each element of `indices` under the `Term` column is of `class = "factor"`.

`df$TermID <- match(df$Term, indices$Term)` will do it, and will take milliseconds on a million cases by my testing.
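A sketch of the `match()` approach from the last comment, plus the `merge()` join also suggested above, run on a hypothetical reconstruction of the sample data. `match()` does the lookup for every term in a single vectorized call and converts factors to character itself, so the factor Term column is not a problem. Since the question is ambiguous about whether the ID is the position in `indices` or the row name shown by `head(indices)`, the sketch also derives the row name; the column name `TermRowName` and the `lookup` table are assumptions for illustration.

```r
# Sample data (hypothetical reconstruction of the printed head() output).
indices <- data.frame(
  Term = factor(c("hello", "i", "the")),
  row.names = c("1", "256", "33")
)
df <- data.frame(Term = c("i", "hello", "the"), Freq = c(24, 12, 28),
                 stringsAsFactors = FALSE)

# match() looks up every element of df$Term in indices$Term in one
# vectorized call and returns the position of the first match (NA when a
# term is missing). match() converts factors to character itself; the
# explicit as.character() only makes that visible.
df$TermID <- match(df$Term, as.character(indices$Term))

# If the wanted ID is the row name shown by head(indices) (1, 256, 33)
# rather than the position, use the positions to index the row names.
# TermRowName is a hypothetical column name.
df$TermRowName <- rownames(indices)[df$TermID]
df

# Join-based alternative with base merge(): build a lookup table and
# left-join on Term. merge() sorts the result by the key, so the row
# order can differ from df.
lookup <- data.frame(Term = as.character(indices$Term),
                     TermID = seq_len(nrow(indices)),
                     stringsAsFactors = FALSE)
merged <- merge(df[, c("Term", "Freq")], lookup, by = "Term", all.x = TRUE)
merged
```

`match()` keeps the original row order of `df`, which is usually what you want when adding a column; the `merge()` route is handier when the lookup table carries several columns to bring over at once.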