I have a data frame df that has two columns, Term and Freq. I also have a list of terms with given IDs stored in a vector called indices. To illustrate both, here is what they look like:

> head(indices)
   Term
1    hello
256  i
33   the

Also, for the data frame.

> head(df)
   Term  Freq
1  i     24
2  hello 12
3  the   28

I want to add a column to df called TermID, which will just be the index of the term in the vector indices. I have tried using dplyr::mutate, but to no avail. Here is my code:

library(dplyr)

whichindex <- function(term) {
  ind <- which(indices == as.character(term))
  ind
}

mutate(df, TermID = whichindex(Term))

What I am getting as output is a df that has a new column called TermID, but all the values for TermID are the same.

Can someone help me figure out what I am doing wrong? It would also be nice if you could recommend a more efficient way to do this in R. I have implemented this in Python and did not encounter such issues.

Thanks in advance.

  • Why not just merge (from base) or join? Commented Apr 24, 2015 at 3:49
  • Also, can you post the output of dput(head(indices)) and dput(head(df)) so that there is no ambiguity about what data structures you are working with. Commented Apr 24, 2015 at 3:51
  • Thanks, Ananda. I was actually looking for a faster algo since I am handling a few hundred thousand words. Both df and indices have class = "data.frame". However, I noticed that each element of indices under the Term column is of class = "factor". Commented Apr 24, 2015 at 4:07
  • 1
    df$TermID <- match(df$Term,indices$Term) will do it, and will take milliseconds on a million cases by my testing. Commented Apr 24, 2015 at 5:22
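
For reference, the match() suggestion in the last comment can be sketched as follows. This is only an illustration: the sample data is modelled on the head() output in the question, indices is assumed to be a data frame with a factor column Term (as the asker reports), and the resulting TermID is the position of the term within indices$Term, not the row name.

```r
# Sample data modelled on the question (assumed, not the asker's real data).
# stringsAsFactors = TRUE mimics the factor column mentioned in the comments.
indices <- data.frame(Term = c("hello", "i", "the"), stringsAsFactors = TRUE)
df <- data.frame(Term = c("i", "hello", "the"),
                 Freq = c(24, 12, 28),
                 stringsAsFactors = FALSE)

# match() is vectorised: for every element of df$Term it returns the position
# of the first exact match in indices$Term, or NA if the term is absent.
# as.character() makes the factor-to-character conversion explicit.
df$TermID <- match(df$Term, as.character(indices$Term))
df
```

Because match() handles the whole column in one call, no per-row function or rowwise() step is needed.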

1 Answer

What about this?

df %>% rowwise() %>% mutate(TermID = grep(Term,indices))

With example data:

library(dplyr)
indices <- c("hello","i","the")
df <- data_frame(Term = c("i","hello","the"), Freq = c(24,12,28))

df_res <- df %>% rowwise() %>% mutate(TermID = grep(Term,indices))
df_res

gives:

Source: local data frame [3 x 3]
Groups: <by row>

   Term Freq TermID
1     i   24      2
2 hello   12      1
3   the   28      3
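
One caveat worth keeping in mind with this approach: grep() treats its first argument as a regular expression, so it finds terms that contain the pattern rather than terms that equal it. The sketch below uses a hypothetical fourth term, "this", which is not in the original data, to show the difference from an exact lookup with match().

```r
# "this" is a hypothetical extra term added only to illustrate partial matching.
indices <- c("hello", "i", "the", "this")

# Regex search: every element containing "i" is returned.
regex_hits <- grep("i", indices)

# Exact lookup: position of the first element equal to "i".
exact_hit <- match("i", indices)

regex_hits
exact_hit
```

With a large vocabulary, a pattern that hits more than one term would give more than one value for a single row, so an exact lookup is likely the safer choice.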

3 Comments

I implemented the suggestion, and I didn't get any error. However, the resulting df remains unchanged (no additional TermID column). It must be because of the data structure. Let me check again to find some answers.
@EFL Regarding "df remains unchanged": you have to assign the output to a variable, as with df_res in the example above. Does this help answer your question? If not, feel free to post your own answer and accept that instead.
What's the use of rowwise here? I tried without rowwise and it worked.
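
On the rowwise() question, a rough base R sketch of what seems to be going on, using the example data from the answer: grep() only uses the first element of a pattern vector (R issues a warning about this, suppressed here), so passing the whole Term column at once looks up just the first term. Calling grep() once per term, which is what rowwise() arranges inside mutate(), gives one result per row. The variable names below are made up for the illustration.

```r
indices <- c("hello", "i", "the")
terms <- c("i", "hello", "the")   # the Term column from the answer's example

# Whole column passed as the pattern: only terms[1] ("i") is searched for.
all_at_once <- suppressWarnings(grep(terms, indices))

# One grep() call per term, mimicking the effect of rowwise().
one_by_one <- vapply(terms,
                     function(term) grep(term, indices),
                     integer(1),
                     USE.NAMES = FALSE)

all_at_once
one_by_one
```

If the single value from the first call is recycled down the column, every row ends up with the same TermID, which would match the symptom described in the question.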
