R: Add new column to dataframe using function

Question

I have a data frame df that has two columns, Term and Freq. I also have a list of terms with given IDs stored in a vector called indices. To illustrate both, I have the following:

> head(indices)
   Term
1    hello
256  i
33   the

And for the data frame:

> head(df)
   Term  Freq
1  i     24
2  hello 12
3  the   28

I want to add a column to df called TermID, which will just be the index of the term in the vector indices. I have tried using dplyr::mutate, but to no avail. Here is my code:

library(dplyr)

whichindex <- function(term) {
  ind <- which(indices == as.character(term))
  ind
}

mutate(df, TermID = whichindex(Term))

What I am getting as output is a df that has a new column called TermID, but all the values for TermID are the same.

Can someone help me figure out what I am doing wrong? It would be nice as well if you can recommend a more efficient algorithm to do this in [R]. I have implemented this in Python and I have not encountered such issues.

Thanks in advance.

Also, can you post the output of dput(head(indices)) and dput(head(df)) so that there is no ambiguity about what data structures you are working with.
– A5C1D2H2I1M1N2O1R2T1, Commented Apr 24, 2015 at 3:51
Thanks, Ananda. I was actually looking for a faster algo since I am handling a few hundred thousand words. Both df and indices have class = "data.frame". However, I noticed that each element of indices under the Term column is of class = "factor".
– EFL, Commented Apr 24, 2015 at 4:07
df$TermID <- match(df$Term,indices$Term) will do it, and will take milliseconds on a million cases by my testing.
– thelatemail, Commented Apr 24, 2015 at 5:22
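
The match() suggestion in the comment above can be sketched on the sample data from the question. This is only an illustration: it assumes indices is a one-column data frame whose Term column is a factor (as EFL's comment describes), and the as.character() calls are an addition for explicitness rather than part of the original one-liner.

```r
# Sample data mirroring the question; indices$Term is built as a factor
# because the asker reports that column has class "factor".
indices <- data.frame(Term = c("hello", "i", "the"), stringsAsFactors = TRUE)
df <- data.frame(Term = c("i", "hello", "the"), Freq = c(24, 12, 28))

# match() is vectorised: for every element of its first argument it returns
# the position of the first exact match in the second argument, or NA if the
# term is absent. as.character() only makes the factor-to-string conversion
# explicit; match() would convert factors itself.
df$TermID <- match(as.character(df$Term), as.character(indices$Term))
df
```

Because match() handles the whole column in one call, there is no need for a row-by-row helper such as whichindex().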

npjc · Accepted Answer · 2015-04-25 00:37:53Z

7

What about this?

df %>% rowwise() %>% mutate(TermID = grep(Term,indices))

With example data:

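library(dplyr)
indices <- c("hello", "i", "the")
# data_frame() is deprecated in newer dplyr/tibble releases; tibble() is its replacement
df <- data_frame(Term = c("i", "hello", "the"), Freq = c(24, 12, 28))

# rowwise() makes mutate() evaluate grep() once per row instead of once per column
df_res <- df %>% rowwise() %>% mutate(TermID = grep(Term, indices))
df_res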

gives:

Source: local data frame [3 x 3]
Groups: <by row>

   Term Freq TermID
1     i   24      2
2 hello   12      1
3   the   28      3

edited Apr 25, 2015 at 0:37

answered Apr 24, 2015 at 5:13

npjc


3 Comments

EFL Over a year ago

I implemented the suggestion, and I didn't get any error. However, the resulting df remains unchanged (no additional TermID column). It must be because of the data structure. Let me check again to find some answers.

npjc Over a year ago

@EFL df remains unchanged because you have to assign the output to a variable, as with df_res in the example above. Does this help answer your question? If not, feel free to post your own answer and accept that instead.

Catiger3331 Over a year ago

What's the use of rowwise here? I tried without rowwise and it worked.
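
On the rowwise() question, a small base-R sketch may help. It assumes the same example vectors as the answer; the names terms, whole_column and per_term are hypothetical and only used for illustration. grep() accepts a single pattern, so when it is handed a whole column it uses only the first element (with a warning), which is one way to end up with the same TermID in every row.

```r
indices <- c("hello", "i", "the")
terms <- c("i", "hello", "the")   # stands in for the df$Term column

# Whole column passed as the pattern: grep() uses only terms[1] ("i")
# and warns about the rest; the warning is suppressed here for brevity.
whole_column <- suppressWarnings(grep(terms, indices))
whole_column

# One grep() call per term, which is roughly what rowwise() arranges.
# [1] keeps the first hit so each term yields exactly one integer.
per_term <- vapply(terms, function(t) grep(t, indices)[1], integer(1),
                   USE.NAMES = FALSE)
per_term
```

Note that grep() does regular-expression partial matching, so a term such as "he" would match both "hello" and "the"; match() compares whole strings and avoids that ambiguity.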
