1

I have another question. Thanks for everyone's help and patience with an R newbie!

How can I count how many times a string occurs in a column? Example:

MYdata <- data.frame(fruits = c("apples", "pears", "unknown_f", "unknown_f", "unknown_f"), 
                     veggies = c("beans", "carrots", "carrots", "unknown_v", "unknown_v"), 
                     sales = rnorm(5, 10000, 2500))

The problem is that my real data set contains several thousand rows, including several hundred of the unknown fruits and unknown veggies. I played around with table() and levels() but without much success. I guess it's more complicated than that. Ideally, I would like an output table listing the name of each unique fruit/veggie and how many times it occurs in its column. Any hint in the right direction would be much appreciated.

Thanks,

Marcus

2 Comments
  • In what way was table(MYdata$fruits) unsatisfactory? Commented Jun 11, 2012 at 6:24
  • Wow! I really have to apologize for this!! I spent half a day on this ... tried various iterations of table() ... but - I promise - never got anything useful. I guess I missed the forest for the trees. Thanks everyone for your helpful answers and comments! Marcus Commented Jun 12, 2012 at 3:08

3 Answers

10

If I understand your question, the function table() should work just fine. Here is how:

table(MYdata$fruits)

   apples     pears unknown_f 
        1         1         3 
table(MYdata$veggies)

    beans   carrots unknown_v 
        1         2         2 

Or use table inside lapply:

lapply(MYdata[1:2], table)
$fruits

   apples     pears unknown_f 
        1         1         3 

$veggies

    beans   carrots unknown_v 
        1         2         2 
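
With several thousand rows, it can help to sort the counts so the most frequent values come first, or to count one known string directly. The following is only a sketch: it rebuilds the example data without the random sales column, and the names sorted_counts and n_unknown_f are made up for illustration.

```r
# Deterministic copy of the example data (sales column omitted)
MYdata <- data.frame(
  fruits  = c("apples", "pears", "unknown_f", "unknown_f", "unknown_f"),
  veggies = c("beans", "carrots", "carrots", "unknown_v", "unknown_v")
)

# Count every value in each column, most frequent first
sorted_counts <- lapply(MYdata, function(col) sort(table(col), decreasing = TRUE))
sorted_counts$fruits

# Count the occurrences of one specific string
n_unknown_f <- sum(MYdata$fruits == "unknown_f")
n_unknown_f
```

If the column may contain missing values, sum(..., na.rm = TRUE) avoids getting NA back from the second approach.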

1 Comment

Thank you! I am sorry it was so easy. See my comment above.
3

The following gives you a data frame of counts which you might find easier to use or may suit your purposes better:

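# Tabulate every column except the third (sales)
tabs <- lapply(MYdata[-3], table)
# Flatten to one named vector; names look like "fruits.apples"
counts <- unlist(tabs)
out <- data.frame(item = names(counts), count = counts,
                  stringsAsFactors = FALSE)
rownames(out) <- NULL  # drop the row names inherited from counts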

print(out)

               item count
1     fruits.apples     1
2      fruits.pears     1
3  fruits.unknown_f     3
4     veggies.beans     1
5   veggies.carrots     2
6 veggies.unknown_v     2
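
For a single column, a shorter route to a data frame of counts may be as.data.frame() applied to the table itself. This is a sketch of an alternative, not the approach above; the vector fruits and the name fruit_counts are illustrative only.

```r
fruits <- c("apples", "pears", "unknown_f", "unknown_f", "unknown_f")

# as.data.frame() on a table gives one row per distinct value;
# responseName renames the default "Freq" column
fruit_counts <- as.data.frame(table(fruits),
                              responseName = "count",
                              stringsAsFactors = FALSE)
print(fruit_counts)
```

The first column takes its name from the tabulated variable (here fruits), so the value and its count stay in separate columns rather than being fused into one label.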

2 Comments

This solution is especially nice since the output is a data frame and not a table, which may be a pain to manipulate later.
I have been looking for code that does just this for a long time!! Thank you, thank you!!!!
1

Maybe something like

summary(factor(MYdata$fruits))
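
Note that summary() reports per-value counts only for factors. On a plain character vector, which is what data.frame() produces by default since R 4.0.0, it reports length, class and mode instead. A small sketch of the difference, using an illustrative vector rather than the original data frame:

```r
fruits <- c("apples", "pears", "unknown_f", "unknown_f", "unknown_f")

# Character vector: no counts, just Length / Class / Mode
char_summary <- summary(fruits)
char_summary

# Factor: one count per level
fruit_counts <- summary(factor(fruits))
fruit_counts
```

For factors with many levels, summary() collapses the least frequent ones into "(Other)" once the number of levels exceeds its maxsum argument, so table() is usually the safer choice for a full listing.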

1 Comment

thank you! As a newbie I was not really aware of the summary() command.
