1

I have an Excel sheet containing employment numbers of each industry of each county in the US.

It looks like this:

County   Industry  Employees
a        1         49
a        2         1
b        1         4
b        2         19
...

I want to calculate the Herfindahl-Hirschman index (HHI) of employment in each county. I'm using R. Given some numbers, calculating the HHI is easy:

hhi <- function(x) {
  # calculate sum
  total <- sum(x)
  
  # calculate share
  share <- x*100/total
  
  # add
  return(sum(share^2))
  
}

So, for example, county 1 has a HHI of 9608 (= 98^2 + 2^2) and county 2 has a HHI of 7127.

But how can I create a new column with the HHI of that county?

2 Answers 2

6

You can use dplyr:

library(dplyr)
df %>% group_by(County) %>% mutate(HHI = sum((Employees/sum(Employees) * 100)^2))

# Source: local data frame [4 x 4]
# Groups: County [2]

#   County Industry Employees      HHI
#   <fctr>    <int>     <int>    <dbl>
# 1      a        1        50 9615.532
# 2      a        2         1 9615.532
# 3      b        1         4 7126.654
# 4      b        2        19 7126.654

Or equivalently, use data.table:

setDT(df)[, HHI := sum((Employees/sum(Employees) * 100)^2), County][]

With your own customized function hhi, since all the functions it calls are vectorized, you can directly use it with mutate:

df %>% group_by(County) %>% mutate(HHI = hhi(Employees))

or:

setDT(df)[, HHI := hhi(Employees), County][]
Sign up to request clarification or add additional context in comments.

2 Comments

How do I ignore NA values using the first approach?
You can specify na.rm = T for the sum functions.
3

We can use ave from base R (no packages used)

df1$HHI <- with(df1, ave(Employees, County, FUN = hhi))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.