3

My dataframe is as follows in which gender=="1" refers to men and gender=="2" refers to women, Occupations go from A to U and year goes from 2010 to 2018 (I give you a small example)

Gender   Occupation    Year
1            A         2010
1            A         2010
2            A         2010
1            B         2010
2            B         2010
1            A         2011
2            A         2011
1            C         2011
2            C         2011

I want an output that sums the number of rows in which gender and year and occupation is distinct like you can see next:

Year | Occupation | Men | Woman
2010 |      A     |  2  |   1
2010 |      B     |  1  |   1
2011 |      A     |  1  |   1
2011 |      C     |  1  |   1

I have tried the following:

Nr_gender_occupation <- data %>%
   group_by(year, occupation) %>%
   summarise(
      Men = aggregate(gender=="1" ~ occupation, FUN= count),
      Women = aggregate(gender=="2" ~ occupation, FUN=count)
)
2
  • Just curious (for the sake of assumptions and possible data loss), are you concerned with non-binary gender values? Commented Apr 30, 2021 at 18:03
  • No, I only have binary gender values. Thank you! Commented Apr 30, 2021 at 18:06

3 Answers 3

3

You can also do a count within your groups:

library(dplyr)

df %>% 
  group_by(Occupation, Year) %>% 
  summarize(Men = sum(Gender == 1),
            Woman = sum(Gender == 2), .groups = "drop")

Output

  Occupation  Year   Men Woman
  <chr>      <dbl> <int> <int>
1 A           2010     2     1
2 A           2011     1     1
3 B           2010     1     1
4 C           2011     1     1
Sign up to request clarification or add additional context in comments.

Comments

3

We could use the index in 'Gender' to change the values, then with pivot_wider from tidyr reshape the data into 'wide' format

library(dplyr)
library(tidyr)
data %>%
 mutate(Gender = c("Male", "Female")[Gender]) %>%
 pivot_wider(names_from = Gender, values_from = Gender, values_fn = length)

-output

# A tibble: 4 x 4
#  Occupation  Year  Male Female
#  <chr>      <int> <int>  <int>
#1 A           2010     2      1
#2 B           2010     1      1
#3 A           2011     1      1
#4 C           2011     1      1

Or use table with unnest

library(tidyr)
data %>%
   group_by(Year, Occupation) %>%
   summarise(out = list(table(Gender)), .groups = 'drop') %>%     
   unnest_wider(out)

Or we can use count with pivot_wider

data %>%
  count(Gender, Occupation, Year) %>%
  pivot_wider(names_from = Gender, values_from = n)

data

data <- structure(list(Gender = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), 
    Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"
    ), Year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 
    2011L, 2011L)), class = "data.frame", row.names = c(NA, -9L
))

Comments

1

A data.table option using dcast

dcast(setDT(df), Year + Occupation ~ c("Men", "Woman")[Gender])

gives

   Year Occupation Men Woman
1: 2010          A   2     1
2: 2010          B   1     1
3: 2011          A   1     1
4: 2011          C   1     1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.