I have a dataset that is available at the URL below.
Loading it gives a DataFrame like this:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user', sep='|')
df.head()
user_id age gender occupation zip_code
1 24 M technician 85711
2 53 F other 94043
3 23 M writer 32067
4 24 M technician 43537
5 33 F other 15213
I want to find the ratio of males to females (M:F) in each occupation.
I used the code below, but it is not the most efficient approach.
df.groupby(['occupation', 'gender']).agg({'gender':'count'}).div(df.groupby('occupation').agg('count'), level='occupation')['gender']*100
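The same percentages can probably be obtained in one step with SeriesGroupBy.value_counts(normalize=True), which divides each occupation's gender counts by that occupation's size. This is a minimal sketch, not the asker's code. Because the real data is fetched from a URL, it uses a small hypothetical DataFrame with only the two relevant columns:

```python
import pandas as pd

# Hypothetical sample standing in for u.user (only the columns needed here).
df = pd.DataFrame({
    "occupation": ["programmer"] * 5 + ["farmer"] * 9,
    "gender": ["M", "M", "F", "F", "F"] + ["M"] * 7 + ["F"] * 2,
})

# normalize=True turns each occupation's gender counts into fractions of that
# occupation's total, which replaces the separate groupby(...).div(...) step.
pct = df.groupby("occupation")["gender"].value_counts(normalize=True) * 100
print(pct)
```

The result keeps the same long (occupation, gender) index as the output shown next.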
That gives a result like this (the percentage of each gender within each occupation):
occupation gender
administrator F 45.569620
M 54.430380
artist F 46.428571
M 53.571429
That is a very different format from what I want, which is something like this (demo values):
occupation M:F
programmer 2:3
farmer 7:2
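One way to get an M:F column like the one above is to count genders per occupation and then reduce each pair of counts to lowest terms with math.gcd. The sketch below assumes the hypothetical sample data shown in it (built so the counts come out as 2:3 and 7:2) and a hypothetical helper named ratio_str. It uses pd.crosstab for the counting:

```python
import math
import pandas as pd

# Hypothetical sample: 2 M / 3 F programmers, 7 M / 2 F farmers.
df = pd.DataFrame({
    "occupation": ["programmer"] * 5 + ["farmer"] * 9,
    "gender": ["M", "M", "F", "F", "F"] + ["M"] * 7 + ["F"] * 2,
})

# One row per occupation, one column per gender, cells are head counts.
# reindex guarantees both columns exist even if a gender never appears.
counts = pd.crosstab(df["occupation"], df["gender"]).reindex(
    columns=["M", "F"], fill_value=0
)

def ratio_str(m, f):
    """Hypothetical helper: format m:f in lowest terms, e.g. 14, 4 -> '7:2'."""
    m, f = int(m), int(f)
    g = math.gcd(m, f) or 1  # gcd(0, 0) is 0, so avoid dividing by zero
    return f"{m // g}:{f // g}"

result = counts.apply(lambda row: ratio_str(row["M"], row["F"]), axis=1).to_frame("M:F")
print(result)
```

On the real data the counts are usually not small numbers, so the reduced ratio may still be large (for example 38:1 style values are possible). A decimal M/F value can be easier to compare.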
Can somebody please tell me how to write my own aggregation function for this?
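In pandas, a custom aggregation can be any callable passed to .agg. For SeriesGroupBy, the callable receives one group's values as a Series and returns a scalar. A minimal sketch, assuming hypothetical sample data and a hypothetical function name mf_ratio:

```python
import math
import pandas as pd

# Hypothetical sample data (two columns of u.user).
df = pd.DataFrame({
    "occupation": ["programmer"] * 5 + ["farmer"] * 9,
    "gender": ["M", "M", "F", "F", "F"] + ["M"] * 7 + ["F"] * 2,
})

def mf_ratio(genders: pd.Series) -> str:
    """Custom aggregation: gets one occupation's 'gender' values, returns 'M:F'."""
    m = int((genders == "M").sum())
    f = int((genders == "F").sum())
    g = math.gcd(m, f) or 1  # guard against gcd(0, 0) == 0
    return f"{m // g}:{f // g}"

# .agg calls mf_ratio once per occupation group.
result = (
    df.groupby("occupation")["gender"]
    .agg(mf_ratio)
    .rename("M:F")
    .reset_index()
)
print(result)
```

A Python-level function like this is called once per group, so it is slower than built-in aggregations on large data. For a dataset the size of u.user that should not matter.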
You can add unstack at the end of your expression: (df.groupby(["occupation", "gender"]).agg({"gender": "count"}).div(df.groupby("occupation").agg("count"), level="occupation").unstack('gender')["gender"] * 100). But I don't understand how you get 2:3 and 7:2.
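The unstack idea from the comment can be sketched more directly with groupby(...).size(). Unstacking the gender level gives one column per gender. Row-wise percentages and a plain decimal M/F ratio then fall out of simple column arithmetic. The sample data below is hypothetical and was chosen so that the counts are 2:3 and 7:2:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "occupation": ["programmer"] * 5 + ["farmer"] * 9,
    "gender": ["M", "M", "F", "F", "F"] + ["M"] * 7 + ["F"] * 2,
})

# Count rows per (occupation, gender), then move gender into the columns.
counts = df.groupby(["occupation", "gender"]).size().unstack("gender", fill_value=0)

# Wide-format percentages: divide each row by its row total.
pct = counts.div(counts.sum(axis=1), axis=0) * 100

# Males per female as a number; 7:2 becomes 3.5 (inf if an occupation has no females).
ratio = counts["M"] / counts["F"]
print(pct)
print(ratio)
```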