Let's say I have a pd.DataFrame() object that stores the number of people of a given age and gender who have had a stroke in the past. In a more visual way:

positive_by_gender.tail()

gives us:

gender  Female  Male
age
78         9.0  12.0
79        13.0   4.0
80        10.0   7.0
81         8.0   6.0
82         4.0   5.0

So there are 9 females aged 78 who had a stroke, 12 males aged 78 who had a stroke, etc.

What I want is to calculate, for each gender, the median age at which they had a stroke. In this example it would be 79.5 for females, but I want it to be calculated by code, not by me :-) I guess I could make an array that for females would look like [78 repeated 9 times, 79 repeated 13 times, 80 repeated 10 times, etc.] and then find the median that way, but I don't know how to do even that. I'd really appreciate any help.

  • Does this answer your question? Python: weighted median algorithm with pandas (commented Mar 17, 2021 at 19:15)
  • Just got to see what that was, and yes, it is helpful, thanks a lot, just like the full solution below. (commented Mar 17, 2021 at 20:06)

1 Answer

To follow your idea of creating an array and getting the median this way:

In [235]: df
Out[235]: 
     Female  Male
age              
78      9.0  12.0
79     13.0   4.0
80     10.0   7.0
81      8.0   6.0
82      4.0   5.0

In [236]: df = df.astype(int)

In [237]: df
Out[237]: 
     Female  Male
age              
78        9    12
79       13     4
80       10     7
81        8     6
82        4     5

In [238]: df = df.reset_index('age')

In [240]: df = df.melt(id_vars='age', var_name='gender', value_name='count')

In [241]: df
Out[241]: 
   age  gender  count
0   78  Female      9
1   79  Female     13
2   80  Female     10
3   81  Female      8
4   82  Female      4
5   78    Male     12
6   79    Male      4
7   80    Male      7
8   81    Male      6
9   82    Male      5

In [242]: df['age'] = df.apply(lambda s: [s['age']] * s['count'], axis=1)

In [243]: df
Out[243]: 
                                                 age  gender  count
0               [78, 78, 78, 78, 78, 78, 78, 78, 78]  Female      9
1  [79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 7...  Female     13
2           [80, 80, 80, 80, 80, 80, 80, 80, 80, 80]  Female     10
3                   [81, 81, 81, 81, 81, 81, 81, 81]  Female      8
4                                   [82, 82, 82, 82]  Female      4
5   [78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 78]    Male     12
6                                   [79, 79, 79, 79]    Male      4
7                       [80, 80, 80, 80, 80, 80, 80]    Male      7
8                           [81, 81, 81, 81, 81, 81]    Male      6
9                               [82, 82, 82, 82, 82]    Male      5

In [245]: df = df.explode('age')
In [249]: df['age'] = df['age'].astype(int)

In [251]: df
Out[251]: 
    age  gender  count
0    78  Female      9
0    78  Female      9
0    78  Female      9
0    78  Female      9
0    78  Female      9
..  ...     ...    ...
9    82    Male      5
9    82    Male      5
9    82    Male      5
9    82    Male      5
9    82    Male      5

[78 rows x 3 columns]

In [250]: df.groupby('gender')['age'].median()
Out[250]: 
gender
Female    79.5
Male      80.0
Name: age, dtype: float64
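
The melt/apply/explode steps above can also be collapsed into a single repeat per column. The following is a minimal sketch, not part of the original answer: the frame `counts` is rebuilt by hand from the numbers in the question, the helper name `median_age` is hypothetical, and it assumes missing counts should be treated as zero.

```python
import numpy as np
import pandas as pd

# Counts copied from the question: rows are ages, columns are genders.
counts = pd.DataFrame(
    {"Female": [9, 13, 10, 8, 4], "Male": [12, 4, 7, 6, 5]},
    index=pd.Index([78, 79, 80, 81, 82], name="age"),
)


def median_age(col: pd.Series) -> float:
    """Median of the ages in col.index, each repeated col[age] times."""
    # Assumption: a missing count means "no cases", so it becomes 0.
    repeats = col.fillna(0).to_numpy().astype(int)
    # Expand e.g. age 78 with count 9 into nine copies of 78.
    ages = np.repeat(col.index.to_numpy(), repeats)
    return float(np.median(ages))


# apply() passes each gender column (indexed by age) to the helper.
medians = counts.apply(median_age)
print(medians)
```

For the data in the question this should give the same medians as the groupby above (79.5 for Female, 80.0 for Male), without building list-valued cells first.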