Python Matplotlib - frequency table

Question

I have a table of data below: (the 1st column is date, the 2nd column is the daily return)

2020-01-02    0.022034
2020-01-03   -0.002666
2020-01-06    0.009716
2020-01-07    0.009838
2020-01-08   -0.011690
2020-01-09    0.025103
2020-01-10    0.009325
2020-01-13    0.028888
2020-01-14   -0.009183
2020-01-15    0.012292
2020-01-16   -0.005593
2020-01-17    0.020492
2020-01-20   -0.003878
2020-01-21   -0.032687
2020-01-22    0.034887
2020-01-23   -0.033485
2020-01-24    0.001934
2020-01-29   -0.026629
2020-01-30   -0.039513
2020-01-31   -0.001845
2020-02-03    0.021784
2020-02-04    0.033137
2020-02-05    0.000586
2020-02-06    0.016146
2020-02-07    0.000082
2020-02-10   -0.016997
2020-02-11    0.010172
2020-02-12    0.016836
2020-02-13    0.013530
                  ...
2022-01-31    0.031707
2022-02-04    0.028683
2022-02-07   -0.015853
2022-02-08   -0.024170
2022-02-09    0.045076
2022-02-10    0.013623
2022-02-11   -0.012259
2022-02-14   -0.023093
2022-02-15   -0.008984
2022-02-16    0.023177
2022-02-17    0.003182
2022-02-18   -0.054995
2022-02-21   -0.033302
2022-02-22   -0.028148
2022-02-23    0.012332
2022-02-24   -0.048095
2022-02-25   -0.004944
2022-02-28   -0.002682
2022-03-01    0.006940
2022-03-02    0.002542
2022-03-03   -0.006318
2022-03-04   -0.048641
2022-03-07   -0.050231
2022-03-08   -0.015469
2022-03-09    0.011477
2022-03-10   -0.002236
2022-03-11   -0.038740
2022-03-14   -0.115421
2022-03-15   -0.089573
2022-03-16    0.243084

I want to build a frequency table like below:

I think this might involve several steps: (1) categorise daily return data into different ranges (2) use value_counts() on the ranges (3) calculate the percentage on the ranges

For the first step, I think I can try pd.cut with a groupby. However, my dataframe doesnt have a header, and I tried portret_df.columns = ['Dates','Daily Return'] but could not manage to add the header. May I ask how can I add a header there so that I can refer to the 1st and 2nd columns? Much appreciated for your help.

better show small example data, and first calculate result manually for this data, and show this result. And then we can see what result you really expect. With big data and image which doesn't match to this data we can't confirm if we create correct solution. — furas
– furas, Commented Mar 24, 2022 at 20:25
if you have problem to add headers then show FULL error message. I have no problem with your portret_df.columns = ['Dates','Daily Return']. Maybe you get it as single column and now it has problem to add two headers to single column. How did you get/read data? — furas
– furas, Commented Mar 24, 2022 at 20:34
maybe first check what you have in portret_df.columns and how many columns it shows. You can always select column by number. — furas
– furas, Commented Mar 24, 2022 at 20:35

Scott Boston · Accepted Answer · 2022-03-24 21:49:34Z

2

Let's use pd.Series.value_counts with 'bins' parameter.

bins = [-np.inf,-.01,-.005,0,.005,.01,np.inf]
labels = ['ret < -1%', 
          '-1% < ret < -.5%', 
          '-.5% < ret < %0',
          ' %0 < ret < .5%',
          '.5% < ret < 1%',
          'reg > 1%']
df_counts = (df['ret'].value_counts(bins=bins, sort=False)
                      .rename('# of events').to_frame().set_axis(labels).T)
df_pcts = (df['ret'].value_counts(bins=bins, normalize=True, sort=False)
                    .rename('% of events').to_frame()
                    .set_axis(labels).T.mul(100).round(1))

pd.concat([df_counts, df_pcts])

Output:

             ret < -1%  -1% < ret < -.5%  -.5% < ret < %0   %0 < ret < .5%  .5% < ret < 1%  reg > 1%
# of events       20.0               4.0              6.0              5.0             4.0      20.0
% of events       33.9               6.8             10.2              8.5             6.8      33.9

edited Mar 24, 2022 at 21:49

answered Mar 24, 2022 at 21:45

Scott Boston

154k15 gold badges160 silver badges207 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

tdy Over a year ago

nice, never noticed value_counts had a bins param

Scott Boston Over a year ago

@tdy Yeah... It is one of those often looked over parameters. :)

silverpegasus Over a year ago

Thank you very much for your suggestion. For the code .rename('# of events').to_frame().set_axis(labels).T) It throw the error TypeError: set_axis() missing 1 required positional argument: 'labels . I wonder what exactly went wrong?

Scott Boston Over a year ago

Older version of pandas. add inplace=False and axis=0(I think zero is rows). Or upgrade pandas. I am away from keyboard. Maybe someelse can edit this solution and test.

furas · Accepted Answer · 2022-03-24 21:08:45Z

1

I have no idea why you have problem to change headers - maybe you read it as single column.

I have no problem to set headers when I load

df = pd.read_csv(..., names=['Date','Daily Return'])

or later

df.columns = ['Date','Daily Return']

And later I can use cut with bins=[min_val, -1, -0.5, 0, 0.5, 1, max_val]

min_val = df['Daily Return'].min() - 1
max_val = df['Daily Return'].max() + 1

regions = pd.cut(df['Daily Return'],
                 bins=[min_val, -1, -0.5, 0, 0.5, 1, max_val],
                 labels=['ret < -1(%)','-1 < ret < -0.5(%)', '-0.5 < ret < 0(%)','0 < ret < 0.5(%)','0.5 < ret < 1(%)','ret > 1(%)'],
                 )

And calculate number of events

count = regions.value_counts(sort=False)
print(count)

ret < -1(%)            0
-1 < ret < -0.5(%)     0
-0.5 < ret < 0(%)     30
0 < ret < 0.5(%)      29
0.5 < ret < 1(%)       0
ret > 1(%)             0

And use it to calculate percentage

size = len(regions)
percentage = (count/size) * 100
print(percentage)

ret < -1(%)            0.000000
-1 < ret < -0.5(%)     0.000000
-0.5 < ret < 0(%)     50.847458
0 < ret < 0.5(%)      49.152542
0.5 < ret < 1(%)       0.000000
ret > 1(%)             0.000000

Now it would need only to put all in DataFrame to format table

results = pd.DataFrame({'# of event': count, '% of event': percentage})

print(results.T.to_string())

            ret < -1(%)  -1 < ret < -0.5(%)  -0.5 < ret < 0(%)  0 < ret < 0.5(%)  0.5 < ret < 1(%)  ret > 1(%)
# of event          0.0                 0.0          30.000000         29.000000               0.0         0.0
% of event          0.0                 0.0          50.847458         49.152542               0.0         0.0

Full working code with example data readed with io.StringIO

but you should use own method to get data

text = '''2020-01-02    0.022034
2020-01-03   -0.002666
2020-01-06    0.009716
2020-01-07    0.009838
2020-01-08   -0.011690
2020-01-09    0.025103
2020-01-10    0.009325
2020-01-13    0.028888
2020-01-14   -0.009183
2020-01-15    0.012292
2020-01-16   -0.005593
2020-01-17    0.020492
2020-01-20   -0.003878
2020-01-21   -0.032687
2020-01-22    0.034887
2020-01-23   -0.033485
2020-01-24    0.001934
2020-01-29   -0.026629
2020-01-30   -0.039513
2020-01-31   -0.001845
2020-02-03    0.021784
2020-02-04    0.033137
2020-02-05    0.000586
2020-02-06    0.016146
2020-02-07    0.000082
2020-02-10   -0.016997
2020-02-11    0.010172
2020-02-12    0.016836
2020-02-13    0.013530
2022-01-31    0.031707
2022-02-04    0.028683
2022-02-07   -0.015853
2022-02-08   -0.024170
2022-02-09    0.045076
2022-02-10    0.013623
2022-02-11   -0.012259
2022-02-14   -0.023093
2022-02-15   -0.008984
2022-02-16    0.023177
2022-02-17    0.003182
2022-02-18   -0.054995
2022-02-21   -0.033302
2022-02-22   -0.028148
2022-02-23    0.012332
2022-02-24   -0.048095
2022-02-25   -0.004944
2022-02-28   -0.002682
2022-03-01    0.006940
2022-03-02    0.002542
2022-03-03   -0.006318
2022-03-04   -0.048641
2022-03-07   -0.050231
2022-03-08   -0.015469
2022-03-09    0.011477
2022-03-10   -0.002236
2022-03-11   -0.038740
2022-03-14   -0.115421
2022-03-15   -0.089573
2022-03-16    0.243084
'''

import pandas as pd
import io

df = pd.read_csv(io.StringIO(text), sep='\s+', names=['date', 'value'])
df.columns = ['Date','Daily Return']
#print(df)

min_val = df['Daily Return'].min() - 1
max_val = df['Daily Return'].max() + 1

regions = pd.cut(df['Daily Return'],
                 bins=[min_val, -1, -0.5, 0, 0.5, 1, max_val],
                 labels=['ret < -1(%)','-1 < ret < -0.5(%)', '-0.5 < ret < 0(%)','0 < ret < 0.5(%)','0.5 < ret < 1(%)','ret > 1(%)'],
                 )

count = regions.value_counts(sort=False)
print(count)

size = len(regions)
percentage = (count/size) * 100
print(percentage)

results = pd.DataFrame({'# of event': count, '% of event': percentage})

print(results.T.to_string())

EDIT:

As @tdy suggests in comment you can also use -np.inf, np.inf instead of min_val, max_val

import numpy as np

regions = pd.cut(df['Daily Return'],
                 bins=[-np.inf, -1, -0.5, 0, 0.5, 1, np.inf],
                 labels=['ret < -1(%)','-1 < ret < -0.5(%)', '-0.5 < ret < 0(%)','0 < ret < 0.5(%)','0.5 < ret < 1(%)','ret > 1(%)'],
                 )

edited Mar 24, 2022 at 21:08

answered Mar 24, 2022 at 21:01

furas

149k12 gold badges121 silver badges171 bronze badges

6 Comments

silverpegasus Over a year ago

Thank you very much for your reply and advice. I realised the problem was that portret_df is a "Series" object in my code, which is not a data frame. Therefore the column attribute didn't work.

tdy Over a year ago

+1, just a couple suggestions: you can use value_counts(normalize=True) to get proportions automatically, and you can avoid calculating the min/max bounds by using -np.inf and np.inf

silverpegasus Over a year ago

Following up with my previous comment, I changed the series into dataframe with the following code: portretDF = portret_df.to_frame() and updated the header with portretDF.columns = ['Date', 'Return'] . However, an error message pops up: Length mismatch: Expected axis has 1 elements, new values have 2 elements

furas Over a year ago

@tdy I was thinking about -np.inf and np.inf but finally I resigned :)

furas Over a year ago

@silverpegasus in pd.cut(df['Daily Return'], ... it gets single column so it gets it as Series and as @tdy said you could use your series directly in pd.cut()

|

Collectives™ on Stack Overflow

Python Matplotlib - frequency table

2 Answers 2

4 Comments

6 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

6 Comments

Your Answer

Sign up or log in

Post as a guest

Related