Separating and formatting a multidimensional array by dates Python

Question

I am trying to write code that separates the data into months using month_changes. Values and Val_dates are correlated: Val_dates is supposed to hold the matching dates for the Values indexes, so [100, '2015-11-01 01:03:00'], [123, '2015-11-08 12:56:00'], and so on. Each row in the multidimensional array is supposed to represent a single month, so the first row [100,123,135.3,139.05,156.08,163.88,173.72] is for November 2015, the sixth row [100,106] is for February 2016, etc.

I am trying to write a function that iterates through the month rows and uses the number of valid indexes for each month. For November 2015 there are 4 dates with the same month and year: ['2015-11-01 01:03:00', '2015-11-08 12:56:00', '2015-11-11 02:30:00', '2015-11-14 04:23:00']. Since the 4th index is the last valid one for the first row, the output is 139.05, which is the 4th element of [100,123,135.3,139.05,156.08,163.88,173.72]. For rows that don't have any date matches, the output is just 0. How can I get the Expected Output?

import numpy as np 

#[23,10,3,12,5,6]
# The rows have different lengths, so store them as an object array
Values = np.array([[100,123,135.3,139.05,156.08,163.88,173.72],
                  [100,110,113,126.56,132.89,140.86],
                  [100,103,115.36,121.13,128.4],
                  [100,112,117.6,124.66],
                  [100,105,111.3],
                  [100,106]], dtype=object)

Val_dates= ['2015-11-01 01:03:00', '2015-11-08 12:56:00', '2015-11-11 02:30:00', '2015-11-14 04:23:00', '2016-02-11 02:00:00', '2016-02-15 15:00:00']

month_changes = ['2015-11-01 00:00:00', '2015-12-01 00:00:00', '2016-01-01 00:00:00',
 '2016-02-01 00:00:00', '2016-03-01 00:00:00']

format_month = np.sort(month_changes)

def Monthly_Pnls(index, Values):
    # Digitize
    digit_month = np.digitize(index, format_month)
    Monthly_PnL = np.bincount(digit_month, weights=PnL)
    Monthly_PnL= np.around(Monthly_PnL[1:len(format_month)],1)
    print(Monthly_PnL)
    
    return Monthly_PnL

Monthly_Pnls(Val_dates, month_changes)

Expected Output:

[139.05,0,0,0,106,0]

Max Inputs:

Values = np.array([[123., 104.55, 107.6865, 105.53277, 110.8094085, 117.45797301],
                   [85., 87.55, 85.799, 90.08895, 95.494287],
                   [103., 100.94, 105.987, 112.34622],
                   [98., 102.9, 109.074],
                   [105., 111.3],
                   [106.]], dtype=object)

Expected Output:

[123, 0.0, 0.0, 105.0, 0.0]

There is a problem: len(Values) -> 7 and len(Val_dates) -> 6
– Corralien, Commented Sep 9, 2021 at 21:33
You're welcome. Do you accept an answer with Pandas (tomorrow)?
– Corralien, Commented Sep 9, 2021 at 21:57

Corralien · Accepted Answer · 2021-09-10 07:40:59Z


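You can use Pandas to get the expected result.

First convert to Pandas data structures:

import pandas as pd

# Use 'datetime64[ns]': newer pandas versions reject the unit-less 'datetime64'
df = pd.DataFrame({'dt': Val_dates, 'val': Values}).astype({'dt': 'datetime64[ns]'})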
idx = pd.date_range(month_changes[0], month_changes[-1], freq='MS')

>>> df
                   dt                                                val
0 2015-11-01 01:03:00  [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72]
1 2015-11-08 12:56:00            [100, 110, 113, 126.56, 132.89, 140.86]
2 2015-11-11 02:30:00                  [100, 103, 115.36, 121.13, 128.4]
3 2015-11-14 04:23:00                          [100, 112, 117.6, 124.66]
4 2016-02-11 02:00:00                                  [100, 105, 111.3]
5 2016-02-15 15:00:00                                         [100, 106]

>>> idx
DatetimeIndex(['2015-11-01', '2015-12-01', '2016-01-01', '2016-02-01',
               '2016-03-01'],
              dtype='datetime64[ns]', freq='MS')
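
As a side note, recent NumPy releases reject ragged nested lists unless dtype=object is passed, so the setup can fail before Pandas is even involved. A minimal sketch that sidesteps this, assuming it is acceptable to keep Values as a plain list of lists and to parse the dates with pd.to_datetime (both are my substitutions, not part of the question):

```python
import pandas as pd

# Same data as in the question, kept as a plain (ragged) list of lists
Values = [
    [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
    [100, 110, 113, 126.56, 132.89, 140.86],
    [100, 103, 115.36, 121.13, 128.4],
    [100, 112, 117.6, 124.66],
    [100, 105, 111.3],
    [100, 106],
]
Val_dates = ['2015-11-01 01:03:00', '2015-11-08 12:56:00',
             '2015-11-11 02:30:00', '2015-11-14 04:23:00',
             '2016-02-11 02:00:00', '2016-02-15 15:00:00']
month_changes = ['2015-11-01 00:00:00', '2015-12-01 00:00:00',
                 '2016-01-01 00:00:00', '2016-02-01 00:00:00',
                 '2016-03-01 00:00:00']

# Each list becomes one cell of the object-dtype 'val' column
df = pd.DataFrame({'dt': pd.to_datetime(Val_dates), 'val': Values})

# One entry per month start between the first and last month boundary
idx = pd.date_range(month_changes[0], month_changes[-1], freq='MS')
```

The resulting df and idx have the same shape as the ones printed above.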

Group by month, keep the first row of the group and get the right index:

>>> df.groupby(pd.Grouper(freq='MS', key='dt'))['val'] \
      .apply(lambda x: x.head(1).squeeze()[len(x)-1] if len(x) else 0) \
      .reindex(idx, fill_value=0) \
      .tolist()

[139.05, 0.0, 0.0, 105.0, 0.0]
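
If the chained expression is hard to read, the same logic can be unrolled into an explicit loop. This is only a sketch for illustration: the names in_month, n, first_row and result are mine, and it assumes every month's first row has at least as many values as that month has dates.

```python
import pandas as pd

Values = [
    [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
    [100, 110, 113, 126.56, 132.89, 140.86],
    [100, 103, 115.36, 121.13, 128.4],
    [100, 112, 117.6, 124.66],
    [100, 105, 111.3],
    [100, 106],
]
Val_dates = ['2015-11-01 01:03:00', '2015-11-08 12:56:00',
             '2015-11-11 02:30:00', '2015-11-14 04:23:00',
             '2016-02-11 02:00:00', '2016-02-15 15:00:00']
month_changes = ['2015-11-01 00:00:00', '2015-12-01 00:00:00',
                 '2016-01-01 00:00:00', '2016-02-01 00:00:00',
                 '2016-03-01 00:00:00']

df = pd.DataFrame({'dt': pd.to_datetime(Val_dates), 'val': Values})
idx = pd.date_range(month_changes[0], month_changes[-1], freq='MS')

result = []
for month_start in idx:
    # Rows whose timestamp falls in this calendar month
    in_month = df[df['dt'].dt.to_period('M') == month_start.to_period('M')]
    n = len(in_month)
    if n == 0:
        # No dates in this month
        result.append(0.0)
    else:
        # First row of the month, indexed by the number of dates in it
        first_row = in_month['val'].iloc[0]
        result.append(float(first_row[n - 1]))

print(result)  # [139.05, 0.0, 0.0, 105.0, 0.0]
```

November has 4 dates, so the 4th value of its first row is taken; February has 2 dates, so the 2nd value of [100, 105, 111.3] is taken.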

Or, without the if/else:

>>> df.set_index('dt', drop=False).resample('MS')['val'] \
      .agg((len, 'first')).dropna(how='any') \
      .apply(lambda x: x['first'][x['len']-1], axis=1) \
      .reindex(idx, fill_value=0) \
      .tolist()

[139.05, 0.0, 0.0, 105.0, 0.0]

The first method is 3x faster than the second one.
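
For readers who want to stay closer to the np.digitize attempt in the question, here is a NumPy-only sketch of the same bucketing idea. It uses np.searchsorted with side='right', which for increasing bin edges gives the same bucket indices as np.digitize. The names val_dt, month_starts, bucket, counts, first_row and result are mine, and the sketch assumes Val_dates is sorted and that no date falls before the first month boundary.

```python
import numpy as np

Values = [
    [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
    [100, 110, 113, 126.56, 132.89, 140.86],
    [100, 103, 115.36, 121.13, 128.4],
    [100, 112, 117.6, 124.66],
    [100, 105, 111.3],
    [100, 106],
]
Val_dates = ['2015-11-01 01:03:00', '2015-11-08 12:56:00',
             '2015-11-11 02:30:00', '2015-11-14 04:23:00',
             '2016-02-11 02:00:00', '2016-02-15 15:00:00']
month_changes = ['2015-11-01 00:00:00', '2015-12-01 00:00:00',
                 '2016-01-01 00:00:00', '2016-02-01 00:00:00',
                 '2016-03-01 00:00:00']

# Parse the strings so they compare as dates rather than as text
val_dt = np.array(Val_dates, dtype='datetime64[s]')
month_starts = np.sort(np.array(month_changes, dtype='datetime64[s]'))

# Month index of every date: 0 for 2015-11, 3 for 2016-02, ...
bucket = np.searchsorted(month_starts, val_dt, side='right') - 1

# Number of dates per month, including months with none
counts = np.bincount(bucket, minlength=len(month_starts))

# Position of the first row belonging to each month (bucket is sorted)
first_row = np.searchsorted(bucket, np.arange(len(month_starts)), side='left')

result = [
    float(Values[first_row[m]][counts[m] - 1]) if counts[m] else 0.0
    for m in range(len(month_starts))
]

print(result)  # [139.05, 0.0, 0.0, 105.0, 0.0]
```

I have not timed this against the Pandas versions, so treat it as an alternative to read alongside them rather than as a faster option.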

edited Sep 10, 2021 at 7:40

answered Sep 10, 2021 at 7:24

Corralien


2 Comments

georgehere Over a year ago

Hello, is there a way to adapt the code above to get the max values instead? If Values in the example above were replaced with the Max Inputs, the Expected Output would be [123, 0.0, 0.0, 105.0, 0.0].

georgehere Over a year ago

Hey, could you look at this issue? It is just like this one: stackoverflow.com/questions/69124802/…
