
I am trying to write code that splits the data into months using month_changes. Values and Val_dates are correlated: Val_dates is supposed to hold the matching dates for the indexes of Values, so [100, '2015-11-01 01:03:00'], [123, '2015-11-08 12:56:00'], and so on. Each row in the multidimensional array is supposed to represent a single month, so the first row [100,123,135.3,139.05,156.08,163.88,173.72] is for November 2015, the sixth row [100,106] is for February 2016, etc. I am trying to write a function that iterates through the month rows and uses the number of valid indexes for each month. For November 2015 there are 4 dates with the same month and year, ['2015-11-01 01:03:00', '2015-11-08 12:56:00', '2015-11-11 02:30:00', '2015-11-14 04:23:00'], so since the 4th index is the last one for the first row, it should output 139.05, which is the 4th element of [100,123,135.3,139.05,156.08,163.88,173.72]. For rows that don't have any date matches, it should just output 0. How can I get the Expected Output?

import numpy as np 

#[23,10,3,12,5,6]
Values = np.array([[100,123,135.3,139.05,156.08,163.88,173.72],
                  [100,110,113,126.56,132.89,140.86],
                  [100,103,115.36,121.13,128.4],
                  [100,112,117.6,124.66],
                  [100,105,111.3],
                  [100,106]])

Val_dates= ['2015-11-01 01:03:00', '2015-11-08 12:56:00', '2015-11-11 02:30:00', '2015-11-14 04:23:00', '2016-02-11 02:00:00', '2016-02-15 15:00:00']

month_changes = ['2015-11-01 00:00:00', '2015-12-01 00:00:00', '2016-01-01 00:00:00',
 '2016-02-01 00:00:00', '2016-03-01 00:00:00']

format_month = np.sort(month_changes)

def Monthly_Pnls(index, Values):
    # Digitize
    digit_month = np.digitize(index, format_month)
    Monthly_PnL = np.bincount(digit_month, weights=PnL)
    Monthly_PnL= np.around(Monthly_PnL[1:len(format_month)],1)
    print(Monthly_PnL)
    
    return Monthly_PnL

Monthly_Pnls(Val_dates, month_changes)

Expected Output:

[139.05,0,0,0,106,0]

Max Inputs:

Values = np.array([[123., 104.55, 107.6865, 105.53277, 110.8094085, 117.45797301],
                  [85., 87.55, 85.799, 90.08895, 95.494287],
                  [103., 100.94, 105.987, 112.34622],
                  [98., 102.9, 109.074],
                  [105., 111.3],
                  [106.]])

Expected Output:

[123, 0.0, 0.0, 105.0, 0.0]
  • There is a problem: len(Values) -> 7 and len(Val_dates) -> 6 Commented Sep 9, 2021 at 21:33
  • Corrected it thanks for letting me know Commented Sep 9, 2021 at 21:53
  • You're welcome. Do you accept an answer with Pandas (tomorrow)? Commented Sep 9, 2021 at 21:57
  • Yea pandas could work, thanks for the help Commented Sep 9, 2021 at 22:17

1 Answer


You can use pandas to get the expected result:

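  1. First, convert the data to pandas structures:
import pandas as pd
# Use an explicit unit: unit-less 'datetime64' is rejected by pandas 2.x
df = pd.DataFrame({'dt': Val_dates, 'val': Values}).astype({'dt': 'datetime64[ns]'})
idx = pd.date_range(month_changes[0], month_changes[-1], freq='MS')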
>>> df
                   dt                                                val
0 2015-11-01 01:03:00  [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72]
1 2015-11-08 12:56:00            [100, 110, 113, 126.56, 132.89, 140.86]
2 2015-11-11 02:30:00                  [100, 103, 115.36, 121.13, 128.4]
3 2015-11-14 04:23:00                          [100, 112, 117.6, 124.66]
4 2016-02-11 02:00:00                                  [100, 105, 111.3]
5 2016-02-15 15:00:00                                         [100, 106]

>>> idx
DatetimeIndex(['2015-11-01', '2015-12-01', '2016-01-01', '2016-02-01',
               '2016-03-01'],
              dtype='datetime64[ns]', freq='MS')
  2. Group by month, keep the first row of each group, and take the element at index (group size - 1):
>>> df.groupby(pd.Grouper(freq='MS', key='dt'))['val'] \
      .apply(lambda x: x.head(1).squeeze()[len(x)-1] if len(x) else 0) \
      .reindex(idx, fill_value=0) \
      .tolist()

[139.05, 0.0, 0.0, 105.0, 0.0]

Or, without if/else:

>>> df.set_index('dt', drop=False).resample('MS')['val'] \
      .agg((len, 'first')).dropna(how='any') \
      .apply(lambda x: x['first'][x['len']-1], axis=1) \
      .reindex(idx, fill_value=0) \
      .tolist()

[139.05, 0.0, 0.0, 105.0, 0.0]

The first method is 3x faster than the second one.
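To make the logic of the first method explicit, here is a minimal pure-Python sketch of the same steps. It is only an illustration: the function name `last_value_per_month` and the lowercase variable names are hypothetical, and it assumes the dates are in chronological order and that the first row of each month has at least as many entries as that month has dates.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def last_value_per_month(values, val_dates, month_changes):
    """Hypothetical helper mirroring the groupby logic above.

    For each month start in month_changes, find the first row whose date
    falls in that month and return its element at position
    (number of dates in that month - 1). Months with no dates yield 0.
    """
    # Map (year, month) -> [index of the first row in that month, date count].
    groups = {}
    for i, stamp in enumerate(val_dates):
        d = datetime.strptime(stamp, FMT)
        key = (d.year, d.month)
        if key in groups:
            groups[key][1] += 1
        else:
            groups[key] = [i, 1]

    result = []
    for stamp in month_changes:
        d = datetime.strptime(stamp, FMT)
        first_row, count = groups.get((d.year, d.month), (None, 0))
        # Assumption: values[first_row] has at least `count` elements.
        result.append(values[first_row][count - 1] if count else 0)
    return result

values = [[100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
          [100, 110, 113, 126.56, 132.89, 140.86],
          [100, 103, 115.36, 121.13, 128.4],
          [100, 112, 117.6, 124.66],
          [100, 105, 111.3],
          [100, 106]]
val_dates = ['2015-11-01 01:03:00', '2015-11-08 12:56:00',
             '2015-11-11 02:30:00', '2015-11-14 04:23:00',
             '2016-02-11 02:00:00', '2016-02-15 15:00:00']
month_changes = ['2015-11-01 00:00:00', '2015-12-01 00:00:00',
                 '2016-01-01 00:00:00', '2016-02-01 00:00:00',
                 '2016-03-01 00:00:00']

result = last_value_per_month(values, val_dates, month_changes)
print(result)  # [139.05, 0, 0, 105, 0]
```

November 2015 has 4 dates and its first row is row 0, so the result is `values[0][3]`; February 2016 has 2 dates and its first row is row 4, so the result is `values[4][1]`.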


2 Comments

Hello, is there a way to adapt the code above to get the max values instead? If Values in the example above were replaced with the Max Inputs, the Expected Output would be [123, 0.0, 0.0, 105.0, 0.0].
Hey, could you look at this issue? It is just like this one: stackoverflow.com/questions/69124802/…
