I have two Pandas DataFrames, one where each column is a cumulative distribution (all entries between [0,1] and monotonically increasing) and second with the values associated to each cumulative distribution.
I need to access the values associated to different points in the cumulative distributions (percentiles). For example I could be interested in the percentiles [.1,.9] I'm finding the location of these percentiles in the DataFrame with the associated values by checking where in the first DataFrame I should insert the percentiles. This gives me a 2-d numpy array where each column has the location of the row for that column.
How can I use this array to access the values in the DataFrame? Is there a better way to access the values in one of the DataFrames based on where the percentile is located in the first DataFrame?
import pandas pd
import numpy as np
cdfs = pd.DataFrame([[.1,.2],[.4,.3],[.8,.7],[1.0,1.0]])
df1 = pd.DataFrame([[-10.0,-8.0],[1.4,3.3],[5.8,8.7],[11.0,15.0]])
percentiles = [0.15,0.75]
spots = np.apply_along_axis(np.searchsorted,0,cdfs,percentiles)
This does not work:
df1[spots]
Expected output:
[[1.4 -8.0]
[5.8 15.0]]
This does work, but seems cumbersome:
output = pd.DataFrame(index=percentiles,columns=df1.columns)
for column in range(spots.shape[1]):
output.loc[percentiles,column] = df1.loc[spots[:,column],column].values