I have two Pandas DataFrames: one where each column is a cumulative distribution (all entries in [0,1] and monotonically increasing), and a second with the values associated with each cumulative distribution.

I need to access the values associated with different points in the cumulative distributions (percentiles). For example, I could be interested in the percentiles [.1,.9]. I find the location of these percentiles in the DataFrame of associated values by checking where in the first DataFrame I would have to insert them. This gives me a 2-d NumPy array where each column holds the row locations for that column.

How can I use this array to access the values in the DataFrame? Is there a better way to access the values in one of the DataFrames based on where the percentile is located in the first DataFrame?

import pandas as pd
import numpy as np

cdfs = pd.DataFrame([[.1,.2],[.4,.3],[.8,.7],[1.0,1.0]])
df1 = pd.DataFrame([[-10.0,-8.0],[1.4,3.3],[5.8,8.7],[11.0,15.0]])
percentiles = [0.15,0.75]
spots = np.apply_along_axis(np.searchsorted,0,cdfs,percentiles)

This does not work:

df1[spots]

Expected output:

[[1.4 -8.0]
 [5.8 15.0]]

This does work, but seems cumbersome:

output = pd.DataFrame(index=percentiles,columns=df1.columns)
for column in range(spots.shape[1]):
    output.loc[percentiles,column] = df1.loc[spots[:,column],column].values

1 Answer

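Try this, which indexes the underlying NumPy array with spots as the row positions and [0, 1] as the matching column positions: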

df1.values[spots, [0, 1]]
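
The one-liner above hardcodes the column positions [0, 1]. A minimal sketch of the same lookup for any number of columns follows, reusing the data from the question. It assumes every percentile is no larger than the last entry of each CDF column, so that np.searchsorted never returns a position past the last row. The names cols, values and same are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

cdfs = pd.DataFrame([[.1, .2], [.4, .3], [.8, .7], [1.0, 1.0]])
df1 = pd.DataFrame([[-10.0, -8.0], [1.4, 3.3], [5.8, 8.7], [11.0, 15.0]])
percentiles = [0.15, 0.75]

# Row position of each percentile within each CDF column,
# shape (len(percentiles), number of columns).
spots = np.apply_along_axis(np.searchsorted, 0, cdfs.to_numpy(), percentiles)

# Pair every row position with the column it came from; the 1-D column
# index broadcasts against the 2-D spots array.
cols = np.arange(df1.shape[1])
values = df1.to_numpy()[spots, cols]

# The same lookup expressed with np.take_along_axis.
same = np.take_along_axis(df1.to_numpy(), spots, axis=0)

# Wrap the result so it is labelled like the loop version in the question.
output = pd.DataFrame(values, index=percentiles, columns=df1.columns)
print(output)
```

Both forms pick values[spots[i, j], j] for every percentile i and column j, which is what the loop in the question does one column at a time.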