Slicing Pandas DataFrame with an array of integers specifying location

Question

I have two Pandas DataFrames: one where each column is a cumulative distribution (all entries in [0, 1] and monotonically increasing), and a second with the values associated with each cumulative distribution.

I need to access the values associated with different points in the cumulative distributions (percentiles). For example, I could be interested in the percentiles [.1, .9]. I find the location of these percentiles in the DataFrame of associated values by checking where in the first DataFrame I would insert the percentiles. This gives me a 2-d numpy array where each column holds the row locations for that column.

How can I use this array to access the values in the DataFrame? Is there a better way to access the values in one of the DataFrames based on where the percentile is located in the first DataFrame?

import pandas as pd
import numpy as np

cdfs = pd.DataFrame([[.1,.2],[.4,.3],[.8,.7],[1.0,1.0]])
df1 = pd.DataFrame([[-10.0,-8.0],[1.4,3.3],[5.8,8.7],[11.0,15.0]])
percentiles = [0.15,0.75]
spots = np.apply_along_axis(np.searchsorted,0,cdfs,percentiles)

This does not work:

df1[spots]

Expected output:

[[1.4 -8.0]
 [5.8 15.0]]

This does work, but seems cumbersome:

output = pd.DataFrame(index=percentiles,columns=df1.columns)
for column in range(spots.shape[1]):
    output.loc[percentiles,column] = df1.loc[spots[:,column],column].values

HYRY · Accepted Answer · 2015-09-03 07:24:20Z


Try indexing the underlying NumPy array, using spots for the rows and [0, 1] for the columns:

df1.values[spots, [0, 1]]
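The one-liner above relies on NumPy integer-array indexing: the column list [0, 1] is broadcast against each row of spots, so element (i, j) of the result is taken from row spots[i, j] of column j. The following is a minimal sketch of that idea on the question's data, assuming a reasonably recent pandas and NumPy (to_numpy() is used in place of .values, and np.take_along_axis needs NumPy 1.15 or later). The names cols, picked, picked_alt and result are illustrative only and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

cdfs = pd.DataFrame([[.1, .2], [.4, .3], [.8, .7], [1.0, 1.0]])
df1 = pd.DataFrame([[-10.0, -8.0], [1.4, 3.3], [5.8, 8.7], [11.0, 15.0]])
percentiles = [0.15, 0.75]

# Row position of each percentile within each column of cdfs
# (one row of positions per percentile, one column per distribution).
spots = np.apply_along_axis(np.searchsorted, 0, cdfs.to_numpy(), percentiles)

# Column positions 0..n-1 instead of a hard-coded [0, 1]; they broadcast
# against every row of spots, pairing spots[i, j] with column j.
cols = np.arange(df1.shape[1])
picked = df1.to_numpy()[spots, cols]

# The same selection written with take_along_axis (assumes NumPy >= 1.15).
picked_alt = np.take_along_axis(df1.to_numpy(), spots, axis=0)

# Optional: wrap the result back into a DataFrame labelled by percentile.
result = pd.DataFrame(picked, index=percentiles, columns=df1.columns)
print(result)
```

Note that np.searchsorted returns the column length for a percentile larger than every entry, which would be out of bounds as a row position; that cannot happen here as long as each distribution ends at 1.0 and the percentiles do not exceed 1.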
