I'm new to python and pandas.

I'm trying to write code that compares the second-last row of a pandas DataFrame indexed by datetime with the last row. However, I'm having trouble indexing those rows so that the comparison works.

For example, for a dataframe

dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti, columns=['value', 'value1', 'value2', 'value3'])

Produces a dataframe

               value    value1    value2    value3
2018-01-01  0.249018 -0.630937  1.120733  0.343331
2018-01-02 -0.347979  0.218776 -0.327967  0.569075
2018-01-03  0.879929  0.316554  0.159652 -0.872002
2018-01-04 -1.492683 -0.560168 -0.871728  1.739402
2018-01-05 -3.187776 -1.399635  0.873503 -0.098537
2018-01-06 -0.482267 -0.174304  0.821850  1.530861
2018-01-07  0.385600 -0.395019 -0.507524 -0.033996
2018-01-08  1.358576 -2.107309  0.956507 -0.360288
2018-01-09 -0.283369  0.630684  1.148554 -1.907604
2018-01-10  0.896697  0.030671  0.131533 -1.325552

I'm trying to compare

if df.loc[last_row, ['value']] > df.loc[second_last_row, ['value']]:
# Compare to see if 'value' in 2018-01-10 is higher than 'value' in 2018-01-09

I have tried using

length = len(df)
if df.loc[length-1, ['value']] > df.loc[length-2, ['value']]:

but I get an error

TypeError: cannot do index indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [5] of <class 'int'>

It seems like a DatetimeIndex cannot be manipulated the same way as a regular integer index.

I have also considered using

df['value'].tail(1)

but I'm not sure how I would be able to access the second last row using the .tail method.

I need advice on how to retrieve data from the tail end of my dataframe based on its distance from the last row. Thanks!

  • Kindly read this guide on how to frame questions. I feel you should make some changes by including an input dataframe and an expected output. Commented Apr 21, 2020 at 3:23

1 Answer

Use:

import numpy as np
import pandas as pd

# Fix the seed so the random sample data is reproducible
np.random.seed(2020)
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti, 
                  columns=['value','value1','value2','value3'])
print (df)
               value    value1    value2    value3
2018-01-01 -1.768846  0.075552 -1.130630 -0.651430
2018-01-02 -0.893116 -1.274101 -0.061154  0.064514
2018-01-03  0.410113 -0.572882 -0.801334  1.312035
2018-01-04  1.274699 -1.214358  0.313719 -1.444821
2018-01-05 -0.368961 -0.769227  0.392616  0.057294
2018-01-06  2.089979  0.041971 -0.048341 -0.513154
2018-01-07 -0.084589 -1.215450 -1.412931 -1.486911
2018-01-08  0.382225  0.937673  1.772678  0.878828
2018-01-09  0.331719 -0.306036  1.240266 -0.215627
2018-01-10  0.155929  0.098056  0.832096  2.045205

If you want to select the last value of a column by label, you can take the label from the DatetimeIndex by position and pass it to DataFrame.loc:

print (df.loc[df.index[-1], 'value'])
0.15592948059188771

print (df.loc[df.index[-2], 'value'])
0.3317191223642817

df.loc[df.index[-1], 'value'] > df.loc[df.index[-2], 'value']
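
To use this comparison in an if statement as in the question, a minimal sketch might look like the following. It assumes the same seeded sample frame as above, and the variable names (last_label, second_last_label, is_higher) are illustrative only. Passing the column as the string 'value' rather than the list ['value'] makes loc return a single number instead of a one-element Series, so the comparison gives a plain boolean.

```python
import numpy as np
import pandas as pd

# Same seeded sample frame as in the answer
np.random.seed(2020)
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti,
                  columns=['value', 'value1', 'value2', 'value3'])

# Look up the labels of the last two rows by position in the index
last_label = df.index[-1]
second_last_label = df.index[-2]

# A string column label (not a list) yields a scalar for each lookup
last_value = df.loc[last_label, 'value']
second_last_value = df.loc[second_last_label, 'value']

# Comparing two scalars gives one boolean that is safe to use in `if`
is_higher = bool(last_value > second_last_value)
if is_higher:
    print('value is higher on the last day')
else:
    print('value is not higher on the last day')
```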

Alternatively, keep your integer positions and select by position with DataFrame.iloc. Because iloc takes positions for both rows and columns, you also need the position of the column 'value', which Index.get_loc returns:

length = len(df)
pos = df.columns.get_loc('value')

print (df.iloc[length-1, pos])
0.15592948059188771

print (df.iloc[length-2, pos])
0.3317191223642817

df.iloc[length-1, pos] > df.iloc[length-2, pos]
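
Since iloc accepts negative positions in the same way Python lists do, the length arithmetic can also be dropped. The sketch below is one possible shorthand under that assumption, not the only way to do it; value_from_end is a hypothetical helper name introduced here for illustration. It also shows how tail, mentioned in the question, can reach the second-last row.

```python
import numpy as np
import pandas as pd

# Same seeded sample frame as in the answer
np.random.seed(2020)
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti,
                  columns=['value', 'value1', 'value2', 'value3'])


def value_from_end(frame, column, n):
    """Hypothetical helper: value in `column` n rows from the end.

    n=1 is the last row, n=2 the second-last row; n must be >= 1.
    """
    return frame[column].iloc[-n]


last_value = value_from_end(df, 'value', 1)
second_last_value = value_from_end(df, 'value', 2)
print(last_value > second_last_value)

# tail(2) keeps the last two rows; position 0 of that slice is the second-last row
second_last_via_tail = df['value'].tail(2).iloc[0]
```

Selecting the column first and then using iloc on the resulting Series avoids the separate get_loc call, at the cost of two indexing steps instead of one.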