2

Suppose there is a data frame as follows:

import pandas as pd

df = pd.DataFrame({
    'Period': [1996, 'Jan', 'Feb', 'March', 1997, 'Jan', 'Feb', 'March', 1998, 'Jan', 'Feb', 'March'],
    'Some-Values': ['', 'a', 'b', 'c', '', 'd', 'e', 'f', '', 'g', 'h', 'i'],
})

and the rows between the values 1996 and 1997 need to be extracted, so that the resulting data frame is as follows:

df_res = pd.DataFrame({
    'Period': ['Jan', 'Feb', 'March'],
    'Some-Values': ['a', 'b', 'c'],
})

I am currently trying to do this with pandas but am unable to find a solution.

1
  • Can anyone please do this in R? Commented Dec 6, 2018 at 6:31

2 Answers

2

Try reshaping your dataframe into a "correct" (tidy) layout first: put the year into its own column, and then you can select rows by year.

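# Copy each year marker (rows with an empty 'Some-Values') into a new column
df['Year'] = df.loc[df['Some-Values'] == '', 'Period']
# Propagate each year down to the month rows below it
df['Year'] = df['Year'].ffill()
# Drop the year marker rows themselves
df = df.loc[df['Period'] != df['Year'], :]
# Select the rows that belong to 1996
df.loc[df['Year'] == 1996, :]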
Out[651]: 
  Period Some-Values  Year
1    Jan           a  1996
2    Feb           b  1996
3  March           c  1996
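
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same forward-fill idea. It assumes the year rows are exactly the rows where 'Some-Values' is an empty string and that the frame starts with a year row; the names is_year, tidy and res_1996 are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Period': [1996, 'Jan', 'Feb', 'March', 1997, 'Jan', 'Feb', 'March',
               1998, 'Jan', 'Feb', 'March'],
    'Some-Values': ['', 'a', 'b', 'c', '', 'd', 'e', 'f', '', 'g', 'h', 'i'],
})

# Assumption: a row is a year header exactly when 'Some-Values' is empty.
is_year = df['Some-Values'] == ''

# Keep the year only on header rows, convert it to a number, then
# forward-fill it down onto the month rows that follow each header.
year = pd.to_numeric(df['Period'].where(is_year)).ffill().astype('int64')

# Attach the year as its own column and drop the header rows.
tidy = df.assign(Year=year).loc[~is_year]

# Select one year and keep only the original two columns.
res_1996 = tidy.loc[tidy['Year'] == 1996, ['Period', 'Some-Values']]
print(res_1996)
```

Once the year lives in its own column, any other year can be selected with the same filter, without having to know which marker comes next.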

1

One way via pd.Series.idxmax and pd.DataFrame.iloc:

import pandas as pd

df = pd.DataFrame({'Period': [1996,'Jan','Feb','March',1997,'Jan','Feb',
                              'March',1998,'Jan','Feb','March'],
                   'Some-Values': ['','a','b','c','','d','e','f','','g','h','i']})

res = df.iloc[(df['Period'] == 1996).idxmax()+1:(df['Period'] == 1997).idxmax()]

print(res)

  Period Some-Values
1    Jan           a
2    Feb           b
3  March           c

For readability, you can use a slice object:

slicer = slice((df['Period'] == 1996).idxmax()+1,
               (df['Period'] == 1997).idxmax())

res = df.iloc[slicer]
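
Two things to keep in mind with the idxmax approach: idxmax returns an index label, so feeding it to iloc only lines up when the frame has the default 0..n-1 index, and on an all-False mask it returns the first label rather than raising, so a missing marker goes unnoticed. A rough sketch of a positional variant that guards against both is below; rows_between is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def rows_between(df, column, start, end):
    """Rows strictly between the first `start` marker and the next `end` marker."""
    # Plain NumPy boolean arrays, so everything below is positional.
    is_start = (df[column] == start).to_numpy()
    is_end = (df[column] == end).to_numpy()

    if not is_start.any():
        raise ValueError(f"start marker {start!r} not found in {column!r}")
    begin = int(is_start.argmax()) + 1  # first row after the start marker

    if not is_end[begin:].any():
        raise ValueError(f"end marker {end!r} not found after {start!r}")
    stop = begin + int(is_end[begin:].argmax())  # position of the end marker

    return df.iloc[begin:stop]


# Same data as above, but with a non-default index to show that the
# helper does not depend on labels matching positions.
df = pd.DataFrame({'Period': [1996, 'Jan', 'Feb', 'March', 1997, 'Jan', 'Feb',
                              'March', 1998, 'Jan', 'Feb', 'March'],
                   'Some-Values': ['', 'a', 'b', 'c', '', 'd', 'e', 'f', '',
                                   'g', 'h', 'i']},
                  index=range(100, 112))

res = rows_between(df, 'Period', 1996, 1997)
print(res)
```

Raising on a missing marker is a design choice made for this sketch; returning an empty frame instead would also be reasonable, depending on how the result is used.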
