Select named index level from pandas DataFrame MultiIndex

Question

I created a DataFrame as follows:

df1 = pandas.read_csv(ifile_name,  header=None,  sep=r"\s+",  usecols=[0,1,2,3,4],
                              index_col=[0,1,2], names=["year", "month", "day", "something1", "something2"])

Now I would like to create another DataFrame containing only the rows where year > 2008, so I tried:

df2 = df1[df1.year>2008]

But I get this error:

AttributeError: 'DataFrame' object has no attribute 'year'

I guess it is not seeing "year" among the columns because I defined it as part of the index. How can I select the rows where year > 2008 in that case?

cs95 · Accepted Answer · 2018-08-20 01:21:34Z

7

Get the level by name using MultiIndex.get_level_values and create a boolean mask for row selection:

df2 = df1[df1.index.get_level_values('year') > 2008]

If you plan to modify the result, take an explicit copy so that pandas does not treat df2 as a slice of df1 (which can trigger SettingWithCopyWarning on later assignments):

df2 = df1[df1.index.get_level_values('year') > 2008].copy()
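
For anyone who wants to try this without the original file, here is a minimal sketch. The index values and column data are made up for illustration; only the level and column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question:
# three named index levels and two data columns.
idx = pd.MultiIndex.from_tuples(
    [(2007, 5, 1), (2008, 12, 31), (2009, 1, 1), (2010, 6, 15)],
    names=["year", "month", "day"],
)
df1 = pd.DataFrame(
    {"something1": [1.0, 2.0, 3.0, 4.0], "something2": [10.0, 20.0, 30.0, 40.0]},
    index=idx,
)

# Boolean mask built from the named index level.
mask = df1.index.get_level_values("year") > 2008
df2 = df1[mask].copy()

# Modify the copy; df1 keeps its original values.
df2["something1"] = 0.0

print(df2)
print(df1)
```

Printing both frames afterwards should show that the assignment only touched df2.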


1 Comment

Confounded Over a year ago

Will making modifications to a view df2 affect the original df1 in any way? If not, why make a copy? Thank you.

jpp · Accepted Answer · 2018-08-20 01:49:08Z

4

You are correct that year is part of the index rather than a column. One solution is to use pd.DataFrame.query, which lets you refer to index level names directly:

import pandas as pd

df = pd.DataFrame({'year': [2005, 2010, 2015], 'value': [1, 2, 3]})
df = df.set_index('year')

res = df.query('year > 2008')

print(res)

      value
year       
2010      2
2015      3
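
The example above uses a single-level index, but query should work the same way with the named levels of a MultiIndex like the one in the question. A small sketch, with made-up index values and data:

```python
import pandas as pd

# Hypothetical data shaped like the question's frame (year, month, day index).
idx = pd.MultiIndex.from_tuples(
    [(2007, 5, 1), (2008, 12, 31), (2009, 1, 1), (2010, 6, 15)],
    names=["year", "month", "day"],
)
df1 = pd.DataFrame({"something1": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Index level names can be used in the expression like column names.
res = df1.query("year > 2008")

# Conditions on several levels can be combined in one expression.
res_early = df1.query("year > 2008 and month < 6")

print(res)
print(res_early)
```

One thing to keep in mind: if a column and an index level share a name, the column takes precedence inside the query expression.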


BENY · Accepted Answer · 2018-08-20 02:05:38Z

3

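Assuming your index is sorted, you can slice by label. Note that label slices include the start value, so start at 2009 if the year must be strictly greater than 2008: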

df.loc[2008:]
Out[259]: 
      value
year       
2010      2
2015      3
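
A rough sketch of the same idea on a MultiIndex, to show the inclusive start. The data is made up, and the frame is sorted first because label slicing on a MultiIndex relies on a sorted index:

```python
import pandas as pd

# Hypothetical data shaped like the question's frame (year, month, day index).
idx = pd.MultiIndex.from_tuples(
    [(2010, 6, 15), (2007, 5, 1), (2009, 1, 1), (2008, 12, 31)],
    names=["year", "month", "day"],
)
df1 = pd.DataFrame({"something1": [4.0, 1.0, 3.0, 2.0]}, index=idx).sort_index()

# A label slice on the first level includes the start label.
inclusive = df1.loc[2008:]

# For integer years, starting at 2009 gives year > 2008.
strict = df1.loc[2009:]

print(inclusive)
print(strict)
```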
