I created a DataFrame as follows:

df1 = pandas.read_csv(ifile_name,  header=None,  sep=r"\s+",  usecols=[0,1,2,3,4],
                              index_col=[0,1,2], names=["year", "month", "day", "something1", "something2"])

Now I would like to create another DataFrame containing only the rows where year > 2008, so I tried:

df2 = df1[df1.year>2008]

But I get this error:

AttributeError: 'DataFrame' object has no attribute 'year'

I guess it does not see "year" among the columns because I made it part of the index. How can I select the rows with year > 2008 in that case?

3 Answers

7

Get the level by name using MultiIndex.get_level_values and create a boolean mask for row selection:

df2 = df1[df1.index.get_level_values('year') > 2008]

If you plan to modify df2, take an explicit copy of the selection so that later assignments are unambiguous and do not trigger a SettingWithCopyWarning in pandas versions that emit it:

df2 = df1[df1.index.get_level_values('year') > 2008].copy()
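
For completeness, here is a minimal end-to-end sketch of the above. The inline data and the use of io.StringIO are my own stand-ins for the asker's file (ifile_name), so treat the values as hypothetical:

```python
import io
import pandas as pd

# Hypothetical whitespace-separated rows standing in for the asker's file.
raw = """\
2007 1 1 0.1 1.0
2008 6 15 0.2 2.0
2009 3 9 0.3 3.0
2010 12 31 0.4 4.0
"""

# Same read_csv call as in the question, reading from memory instead of disk.
df1 = pd.read_csv(
    io.StringIO(raw), header=None, sep=r"\s+", usecols=[0, 1, 2, 3, 4],
    index_col=[0, 1, 2],
    names=["year", "month", "day", "something1", "something2"],
)

# Boolean mask built from the 'year' level of the MultiIndex.
mask = df1.index.get_level_values("year") > 2008
df2 = df1[mask].copy()

# Modifying the copy leaves df1 untouched.
df2["something1"] = 0.0
print(df2)
```

The mask is a plain boolean array with one entry per row, so it can be combined with other conditions using & and | before indexing.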

1 Comment

Will making modification to a view df2 affect the original df1 in any way? If not, why make a copy? Thank you.
4

You are correct that year is an index rather than a column. One solution is to use pd.DataFrame.query, which lets you use index names directly:

df = pd.DataFrame({'year': [2005, 2010, 2015], 'value': [1, 2, 3]})
df = df.set_index('year')

res = df.query('year > 2008')

print(res)

      value
year       
2010      2
2015      3
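
The same approach should carry over to a MultiIndex like the one in the question, since query also resolves the names of index levels. A small sketch, with made-up values and a hand-built index in place of the asker's file:

```python
import pandas as pd

# Hypothetical data: a (year, month, day) MultiIndex as in the question.
idx = pd.MultiIndex.from_tuples(
    [(2007, 1, 1), (2009, 3, 9), (2010, 12, 31)],
    names=["year", "month", "day"],
)
df1 = pd.DataFrame({"something1": [0.1, 0.3, 0.4]}, index=idx)

# Index level names can be used in the expression like column names.
df2 = df1.query("year > 2008")

# Several levels can be combined in one expression.
df3 = df1.query("year > 2008 and month >= 6")

print(df2)
print(df3)
```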


3

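Assuming your index is sorted, you can slice it by label. Note that label slices include the start label, so df.loc[2008:] would also keep a 2008 row if one existed; use 2009 as the start to exclude it.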

df.loc[2008:]
Out[259]: 
      value
year       
2010      2
2015      3
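
For the three-level index in the question, a label slice on the first level should work the same way once the index is sorted. This is only a sketch with made-up values, and it assumes integer years:

```python
import pandas as pd

# Hypothetical data: a (year, month, day) MultiIndex as in the question.
idx = pd.MultiIndex.from_tuples(
    [(2010, 12, 31), (2008, 6, 15), (2009, 3, 9), (2007, 1, 1)],
    names=["year", "month", "day"],
)
df1 = pd.DataFrame({"something1": [0.4, 0.2, 0.3, 0.1]}, index=idx)

# Label slicing on a MultiIndex needs a sorted index.
df1 = df1.sort_index()

# Label slices are inclusive: this keeps 2008 as well.
from_2008 = df1.loc[2008:]

# Start at 2009 to get strictly year > 2008.
after_2008 = df1.loc[2009:]

print(after_2008)
```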
