1

Essentially, I have a multi-index that is first date-time, then integer value of the week number.

I want to select a week number, check some conditions of the data for that week, and if those conditions are satisfied, get the date-time level of the index.

Here's a simplified version:

# Dataframe creation
index = pd.date_range('1/1/2019', '1/4/2019')
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B':[3, 4, 5, 6], 'C':['a', 'a', 'b', 'b']}, index=index)

print(df)

            A  B  C
2019-01-01  1  3  a
2019-01-02  2  4  a
2019-01-03  3  5  b
2019-01-04  4  6  b

Then,

# make a multi-index with last column
df = df.set_index([df.index, 'C'])
print(df)

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4
2019-01-03 b  3  5
2019-01-04 b  4  6

Again, fine. According to this answer, I would expect that df.loc[['a']] would yield

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4

However, I get a key/index error. I instead tried df[df['C'] == 'a'], but also with a resulting key/index error.

So the psuedo-code for what I want to achieve is something like:

df = df.set_index([df.index, 'C'])

for value in 'C' level:
    check some condition on 'B' variable.
    if condition:
       get date level for this value

This seems like it would be fairly simple if I could figure out how to slice by specified index level... What am I missing?

Note:

I realize that I could just leave the C variable as a column and then do something like:

for c in df.C.unique():
    if (df[df.C == c].B >= 4).any():
        dates = df[df.C == c].index
        print(dates)

However, now I'm fixated on understanding multi-indexes just to learn Pandas better.

2 Answers 2

2

Quick fix: query works with level names:

df.query('C=="a"')

Output:

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4

A bit more details on multilevel indexing from the official doc. If you have multilevel index, you slice it by tuples (first_level, second_level):

df.loc[('2019/01/01', 'a')]

So you can get all the rows with a second-level value:

df.loc[(slice(None), 'a'), :]

and you get:

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4

or you can use xs:

df.xs('a', level='C')

which give (note that the second level index is gone):

            A  B
2019-01-01  1  3
2019-01-02  2  4

Finally, what you are trying to do for c in df.C.unique(): is more of groupby:

df.loc[df.groupby('C')['B'].transform('max')>=4].index

Output:

MultiIndex([('2019-01-01', 'a'),
            ('2019-01-02', 'a'),
            ('2019-01-03', 'b'),
            ('2019-01-04', 'b')],
           names=[None, 'C'])
Sign up to request clarification or add additional context in comments.

1 Comment

That's a great answer. Thank you. I considered groupby but I actually need the start and end dates for each week where there is matching criteria, hence the iteration idea.
0

I think what you want is:

df.loc[(slice(None), 'a'), :]

which will give you:

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4

This is based on Question 1b of this answer to the question Select rows in pandas MultiIndex DataFrame.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.