Using Pandas 2.3.2 on Python 3.9.2 via JupyterLab.

I've collected a bunch of thermal data from a thing. I've already collated that data into DataFrame chunks that look like this:

     zone      data     Setpoint
9    zone1   40.34347       40
13   zone1   40.07553       40
17   zone1   39.98359       40
21   zone1   40.06895       40
25   zone1   40.04465       40
..     ...        ...      ...
952  zone4  109.91890      110
956  zone4  109.90520      110
960  zone4  110.00600      110
964  zone4  110.02160      110
968  zone4  109.94940      110

Then I've used groupby and mean() to, well, group and create means:

means = temps[['zone','Setpoint','data']].groupby(['zone','Setpoint']).mean()

                      data
zone  Setpoint            
zone1 40         40.050959
      50         50.030125
      60         60.066517
      70         70.050257
      80         80.045247
      90         90.071484
      100       100.032826
      110       110.137990
zone3 40         39.990645
      50         50.015407
      60         60.053120
      70         70.044470
      80         80.043304
      90         90.077433
      100       100.070493
      110       110.140510
zone4 40         40.048473
      50         50.017906
      60         60.037044
      70         70.012458
      80         80.034280
      90         90.087850
      100       100.047793
      110       110.067390

(Aside: I notice how "data" is a line above "zone" and "Setpoint", and I don't know what that's trying to tell me about the structure)

I can grab one zone's worth of info by doing z1means = means.loc['zone1']:

                data
Setpoint            
40         40.050959
50         50.030125
60         60.066517
70         70.050257
80         80.045247
90         90.071484
100       100.032826
110       110.137990

What I can't figure out is how to get at, e.g., the "data" value where "Setpoint" is 80:

*some undiscovered syntax*

80.045247

I've come to understand that the "Setpoint" value is the index on the DataFrame object held in z1means but can't figure out how to refer to the data using it:

<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, 40 to 110
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   data    8 non-null      float64
dtypes: float64(1)
memory usage: 128.0 bytes
None

I can only imagine that the Index: 8 entries, 40 to 110 is telling me something important, but I can't figure out what that is or how to use it.

Things I've tried and their results:

z1means[80] -> KeyError: 80
z1means.at(80) -> TypeError: '_AtIndexer' object is not callable
z1means.loc(80) -> KeyError: 80, ValueError: No axis named 80 for object type DataFrame
z1means.iloc(80) -> Same as .loc 
z1means.loc(z1means['Setpoint'] == 80) -> KeyError: Setpoint

What am I just not grasping here?

2 Answers

2

Tried one more thing and stumbled across the answer.

Apparently the elevated "data" label was telling me something, and I can get to the value I want like this:

z1means['data'][80]

80.045247

(Or without the intermediate step: means.loc['zone1']['data'][80])
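For comparison, here is a minimal sketch of the same lookup done with the label-based indexers. It assumes a small made-up stand-in for z1means (three rows copied from the table in the question, not the real data). It also hints at why the attempts in the question failed: .loc and .at are used with square brackets rather than called with parentheses, and plain z1means[80] looks for a column named 80.

```python
import pandas as pd

# Hypothetical stand-in for z1means: 'Setpoint' is the index, 'data' the only column.
z1means = pd.DataFrame(
    {"data": [40.050959, 80.045247, 110.137990]},
    index=pd.Index([40, 80, 110], name="Setpoint"),
)

# Column first, then index label (the syntax found above).
via_column = z1means["data"][80]

# Row label and column label in a single lookup.
via_loc = z1means.loc[80, "data"]

# .at is the scalar-only variant of .loc; note the square brackets.
via_at = z1means.at[80, "data"]

print(via_column, via_loc, via_at)
```

All three expressions return the same scalar here; .loc[row, column] just states both labels in one place.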

0

First thing, to get zone1, you need to use loc:

means.loc['zone1']

Now, to access an arbitrary level, you can use a cross-section (xs):

means.xs(80, level='Setpoint')

Output:

            data
zone            
zone1  80.045247
zone3  80.043304
zone4  80.034280

And for a specific zone and Setpoint:

means.loc[('zone1', 80)]
# data    80.045247
# Name: (zone1, 80), dtype: float64

means.loc[('zone1', 80), 'data']
# 80.045247
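
To try these lookups without the original data, a small self-contained sketch follows. The temps frame here is invented for illustration (six rows, two zones), so the numbers differ from the question. The point is only that groupby on two keys produces a two-level MultiIndex, which is also why the index names print on their own row beneath the "data" column header.

```python
import pandas as pd

# Made-up sample data; only the column names match the question.
temps = pd.DataFrame(
    {
        "zone": ["zone1", "zone1", "zone1", "zone1", "zone4", "zone4"],
        "Setpoint": [40, 40, 80, 80, 80, 80],
        "data": [40.0, 41.0, 80.0, 81.0, 79.0, 80.0],
    }
)

means = temps.groupby(["zone", "Setpoint"]).mean()

# The two grouping keys become the levels of a MultiIndex.
print(means.index.names)

# Cross-section: every zone at Setpoint 80 (the Setpoint level is dropped).
at_80 = means.xs(80, level="Setpoint")
print(at_80)

# One zone, one Setpoint, one column: a single scalar.
one_value = means.loc[("zone1", 80), "data"]
print(one_value)
```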

7 Comments

I think my stumbled-upon solution (means.loc['zone1']['data'][80]) is a simpler(?) syntax to what you've described.
Not once you understand how pandas indexing works. It is also much less efficient, since each slice generates an intermediate object. You should directly request the correct combination of index and column; it's much more explicit IMO.
I'll grant I have an extremely shallow understanding of indexing in pandas, not to mention what is vs isn't efficient.
Then I recommend reading this page of the docs, and then this one for MultiIndexes.
Note that you could also have used means['data']['zone1'][80] (or means.loc['zone1'].loc[80]['data'])
...and this is why pandas drives me crazy in general. <del>What's intuitive about thing['x']['y'] and thing['y']['x'] being the same?</del> Ok I see the slight syntax difference. But that's just me editorializing. Thanks for your explanations.
(I was using .loc to get zone1...just transcribed my work incorrectly. Edited the question to correct the omission)
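
As a rough illustration of the "intermediate" point raised in the comments, the sketch below breaks the chained lookup into its steps and compares it with the single .loc call and the two alternatives mentioned. It uses invented sample data rather than the question's measurements, and it makes no timing claims; it only shows what each step returns.

```python
import pandas as pd

# Made-up sample data; only the column names match the question.
temps = pd.DataFrame(
    {
        "zone": ["zone1", "zone1", "zone1", "zone1", "zone4", "zone4"],
        "Setpoint": [40, 40, 80, 80, 80, 80],
        "data": [40.0, 41.0, 80.0, 81.0, 79.0, 80.0],
    }
)
means = temps.groupby(["zone", "Setpoint"]).mean()

# Chained form: each step builds a new object before the next lookup.
step1 = means.loc["zone1"]   # DataFrame indexed by Setpoint
step2 = step1["data"]        # Series indexed by Setpoint
chained = step2[80]          # scalar

# Direct form: one lookup with the full (zone, Setpoint) key and the column.
direct = means.loc[("zone1", 80), "data"]

# The alternatives mentioned in the comments.
alt_a = means["data"]["zone1"][80]
alt_b = means.loc["zone1"].loc[80]["data"]

print(type(step1).__name__, type(step2).__name__)
print(chained == direct == alt_a == alt_b)
```

The values agree; the difference is how many temporary objects are created along the way and how clearly the lookup states what is being asked for.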
