3

I have this dataframe:

      x        y        z        parameter     
0     26       24       25       Age
1     35       37       36       Age  
2     57       52       54.5     Age
3     160      164      162      Hgt           
4     182      163      172.5    Hgt             
5     175      167      171      Hgt              
6     95       71       83       Wgt     
7     110      68       89       Wgt     
8     89       65       77       Wgt 

I want to use pandas to get this final result:

      x        y        parameter     
0     160      164      Hgt           
1     182      163      Hgt             
2     175      167      Hgt 

I'm trying to use groupby() to extract the rows that share the parameter Hgt from the original dataframe.

First, I added a column to set it as an index:

df = df.insert(0,'index', [count for count in range(df.shape[0])], True)

And the dataframe came out like this:

      index    x        y        z        parameter     
0     0        26       24       25       Age
1     1        35       37       36       Age  
2     2        57       52       54.5     Age
3     3        160      164      162      Hgt           
4     4        182      163      172.5    Hgt             
5     5        175      167      171      Hgt              
6     6        95       71       83       Wgt     
7     7        110      68       89       Wgt     
8     8        89       65       77       Wgt 

Then, I used the following code to group based on index and extract the columns I need:

df1 = df.groupby('index')[['x', 'y','parameter']]

And the output was:

      x        y        parameter     
0     26       24       Age
1     35       37       Age  
2     57       52       Age
3     160      164      Hgt           
4     182      163      Hgt             
5     175      167      Hgt              
6     95       71       Wgt     
7     110      68       Wgt     
8     89       65       Wgt   

After that, I used the following code to isolate only Hgt values:

df2 = df1[df1['parameter'] == 'Hgt']

When I ran df2, I got an error saying:

IndexError: Column(s) ['x', 'y', 'parameter'] already selected

Am I missing something here? What should I do to get the final result?

2 Answers

2

Because you asked what you did wrong, let me point out the unnecessary or incorrect code.

Without any judgement (this is just to help you improve future code), almost every step is incorrect. It reads like a succession of complicated ways to do things that are not needed. Let me give some details:

df = df.insert(0,'index', [count for count in range(df.shape[0])], True)

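This seems a very convoluted way to do df.reset_index(). Even [count for count in range(df.shape[0])] could have been simplified by using range(df.shape[0]) directly. Note also that insert() modifies the DataFrame in place and returns None, so assigning its result back to df replaces df with None.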
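To make the comparison concrete, here is a minimal sketch that rebuilds the question's data by hand (the values are copied from the table in the question; the variable names `manual` and `auto` are only illustrative) and adds the index column both ways.

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "x": [26, 35, 57, 160, 182, 175, 95, 110, 89],
    "y": [24, 37, 52, 164, 163, 167, 71, 68, 65],
    "z": [25, 36, 54.5, 162, 172.5, 171, 83, 89, 77],
    "parameter": ["Age"] * 3 + ["Hgt"] * 3 + ["Wgt"] * 3,
})

# Manual route: insert() works in place and returns None,
# so it is called without reassigning the result.
manual = df.copy()
manual.insert(0, "index", range(manual.shape[0]))

# Built-in route: reset_index() returns a new DataFrame whose
# first column, named 'index', holds the old index values.
auto = df.reset_index()

print(manual.head(3))
print(auto.head(3))
```

Both frames should show the same leading 'index' column, which is why the manual insert is not needed.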

But this step is not even needed for a groupby as you can group by index level:

df.groupby(level=0)

But... the groupby is useless anyway, as every group has a single member.
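A quick way to check this is to look at the group sizes. The sketch below assumes the same data as in the question, rebuilt by hand, and compares grouping on the index with grouping on the parameter column.

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "x": [26, 35, 57, 160, 182, 175, 95, 110, 89],
    "y": [24, 37, 52, 164, 163, 167, 71, 68, 65],
    "z": [25, 36, 54.5, 162, 172.5, 171, 83, 89, 77],
    "parameter": ["Age"] * 3 + ["Hgt"] * 3 + ["Wgt"] * 3,
})

# Grouping on the (unique) index: one row per group.
sizes_by_index = df.groupby(level=0).size()
print(sizes_by_index.tolist())

# Grouping on 'parameter': three rows per group.
sizes_by_parameter = df.groupby("parameter").size()
print(sizes_by_parameter.to_dict())
```

Since the index values are unique, grouping on them cannot collect rows together; grouping on 'parameter' is what actually forms the Age, Hgt and Wgt groups.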

Also, when you do:

df1 = df.groupby('index')[['x', 'y','parameter']]

df1 is not a DataFrame but a DataFrameGroupBy object. Storing one in a variable is very useful when you know what you're doing, but here it causes the error because you treated it as a DataFrame. You need to call an aggregation or transformation method on the DataFrameGroupBy object to get a DataFrame back, which you didn't (likely because, as seen above, there isn't much of interest to do with single-member groups).
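As an illustration of the difference, the following sketch (same hand-built data; grouping on 'parameter' and taking the mean are my choices for the example, not something the question asked for) checks the type before and after an aggregation.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy

# Data copied from the table in the question.
df = pd.DataFrame({
    "x": [26, 35, 57, 160, 182, 175, 95, 110, 89],
    "y": [24, 37, 52, 164, 163, 167, 71, 68, 65],
    "z": [25, 36, 54.5, 162, 172.5, 171, 83, 89, 77],
    "parameter": ["Age"] * 3 + ["Hgt"] * 3 + ["Wgt"] * 3,
})

# Selecting columns on a groupby still gives a groupby object.
grouped = df.groupby("parameter")[["x", "y"]]
is_groupby = isinstance(grouped, DataFrameGroupBy)
print(type(grouped).__name__)

# Only after an aggregation (here: mean) do we get a DataFrame back.
means = grouped.mean()
is_frame = isinstance(means, pd.DataFrame)
print(means)
```

The result of `mean()` has one row per parameter, indexed by the group label, and can be filtered or indexed like any other DataFrame.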

So when you run:

df1[df1['parameter'] == 'Hgt']

again, everything goes wrong: df1['parameter'] is equivalent to df.groupby('index')[['x', 'y','parameter']]['parameter'], and selecting columns a second time on a groupby that already has a selection is what raises the error. Even if you removed this error, the equality comparison would give a single True/False, as you would still be comparing a groupby object and not a Series, and df1[...] would then try to select a nonexistent column of the DataFrameGroupBy.
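If a groupby really has to be part of the solution, one possible sketch (my suggestion under that assumption, not the only way) is to group on 'parameter' itself and pull out the Hgt group with get_group(). The data is again rebuilt by hand from the question, and `hgt` is just an illustrative name.

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "x": [26, 35, 57, 160, 182, 175, 95, 110, 89],
    "y": [24, 37, 52, 164, 163, 167, 71, 68, 65],
    "z": [25, 36, 54.5, 162, 172.5, 171, 83, 89, 77],
    "parameter": ["Age"] * 3 + ["Hgt"] * 3 + ["Wgt"] * 3,
})

hgt = (
    df.groupby("parameter")                    # one group per parameter value
      .get_group("Hgt")[["x", "y", "parameter"]]  # rows of the Hgt group, wanted columns
      .reset_index(drop=True)                  # renumber rows from 0
)
print(hgt)
```

For a single label, a plain boolean mask on df['parameter'] gives the same rows with less machinery, so the groupby route mostly makes sense when every group needs processing.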

I hope this helps!

1 Comment

It does help. Thank you very much! From people like you, we learn :)
1

Do you really need groupby?

>>> df.loc[df['parameter'] == 'Hgt', ['x', 'y', 'parameter']].reset_index(drop=True)
     x    y parameter
0  160  164       Hgt
1  182  163       Hgt
2  175  167       Hgt

2 Comments

With your answer? I guess not! Thank you very much! Let's say I need to use groupby() on my dataframe. Would you kindly show me what I did wrong?
I don't know why you need to use groupby. Take the time to read the excellent explanation of @mozway.
