Manipulate pandas dataframe based on values in column

Question

I have a pandas dataframe with four columns, shown below. How would one trim the dataset based on the values in the fourth column? The fourth column's header is "isValid".

Input:

        X       Y   I  isValid
    -60.3  -15.63  25        1
    -60.2  -15.63  10        1
    -60.1  -15.63   0        0
    -60.0  -28.23   0        0
    -59.8  -28.23  25        1
    -59.7  -28.23  15        1
    -59.7  -28.23   0        1

Output 1:

        X       Y   I
    -60.3  -15.63  25
    -60.2  -15.63  10
    -59.8  -28.23  25
    -59.7  -28.23  15
    -59.7  -28.23   0

Edit: I was able to achieve Output 1 with the following:

df = df.loc[df['isValid'] == 1]
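The line above keeps the isValid column, whereas Output 1 shows only X, Y and I. A minimal sketch of one way to get that exact shape, assuming the sample data from the question is rebuilt by hand (the name out1 is only illustrative):

```python
import pandas as pd

# Sample data typed in from the question's Input table.
df = pd.DataFrame({
    "X": [-60.3, -60.2, -60.1, -60.0, -59.8, -59.7, -59.7],
    "Y": [-15.63, -15.63, -15.63, -28.23, -28.23, -28.23, -28.23],
    "I": [25, 10, 0, 0, 25, 15, 0],
    "isValid": [1, 1, 0, 0, 1, 1, 1],
})

# Select the valid rows and, in the same .loc call, only the columns to keep.
out1 = df.loc[df["isValid"] == 1, ["X", "Y", "I"]].reset_index(drop=True)
print(out1)
```

The reset_index(drop=True) call is optional; it only renumbers the remaining rows from 0.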

Output 2:

For each distinct value in the second column, average the corresponding values in the third column.

   Y      I
 -15.63 (25+10)/2 
 -28.23 (25+15)/2

I am presently converting everything into numpy arrays and working with loops. Hoping there is a much simpler way.

Thanks for the comment. Right after I asked the question, I figured that out. However, the process to get Output 2 is still a mystery to me.
– Jesh Kundem, Commented Sep 21, 2021 at 22:37
@Jesh Kundem this looks like spatial data. You can't handle coordinates the way you are suggesting. Averaging over a latitude or longitude can't make the data meaningful.
– wwnde, Commented Sep 21, 2021 at 23:53

Scott Boston · Accepted Answer · 2021-09-21 22:44:30Z

3

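Try:

import pandas as pd

# Copy the Input table from the question first; this reads it from the clipboard
df = pd.read_clipboard()

# Keep only the rows flagged as valid
dfm = df[df['isValid'] == 1]

# Average I for each distinct value of Y
df_out = dfm.groupby('Y', as_index=False)['I'].mean()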

Output:

       Y     I
0 -28.23  20.0
1 -15.63  17.5


4 Comments

Jesh Kundem Over a year ago

Thank you for the answer, Scott. I believe I missed one aspect: there are zero values in the 'I' column, and they should not be averaged. I edited the question accordingly.

Scott Boston Over a year ago

@JeshKundem dfm does eliminate the rows where isValid == 0, so the mean of I is calculated without them. dfm is a filtered version of df where isValid equals 1. I then use dfm to calculate the mean, so rows with isValid equal to zero are not considered in the average.

Jesh Kundem Over a year ago

@Scott Boston - I mean the zero values in the "I" column. However, I am accepting the answer, as I was able to eliminate them by adding another criterion, df['I'] != 0.

Scott Boston Over a year ago

@JeshKundem Thanks. Happy Coding. Be safe and stay healthy.
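The extra criterion mentioned in the comments above is not spelled out in full, so here is a rough sketch of how it could be combined with the accepted answer. It assumes the sample data from the question is rebuilt by hand; the names mask and out2 are only illustrative.

```python
import pandas as pd

# Sample data typed in from the question's Input table.
df = pd.DataFrame({
    "X": [-60.3, -60.2, -60.1, -60.0, -59.8, -59.7, -59.7],
    "Y": [-15.63, -15.63, -15.63, -28.23, -28.23, -28.23, -28.23],
    "I": [25, 10, 0, 0, 25, 15, 0],
    "isValid": [1, 1, 0, 0, 1, 1, 1],
})

# Keep rows that are flagged valid AND have a non-zero I.
# Each condition needs its own parentheses when combined with &.
mask = (df["isValid"] == 1) & (df["I"] != 0)

# Average I within each distinct Y value; groupby sorts the Y keys ascending.
out2 = df.loc[mask].groupby("Y", as_index=False)["I"].mean()
print(out2)
```

On this data the result should match Output 2 from the question: 17.5 for Y = -15.63 and 20.0 for Y = -28.23. Without the I != 0 condition, the last row (isValid of 1 but I of 0) would be included and the -28.23 group would average three values instead of two.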
