I have a pandas DataFrame with four columns, shown below. How can I trim the dataset based on the values in the fourth column, whose header is "isValid"?

Input:

      X     Y    I  isValid
    -60.3 -15.63 25 1
    -60.2 -15.63 10 1
    -60.1 -15.63 0 0
    -60.0 -28.23 0 0
    -59.8 -28.23 25 1
    -59.7 -28.23 15 1
    -59.7 -28.23 0 1

Output 1:

      X     Y    I
    -60.3 -15.63 25
    -60.2 -15.63 10
    -59.8 -28.23 25
    -59.7 -28.23 15
    -59.7 -28.23 0

Edit: I was able to achieve Output 1 with the following:

df = df.loc[df['isValid'] == 1]
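A self-contained sketch of that filter follows. The DataFrame is built by hand from the Input table (an assumption for illustration), and the extra `drop` call and the name `out1` are additions to the one-liner above, so that the result matches Output 1 without the isValid column.

```python
import pandas as pd

# Sample data reconstructed from the Input table (assumed for illustration).
df = pd.DataFrame({
    "X": [-60.3, -60.2, -60.1, -60.0, -59.8, -59.7, -59.7],
    "Y": [-15.63, -15.63, -15.63, -28.23, -28.23, -28.23, -28.23],
    "I": [25, 10, 0, 0, 25, 15, 0],
    "isValid": [1, 1, 0, 0, 1, 1, 1],
})

# The boolean mask keeps rows where isValid is 1;
# drop() then removes the flag column itself.
out1 = df.loc[df["isValid"] == 1].drop(columns="isValid")
print(out1)
```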

Output 2:

For each value in the second column (Y), average the corresponding values in the third column (I).

   Y      I
 -15.63 (25+10)/2 
 -28.23 (25+15)/2

I am currently converting everything into NumPy arrays and working with loops. I am hoping there is a much simpler way.

  • Thanks for the comment. Right after I asked the question, I figured that out. However, the process to get Output 2 is still a mystery to me. Commented Sep 21, 2021 at 22:37
  • What are your column headers for this dataframe? Commented Sep 21, 2021 at 22:38
  • Added the headers Commented Sep 21, 2021 at 22:40
  • @Jesh Kundem this looks like spatial data. You can't handle coordinates the way you are suggesting. Averaging on a latitude or longitude can't make the data meaningful. Commented Sep 21, 2021 at 23:53

1 Answer

Try:

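import pandas as pd

df = pd.read_clipboard()  # copy the Input table to the clipboard first

dfm = df[df['isValid'] == 1]  # keep only rows where isValid is 1

df_out = dfm.groupby('Y', as_index=False)['I'].mean()  # mean of I for each Y value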

Output:

       Y     I
0 -28.23  20.0
1 -15.63  17.5
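
In the sample input as it currently appears in the question, the last row has I = 0 but isValid = 1, so filtering on isValid alone would pull that zero into the mean for Y = -28.23. The sketch below also excludes zero I values, following the extra condition mentioned in the comments. The DataFrame is built by hand here instead of being read from the clipboard, and the name `valid_nonzero` is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's Input table (assumed for illustration).
df = pd.DataFrame({
    "X": [-60.3, -60.2, -60.1, -60.0, -59.8, -59.7, -59.7],
    "Y": [-15.63, -15.63, -15.63, -28.23, -28.23, -28.23, -28.23],
    "I": [25, 10, 0, 0, 25, 15, 0],
    "isValid": [1, 1, 0, 0, 1, 1, 1],
})

# Keep rows that are flagged valid AND have a non-zero I value.
# Each condition needs its own parentheses when combined with &.
valid_nonzero = df[(df["isValid"] == 1) & (df["I"] != 0)]

# Average I within each distinct Y; as_index=False keeps Y as a regular column.
df_out = valid_nonzero.groupby("Y", as_index=False)["I"].mean()
print(df_out)
```

Grouping on a floating-point column matches exact values only, so if the Y values could differ by rounding noise, rounding them before grouping may be worth considering.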

4 Comments

Thank you for the answer, Scott. I believe I missed one aspect: there are zero values in the 'I' column, and they should not be averaged. I edited the question accordingly.
@JeshKundem dfm does eliminate the rows where isValid == 0, so the mean of I is calculated without them. dfm is a filtered version of df where isValid equals 1, and I use dfm to calculate the mean, so rows where isValid equals zero are not considered in the average.
@Scott Boston I mean the zero values in the "I" column. However, I am accepting the answer, as I was able to eliminate them by adding another criterion, df['I'] != 0.
@JeshKundem Thanks. Happy Coding. Be safe and stay healthy.