I have two columns in a pandas DataFrame that I would like to operate on. First, I would like to remove non-numeric values such as "High" from the "score" column and cast the remaining values to int (all data was input as strings). Next, I would like to sum "score" for each unique "measure_id". How can I perform these two operations?

The df is loaded with:

import pandas as pd

# conn is an open database connection (not shown in the question)
nationwide_measures = pd.read_sql_query("""select state,
          measure_id,
          measure_name,
          score
from timely_and_effective_care___hospital;""", conn)

My failed attempt is:

 nationwide_measures1 = nationwide_measures.to_numeric(nationwide_measures{:,'score'}, errors='coerce')
  • What do you want "score" to be after removing "High"? Do you want to remove the entire row? Commented Jul 17, 2017 at 15:42
  • Yes, if the score value is non-numeric, the row should be ignored. Commented Jul 17, 2017 at 15:43
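In the failed attempt, to_numeric is a top-level pandas function rather than a DataFrame method, and {:,'score'} is a syntax error rather than valid indexing. A minimal sketch of the coerce-and-drop idea the attempt points at, on a small hypothetical frame standing in for the SQL result (the measure IDs and scores are made up): errors='coerce' turns any non-numeric string, not just "High", into NaN, and those rows are then dropped before the int cast and the per-measure sum.

```python
import pandas as pd

# Hypothetical stand-in for the read_sql_query result; every score is a string.
nationwide_measures = pd.DataFrame({
    "measure_id": ["M1", "M1", "M2", "M2"],
    "score": ["250", "300", "120", "High"],
})

# Coerce scores to numbers; unparseable strings ("High", "Not Available", ...) become NaN.
nationwide_measures = nationwide_measures.assign(
    score=pd.to_numeric(nationwide_measures["score"], errors="coerce")
)

# Drop rows whose score could not be parsed, then store the remaining scores as int.
nationwide_measures1 = nationwide_measures.dropna(subset=["score"]).copy()
nationwide_measures1["score"] = nationwide_measures1["score"].astype(int)

# Sum the scores for each unique measure_id.
score_sum = nationwide_measures1.groupby("measure_id")["score"].sum()
print(score_sum)
```

Unlike a digit-only check, coercion also keeps decimals and negative numbers if the real data contains any.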

2 Answers

You can select the nationwide_measures rows whose score value is not purely alphabetic (I assume the scores are stored as strings), convert them to numbers, and then use groupby to aggregate the scores by measure_id.

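# Drop rows whose score is purely alphabetic (e.g. "High"); "!= True" also keeps missing scores
mask = nationwide_measures['score'].str.isalpha() != True
nationwide_measures1 = nationwide_measures[mask].copy()  # copy avoids SettingWithCopyWarning
nationwide_measures1['score'] = pd.to_numeric(nationwide_measures1['score'])
# Sum the numeric scores for each measure_id
score_sum = nationwide_measures1.groupby('measure_id')['score'].sum()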
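The conversion step is what makes the sum numeric. A small sketch on hypothetical string data (explicitly object dtype, assumed to match what read_sql_query returns for a text column) shows the difference: grouping and summing the raw strings concatenates them, which matches the symptom reported in the comments.

```python
import pandas as pd

# Hypothetical scores still stored as strings (object dtype), as read from SQL.
df = pd.DataFrame({
    "measure_id": ["M1", "M1", "M2"],
    "score": pd.Series(["10", "20", "5"], dtype=object),
})

# Summing an object column of strings concatenates them: "10" + "20" -> "1020".
raw_sum = df.groupby("measure_id")["score"].sum()

# After numeric conversion, the same groupby adds the values.
df["score"] = pd.to_numeric(df["score"])
numeric_sum = df.groupby("measure_id")["score"].sum()

print(raw_sum)
print(numeric_sum)
```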

Hope this helps. Update: if you want the sum, mean, min, max, and std, you can use .agg, i.e.:

# String aggregation names avoid the pd.np alias, which was removed in pandas 2.0
score_stats = nationwide_measures1.groupby('measure_id')['score'].agg(['sum', 'min', 'max', 'mean', 'std'])
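A worked example of the per-measure statistics on hypothetical, already-cleaned integer scores. The string names "sum", "min", "max", "mean" and "std" are accepted by .agg; note that "std" is the sample standard deviation (ddof=1), so a measure with only one row would get NaN.

```python
import pandas as pd

# Hypothetical scores that have already been filtered and converted to int.
df = pd.DataFrame({
    "measure_id": ["M1", "M1", "M1", "M2", "M2"],
    "score": [10, 20, 30, 4, 8],
})

# One row per measure_id, one column per statistic.
stats = df.groupby("measure_id")["score"].agg(["sum", "min", "max", "mean", "std"])
print(stats)
```

For M1 this gives sum 60, min 10, max 30, mean 20.0 and std 10.0.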

6 Comments

The only problem is that there are other string values that need to be removed as well. I need to amend your suggestion with something like "!= .isalpha()". Any thoughts?
"AttributeError: 'StringMethods' object has no attribute 'isaplha'"
That ran without a stack trace but the dataframe is still the same size.
This answer is not correct. Upon closer inspection, the score_sum values were a concatenation of the actual string values, not their sum. Your answer does not convert the 'score' values to int, which is what I need to do before I can apply the min, max, avg, and stdev functions to the df.
Yes, next I would like to compute the min, max, average, and stdev for each measure; can you help with that?

The answer for deleting rows with non-numeric score values is:

nationwide_measures1 = nationwide_measures[nationwide_measures['score'].astype(str).str.isdigit()]

I found it here: Pandas select only numeric or integer field from dataframe
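Combined with the int cast and the per-measure sum the question asks for, the digit filter might be used as in this sketch (the frame is hypothetical). One caveat: str.isdigit() is only true for non-empty runs of digits, so negative numbers and decimals such as "2.5" are dropped as well; pd.to_numeric(..., errors='coerce') keeps them if that matters.

```python
import pandas as pd

# Hypothetical stand-in for the SQL result; scores arrive as strings.
nationwide_measures = pd.DataFrame({
    "measure_id": ["M1", "M1", "M1", "M2", "M2", "M2"],
    "score": ["10", "High", "30", "Not Available", "8", "2.5"],
})

# Keep only rows whose score is all digits ("High", "Not Available" and "2.5" are dropped).
is_int = nationwide_measures["score"].astype(str).str.isdigit()
nationwide_measures1 = nationwide_measures[is_int].copy()

# Every remaining score is a digit string, so the int cast cannot fail.
nationwide_measures1["score"] = nationwide_measures1["score"].astype(int)

# Sum the cleaned scores for each unique measure_id.
score_sum = nationwide_measures1.groupby("measure_id")["score"].sum()
print(score_sum)
```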
