I am reading data from an Excel file and processing it in Python. However, the column I need appears to contain some strings, while I need every value to be an integer. When I try to sort the data I get an error, because the sort ends up comparing numbers with strings.
I am trying to count the number of murders committed by perpetrators of each age in the file.
This is my code to do so:
import collections
import json
import pandas as pd

xl = pd.ExcelFile('Murders.xlsx')
df = xl.parse('Sheet1')
#df = df[df["Perpetrator Age"].ne("Blanks")]
age = df['Perpetrator Age']
#print(df["Perpetrator Age"].dtype)
# Sorting the mixed int/str column is where the TypeError is raised
freq1 = collections.Counter(age.sort_values())
freq = [{'Perpetrator_Age': m, 'Freq': f} for m, f in freq1.items()]
# Write the frequency list out as JSON
with open("MurderPerpAge.js", "w+") as file:
    file.write(json.dumps(freq))
I have tried using the Filter button built into Excel; however, there still appear to be strings in the data. This is the error output:
TypeError: '<' not supported between instances of 'int' and 'str'
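To see which cells are causing this, it may help to check what Python types the column actually holds. The snippet below is only a sketch: the small `ages` Series is a made-up stand-in for `df['Perpetrator Age']`, and the string values in it are hypothetical examples, not the real contents of the file.

```python
import pandas as pd

# Hypothetical stand-in for df['Perpetrator Age'] (values are made up).
ages = pd.Series([15, 17, "n/a", 17, "Blanks", 15, 15], name="Perpetrator Age")

# Count how many cells hold each Python type (int vs. str).
type_counts = ages.map(type).value_counts()
print(type_counts)

# List the distinct values that are not integers.
non_int_values = ages[ages.map(lambda v: not isinstance(v, int))].unique().tolist()
print(non_int_values)
```

If the second list is not empty, the column has mixed types, and sorting it will compare an `int` with a `str`.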
I expect the output to be ordered by age, as shown in the example below:
[{"Perpetrator_Age": 15, "Freq": 5441}, {"Perpetrator_Age": 17, "Freq": 14196},...
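One possible way to get output in that shape is to coerce the column to numbers before counting. This is a minimal sketch, assuming that rows whose age is not a number should simply be dropped; the small DataFrame stands in for `xl.parse('Sheet1')` and its values are made up.

```python
import json
import pandas as pd

# Hypothetical stand-in for df = xl.parse('Sheet1') (values are made up).
df = pd.DataFrame({"Perpetrator Age": [17, 15, "n/a", 17, "Blanks", 15, 15]})

# Turn anything that is not a number into NaN, drop those rows
# (assumption: non-numeric ages should be excluded), then cast to int.
ages = pd.to_numeric(df["Perpetrator Age"], errors="coerce").dropna().astype(int)

# value_counts() tallies each age; sort_index() orders the result by age.
counts = ages.value_counts().sort_index()

# Convert NumPy integers to plain ints so json.dumps can serialize them.
freq = [{"Perpetrator_Age": int(age), "Freq": int(n)} for age, n in counts.items()]
print(json.dumps(freq))
```

With the made-up data above, this prints `[{"Perpetrator_Age": 15, "Freq": 3}, {"Perpetrator_Age": 17, "Freq": 2}]`. If the non-numeric rows need to be kept or reported instead of dropped, the `dropna()` step would have to change.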