I would like to keep only the rows of a dataframe whose value in the "Size" column is below 10,000,000.

An example of the dataframe is below (the original file is much larger):

        N       Ret  upside_tri        Size
0      77  0.000000      5.2256  58,019,065
1      77  0.000000      1.3836     969,692
2      77  0.000000      1.3543  12,792,661
3      77  0.000000      0.8839   5,721,553
4      77  0.000000      0.5477   6,984,648

To keep only the rows where "Size" is below 10,000,000, I am running the following code:

df = df[df.iloc[:, 3] < 10000000]

When I run this code, I keep getting the error: TypeError: '<' not supported between instances of 'str' and 'int'.

The "Size" column contains only integers, so this error does not make sense to me.

1 Answer

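The "Size" column holds strings, not integers: values such as "58,019,065" contain thousands separators, so pandas reads them as text. Strip the commas and convert the column to integers first: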

df["Size"] = df["Size"].str.replace(",", "").astype(int)
print(df[df.iloc[:, 3] < 10000000])

Prints:

    N  Ret  upside_tri     Size
1  77  0.0      1.3836   969692
3  77  0.0      0.8839  5721553
4  77  0.0      0.5477  6984648

Or, to leave the original column unchanged, build a boolean mask from the converted values:

mask = df["Size"].str.replace(",", "").astype(int) < 10000000
print(df.loc[mask])

Prints:

    N  Ret  upside_tri       Size
1  77  0.0      1.3836    969,692
3  77  0.0      0.8839  5,721,553
4  77  0.0      0.5477  6,984,648
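
If some rows might not convert cleanly (blank cells, stray text), a more forgiving variant of the mask approach could look like the sketch below. It rebuilds a small frame from the sample in the question, so the data is illustrative only, and the names size_num and filtered are my own. pd.to_numeric with errors="coerce" turns anything unparseable into NaN, and NaN compares as False, so those rows are simply dropped by the filter.

```python
import pandas as pd

# Small frame rebuilt from the sample in the question (illustrative data only).
df = pd.DataFrame({
    "N": [77, 77, 77, 77, 77],
    "upside_tri": [5.2256, 1.3836, 1.3543, 0.8839, 0.5477],
    "Size": ["58,019,065", "969,692", "12,792,661", "5,721,553", "6,984,648"],
})

# Inspect the dtype first: a text dtype here explains the str-vs-int error.
print(df["Size"].dtype)

# Remove the thousands separators, then convert; unparseable values become NaN.
size_num = pd.to_numeric(
    df["Size"].str.replace(",", "", regex=False), errors="coerce"
)

# NaN < 10_000_000 evaluates to False, so bad rows are excluded from the result.
filtered = df[size_num < 10_000_000]
print(filtered)
```

Checking the dtype before filtering is a quick way to confirm whether the column was read as text in the first place.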

1 Comment

Thanks! The problem was the Excel (CSV) format of the column I was trying to filter. I changed the format to General and it worked, but I only found this out because of your comment.
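
If re-exporting the file from Excel is not an option, the separators can also be handled at load time. The sketch below assumes the data comes from a CSV in which the comma-grouped numbers are quoted; the csv_text sample is made up from the table in the question, and small is a hypothetical name. The thousands parameter of pd.read_csv tells the parser to treat "," as a digit-group separator, so "Size" arrives as an integer column.

```python
import io
import pandas as pd

# Hypothetical CSV content modeled on the table in the question.
# The Size values are quoted because they contain commas.
csv_text = (
    "N,Ret,upside_tri,Size\n"
    '77,0.0,5.2256,"58,019,065"\n'
    '77,0.0,1.3836,"969,692"\n'
    '77,0.0,1.3543,"12,792,661"\n'
    '77,0.0,0.8839,"5,721,553"\n'
    '77,0.0,0.5477,"6,984,648"\n'
)

# thousands="," makes the parser strip group separators while reading numbers.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

# Size is numeric now, so the original comparison works without conversion.
small = df[df["Size"] < 10_000_000]
print(small)
```

With a real file, the io.StringIO(csv_text) argument would be replaced by the file path.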
