I am new here; ideally I would have commented this on the question where I learned this usage of idxmax.

I used the same approach, and below is my code:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["A", "B", "C", "D"], index=[0, 1, 2, 3])

As soon as I use df[df > 6] on this df, the int values change to float:

      A     B     C     D
0   NaN   NaN   NaN   NaN
1   NaN   NaN   NaN   7.0
2   8.0   9.0  10.0  11.0
3  12.0  13.0  14.0  15.0

Why does pandas do that? Also, I read somewhere that I could use dtype=object on a Series, but are there other ways to avoid this?


3 Answers


If you do want the values to keep their integer look, cast to object and mask:

df.astype(object).mask(df<=6)
Out[114]: 
     A    B    C    D
0  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN    7
2    8    9   10   11
3   12   13   14   15

You can find more information here and here.

This trade-off is made largely for memory and performance reasons, and also so that the resulting Series continues to be “numeric”. One possibility is to use dtype=object arrays instead.

More information about astype(object)

df.astype(object).mask(df<=6).applymap(type)
Out[115]: 
                 A                B                C                D
0  <class 'float'>  <class 'float'>  <class 'float'>  <class 'float'>
1  <class 'float'>  <class 'float'>  <class 'float'>    <class 'int'>
2    <class 'int'>    <class 'int'>    <class 'int'>    <class 'int'>
3    <class 'int'>    <class 'int'>    <class 'int'>    <class 'int'>

The limitation is mostly with NumPy.

  • NumPy's ndarray can hold only a single dtype.
  • There is no null value for integer dtypes.

So we end up with a dilemma when we do df[df > 6]. Pandas will return a dataframe with values equal to df where df > 6 and null otherwise. But, as noted, there is no integer null value, so a choice has to be made:

  1. Use None or np.nan for null values while making the entire ndarray of dtype==object
  2. Use np.nan as our null and make the entire array of dtype==float

Pandas chooses to make the arrays float because keeping the values numeric preserves many of the advantages that come with numeric dtypes and their calculations.
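The trade-off above can be sketched directly (a minimal illustration, reusing the 4x4 example df from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["A", "B", "C", "D"])

# Choice 2 (pandas' default): masking introduces nulls, so every column
# is upcast to float64.
masked = df[df > 6]
print(masked.dtypes.unique())   # all columns become float64

# Choice 1: an object array can hold Python ints alongside NaN,
# at the cost of losing the numeric dtype.
obj = df.astype(object).mask(df <= 6)
print(obj.dtypes.unique())      # all columns become dtype object
```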


Option 1
Use a fill value and pd.DataFrame.where

df.where(df > 6, -1)

    A   B   C   D
0  -1  -1  -1  -1
1  -1  -1  -1   7
2   8   9  10  11
3  12  13  14  15

Option 2
pd.DataFrame.stack and loc
By converting to a single dimension, we aren't forced to fill missing values in the rectangular grid with nulls.

df.stack().loc[lambda x: x > 6]

1  D     7
2  A     8
   B     9
   C    10
   D    11
3  A    12
   B    13
   C    14
   D    15
dtype: int64


In previous versions (< 0.24.0), pandas indeed converted int columns to float if even a single NaN was present. That is no longer the case, since optional nullable integer support was officially added in pandas 0.24.0.

From the pandas 0.24.x release notes: "Pandas has gained the ability to hold integer dtypes with missing values."
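A minimal sketch of the nullable dtype in action (note the capital "I" in "Int64"; requires pandas >= 0.24, again using the example df from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["A", "B", "C", "D"])

# Cast to the nullable extension dtype first, then mask:
# missing entries become pd.NA and the columns stay integer.
out = df.astype("Int64")[df > 6]
print(out.dtypes.unique())   # Int64 for every column
```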
