
What is the rationale behind the difference between the bool and boolean dtypes in pandas?

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'col1': [True, False, False]}, dtype='bool')
print(df1)
print(df1.info())
print()

df2 = pd.DataFrame({'col1': [True, False, None]}, dtype='bool')
print("df2")
print(df2)
print(df2.info())
print()

df3 = pd.DataFrame({'col1': [True, False, np.nan]}, dtype='bool')
print("df3")
print(df3)
print(df3.info())
print()

df4 = pd.DataFrame({'col1': [True, False, None, np.nan]}, dtype='bool')
print("df4")
print(df4)
print(df4.info())
print()

df5 = pd.DataFrame({'col1': [True, False, False]}, dtype='boolean')
print("df5")
print(df5)
print(df5.info())
print()

df6 = pd.DataFrame({'col1': [True, False, None]}, dtype='boolean')
print("df6")
print(df6)
print(df6.info())
print()

df7 = pd.DataFrame({'col1': [True, False, np.nan]}, dtype='boolean')
print("df7")
print(df7)
print(df7.info())
print()

df8 = pd.DataFrame({'col1': [True, False, None, np.nan]}, dtype='boolean')
print("df8")
print(df8)
print(df8.info())

Why are None and np.nan treated differently by the bool and boolean dtypes? What is the rationale behind it? With bool, both None and np.nan are treated as non-null: None becomes False and np.nan becomes True. With boolean, however, both are treated as the null value <NA>.

df1
        col1
    0   True
    1  False
    2  False
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   col1    3 non-null      bool 
    dtypes: bool(1)
    memory usage: 135.0 bytes
    None
    
    df2
        col1
    0   True
    1  False
    2  False
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   col1    3 non-null      bool 
    dtypes: bool(1)
    memory usage: 135.0 bytes
    None
    
    df3
        col1
    0   True
    1  False
    2   True
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   col1    3 non-null      bool 
    dtypes: bool(1)
    memory usage: 135.0 bytes
    None
    
    df4
        col1
    0   True
    1  False
    2  False
    3   True
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   col1    4 non-null      bool 
    dtypes: bool(1)
    memory usage: 136.0 bytes
    None
    
    df5
        col1
    0   True
    1  False
    2  False
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   col1    3 non-null      boolean
    dtypes: boolean(1)
    memory usage: 138.0 bytes
    None
    
    df6
        col1
    0   True
    1  False
    2   <NA>
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   col1    2 non-null      boolean
    dtypes: boolean(1)
    memory usage: 138.0 bytes
    None
    
    df7
        col1
    0   True
    1  False
    2   <NA>
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   col1    2 non-null      boolean
    dtypes: boolean(1)
    memory usage: 138.0 bytes
    None
    
    df8
        col1
    0   True
    1  False
    2   <NA>
    3   <NA>
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   col1    2 non-null      boolean
    dtypes: boolean(1)
    memory usage: 140.0 bytes
    None

1 Answer

  • None vs np.nan

The following values are considered false:

  • None
  • False
  • zero of any numeric type, for example, 0, 0L, 0.0, 0j.
  • any empty sequence, for example, '', (), [].
  • any empty mapping, for example, {}.
  • instances of user-defined classes, if the class defines a __nonzero__() or __len__() method, when that method returns the integer zero or bool value False.

All other values are considered true, so objects of many types are always true.

Source: https://docs.python.org/2/library/stdtypes.html#truth-value-testing

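So by these conventions None is falsy and np.nan, being a nonzero float, is truthy. Casting to bool therefore turns None into False and np.nan into True, because the bool dtype has no way to represent a missing value.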
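The truthiness rule above can be checked directly in plain Python. This is only a minimal sketch: it uses float("nan") as a stand-in for np.nan (which is an ordinary float NaN), and the dictionary name truthiness is just illustrative.

```python
import math

# float("nan") stands in for np.nan here; both are ordinary float NaN values.
nan = float("nan")
assert math.isnan(nan)

# bool() applies Python's truth-value testing rules.
truthiness = {
    "None": bool(None),  # None is always falsy
    "nan": bool(nan),    # NaN is a nonzero float, so it is truthy
    "0.0": bool(0.0),    # zero of any numeric type is falsy
}
print(truthiness)
```

This mirrors what the question observed with dtype='bool': None ended up as False and np.nan as True.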

  • boolean vs bool

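The boolean dtype is nullable and implements Kleene logic (sometimes called three-valued logic), so missing values such as None and np.nan are kept as <NA> instead of being coerced to True or False.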

Source: https://pandas.pydata.org/docs/user_guide/boolean.html

For example, True | NA gives True because NA could be either True or False, and in both cases the OR operation (|) results in True since at least one operand is True. Similarly, False | NA gives NA because the result depends on the unknown value.
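The Kleene behaviour described above can be tried with pd.NA and a small boolean array. This is a sketch that assumes a pandas version providing the nullable boolean dtype (1.0 or later); the variable names are only illustrative.

```python
import pandas as pd

# Scalar Kleene logic with pd.NA
or_true = pd.NA | True     # True: the result is True whatever NA stands for
or_false = pd.NA | False   # <NA>: the result depends on the unknown value
and_false = pd.NA & False  # False: the result is False whatever NA stands for
and_true = pd.NA & True    # <NA>: the result depends on the unknown value

# The same rules apply element-wise to the nullable boolean dtype
arr = pd.array([True, False, pd.NA], dtype="boolean")
or_result = arr | True     # no missing values remain
and_result = arr & False   # no missing values remain
mixed_result = arr | False # the NA element stays NA

print(or_result)
print(and_result)
print(mixed_result)
```

Only the operations whose outcome is fully determined by the known operand return a definite value; everything else propagates <NA>.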
