What is the rationale behind the difference between the bool and boolean dtypes in pandas?
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'col1': [True, False, False]}, dtype='bool')
print("df1")
print(df1)
print(df1.info())
print()
df2 = pd.DataFrame({'col1': [True, False, None]}, dtype='bool')
print("df2")
print(df2)
print(df2.info())
print()
df3 = pd.DataFrame({'col1': [True, False, np.nan]}, dtype='bool')
print("df3")
print(df3)
print(df3.info())
print()
df4 = pd.DataFrame({'col1': [True, False, None, np.nan]}, dtype='bool')
print("df4")
print(df4)
print(df4.info())
print()
df5 = pd.DataFrame({'col1': [True, False, False]}, dtype='boolean')
print("df5")
print(df5)
print(df5.info())
print()
df6 = pd.DataFrame({'col1': [True, False, None]}, dtype='boolean')
print("df6")
print(df6)
print(df6.info())
print()
df7 = pd.DataFrame({'col1': [True, False, np.nan]}, dtype='boolean')
print("df7")
print(df7)
print(df7.info())
print()
df8 = pd.DataFrame({'col1': [True, False, None, np.nan]}, dtype='boolean')
print("df8")
print(df8)
print(df8.info())
Why are None and np.nan treated differently by the bool and boolean dtypes? What is the rationale behind it?
With dtype='bool', both None and np.nan are treated as non-null values: None becomes False and np.nan becomes True.
With dtype='boolean', however, both are treated as the null value <NA>.
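The pattern I see looks like ordinary Python truthiness. Below is a minimal sketch of that guess, assuming the NumPy-backed bool path simply converts each object to True or False while the nullable boolean array tracks missing entries separately. The variable names are only illustrative, and the exact coercion pandas applies may differ between versions.

```python
import numpy as np
import pandas as pd

values = [True, False, None, np.nan]

# Plain Python truthiness: None is falsy, while NaN is a non-zero float
# and therefore truthy.
truthiness = [bool(v) for v in values]

# NumPy's bool dtype has no representation for "missing", so every
# element ends up as either True or False.
numpy_bools = np.array(values, dtype=bool)

# The nullable "boolean" extension dtype keeps track of missing entries,
# so None and NaN can both be stored as <NA>.
nullable = pd.array(values, dtype="boolean")

print(truthiness)
print(numpy_bools)
print(nullable)
print(nullable.isna())
```

If that guess is right, the first two printed results should agree with the df4 output, and the last two with the df8 output.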
df1
    col1
0   True
1  False
2  False
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   col1    3 non-null      bool
dtypes: bool(1)
memory usage: 135.0 bytes
None
df2
    col1
0   True
1  False
2  False
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   col1    3 non-null      bool
dtypes: bool(1)
memory usage: 135.0 bytes
None
df3
    col1
0   True
1  False
2   True
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   col1    3 non-null      bool
dtypes: bool(1)
memory usage: 135.0 bytes
None
df4
    col1
0   True
1  False
2  False
3   True
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   col1    4 non-null      bool
dtypes: bool(1)
memory usage: 136.0 bytes
None
df5
    col1
0   True
1  False
2  False
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   col1    3 non-null      boolean
dtypes: boolean(1)
memory usage: 138.0 bytes
None
df6
    col1
0   True
1  False
2   <NA>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   col1    2 non-null      boolean
dtypes: boolean(1)
memory usage: 138.0 bytes
None
df7
    col1
0   True
1  False
2   <NA>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   col1    2 non-null      boolean
dtypes: boolean(1)
memory usage: 138.0 bytes
None
df8
    col1
0   True
1  False
2   <NA>
3   <NA>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   col1    2 non-null      boolean
dtypes: boolean(1)
memory usage: 140.0 bytes
None