pandas: remove rows with missing data

Question

I am using the following code to remove some rows with missing data in pandas:

df = df.replace(r'^\s+$', np.nan, regex=True)
df = df.replace(r'^\t+$', np.nan, regex=True)
df = df.dropna()

However, some cells in the data frame still look blank/empty. Why is this happening? Is there any way to get rid of rows with such empty/blank cells? Thanks!

Can you show us a sample of the dataframe so that we can reproduce the problem?
– Sreeram TP, Commented Aug 22, 2018 at 5:30
Like @jezrael said, try adding df = df.replace('', np.nan, regex=True) before dropna in your code.
– Sreeram TP, Commented Aug 22, 2018 at 5:31
Possible duplicate of Python Pandas DataFrame remove Empty Cells
– user3483203, Commented Aug 22, 2018 at 5:43

jezrael · Accepted Answer · 2018-08-22 05:34:55Z

4

You can use:

df = df.replace('', np.nan)

If you want to simplify your code, you can join the regexes with | and use ^$ to match an empty string:

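import numpy as np
import pandas as pd

# Sample frame with empty strings and whitespace-only strings
df = pd.DataFrame({'A': list('abcdef'),
                   'B': ['', 5, 4, 5, 5, 4],
                   'C': ['', '  ', '   ', 4, 2, 3],
                   'D': [1, 3, 5, 7, '       ', 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')})

# Replace whitespace-only, tab-only and empty strings with NaN
df = df.replace(r'^\s+$|^\t+$|^$', np.nan, regex=True)
print(df)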
   A    B    C    D  E  F
0  a  NaN  NaN  1.0  5  a
1  b  5.0  NaN  3.0  3  a
2  c  4.0  NaN  5.0  6  a
3  d  5.0  4.0  7.0  9  b
4  e  5.0  2.0  NaN  2  b
5  f  4.0  3.0  0.0  4  b
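
Since \s already matches tabs, the three alternatives can probably be collapsed into the single pattern ^\s*$, followed by dropna to remove the affected rows. A minimal sketch, rebuilding the sample frame from this answer; the name cleaned is only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': ['', 5, 4, 5, 5, 4],
                   'C': ['', '  ', '   ', 4, 2, 3],
                   'D': [1, 3, 5, 7, '       ', 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')})

# ^\s*$ matches empty strings and strings made only of whitespace
# (spaces, tabs, newlines); non-string cells are left untouched.
cleaned = df.replace(r'^\s*$', np.nan, regex=True).dropna()

# Only rows 3 and 5 have no blank cell in any column.
print(cleaned.index.tolist())
```

If the original row labels are not needed afterwards, chaining .reset_index(drop=True) gives a fresh 0..n-1 index.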


jalazbe · Accepted Answer · 2018-08-22 05:35:05Z

2

Depending on your version of pandas you may do:

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

axis : {0 or 'index', 1 or 'columns'}, default 0
    Determine if rows or columns which contain missing values are removed.

    0, or 'index' : Drop rows which contain missing values.
    1, or 'columns' : Drop columns which contain missing values.

    Deprecated since version 0.23.0: Pass tuple or list to drop on multiple axes.

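So, to drop rows that contain missing values:

df = df.dropna(axis=0)

This should work, as long as the blank cells are actual NaN values rather than empty or whitespace-only strings.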
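
The likely reason dropna on its own leaves blank-looking cells behind is that it only treats NaN/None as missing; an empty or whitespace-only string is an ordinary value to pandas. A small check on a made-up one-column frame (the column name A and the variable blank are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['x', '', '  ', np.nan]})

# isna() flags only the real NaN, not the blank strings
print(df['A'].isna().tolist())

# so dropna() removes a single row and keeps the two blank ones
print(len(df.dropna()))

# Locate the blank-looking cells: strip whitespace, compare with ''
blank = df['A'].str.strip().eq('')
print(blank.tolist())
```

Running a check like this on the real data should show whether the leftover cells are strings that need to be replaced with NaN first.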


halfer · Accepted Answer · 2024-01-27 00:13:05Z

I'm providing code with input and output data:

Input:

Original DataFrame:
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2    NaN  30.0     New York
3  Diana  22.0          NaN
4  Ethan   NaN      Chicago

Code:

import pandas as pd
import numpy as np
data = {
    'Name': ['Alice', 'Bob', np.nan, 'Diana', 'Ethan'],
    'Age': [25, np.nan, 30, 22, np.nan],
    'City': ['New York', 'Los Angeles', 'New York', np.nan, 'Chicago']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Drop rows that have a missing value in any of the listed columns
df_cleaned = df.dropna(subset=['Name', 'Age', 'City'])
print("DataFrame after removing rows with missing data:")
print(df_cleaned)

Output:

DataFrame after removing rows with missing data:
    Name   Age      City
0  Alice  25.0  New York
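
Listing every column in subset gives the same result as a plain dropna(). The parameter is more useful when only some columns matter. A short sketch on the same data, showing how subset, how and thresh change which rows survive (the result variable names are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', np.nan, 'Diana', 'Ethan'],
    'Age': [25, np.nan, 30, 22, np.nan],
    'City': ['New York', 'Los Angeles', 'New York', np.nan, 'Chicago'],
})

# Only require Age to be present
age_known = df.dropna(subset=['Age'])
print(age_known.index.tolist())    # [0, 2, 3]

# Drop a row only if every column is missing (no such row here)
not_empty = df.dropna(how='all')
print(len(not_empty))              # 5

# Keep rows with at least 3 non-missing values
complete = df.dropna(thresh=3)
print(complete.index.tolist())     # [0]
```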
