
I am using the following code to remove some rows with missing data in pandas:

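import numpy as np

df = df.replace(r'^\s+$', np.nan, regex=True)  # whitespace-only cells -> NaN
df = df.replace(r'^\t+$', np.nan, regex=True)  # tab-only cells -> NaN
df = df.dropna()                               # drop rows that contain NaN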

However, some cells in the data frame still look blank/empty. Why is this happening? Is there any way to get rid of rows with such empty/blank cells? Thanks!
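A minimal reproduction, assuming the remaining blank cells are empty strings ('') rather than whitespace. The data below is hypothetical. The pattern ^\s+$ needs at least one whitespace character, so it never matches '', and dropna() leaves that row alone:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one truly empty string, one whitespace-only string
df = pd.DataFrame({"A": ["x", "", "  ", "y"], "B": [1, 2, 3, 4]})

# The question's two patterns each require at least one character
cleaned = df.replace(r"^\s+$", np.nan, regex=True)
cleaned = cleaned.replace(r"^\t+$", np.nan, regex=True)
cleaned = cleaned.dropna()

# Row 2 ("  ") is dropped, but row 1 ("") survives because it was never turned into NaN
print(cleaned)
```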

  • Can you show us samples of the dataframe so that we can reproduce the problem? Commented Aug 22, 2018 at 5:30
  • What about df = df.replace('', np.nan)? Commented Aug 22, 2018 at 5:30
  • Like @jezrael said, try adding df = df.replace('', np.nan, regex=True) before dropna in your code. Commented Aug 22, 2018 at 5:31
  • @jezrael: adding df = df.replace('', np.nan) works. Thanks! Commented Aug 22, 2018 at 5:34
  • Possible duplicate of Python Pandas DataFrame remove Empty Cells Commented Aug 22, 2018 at 5:43

3 Answers


You can use:

df = df.replace('', np.nan)
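A short end-to-end sketch of this fix combined with dropna(), using hypothetical sample data. Without regex=True, replace('') only matches cells that are exactly the empty string:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one empty-string cell
df = pd.DataFrame({"A": ["x", "", "y"], "B": [1, 2, 3]})

# Exact match: only '' becomes NaN, other strings are untouched
df = df.replace("", np.nan)

# dropna now sees the former blank as a missing value
df = df.dropna()
print(df)
```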

If you want to simplify your code, you can join the regexes with | and match empty strings with ^$:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':['',5,4,5,5,4],
                   'C':['','  ','   ',4,2,3],
                   'D':[1,3,5,7,'       ',0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

# Whitespace-only, tab-only and empty strings all become NaN
df = df.replace(r'^\s+$|^\t+$|^$', np.nan, regex=True)
print(df)
   A    B    C    D  E  F
0  a  NaN  NaN  1.0  5  a
1  b  5.0  NaN  3.0  3  a
2  c  4.0  NaN  5.0  6  a
3  d  5.0  4.0  7.0  9  b
4  e  5.0  2.0  NaN  2  b
5  f  4.0  3.0  0.0  4  b
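Since \t is already part of \s, the three alternatives can arguably be collapsed into one pattern, ^\s*$, which matches any string made only of whitespace, including the empty string. A sketch on a trimmed copy of the frame above (columns A to D only), followed by dropna() to remove the affected rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": list("abcdef"),
    "B": ["", 5, 4, 5, 5, 4],
    "C": ["", "  ", "   ", 4, 2, 3],
    "D": [1, 3, 5, 7, "       ", 0],
})

# ^\s*$ covers ^\s+$, ^\t+$ and ^$ in one pattern; non-string cells are left as-is
df = df.replace(r"^\s*$", np.nan, regex=True)

# Drop every row that now contains a NaN
df = df.dropna()
print(df)
```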

Depending on your version of pandas you may do:

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

axis : {0 or 'index', 1 or 'columns'}, default 0
    Determine if rows or columns which contain missing values are removed.
    0, or 'index' : Drop rows which contain missing values.
    1, or 'columns' : Drop columns which contain missing value.
    Deprecated since version 0.23.0: Pass tuple or list to drop on multiple axes.

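So, to drop rows that contain missing values:

df = df.dropna(axis=0)

This should work once the blank cells hold NaN. dropna only treats NaN/None as missing, so empty strings have to be converted to NaN first (for example with df.replace('', np.nan)).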
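The other dropna parameters from the signature above control how strict the row filter is. A sketch on hypothetical data showing how, thresh and subset side by side (note that how and thresh are not combined in one call here):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 complete, row 1 has 2 values, row 2 empty, row 3 has 1 value
df = pd.DataFrame({
    "A": [1, 4, np.nan, 7],
    "B": [2, 5, np.nan, np.nan],
    "C": [3, np.nan, np.nan, np.nan],
})

any_missing = df.dropna(axis=0, how="any")   # drop rows with at least one NaN
all_missing = df.dropna(axis=0, how="all")   # drop only rows where every value is NaN
at_least_two = df.dropna(axis=0, thresh=2)   # keep rows with at least 2 non-NaN values
only_a = df.dropna(axis=0, subset=["A"])     # check column A only

print(any_missing, all_missing, at_least_two, only_a, sep="\n\n")
```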


I'm providing code with input and output data:

Input:

Original DataFrame:
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2    NaN  30.0     New York
3  Diana  22.0          NaN
4  Ethan   NaN      Chicago

Code:

import pandas as pd
import numpy as np
data = {
    'Name': ['Alice', 'Bob', np.nan, 'Diana', 'Ethan'],
    'Age': [25, np.nan, 30, 22, np.nan],
    'City': ['New York', 'Los Angeles', 'New York', np.nan, 'Chicago']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
"""
    Here Im dropping null value of specific column
"""
df_cleaned = df.dropna(subset=['Name', 'Age', 'City'])
print("DataFrame after removing rows with missing data:")
print(df_cleaned)

Output:

DataFrame after removing rows with missing data:
    Name   Age      City
0  Alice  25.0  New York
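subset is most useful when it names only some of the columns. A sketch on the same data, requiring only a Name (rows missing Age or City are kept):

```python
import numpy as np
import pandas as pd

data = {
    "Name": ["Alice", "Bob", np.nan, "Diana", "Ethan"],
    "Age": [25, np.nan, 30, 22, np.nan],
    "City": ["New York", "Los Angeles", "New York", np.nan, "Chicago"],
}
df = pd.DataFrame(data)

# Only rows with a missing Name are removed
named = df.dropna(subset=["Name"])
print(named)
```

Listing every column in subset, as in the code above, gives the same result as a plain df.dropna().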
