
I am new to pandas and trying to complete the following:

I have a dataframe which looks like this:

row    A     B     
1      abc   abc 
2      abc   
3            abc 
4
5      abc   abc 

My desired output would look like this:

row    A     B     
1      abc   abc 
2      abc   
3            abc 
5      abc   abc 

I am trying to drop a row if there is no value in both the A and B columns:

if finalized_export_cf[finalized_export_cf['A']].str.len()<2:
    if finalized_export_cf[finalized_export_cf['B']].str.len()<2:
        finalized_export_cf[finalized_export_cf['B']].drop()

But that gives me the following error:

ValueError: cannot index with vector containing NA / NaN values

How could I drop rows when both columns have an empty cell? Thank you for your suggestions.

5 Answers


You can check whether all values in a row are null by using .isnull() and all() in a chain. isnull() produces a dataframe of booleans, and all(axis=1) checks whether all values in a given row are True. If that's the case, it means that all values in that row are nulls:

inds = df[["A", "B"]].isnull().all(axis=1) 

You can then use inds to clean up all rows that have only nulls. First negate it using the tilde ~, otherwise you would keep only the rows with missing values:

df = df.loc[~inds, :]
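
Putting both lines together on the example frame, as a minimal runnable sketch (it assumes the blank cells are NaN rather than empty strings):

import pandas as pd
import numpy as np

# example frame with NaN for the blank cells (assumption)
df = pd.DataFrame({
    "row": [1, 2, 3, 4, 5],
    "A": ["abc", "abc", np.nan, np.nan, "abc"],
    "B": ["abc", np.nan, "abc", np.nan, "abc"],
})

# True where both A and B are null in a given row
inds = df[["A", "B"]].isnull().all(axis=1)

# keep only the rows where at least one of A, B is present
df = df.loc[~inds, :]
print(df)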

1 Comment

I have 10 columns and only need to check 2 of those, how would I proceed with this set up?

For your use case you can create a mask with isna() and keep the rows where A and B are not both True (i.e. not both missing):

mask = df.isna()
df[~((mask.A == True) & (mask.B == True))] 

output:

   row    A    B
0    1  abc  abc
1    2  abc  NaN
2    3  NaN  abc
4    5  abc  abc
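
As a side note, since the mask columns are already boolean, the same filter can be written a bit more compactly (a sketch, equivalent to the line above):

df[~(mask['A'] & mask['B'])]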



If the missing values are NaNs, then use DataFrame.dropna with how='all' and the subset parameter:

print (df)
   row    A    B
0    1  abc  abc
1    2  abc  NaN
2    3  NaN  abc
3    4  NaN  NaN
4    5  abc  abc

df = df.dropna(how='all', subset=['A','B'])
print (df)
   row    A    B
0    1  abc  abc
1    2  abc  NaN
2    3  NaN  abc
4    5  abc  abc

Or, if the empty value is an empty string, use DataFrame.any after comparing not-equal to '':

print (df)
   row    A    B
0    1  abc  abc
1    2  abc     
2    3       abc
3    4          
4    5  abc  abc


df = df[df[['A','B']].ne('').any(axis=1)]
print (df)
   row    A    B
0    1  abc  abc
1    2  abc     
2    3       abc
4    5  abc  abc

3 Comments

Hi jezrael, does this drop rows only if both A and B columns are empty or either one of those?
@KenHBS - sure, rows are removed
@JonasPalačionis - It tests only the A and B columns - all columns specified in the subset parameter

If you have only two columns, you can use the how parameter of pandas.DataFrame.dropna by setting it to 'all':

df.dropna(how='all')
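
As a minimal sketch (assuming the frame holds only the A and B columns and the blanks are NaN; a row column like the one in the question would otherwise have to be excluded first), how='all' drops a row only when every value in it is missing:

import pandas as pd
import numpy as np

# two-column frame; NaN marks the blanks (assumption)
df = pd.DataFrame({"A": ["abc", "abc", np.nan, np.nan, "abc"],
                   "B": ["abc", np.nan, "abc", np.nan, "abc"]})

# drops only the row where both A and B are NaN
print(df.dropna(how='all'))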



First of all, we need to change the blank cells to NaN:

import numpy as np

df = df.replace(r'^\s*$', np.nan, regex=True)

Then drop NaNs whilst restricting the check to the A and B columns:

df = df.dropna(subset=['A','B'], how='all').fillna(' ')  # if you want spaces for na
print(df)
   row    A    B
0    1  abc  abc
1    2  abc     
2    3       abc
4    5  abc  abc
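
End to end, the same idea on a frame that uses empty strings for the blanks (a sketch; the regex also treats whitespace-only cells as missing):

import pandas as pd
import numpy as np

df = pd.DataFrame({"row": [1, 2, 3, 4, 5],
                   "A": ["abc", "abc", "", "", "abc"],
                   "B": ["abc", "", "abc", "", "abc"]})

# blank / whitespace-only cells -> NaN, then drop rows where both A and B are missing
out = (df.replace(r'^\s*$', np.nan, regex=True)
         .dropna(subset=['A', 'B'], how='all'))
print(out)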

1 Comment

@Jonas Palačionis - I suggest you use this answer - this will work
