I want to remove certain columns based on a high percentage of null values. In a few columns there is a value (in this case "Select") which is equivalent to null. I want to replace it with null so that I can calculate the null % and remove columns accordingly.

Lead Profile    City
Select          Select
Select          Select
Potential Lead  Mumbai
Select          Mumbai
Select          Mumbai

I tried using the replace function as well as the map function.

leads['Specialization'] = leads['Specialization'].replace('Select', "NaN")

This code just replaces the string with another string ("NaN") and doesn't actually insert null values.

def colmap(x):
     return x.map({"Select": "Nan"})

df[['Lead Profile']] = df[['Lead Profile']].apply(colmap)

This code replaces all the values with NaN.

  • Try importing numpy and using df.replace('Select', np.nan) Commented Jun 6, 2019 at 8:08
  • It is pandas that I'm using Commented Jun 6, 2019 at 8:14
  • pandas requires numpy, so you can safely add an import numpy as np statement. Commented Jun 6, 2019 at 9:14
  • Thanks guys for the insights. Will keep these in mind for all the null value situations going forward. cheers! Commented Jun 6, 2019 at 9:16

2 Answers

To replace a value with null:

df['col'] = df['col'].replace('value', np.nan)

Otherwise, to directly keep only the columns in which 'Select' appears fewer than N times, you can use this:

df2 = df[[col for col in df.columns if len(df[df[col] == 'Select']) < N]]
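For the original goal of dropping columns by null percentage, the replacement can be combined with isna(). The following is only a minimal sketch: the sample frame copies the data from the question, while the names cleaned, null_pct and kept and the 50% threshold are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
leads = pd.DataFrame({
    "Lead Profile": ["Select", "Select", "Potential Lead", "Select", "Select"],
    "City": ["Select", "Select", "Mumbai", "Mumbai", "Mumbai"],
})

# Replace the placeholder with a real missing value in every column.
cleaned = leads.replace("Select", np.nan)

# Percentage of missing values per column.
null_pct = cleaned.isna().mean() * 100

# Hypothetical cutoff: keep only columns with at most 50% missing values.
threshold = 50
kept = cleaned.loc[:, null_pct <= threshold]

print(null_pct)
print(kept.columns.tolist())
```

With this sample, "Lead Profile" is about 80% null and is dropped, while "City" is about 40% null and is kept.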

Besides Olivier's answer, if you import the data with read_csv or read_excel, these functions have an na_values argument:

df = pd.read_csv('file.csv', na_values=['Select'])
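As a quick check that na_values behaves as described, here is a small sketch that reads the question's sample data from an in-memory string instead of a real file. The csv_text content and the use of io.StringIO are assumptions made only so the example is self-contained.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file, using the data from the question.
csv_text = (
    "Lead Profile,City\n"
    "Select,Select\n"
    "Select,Select\n"
    "Potential Lead,Mumbai\n"
    "Select,Mumbai\n"
    "Select,Mumbai\n"
)

# Treat 'Select' as missing while parsing.
df = pd.read_csv(io.StringIO(csv_text), na_values=["Select"])

# Count the missing values per column.
null_counts = df.isna().sum()
print(null_counts)
```

Here every "Select" cell is already NaN after loading, so no separate replace step is needed.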
