Python, pandas: Removing values with a small length in dataframe

Question

I am trying to remove all values in this pandas dataframe that have a length of less than 3, but only in some of the columns:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],'player': ['w', 'George', 'Roland'], 'hometown': ['Miami', 'Caracas', 'Mexico City'], 'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

for column_name in columns_to_add:
    df.loc[(len(df[column_name]) < 3), column_name] = None

I am trying the code above, but I get the following error:

KeyError("cannot use a single bool to index into setitem")

score 1 · Accepted Answer · 2021-12-09 18:06:32Z

1

Try this:

df[df[columns_to_add].apply(lambda col: col.str.len() < 3)] = np.nan

Output:

>>> df
   id  player     hometown current_city
0   1     NaN        Miami     New York
1   2  George      Caracas          NaN
2   3  Roland  Mexico City     New York
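
A possible reading of why the loop in the question fails, offered as a sketch rather than the answerer's own explanation: len(df[column_name]) returns the number of rows in the column, so the comparison yields a single bool instead of one value per row, and .loc cannot use that as a row selector. Using .str.len() gives a per-row length instead. The names single_bool and too_short are illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'player': ['w', 'George', 'Roland'],
    'hometown': ['Miami', 'Caracas', 'Mexico City'],
    'current_city': ['New York', '-', 'New York'],
})
columns_to_add = ['player', 'hometown', 'current_city']

# len() on a Series counts its rows, so this is one bool, not a row mask
single_bool = len(df['player']) < 3

for column_name in columns_to_add:
    # .str.len() returns one length per row, so the comparison is a row-wise mask
    too_short = df[column_name].str.len() < 3
    df.loc[too_short, column_name] = None
```

This keeps the structure of the original loop and changes only how the mask is built.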

edited Dec 9, 2021 at 18:06

answered Dec 9, 2021 at 17:54

user17242583

Quang Hoang · Accepted Answer · 2021-12-09 18:09:24Z

1

You can use applymap to calculate the length, then np.where to update:

df[columns_to_add] = np.where(df[columns_to_add].applymap(len) >= 3,
                              df[columns_to_add], None)

Output:

   id  player     hometown current_city
0   1    None        Miami     New York
1   2  George      Caracas         None
2   3  Roland  Mexico City     New York

answered Dec 9, 2021 at 18:09

Quang Hoang

2 Comments

The Dan Over a year ago

How can I manage a NoneType value in my dataset? TypeError: object of type 'NoneType' has no len()

Quang Hoang Over a year ago

Use lambda x: x if x is None else len(x) instead of just len. Note that, in any case, you should run that line only once.
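
One way to make the length check tolerate missing values, sketched under the assumption that anything that is not a string should be blanked as well. The helper is_long_enough and the sample data with None entries are hypothetical, and Series.map is applied per column here in place of applymap.

```python
import pandas as pd

# Hypothetical sample data containing missing values
df = pd.DataFrame({
    'id': [1, 2, 3],
    'player': ['w', None, 'Roland'],
    'current_city': ['New York', '-', None],
})
columns_to_add = ['player', 'current_city']


def is_long_enough(value, min_len=3):
    # Hypothetical helper: missing or non-string values count as too short
    return isinstance(value, str) and len(value) >= min_len


# Build a boolean frame: True where the value should be kept
keep = df[columns_to_add].apply(lambda col: col.map(is_long_enough)).astype(bool)

# where() keeps entries marked True and leaves a missing value elsewhere
df[columns_to_add] = df[columns_to_add].where(keep)
```

Because the helper never calls len on a missing value, this version can be run more than once on the same frame.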

Marya · Accepted Answer · 2021-12-09 18:58:31Z

0

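You can use the DataFrame's replace method:

def find_string_less_length(list_of_values):
    # Collect the values that are shorter than 3 characters
    return [i for i in list_of_values if len(i) < 3]

for column_name in columns_to_add:
    short_values = find_string_less_length(df[column_name].values)
    # Note: this substitutes the string 'none', not a missing value
    df[column_name] = df[column_name].replace(short_values, 'none')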

edited Dec 9, 2021 at 18:58

answered Dec 9, 2021 at 18:15

Marya

Nicolai B. Thomsen · Accepted Answer · 2021-12-09 23:04:09Z

0

I think the simplest solution might be:

new_df = df[columns_to_add]
# Keep values of length 3 or more; shorter ones show up as NaN in the result
new_df[new_df.applymap(len) >= 3]
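
Indexing with a boolean frame returns a masked copy and leaves df itself unchanged. A minimal sketch of writing the result back, assuming the same df and columns_to_add as in the question; DataFrame.where is swapped in here for the bracket indexing, and the names subset and lengths are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'player': ['w', 'George', 'Roland'],
    'hometown': ['Miami', 'Caracas', 'Mexico City'],
    'current_city': ['New York', '-', 'New York'],
})
columns_to_add = ['player', 'hometown', 'current_city']

subset = df[columns_to_add]
# One length per cell in the selected columns
lengths = subset.apply(lambda col: col.str.len())

# where() keeps cells whose condition is True and puts NaN elsewhere;
# assigning back to the same columns updates df and leaves 'id' alone
df[columns_to_add] = subset.where(lengths >= 3)
```

With >= 3 a value of exactly three characters is kept, which matches the "less than length 3" wording in the question.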

answered Dec 9, 2021 at 23:04

Nicolai B. Thomsen

The Dan · Accepted Answer · 2021-12-10 18:17:15Z

0

The solution that handled all of the cases correctly was the following:

import pandas as pd
import numpy as np

df0 = pd.DataFrame({'id': [1, 2, 3],'player': ['w', 'George', 'Roland'], 'hometown': ['Miami', 'Caracas', 'Mexico City'], 'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

# Blank out values shorter than 3 characters in the selected columns
df0[df0[columns_to_add].apply(lambda col: col.str.len() < 3)] = np.nan
# Convert the NaN placeholders to None
df = df0.where(pd.notnull(df0), None)

One important thing to understand is that columns_to_add does not include all the columns in the dataframe, so the id column is left untouched.

answered Dec 10, 2021 at 18:17

The Dan
