
I am trying to remove all values in this pandas DataFrame that are shorter than 3 characters, but only in some of the columns, not all of them.

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],'player': ['w', 'George', 'Roland'], 'hometown': ['Miami', 'Caracas', 'Mexico City'], 'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

for column_name in columns_to_add:
    df.loc[(len(df[column_name]) < 3), column_name] = None

I am trying the code above, but I get the following error:

KeyError("cannot use a single bool to index into setitem")
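The KeyError comes from len(df[column_name]). Calling len() on a Series returns its number of rows (3 here), so (len(df[column_name]) < 3) is one plain bool rather than one bool per row, and .loc refuses a single bool as a row indexer. Below is a minimal sketch of the same loop using the element-wise Series.str.len() instead. It assumes every cell in these columns is a string, and too_short is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'player': ['w', 'George', 'Roland'],
                   'hometown': ['Miami', 'Caracas', 'Mexico City'],
                   'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

for column_name in columns_to_add:
    # .str.len() returns one length per row, so this is a boolean Series
    # that .loc can use as a row mask (len(df[column_name]) was a single int)
    too_short = df[column_name].str.len() < 3
    df.loc[too_short, column_name] = None

print(df)
```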


5 Answers


Try this:

import numpy as np
df[df[columns_to_add].apply(lambda col: col.str.len() < 3)] = np.nan

Output:

>>> df
   id  player     hometown current_city
0   1     NaN        Miami     New York
1   2  George      Caracas          NaN
2   3  Roland  Mexico City     New York
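Here is a variant sketch that is not taken from the answer above. It computes the same per-cell mask on the selected columns only, applies it with DataFrame.mask, and writes the result back to just those columns, so 'id' is never part of the assignment. It assumes all cells in columns_to_add are strings; subset and too_short are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'player': ['w', 'George', 'Roland'],
                   'hometown': ['Miami', 'Caracas', 'Mexico City'],
                   'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

subset = df[columns_to_add]
# True wherever a value in the selected columns has fewer than 3 characters
too_short = subset.apply(lambda col: col.str.len() < 3)
# DataFrame.mask replaces the True cells with NaN; only the selected
# columns are reassigned, so 'id' is left untouched
df[columns_to_add] = subset.mask(too_short)

print(df)
```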

You can use applymap to calculate the length, then np.where to update (in pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map):

import numpy as np

# Keep values with at least 3 characters; replace the rest with None
df[columns_to_add] = np.where(df[columns_to_add].applymap(len) >= 3,
                              df[columns_to_add], None)

Output:

   id  player     hometown current_city
0   1    None        Miami     New York
1   2  George      Caracas         None
2   3  Roland  Mexico City     New York

2 Comments

How can I handle None values in my dataset? I get TypeError: object of type 'NoneType' has no len()
Use lambda x: x if x is None else len(x) instead of plain len. Note that, in any case, you should run that line only once.
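This sketch shows a None-tolerant version that uses Series.str.len() instead of len. .str.len() returns NaN for a missing cell rather than raising TypeError, and fillna(0) then counts such cells as too short, so they end up as None. The sample data below is hypothetical (the question's frame with one player set to None), and lengths is an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's frame, but with a None in 'player'
df = pd.DataFrame({'id': [1, 2, 3],
                   'player': ['w', None, 'Roland'],
                   'hometown': ['Miami', 'Caracas', 'Mexico City'],
                   'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

# Series.str.len() yields NaN for None instead of raising a TypeError;
# fillna(0) then treats missing cells as "too short"
lengths = df[columns_to_add].apply(lambda col: col.str.len().fillna(0))
df[columns_to_add] = np.where(lengths >= 3, df[columns_to_add], None)

print(df)
```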

You can use the Series 'replace' method:

def find_string_less_length(list_of_values):
    # Collect every value shorter than 3 characters
    return [i for i in list_of_values if len(i) < 3]

for column_name in columns_to_add:
    # Replace those values with the literal string 'none' (not the None object)
    df[column_name] = df[column_name].replace(
        find_string_less_length(df[column_name].values), 'none')
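The code above writes the literal string 'none', not a missing value. In older pandas versions, passing a scalar None as the replacement value made replace fall back to a fill method instead of inserting None. This sketch therefore uses the dict form of Series.replace, which maps each short value to None. It assumes every cell is a string; short_values is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'player': ['w', 'George', 'Roland'],
                   'hometown': ['Miami', 'Caracas', 'Mexico City'],
                   'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

for column_name in columns_to_add:
    column = df[column_name]
    # Map every value shorter than 3 characters to None (assumes string cells)
    short_values = {value: None for value in column if len(value) < 3}
    if short_values:
        # Dict form: each key is replaced by its mapped value, here None
        df[column_name] = column.replace(short_values)

print(df)
```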

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.

I think the simplest solution might be:

new_df = df[columns_to_add]
# Keep values with at least 3 characters; shorter ones become NaN
new_df[new_df.applymap(len) >= 3]
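As written, new_df[...] returns a new frame without the 'id' column and leaves df unchanged. Below is a sketch that writes the filtered columns back into df, computing lengths per column with .str.len(). It assumes string cells, and lengths is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'player': ['w', 'George', 'Roland'],
                   'hometown': ['Miami', 'Caracas', 'Mexico City'],
                   'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

new_df = df[columns_to_add]
# Per-column .str.len() gives the same lengths as applymap(len) for string cells
lengths = new_df.apply(lambda col: col.str.len())
# Indexing with a boolean frame keeps the True cells and sets the rest to NaN;
# assigning back updates only the selected columns of df
df[columns_to_add] = new_df[lengths >= 3]

print(df)
```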

The solution that handled all the variables correctly was the following:

import pandas as pd
import numpy as np

df0 = pd.DataFrame({'id': [1, 2, 3],'player': ['w', 'George', 'Roland'], 'hometown': ['Miami', 'Caracas', 'Mexico City'], 'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

# Mark values shorter than 3 characters as NaN (only in columns_to_add)
df0[df0[columns_to_add].apply(lambda col: col.str.len() < 3)] = np.nan
# Convert the NaN markers to None
df = df0.where(pd.notnull(df0), None)

The important point is that columns_to_add does not include all the columns of the DataFrame (the 'id' column is left out).
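Here is an alternative sketch for the second step. DataFrame.replace with the dict {np.nan: None} is another common way to turn the NaN markers into None in object columns. It reuses the question's data; which form to prefer is a judgement call.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({'id': [1, 2, 3],
                    'player': ['w', 'George', 'Roland'],
                    'hometown': ['Miami', 'Caracas', 'Mexico City'],
                    'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

# Step 1 (as in the answer): mark short values in the selected columns as NaN
df0[df0[columns_to_add].apply(lambda col: col.str.len() < 3)] = np.nan
# Step 2 (alternative): map every NaN to None with a dict-based replace
df = df0.replace({np.nan: None})

print(df)
```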
