Delete words from strings in a pandas DataFrame according to word length

Question

I have a pandas DataFrame with 13 columns and 60000 rows. One of these columns, named "Text" (dtype object), contains quite long text cells:

    Text    ID  AI  BI  GH  JB  EQ  HE  EN  MA  WE  WR
2585    obstetric gynaecologicaladmissions owing abor...    2585    0   0   0   0   0   1   0   0   0   0
507     graphic illustration process flow help organiz...   507     0   0   0   0   0   0   0   0   1   0

Some words in some rows are stuck together (for example gynaecologicaladmissions in the first row). To get rid of them, I would like to remove all such cases from the entire dataset. My idea is to delete, for each row of the "Text" column, every word that has more than 13 characters.

I've tried this line of code:

res.loc[res['Text'].str.len() < 13]

But the result only contains two empty rows.

How can I solve this problem?

ggaurav · Accepted Answer · 2021-01-15 16:13:09Z

Let's take a sample dataframe

df

    text
0   obstetric gynaecologicaladmissions owing
1   graphic illustration process flow help
2   process flow help
3   illustrationprocess flow

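Because you need to check the length of each word, split every string on its separator (here, whitespace), loop through the resulting list, and keep only the words whose length is <= 13. Note that res['Text'].str.len() measures the length of the whole cell, not of each word, so the filter in the question only keeps rows whose entire text is shorter than 13 characters. To run the loop on every row, you can use apply: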

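def func(x):
    # x is the list of words produced by str.split() for one row
    res = list()
    for word in x:
        # keep only words of at most 13 characters
        if len(word) <= 13:
            res.append(word)
    return " ".join(res)

# split each cell on whitespace, then filter and rejoin the words row by row
df['text'] = df['text'].str.split().apply(func)
df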
    
     text
0   obstetric owing
1   graphic illustration process flow help
2   process flow help
3   flow
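
If the apply loop turns out to be slow on 60000 rows, a regex-based variant may be worth trying. The following is only a sketch under a few assumptions: the column is called text as in the sample above, words are separated by whitespace, and MAX_LEN and cleaned are names made up for this illustration. It removes every run of 14 or more non-whitespace characters, then collapses the leftover spaces.

```python
import pandas as pd

# Sample frame mirroring the one used in the answer above
df = pd.DataFrame({
    "text": [
        "obstetric gynaecologicaladmissions owing",
        "graphic illustration process flow help",
        "process flow help",
        "illustrationprocess flow",
    ]
})

MAX_LEN = 13  # hypothetical name: longest word length to keep

# Builds the pattern \S{14,}: any run of 14+ non-whitespace characters
pattern = rf"\S{{{MAX_LEN + 1},}}"

cleaned = (
    df["text"]
    .str.replace(pattern, "", regex=True)  # drop the over-long words
    .str.split()                           # split on whitespace, discarding empty pieces
    .str.join(" ")                         # rejoin with single spaces
)
print(cleaned.tolist())
```

On the sample frame this should give the same rows as the apply version. Whether it is actually faster on the real data is something to measure rather than assume.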

edited Jan 15, 2021 at 16:13

answered Jan 15, 2021 at 15:58


1 Comment

JEG Over a year ago

Thanks for your answer. I would also like to keep the other words in a row where a word longer than 13 characters is detected. For example, row 0 would give "obstetric owing".
