I have a pandas DataFrame with 13 columns and 60,000 rows. One of these columns, named "Text" (dtype object), contains fairly long text cells:

      Text                                                ID    AI  BI  GH  JB  EQ  HE  EN  MA  WE  WR
2585  obstetric gynaecologicaladmissions owing abor...    2585  0   0   0   0   0   1   0   0   0   0
507   graphic illustration process flow help organiz...   507   0   0   0   0   0   0   0   0   1   0

Some words in some rows are stuck together (as in the first row of the dataframe: gynaecologicaladmissions). To get rid of these cases across my entire dataset, I thought about deleting, for each row in the "Text" column, every word that has more than 13 characters.

I tried this line of code:

res.loc[res['Text'].str.len() < 13]

But it only returns two empty rows.

How can I solve this problem?

1 Answer

Let's take a sample dataframe:

df

    text
0   obstetric gynaecologicaladmissions owing
1   graphic illustration process flow help
2   process flow help
3   illustrationprocess flow

Since you need to check the length of each word, split each string on its separator (whitespace in this case), loop through the resulting list, and keep only the words whose length is <= 13. To do this for every row, you can use apply:

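def func(x):
    # x is the list of words produced by str.split() for one row
    res = list()
    for word in x:
        # keep only words of at most 13 characters
        if len(word) <= 13:
            res.append(word)
    # rebuild the row text from the words that were kept
    return " ".join(res)

# split each row on whitespace, then filter its words with func
df['text'] = df['text'].str.split().apply(func)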
df
    
     text
0   obstetric owing
1   graphic illustration process flow help
2   process flow help
3   flow
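
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. The helper name drop_long_words, its max_len parameter and the "clean" column are my own choices for illustration, not something from the question. It also shows that .str.len() measures the length of the whole cell rather than of each word, which is probably why the filter in the question matched no rows.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({
    "text": [
        "obstetric gynaecologicaladmissions owing",
        "graphic illustration process flow help",
        "process flow help",
        "illustrationprocess flow",
    ]
})

# .str.len() counts the characters of the whole cell, not of each word,
# so comparing it to 13 selects rows, not words.
cell_lengths = df["text"].str.len()
short_rows = df.loc[cell_lengths < 13]  # no sample row is that short


def drop_long_words(text, max_len=13):
    """Return text without the words longer than max_len (hypothetical helper)."""
    return " ".join(word for word in text.split() if len(word) <= max_len)


# Apply the word-level filter to every row; the original column is kept.
df["clean"] = df["text"].apply(drop_long_words)
print(df["clean"].tolist())
```

Writing to a separate column keeps the original text around, so you can compare before and after and adjust max_len if 13 turns out to remove legitimate long words.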

1 Comment

Thanks for your answer. I would also like to keep the other words in a row where a word longer than 13 characters is detected. For example, row 0 would give "obstetric owing".
