1

I have a dataframe from which I'm trying to delete all one-word responses, with or without punctuation, possibly with leading spaces. Most of the values are full, long sentences; below are examples of the kind I am trying to remove.

column
thanks
hello!
really....

My attempt:

textonly = re.sub('^.\w+\w+.$' , " " , df.column)

Error (even though the dtype is string): expected string or bytes-like object

Another attempt, which runs but doesn't change anything:

textonly = re.sub('^.\w+\w+.$' , " " , str(df.column))

Please help me identify what I'm missing.

2
  • Use df.column.str.replace instead Commented Nov 8, 2021 at 21:23
  • ^[^\n ]*\n (with the multiline flag m set) matches lines with no spaces. Commented Nov 8, 2021 at 21:24

2 Answers

1

You can use

df['column'] = df['column'].str.replace(r'^\W*\w+\W*$', '', regex=True)

If by "words" you mean natural-language words, i.e. words consisting only of letters, you can use

df['column'] = df['column'].str.replace(r'^[\W\d_]*[^\W\d_]+[\W\d_]*$', '', regex=True)

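The two regexes match, in order:

  • ^ - start of string
  • \W* (first pattern) - zero or more non-word chars
  • [\W\d_]* (second pattern) - zero or more non-word chars, digits, or _
  • \w+ (first pattern) - one or more word chars (letters, digits, or _)
  • [^\W\d_]+ (second pattern) - one or more chars that are not non-word chars, digits, or _, i.e. letters
  • \W* (first pattern), or [\W\d_]* in the second - the same optional run as at the start, now trailing
  • $ - end of string.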
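
To see how the two patterns behave on the values from the question, here is a minimal sketch that uses the standard re module on plain strings rather than a dataframe. The names ONE_WORD, ONE_LETTER_WORD and samples are made up for illustration, and the last two sample values are assumptions, not data from the question. The same pattern strings are the ones passed to str.replace above.

```python
import re

# First pattern from the answer: optional non-word chars, one run of
# word chars, optional non-word chars, anchored at both ends.
ONE_WORD = re.compile(r'^\W*\w+\W*$')

# Letters-only variant from the answer: digits and "_" are treated
# like punctuation instead of as part of a word.
ONE_LETTER_WORD = re.compile(r'^[\W\d_]*[^\W\d_]+[\W\d_]*$')

# The three values from the question, plus two hypothetical ones:
# a padded single word and a full sentence.
samples = ['thanks', 'hello!', 'really....', '  ok ', 'this is a full sentence.']

# One-word values become empty strings; the sentence is left untouched.
cleaned = [ONE_WORD.sub('', s) for s in samples]
print(cleaned)

# The two patterns disagree on a value made only of digits:
# the first treats it as a word, the second finds no letters in it.
digits_first = ONE_WORD.sub('', '12345')
digits_second = ONE_LETTER_WORD.sub('', '12345')
print(repr(digits_first), repr(digits_second))
```

Note that the replacement leaves an empty string where the one-word value was; it does not remove the entry itself.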

1

You could also skip regex entirely and check whether the stripped string still contains a space:

x = [
    'hej med dig',
    'hej',
]

# Keep only the entries that contain a space after stripping, i.e. more than one word
print([s for s in x if ' ' in s.strip()])
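
The space check above only looks for the literal space character, so words separated by a tab or other whitespace would be counted as one word. A possible variant, sketched here under the assumption that any whitespace should separate words, counts the tokens produced by str.split() instead. The helper name has_multiple_words and the extra sample values are hypothetical.

```python
def has_multiple_words(text: str) -> bool:
    """Return True when text has at least two whitespace-separated tokens."""
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading and trailing whitespace, so no strip() is needed.
    return len(text.split()) > 1


# The first two values come from the answer; the rest are illustrative.
responses = ['hej med dig', 'hej', '  hello!  ', 'really....', 'two\twords']

# Keep only the responses with more than one word.
kept = [r for r in responses if has_multiple_words(r)]
print(kept)
```

Punctuation attached to a word does not create a new token here, so values such as hello! still count as a single word.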
