
I have a train DataFrame with a content column. Each row of the content column holds a list of words.

content
[sure, tune, …, watch, donald, trump, “,”, late, ’ , night]
[abc, xyz, “,”,late, ’, night]

Code to remove the special characters with a regular expression

import re
train['content'] = train['content'].map(lambda x: re.sub(r'\W+', '', x))

Error

TypeError: expected string or bytes-like object

Expected output

content
[sure, tune, watch, donald, trump, late, night]
[abc, xyz, late, night]

Notice that all the special characters such as …, “,” and ’ are gone, and only the words are left.

3
  • First off, each row of content must be a string itself, not an actual list; otherwise you'd have syntax errors before you even started. You should probably clarify that. Commented Jun 26, 2020 at 13:08
  • Each row of content is a list. Commented Jun 26, 2020 at 13:08
  • Are all of the items in the list defined variables, then? Either the list itself has to be a string, the items have to be strings or numbers, or they are variables that were previously defined. Commented Jun 26, 2020 at 13:11

2 Answers

1

You are trying to apply a regular expression to a list object, but re.sub expects a string or bytes-like object, which is why the TypeError is raised.
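As a minimal sketch of the failure, the snippet below reproduces the error on a single row. The `row` variable is a hypothetical stand-in for one cell of train['content'], assuming each cell is a list of strings as the question describes.

```python
import re

# Hypothetical stand-in for one cell of train['content'].
row = ['abc', 'xyz', '“,”', 'late', '’', 'night']

try:
    # Passing the whole list: re.sub only accepts str or bytes.
    re.sub(r'\W+', '', row)
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# Passing one item of the list works, because the item is a string.
first_special = re.sub(r'\W+', '', row[2])
print(repr(first_special))
```

The call succeeds once it receives a single string instead of the list.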

If your goal is to use this regex on every item of the list, you can apply re.sub to each item in the list:

import re
def replace_func(item):
    return re.sub(r'\W+', '', item)

train['content'] = train['content'].map(lambda x: [replace_func(item) for item in x])
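Note that the code above leaves an empty string wherever an item consisted only of special characters, whereas the expected output in the question drops those items. A minimal sketch of a variant that also filters out the empty results follows; `clean_tokens`, `NON_WORD` and `rows` are hypothetical names introduced for illustration, and the sample rows are the ones from the question with the elided parts left out.

```python
import re

# Precompiled pattern: one or more non-word characters.
NON_WORD = re.compile(r'\W+')

def clean_tokens(tokens):
    """Strip non-word characters from each token and drop tokens left empty."""
    cleaned = (NON_WORD.sub('', token) for token in tokens)
    return [token for token in cleaned if token]

# Sample rows standing in for the content column.
rows = [
    ['sure', 'tune', '…', 'watch', 'donald', 'trump', '“,”', 'late', '’', 'night'],
    ['abc', 'xyz', '“,”', 'late', '’', 'night'],
]

cleaned_rows = [clean_tokens(row) for row in rows]
print(cleaned_rows)
```

On the DataFrame this would presumably be applied as train['content'] = train['content'].map(clean_tokens), assuming every cell really is a list of strings.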

2 Comments

Does not work. This separates all the individual letters.
It would separate individual letters if there are strings in your content field. I assumed that every element in the content field is a list. What does train.content.map(type).value_counts() show?
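To illustrate the point made in this comment, here is a small sketch of what happens when a cell is a string rather than a list. The `cells` values are made up for the example, and `Counter` is used here only as a plain-Python stand-in for the map(type).value_counts() check.

```python
import re
from collections import Counter

def strip_per_item(cell):
    # Same per-item substitution as in the answer above.
    return [re.sub(r'\W+', '', item) for item in cell]

# Hypothetical cells: one real list and one plain string.
cells = [['abc', 'xyz', 'night'], 'abc xyz']

# Count how many cells are of each type.
type_counts = Counter(type(cell).__name__ for cell in cells)
print(type_counts)

list_result = strip_per_item(cells[0])
# Iterating over a string yields single characters, so the
# result is one entry per letter.
string_result = strip_per_item(cells[1])
print(list_result)
print(string_result)
```

If the type check reports str for the cells, they would first need to be converted to real lists; how to do that depends on how the strings are formatted, which the question does not show.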
0

Just do:

import re
content = ['sure', 'tune', '…', 'watch', 'donald', 'trump', '“,”', 'late', '’', 'night']
content = list(map(lambda x: re.sub(r'\W+', '', x), content))

2 Comments

content is a column in a dataset. There are 30,000 rows in the content column; these 2 rows are just an example.
The main idea is to parse strings with a regex, isn't it? Do content = train['content'].tolist() and assign the result back, or iterate over the DataFrame column; you need an additional conversion anyway, in my opinion.
