Frustrated to say I'm stumped on this one. I'm extracting the paragraphs from an HTML page:
paragraphs = re.findall(r'(<p(.*?)</p>)', html)
Then I want to strip the tags and keep only the paragraph text, word by word:
paragraphs = re.sub(r'\<.*?\>', '', paragraphs)
The problem is that re.sub expects a string. If I understand it right, I have to turn "paragraphs" into a string first. But when I do:
paragraphs = str(paragraphs)
…I get the text letter by letter, with the words broken apart. I'm new to Python and this confuses me.
1st question: Why isn't "paragraphs" a string to begin with?
2nd question: How do I convert "paragraphs" into a string while keeping it word by word, such as:
paragraphs = ['Two', 'words']
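
For reference, here is a minimal sketch of what seems to be happening and one possible way to get a word list. The sample html string and the names matches and bodies are made up for illustration and are not from my real page. As far as I can tell, re.findall returns a list of tuples when the pattern has more than one capture group. Calling str() on a list gives its printed representation as one long string, so looping over that yields single characters.

```python
import re

# Hypothetical sample input, only for illustration.
html = '<p class="intro">Two words</p><p>And <b>three</b> more</p>'

# With two capture groups, re.findall returns a list of tuples:
# one tuple per match, one string per group.
matches = re.findall(r'(<p(.*?)</p>)', html)
print(type(matches), type(matches[0]))

# A single group around the paragraph body yields a list of strings.
# re.DOTALL lets "." also match newlines inside a paragraph.
bodies = re.findall(r'<p\b[^>]*>(.*?)</p>', html, flags=re.DOTALL)

# Strip any remaining tags from each string, then split on whitespace.
paragraphs = [re.sub(r'<.*?>', '', body).split() for body in bodies]
print(paragraphs)
```

This applies re.sub to each paragraph string in turn instead of to the whole list, so every paragraph ends up as its own list of words.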