I am using Beautiful Soup and requests to pull down information from a webpage. I am trying to get a list of book titles that contains just the titles, without the text title= in front of each title.
Example text = 'a bunch of junk title=book1 more junk text title=book2'
What I am getting is titleList = ['title=book1', 'title=book2']
I want titleList = ['book1', 'book2']
I have tried matching groups, and that does split title= and book1 apart, but I am not sure how to append just group(2) to the list.
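import re
import bs4
import requests

titleList = []
def getTitle(productUrl):
    # headers is assumed to be defined elsewhere in the script
    res = requests.get(productUrl, headers=headers)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'lxml')
    title = re.compile(r'title=[A-Za-z0-9]+')
    findTitle = title.findall(res.text.strip())
    # extend keeps titleList flat; append would nest the whole list inside it
    titleList.extend(findTitle)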
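
For reference, here is a minimal sketch of the group-based approach run against the example text above rather than a live page. The names title_pattern, grouped_pattern, title_list and via_group are made up for illustration. It relies on re.findall returning only the captured text when the pattern contains exactly one group, and shows the group(2) variant with finditer as an alternative.

```python
import re

# Sample text copied from the example above (stands in for res.text).
text = 'a bunch of junk title=book1 more junk text title=book2'

# One capture group: findall returns just the captured part of each match.
title_pattern = re.compile(r'title=([A-Za-z0-9]+)')

title_list = []
# extend adds each title individually instead of nesting the result list.
title_list.extend(title_pattern.findall(text))

# Two capture groups: pick group(2) from each match object via finditer.
grouped_pattern = re.compile(r'(title=)([A-Za-z0-9]+)')
via_group = [match.group(2) for match in grouped_pattern.finditer(text)]

print(title_list)
print(via_group)
```

Both lists should hold only the bare titles. The single-group version is shorter, while finditer gives access to each match object if the other groups are needed later.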