I'm trying to parse the text in the ebooks at gutenberg.org to extract info about the books, for example, the title.
Every book on there has a line like this:
*** START OF THIS PROJECT GUTENBERG EBOOK THE ADVENTURES OF SHERLOCK HOLMES ***
I'd like to use something like this, where %%% stands for the part I want to capture:
    book_name = ()
    index = 0
    for line in finalLines:
        index += 1
        if "*** START OF THIS PROJECT GUTENBERG EBOOK "%%%"***" in line:
            print(index, line)
            book_name = %%%
but I'm obviously not doing it right. Can someone show me how it's done?
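Use a regular expression with a capturing group: \*\*\* START OF THIS PROJECT GUTENBERG EBOOK (.*) \*\*\*. The asterisks are escaped because * is a regex metacharacter, and the group (.*) captures the title. Learn more: docs.python.org/library/re.html, regular-expressions.info/reference.html, regexpal.com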
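For what it's worth, here is a minimal sketch of how that pattern could be plugged into the loop from the question, using Python's re module. The function name find_title and the sample list final_lines are made up for illustration (final_lines stands in for the asker's finalLines), and the sketch assumes the marker line has exactly the wording shown in the question.

```python
import re

# Pattern from the comment above; the parenthesized group (.*) captures the title.
START_RE = re.compile(r"\*\*\* START OF THIS PROJECT GUTENBERG EBOOK (.*) \*\*\*")


def find_title(lines):
    """Return (line_number, title) for the first marker line, or None if absent.

    Hypothetical helper, not part of any library.
    """
    for index, line in enumerate(lines, start=1):
        match = START_RE.search(line)
        if match:
            # group(1) is the text matched by (.*)
            return index, match.group(1).strip()
    return None


# Hypothetical sample input standing in for finalLines
final_lines = [
    "Some header text",
    "*** START OF THIS PROJECT GUTENBERG EBOOK THE ADVENTURES OF SHERLOCK HOLMES ***",
    "ADVENTURE I. A SCANDAL IN BOHEMIA",
]

result = find_title(final_lines)
if result is not None:
    index, book_name = result
    print(index, book_name)  # prints: 2 THE ADVENTURES OF SHERLOCK HOLMES
```

enumerate(lines, start=1) replaces the manual index counter, and returning None when nothing matches lets the caller decide how to handle a file whose marker line is worded differently.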