I used BeautifulSoup to parse a website and store the content. It is in this form:

records = [[[<p>data_1_1</p>], [<p>data_1_2</p>],[], [<li>data_1_3</li>]],
           [[<p>data_2_1</p>], [<p>data_2_2</p>], [], [<li>data_2_3</li>]]]

I am having trouble turning it into this:

records = [["data_1_1", "data_1_2", "data_1_3"],
           ["data_2_1", "data_2_2", "data_2_3"]]

I tried list comprehensions:

text_records = [sum(record, []) for record in records]

but the text is still wrapped in <p> or <li> tags.

text_records = [item.string for item in sum(record, []) for record in records]

takes the text out of tags, but this gives one large list, with the same values repeated multiple times.

I know there is plenty out there on list comprehensions in Python, and I've searched SO, but I can't find anything that helps with this situation.

2 Answers

Edit - This will work even when an inner list holds multiple items:

[[tag.string for item in record for tag in item] for record in records]

The nested loop walks every tag in every inner list, so the tags are combined into a single list per record, and empty inner lists contribute nothing.
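
One way to sketch the multi-item case is a nested comprehension that iterates over each inner list directly. The `Tag` namedtuple below is a hypothetical stand-in for a BeautifulSoup tag (only its `.string` attribute is modelled), and the sample data is made up so that one inner list holds two tags:

```python
from collections import namedtuple

# Hypothetical stand-in for a bs4 tag; a real tag exposes .string the same way.
Tag = namedtuple("Tag", ["name", "string"])

# Made-up sample: the first inner list of each record holds two tags.
records = [
    [[Tag("p", "data_1_1"), Tag("p", "data_1_2")], [], [Tag("li", "data_1_3")]],
    [[Tag("p", "data_2_1"), Tag("p", "data_2_2")], [], [Tag("li", "data_2_3")]],
]

# For each record, visit every inner list and every tag inside it.
# Empty inner lists yield no tags, so no explicit "if item" filter is needed.
text_records = [
    [tag.string for item in record for tag in item]
    for record in records
]

print(text_records)
# [['data_1_1', 'data_1_2', 'data_1_3'], ['data_2_1', 'data_2_2', 'data_2_3']]
```

The `for` clauses inside a comprehension read left to right in the same order as the equivalent nested `for` statements, which is why `for item in record` comes before `for tag in item`.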

Original:

This should work fine as long as every non-empty inner list holds exactly one item:

[[item[0].string for item in row if item] for row in records]

This will go through each record, skip any empty inner list with the if clause, and add the string of each remaining list's first element to the new record.
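
As a rough, self-contained illustration of the single-item comprehension, the sketch below rebuilds the question's data with a hypothetical `Tag` namedtuple standing in for BeautifulSoup tags (only `.string` is modelled):

```python
from collections import namedtuple

# Hypothetical stand-in for a bs4 tag; only the .string attribute is modelled.
Tag = namedtuple("Tag", ["name", "string"])

records = [
    [[Tag("p", "data_1_1")], [Tag("p", "data_1_2")], [], [Tag("li", "data_1_3")]],
    [[Tag("p", "data_2_1")], [Tag("p", "data_2_2")], [], [Tag("li", "data_2_3")]],
]

# Outer comprehension: build one output list per row.
# Inner comprehension: skip empty lists, then read the text of the only tag.
text_records = [[item[0].string for item in row if item] for row in records]

print(text_records)
# [['data_1_1', 'data_1_2', 'data_1_3'], ['data_2_1', 'data_2_2', 'data_2_3']]
```

Without the `if item` filter, `item[0]` would raise an `IndexError` on the empty inner lists.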

2 Comments

thank you, this worked. why item[0] though? what else is in there?
Welcome, glad it worked for you! It's not that there are additional items in there; each inner value is simply a list object. So you have to extract the item from the list (even if there is just one), or else you get the list itself rather than the tag inside it.

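This will do the job (although this many for loops is annoying; any suggestion is welcome). Note that it produces one flat list of strings rather than one list per record, and that each k is already a tag, so there is no need to pass it back through BeautifulSoup.

records1 = [k.text for i in records for j in i for k in j]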
