Linked Questions

0 votes
0 answers
636 views

I was trying to extract the text between tags from an .xml file, but I found that some parts of the text have an <italic> tag that cuts off the text after it and makes it impossible for me to get it....
smtn-nt
0 votes
1 answer
55 views

I'm only getting value1 when using .findall('string'); the rest is ignored. How do I get the whole value? XML file input: <resources> <string name="key">value1 <b>value2</b>...
Gorthez
  • 421
64 votes
12 answers
70k views

Currently I have code that does something like this: soup = BeautifulSoup(value) for tag in soup.findAll(True): if tag.name not in VALID_TAGS: tag.extract() soup.renderContents() Except ...
Jason Christa
43 votes
2 answers
37k views

Is there any way to get all the child node values within the ul tag? Input: <ul> <li class="type">Industry</li> <li><a href="/store/Browse/?N=355+361+...
pallavi
  • 745
5 votes
1 answer
3k views

I am trying to import sentences from Shakespeare's NLTK corpus – following this help site – but I am having trouble getting access to the sentences (in order to train a word2vec model) : from nltk....
groromain92
0 votes
1 answer
1k views

I am trying to parse and extract all the text inside of the claim-text tag and prepare it for a csv. So each claim tag will have a column containing all the claim-text. Basically the claims are ...
theEconCsEngineer
0 votes
3 answers
2k views

I have an xml document from which I want to extract text based on tags. The part that I want to extract text from looks something like this : <BlockText attr1="blah" attr2=657 ID="Bhf76" lang="en"&...
MMM
  • 305
-1 votes
1 answer
1k views

I currently have an XML file that I'd like to parse with Python. I'm using Python's ElementTree and it works fine, but I have a question. The file currently looks something like: <Instance> ...
Sean
  • 3,460
0 votes
1 answer
760 views

I have this XML file : <?xml version="1.0" encoding="UTF-8" standalone="true"?> <Component> <Custom/> <ID>1</ID> <LongDescription> <html><html> <...
Emna Jaoua
0 votes
1 answer
499 views

I'm using xml.etree.ElementTree. When I try to get text from AbstractText, I get None or partial text if there are formatting tags like i, b, or similar tags in the text. Here is an XML example <...
gabriele colia
0 votes
1 answer
364 views

I have an XML file as below: <?xml version="1.0" encoding="UTF-8"?> <data> <text> I have <num1>two</num1> apples and <num2&...
Shaghayegh L
1 vote
2 answers
209 views

I am currently writing a Python script that can extract all of the text in an XML file. I am using the ElementTree library to interpret the data, but I am running into a problem when the ...
Handsome Alex
0 votes
0 answers
154 views

I have an .xml file from which I'm trying to extract the text of all p tags. I've been able to implement the following code to do so, but it's not quite returning the output I want. tree = ET.parse(&...
spaghettiplants