Linked Questions
13 questions linked to/from Python element tree - extract text from element, stripping tags
0
votes
0
answers
636
views
Get text from xml that is behind an extra tag in python [duplicate]
I was trying to extract the text between tags from an .xml file, but I found that some parts of the text have an <italic> tag that cuts off the text after it and makes it impossible for me to get it....
0
votes
1
answer
55
views
Parse XML to String with <b> tags [duplicate]
I'm only getting value1 when using .findall('string'), and the rest is ignored. How do I get the whole value?
xml file input:
<resources>
<string name="key">value1 <b>value2</b>...
64
votes
12
answers
70k
views
Remove a tag using BeautifulSoup but keep its contents
Currently I have code that does something like this:
soup = BeautifulSoup(value)
for tag in soup.findAll(True):
    if tag.name not in VALID_TAGS:
        tag.extract()
soup.renderContents()
Except ...
43
votes
2
answers
37k
views
xpath to get all the childrens text
Is there any way to get the values of all the child nodes within the ul tag?
Input:
<ul>
<li class="type">Industry</li>
<li><a href="/store/Browse/?N=355+361+...
5
votes
1
answer
3k
views
Python NLTK Shakespeare corpus
I am trying to import sentences from Shakespeare's NLTK corpus – following this help site – but I am having trouble getting access to the sentences (in order to train a word2vec model):
from nltk....
0
votes
1
answer
1k
views
Python lxml extract text when a tag exists in the middle of the text
I am trying to parse and extract all the text inside of the claim-text tag and prepare it for a csv. So each claim tag will have a column containing all the claim-text.
Basically the claims are ...
0
votes
3
answers
2k
views
python xml.etree.ElementTree remove empty tag in the middle of text
I have an xml document from which I want to extract text based on tags.
The part that I want to extract text from looks something like this:
<BlockText attr1="blah" attr2=657 ID="Bhf76" lang="en"&...
-1
votes
1
answer
1k
views
Skipping "nested tags" when parsing XML with Python
I currently have an XML file that I'd like to parse with Python. I'm using Python's Element Tree and it works fine except I had a question.
The file currently looks something like:
<Instance>
...
0
votes
1
answer
760
views
How to parse XML file with xml.etree.ElementTree that have HTML content in its child
I have this XML file:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<Component>
<Custom/>
<ID>1</ID>
<LongDescription>
<html><html> <...
0
votes
1
answer
499
views
python Parsing xml: get text from tag which contains <i> or <b> or similar
I'm using xml.etree.ElementTree. When I try to get the text from AbstractText, I get None or only partial text if the text contains formatting tags such as i, b, or similar.
Here is an XML example:
<...
0
votes
1
answer
364
views
Python parse XML content of an element when there is a child element
I have an XML file as below:
<?xml version="1.0" encoding="UTF-8"?>
<data>
<text>
I have <num1>two</num1> apples and <num2&...
1
vote
2
answers
209
views
Parsing an xml file with an emphasis tag in it in python
I am currently writing a Python script that can extract all of the text in an XML file. I am using the Element Tree library to interpret the data, but I am running into a problem when the ...
0
votes
0
answers
154
views
Parsing an XML file for multiple p tags to be merged with other p tags
I have an .xml file from which I'm trying to extract the text of all p tags. I've been able to implement the following code to do so, but it's not quite returning the output I want.
tree = ET.parse(&...
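Most of the questions listed above describe the same symptom: with xml.etree.ElementTree, an element's .text holds only the text before its first child tag, so inline tags such as <b> or <i> appear to cut the text off. Below is a minimal sketch of one common way around this, using Element.itertext(). The sample XML and the helper name full_text are assumptions made up for illustration, modeled loosely on the excerpts above rather than taken from any one question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, loosely modeled on the excerpts above.
SAMPLE = (
    '<resources>'
    '<string name="key">value1 <b>value2</b> value3</string>'
    '</resources>'
)


def full_text(element):
    """Join the text of an element and all of its descendants, in document order."""
    return "".join(element.itertext())


root = ET.fromstring(SAMPLE)
string_el = root.find("string")

# .text stops at the first child tag.
print(repr(string_el.text))        # 'value1 '

# itertext() also yields the text inside <b> and the tail text after it.
print(repr(full_text(string_el)))  # 'value1 value2 value3'
```

Whether whitespace should be normalized afterwards (for example with " ".join(text.split())) depends on the input, so the sketch leaves it out.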