I have some XML I wish to parse. I want to use ElementTree rather than BeautifulSoup, because I am having some issues with the way the latter handles the XML.
I wish to extract the text from the following:
- Abstract/AbstractText
- ArticleId when IdType="pmc"
- PublicationType: the UI attribute value first, then the corresponding text
Which ElementTree functions should I use for this?
I have been trying to use .attrib, .attrib.get(), .iter() and .attrib[key] to get the text, but I have not managed to access the actual text.
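A minimal sketch of why those calls do not return the text, assuming the XML is loaded from a string: .attrib is only a dict of the element's attributes (such as Label), .get() reads a single attribute, and the content between the tags lives in .text. The sample string is a hypothetical, trimmed excerpt of the XML in this question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed excerpt of the question's XML
sample = """<Abstract>
<AbstractText Label="OBJECTIVES" NlmCategory="OBJECTIVE">
To explore the opinion of general practitioners.
</AbstractText>
</Abstract>"""

root = ET.fromstring(sample)
elem = root.find("AbstractText")  # first direct child with this tag

attrs = elem.attrib          # dict of attributes only, no text
label = elem.get("Label")    # value of one attribute
text = elem.text.strip()     # text between the tags, minus surrounding newlines

print(attrs)
print(label)
print(text)
```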
```xml
<PubmedArticleSet>
  <PubmedArticle>
    <PMID Version="1">10890875</PMID>
    <Journal>
      <ISSN IssnType="Print">0143-005X</ISSN>
      <Title>Journal of epidemiology and community health</Title>
    </Journal>
    <ArticleTitle>Sources of influence on medical practice.</ArticleTitle>
    <Abstract>
      <AbstractText Label="OBJECTIVES" NlmCategory="OBJECTIVE">
        To explore the opinion of general practitioners on the
        importance and legitimacy of sources of influence on
        medical practice.
      </AbstractText>
      <AbstractText Label="METHODS" NlmCategory="METHODS">
        General practitioners (n=723) assigned to Primary Care
        Teams (PCTs) in two Spanish regions were randomly selected
        to participate in this study.
      </AbstractText>
      <AbstractText Label="RESULTS" NlmCategory="RESULTS">
        The most important and legitimate sources of influence according to general practitioners were: training courses and scientific articles, designing self developed protocols and discussing with colleagues.
      </AbstractText>
      <AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">
        The development of medical practice is determined by many factors, grouped around three big areas: organisational setting, professional system and social setting.
      </AbstractText>
    </Abstract>
    <Language>eng</Language>
    <PublicationTypeList>
      <PublicationType UI="D016428">Journal Article</PublicationType>
      <PublicationType UI="D013485">Research Support, Non-U.S.Gov't</PublicationType>
    </PublicationTypeList>
    <PubmedData>
      <PublicationStatus>ppublish</PublicationStatus>
      <ArticleIdList>
        <ArticleId IdType="pubmed">10890875</ArticleId>
        <ArticleId IdType="pmc">PMC1731730</ArticleId>
      </ArticleIdList>
    </PubmedData>
  </PubmedArticle>
</PubmedArticleSet>
```
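One possible sketch for the ArticleId and PublicationType items, assuming one or more PubmedArticle elements under the root: ElementTree's limited XPath support includes [@attr='value'] predicates, so the pmc id can be selected directly, and each PublicationType can be read as a (UI, text) pair. The inline string is a trimmed copy of the XML above; for a file you would use ET.parse(path).getroot() instead, where the path is whatever your file is called.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (only the parts used here)
sample = """<PubmedArticleSet>
<PubmedArticle>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article
</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S.Gov't </PublicationType>
</PublicationTypeList>
<PubmedData>
<ArticleIdList>
<ArticleId IdType="pubmed">10890875</ArticleId>
<ArticleId IdType="pmc">PMC1731730</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>"""

root = ET.fromstring(sample)

for article in root.iter("PubmedArticle"):
    # Attribute predicate: only the ArticleId whose IdType is "pmc"
    pmc_elem = article.find(".//ArticleId[@IdType='pmc']")
    pmc_id = pmc_elem.text.strip() if pmc_elem is not None else None

    # Read the UI attribute first, then the element's text
    pub_types = [
        (pt.get("UI"), pt.text.strip())
        for pt in article.findall(".//PublicationType")
    ]

    print(pmc_id)
    for ui, name in pub_types:
        print(ui, name)
```

The find() call returns None when an article has no pmc id, which is why the sketch checks for it before reading .text.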
What I am hoping to get as a result is every Label of an AbstractText together with the text for that label.
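A minimal sketch of that label-to-text mapping, assuming every AbstractText carries a Label attribute (one without it would be stored under the key None, which this sketch does not handle). It uses itertext() so that any inline markup inside an abstract, if present, does not truncate the text, and collapses the line breaks into single spaces. The sample string is a trimmed copy of the XML above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (two of the four AbstractText elements)
sample = """<PubmedArticle>
<Abstract>
<AbstractText Label="OBJECTIVES" NlmCategory="OBJECTIVE">
To explore the opinion of general practitioners on the
importance and legitimacy of sources of influence on
medical practice.
</AbstractText>
<AbstractText Label="METHODS" NlmCategory="METHODS">
General practitioners (n=723) assigned to Primary Care
Teams (PCTs) in two Spanish regions were randomly selected
to participate in this study.
</AbstractText>
</Abstract>
</PubmedArticle>"""

root = ET.fromstring(sample)

abstract = {}
for node in root.iterfind(".//Abstract/AbstractText"):
    label = node.get("Label")
    # itertext() yields all text inside the element, including nested tags;
    # split/join normalises the embedded newlines to single spaces
    text = " ".join("".join(node.itertext()).split())
    abstract[label] = text

for label, text in abstract.items():
    print(f"{label}: {text}")
```

A dict keeps the labels in document order (Python 3.7+), so the printed sections come out in the same order as in the XML.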