How do I extract all HTML-style comments from a document, using Python?
I've tried using a regex:
import re
text = 'hello, world <!-- comment -->'
re.match('<!--(.*?)-->', text)
But it produces nothing. I don't understand this since the same regex works fine on the same string at https://regex101.com/
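A likely explanation, sketched below: re.match only attempts a match at the very start of the string, whereas regex101 searches the whole input. Assuming the goal is to collect every comment, re.findall (or re.search for just the first one) seems to be the closer equivalent. The variable names anchored and comments are illustrative, and the re.DOTALL flag is an added assumption so that multi-line comments are also captured.

```python
import re

text = 'hello, world <!-- comment -->'

# re.match is anchored at position 0, so it finds nothing here:
# the string starts with "hello", not with "<!--".
anchored = re.match(r'<!--(.*?)-->', text)

# re.findall scans the whole string and returns the captured groups.
# re.DOTALL lets "." match newlines, so comments spanning lines are found too.
comments = re.findall(r'<!--(.*?)-->', text, flags=re.DOTALL)

print(anchored)   # None
print(comments)   # [' comment ']
```

Note that the captured text keeps its surrounding whitespace; calling .strip() on each item would remove it.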
UPDATE: My document is actually an XML file, and I'm parsing the document with pyquery (based on lxml), but I don't think lxml can extract comments that aren't inside a node. This is what the document looks like:
<?xml version="1.0" encoding="UTF-8"?>
<clinical_study rank="220398">
<intervention_browse>
<!-- CAUTION: The following MeSH terms are assigned with an imperfect algorithm -->
<mesh_term>Freund's Adjuvant</mesh_term>
<mesh_term>Keyhole-limpet hemocyanin</mesh_term>
</intervention_browse>
<!-- Results have not yet been posted for this study -->
</clinical_study>
UPDATE 2: Thanks for suggesting the other answer, but I'm already parsing the document extensively with lxml and don't want to rewrite everything with BeautifulSoup. Have updated title accordingly.
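For reference, a minimal sketch of how the comments might be pulled out with lxml alone, without BeautifulSoup. It assumes the document is parsed with lxml's default parser (which keeps comments) and uses root.iter(etree.Comment) to walk every comment node under the root element. The inline xml string is just the sample document from above, and the names root and comments are illustrative.

```python
from lxml import etree

# Sample document from the question, as bytes because it carries
# an encoding declaration.
xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<clinical_study rank="220398">
  <intervention_browse>
    <!-- CAUTION: The following MeSH terms are assigned with an imperfect algorithm -->
    <mesh_term>Freund's Adjuvant</mesh_term>
    <mesh_term>Keyhole-limpet hemocyanin</mesh_term>
  </intervention_browse>
  <!-- Results have not yet been posted for this study -->
</clinical_study>"""

root = etree.fromstring(xml)

# Comment nodes have etree.Comment as their tag, so iter() can filter on it.
# .text holds the comment body without the <!-- --> delimiters.
comments = [c.text.strip() for c in root.iter(etree.Comment)]

for comment in comments:
    print(comment)
```

Both comments here sit inside the root element, so iterating from the root reaches them, including the one that is not wrapped in any child node.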

tag is an etree.Comment -- have you tried that? And then if True could just print the tag property value?
tag, the comment is just floating.