I have an HTML file like the one below:
<!DOCTYPE HTML>
<html>
<head>
<title>Sezione microbiologia</title>
<link rel="stylesheet" href="./style.css">
</head>
<body>
<div id="content">
<section id="main">
<!-- SOME CONTENT... -->
<h1>Prima diluizione</h1>
<p>Some content including "prima diluizione"...</p>
<h1>Seconda diluizione</h1>
<p>Some content including "seconda diluizione"...</p>
<h1>Terza diluizione</h1>
<p>Some content including "terza diluizione"...</p>
</section>
<section id="second">
<!-- SOME CONTENT... -->
</section>
<section id="third">
<!-- SOME CONTENT... -->
</section>
<section id="footer">
<!-- SOME CONTENT... -->
</section>
</div>
</body>
</html>
Problem description:
I am trying to modify the <h1> headings that contain the word "diluizione", replacing this word and its prefix with "Diluizione seriale". I tried to do this with Python's str.replace(), but the problem is that the text in the <p> paragraphs is changed as well, whereas I would like only the text in the h1 tags to be modified. On top of that, I still have not found a way to automatically remove the prefix, i.e. "Prima", "Seconda", "Terza", etc.
The code I tried
This is what I have come up with so far:
with open('./home.html') as file:
    text = file.read()
if "diluizione" in text:
    text = text.replace("diluizione", "diluizione seriale")
But this outputs:
<div id="content">
<section id="main">
<!-- SOME CONTENT... -->
<h1>Prima diluizione seriale</h1>
<p>Some content including "prima diluizione seriale"...</p>
<h1>Seconda diluizione seriale</h1>
<p>Some content including "seconda diluizione seriale"...</p>
<h1>Terza diluizione seriale</h1>
<p>Some content including "terza diluizione seriale"...</p>
</section>
So as you can see, the text in the <p> tags is affected as well, and in the headings the prefix is still there.
My desired output would be:
<div id="content">
<section id="main">
<!-- SOME CONTENT... -->
<h1>Diluizione seriale</h1>
<p>Some content including "prima diluizione"...</p>
<h1>Diluizione seriale</h1>
<p>Some content including "seconda diluizione"...</p>
<h1>Diluizione seriale</h1>
<p>Some content including "terza diluizione"...</p>
</section>
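For reference, here is a minimal sketch of the behaviour I am after, using a regular expression that only matches inside <h1> elements. It is not a definitive solution: it assumes the headings contain plain text with no nested tags and that the prefix is a single word. The names H1_DILUIZIONE and replace_dilution_headings are hypothetical, introduced only for this illustration. A real HTML parser (for example BeautifulSoup) would probably be more robust for anything less regular than the file above.

```python
import re

# Hypothetical pattern: an <h1> whose text is an optional one-word prefix
# followed by "diluizione" (case-insensitive), and nothing else.
H1_DILUIZIONE = re.compile(
    r"(<h1\b[^>]*>)\s*(?:\w+\s+)?diluizione\s*(</h1>)",
    flags=re.IGNORECASE,
)


def replace_dilution_headings(html: str) -> str:
    """Replace '<prefix> diluizione' with 'Diluizione seriale' in <h1> only."""
    # \g<1> and \g<2> re-insert the opening and closing tags unchanged.
    return H1_DILUIZIONE.sub(r"\g<1>Diluizione seriale\g<2>", html)


sample = (
    "<h1>Prima diluizione</h1>\n"
    '<p>Some content including "prima diluizione"...</p>\n'
    "<h1>Seconda diluizione</h1>\n"
    '<p>Some content including "seconda diluizione"...</p>'
)

# The <p> lines are left untouched because the pattern requires <h1> tags.
print(replace_dilution_headings(sample))
```

To apply this to the file, the text read from ./home.html could be passed through replace_dilution_headings and written back out.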
Any help or suggestion would be greatly appreciated. Thanks very much in advance.