The hard part of this problem is that the presented document mixes formats: it has a valid HTML structure, but also XML-like elements which appear "tossed in" without a particular pattern. There are ways to disentangle these parts, though they aren't bulletproof and come with trade-offs.
In this case XML::LibXML can do the whole job, as it can deal with bad data, but note the caveats that follow the code.
use warnings;
use strict;
use feature 'say';
use Encode qw(encode_utf8);
use XML::LibXML;
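# recover => 2: keep parsing non-conforming input, without warnings
my $html_doc = XML::LibXML->new(recover => 2)->parse_html_fh(\*DATA);
# Text under <pre> has its entities decoded; drop what precedes the first tag
my $xml = encode_utf8(
$html_doc->findvalue('/html/body/pre/text()') =~ s/^[^<]*//r
);
my $xml_doc = XML::LibXML->new->parse_string($xml);
say for $xml_doc->findnodes('//key'); # node object stringifies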
__DATA__
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Some tittle &lt;localconfig&gt;
&lt;key name="ssl_default"&gt;
&lt;value&gt;sha256&lt;/value&gt;
&lt;/key&gt;
</title>
</head>
<body>
<h2>Some h2</h2>
<p>some text:
<pre> text &lt;localconfig&gt;
&lt;key name="ssl_default"&gt;
&lt;value&gt;sha256&lt;/value&gt;
&lt;/key&gt;
&lt;key name="some variable"&gt;
&lt;value&gt;1024&lt;/value&gt;
&lt;/key&gt;
&lt;/localconfig&gt;
</pre>
</p>
<hr>
<i>
<small>Some text</small>
</i>
<hr/>
</body>
</html>
The parser option recover is what allows the above parsing to go through. From the docs:
"A true value turns on recovery mode which allows one to parse broken XML or HTML data. [...]"
As useful as this can be, it of course calls for extreme caution, as we are willfully using bad data (or, rather, non-conforming data here). This case brings two such issues.
A regex is needed for the entities. The example deals with those under <pre>, but there may be more. We need to inspect the input and may need code changes for different data.
This makes use of the observation that the XML-like "tags" are given by entities (&lt; etc), which are left as they are during parsing and only decoded later. However ...
... this isn't a rule and if some aren't given that way (but rather as <key>), then those can make the library parse the document into a (slightly) different tree. This again requires inspection of input, and possibly code adjustments for any new data.
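The substitution used in the first program can be tried in isolation. This is a minimal sketch in core Perl only; the helper name strip_leading_text and the sample string are made up for illustration. It assumes the text taken from under <pre> already has its entities as literal characters, and it removes everything before the first < so that what remains starts with the XML root tag:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of any module): drop any text that
# precedes the first '<', so that an XML parser sees the root tag first.
# The /r modifier (Perl 5.14+) returns the changed copy.
sub strip_leading_text {
    my ($text) = @_;
    return $text =~ s/^[^<]*//r;
}

# Made-up sample of the text found under <pre>, entities already decoded
my $decoded = qq( text <localconfig>\n<key name="ssl_default"/>\n</localconfig>\n);

say strip_leading_text($decoded);
```

If stray text can also follow the closing tag, or can itself contain a literal <, this pattern is not enough and has to be adjusted for the data at hand.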
Thanks to ikegami for bringing up the point of first parsing the data and only then dealing with the entities, for a discussion, and for the XML-code above. The original version of the XML-related code above first decoded and so ended up with a slightly different tree.
Also note that HTML::TreeBuilder does process this data with ignore_unknown set. Then the problem is that these new "tags" (<key> etc) are just data for it, so any practical use of the obtained tree would probably have to rely on regex.
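To illustrate what relying on regex would look like with HTML::TreeBuilder, here is a rough sketch. The helper name keys_from_pre and the pattern are assumptions made for this example, and the pattern only fits the exact layout of the sample data (entity-encoded tags, one <value> per <key>):

```perl
use strict;
use warnings;
use feature 'say';
use HTML::TreeBuilder;

# Hypothetical helper: return { key-name => value } pairs found in the
# text of the first <pre>. The regex assumes the layout of the sample
# data: a <key name="..."> followed by a single <value>.
sub keys_from_pre {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse_content($html);

    my $pre  = $tree->look_down(_tag => 'pre');
    my $text = defined $pre ? $pre->as_text : '';   # entities come out decoded
    $tree->delete;                                  # free the tree

    my %value_for;
    while ($text =~ m{<key\s+name="([^"]*)">\s*<value>(.*?)</value>}gs) {
        $value_for{$1} = $2;
    }
    return \%value_for;
}

my $html = <<'END_HTML';
<html><body><pre> text &lt;localconfig&gt;
&lt;key name="ssl_default"&gt;
&lt;value&gt;sha256&lt;/value&gt;
&lt;/key&gt;
&lt;/localconfig&gt;
</pre></body></html>
END_HTML

my $value_for = keys_from_pre($html);
say "$_ => $value_for->{$_}" for sort keys %$value_for;
```

The tree only locates the <pre> here; all the structure inside it is recovered by the pattern, which is why this approach is fragile for anything beyond simple, regular data.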
One other way to deal with this data is with the flexible, high-level HTML parser, Marpa::HTML.
A very basic demo
use warnings;
use strict;
use feature 'say';
use Marpa::HTML qw(html);
use HTML::Entities qw(decode_entities);
my $input = do { local $/; <DATA> };
my $html = decode_entities($input);
my (@attrs, @cont);
my $marpa_key = Marpa::HTML::html(
\$html,
{
'key' => sub {
push @attrs, Marpa::HTML::attributes();
push @cont, Marpa::HTML::contents();
},
}
);
for my $i (0..$#cont) {
say "For attribute \"name=$attrs[$i]->{name}\" the <key> has: $cont[$i]"
}
__DATA__
...the same as in the first example, data from the question...
This collects views as it parses, using API for attributes and contents, for element <key>.
It may in principle be suitable for your problem since it accepts the mere semantics of <...> as an element. But those aren't treated as XML, which may be a downside if your data relies on XML more than shown. And, of course, this is a different approach with its own rules.
Note that the basic logic and use of the module is that each coderef returns a value, and this return value is used for the element that it fired on; the rest of the text is unchanged. So this is natural for changing particular elements of a document.
I've used it differently above, only to collect information about the "tags." That code prints
For attribute "name=ssl_default" the <key> has:
<value>sha256</value>
For attribute "name=some variable" the <key> has:
<value>1024</value>
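For comparison, the more typical use mentioned earlier, where a handler's return value stands in for the element it fired on, might look like the following sketch. The helper name summarize_keys, the marker format, and the one-line input are invented for the example, and it assumes that html() hands back a reference to the resulting string, as in the module's synopsis:

```perl
use strict;
use warnings;
use feature 'say';
use Marpa::HTML qw(html);

# Hypothetical helper: replace every <key> element with a short marker
# built from its "name" attribute; the rest of the text is left alone.
sub summarize_keys {
    my ($html) = @_;
    my $result_ref = html(
        \$html,
        {
            key => sub {
                my $name = Marpa::HTML::attributes()->{name} // '';
                return "[key: $name]";   # used in place of the element
            },
        }
    );
    return ${$result_ref};   # assumed: html() returns a reference to a string
}

say summarize_keys(
    '<p>before <key name="ssl_default"><value>sha256</value></key> after</p>'
);
```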