
I am using the Perl XML::LibXML module to manipulate an XML file.

I want to remove the opening and closing tags of an XML node if it has a certain attribute, so that its text and child nodes become children of the node's parent.

Here's an unsuccessful attempt. It fails with an insertBefore/insertAfter: HIERARCHY_REQUEST_ERR error:

#!/usr/bin/env perl
use 5.020;
use warnings;
use XML::LibXML;

#the input xml

my $inputstr = <<XML;
<root>
<a>
<b class="deletethistag">keep this text<c>keep this c node</c>keep this text too</b>
<b class="someothertag">don't change this</b>
<b>don't change this node without an attribute</b>
<c class="type1">don't change this either</c>
</a>
</root>
XML

my $desiredstr = <<XML ;
<root>
<a>keep this text<c>keep this c node</c>keep this text too
<b class="someothertag">don't change this</b>
<b>don't change this node without an attribute</b>
<c class="type1">don't change this either</c>
</a>
</root>
XML

my $dom = XML::LibXML->load_xml(
string => $inputstr
);

# Convert $inputstr to $desiredstr *** doesn't work ***
foreach my $node ($dom->findnodes(q#//a/b[@class="deletethistag"]/*#)) {
    my $nodestring = $node->toString(1);
    say STDERR $nodestring;
    my $replacementnode = XML::LibXML->load_xml(string => $nodestring);
    $node->parentNode()->insertAfter($replacementnode, $node);
    $node->unbindNode();
    }
say $dom->toString(1);

I want to use the code to remove <span lang="en" xml:space="preserve">...</span> markup from a file, but I have framed it as a more general question so that I understand more of the details of working with XML::LibXML.

  • Using load_xml in the loop makes no sense, and is the source of the problem. A node can't be owned by two different documents, and a document can't be owned by a document. Commented May 6, 2024 at 5:02
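
A minimal sketch of the point made in this comment, assuming a small made-up document (the element names and the variables $target, $other and $copy are illustrative, not from the question). load_xml returns a whole document, so inserting its return value into another tree is rejected. Calling importNode on the target document instead yields a copy of the element that the target document owns, and that copy can be inserted.

```perl
#!/usr/bin/env perl
use 5.020;
use warnings;
use XML::LibXML;

# Hypothetical two-element document, only for demonstration.
my $dom = XML::LibXML->load_xml(string => '<a><b>text</b></a>');
my ($target) = $dom->findnodes('//b');

# load_xml returns an XML::LibXML::Document, not an element.
my $other = XML::LibXML->load_xml(string => '<c>new</c>');

# Trying to insert the document itself dies; eval traps the error.
my $inserted_ok = eval { $target->parentNode()->insertAfter($other, $target); 1 };
say "insert rejected: $@" unless $inserted_ok;

# importNode makes a deep copy of the element, owned by $dom.
my $copy = $dom->importNode($other->documentElement());
$target->parentNode()->insertAfter($copy, $target);

my $serialized = $dom->documentElement()->toString();
say $serialized;
```

For the task in the question no second document is needed at all, since the nodes to keep already belong to $dom.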

1 Answer


$node->childNodes() returns all the child nodes of $node, text nodes included.

Insert each child of $node into $node's parent, immediately before $node, so that the children keep their order and end up where $node was. Then remove the now-empty $node with $node->unbindNode().

Here's a working script:

#!/usr/bin/env perl
use 5.020;
use warnings;
use XML::LibXML;

#the input xml
my $inputstr = <<XML;
<root>
<a>
<b class="deletethistag">keep this text<c>keep this c node</c>keep this text too</b>
<b class="someothertag">don't change this</b>
<b>don't change this node without an attribute</b>
<c class="type1">don't change this either</c>
</a>
</root>
XML

my $desiredstr = <<XML ;
<root>
<a>
keep this text<c>keep this c node</c>keep this text too
<b class="someothertag">don't change this</b>
<b>don't change this node without an attribute</b>
<c class="type1">don't change this either</c>
</a>
</root>
XML

my $dom = XML::LibXML->load_xml(
string => $inputstr
);

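# Unwrap each matching <b>: move its children up, then drop the empty element.
for my $node ($dom->findnodes(q#//a/b[@class="deletethistag"]#)) {
    my $parent = $node->parentNode();
    # In list context childNodes() returns a plain list, so moving
    # children inside the loop does not disturb the iteration.
    for my $child_node ( $node->childNodes() ) {
        # Inserting before $node keeps the children in document order.
        $parent->insertBefore($child_node, $node);
    }
    $node->unbindNode();    # detach the now-empty element from the tree
}
say $dom->toString();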

H/T: https://stackoverflow.com/a/31680169/22989509
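
For the span case mentioned in the question, the same loop can be wrapped in a small helper. This is a sketch under assumptions: unwrap_node is a hypothetical name, and the sample paragraph is made up for illustration. The XML::LibXML calls are the same ones used in the script above.

```perl
#!/usr/bin/env perl
use 5.020;
use warnings;
use XML::LibXML;

# Hypothetical helper: replace an element with its own children.
sub unwrap_node {
    my ($node) = @_;
    my $parent = $node->parentNode() or return;    # detached node: nothing to do
    # Move every child (text nodes included) in front of the element.
    $parent->insertBefore($_, $node) for $node->childNodes();
    $node->unbindNode();                           # remove the emptied element
    return;
}

# Made-up sample input modelled on the markup named in the question.
my $xml = <<'XML';
<p>before <span lang="en" xml:space="preserve">kept <i>inline</i> text</span> after <span lang="fr">untouched</span></p>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Only spans with lang="en" are unwrapped; other spans stay as they are.
unwrap_node($_) for $doc->findnodes(q#//span[@lang="en"]#);

my $result = $doc->documentElement()->toString();
say $result;
```

The XPath here selects on lang alone; adding further predicates would narrow the match to a more specific set of spans.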

1 Comment

Note that I added a line feed after the <a> in the desired output to match the input.
