
I have been trying for a while to get this working, without luck. Here is my text file (first.txt):

<metric>
 <baseFilter>
  <and>
   <or>
    <value field="id">1111</value>
    <value field="id">2222</value>
   </or>
   <or>
    <value field="resolution" />
   </or>
</metric>

I want to replace the lines between the first "or" and "/or" with the contents of a second text file (second.txt), shown below. There can be 50 or more value field lines between the first "or" and "/or", so I am matching everything between "or" and "/or" and replacing it with whatever is in second.txt.

<value field="id">3333</value>
<value field="id">4444</value>

Expected output:

<metric>
 <baseFilter>
  <and>
   <or>
    <value field="id">3333</value>
    <value field="id">4444</value>
   </or>
   <or>
    <value field="resolution" />
   </or>
</metric>

I have the following Perl code for that:

#!/usr/bin/perl

my $first = 'first.txt';
open (my $fh, '<', $first) or die "cannot open file $first";
{
  local $/;
  $first = <$fh>;
}

$find = "([\s]+)(<or>)([\n\r\s]+).*(\n|.)+?([\n\r\s]+)(<\/or>)";

my $content = 'second.txt';
open (my $fh, '<', $content) or die "cannot open file $content";
{
 local $/;
 $content = <$fh>;
}

$first =~ s/$find/$1$2$3$content$5$6/;
print "After sub First is $first\n\n";

When I run my code, the substitution does not happen and $first remains the same, i.e. the contents of first.txt are printed unchanged. What am I missing? I tried my regex in an online regex tester (http://www.regexr.com/), and there it matches the multi-line string between the first "or" and "/or". Why does Perl not like my regex?

  • can't you just search for #<value field="id">.*?</value>\s+<value field="id">.*?</value>#m and replace that? Commented Oct 1, 2014 at 21:38
  • The reason I am searching for string between the first "or" and "/or" is that I could have 50 value field lines and I need to replace with whatever in second.txt. Commented Oct 1, 2014 at 21:44
  • You should probably edit your question to state that, in that case... Commented Oct 1, 2014 at 21:48
  • I feel like someone needs to mention stackoverflow.com/a/1732454/3897316 Commented Oct 1, 2014 at 22:04

3 Answers


You are overcomplicating things in your match by trying to capture all those pieces of XML. The following regex is a much simpler way to perform the substitution:

$first =~ s#(<or>\s+)<value field="id">.*?</value>(\s*</or>)#$1$content$2#sm;

The s modifier lets . match newline characters, so the pattern can span any number of lines between the <or> opening and closing tags. The m modifier only changes where ^ and $ match; this pattern has no anchors, so m has no effect here and could be dropped. I've also used # as the delimiter for my regex so I don't have to faff around with escaping the slashes in the XML close tags.

See perlre for more information on regular expressions and in particular, on modifiers.
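
As for why the original substitution never matches: a likely cause is that $find is built as a double-quoted string. Perl does not recognise \s as an escape there and reduces it to a plain s, so the pattern ends up requiring one or more literal "s" characters before <or>. Below is a minimal sketch of one way around that, assuming the pattern is compiled with qr{} instead. The sample data is inlined in place of first.txt and second.txt, and the simplified pattern is an illustration, not the asker's original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inlined stand-ins for first.txt and second.txt (assumption for this sketch)
my $first = <<'XML';
<metric>
 <baseFilter>
  <and>
   <or>
    <value field="id">1111</value>
    <value field="id">2222</value>
   </or>
   <or>
    <value field="resolution" />
   </or>
</metric>
XML

my $content = join "\n",
    '<value field="id">3333</value>',
    '<value field="id">4444</value>';

# qr{} keeps \s as a regex escape; in "..." it would collapse to a literal "s".
# /s lets . cross newlines; the lazy .*? stops at the first closing </or>.
my $find = qr{(<or>[^\n]*\n).*?(\n\s*</or>)}s;

# Without /g only the first <or> block is replaced.
(my $result = $first) =~ s/$find/$1$content$2/;
print $result;
```

The replacement lines come out unindented because the replacement text carries no leading whitespace; the second <or> block is left untouched.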


As always, it is a very bad idea to manipulate XML using regular expressions. To show how simple it is to do things "properly", this program does what you ask using the XML::LibXML module.

  • An XML parser object is created and used to parse each line of the second.xml file, putting them into the @fragments array for use later

  • The first.xml file is parsed, and findnodes finds all the or elements, the first of which is emptied with removeChildNodes and filled again with each line from @fragments using appendChild

  • Finally the XML is formatted using toString and printed

use strict;
use warnings;
use 5.010;
use autodie;

use XML::LibXML;

my $parser = XML::LibXML->new(no_blanks => 1);

open my $fh, '<', 'second.xml';
my @fragments = map {
   chomp;
   $parser->parse_balanced_chunk($_);
} <$fh>;
close $fh;

my $xml = $parser->load_xml(location => 'first.xml');

my @or_nodes = $xml->findnodes('//or');
$or_nodes[0]->removeChildNodes;
$or_nodes[0]->appendChild($_) for @fragments;

print $xml->toString(1);

output

<?xml version="1.0"?>
<metric>
  <baseFilter>
    <and>
      <or>
        <value field="id">3333</value>
        <value field="id">4444</value>
      </or>
      <or>
        <value field="resolution"/>
      </or>
    </and>
  </baseFilter>
</metric>


First load your new values into an array.

Then use $INPLACE_EDIT ($^I) to edit your file in place, using logic like the following (shown here reading from DATA for demonstration):

#!/usr/bin/perl
use strict;
use warnings;

my @newvals = qw(3333 4444);

while (<DATA>) {
    s{<value field="id">\K\w+(?=</value>)}{shift @newvals}e if @newvals;
    print;
}

__DATA__
<metric>
 <baseFilter>
  <and>
   <or>
    <value field="id">1111</value>
    <value field="id">2222</value>
   </or>
   <or>
    <value field="resolution" />
   </or>
</metric>

Outputs:

<metric>
 <baseFilter>
  <and>
   <or>
    <value field="id">3333</value>
    <value field="id">4444</value>
   </or>
   <or>
    <value field="resolution" />
   </or>
</metric>
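
The demonstration above reads from DATA and prints to the terminal. A rough sketch of the same logic applied to a real file through $^I (the short name of $INPLACE_EDIT) might look like the following. The temporary directory, the shortened sample file, and the .bak backup suffix are assumptions made so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample file so the sketch can run on its own
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/first.txt";
open my $out, '>', $file or die "cannot open $file: $!";
print {$out} <<'XML';
<or>
 <value field="id">1111</value>
 <value field="id">2222</value>
</or>
XML
close $out;

my @newvals = qw(3333 4444);

{
    # Setting $^I turns on in-place editing for files read through <>;
    # '.bak' asks Perl to keep the original as first.txt.bak.
    local ($^I, @ARGV) = ('.bak', $file);
    while (<>) {
        s{<value field="id">\K\w+(?=</value>)}{shift @newvals}e if @newvals;
        print;    # written to the file being edited, not the terminal
    }
}

# Read the edited file back to show the result
open my $in, '<', $file or die "cannot open $file: $!";
my $edited = do { local $/; <$in> };
close $in;
print $edited;
```

As in the DATA version, values are swapped one for one, so this only fits when the number of new values matches the number of existing value lines.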
