
I have an XML file that is not line-oriented. Between the tags <tag1> and </tag1> it contains some garbage values from the code that generated it (I am not able to fix that code right now). I would like to replace the characters within these tags to correct them. The characters are sometimes special (non-ASCII).

I have this Perl one-liner that shows me the contents between the tags, but now I want to replace what it finds in the file itself.

perl -0777 -ne 'while (/(?<=perform_cnt).*?(?=\<\/perform_cnt)/s) {print $& . "\n";      s/perform_cnt.*?\<\/perform_cnt//s}' output_error.txt
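The one-liner above prints each match and then deletes it with s/// so that the while loop can move on to the next one. A similar listing can be sketched with m//g in list context, which collects every match without modifying the string. This is a minimal sketch; the sub name tag_contents and the sample input are hypothetical.

```perl
use strict;
use warnings;

# Return the contents of every <$tag>...</$tag> pair in $text, in order.
# m//g in list context returns every capture without modifying $text,
# and /s lets . match newlines, so a pair may span several lines.
sub tag_contents {
    my ($text, $tag) = @_;
    my $t = quotemeta $tag;    # match the tag name literally
    return $text =~ m{<$t>(.*?)</$t>}gs;
}

# Hypothetical sample input with ASCII junk between the tags.
my $sample = "<text1>1</text1><perform_cnt>abc</perform_cnt>\n"
           . "<text1>2</text1><perform_cnt>x\ny</perform_cnt>\n";

print "$_\n" for tag_contents($sample, 'perform_cnt');
```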

Here's an example of the XML. Notice the junk characters between the perform_cnt tags.

<text1>120105728</text1><perform_cnt>ÈPm=</perform_cnt>
<text1>120106394</text1><perform_cnt>†AQ;4K\_Ô23{YYÔ@Nx</perform_cnt>

I need to replace these with something like 0.
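A minimal sketch of the replacement itself, assuming the whole file fits in memory. Like the question's -0777 one-liner, it treats the input as one string and swaps in 0 between each pair of perform_cnt tags. The sub name replace_between_tags is hypothetical.

```perl
use strict;
use warnings;

# Replace everything between <$tag> and </$tag> with $replacement.
# The input is handled as one string, so /s lets . match newlines
# and a tag pair may span several lines.
sub replace_between_tags {
    my ($text, $tag, $replacement) = @_;
    my $t = quotemeta $tag;    # match the tag name literally
    $text =~ s{(<$t>).*?(</$t>)}{$1$replacement$2}gs;
    return $text;
}

# The sample from the question; non-ASCII bytes are matched by . like any other.
my $sample = <<'XML';
<text1>120105728</text1><perform_cnt>ÈPm=</perform_cnt>
<text1>120106394</text1><perform_cnt>†AQ;4K\_Ô23{YYÔ@Nx</perform_cnt>
XML

print replace_between_tags($sample, 'perform_cnt', '0');
```

To write the result back to the file instead of printing it, Perl's -i switch edits in place. Combined with -0777 (read the whole file as one record) and -p (print each record after the code runs), something like perl -0777 -i.bak -pe 's{<perform_cnt>.*?</perform_cnt>}{<perform_cnt>0</perform_cnt>}gs' output_error.txt should do it, keeping the original as output_error.txt.bak.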

  • Please update your question with a sample of the input file that you need to process. Commented Apr 17, 2012 at 12:52
  • Have you tried using an XML parser instead? Commented Apr 17, 2012 at 12:56

2 Answers


I love XML::Twig for these sorts of things. It takes a little getting used to, but once you understand the design (and a little about DOM processing), many things become extremely easy:

use strict;
use warnings;
use feature 'say';    # say needs Perl 5.10+ and must be enabled
use XML::Twig;

my $xml = <<'HERE';
<root>
<text1>120105728</text1><perform_cnt>ÈPm=</perform_cnt>
<text1>120106394</text1><perform_cnt>†AQ;4K\_Ô23{YYÔ@Nx</perform_cnt>
</root>
HERE

my $twig = XML::Twig->new(
    twig_handlers => {
        # called for each <perform_cnt> element once it is parsed; $_ is the element
        perform_cnt => sub {
            say "Text is " => $_->text;  # get the current text
            $_->set_text( 'Buster' );    # set the new text
        },
    },
    pretty_print => 'indented',
);

$twig->parse( $xml );
$twig->flush;    # print the modified document

With indented pretty printing, the document printed by flush is:

<root>
  <text1>120105728</text1>
  <perform_cnt>Buster</perform_cnt>
  <text1>120106394</text1>
  <perform_cnt>Buster</perform_cnt>
</root>

It is bad practice to use regexes to parse XML.

Anyway, here is the code:

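#!/usr/bin/perl

use strict;
use warnings;

my $tag = 'perform_cnt';

# three-argument open; report the file name on failure
open my $fh, '<', 'file.txt' or die "Cannot open file.txt: $!";
local $/;    # slurp mode: the file is not line-oriented, so tags may span lines
foreach (<$fh>) {
  # keep the tags ($1, $3) and drop the junk between them ($2);
  # /s lets . match newlines inside a tag pair
  s/(<$tag>)(.*?)(<\/$tag>)/$1$3/gs;
  print "$_";
}
close $fh;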

And the output is:

<text1>120105728</text1><perform_cnt></perform_cnt>
<text1>120106394</text1><perform_cnt></perform_cnt>
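One concrete reason for the opening warning about regexes: the pattern above only matches the bare start tag, so well-formed XML that writes the same element slightly differently slips through unchanged. A small sketch using the answer's substitution; the sub name clean and the extra inputs are hypothetical.

```perl
use strict;
use warnings;

my $tag = 'perform_cnt';

# The substitution from the answer, wrapped in a sub for comparison.
sub clean {
    my ($xml) = @_;
    (my $fixed = $xml) =~ s/(<$tag>)(.*?)(<\/$tag>)/$1$3/g;
    return $fixed;
}

# All three inputs are well-formed XML with the same element and junk text.
for my $xml (
    '<perform_cnt>junk</perform_cnt>',           # bare tag: cleaned
    '<perform_cnt id="7">junk</perform_cnt>',    # attribute: not matched
    '<perform_cnt >junk</perform_cnt>',          # space before '>': not matched
) {
    my $status = clean($xml) eq $xml ? 'unchanged' : 'cleaned';
    print "$status: $xml\n";
}
```

An XML parser such as XML::Twig sees all three as the same perform_cnt element.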

Comments

If you want to remove <perform_cnt></perform_cnt> from the output entirely, replace /$1$3/ in the code with //.
Also, print "$_" is not the best way to write it; just use print;
@loldop - If you are looking for short code, then maybe. Otherwise I don't see a reason for it. The short version could look like s/(<$tag>)(.*?)(<\/$tag>)/$1$3/g, print for <$fh>; replacing the entire foreach loop.
It's the same. If you want a newline, use print; print "\n"; or print "$_\n"; but ordinarily I use a say function: sub say { return (@_, "\n"); }
@loldop - I know what that is, but it is just not standard usage, and say is only built in from Perl 5.10 onward, so not every Perl has it.
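For reference, a minimal sketch of the two newline styles discussed in these comments: print with an explicit "\n", and the built-in say, which must be enabled with use feature 'say' (Perl 5.10+). It writes into an in-memory filehandle so the two results can be compared; the helper name capture is hypothetical.

```perl
use strict;
use warnings;
use feature 'say';    # say is a built-in only from Perl 5.10, and must be enabled

# Run $code with an in-memory filehandle and return what it wrote.
sub capture {
    my ($code) = @_;
    open my $fh, '>', \my $buf or die "Cannot open in-memory handle: $!";
    $code->($fh);
    close $fh;
    return $buf;
}

my $with_print = capture(sub { my $fh = shift; print {$fh} "line", "\n" });
my $with_say   = capture(sub { my $fh = shift; say {$fh} "line" });

printf "%s\n", $with_print eq $with_say ? 'same output' : 'different output';
```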