1

I have a file whose lines follow one of these two structures:

USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"

or

USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"

I am trying to parse it with Perl to extract the values like this:

1362224754632;00966590832186;580;AAA;L2

Below is the code:

if($Record =~ /USMS (.*?)|<REQ MSISDN="(.*?)" CONTRACT="(.*?)" SUBSCRIPTION="(.*?)" FORMAT="(.*?)" THRESHOLDID="(.*?)" TEXT="(.*?)"/)
{
    print LOGFILE "$1;$2;$3;$4;$5;$6;$7\n";
}
elsif($Record =~ /USMS (.*?)|<REQ MSISDN="(.*?)" CONTRACT="(.*?)" SUBSCRIPTION="(.*?)" FORMAT="(.*?)" TEXT="(.*?)"/)
{
    print LOGFILE "$1;$2;$3;$4;$5;$6\n";
}

But I always get:

;;;;;

4 Answers

3

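The pipe (|) is the alternation operator in regular expressions. Unescaped, it splits your pattern into two alternatives, and the first one ("USMS " followed by a lazy capture that matches nothing) always succeeds, which is why every field comes out empty. Escape it as \| to match a literal pipe:

if($Record =~ /USMS (.*?)\|<REQ MSISDN="(.*?)" CONTRACT="(.*?)" SUBSCRIPTION="(.*?)" THRESHOLDID="(.*?)" FORMAT="(.*?)" TEXT="(.*?)"/)

Do the same in the elsif branch. Note that in your sample data THRESHOLDID comes before FORMAT, so the pattern above lists the attributes in that order; with the order from your question the first branch would never match.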
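To make the effect of the unescaped pipe visible, here is a minimal sketch that runs both variants against a shortened sample line. The sub names and the shortened `$line` are made up for illustration; only the two patterns matter.

```perl
use strict;
use warnings;

# Shortened sample record, for illustration only.
my $line = 'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580"';

# Unescaped: | splits the pattern into two alternatives. The first one,
# "USMS " plus a lazy (.*?), succeeds at once with an empty capture,
# and the second capture group is never set.
sub match_unescaped {
    my ($rec) = @_;
    my @caps = $rec =~ /USMS (.*?)|<REQ MSISDN="(.*?)"/;
    return @caps;
}

# Escaped: \| matches a literal pipe, so both captures are filled.
sub match_escaped {
    my ($rec) = @_;
    my @caps = $rec =~ /USMS (.*?)\|<REQ MSISDN="(.*?)"/;
    return @caps;
}

# Undefined captures are printed as empty strings.
print "unescaped: ", join(';', map { $_ // '' } match_unescaped($line)), "\n";
print "escaped:   ", join(';', match_escaped($line)), "\n";
```

The first line of output shows only empty fields, the second shows the USMS id and the MSISDN.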

3

Instead of using a single regex, I would split the data into its separate sections first, then approach them separately.

my($usms_part, $request) = split / \s* \|<REQ \s* /x, $Record;

my($usms_id) = $usms_part =~ /^USMS (\d+)$/;

my %request;
while( $request =~ /(\w+)="(.*?)"/g ) {
    $request{$1} = $2;
}

Rather than having to hard code all the possible key/value pairs, and their possible orderings, you can parse them generically in one piece of code.
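Put together, the approach above could be sketched as follows. The helper name `format_record` and the choice of output fields (taken from the desired output in the question) are assumptions for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one record into the semicolon-separated line
# asked for in the question (id;MSISDN;CONTRACT;SUBSCRIPTION;TEXT).
sub format_record {
    my ($record) = @_;

    # Split the record at the literal "|<REQ" separator.
    my ($usms_part, $request) = split / \s* \|<REQ \s* /x, $record;
    return unless defined $request;

    my ($usms_id) = $usms_part =~ /^USMS (\d+)$/;
    return unless defined $usms_id;

    # Collect every KEY="value" pair, whatever the order or count.
    my %request;
    while ( $request =~ /(\w+)="(.*?)"/g ) {
        $request{$1} = $2;
    }

    # Keys missing from a record become empty fields.
    return join ';', $usms_id,
        map { $request{$_} // '' } qw(MSISDN CONTRACT SUBSCRIPTION TEXT);
}

my @records = (
    'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"',
    'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"',
);
print format_record($_), "\n" for @records;
```

Because the fields are looked up by name, an optional attribute such as THRESHOLDID can appear, move, or be absent without any change to the parsing code.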

2 Comments

@Schwern - I upvoted your answer. If any of the keys change in the data file, the original regular expressions will fail; this includes the order of the keys, their spelling, and their count. It is far better to capture the key/value pairs generically, to account for future changes. Storing them in a hash is a good addition, though I would have used some form of record number as the first-level key: '$request{$rnum}{$1} = $2;'
I would follow that approach as well.
1

Change

(.*?) 

to

([a-zA-Z0-9]*)

1 Comment

This would give him 1362224754632;;;;; instead of just ;;;;;, but wouldn't fix the unescaped pipe problem. It's still good advice in general, though.
0

It looks like all you want are the fields contained in double quotes.

That can be done like this:

use strict;
use warnings;

while (<DATA>) {
  my @values = /"([^"]+)"/g;
  print join(';', @values), "\n";
}

__DATA__
USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"
USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"

Output:

00966590832186;580;AAA;ascii;L2
00966590832186;580;BBB;1;ascii;L2
