Perl regex splitting a single line

Question

I'm having some problems with regex in Perl.

I have a line: #23 = CARTESIAN_POINT ( 'NONE', ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 ) ) ;

And I want to split the line into different values.

Right now I have this: (#[0-9]+)\s=\s([A-Z]+_[A-Z]+)\s(.*) which gives these values as output:

$array[0]=#23
$array[1]=CARTESIAN_POINT
$array[2]=( 'NONE',  ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 ) ) ;

I want this part: ( 'NONE', ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 ) ) ; to be split into separate values, like:

PARAM[0] = 'NONE',
PARAM[1] = ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 )

or

PARAM[0] = 'NONE',
PARAM[1] = -1.822612853216911200
PARAM[2] = 55.22284222837789300
PARAM[3] = 8.566382866014988600

But I can't quite figure out how to do it. I tried different things, but none of them is worth mentioning.

I hope someone is able to help me or point me in the right direction. Thanks in advance!

zdim · Accepted Answer · 2019-09-13 07:44:48Z

3

This is fairly straightforward when broken into multiple (two) steps.

First extract the text with the coordinates, the stuff inside CARTESIAN_POINT( ... ):

my ($coord_text) = $string =~ /= \s+ [A-Z_]+ \s+ \( \s* (.+) \s* \)/x;

where /x lets the pattern contain insignificant whitespace, for readability. The .+ is greedy and gets everything up to the very last ), including the nested (...). Then get the coordinates out of that:

my @coords = $coord_text =~ /([A-Z]+|[0-9-.]+)/g;

Here we allow either a word (like that NONE), or a number (in shown format^†).

Altogether, with the intermediate step "hidden" inside the lexical scope of a do block:

use warnings;
use strict;
use feature 'say';

my $string = q(#23 = CARTESIAN_POINT ( 'NONE', ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 ) ) ; );

my @coords = do {
    my ($coord_text) = $string =~ /=\s+[A-Z_]+\s+\(\s*(.+)\s*\)/; 
    $coord_text =~ /([A-Z]+|[0-9-.]+)/g;
};

say for @coords;

This is easily tweaked for variations in requirements/outcomes, slight or major:

To capture the quotes around NONE as well (shown in the OP), add quotes to the character class for the word, [A-Z\x22\x27]. I use hex in case this is a "one-liner" in a bash script or some such, since the context isn't specified. In a normal script you can use " and ' directly.
To get numbers in a string instead of a list, as mentioned in the question, use
```
$coord_text =~ /([A-Z]+|\([^)]+\))/g;
```
instead of the second statement in the do block above
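A minimal sketch of both tweaks combined, assuming the alternation is meant to match either a (quoted) word or one whole parenthesized group, with `\([^)]+\)` as the assumed pattern for the group. The variable name `@params` is only illustrative:

```perl
use strict;
use warnings;

my $string = q(#23 = CARTESIAN_POINT ( 'NONE', ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 ) ) ;);

# Step 1: the text between the outermost parentheses (greedy .+ runs to the last ')')
my ($coord_text) = $string =~ /=\s+[A-Z_]+\s+\(\s*(.+)\s*\)/;

# Step 2: either a word with optional quotes, or one whole "( ... )" group
my @params = $coord_text =~ /([A-Z\x22\x27]+|\([^)]+\))/g;

print "PARAM[$_] = $params[$_]\n" for 0 .. $#params;
```

With the sample line this prints `PARAM[0] = 'NONE'` followed by `PARAM[1] = ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 )`, which is the first layout asked for in the question. It relies on the group not containing further nested parentheses.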

I assume that you have a list containing either words (like NONE) or straight lists of coordinates (numbers), without any further nesting or similar syntactic complexities.

Note: If the input can be a multiline string, add the /s modifier to the regex. With it the . matches a newline as well, and it all works the same as above (it does in my tests). This should only be needed in the first regex, making it

my ($coord_text) = $string =~ /=\s+[A-Z_]+\s+\(\s*(.+)\s*\)/s;

but it won't hurt in the other one either.
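To see the difference, here is a small sketch with a made-up multiline variant of the input (the line breaks and shortened numbers are assumptions for illustration, not taken from the question):

```perl
use strict;
use warnings;

# Hypothetical input where the record is wrapped over several lines
my $string = "#23 = CARTESIAN_POINT ( 'NONE',\n  ( -1.5, 55.25,\n    8.5 ) ) ;";

# Without /s the . cannot cross a newline, so no closing ')' is reachable
my ($no_s) = $string =~ /=\s+[A-Z_]+\s+\(\s*(.+)\s*\)/;

# With /s the . also matches "\n" and the capture spans all lines
my ($with_s) = $string =~ /=\s+[A-Z_]+\s+\(\s*(.+)\s*\)/s;

my @coords = $with_s =~ /([A-Z]+|[0-9-.]+)/g;

print defined $no_s ? "matched without /s\n" : "no match without /s\n";
print "$_\n" for @coords;
```

Keep in mind that with /s the greedy .+ runs to the very last ) in the whole string, so this only suits input holding a single record.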

^† The character class used, [0-9-.], also allows garbage (like -.-2 etc). If you need to confirm that you indeed have a number in the given format, please add checks for that. The best way to test for a number is looks_like_number from Scalar::Util.
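A short sketch of such a check. The token list is invented for illustration; it mixes well-formed numbers with the kind of garbage the permissive class would let through:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical tokens, as a permissive pattern like [0-9-.]+ might return them
my @tokens = ('-1.822612853216911200', '55.22284222837789300', '-.-2', '8.5-6');

# Keep only the tokens that Perl itself would accept as numbers
my @numbers = grep {  looks_like_number($_) } @tokens;
my @garbage = grep { !looks_like_number($_) } @tokens;

print "ok:  $_\n" for @numbers;
print "bad: $_\n" for @garbage;
```

Note that looks_like_number accepts more than the format shown here (exponent notation, for instance), so a stricter pattern may still be wanted if only plain decimals are valid.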

edited Sep 13, 2019 at 7:44

answered Sep 11, 2019 at 16:47

zdim


3 Comments

mHvNG Over a year ago

Thanks for the detailed explanation! I’ll try it as soon as I can.

zdim Over a year ago

@mHvNG You are most welcome. Note that I edited a little in the meantime, and in particular I just added a "Note" at the end about how to modify it to work with multiline strings.

mHvNG Over a year ago

Thanks for the help. It's really appreciated!

Dave Cross · Accepted Answer · 2019-09-11 15:36:51Z

This is what Text::Balanced is for.

#!/usr/bin/perl

use strict;
use warnings;

use Text::Balanced qw[extract_bracketed];
use Data::Dumper;

while (<DATA>) {
  # Extract the bit of your string between the first and last brackets
  my $extracted = extract_bracketed($_, '(', '[^()]*');
  # Then split what's left on strings of brackets, whitespace and commas.
  # But grep the list to remove any zero-length strings that you get.
  my @bits = grep { length } split /[\(\)\s,]+/, $extracted;
  print Dumper \@bits;
}

__DATA__
#23 = CARTESIAN_POINT ( 'NONE',  ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 ) ) ;

Output:

$VAR1 = [
          '\'NONE\'',
          '-1.822612853216911200',
          '55.22284222837789300',
          '8.566382866014988600'
        ];

MonkeyZeus · Accepted Answer · 2019-09-11 15:25:05Z

0

You need to repeat your pattern as many times as needed and supply the appropriate capture groups:

#[0-9]+\s*=\s*[A-Z]+_[A-Z]+\s*\(\s*'([A-Z]+)',\s*\(\s*(-?\d+\.\d+),\s*(-?\d+\.\d+),\s*(-?\d+\.\d+)

https://regex101.com/r/GJ6yDi/1/
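For use in a Perl script, the same pattern could be applied roughly as follows. This is a sketch: the pattern is spread over several lines with /x for readability (so the leading # has to be escaped), and the names `$re` and `@param` are illustrative:

```perl
use strict;
use warnings;

my $line = q(#23 = CARTESIAN_POINT ( 'NONE', ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 ) ) ;);

my $re = qr{
    \#[0-9]+ \s* = \s* [A-Z]+_[A-Z]+ \s*   # id and entity name, not captured
    \( \s* '([A-Z]+)' , \s*                # label such as NONE, without the quotes
    \( \s* (-?\d+\.\d+) , \s*              # first coordinate
           (-?\d+\.\d+) , \s*              # second coordinate
           (-?\d+\.\d+)                    # third coordinate
}x;

# In list context the match returns the four captures
my @param = $line =~ $re;

print "PARAM[$_] = $param[$_]\n" for 0 .. $#param;
```

This handles exactly three coordinates; a line with a different count will not match, and `@param` is then empty.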

answered Sep 11, 2019 at 15:25

MonkeyZeus


1 Comment

mHvNG Over a year ago

Thanks! I'll try it as soon as I can.

Jeff Y · Accepted Answer · 2019-09-11 17:59:32Z

0

If you don't care about the nesting, and just want to get all the "values" into an array, you might consider the simpler solution of just splitting on a discard of all unwanted (non-value) characters: /[(),;=\s]+/

$ cat line
#23 = CARTESIAN_POINT ( 'NONE',  ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 ) ) ;

$ perl -ne '@array = split /[(),;=\s]+/; print join "|", @array; print "\n"' line
#23|CARTESIAN_POINT|'NONE'|-1.822612853216911200|55.22284222837789300|8.566382866014988600
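Inside a script, the same split can feed a list assignment that peels off the id and the entity name. A sketch, with `$id`, `$type` and `@param` as illustrative names:

```perl
use strict;
use warnings;

my $line = q(#23 = CARTESIAN_POINT ( 'NONE',  ( -1.822612853216911200, 55.22284222837789300, 8.566382866014988600 ) ) ;);

# Split on runs of separator characters; split drops the trailing empty fields
my ($id, $type, @param) = split /[(),;=\s]+/, $line;

print "id   = $id\n";
print "type = $type\n";
print "PARAM[$_] = $param[$_]\n" for 0 .. $#param;
```

Here `@param` holds `'NONE'` (quotes included) followed by the three numbers. A line that begins with whitespace would produce an empty first field, so trim such input first.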

answered Sep 11, 2019 at 17:59

Jeff Y
