2

I have the string "-test aaaa -machine bbb -from ccc".

How can I extract "aaaa", "bbb" and "ccc" using a regular expression?

It should also work when the string is "-from   ccc   -test    aaaa    -machine bbb"
(different order, multiple spaces, and so on).

I tried the following code, but I always get invalid data.

$str = "-test aaaa     -machine  bbb  -from ccc";
$str =~ /-test\s*(.*)\s*/;

Printing $1 gives:

aaaa   -machine  bbb  -from ccc

I also want to handle the following case:

-test aa_aa -machine aab-baa-aba -from ccc
1 Comment

This will work for your test data: perl -e 'use strict; use warnings; my $str = "-test aaaa -machine bbb -from ccc"; while ($str =~ m/ (\w+)/g) { print $1."\n"; }'
Commented Oct 5, 2016 at 8:05

4 Answers

7

You don't have to use a regex. You can split the string and assign the result to a hash.

use strict;
use warnings;
use Data::Dumper;

my $str = '-test aaaa   -machine  bbb  -from ccc';
my %field = split ' ', $str;
print Dumper(\%field);

The output:

$VAR1 = {
          '-from' => 'ccc',
          '-machine' => 'bbb',
          '-test' => 'aaaa'
        };

Whatever the order, split returns a flat list of the form (word1, word2, word3, word4, word5, word6), in which word1, word3 and word5 are the -field_name entries. Assigning that list to a hash pairs each field name with the value that follows it. To get the string after -test, for example, you just read $field{"-test"} and do whatever you want with it.

EDIT: It doesn't matter how many spaces there are between the words, or which non-whitespace characters the words contain. It works the same way in all cases, as long as the string keeps the format -some_field something -another_field another_thing ...
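To illustrate the point about order and spacing, here is a minimal sketch that applies the same split to the other two strings from the question. The helper name parse_fields and the check for an odd number of words are assumptions added for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): turn a "-name value ..." string
# into a hash reference keyed by the option name, dash included.
sub parse_fields {
    my ($str) = @_;
    # split ' ' ignores leading whitespace and splits on runs of whitespace.
    my @words = split ' ', $str;
    # An odd number of words means some option has no value.
    die "Odd number of words in '$str'\n" if @words % 2;
    my %field = @words;
    return \%field;
}

my $first  = parse_fields('-from   ccc   -test    aaaa    -machine bbb');
my $second = parse_fields('-test aa_aa -machine aab-baa-aba -from ccc');

print $first->{'-test'}, "\n";      # aaaa
print $second->{'-machine'}, "\n";  # aab-baa-aba
```

The approach relies on values never containing whitespace; a value with a space in it would shift every following pair.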

1 Comment

Thank you. I had never thought about "split". It's a good choice =)
6

I'm going to answer the question that (I think) underlies your question - not the question that you asked.

It looks to me like you are parsing command-line options. So use a command-line option parser, rather than reinventing that for yourself. Getopt::Long is part of the standard Perl distribution.

#!/usr/bin/perl

use strict;
use warnings;
# We use modern Perl (here, specifically, say())
use 5.010;

use Getopt::Long 'GetOptionsFromString';
use Data::Dumper;

my %options;

my $str = '-test aa_aa -machine aab-baa-aba -from ccc';
GetOptionsFromString($str, \%options, 'test=s', 'machine=s', 'from=s');

say Dumper \%options;

Normally, you'd use the function GetOptions() as you're parsing the command-line options that are available in @ARGV. I'm not sure how the options ended up in your string, but there's a useful GetOptionsFromString() function for this situation.
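As a hedged sketch of how the result of GetOptionsFromString() might be checked: in list context the function returns a success flag together with an array reference of any words it did not consume. The variable names $ok and $rest and the reordered input string are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Getopt::Long 'GetOptionsFromString';

my %options;
# Same options as above, in a different order and with extra spaces.
my $str = '-from   ccc   -test    aa_aa    -machine aab-baa-aba';

# List context: a success flag plus a reference to the leftover words.
my ($ok, $rest) = GetOptionsFromString($str, \%options,
    'test=s', 'machine=s', 'from=s');

die "Could not parse options\n" unless $ok;
warn "Unparsed words: @$rest\n" if @$rest;

# Keys are stored without the leading dash.
print "$_ => $options{$_}\n" for sort keys %options;
```

Checking the flag matters because an unknown option in the string produces a warning and a false return value rather than an exception.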

Update: To explain why your code didn't work.

$str = "-test aa_aa     -machine  aab-baa-aba  -from ccc";
$str =~ /-test\s*(.*)\s*/;

You're capturing what matches (.*). But .* is greedy. That is, it matches as much data as it can. And, in this case, that means it matches until the end of the line. There are (at least!) a couple of ways to fix this.

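1/ Make the match non-greedy by adding ?. The pattern then also has to say what follows the value, because a non-greedy .*? followed only by an optional \s* is satisfied by matching nothing.

$str =~ /-test\s*(.*?)(?:\s|$)/;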

2/ Be more explicit about what you're looking for - in this case non-whitespace characters.

$str =~ /-test\s*(\S*)\s*/;
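The difference between the greedy capture and the non-whitespace capture can be seen side by side in the sketch below. The helper option_value and the use of \S+ (one or more, instead of \S*) are assumptions added for illustration.

```perl
use strict;
use warnings;

my $str = '-test aa_aa     -machine  aab-baa-aba  -from ccc';

# Greedy: .* runs to the end of the string, so everything after -test is captured.
my ($greedy) = $str =~ /-test\s*(.*)\s*/;

# Non-whitespace only: \S+ stops at the first space after the value.
my ($value) = $str =~ /-test\s+(\S+)/;

# Hypothetical helper: the same \S+ idea for any option name.
sub option_value {
    my ($string, $name) = @_;
    my ($found) = $string =~ /-\Q$name\E\s+(\S+)/;
    return $found;
}

print "greedy:  [$greedy]\n";
print "value:   [$value]\n";                            # value:   [aa_aa]
print "machine: [", option_value($str, 'machine'), "]\n"; # machine: [aab-baa-aba]
```

Because each option is looked up by name, the order of the options in the string does not matter.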

1
use strict;
use warnings;

my @matches;
# Match an option name (-word), then capture the value that follows it.
# The value may contain word characters and hyphens.
my $regex = qr/-\w+\s+([\w-]+)/;

my $string = q{-test aaaa -machine bbb -from ccc};
@matches = $string =~ /$regex/g;
print "Matches for first string are: @matches\n";

my $other_string = q{-from   ccc   -test    aaaa    -machine bbb};
@matches = $other_string =~ /$regex/g;
print "Matches for second string are: @matches\n";

my $third_string = q{-test aa_aa -machine aab-baa-aba -from ccc};
@matches = $third_string =~ /$regex/g;
print "Matches for third string are: @matches\n";
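The code above returns the values in the order they appear, so the caller cannot tell which value belongs to which option once the order changes. One possible variation, sketched here as an assumption rather than as part of the answer, captures the option name as well and assigns the name/value pairs to a hash. The variable name %value_of is hypothetical.

```perl
use strict;
use warnings;

my $string = q{-from   ccc   -test    aa_aa    -machine aab-baa-aba};

# With two capture groups, a /g match in list context returns
# name, value, name, value, ... which can be assigned straight to a hash.
my %value_of = $string =~ /-(\w+)\s+([\w-]+)/g;

print "$_: $value_of{$_}\n" for sort keys %value_of;
# from: ccc
# machine: aab-baa-aba
# test: aa_aa
```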

2 Comments

Thanks a lot. But what if the data contains symbols, as in "-test aa_aa -machine aab-baa-aba -from ccc"? How do I get the data correctly?
Thanks for your reply
-2

This should do the trick

$str = "-test aa_aa     -machine  aab-baa-aba  -from ccc";
($test,$machine,$from) = $str =~ /\-test(.+)\-machine(.+)\-from(.+)/;

print "Test: $test, Machine: $machine, From: $from";

1 Comment

The order of -test, -machine and -from in the string can change (as explained in the question), in which case your solution won't work. Moreover, your .+ captures the surrounding whitespace as well, which isn't ideal. And if an additional parameter were added, it would be captured as part of the value of one of the previous parameters. Also, there is no need to escape - in a regex (except sometimes inside [...]).
