
I am using Perl v5.10 on CentOS 6.8

My program reads a list of host names into Perl array @aVmList. I am trying to extract only the machine name from each of them.

Some of the host names are fully qualified, some are not. Some contain dashes or underscores.

I have no control over the contents of the array.

Here is an example of the data I'm working with.

my @aVmList = qw(
    vmserver1.domain.com
    vmserver2
    vm-server-three.otherdomain.com
    server_four.domain.com
    server5
    server6
    some-silly-vm-name
    another_server.maybewithadomain.com
);

I would like to extract only the machine name from each element, ending up with the following.

vmserver1 
vmserver2
vm-server-three 
server_four 
server5
server6
some-silly-vm-name
another_server

I found the regex /(.*?)\./ which almost works, but only when all of the names are fully qualified.

foreach ( @aVmList ) {

    $_ =~ /(.*?)\./;

    my $sVmName = $1;

    print $sVmName;
}

I thought I needed to use a look-behind for the dots. I came up with the following

$_ =~ /([A-Za-z0-9-_]+)(?!=\.)/;

which seemed to work in the regex tester, but when I ran my Perl script it still matched the whole string.

I don't like the path I'm headed down with the regex pattern above, because now I'm assuming that the host names will only contain "word" characters or a hyphen.

I know I shouldn't have to account for special characters in host names, but I'm trying to base the regex pattern on matching anything before the first dot in a domain name suffix.something.com.

I also found Regular expression to extract hostname from fully qualified domain name which sounded like what I wanted, but neither of the suggestions from there seemed to work.

I tried:

$_ =~ (.+?)(?=\.)

and

$_ =~ ^([^.]+)\..*$
2 Comments

Split your string on dots and take the first part. Commented Jan 11, 2017 at 19:15
[A-Za-z0-9-_]+ is usually written [\w-]+ Commented Jan 11, 2017 at 20:16

3 Answers

1

The negated character class [^...] matches any character except those listed. Then

my ($name) = $_ =~ /([^.]+)/;

matches all characters up to the first . and stops at it, thus there is no reason to explicitly match the dot (nor the rest of the line). The match is captured and assigned to $name.
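As a rough sketch of how that idiom applies to the whole list, the snippet below wraps it in a helper. The name machine_name is made up for illustration, and the leading ^ anchor is an addition not in the pattern above: it makes a string that starts with a dot yield undef instead of the text after the dot.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything before the first dot,
# or undef if the string is empty or starts with a dot.
sub machine_name {
    my ($host) = @_;
    my ($name) = $host =~ /^([^.]+)/;   # list context returns the capture
    return $name;
}

my @vm_list = qw(
    vmserver1.domain.com
    vmserver2
    vm-server-three.otherdomain.com
    server_four.domain.com
);

print machine_name($_), "\n" for @vm_list;
```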


When the match operator is used in list context it returns the captured groups or, with the /g modifier, the list of all matches

my @matches = $var =~ m/$pattern/g;

Even if there is only one match we need list context so that the match itself is returned, hence the parentheses in my ($name) = ..., which impose list context on the match operator. In the above example this is done by assigning to an array. Otherwise we'd have scalar context, in which the match operator only reports whether the match succeeded. See this in perlop and see perlretut.
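To make the difference between contexts concrete, here is a small sketch that runs the same pattern in each one. The variable names are chosen for illustration only.

```perl
use strict;
use warnings;

my $host = 'vm-server-three.otherdomain.com';

# Scalar context: the result only says whether the match succeeded.
my $matched = $host =~ /([^.]+)/;

# List context: the result is the list of captured groups.
my ($name) = $host =~ /([^.]+)/;

# List context with /g: every match of the pattern is returned.
my @labels = $host =~ /([^.]+)/g;

print "matched=$matched\n";   # matched=1
print "name=$name\n";         # name=vm-server-three
print "labels=@labels\n";     # labels=vm-server-three otherdomain com
```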

The m above may be omitted and most often is. But note that this is not always the case, for example when different delimiters are used. I suggest a good read through perlretut.
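A short sketch of the delimiter point: with slashes the m is optional, but any other delimiter (braces are used here as one example) needs it.

```perl
use strict;
use warnings;

my $host = 'server_four.domain.com';

my ($with_slashes) = $host =~ /^([^.]+)/;    # m may be omitted with / delimiters
my ($with_braces)  = $host =~ m{^([^.]+)};   # m is required with other delimiters

print "$with_slashes $with_braces\n";
```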

The default input and pattern-searching space ($_) in your loop holds the currently processed element. Regex by default works with $_ so $_ need not be specified. See General Variables in perlvar, and see a regex-related comment in the perlop link. So you can do

foreach (@vm_list) {
    /([^.]+)/;           # OK but better assign directly from the match
    my $host_name = $1;
} 

However, it is clearer to assign directly from the match, as in my ($name) = ... at the top of this answer.
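One possible shape for such a loop, offered as a sketch: the match works on $_ implicitly, and the list assignment doubles as a success test. The ^ anchor and the "or next" guard are additions for illustration, so that an entry with no leading name is skipped rather than stored as undef.

```perl
use strict;
use warnings;

my @vm_list = qw( vmserver1.domain.com server5 some-silly-vm-name );
my @host_names;

foreach (@vm_list) {
    # A list assignment evaluates to the number of values on its right-hand
    # side, so a failed match (empty list) yields 0 and triggers next.
    my ($host_name) = /^([^.]+)/ or next;
    push @host_names, $host_name;
}

print "$_\n" for @host_names;
```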


2 Comments

I think you have let the OP down here. This is a very tiny part of their solution, and for goodness sake explain that $_ is the default operand.
@Borodin It didn't seem to me that this was a problem, but rather that they firstly needed to see the idiom ([^X]+). But on a more careful reading ... you are right about explanations, added.
1

I think you're making this more complicated than it needs to be. Split on periods and use the first part:

use strict;
use warnings;
use 5.010;    # enables say; the question targets Perl 5.10

while (<DATA>) {
    chomp;
    say ((split(/\./))[0]);
}

__DATA__
vmserver1.domain.com
vmserver2
vm-server-three.otherdomain.com
server_four.domain.com
server5
server6
some-silly-vm-name
another_server.maybewithadomain.com

Output:

vmserver1
vmserver2
vm-server-three
server_four
server5
server6
some-silly-vm-name
another_server


0

There are no such things as "fully-qualified" or "partially-qualified" host names. The host name is the first part of a URL after the protocol name, and its contents are protocol-dependent and host-dependent. You must define what you mean before writing regular expression patterns

It is easy to separate parts of a string divided by dots, but you haven't specified which part or parts you want. It feels like you are casting about, writing varieties of random code in the hope that one of them works

This isn't really an answer, and you will never get a proper solution until you have established exactly what you need. It is very wrong to keep trying things until you get a correct output for your sample input. Your software will throw your company's business if you publish it like that. Your code must work for every input it could ever possibly have. That is why you must understand the meaning of your requirement instead of just the words and your small amount of data

Are you forced to use Hungarian notation like @aVmList? It is not very popular any more, and has no place in Perl, where the initial @ says that the item is an array, so a is superfluous and makes your program less readable. And it is the Perl way to avoid capital letters in identifiers for lexical variables, so your array would be much better as @vm_list

Your first attempt

$_ =~ /(.*?)\./;

is identical to

/(.*?)\./;

which does nothing at all other than possibly setting $1 if the pattern matches. You don't seem to have grasped the purpose of $_, and this is not the place to explain it fully.

Forget about look-around constructs. The first thing you need to do is to define a rule that extracts the required part of your host name. How do you do it when you look at a host name?

What happens to a.b.c.d.co.jp?

What happens to a.b.c.vm-server-three.otherdomain.com.server_four.domain.com.co.uk?

You can't write those off on the basis that your code will never see such strings. If you cannot be certain that they have already been validated by the calling code then you must check them yourself before you attempt to extract the appropriate part.
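As one way to check input before extracting anything, here is a minimal sketch. The helper name first_label and the validation rule are assumptions, not an established standard: every dot-separated label must be 1 to 63 characters of ASCII letters, digits, hyphens or underscores. Underscores are allowed only because the sample data contains them. Your own rule may well differ once the requirement is pinned down.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first dot-separated label of $host,
# or undef if $host does not satisfy the assumed rule described above.
sub first_label {
    my ($host) = @_;
    return undef unless defined $host;
    return undef
        unless $host =~ /\A[A-Za-z0-9_-]{1,63}(?:\.[A-Za-z0-9_-]{1,63})*\z/;
    my ($label) = split /\./, $host, 2;   # limit 2: first label plus the rest
    return $label;
}

for my $host (qw( vmserver1.domain.com a.b.c.d.co.jp bad..name )) {
    my $label = first_label($host);
    print defined($label) ? "$host -> $label\n" : "$host -> rejected\n";
}
```

Under this rule a.b.c.d.co.jp yields a, which may or may not be what you want. That is exactly the decision that has to be made before choosing a pattern.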
