0

I need some help with string parsing in Perl. I have an HTTP server that responds with something like this:

<html>
<head><title></title></head><body>
T:17.10;H:32.10
</body></html>

I need to capture the two numbers (17.10 and 32.10 in the example) and put them in two variables that I will use in some if...then...else logic.

I'm not very experienced with string manipulation and regexes. At the moment I'm trying this:

my $url = 'http://192.168.25.9';
my $content = get $url;
die "Couldn't get $url" unless defined $content;
my @lines = split /\n/, $content;
$content2 = $lines[2];
$content2 =~ tr/T://d;
$content2 =~ tr/H://d;
my @lines2 = split /;/, $content2;
$tem = $lines2[0];
$hum = $lines2[1];

$tem =~ m{(\d+\.\d+)};
$hum =~ m{(\d+\.\d+)};

but when I print out the line I see something strange: missing characters, spaces in the line, etc. It seems there are some invisible characters causing confusion.

Could you suggest a better way to get the two numbers into two numeric variables?

Thanks, Fabio

2
  • Which variables are you printing and what are you seeing? Use Data::Dumper to display the content of your variables. Commented Jan 13, 2015 at 18:51
  • How are you getting the HTML? Which module does get come from? Commented Jan 13, 2015 at 19:59
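
The Data::Dumper suggestion in the first comment can be sketched as follows. This is only an illustration: the sample string and its CRLF line endings are assumptions, not something the question confirms about the real server. Setting $Data::Dumper::Useqq makes control characters show up as visible escapes.

```perl
use strict;
use warnings;
use Data::Dumper;

# Print control characters as escapes (\r, \t, ...) instead of raw bytes.
$Data::Dumper::Useqq = 1;

# Hypothetical sample body; CRLF line endings are an assumption.
my $content = "<html>\r\n<head><title></title></head><body>\r\n"
            . "T:17.10;H:32.10\r\n</body></html>\r\n";

# Splitting on \n alone leaves a carriage return at the end of each line.
my @lines = split /\n/, $content;

my $dump = Dumper($lines[2]);
print $dump;
```

If the real response does use CRLF, a stray carriage return at the end of each line would be one plausible source of the "invisible characters", since a terminal moves the cursor back to the start of the line when it is printed.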

3 Answers

6

A complete solution that avoids parsing HTML with a regex (ref: RegEx match open tags except XHTML self-contained tags):

use strict; use warnings;

# CPAN module to fetch the page (part of libwww-perl, not core Perl)
use LWP::UserAgent;
# CPAN module to parse HTML
use HTML::TreeBuilder;

# fetching part
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => "http://192.168.25.9");
my $res = $ua->request($req);
die $res->status_line, "\n" unless $res->is_success;

# parsing part
my $tree = HTML::TreeBuilder->new();
# get text from HTML
my $out = $tree->parse($res->decoded_content)->format;
# extract the expected string from the text output
if ($out =~ /^\s*T:(\d{2}\.\d{2});H:(\d{2}\.\d{2}).*/) {
    print join "\n", $1, $2;
}

OUTPUT:

17.10
32.10

1 Comment

I don't see any point in involving HTML::TreeBuilder at all -- certainly not just for the purpose of formatting the HTML. It would also be wise to use decoded_content instead of content, as you don't know whether the HTTP content is compressed.
2

For this specific kind of response you can do the following:

my ($t, $h) = map { /T:(\d+(?:\.\d+)?);H:(\d+(?:\.\d+)?)/ ? ($1, $2) : () } @req;
print "$t, $h\n", $t * $h;

Output:

17.10, 32.10
548.91

where @req is an array holding the lines of the received response, with line endings removed
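
A minimal sketch of how @req could be built from the raw body, assuming the body is already in a scalar. The sample string is hypothetical, and the pattern is a variant with the decimal point escaped. Splitting on an optional carriage return followed by a newline handles both LF and CRLF responses.

```perl
use strict;
use warnings;

# Hypothetical body; in a real script this would come from the HTTP client.
my $content = "<html>\n<head><title></title></head><body>\n"
            . "T:17.10;H:32.10\n</body></html>\n";

# Split on LF or CRLF so no carriage return is left on any line.
my @req = split /\r?\n/, $content;

# Keep the captures from whichever line matches; other lines contribute nothing.
my ($t, $h) = map { /T:(\d+(?:\.\d+)?);H:(\d+(?:\.\d+)?)/ ? ($1, $2) : () } @req;

print "$t, $h\n";
```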

2 Comments

This solution seems fine to me, and I don't understand the downvote. The other solutions go into a lot of unnecessary work to format or strip the HTML, which is entirely unnecessary unless you wanted to ensure that the required text is the sole content of the <body> element, which none of them do. Your regex is a little naive though, as I think there is a good chance that the numeric values may look like 123.4 or 2.8896, or even 42, and your pattern will match none of these.
Thanks for the support, Borodin. Agree with your comments. Fixed regexp. Perhaps now it's more flexible.
1

For your purpose, this is all you need:

my ($tem, $hum) = $content =~ /T:(\d{2}\.\d{2});H:(\d{2}\.\d{2})/;

If you need a more general pattern (e.g. to support a temperature or humidity >= 100, single-digit values, or values without a decimal part):

my ($tem, $hum) = $content =~ /T:(\d+(?:\.\d+)?);H:(\d+(?:\.\d+)?)/;
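
As a rough sketch of how the general pattern might feed the if/else logic the question mentions: the helper name parse_readings and the thresholds below are made up for illustration. The helper returns an empty list when nothing matches, so the caller can tell a failed parse from a real reading.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (temperature, humidity) as numbers,
# or an empty list when the text does not contain a reading.
sub parse_readings {
    my ($content) = @_;
    my ($tem, $hum) = $content =~ /T:(\d+(?:\.\d+)?);H:(\d+(?:\.\d+)?)/
        or return;
    # Adding 0 forces numeric conversion of the captured strings.
    return ($tem + 0, $hum + 0);
}

my ($tem, $hum) = parse_readings("<body>\r\nT:17.10;H:32.10\r\n</body>");

if (defined $tem) {
    # Thresholds are arbitrary examples, not values from the question.
    if    ($tem > 25) { print "too warm\n" }
    elsif ($hum > 60) { print "too humid\n" }
    else              { print "ok\n" }
}
else {
    print "no reading found\n";
}
```

Because the regex is matched against the whole body, carriage returns and the surrounding markup never reach the captured values.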

2 Comments

I don't see any need to remove the HTML markup. The data is either there or it is not, although I guess there is an infinitesimal chance that a false match could be found in the value of one of the attributes.
That's true, he's not really parsing HTML here. The answer just comes down to the one line: my ($tem, $hum) = $content =~ /T:(\d{2}\.\d{2});H:(\d{2}\.\d{2})/;
