
I'm trying to parse an HTML page that I have loaded with Perl. I need to get the stylesheet path (for example href="asd/jkl/xyz.css") out of the HTML response so that I can turn it into an absolute path.

The reason I want to do this is that I need the CSS inline in the head of an e-mail.

My plan is:

  1. Load the page via Perl.
  2. Get the href of each linked CSS file.
  3. Load the CSS files via Perl.
  4. Parse the CSS and put the contents of the CSS files into the head tag of my generated e-mail.
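Steps 3 and 4 could look roughly like the following minimal sketch. It assumes the stylesheet URLs are already absolute, uses the core module HTTP::Tiny for downloading, and simply concatenates the CSS into one <style> element; the helper names fetch_css and build_style_block are hypothetical. It does not rewrite url(...) or @import references inside the CSS, which may also be relative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Step 3 (hypothetical helper): download one stylesheet.
# Returns the CSS text, or undef if the request failed.
sub fetch_css {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get($url);
    return $res->{success} ? $res->{content} : undef;
}

# Step 4 (hypothetical helper): wrap all collected CSS in a single
# <style> element that can be placed in the <head> of the e-mail.
# Undefined entries (failed downloads) are skipped.
sub build_style_block {
    my @css = grep { defined } @_;
    return "<style type=\"text/css\">\n" . join("\n", @css) . "\n</style>";
}

# Offline usage example with literal CSS instead of downloaded files;
# in the real pipeline this would be build_style_block(map { fetch_css($_) } @urls).
print build_style_block('body { color: #333; }', 'h1 { font-size: 20px; }'), "\n";
```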

Does anyone have a better idea, or a working regex?

  • use... parser... not... regex... Commented Jan 15, 2014 at 17:13
  • possible duplicate of RegEx match open tags except XHTML self-contained tags Commented Jan 15, 2014 at 17:42
  • If you want to use a regex, you have to show the exact text you will be parsing: there's a difference between <link rel="stylesheet" href="foo.css"> and <link href="foo.css" rel="stylesheet">, for example. Having said that, it is rarely a good idea to parse HTML with regex. Use a real HTML parser as tenub suggested. Commented Jan 15, 2014 at 17:43
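To illustrate the attribute-order point, here is a minimal sketch assuming XML::LibXML (the CPAN module suggested in the answer) is installed. The XPath test matches both <link rel="stylesheet" href="..."> and <link href="..." rel="stylesheet">, which a naive regex would have to special-case. The helper name stylesheet_hrefs is hypothetical, and the exact match on rel="stylesheet" will not catch variants such as rel="Stylesheet".

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the href of every <link rel="stylesheet">
# in an HTML string, in document order. Attribute order does not matter
# to the XPath expression.
sub stylesheet_hrefs {
    my ($html) = @_;
    # recover => 2: accept sloppy HTML and suppress parser error output
    my $doc = XML::LibXML->load_html(string => $html, recover => 2);
    return map { $_->value } $doc->findnodes('//link[@rel="stylesheet"]/@href');
}

my $html = <<'HTML';
<html><head>
<link rel="stylesheet" href="foo.css">
<link href="bar.css" rel="stylesheet">
<link rel="icon" href="favicon.ico">
</head><body></body></html>
HTML

# Prints foo.css and bar.css, but not the favicon link.
print "$_\n" for stylesheet_hrefs($html);
```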

1 Answer


Try something like this:

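#!/usr/bin/env perl

use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();
# recover => 2: parse broken HTML anyway and suppress the parser's error messages
my $doc = $parser->load_html(location => "http://mywebsite.com", recover => 2);

# A <link> element stores the stylesheet URL in href, not src
for my $attr ($doc->findnodes('//link[@rel="stylesheet"]/@href')) {
    print $attr->value, "\n";
}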

Reference: http://metacpan.org/pod/XML::LibXML
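The hrefs found this way may still be relative (e.g. asd/jkl/xyz.css). One way to make them absolute, as the question asks, is URI->new_abs from the CPAN module URI, resolving each href against the URL the page was loaded from. This is a minimal sketch; the helper name absolutize is hypothetical and cdn.example.com is only an example host.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve a (possibly relative) href against the
# URL of the page it came from and return the absolute URL as a string.
sub absolutize {
    my ($href, $page_url) = @_;
    return URI->new_abs($href, $page_url)->as_string;
}

my $page = 'http://mywebsite.com/';
print absolutize($_, $page), "\n"
    for 'asd/jkl/xyz.css', '/css/main.css', 'http://cdn.example.com/x.css';
```

Already-absolute hrefs pass through unchanged. If the page declares a <base href="...">, that URL, rather than the page URL, is the correct base for resolution.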
