2

I want to run a regex over a source file named source.html or source.txt, which contains:

<OPTION value=5>&nbsp;&nbsp;5 - Course Alpha (3)</OPTION> <OPTION value=6>&nbsp;&nbsp;6 - Course Beta (3)</OPTION>

to get:

5 - Course Alpha (3)
6 - Course Beta (3)

That is, I need to find the pattern:

<OPTION v

then find the first number after it, and capture everything from that number until I see:

</OPTION>

How can I implement this in Perl using a regex?

PS: It should read the content from a file and write output to a file.

4
  • maybe m/;(\d+)\s+-\s+?.+?)(/ Commented Apr 13, 2011 at 13:56
  • possible duplicate of HTML parsing in perl Commented Apr 13, 2011 at 14:00
  • How can I install Mojo::DOM to my system? Commented Apr 13, 2011 at 14:05
  • Have a look through stackoverflow.com/search?q=installing+perl+modules Commented Apr 13, 2011 at 14:12

3 Answers

5

You do not want to use a regex, you want to use an HTML parser. Here's a good article on the subject which explains why regexes are fragile and how to use HTML::TreeBuilder.

There's also a small pile of similar questions and answers about extracting data from HTML documents.


3 Comments

Thanks for your answers and ideas. I am reading them. One more thing: I could use other libraries, but I want to implement this with a regex.
Schwern, while I support your position about how challenging it can be to use regexes on general, open-ended HTML, we do not know enough about the original poster’s situation to know whether that position is fully justified and applicable in his case. As I know you know, using regexes on discrete and well-accounted-for HTML snippets is perfectly reasonable, and indeed at times even preferred. If they can control or otherwise delineate the limits of the problem space to a small enough subset, a regex is a lot easier than a parsing approach, but if they cannot, then it is not. Agreed?
@tchrist Yeah. I'm still not trusting him with a gun.
1
perl -lwe '$_="<OPTION value=5>&nbsp;&nbsp;5 - Course Alpha (3)</OPTION> <OPTION value=6>&nbsp;&nbsp;6 - Course Beta (3)</OPTION>"; s/\&nbsp;//g; print $1 while /<OPTION [^>]*>([^<]+)/g'
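Since the question asks for reading from a file and writing to a file, the same pattern can be moved into a small script. This is only a sketch under assumptions: the sub name `extract_options` and the script name `extract.pl` are made up for illustration, and it relies on the file looking like the sample in the question (uppercase `OPTION` tags, no nested markup inside them).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text of every <OPTION ...> element found in $html,
# with &nbsp; entities and surrounding whitespace removed.
# (extract_options is a hypothetical helper name.)
sub extract_options {
    my ($html) = @_;
    my @texts;
    while ($html =~ /<OPTION [^>]*>([^<]+)/g) {
        my $text = $1;
        $text =~ s/&nbsp;//g;       # drop the non-breaking-space entities
        $text =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        push @texts, $text;
    }
    return @texts;
}

# Hypothetical usage: perl extract.pl source.txt output.txt
# The file handling only runs when both file names are supplied.
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;

    open my $in, '<', $in_file or die "Cannot read $in_file: $!";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close $in;

    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    print {$out} "$_\n" for extract_options($html);
    close $out or die "Cannot close $out_file: $!";
}
```

Keeping the program in a file also sidesteps shell quoting, which matters on Windows, where cmd.exe treats an unquoted `&` as a command separator.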

2 Comments

You should format code by putting four spaces before each line. You can also select it and click the {} button. More helpful tips at the Markdown Editing Help page.
I tried perl -0777ne s/\&nbsp;//g; "print $1 while /<OPTION [^>]*>([^<]+)/g" source.txt, but it says 'nbsp' is not recognized as an internal or external command, operable program or batch file. Is that because I have &nbsp; in my file?
0

What about

/<OPTION v.*?>.*?(\d.+?)<\/OPTION>/

http://regexr.com?2thm8

There you will find your strings in the first capturing group.
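As a rough illustration of how this pattern could be applied in Perl, the sketch below runs it with the `/g` modifier in list context against the sample from the question. The variable names are made up for the example. It assumes every option label contains a digit and that each element sits on a single line, since `.` does not match a newline without `/s`.

```perl
use strict;
use warnings;

# Sample input copied from the question.
my $html = '<OPTION value=5>&nbsp;&nbsp;5 - Course Alpha (3)</OPTION> '
         . '<OPTION value=6>&nbsp;&nbsp;6 - Course Beta (3)</OPTION>';

# With /g in list context, the match returns the contents of the
# capturing group for every match found in the string.
my @courses = $html =~ /<OPTION v.*?>.*?(\d.+?)<\/OPTION>/g;

print "$_\n" for @courses;
```

On this input it prints the two lines the question asks for. If a label had no digit, the lazy `.*?` could run on into the next element, so the pattern is only safe on input as regular as the sample.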

8 Comments

It would be nice to get a reason for downvotes. Otherwise I can't improve my answer, and I am only human. OK, I spotted an error and will change it.
@stema thanks for your answer. How can I check it with the website you linked here?
You gave the right answer to the wrong question. The OP does not know that using regexes to parse HTML is a bad idea, so a straight answer is not helpful.
@Schwern, I know that. But for simple cases where you just need a few values, a regex can be an option. I don't want to argue about whether that is the case here. I don't parse HTML myself, so I can't advise him on the use of those tools. But +1 for your answer. Always use the right tool!
@kamaci The page is quite straightforward to use. Enter your regex at the top and your test strings in the large text field. Matches are marked in blue, and if you move the mouse over a match it shows you the content of the capturing groups. On the right there is help on the different regex constructs. Be aware that regexes differ from language to language. I don't know what regex engine is behind that site, but I use it for Perl anyway; for most expressions it's no problem.