I need to parse some values out of an HTML page. I'm using the regex below to capture several groups, but it fails when optional tags appear in the middle of the markup. I need a pattern that pulls the values out of every repeated block on the page, whether or not the optional tags are present. Here is a sample of the HTML:
onclick="return raise('SelectFare', new SelectFareEventArgs(1, 3, 'F'))" required="true" requiredError="Please select a flight and fare in every market."></td><td>Regular Fare</td><td>Adult<br></td><td align="right" style="font-size:110%;">91.99 EUR<br><div style="font-style: italic; font-size: 10px;">Only<span style="color: red;"> 4 </span>seats left at this fare</div></td><td></td><td><b>Fri</b>30 Sep 11<br><b>Flight</b>FR 818</td><td>15:10 Depart<br>16:15 Arrive</td></tr><tr id="1_2011_8_30_23_45_00"><td><div class="planeImg1" title="Click to select this fare on this flight"></div></td><td><input
For example, the optional <div style="font-style: italic; font-size: 10px;">Only<span style="color: red;"> 4 </span>seats left at this fare</div> section in this fragment breaks the match. Here is a second fragment:
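If you want to stay with regular expressions, one way to cope with the optional notice is to wrap it in an optional non-capturing group, (?:...)?, so the pattern matches the price cell with or without it. This is a sketch in Python (the question doesn't name a language), and it assumes the price cell has exactly the attributes shown in the first fragment; the variable names are hypothetical.

```python
import re

# Hypothetical sample cells based on the first fragment: one with the optional
# "seats left" notice, one without it.
with_div = ('<td align="right" style="font-size:110%;">91.99 EUR<br>'
            '<div style="font-style: italic; font-size: 10px;">Only'
            '<span style="color: red;"> 4 </span>seats left at this fare</div></td>')
without_div = '<td align="right" style="font-size:110%;">91.99 EUR<br></td>'

# (?:<div[^>]*>.*?</div>)? makes the whole notice optional: if it is there it
# is consumed (lazily, up to the first </div>), otherwise it is skipped.
PRICE_CELL = re.compile(
    r'<td align="right" style="font-size:110%;">'
    r'([\d.]+ EUR)<br>'
    r'(?:<div[^>]*>.*?</div>)?'
    r'</td>',
    re.DOTALL,
)

for cell in (with_div, without_div):
    m = PRICE_CELL.search(cell)
    print(m.group(1) if m else None)
```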
tr><tr id="1_2011_9_21_16_05_00"><td><div class="planeImg1" title="Click to select this fare on this flight"></div></td><td><input id="AvailabilityInputFRSelectView_RadioButtonMkt1Fare2" type="radio" name="AvailabilityInputFRSelectView$market1" value="H~HDIS1~XXXC~~RoundFrom|FR~ 816~ ~~DUB~10/21/2011 14:55~EDI~10/21/2011 16:05" onclick="return raise('SelectFare', new SelectFareEventArgs(1, 2, 'H'))" required="true" requiredError="Please select a flight and fare in every market."></td><td>No Taxes</td><td>Adult<br></td><td align="right" style="font-size:110%;"><strike style="color:#F00;font-size:80%;"><b style="color: #999;">22.99 EUR</b></strike>
(-35%)
<br>14.94 EUR<br></td><td></td><td><b>Fri</b>21 Oct 11<br><b>Flight</b>FR 816</td><td>14:55 Depart<br>16:05 Arrive</td></tr><tr id="1_2011_9_21_16_15_00"><td><div class="planeImg1" title="Click
The <strike ...>...</strike> (-35%) <br>14.94 EUR<br></td> part of the HTML above breaks it as well.
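The discounted cell can be handled the same way: put the <strike> block, the discount and the <br> that follows them into one optional group, and capture the old price and the discount inside that group so they remain available. A Python sketch, assuming the markup matches the second fragment exactly; FARE_CELL and the sample strings are hypothetical.

```python
import re

# Hypothetical sample cells: a discounted one (as in the second fragment)
# and a regular one without any <strike> markup.
discounted = ('<td align="right" style="font-size:110%;">'
              '<strike style="color:#F00;font-size:80%;"><b style="color: #999;">22.99 EUR</b></strike>\n'
              '(-35%)\n'
              '<br>14.94 EUR<br></td>')
regular = '<td align="right" style="font-size:110%;">91.99 EUR<br></td>'

# Group 1: struck-out original price (optional)
# Group 2: discount such as "-35%" (optional)
# Group 3: the price actually charged (always present)
FARE_CELL = re.compile(
    r'<td align="right" style="font-size:110%;">\s*'
    r'(?:<strike[^>]*>.*?([\d.]+ EUR).*?</strike>\s*\((-\d+%)\)\s*<br>)?'
    r'([\d.]+ EUR)<br>',
    re.DOTALL,  # the discount sits on its own lines, so let .*? cross newlines
)

for cell in (discounted, regular):
    print(FARE_CELL.search(cell).groups())
```

When the optional group doesn't match, groups 1 and 2 are None, so the charged price is always in group 3.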
This is the regex I'm using (I've also tried many variations of it):
"Please select(?:.*?)<td>(.*?)</td><td>(.*?)<br></td><td align=\"right\" style=\"font-size:110%;\">(.*?)<br>(.*?)<br>(?:.*?)</b>(.*?)<br><b>Flight</b>(.*?)</td><td>(.*?)<br>(.*?)</td>"
I'd appreciate any help with this, or even a pointer to resources on handling optional HTML tags in general.
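A more robust alternative is to stop matching the markup with one regex and use an HTML parser to collect the text of each <td> in each <tr>. Optional tags nested inside a cell then only add text to that cell and never shift the columns. Below is a minimal sketch using Python's standard html.parser module in place of the regex. It assumes the column order seen in the two fragments (fare type in the third cell, passenger in the fourth, price in the fifth, date/flight in the seventh, times in the eighth) and that the last EUR amount in the price cell is the fare actually charged. RowCellParser, extract_fares and SAMPLE are hypothetical names.

```python
import re
from html.parser import HTMLParser


class RowCellParser(HTMLParser):
    """Collects the visible text of every <td> in every <tr>.

    Tags nested inside a cell (<div>, <span>, <strike>, <b>, <br>) are ignored;
    only their text is kept, so optional markup cannot shift the columns.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text fragments of the currently open <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join fragments with spaces and collapse runs of whitespace.
            text = " ".join(" ".join(self._cell).split())
            if self.rows:
                self.rows[-1].append(text)
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


PRICE = re.compile(r"[\d.]+ EUR")


def extract_fares(html):
    # Hypothetical helper; column positions are assumed from the question's fragments.
    parser = RowCellParser()
    parser.feed(html)
    parser.close()
    fares = []
    for cells in parser.rows:
        if len(cells) < 8:
            continue  # not a fare row
        prices = PRICE.findall(cells[4])
        fares.append({
            "fare": cells[2],
            "passenger": cells[3],
            "price": prices[-1] if prices else None,  # last amount = price charged
            "flight": cells[6],
            "times": cells[7],
        })
    return fares


# Two rows trimmed from the question's fragments (attributes shortened).
SAMPLE = (
    '<table><tr>'
    '<td><div class="planeImg1"></div></td><td><input type="radio"></td>'
    '<td>Regular Fare</td><td>Adult<br></td>'
    '<td align="right">91.99 EUR<br><div>Only<span> 4 </span>seats left at this fare</div></td>'
    '<td></td><td><b>Fri</b>30 Sep 11<br><b>Flight</b>FR 818</td>'
    '<td>15:10 Depart<br>16:15 Arrive</td></tr>'
    '<tr>'
    '<td><div class="planeImg1"></div></td><td><input type="radio"></td>'
    '<td>No Taxes</td><td>Adult<br></td>'
    '<td align="right"><strike><b>22.99 EUR</b></strike>\n(-35%)\n<br>14.94 EUR<br></td>'
    '<td></td><td><b>Fri</b>21 Oct 11<br><b>Flight</b>FR 816</td>'
    '<td>14:55 Depart<br>16:05 Arrive</td></tr></table>'
)

for fare in extract_fares(SAMPLE):
    print(fare)
```

Taking the last amount in the price cell is a choice made for the discounted rows, where the struck-out original fare comes first. If you also need the original fare or the discount, you can search the same cell text for them.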
Thanks.