I'm using regular expressions to extract data from a website, but I've run into a problem.
This is part of the HTML I want to parse. For each link, I want to extract the text after "descuentos-" and the city name, which is the text inside the `<a href>` element.
```html
<div id="cities2_2">
<a href = "http://website.com/descuentos-espana/">Badajoz</a>
<a href = "http://website.com/descuentos-espana/">Badalona</a>
<a href = "http://website.com/descuentos-barcelona/">Barcelona</a>
<a href = "http://website.com/descuentos-bilbao/">Bilbao</a>
<a href = "http://website.com/descuentos-espana/">Burgos</a>
</div>
</div>
<div class="capa_cities" onmouseover="act_formato(3, 2);"
onmouseout="desact_formato(3, 2);">
<h2 id="title_city3_2">C</h2>
<div id="cities3_2">
<a href = "http://website.com/descuentos-espana/">Cáceres</a>
<a href = "http://website.com/descuentos-cadiz/">Cádiz</a>
<a href = "http://website.com/descuentos-espana/">Cartagena</a>
<a href = "http://website.com/descuentos-espana/">Castellón</a>
<a href = "http://website.com/descuentos-espana/">Ceuta</a>
<a href = "http://website.com/descuentos-espana/">Ciudad Real</a>
<a href = "http://website.com/descuentos-cordoba/">Córdoba</a>
<a href = "http://website.com/descuentos-espana/">Cuenca</a>
```
I could simply search for `<a href = "http://website.com/descuentos-(.*)">`, but other links elsewhere on the site also match that pattern, so I need to restrict the search to the `cities…` blocks. This is my current pattern:
```
#<div id="cities[0-9]+_2">(<a href = "http://website.com/descuentos-(.*?)/">(.*?)</a>)*#
```
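A minimal sketch of why this pattern yields only one city per block: in PCRE, and therefore in PHP's `preg_*` functions, a capturing group repeated with `*` keeps only the text from its last iteration. The sample string is a hypothetical, trimmed-down copy of the markup above. The `\s*` added inside the group is an assumption, not part of the original pattern; without it, the line breaks between the links make the `*` match zero times on the real multi-line HTML.

```php
<?php
// Hypothetical, trimmed-down copy of the markup from the question.
$html = <<<'HTML'
<div id="cities2_2">
<a href = "http://website.com/descuentos-espana/">Badajoz</a>
<a href = "http://website.com/descuentos-barcelona/">Barcelona</a>
<a href = "http://website.com/descuentos-bilbao/">Bilbao</a>
</div>
HTML;

// The question's pattern, with the URL dots escaped and \s* added (assumption)
// so the repeated group can step over the newlines between links.
$pattern = '#<div id="cities[0-9]+_2">(\s*<a href = "http://website\.com/descuentos-(.*?)/">(.*?)</a>)*#';

preg_match($pattern, $html, $m);

// All three links are consumed by the match, but groups 2 and 3
// only retain the values captured in the LAST repetition.
echo $m[2], ' / ', $m[3], PHP_EOL;
```

Running it prints `bilbao / Bilbao`: the first two links were matched, but their captures were overwritten by later iterations.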
I'd like it to work recursively: for each `<a href = "http://website.com/descuentos-(.*)/">(.*)</a>` found inside the block, capture the two small groups inside it.
Is there a way to do this with a single regex, or do I have to reprocess the block with `preg_match_all`?
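PCRE cannot return a variable number of captures from a single match, so a second pass is the usual answer. Below is a minimal sketch of the two-step approach. It assumes the `citiesN_2` divs contain no nested `<div>` and that the links are written exactly as in the sample (`href = "..."` with spaces around `=`). The sample HTML and the variable names are illustrative only.

```php
<?php
// Hypothetical sample based on the markup in the question.
$html = <<<'HTML'
<div id="cities2_2">
<a href = "http://website.com/descuentos-espana/">Badajoz</a>
<a href = "http://website.com/descuentos-barcelona/">Barcelona</a>
</div>
<div id="cities3_2">
<a href = "http://website.com/descuentos-cadiz/">Cádiz</a>
<a href = "http://website.com/descuentos-espana/">Ceuta</a>
</div>
<a href = "http://website.com/descuentos-otros/">Not in a cities block</a>
HTML;

// Step 1: capture the contents of every <div id="citiesN_2"> block.
// The s modifier lets (.*?) span newlines; the lazy match stops at the
// first </div>, which is only correct if these divs have no nested <div>.
preg_match_all('#<div id="cities[0-9]+_2">(.*?)</div>#s', $html, $blocks);

// Step 2: inside each block, collect every link as a [slug, city] pair.
$linkPattern = '#<a href = "http://website\.com/descuentos-([^/"]+)/">([^<]+)</a>#';
$cities = [];
foreach ($blocks[1] as $block) {
    // PREG_SET_ORDER groups the captures per match: [full, slug, city].
    preg_match_all($linkPattern, $block, $links, PREG_SET_ORDER);
    foreach ($links as $link) {
        $cities[] = [$link[1], $link[2]];
    }
}

foreach ($cities as [$slug, $city]) {
    echo "$slug => $city", PHP_EOL;
}
```

This prints one `slug => city` line per link and ignores the link outside the `cities` blocks. A single pattern is possible with the `\G` anchor, which forces each match to start where the previous one ended, but it is harder to read; for real pages, `DOMDocument` with `DOMXPath` is usually sturdier than regex.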