I'm trying to pull data out of an HTML file into an array using PHP regex. Below are two rows of the data file. I want to extract the part number (9517170 is one example), make, model, and the download URL. Here is my failed regex attempt to extract the part number and URL:

/Row[0|1] ([0-9]+)"(.*?)(\/component[0-9a-zA-Z_:-\/]+)/

Any regex gurus out there that can get me pointed in the right direction?

Thanks!

    <tr id="table_6_row_127" class="fabrik_row oddRow1 9517170">
            <td class="fabrik_row___jos_baseplates___DemcoPart" ><a class='fabrik___rowlink' href='/baseplates/fitlist/details/6/6/127.html'>9517170</a></td>
            <td class="fabrik_row___jos_baseplates___Make" >Subaru</td>
            <td class="fabrik_row___jos_baseplates___Model" >Legacy Outback *4</td>
            <td class="fabrik_row___jos_baseplates___Years" >03-04</td>
            <td class="fabrik_row___jos_baseplates___A" >3</td>
            <td class="fabrik_row___jos_baseplates___B" >25</td>
            <td class="fabrik_row___jos_baseplates___C" >23</td>
            <td class="fabrik_row___jos_baseplates___D" >15 1/2</td>
            <td class="fabrik_row___jos_baseplates___Price" >370</td>
            <td class="fabrik_row___jos_baseplates___Download" ><a href='/component/docman/doc_download/250-tp20170.html' target='_self'>TP20170</a></td>
    </tr>
<tr id="table_6_row_431" class="fabrik_row oddRow0 9518272">
            <td class="fabrik_row___jos_baseplates___DemcoPart" ><a class='fabrik___rowlink' href='/baseplates/fitlist/details/6/6/431.html'>9518272</a></td>
            <td class="fabrik_row___jos_baseplates___Make" >Subaru</td>
            <td class="fabrik_row___jos_baseplates___Model" >Outback *4*9</td>
            <td class="fabrik_row___jos_baseplates___Years" >10-11</td>
            <td class="fabrik_row___jos_baseplates___A" >3</td>
            <td class="fabrik_row___jos_baseplates___B" >30</td>
            <td class="fabrik_row___jos_baseplates___C" >25-1/8"</td>
            <td class="fabrik_row___jos_baseplates___D" >17-1/4"</td>
            <td class="fabrik_row___jos_baseplates___Price" >370</td>
            <td class="fabrik_row___jos_baseplates___Download" ><a href='http://demco-products.com/component/docman/doc_download/921-tp20272.html' target='_self'>tp20272</a></td>
    </tr>
1 Answer

Use DOMDocument::loadHTML(). It uses libxml under the hood, which is fast and robust.

**Don't try to parse HTML with regexes.**

I made that bold because I see it a lot on here, and the regex solutions are fragile at best and buggy at worst. Once you have used a real HTML parser to pull out the attributes you want, applying a regex to those values is more reasonable.
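
As a rough illustration of that last point, here is a minimal sketch that lets DOMDocument find each row and only then applies a small regex to the row's class attribute to get the part number. The trimmed-down `$html` string, the `$partNumbers` variable, and the assumption that the part number is always the last token after `oddRow0` or `oddRow1` are mine, based only on the two sample rows in the question; real data may differ.

```php
<?php
// Trimmed copy of the two sample rows (assumption: real input has the same shape).
$html = <<<'HTML'
<table>
<tr id="table_6_row_127" class="fabrik_row oddRow1 9517170">
<td class="fabrik_row___jos_baseplates___Make" >Subaru</td>
</tr>
<tr id="table_6_row_431" class="fabrik_row oddRow0 9518272">
<td class="fabrik_row___jos_baseplates___Make" >Subaru</td>
</tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$partNumbers = [];
foreach ($doc->getElementsByTagName('tr') as $row) {
    // The parser hands back the attribute value; the regex only sees a short string.
    $class = $row->getAttribute('class');
    if (preg_match('/\boddRow[01] (\d+)$/', $class, $m)) {
        $partNumbers[] = $m[1];
    }
}

print_r($partNumbers);
```

Note that `[01]` is used here rather than `[0|1]`: inside a character class the `|` is a literal pipe, not an alternation.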

2 Comments

I've read the documentation on loadHTML(), but it is not at all clear how I can use that function to put the variables I want into a PHP array. There also do not seem to be any examples out there for extracting tabular data using that function. Anyone know of a good tutorial on this?
I believe you can use XPath queries to get a list of tags of a certain type, which is only one step away from what you want.
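
A possible sketch of that XPath approach, assuming the cell class names always end in `___DemcoPart`, `___Make`, `___Model`, and `___Download` as they do in the sample rows. The `cell()` helper, the array keys, and the shortened `$html` string are hypothetical names introduced for illustration; `DOMXPath::query()` and `DOMXPath::evaluate()` are the standard PHP DOM extension methods.

```php
<?php
// One sample row from the question, with the unused cells left out.
$html = <<<'HTML'
<table>
<tr id="table_6_row_127" class="fabrik_row oddRow1 9517170">
<td class="fabrik_row___jos_baseplates___DemcoPart" ><a class='fabrik___rowlink' href='/baseplates/fitlist/details/6/6/127.html'>9517170</a></td>
<td class="fabrik_row___jos_baseplates___Make" >Subaru</td>
<td class="fabrik_row___jos_baseplates___Model" >Legacy Outback *4</td>
<td class="fabrik_row___jos_baseplates___Download" ><a href='/component/docman/doc_download/250-tp20170.html' target='_self'>TP20170</a></td>
</tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical helper: string value of the cell whose class contains "___<suffix>",
// optionally followed by a relative path such as "/a/@href".
function cell(DOMXPath $xpath, DOMNode $row, string $suffix, string $tail = ''): string
{
    $expr = "string(td[contains(@class, '___$suffix')]$tail)";
    return trim($xpath->evaluate($expr, $row));
}

$rows = [];
foreach ($xpath->query("//tr[contains(@class, 'fabrik_row')]") as $tr) {
    $rows[] = [
        'part'  => cell($xpath, $tr, 'DemcoPart'),
        'make'  => cell($xpath, $tr, 'Make'),
        'model' => cell($xpath, $tr, 'Model'),
        'url'   => cell($xpath, $tr, 'Download', '/a/@href'),
    ];
}

print_r($rows);
```

Each element of `$rows` is then a plain associative array, so the result can be looped over or passed to `json_encode()` like any other PHP array.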
