The following question applies to any programming language.
I am working on a program that takes a web page's source code as input and extracts a specific kind of data from it.
Suppose I provide the following page source as input to my program:
<table>
  <tr>
    <td id="a" class="product-name">Product A</td>
    <td id="1" class="product-price">$100</td>
  </tr>
  <tr>
    <td id="b" class="product-name">Product B</td>
    <td id="2" class="product-price">$200</td>
  </tr>
  <tr>
    <td id="c" class="product-name">Product C</td>
    <td id="3" class="product-price">$300</td>
  </tr>
</table>
This web page lists products along with their selling prices. When rendered, it looks like this:
Product A: $100
Product B: $200
Product C: $300
I want to use this page source to copy the data into a database. Since the product names and prices always appear in fixed tags with fixed classes (such as <td> or <div>), how can I extract the data programmatically? Is there a good algorithm, code, or library for extracting such data from page source?
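Since any language is acceptable, here is a minimal sketch in Python that uses only the standard library's html.parser module. It keys on the product-name and product-price classes from the sample page and assumes each name cell comes before its matching price cell in document order. ProductParser, extract_products, and SAMPLE_PAGE are illustrative names, not an existing library API. Using a real parser instead of regular expressions means attribute order, extra classes, and whitespace do not break the extraction.

```python
from html.parser import HTMLParser

SAMPLE_PAGE = """<table>
<tr>
<td id="a" class="product-name">Product A</td>
<td id="1" class="product-price">$100</td>
</tr>
<tr>
<td id="b" class="product-name">Product B</td>
<td id="2" class="product-price">$200</td>
</tr>
<tr>
<td id="c" class="product-name">Product C</td>
<td id="3" class="product-price">$300</td>
</tr>
</table>"""


class ProductParser(HTMLParser):
    """Collects the text of <td> cells classed product-name or product-price."""

    WANTED = ("product-name", "product-price")

    def __init__(self):
        super().__init__()
        self.current = None  # wanted class of the <td> we are inside, else None
        self.buffer = []     # text fragments seen inside that <td>
        self.names = []
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        # class may hold several space-separated names, e.g. "product-name featured"
        classes = (dict(attrs).get("class") or "").split()
        for cls in self.WANTED:
            if cls in classes:
                self.current = cls
                self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or self.current is None:
            return
        text = "".join(self.buffer).strip()
        if self.current == "product-name":
            self.names.append(text)
        else:
            self.prices.append(text)
        self.current = None


def extract_products(html):
    """Return (name, price) pairs, assuming names and prices alternate in order."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return list(zip(parser.names, parser.prices))


if __name__ == "__main__":
    for name, price in extract_products(SAMPLE_PAGE):
        print(f"{name}: {price}")
```

For messier pages, third-party parsers such as BeautifulSoup or lxml offer CSS-selector and XPath queries, but this standard-library version needs no installation.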
I think this can be done using getElementById in JavaScript, but I am not sure. Could XML parsing be used instead? If so, how? Are there any other good methods or algorithms?
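On the XML question: getElementById returns a single element by its id, and here every cell has a different id, so in browser JavaScript selecting by class (for example with document.querySelectorAll(".product-name")) fits this markup better. Treating the page as XML only works if the markup is well-formed, which much real-world HTML is not. Assuming it is, here is a sketch using Python's xml.etree.ElementTree, whose limited XPath support can match cells by an exact class value and pair them row by row. extract_products_xml is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SAMPLE_PAGE = """<table>
<tr>
<td id="a" class="product-name">Product A</td>
<td id="1" class="product-price">$100</td>
</tr>
<tr>
<td id="b" class="product-name">Product B</td>
<td id="2" class="product-price">$200</td>
</tr>
<tr>
<td id="c" class="product-name">Product C</td>
<td id="3" class="product-price">$300</td>
</tr>
</table>"""


def extract_products_xml(markup):
    """Parse well-formed markup as XML and pair each name with the price in its row.

    Raises ET.ParseError if the markup is not well-formed XML. Assumes no XML
    namespace (an XHTML page declaring xmlns would need namespaced tag names)
    and an exact class value (a cell with extra classes would not match).
    """
    root = ET.fromstring(markup)
    products = []
    # Walk row by row so a name is only ever paired with a price from the same <tr>.
    for row in root.iter("tr"):
        name = row.find("td[@class='product-name']")
        price = row.find("td[@class='product-price']")
        if name is not None and price is not None:
            products.append(("".join(name.itertext()).strip(),
                             "".join(price.itertext()).strip()))
    return products


if __name__ == "__main__":
    print(extract_products_xml(SAMPLE_PAGE))
```

Pairing within each <tr> is a little safer than matching names and prices by position across the whole page, because a row with a missing price is skipped instead of shifting every later pair.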
Note: I am doing this on my own website. I have an old website and want to reuse all of its data on my new one. Re-entering everything manually would be a huge task, so I want to copy the data from the old site. Any programming language is fine with me.
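For the database step, the target database is not specified, so this sketch assumes SQLite through Python's built-in sqlite3 module and a hypothetical products table. It also assumes prices look like "$100" (optionally with thousands separators) and stores them as integer cents to avoid floating-point rounding. price_to_cents and save_products are illustrative names.

```python
import sqlite3
from decimal import Decimal


def price_to_cents(price):
    """Convert a string such as '$1,200.50' to integer cents (assumes '$' prefix)."""
    amount = Decimal(price.strip().lstrip("$").replace(",", ""))
    return int(amount * 100)


def save_products(conn, products):
    """Insert (name, price_string) pairs into a hypothetical 'products' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "name TEXT NOT NULL, price_cents INTEGER NOT NULL)"
    )
    # Parameterised query: values are bound by sqlite3, not pasted into the SQL text.
    conn.executemany(
        "INSERT INTO products (name, price_cents) VALUES (?, ?)",
        [(name, price_to_cents(price)) for name, price in products],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass a file path instead for a real database
    save_products(conn, [("Product A", "$100"), ("Product B", "$200")])
    print(conn.execute("SELECT name, price_cents FROM products").fetchall())
```

The same pattern applies to MySQL or PostgreSQL through their Python drivers, though the placeholder syntax differs (for example %s instead of ?).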