I've written some code in VBA that makes two consecutive "POST" requests to reach the destination page and then harvests the name and address from it. The desired results appear in one of two structures: one holds the name and address together in a single "th" element, and the other holds the name in one "td" and the address in another "td". To handle both, I relied on an error handler (On Error Resume Next). With xmlhttp I could not get any result, so I switched to WinHttpRequest, which lets me enable redirection. The script currently runs without errors. However, any suggestions to improve my code, especially to handle errors more efficiently, will be highly appreciated.
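The two POST requests and the redirect option described above could be wrapped in one helper, so the shared options and headers are written only once. This is a minimal sketch. It assumes an early-bound reference to "Microsoft WinHTTP Services, version 5.1" and Excel 2013 or later for WorksheetFunction.EncodeURL. PostForm and BuildSearchBody are hypothetical names, and the same WinHttpRequest object should be passed to both calls, as the script does:

```vba
' Hypothetical helper: sends one form-encoded POST with redirects enabled
' and returns the response body.
' Requires a reference to "Microsoft WinHTTP Services, version 5.1".
Function PostForm(http As WinHttp.WinHttpRequest, ByVal url As String, ByVal body As String) As String
    With http
        .Option(6) = True   ' 6 = WinHttpRequestOption_EnableRedirects
        .Open "POST", url, False
        .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"
        .setRequestHeader "Referer", "https://public.hcad.org/records/quicksearch.asp"
        .send body
        PostForm = .responseText
    End With
End Function

' Hypothetical helper: builds the body of the second (address search) request.
' EncodeURL (Excel 2013+) escapes a space as %20, which a form-urlencoded
' parser decodes to a space just like "+".
Function BuildSearchBody(ByVal stNum As String, ByVal stName As String) As String
    BuildSearchBody = "TaxYear=2017&stnum=" & WorksheetFunction.EncodeURL(stNum) & _
                      "&stname=" & WorksheetFunction.EncodeURL(stName)
End Function
```

For example, the first request becomes PostForm http, "https://public.hcad.org/records/QuickSearch.asp", "search=addr" and the second becomes html.body.innerHTML = PostForm(http, "https://public.hcad.org/records/QuickRecord.asp", BuildSearchBody(cNo, cel.Offset(0, 1).Value)). Unlike Replace(..., " ", "+"), EncodeURL also escapes characters such as "&" that would otherwise break the form body.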
Here is the full working code:
Sub reverse_scraping()
    ' Early binding: needs references to "Microsoft WinHTTP Services, version 5.1"
    ' and "Microsoft HTML Object Library"
    Dim http As New WinHttp.WinHttpRequest, html As New HTMLDocument
    Dim posts As Object, post As Object
    Dim ArgStr As String, ArgStr_ano As String, cNo As String, cName As String
    Dim cel As Range

    For Each cel In Range("A2:A" & Cells(Rows.Count, 1).End(xlUp).Row)
        ' Skip blank rows instead of re-sending the previous row's street number and name
        If cel.Value <> "" Then
            cNo = cel.Value
            cName = Replace(cel.Offset(0, 1).Value, " ", "+")

            ArgStr = "search=addr"
            ArgStr_ano = "TaxYear=2017&stnum=" & cNo & "&stname=" & cName

            ' First POST: choose "search by address"
            With http
                .Option(6) = True   ' 6 = WinHttpRequestOption_EnableRedirects
                .Open "POST", "https://public.hcad.org/records/QuickSearch.asp", False
                .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
                .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"
                .setRequestHeader "Referer", "https://public.hcad.org/records/quicksearch.asp"
                .send ArgStr
            End With

            ' Second POST: search by street number and name, then load the result page
            With http
                .Option(6) = True
                .Open "POST", "https://public.hcad.org/records/QuickRecord.asp", False
                .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
                .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"
                .setRequestHeader "Referer", "https://public.hcad.org/records/quicksearch.asp"
                .send ArgStr_ano
                html.body.innerHTML = .responseText
            End With

            ' A failed Set leaves the variable unchanged, so clear the previous row's results first
            Set posts = Nothing
            Set post = Nothing

            ' Either layout may be missing, so errors are suppressed for this block only
            On Error Resume Next
            Set posts = html.getElementsByClassName("data")(2).getElementsByTagName("th")
            Set post = html.getElementsByClassName("bgcolor_1")(0).getElementsByTagName("tr")(1).getElementsByTagName("td")
            If posts.Length Or post.Length Then
                cel.Offset(0, 2) = posts(0).innerText   ' "th" layout: name and address in one cell
                cel.Offset(0, 2) = post(1).innerText    ' "td" layout: name
                cel.Offset(0, 3) = post(2).innerText    ' "td" layout: address
            End If
            On Error GoTo 0
        End If
    Next cel
End Sub
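One way to drop the blanket On Error Resume Next is to check each collection's Length before indexing into it. If a layout is missing, the function simply returns False instead of raising an error. This is a minimal sketch. It assumes the class names and element positions used above ("data" index 2, "bgcolor_1" index 0, row 1, cells 1 and 2) and the same Microsoft HTML Object Library reference. ExtractNameAddress is a hypothetical name. It gives the "td" layout precedence, which matches the original code, where post(1) overwrites posts(0) when both layouts are present:

```vba
' Hypothetical helper: fills nameOut/addrOut from whichever layout is present
' and returns False when neither is found. Every collection's Length is
' checked before indexing, so no error handler is needed.
' Requires a reference to "Microsoft HTML Object Library".
Function ExtractNameAddress(html As HTMLDocument, ByRef nameOut As String, ByRef addrOut As String) As Boolean
    Dim found As Object, rowList As Object, cellList As Object

    nameOut = "": addrOut = ""

    ' "td" layout: name and address in the 2nd and 3rd cells of the second row
    Set found = html.getElementsByClassName("bgcolor_1")
    If found.Length > 0 Then
        Set rowList = found(0).getElementsByTagName("tr")
        If rowList.Length > 1 Then
            Set cellList = rowList(1).getElementsByTagName("td")
            If cellList.Length > 2 Then
                nameOut = cellList(1).innerText
                addrOut = cellList(2).innerText
                ExtractNameAddress = True
                Exit Function
            End If
        End If
    End If

    ' "th" layout: name and address together in the first "th" of the third "data" element
    Set found = html.getElementsByClassName("data")
    If found.Length > 2 Then
        Set cellList = found(2).getElementsByTagName("th")
        If cellList.Length > 0 Then
            nameOut = cellList(0).innerText
            ExtractNameAddress = True
        End If
    End If
End Function
```

Inside the loop, the whole On Error Resume Next block could then become one call. Here nm and addr are hypothetical String variables: If ExtractNameAddress(html, nm, addr) Then cel.Offset(0, 2) = nm: cel.Offset(0, 3) = addr. This call writes an empty string to column D for the "th" layout, whereas the original leaves that cell untouched.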
Here are two links showing how to reach the destination page (on the first page, you need to click the "search by address" button to get the search options):
1. https://www.dropbox.com/s/e9on9zwqzmcboze/1Untitled.jpg?dl=0
2. https://www.dropbox.com/s/0lchpde8uq63jps/pics.jpg?dl=0
To run the search, place the data below in columns "A" and "B" respectively. The results are written to column "C" of the corresponding row. For the two-"td" structure, the address goes to column "D".
Street No    Street Name
6330         LAUTREC DR
5522         DARLING ST
7411         SANDLE ST
10234        LUCORE ST