I would like to extract content from a webpage. However, the response text I get back includes JavaScript that is never executed, so I cannot process it the way I could a page opened in a browser.
Can this method be used to get the HTML content, or is browser emulation the only option? Or is there some other way of retrieving this content?
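' Requires references to "Microsoft XML" (MSXML2) and "Microsoft HTML Object Library"
Dim oXMLHTTP As New MSXML2.XMLHTTP
Dim htmlObj As New HTMLDocument
With oXMLHTTP
    ' Third argument False = synchronous request, so .send blocks until the response arrives
    .Open "GET", "http://www.manta.com/ic/mtqyfk0/ca/riverbend-holdings-inc", False
    .send
    If .ReadyState = 4 And .Status = 200 Then
        ' Load the raw response into an HTML document; scripts in it are not executed
        htmlObj.body.innerHTML = .responseText
        'do things
    End If
End With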
Response text:
<!DOCTYPE html>
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="refresh" content="10; url=/distil_r_blocked.html?Ref=/ic/mtq599v/ca/45th-street-limited-partnership&distil_RID=2115B138-A1BF-11E6-A957-C0595F6B962F&distil_TID=20161103121454" />
<script type="text/javascript">
(function(window){
try {
if (typeof sessionStorage !== 'undefined'){
sessionStorage.setItem('distil_referrer', document.referrer);
}
} catch (e){}
})(window);
</script>
<script type="text/javascript" src="/ser-yrbwqfedrrwwvctvyavy.js" defer></script><style type="text/css">#d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#verxvaxcuczwcwecuxsx{display:none!important}</style></head>
<body>
<div id="distil_ident_block"> </div>
</body>
</html>
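For reference, the response above does not look like the real page. It carries a meta refresh to /distil_r_blocked.html and an empty distil_ident_block div, which suggests a bot-protection interstitial. Below is a minimal sketch that detects this case before parsing. It assumes those two markers are reliable indicators, which is a guess based only on this one response, and IsBlockPage is a hypothetical helper name:

```vba
' Hypothetical helper: returns True when the response text contains either
' marker seen in the block page above. The marker strings are taken from that
' single response and are an assumption, not a documented contract.
Function IsBlockPage(ByVal html As String) As Boolean
    IsBlockPage = InStr(1, html, "distil_r_blocked", vbTextCompare) > 0 _
               Or InStr(1, html, "distil_ident_block", vbTextCompare) > 0
End Function
```

It could be called as `If IsBlockPage(.responseText) Then ...` before assigning to `innerHTML`, so that a blocked response is reported instead of being parsed as if it were the real page.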