I'm writing a VBScript script to pull some data from a web page, strip out a few key pieces of information, and write those to a file.

At the moment my script to access the pages and save the file contents to a string is this:

Set WshShell = WScript.CreateObject("WScript.Shell")
Set http = CreateObject("Microsoft.XmlHttp")

'Load Webpage where address is URL
http.open "GET", URL, FALSE
http.send ""
'Assign webpage contents as a string to variable called Webpage
WEBPAGE = http.responseText

I need to save the content to a string so I can use a regular expression on it to pull out the content that I need.

This script works perfectly, EXCEPT when the pages contain non-ASCII characters (such as é). When a page contains something like this, the script throws an error and stops.

I'm guessing this is something to do with the encoding, but I can't work out how to fix it. Can anyone point me in the right direction? Thanks guys

Edit

Thanks to the help here I realised I've asked the wrong question! It turns out I was downloading the content fine - the problem was, afterwards I was trying to edit it and write it out to a file, and the file was in the wrong format. I had this:

Set objTextFile = objFSO.OpenTextFile(OutputFile, 8, True)

Changing it to this:

Set objTextFile = objFSO.OpenTextFile(OutputFile, 8, True, -1)

Seems to have fixed it. What a crazy world, eh? Thanks for the help.
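
For anyone reading later: as far as I understand it, the fourth argument of OpenTextFile selects the file format. -1 (TristateTrue) opens the file as Unicode, while leaving it out opens the file as ASCII, which cannot hold characters such as é. Here is a minimal sketch of the same call with named constants. The output path and the sample text are assumptions for illustration only, not part of my real script:

```vbscript
Option Explicit

' Named constants for the magic numbers passed to OpenTextFile
Const ForAppending = 8      ' iomode: append to the end of the file
Const TristateTrue = -1     ' format: open the file as Unicode

Dim objFSO, objTextFile, OutputFile
OutputFile = "output.txt"   ' hypothetical path, for illustration only

Set objFSO = CreateObject("Scripting.FileSystemObject")

' Third argument True: create the file if it does not exist yet
Set objTextFile = objFSO.OpenTextFile(OutputFile, ForAppending, True, TristateTrue)

' ChrW(233) is é; this line is just sample text
objTextFile.WriteLine "caf" & ChrW(233)
objTextFile.Close
```

Since the mode is append, it is probably best to start from a new file rather than append Unicode text to one that was already written as ASCII.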

1 Answer

You may need to set the correct request headers before calling send.

For example (the values below are illustrative only; you will need to find out exactly what your website expects):

    http.open "GET", URL, FALSE
    http.SetRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    http.SetRequestHeader "Accept", "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
    http.SetRequestHeader "Accept-Language", "en-us,en;q=0.5"
    http.SetRequestHeader "Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7"
    http.send ""

EDIT:

What about this instead? It works OK here.

Dim XMLHttpReq, URL, WEBPAGE
Const Eacute = "%C3%89"

Set XMLHttpReq = CreateObject("MSXML2.ServerXMLHTTP")

URL = "http://en.wikipedia.org/wiki/%C3%89"
'Load Webpage where address is URL
XMLHttpReq.Open "GET", URL, False
XMLHttpReq.send ""
'Assign webpage contents as a string to variable called Webpage
WEBPAGE = XMLHttpReq.responseText
WEBPAGE = Replace(WEBPAGE, Eacute, "É")
'Debug.Print WEBPAGE

The É in this case comes back as the string %C3%89 (the percent-encoded UTF-8 bytes of É), and you can replace it with whatever character you choose if required.

EDIT2:

Just to add, if you're doing this with VBScript you may find this method useful

Dim XMLHttpReq, URL, WEBPAGE, fso, f
Const Eacute = "%C3%89"
Set XMLHttpReq = CreateObject("MSXML2.ServerXMLHTTP")
URL = "http://en.wikipedia.org/wiki/%C3%89"
XMLHttpReq.Open "GET", URL, False
XMLHttpReq.send ""
WEBPAGE = XMLHttpReq.responseText

Save2File WEBPAGE, "C:\Users\osknows\Desktop\test.txt"

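' Writes sText to sFile as UTF-8, overwriting any existing file
Sub Save2File (sText, sFile)
    Dim oStream
    Set oStream = CreateObject("ADODB.Stream")
    With oStream
        .Open
        .CharSet = "utf-8"     ' encode the text as UTF-8
        .WriteText sText
        .SaveToFile sFile, 2   ' 2 = adSaveCreateOverWrite
        .Close
    End With
    Set oStream = Nothing
End Sub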
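
If responseText ever decodes a page incorrectly (for example, a page that is not served as UTF-8), one commonly used alternative is to read the raw bytes from responseBody and decode them yourself with ADODB.Stream. The following is only a sketch under assumptions: the helper name BytesToText is made up for illustration, and the "utf-8" charset is a guess that you would need to match to the page you are actually fetching.

```vbscript
Option Explicit

Dim oHttp, sPage
Set oHttp = CreateObject("MSXML2.ServerXMLHTTP")
oHttp.Open "GET", "http://en.wikipedia.org/wiki/%C3%89", False
oHttp.send

' Decode the raw response bytes; "utf-8" is an assumption about this page
sPage = BytesToText(oHttp.responseBody, "utf-8")

' Hypothetical helper: converts a byte array to a string using sCharset
Function BytesToText(vBytes, sCharset)
    Dim oStream
    Set oStream = CreateObject("ADODB.Stream")
    With oStream
        .Type = 1              ' adTypeBinary: accept raw bytes
        .Open
        .Write vBytes
        .Position = 0          ' rewind; Type can only be changed at position 0
        .Type = 2              ' adTypeText: read the same bytes back as text
        .Charset = sCharset
        BytesToText = .ReadText
        .Close
    End With
    Set oStream = Nothing
End Function
```

The resulting string can then be passed to a regular expression or to Save2File in the same way as responseText.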

4 Comments

Unless I'm being stupid, I still can't figure it out. Take this page for example: en.wikipedia.org/wiki/É. It looks like UTF-8 to me, but when I put that in Accept-Charset, it still throws the same error. I could load the file as a binary file, I suppose, but I don't want to do that, because I want to manipulate the string before outputting it.
Thanks, by the way, for your help!
Actually, bear with me, I'm having a play... may have got it!
Ah, actually, it looks like I've asked the wrong question here. You're dead right, that does work. It's what I'm trying to do with it NEXT once I've pulled it into a variable that's breaking it. D'oh! Let me have a play and see if I can fix it. Thanks very much for the help!