I'm interested in writing a script, preferably one that is easy to add to browsers with tools such as Greasemonkey, that sends a page's HTML source code to an external server, where it will later be parsed and the useful data sent to a database.
However, I haven't seen anything like that and I'm not sure how to approach this task. I imagine some sort of HTTP POST would be the best approach, but I'm completely new to those ideas, and I'm not even sure exactly where to send the data to be parsed (it doesn't make sense to send an entire HTML document to a database, for instance).
So basically, my overall goal is something that works like this (note that I only need help with steps 1 and 2; I am familiar with data parsing techniques, I've just never applied them to the web):
- User views a particular page.
- Source code is sent via Greasemonkey or some other tool to a server.
- The code is parsed into meaningful data that is stored in a MySQL database.
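Steps 1 and 2 of the list above could be sketched as a userscript along these lines. This is only a minimal sketch under stated assumptions: the `@include` pattern, the endpoint URL, the JSON payload shape and the helper names `capturePage` and `buildRequest` are all placeholders invented for illustration, and the server is assumed to decode the JSON from the raw request body (in PHP that would mean reading `php://input` rather than `$_POST`).

```javascript
// ==UserScript==
// @name     Send page source (sketch)
// @include  http://www.example.com/*
// @grant    GM_xmlhttpRequest
// ==/UserScript==

// Placeholder endpoint (assumption); replace with the real server-side script.
var ENDPOINT = 'http://www.myURL.com/getData.php';

// Step 1: collect the page's address and its current markup.
function capturePage(doc, loc) {
  return {
    url: loc.href,
    html: doc.documentElement.outerHTML
  };
}

// Step 2: describe the POST request that carries the page to the server.
function buildRequest(page) {
  return {
    method: 'POST',
    url: ENDPOINT,
    headers: { 'Content-Type': 'application/json' },
    data: JSON.stringify(page),
    onload: function (response) {
      console.log('Server replied with status ' + response.status);
    }
  };
}

// Only runs inside a userscript manager, where GM_xmlhttpRequest exists.
if (typeof GM_xmlhttpRequest === 'function') {
  GM_xmlhttpRequest(buildRequest(capturePage(document, window.location)));
}
```

Note that `outerHTML` serializes the DOM as it currently stands, which can differ from the source the server originally sent. Keeping the parsing on the server, as in step 3, means the script itself stays small.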
Any tips or help would be greatly appreciated. Thank you!
Edit: Code
var ihtml = document.body.innerHTML;
GM_xmlhttpRequest({
    method: 'POST',
    url: 'http://www.myURL.com/getData.php',
    data: 'SomeData=' + escape(ihtml)
});
Edit: Current JS Log:
Namespace/GMScriptName: Server Response: 200
OK
4
Date: Sun, 19 Dec 2010 02:41:55 GMT
Server: Apache/1.3.42 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.8e-fips-rhel5 PHP-CGI/0.9
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
Array
(
)
http://www.url.com/getData.php
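The empty `Array ( )` in the log above suggests that PHP received the request but found no form fields in it. One common cause, assuming `getData.php` simply prints `$_POST`, is that the request carries no `Content-Type: application/x-www-form-urlencoded` header, in which case PHP does not populate `$_POST`. Below is a hedged sketch of the same request with that header added and `encodeURIComponent` used in place of the deprecated `escape`; the helper name `buildFormPost` is made up for illustration.

```javascript
// Builds the options object for GM_xmlhttpRequest (helper name is hypothetical).
function buildFormPost(url, html) {
  return {
    method: 'POST',
    url: url,
    // Tells the server the body is a URL-encoded form, so PHP can fill $_POST.
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    // encodeURIComponent also encodes '+', which escape() leaves as-is
    // and which a form decoder would otherwise turn into a space.
    data: 'SomeData=' + encodeURIComponent(html),
    onload: function (response) {
      console.log(response.status, response.responseText);
    }
  };
}

// Only runs inside a userscript manager, where GM_xmlhttpRequest exists.
if (typeof GM_xmlhttpRequest === 'function') {
  GM_xmlhttpRequest(buildFormPost('http://www.myURL.com/getData.php',
                                  document.body.innerHTML));
}
```

If the server script still prints an empty array with the header in place, the next thing to check would be whether the request is being redirected, since a redirect can turn the POST into a GET and drop the body.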