Simple Dynamic Web Scraping - Without BeautifulSoup

Question

I'm busy trying to scrape a dynamic website in order to get a URL that I can use to download the server software for a game every time it updates.

The site is "http://craftstud.io/builds", and the part I'm trying to scrape is where it says "Server XX.X.X.X".

I really don't want it to get complicated with JavaScript and external modules, so if there is a simple solution I am all ears.

I also can't for the life of me get third-party modules such as BeautifulSoup installed (stupid Windows).

Thanks all!

I believe there is sometimes a legitimate reason not to install third-party modules; in your case you could learn how to use easy_install or pip on Windows. This can be done quite easily with some googling, and you won't be limited to the standard library anymore.
– dm03514, Commented Jan 21, 2013 at 20:28
What's wrong with Windows? You can install any module with pip/easy_install, or just put it in your project's directory.
– Ivan Yurchenko, Commented Jan 21, 2013 at 20:31
Parsing HTML with the standard library is much more complicated than installing a third-party module. Try to do it, and if you get stuck, come back and ask for help.
– root, Commented Jan 21, 2013 at 20:35
@dm03514 - Yes, there is a legitimate reason. I'm programming on Windows for Linux, and I might end up distributing the script across multiple Linux VPSes. So I don't want to end up troubleshooting all the servers trying to find missing modules all the time.
– Skowt, Commented Jan 22, 2013 at 14:55

Justin O Barber · Accepted Answer · 2013-01-21 20:59:20Z

3

If you want something simple, consider using a simple regular expression:

>>> import re
>>> import urllib2
>>> html = urllib2.urlopen("http://craftstud.io/builds").read()
>>> re.search(r"Server \d+\.\d+\.\d+\.\d+", html).group()
'Server 0.1.24.1'
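
On Python 3, urllib2 no longer exists: urlopen lives in urllib.request, and read() returns bytes that have to be decoded before a str pattern can search them. A minimal sketch of the same regex approach, assuming the page still contains text of the form "Server 0.1.24.1". The function names find_server_version and fetch_server_version are made up for illustration, and UTF-8 is an assumed encoding:

```python
import re
from urllib.request import urlopen

# Same pattern as above, with a capture group around the version number.
SERVER_RE = re.compile(r"Server (\d+\.\d+\.\d+\.\d+)")


def find_server_version(html):
    """Return the first 'X.X.X.X' that follows 'Server ', or None."""
    match = SERVER_RE.search(html)
    return match.group(1) if match else None


def fetch_server_version(url="http://craftstud.io/builds"):
    """Download the page and extract the version (needs network access)."""
    with urlopen(url, timeout=10) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    return find_server_version(html)


if __name__ == "__main__":
    # Hypothetical snippet standing in for the real page.
    sample = "<li><a href='#'>Server 0.1.24.1</a></li>"
    print(find_server_version(sample))
```

Keeping the extraction separate from the download means the pattern can be checked against a saved copy of the page without hitting the site, and returning None instead of calling .group() on a failed match avoids an AttributeError if the page layout changes.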

That said, if you can install BeautifulSoup4 via pip, you'll find lots of use for it in the future. (Make sure you use pip install BeautifulSoup4 instead of just pip install BeautifulSoup. I installed a copy on a Windows machine a couple of days ago.)
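
If a third-party parser really is not an option, the standard library's html.parser can recover the link itself rather than just the label. A rough sketch, assuming the "Server X.X.X.X" text sits inside an <a> tag whose href is the download URL. The real page markup is not shown here, so the sample HTML, the names ServerLinkParser and find_server_links, and the example.com URLs are all hypothetical:

```python
import re
from html.parser import HTMLParser

SERVER_RE = re.compile(r"Server \d+\.\d+\.\d+\.\d+")


class ServerLinkParser(HTMLParser):
    """Collect (link text, href) pairs for links labelled 'Server X.X.X.X'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently being read, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if SERVER_RE.search(text):
                self.links.append((text, self._href))
            self._href = None


def find_server_links(html):
    parser = ServerLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical markup; the real page may be structured differently.
    sample = (
        '<ul><li><a href="http://example.com/client.zip">Client 0.1.24.1</a></li>'
        '<li><a href="http://example.com/server.zip">Server 0.1.24.1</a></li></ul>'
    )
    print(find_server_links(sample))
```

This stays within the standard library, so nothing extra has to be installed on each server, but it only works if the link text and href are present in the HTML the server sends.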

edited Jan 21, 2013 at 20:59

answered Jan 21, 2013 at 20:32


1 Comment

Skowt Over a year ago

Hey, thanks for the example. However, I didn't use exactly the same method. I ended up replacing the re.search pattern with the string plus ".*\": "." means any character, "*" means repeating continuously, and "\" means stop repeating when you find the text after that. Thanks though!
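
The exact pattern used in this comment is not shown, so the following is only a guess at its shape: a known prefix, then ".*", then the literal text where matching should stop. In Python's re syntax, "." matches any character except a newline, "*" repeats the preceding item zero or more times, and a backslash escapes the character after it; the match ends at whatever literal text follows in the pattern. Adding "?" makes the repeat lazy, so it stops at the first occurrence of that text instead of the last. The helper name text_between and the sample HTML are hypothetical:

```python
import re


def text_between(html, start, end):
    """Return start plus everything up to (not including) the first end."""
    # re.escape keeps characters such as '.' or '<' in start/end literal.
    pattern = "(" + re.escape(start) + ".*?)" + re.escape(end)
    match = re.search(pattern, html)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical snippet standing in for the real page.
    sample = "<a href='#'>Server 0.1.24.1</a> <a href='#'>Client 0.1.24.1</a>"
    print(text_between(sample, "Server", "</a>"))
```

With a greedy ".*" the same call would run on to the last "</a>" on the line, which is why the lazy form is used here.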
