2

I'm busy trying to scrape a dynamic website in order to get a URL that I can use to download the server software for a game every time it updates.

The site is "http://craftstud.io/builds" and where it says "Server XX.X.X.X" is what I'm trying to scrape.

I really don't want it to get complicated with Javascript and external modules, so if there is a simple solution I am all ears.

I also can't for the life of me get third party modules installed such as BeautifulSoup (Stupid Windows).

Thanks all!

4
  • I believe it's sometimes there is a legitimate reason not to install 3rd party modules; in your case you could learn how to use easy_install or pip in windows. This can be done quite easily with some googling, and you won't be limited to the standard library anymore Commented Jan 21, 2013 at 20:28
  • What's wrong with Windows? You can install any modules with pip/easy_install or just put in your project's directory. Commented Jan 21, 2013 at 20:31
  • Parsing html with standard library is much more complicated, than installing a third party module. Try to do it, and if you get stuck, come back and ask for help. Commented Jan 21, 2013 at 20:35
  • @dm03514 - Yes there is a legitimate reason. I'm busy programming on Windows for Linux and I might end up distributing the script across multiple Linux VPS'. So I don't want to end up troubleshooting all the servers trying to find missing modules all the time. Commented Jan 22, 2013 at 14:55

1 Answer 1

3

If you want something simple, consider using a simple regular expression:

>>> import re
>>> import urllib2
>>> html = urllib2.urlopen("http://craftstud.io/builds").read()
>>> re.search(r"Server \d+\.\d+\.\d+\.\d+", html).group()
'Server 0.1.24.1'

That said, if you can install BeautifulSoup4 via pip, you'll find lots of use for it in the future. (Make sure you use pip install BeautifulSoup4 instead of just pip install BeautifulSoup I just installed a copy on a windows machine a couple days ago.)

Sign up to request clarification or add additional context in comments.

1 Comment

Hey thanks for the example. However I didn't use the exact same method. I ended up replacing re.search with the string plus ".*\" which means any character, * means repeating continuously and then \ means stop repeating when you find the text after that. Thanks though!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.