I haven't used RegEx before, and everyone seems to agree that it's bad for webscraping and html in particular, but I'm not really sure how to solve my little challenge without.
I have a small Python scraper that opens 24 different webpages. In each webpage, there's links to other webpages. I want to make a simple solution that gets the links that I need and even though the webpages are somewhat similar, the links that I want are not.
The only common thing between the urls seems to be a specific string: 'uge' or 'Uge' (uge means week in Danish - and the week number changes every week, duh). It's not like the urls have a common ID or something like that I could use to target the correct ones each time.
I figure it would be possible using RegEx to go through the webpage and find all urls that has 'uge' or 'Uge' in them and then open them. But is there a way to do that using BS? And if I do it using RegEx, how would a possible solution look like?
For example, here are two of the urls I want to grab in different webpages:
http://www.domstol.dk/KobenhavnsByret/retslister/Pages/Uge45-Tvangsauktioner.aspx
http://www.domstol.dk/esbjerg/retslister/Pages/Straffesageruge32.aspx