Scrapy and content updated with JavaScript

Question

I want to scrape event details (headliner, date, time) from a local music venue site: http://www.bluebirdtheater.net/events

I've used Scrapy and successfully scraped what's initially on the page. However, there's a "load more" button. I've seen other solutions where the load-more button triggers a POST request whose response is rendered HTML that can be scraped.

With the browser inspector I can see that this site issues a GET request instead: http://www.bluebirdtheater.net/events/events_ajax/40

I pointed Scrapy at that URL, but the response comes back in a form Scrapy can't parse. (JavaScript? Unrendered DOM? Can anyone tell me what it is? I'm curious.) Can I still use a Scrapy-only approach?

I've seen people use Selenium to click the "more" button, load all the data, and then scrape it.

The string needs to be unescaped. It's basically a text file.

– user378704, Commented Jan 8, 2015 at 22:20
Thanks! I can continue working on a scrapy-only approach.

– uberdwang, Commented Jan 8, 2015 at 23:10

user378704 · Accepted Answer · 2015-01-08 22:36:20Z

1

Sorry for the hack, but here's a quick fix that strips all newlines, tabs, and backslashes:

print(s.replace('\n', '').replace('\t', '').replace('\\', ''))
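A slightly fuller sketch of the same idea, in case it helps. It assumes the response body contains the escape sequences as literal two-character text (a backslash followed by `n` or `t`) as well as escaped quotes and slashes; the `strip_escapes` helper and the `sample` string are made up for illustration and are not the real response from the site.

```python
def strip_escapes(raw: str) -> str:
    """Remove literal and real newlines/tabs, then drop leftover backslashes."""
    # Handle the literal two-character sequences (backslash + "n" / "t") first;
    # removing bare backslashes first would leave stray "n" and "t" letters.
    for token in ("\\n", "\\t", "\n", "\t"):
        raw = raw.replace(token, "")
    # Drop the remaining backslashes, e.g. the ones in \" and \/.
    return raw.replace("\\", "")


# Hypothetical sample body, not the actual response from the site.
sample = r'<div class=\"entry\">\n\t<h3>Some Headliner<\/h3>\n<\/div>'
cleaned = strip_escapes(sample)
print(cleaned)
```

This is a blunt approach: it would also delete backslashes that legitimately belong to the content. If the body turns out to be valid JSON, decoding it with `json.loads` is likely the more robust route.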


2 Comments

uberdwang Over a year ago

I had to use double backslashes to completely remove the tabs and newlines, but I get the gist: .replace('\\n', '').replace('\\t', ''). Thanks for your help!

user378704 Over a year ago

Whichever works for you. You can close the question by accepting the answer.
