I want to scrape the headliner, date, and time of each show from a local music venue's site: http://www.bluebirdtheater.net/events

I've used Scrapy and successfully scraped what's initially on the page. However, there's a "Load more" button. I've seen other solutions where the "Load more" button sends a POST request whose response is rendered HTML that can be scraped.

In the browser inspector I can see that this site sends a GET request to: http://www.bluebirdtheater.net/events/events_ajax/40
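Since the "Load more" button only issues GET requests, a Scrapy-only approach can request those URLs directly instead of clicking anything. A minimal stdlib sketch of generating the URLs follows. It assumes, without confirmation from the site, that the trailing `40` is an offset that grows by a fixed step; the step value and the helper name `events_ajax_urls` are hypothetical and should be checked against the requests visible in the inspector:

```python
from urllib.parse import urljoin

# Base of the AJAX endpoint seen in the browser inspector.
BASE = "http://www.bluebirdtheater.net/events/events_ajax/"


def events_ajax_urls(start, step, pages):
    """Return `pages` endpoint URLs, assuming the path suffix is an offset.

    Hypothetical helper: `start` and `step` are guesses to verify in the
    browser's network tab, not documented values.
    """
    return [urljoin(BASE, str(start + i * step)) for i in range(pages)]


for url in events_ajax_urls(start=40, step=20, pages=3):
    print(url)
```

In a spider, each URL could be yielded as `scrapy.Request(url, callback=self.parse_events)`, where `parse_events` is a hypothetical callback; stopping when a page comes back empty is safer than hard-coding the page count.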

I used Scrapy to crawl that URL, but the response is unreadable for Scrapy. (Is it JavaScript? An unrendered DOM? Can anyone tell me what it is? I'm curious.) Can I still use a Scrapy-only approach?
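A plausible answer to "what is it?", consistent with the comments below but not verified against the live site: the endpoint returns an HTML fragment serialized as a JSON string, so quotes, slashes, newlines and tabs arrive as escape sequences like `\"`, `\/`, `\n` and `\t`. If so, a Scrapy-only approach still works: decode the body with `json.loads` and parse the resulting HTML. A stdlib sketch; the sample body and `decode_ajax_html` are made up for illustration:

```python
import json

# Made-up example of an escaped AJAX body: an HTML fragment serialized
# as a JSON string literal (note the literal backslashes).
raw_body = '"<div class=\\"event\\">\\n\\t<h2>Headliner<\\/h2>\\n<\\/div>"'


def decode_ajax_html(body):
    """Turn a JSON-encoded string body back into plain HTML.

    Assumes the whole body is a single JSON string; raises ValueError
    if it parses to anything else (e.g. a JSON object).
    """
    value = json.loads(body)
    if not isinstance(value, str):
        raise ValueError("expected a JSON string, got %s" % type(value).__name__)
    return value


html = decode_ajax_html(raw_body)
print(html)
```

Inside a Scrapy callback, the decoded HTML could then be queried with `scrapy.Selector(text=decode_ajax_html(response.text))`. If the endpoint turns out to return a JSON object rather than a bare string, the HTML would need to be pulled out of the relevant key instead.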

I've seen people use Selenium to programmatically click the "Load more" button, load all the data, and then scrape it.

  • The string needs to be unescaped. It's basically a text file. Commented Jan 8, 2015 at 22:20
  • Thanks! I can continue working on a scrapy-only approach. Commented Jan 8, 2015 at 23:10

1 Answer

I feel a bit sorry for doing this, but here's a quick fix that removes all newlines, tabs, and backslashes:

# Python 3: strip newline, tab and backslash characters from the response text s
print(s.replace('\n', '').replace('\t', '').replace('\\', ''))
2 Comments

I had to use double escapes to completely remove the tabs and newlines, but I get the gist. Thanks for your help! .replace('\\n', '').replace('\\t', '')
Whichever works for you. You can close the question by accepting the answer.
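The need for double escapes suggests the body contains the two-character sequences backslash + `n` and backslash + `t`, not real newline and tab characters: in Python, `'\n'` is a single newline character while `'\\n'` is a backslash followed by `n`. Replacement order matters too, because removing every backslash first would leave stray `n` and `t` letters. A small sketch of both points; `strip_literal_escapes` is a hypothetical helper name:

```python
def strip_literal_escapes(s):
    """Remove literal backslash-n / backslash-t sequences, then leftover backslashes."""
    # Replace the two-character sequences first; removing '\\' first would
    # turn a literal '\n' into a bare 'n'.
    for seq in ('\\n', '\\t'):
        s = s.replace(seq, '')
    return s.replace('\\', '')


text = 'Headliner\\n\\tDoors 7pm'     # literal backslashes, as in an escaped body
print(len('\n'), len('\\n'))           # 1 2
print(text.replace('\n', '') == text)  # True: there is no real newline to remove
print(strip_literal_escapes(text))     # HeadlinerDoors 7pm
```

Decoding the body with `json.loads` (if it is valid JSON) avoids this bookkeeping, since it converts each escape sequence back into the character it stands for.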
