I am working on a web scraping program, but I have run into a problem using Scrapy with JavaScript-generated content. I know that Scrapy is not built for this type of scraping, but I have been trying to use scrapyjs or Splash to accomplish what I need.

However, I cannot get either of these two modules to work correctly with Scrapy. Does anyone have a minimal example that uses scrapyjs or Splash to render JavaScript pages?

Edit: My platform is Ubuntu and I am working with Python. For scrapyjs, I just put the source in the top-level directory of the Scrapy project, and I have yet to find any real guides on how to use Splash. I am asking about Splash because it seems to be a more powerful module for JavaScript rendering and is often mentioned alongside scrapyjs.

  • What's your platform, how did you install scrapyjs or splash? What errors if any are you getting? Commented Feb 5, 2014 at 9:56

1 Answer

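I believe all you have to do is implement process_links in your spider (with a CrawlSpider, you point the process_links argument of your Rule at it), so that every extracted link is fetched through a local Splash instance: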

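from urllib.parse import quote


def proxy_url(url):
    # Wrap the target URL so that a local Splash instance fetches it, runs its
    # JavaScript, and returns the rendered HTML. The target is percent-encoded
    # so its own query string is not mistaken for Splash parameters.
    return "http://localhost:8050/render.html?url=%s&timeout=15&wait=1" % quote(url, safe="")


def process_links(self, links):
    # Spider method: rewrite every extracted link to go through Splash.
    for link in links:
        link.url = proxy_url(link.url)
    return links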
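
To make clearer what the URL helper does, here is a small standalone sketch of the same idea. It assumes, as the snippet above does, that Splash is listening on localhost:8050 and that its render.html endpoint is called with the url, timeout and wait parameters. The names splash_render_url and SPLASH_ENDPOINT are made up for illustration. The whole query string is built with urlencode, so a target URL that carries its own ?a=1&b=2 is passed to Splash intact.

```python
from urllib.parse import urlencode

# Assumption: Splash runs locally on port 8050, as in the answer above.
SPLASH_ENDPOINT = "http://localhost:8050/render.html"


def splash_render_url(url, timeout=15, wait=1):
    """Build a URL that asks Splash to load `url` and return the rendered HTML."""
    # urlencode percent-encodes the target URL, including any "&" and "?"
    # it contains, so it arrives at Splash as a single "url" parameter.
    query = urlencode({"url": url, "timeout": timeout, "wait": wait})
    return SPLASH_ENDPOINT + "?" + query


if __name__ == "__main__":
    print(splash_render_url("http://example.com/page?id=1&sort=asc"))
```

Scrapy then requests the URL this returns instead of the original one, so the response body is whatever HTML Splash produced after waiting for the page's scripts.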

5 Comments

I don't fully understand what proxy_url() is doing. Could you explain?
I got your solution to work in a slightly modified form.
@Adamkucera, can you please share the modified form?
@MahmoudM.Abdel-Fattah What I did was put the Splash call that renders the JavaScript inside a parse function, and only use it when the response requires it. I don't put it in front of every link when I don't need to. Does this help?
Can you update your answer and show us a sample? I don't use Python on a daily basis and would like to see your sample code, please.
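
Since no sample was posted, the following is only a guess at what "use Splash only when the response requires it" could look like. It is not the commenter's actual code. The helper names (next_request_url, needs_js_rendering) and the marker-based check are hypothetical, and Splash is assumed to be on localhost:8050 as in the answer.

```python
from urllib.parse import urlencode

# Assumption: Splash runs locally on port 8050, as in the answer.
SPLASH_ENDPOINT = "http://localhost:8050/render.html"


def splash_render_url(url, timeout=15, wait=1):
    """Build a URL that asks Splash to render `url`."""
    return SPLASH_ENDPOINT + "?" + urlencode({"url": url, "timeout": timeout, "wait": wait})


def needs_js_rendering(html, marker):
    """Hypothetical heuristic: the content we want is absent from the raw HTML."""
    return marker not in html


def next_request_url(page_url, html, marker):
    """Return a Splash URL to re-fetch the page through, or None if not needed."""
    if page_url.startswith(SPLASH_ENDPOINT):
        return None  # already rendered by Splash; do not loop
    if needs_js_rendering(html, marker):
        return splash_render_url(page_url)
    return None
```

Inside a Scrapy parse callback, one would call next_request_url(response.url, response.text, marker) and, if it returns a URL, yield scrapy.Request(that_url, callback=self.parse) before trying to extract anything. Otherwise the response is parsed as usual.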
