
I'm having a bit of trouble. I want to run a shell command from within Python in a specific directory. Based on code I found on the internet, I need the following imports:

import os
import subprocess
import shlex

And then the code itself is below:

os.chdir('/etc/test/')
cmd = 'scrapy crawl test'
subprocess.call(shlex.split(cmd))

As you can see, I am trying to run the command "scrapy crawl test" within the /etc/test/ directory. When I run this manually in a terminal it works fine, but when I run it with this Python code it gives me an error:

INFO Exception occured while scraping: [Errno 2] No such file or directory

Is anyone able to tell me if my code is incorrect, or if I am going about this the wrong way?

  • Is there any additional traceback information, or just that one-line error? Commented Aug 19, 2013 at 20:36
  • As a side note, cmd = ['scrapy', 'crawl', 'test'] then subprocess.call(cmd) is simpler, and probably harder to get wrong; no need to use shlex here. But that won't affect the problem you're trying to solve. Commented Aug 19, 2013 at 20:37
  • @abamert I can't find any further traceback information, I'm afraid. Would I still need the os.chdir command as well in your case? Commented Aug 19, 2013 at 20:40
  • @Jimmy: scrapy is a Python library. Have you gone through the Getting Started and Tutorial stuff? Everything you want to do from within Python, you can do in Python. Everything you want to do at the shell or in a cron job or whatever, you can do with the command-line tool. If you're trying to run the command-line tool from within Python, you're probably making a mistake earlier on in the process… but it's hard to be sure what that is without more information on what you're trying to do. Commented Aug 19, 2013 at 20:54
  • @Jimmy: At any rate, the error you're seeing is coming from scrapy, not from your code. That could mean there's a bug in your spider, or your directory layout isn't what you expect, or a million other things. Have you tried using scrapy shell to debug it, as described in the tutorial? Commented Aug 19, 2013 at 20:55

1 Answer


Why are you using subprocess? A common practice for running Scrapy from a script is to use Twisted's reactor. Taken from the docs:

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log
from testspiders.spiders.followall import FollowAllSpider

spider = FollowAllSpider(domain='scrapinghub.com')  # instantiate the spider directly
crawler = Crawler(Settings())                       # build a crawler with default settings
crawler.configure()
crawler.crawl(spider)                               # schedule the spider on the crawler
crawler.start()                                     # start the crawl
log.start()                                         # enable Scrapy's logging
reactor.run()  # the script will block here until the crawl finishes

There are plenty of examples out there.

Hope that helps.
