
I am new to Scrapy and Python and I am enjoying it.

Is it possible to debug a scrapy project using Visual Studio? If it is possible, how?

5 Answers


I've created an init file named runner.py:

from scrapy.cmdline import execute
# equivalent to running "scrapy crawl spider_name" from the command line
execute(['scrapy', 'crawl', 'spider_name'])

You just need to set that file as the startup file in the project options.

It works with Visual Studio 2015.
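A hypothetical variation on this runner (the build_argv helper and the default spider name are my own illustration, not part of Scrapy): take the spider name from the debugger's Script Arguments, with a fallback, so one runner.py can debug any spider in the project. Only the argv handling is plain Python; the real file would end by handing the list to scrapy.cmdline.execute as above.

```python
import sys

def build_argv(script_args, default_spider='spider_name'):
    """Build the argv list for scrapy.cmdline.execute().

    Uses the first Script Argument as the spider name, falling back
    to a default when the debugger passes no arguments.
    """
    spider = script_args[0] if script_args else default_spider
    return ['scrapy', 'crawl', spider]

argv = build_argv(sys.argv[1:])
print(argv)
# runner.py would then end with:
#   from scrapy.cmdline import execute
#   execute(argv)
```

With no Script Arguments this produces ['scrapy', 'crawl', 'spider_name']; with "dmoz" in Script Arguments it produces ['scrapy', 'crawl', 'dmoz'].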




You can install PTVS in Visual Studio 2012. Then create a Python project from existing Python code and import your code.

If you are familiar with Visual Studio, debugging Python is the same as debugging other languages in Visual Studio, like C++/C#: just set some breakpoints and start your script with debugging.

As ThanhNienDiCho said, add "-mscrapy.cmdline crawl your_spider_name" to your interpreter arguments.

PTVS screenshot

Comments

Thank you Yuvan, I have managed to debug Python previously. What I was looking for was how to debug Scrapy using VS. However, I have found this link pytools.codeplex.com/… which elaborates on the debugging options in VS.
I don't think a Scrapy project is any different from a normal Python project; it's all normal Python script files. What you mentioned is the same as the PTVS approach in my answer.

Well, I tried all of the answers given to the OP and none worked for me. The closest seems to be the one posted by @Rafal Zajac; however, it also failed for me.

I ended up finding the solution here, though some of the answers there no longer work with newer versions either.

So the version that seems to work for me is this:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from tutorial.spiders.dmoz_spider import DmozSpider
from sys import stdin

print("init...")
settings = get_project_settings()  # reads the project settings via scrapy.cfg
process = CrawlerProcess(settings)
process.crawl(DmozSpider)          # pass the spider class, not an instance
process.start()                    # blocks until the crawl finishes
x = stdin.read(1)                  # keep the console window open afterwards

This should be in the startup script; no script arguments are required.

Comments

Thanks for pointing out that my solution doesn't work anymore. I'm back on Scrapy, so I had to fix the debugging in VS (again). It looks like there's only a small difference from what I originally suggested. I've updated my answer...

I had the same problem, and Yuan's initial answer didn't work for me.

To run Scrapy, you need to open cmd.exe and run:

cd "project directory"
scrapy crawl namespider
  • scrapy is scrapy.bat.
  • namespider is the value of the name field in the spider class.
  • To run Scrapy from Visual Studio, use interpreter arguments of -mscrapy.cmdline crawl your_spider_name. See https://i.sstatic.net/KiPUc.jpg.



UPDATE:

It looks like with version 1.1 of Scrapy you have to change the "Script Arguments" in your project debug settings to "runspider <spider file name>.py", and then it should work as expected:



I'm new to Python and Scrapy too, and I think I had exactly the same problem.

I was following a tutorial from Scrapy's website: http://doc.scrapy.org/en/latest/intro/tutorial.html, so first I generated the file structure for the Scrapy project "tutorial".

The next step was to create a new Python project "From existing Python code" and select the top folder "tutorial". When the wizard asks which file types to import, use *.* to import everything; if you leave the default settings, it won't import the file scrapy.cfg.

I guess you got this far and what you just wanted was to put a breakpoint e.g. in the spider class, hit F5 and start debugging?

I tried as suggested:

As ThanhNienDiCho said, add "-mscrapy.cmdline crawl your_spider_name" to your interpreter argument.

In this case you also have to set the startup file, and I couldn't figure out that part. You can't use any of the files from the project, because that's not how it works, right? I tried adding dummy.py (an empty file) at the top level as the startup file, but then I got the message from Scrapy "unknown command: crawl" (the same message you get if you run the "scrapy" command outside the project folder). Maybe there is a way to make it work and someone could explain the full setup using this approach? I couldn't get it right.

Finally I noticed that the Linux equivalent of scrapy.bat is a Python file with the following content:

from scrapy.cmdline import execute
execute()  # with no arguments, execute() reads the command from sys.argv

So I replaced my dummy.py with the file scrapy_runner.py (the file name doesn't matter) containing the above content, and that was my startup file.

Now the last thing was to add the following value to Project Properties -> Debug -> Script Arguments:

crawl dmoz

where "dmoz" was the name of the spider from the tutorial.

This setup works for me. I hope this helps.
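To make the moving parts concrete, here is a small pure-Python sketch (no Scrapy required; parse_invocation is my own illustrative helper, not a Scrapy API). Visual Studio starts the debuggee roughly as "python scrapy_runner.py crawl dmoz", so inside the runner sys.argv is ['scrapy_runner.py', 'crawl', 'dmoz'], which is exactly what scrapy.cmdline.execute() reads to find the subcommand and its arguments.

```python
def parse_invocation(argv):
    """Split an argv list (as execute() would see it) into the
    startup script, the Scrapy subcommand, and its arguments."""
    script, command, args = argv[0], argv[1], argv[2:]
    return script, command, args

# What the runner sees when debugging with Script Arguments "crawl dmoz":
script, command, args = parse_invocation(['scrapy_runner.py', 'crawl', 'dmoz'])
print(command, args)
```

This is also why the dummy.py attempt above failed with "unknown command: crawl": without the project folder as the working directory, Scrapy can find the subcommand but not the project it should run it against.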


Comments

Why pass "crawl dmoz" as arguments? I get the error "can't open file: crawl"; when passing -mscrapy.cmdline crawl dmoz I get the "unknown command: crawl" error.
I created my comment 2 years ago and never worked with Scrapy since. From what I remember, you need to pass "crawl dmoz" so that the resulting command executed by Visual Studio when debugging is "python scrapy_runner.py crawl dmoz". The parameters "crawl" and "dmoz" are then used when the function execute() from the file scrapy_runner.py is executed.
