I am working on a data mining project where I want to grab info from multiple sites simultaneously. I am currently doing this by running the same mining script in 20 different terminal windows (on OS X).
My belief (which may be incorrect) is that running the script in separate terminal windows is what makes the mining execute in parallel.
Questions:
A) If I am incorrect about using multiple terminal windows, what would be the best approach?
B) If I am right to use multiple terminal windows, is there an efficient way to have the script execute in 20 different terminal windows?
I set up a prototype using 2 scripts.
Script 1 is trigger.py and is intended to send a list of arguments to a second script. In the trigger script below I am using numbers, but the idea would be to send URLs.
Script 2 is execute.py and is intended to receive an argument and execute, ideally in a new terminal window per argument. In practice, if this approach is the best way, I would put the miner in this script and have it receive the URL, open a new terminal window, and run.
Right now it simply executes in the same window. This is, again, the problem I am seeking help with.
Script 1 trigger.py
#!/usr/bin/python
import os
import sys

class newTerm(object):
    def __init__(self, number):
        self.number = number

    def run(self):
        os.system('/Users/InNov8/Desktop/execute.py ' + str(self.number))

starts = [100, 500, 1000, 2000]
for s in starts:
    new = newTerm(s)
    new.run()
Script 2 execute.py
#!/usr/bin/python
import sys
print 'Number of arguments:', len(sys.argv), 'arguments.'
print 'Argument List:', str(sys.argv)
number = int(sys.argv[1])
print number, number + 400
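The reason the prototype runs everything in the same window, one job at a time, is that os.system blocks until execute.py finishes, so the loop in trigger.py is sequential. One fix that needs no extra terminal windows at all is to launch the workers with subprocess.Popen, which returns immediately. A sketch, reusing the script path from the question (20 URLs would work the same way as these four numbers):

```python
import subprocess
import sys

starts = [100, 500, 1000, 2000]
# Popen returns immediately instead of blocking like os.system does,
# so all of the workers run at the same time.
procs = [subprocess.Popen([sys.executable,
                           '/Users/InNov8/Desktop/execute.py',
                           str(s)])
         for s in starts]
for p in procs:
    p.wait()  # collect each worker only after all of them have started
```

Each child process writes to the same terminal's stdout, which is usually fine for logging; redirect with the stdout argument of Popen if the output needs to be kept separate.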
Answers:
Since "grab info from multiple sites" means scraping web sites, I strongly suggest looking at a web-scraping framework like Scrapy or something similar; it saves you the trouble of starting and managing parallel tasks yourself.
Also have a look at screen (man screen), which would allow you to run multiple sessions in a single terminal window, especially useful if the scripts produce output on stdout/stderr.
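If you would rather stay in plain Python without a framework, a thread pool inside a single script is enough for I/O-bound work like fetching pages, and no terminal windows are involved. A minimal sketch, with a placeholder mine function standing in for the real miner (it just echoes the range that execute.py prints):

```python
from concurrent.futures import ThreadPoolExecutor

def mine(start):
    # Placeholder for the real work: fetch and parse a URL here.
    return start, start + 400

starts = [100, 500, 1000, 2000]
# Threads suit I/O-bound scraping; max_workers=20 would mirror the
# 20 terminal windows from the question.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(mine, starts))
print(results)
```

pool.map preserves the input order, so results line up with starts even though the workers finish in whatever order the network allows.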