I am working on a data mining project where I want to grab info from multiple sites simultaneously. I am currently doing this by running the same mining script in 20 different terminal windows (on OS X).
My belief (which may be incorrect) is that running the script in separate terminal windows is what makes the mining execute in parallel.
Questions:
A) If I am incorrect about using multiple terminal windows, what would be the best approach?
B) If I am right to use multiple terminal windows, is there an efficient way to have the script execute in 20 different terminal windows?
I set up a prototype using 2 scripts.
Script 1 is trigger.py and is intended to send a list of arguments to a second script. In the trigger script below I am using numbers, but the idea would be to send URLs.
Script 2 is execute.py and is intended to receive an argument and execute, ideally in a new terminal window per argument. In practice, if this approach is the best way, I would put the miner in this script and have it receive the URL, open a new terminal window, and run.
Right now it simply executes in the same window. This is, again, the problem I am seeking help with.
Script 1 trigger.py
#!/usr/bin/python
import os
import sys

class newTerm(object):
    def __init__(self, number):
        self.number = number

    def run(self):
        os.system('/Users/InNov8/Desktop/execute.py ' + str(self.number))

starts = [100, 500, 1000, 2000]
for s in starts:
    new = newTerm(s)
    new.run()
Script 2 execute.py
#!/usr/bin/python
import sys
print 'Number of arguments:', len(sys.argv), 'arguments.'
print 'Argument List:', str(sys.argv)
number = int(sys.argv[1])
print number, number + 400
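The reason the prototype runs everything in the same window, one job at a time, is that os.system blocks until execute.py finishes, so the loop in trigger.py is sequential. One fix that needs no extra terminal windows at all is to launch the workers with subprocess.Popen, which returns immediately. A sketch, reusing the script path from the question (20 URLs would work the same way as these four numbers):

```python
import subprocess
import sys

starts = [100, 500, 1000, 2000]
# Popen returns immediately instead of blocking like os.system does,
# so all of the workers run at the same time.
procs = [subprocess.Popen([sys.executable,
                           '/Users/InNov8/Desktop/execute.py',
                           str(s)])
         for s in starts]
for p in procs:
    p.wait()  # collect each worker only after all of them have started
```

Each child process writes to the same terminal's stdout, which is usually fine for logging; redirect with the stdout argument of Popen if the output needs to be kept separate.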
Answers:
Since "grab info from multiple sites" means scraping web sites, I strongly suggest looking at a web-scraping framework like Scrapy or something similar; it saves you the trouble of starting and managing parallel tasks yourself.
Also have a look at screen (man screen), which would allow you to run multiple sessions in a single terminal window, especially useful if the scripts produce output on stdout/stderr.
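If you would rather stay in plain Python without a framework, a thread pool inside a single script is enough for I/O-bound work like fetching pages, and no terminal windows are involved. A minimal sketch, with a placeholder mine function standing in for the real miner (it just echoes the range that execute.py prints):

```python
from concurrent.futures import ThreadPoolExecutor

def mine(start):
    # Placeholder for the real work: fetch and parse a URL here.
    return start, start + 400

starts = [100, 500, 1000, 2000]
# Threads suit I/O-bound scraping; max_workers=20 would mirror the
# 20 terminal windows from the question.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(mine, starts))
print(results)
```

pool.map preserves the input order, so results line up with starts even though the workers finish in whatever order the network allows.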