
I have a 10-line file apps.txt containing the credentials (app ID, API key and secret key) of 10 applications. The fields on each line are arguments to a program that interacts with a server. Another file, data.txt, contains the input data for the program. I want to start one instance of the program for each line in apps.txt and split data.txt among those instances. How can I do this with GNU Parallel? I tried the command below but can't get the desired behavior:

cat data.txt | parallel [-N1] -j10 --pipe --no-run-if-empty --line-buffer ./program.py {1} {2} {3} :::: apps.txt

apps.txt
AppID1 API_Key1 Secret_Key1
AppID2 API_Key2 Secret_Key2
...
AppID10 API_Key10 Secret_Key10

2 Answers


I interpret your question as: you have 10 workers, and you want to distribute blocks of stdin among them.

Use GNU Parallel's slot() replacement string together with an array whose index selects the worker's information. Bash arrays are indexed from 0, so subtract 1 from slot().

# Set each entry in array 'worker' to one line from apps
parset worker echo :::: apps.txt
doit() {
  workerid="$1"
  echo "do stuff on ${worker[$workerid]}"
  # Read stuff from stdin and do 'wc' on that
  wc
}
# env_parallel is needed to get $worker exported
# -j10 must be the number of lines in apps.txt
cat data.txt | env_parallel -j10 --pipe doit '{= $_=slot()-1 =}'
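For comparison, the same distribution idea can be sketched in plain Bash without GNU Parallel (the file names apps_demo.txt and data_demo.txt are made up for this demo; a real run would invoke ./program.py where the echo is):

```shell
#!/usr/bin/env bash
# Plain-bash sketch: one credential line per worker in an array,
# the data file split into equal chunks, chunk $i given to worker $i.
set -e
cd "$(mktemp -d)"

# Fabricated sample credentials, three workers
printf 'AppID%s Key%s Secret%s\n' 1 1 1 2 2 2 3 3 3 > apps_demo.txt
mapfile -t worker < apps_demo.txt          # worker[0..2] hold one line each

seq 1 100 > data_demo.txt                  # fabricated 100-line data file
njobs=${#worker[@]}
total=$(wc -l < data_demo.txt)
chunk=$(( (total + njobs - 1) / njobs ))   # ceiling division
split -l "$chunk" -d data_demo.txt chunk_  # chunk_00, chunk_01, chunk_02

i=0
for f in chunk_*; do
  # A real run would be: ./program.py ${worker[$i]} < "$f" &
  echo "worker $i (${worker[$i]}) got $(wc -l < "$f") lines"
  i=$((i + 1))
done
```

Unlike the env_parallel solution, this splits by file rather than streaming stdin, so it needs the whole data file on disk first.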

1 Comment

That's exactly what I need. Thanks, I learned a lot from your reply.

I don't really understand what you said about data.txt, but here is what I know:

  1. Split file

    $ split -l 10 -d -a 4 data.txt split_file

    This command generates split_file0000, split_file0001, ... from data.txt, with 10 lines per output file.

  2. Split and pass command

    $ cat app.txt | xargs -n 3 ./program.py

    This command is equivalent to:

    $ ./program.py APPID1 APP_KEY1 SECRET_KEY1

    $ ./program.py APPID2 APP_KEY2 SECRET_KEY2

    $ ...
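The steps above can be checked with a small self-contained demo, where echo stands in for ./program.py and apps_demo.txt is a fabricated sample file:

```shell
# Demo: xargs -n 3 turns every three whitespace-separated fields
# into one command invocation (echo substitutes for ./program.py).
cd "$(mktemp -d)"
printf 'AppID%s Key%s Secret%s\n' 1 1 1 2 2 2 > apps_demo.txt
xargs -n 3 echo run: < apps_demo.txt
# prints:
#   run: AppID1 Key1 Secret1
#   run: AppID2 Key2 Secret2
```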

2 Comments

The file data.txt is huge; I'd like to handle it with 10 jobs without splitting it manually.
How about cat app.txt | xargs -n 3 ./program.py data.txt 10 ? It will execute ./program.py data.txt 10 APPIDX APP_KEYX SECRET_KEYX. So you can get the input filename from sys.argv[1], the number of processes from sys.argv[2], and the app options from (sys.argv[3], sys.argv[4], sys.argv[5]). Then you can do the splitting yourself inside the program.
