
I am migrating LTO-6 tapes to LTO-9 tapes. I wrote a script for this that uses parallel so that I can have two or four tape drives running at the same time. It works fine except for the logging. I was trying to record a start date/time when each tape begins migrating and an end date/time when each tape completes. I couldn't make this work, so I am logging the completed batch at the end of the loop. Is there a better way to do this so that I can log each "start" and "end" as it occurs, while keeping the parallel command?

What I am trying to get is a list of tapes like this:

>>> STARTED 000123 at 2025-06-14_07:43
!!! FINISHED 000123 at 2025-06-14_12:20
>>> STARTED 000437 at 2025-06-14_07:43
!!! FINISHED 000437 at 2025-06-14_13:05

or 
>>> STARTED 000123 at 2025-06-14_07:43
>>> STARTED 000437 at 2025-06-14_07:43
!!! FINISHED 000123 at 2025-06-14_12:20
!!! FINISHED 000437 at 2025-06-14_13:05
etc..

The script has several functions, but the relevant one is this:

function process1() {
   VAR1=( $(cat ${LIST1}) )
   if [[ ${VAR1[@]} ]]; then
      for i in "${VAR1[@]}"; do
         DATE=$(date +%Y-%m-%d_%R)
         echo ">> STARTED" ${i} at $DATE >> /scratch/Migrate/job1.daily.progress 

         # line to convert LTO tapes
         sem -j${JOB} --id my_id1 fsmedcopy -T ANTF -s LTO6_Pool_a -v LTO9_Pool_4 -G y -r ${i}

          echo -e "Active tape ${i}"
      done
      sem --wait --id my_id1

      DATE=$(date +%Y-%m-%d_%R)
      echo "!!! FINISHED JOB" ${LIST1} at $DATE >> /scratch/Migrate/job1.daily.progress 
      echo "" >> /scratch/Migrate/job1.daily.progress
   fi
   exit
}

What I get instead is this:

>>> STARTED 000123 at 2025-06-14_07:43
>>> STARTED 000437 at 2025-06-14_07:43
!!! FINISHED 000123 at 2025-06-14_07:43
!!! FINISHED 000437 at 2025-06-14_07:43

Notice that both the "STARTED" and "FINISHED" lines were written at the same time, when the loop ran. It should have logged FINISHED when the process actually completed. I believe this is because sem runs the command in the background and the script doesn't wait for it to finish before writing the log line.

  • Setting aside the issue of the sample output messages not matching what the echo calls generate, and assuming the primary goal is to have multiple processes appending to a single log file (e.g., job1.daily.progress) without the messages getting garbled/scrambled (e.g., two or more processes writing to the log file at the exact same time), I'd probably look at something as simple as using flock to manage writes to the log file. Commented Jun 15 at 1:48
  • To make the flock coding more manageable I'd suggest a single function that handles all write activity to the log file; this limits all flock coding to just that one piece of code (i.e., the flock function). In the main processes/functions you would replace the current echo calls with calls to the flock function (passing the desired message as an argument). For high volumes of write activity (from multiple processes) this approach could lead to some delay/contention on writes to the log file, but for occasional writes there should be no delay/contention issues. Commented Jun 15 at 1:53
  • Is the problem that lines are garbled in your log file caused by multiple processes writing at once? If so, you could use sem in front of your echo to ensure only one process writes at a time, just as you use sem in front of your tape-related commands. Alternatively, you could write to the system log with the logger command. Commented Jun 15 at 7:27
  • Use your (r)syslogd with logger. See man rsyslogd and man logger. Commented Jun 15 at 9:02
  • See bash script to run a constant number of jobs in the background, where I've posted a full sample confining all outputs separately. Then maybe try the referenced parShellCheck.sh sample script. Commented Jun 15 at 16:35
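The flock approach suggested in the comments could be sketched like this. This is a minimal sketch, not OP's actual code: the log path and the messages are placeholders, and the single log_msg function is the "one function handles all writes" idea from the comments.

```shell
#!/bin/bash
# Sketch of the flock idea: funnel every log write through one function
# that takes an exclusive lock on the log file, so concurrent writers
# can never interleave partial lines.
# LOGFILE is a placeholder path, not OP's actual /scratch/Migrate location.
LOGFILE=./job1.daily.progress

log_msg() {
    (
        flock -x 9                 # block until we hold the lock on fd 9
        printf '%s\n' "$*" >&9     # append the whole message as one line
    ) 9>>"${LOGFILE}"              # fd 9 opens the log in append mode
}

log_msg ">>> STARTED 000123 at $(date +%Y-%m-%d_%R)"
log_msg "!!! FINISHED 000123 at $(date +%Y-%m-%d_%R)"
```

In the real script, each background job would call log_msg instead of echoing directly; the lock is held only for the duration of one printf, so contention stays negligible for occasional writes.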

1 Answer


Assumptions/understandings:

  • need the echo STARTED/FINISHED calls to (effectively) run in the background with the associated tape migration command

Using this sem function example as a template, one (simple) example of wrapping the echo STARTED/FINISHED calls plus a sleep command (in lieu of OP's tape migration command) in a function and then calling the function via sem:

$ cat sem_test.sh
#!/bin/bash

runme() {
    jobno="$1"
    secs="$2"
    printf ">>> STARTED ${jobno} @ %(%Y-%m-%d %H:%M:%S)T\n" '-1' >> test.log
    sleep "${secs}"
    printf "!!! FINISHED ${jobno} @ %(%Y-%m-%d %H:%M:%S)T\n" '-1'  >> test.log
}

export -f runme                                   # export function so sem can see it

> test.log

jobno_secs=(1:2 2:5 3:7)                          # format: jobno:secs_to_sleep

while IFS=":" read -r j s
do
    echo "####### calling job # $j with a $s sec sleep"
    sem -j 20 --id testid "runme $j $s"
done < <(printf "%s\n" "${jobno_secs[@]}")

sem --wait --id testid

echo "##################"
head test.log

NOTES:

  • I don't work with sem so this is the barebones basics I used to get this working
  • we're using the bash builtin printf which supports the special feature for printing a date and time (%(date_time_format)T); the trailing -1 tells printf to use the current system date and time
  • OP's $(date) / echo calls should work just fine (in place of the printf)
  • looking at OP's current code ... it may be as 'simple' as removing the sem call from within the process1 function and instead have sem call process1 ... ?

Taking for a test drive:

$ ./sem_test.sh
####### calling job # 1 with a 2 sec sleep
####### calling job # 2 with a 5 sec sleep
####### calling job # 3 with a 7 sec sleep
                                         <<<<<<<<<<<< there's a pause for 7 seconds and then ...
##################
>>> STARTED 1 @ 2025-06-16 11:19:09
>>> STARTED 2 @ 2025-06-16 11:19:09 
>>> STARTED 3 @ 2025-06-16 11:19:09 
!!! FINISHED 1 @ 2025-06-16 11:19:11
!!! FINISHED 2 @ 2025-06-16 11:19:14
!!! FINISHED 3 @ 2025-06-16 11:19:16
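For anyone wanting to see the mechanism without sem installed: the same fix works with plain background jobs, because the FINISHED line is written by the backgrounded unit itself, after its work completes. This is a demonstration sketch only; the file name is a placeholder and sleep stands in for the migration command.

```shell
#!/bin/bash
# Demonstrates why moving the logging inside the backgrounded unit fixes
# the ordering: each job writes its own FINISHED line when it completes,
# not when the loop that launched it ran.
LOG=./demo.progress
: > "${LOG}"

migrate_tape() {
    local tape="$1" secs="$2"
    echo ">>> STARTED ${tape} at $(date +%Y-%m-%d_%R)" >> "${LOG}"
    sleep "${secs}"              # stand-in for the fsmedcopy call
    echo "!!! FINISHED ${tape} at $(date +%Y-%m-%d_%R)" >> "${LOG}"
}

migrate_tape 000123 1 &          # background, like sem without the job cap
migrate_tape 000437 2 &
wait                             # like sem --wait: block until all jobs finish
cat "${LOG}"
```

Both STARTED lines appear immediately, and each FINISHED line appears only after that tape's (simulated) work is done.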

3 Comments

Not sure if you are aware, sem is just a symlink to GNU Parallel and is explained in manpages for GNU Parallel.
@MarkSetchell nope, didn't realize that; I stopped at type -a sem => /usr/bin/sem / /bin/sem; thanks for the heads up
@markp-fuso Very interesting. I will try to apply this solution to my code.
