I'm trying to run two parallel asynchronous processes in a loop, one in the main shell and one via COPROC, so that I can collect and synchronize the results once both have finished (in each loop iteration) and thereby speed up long test runs.
The Cygwin Bash version is 5.2.21(1)-release. The parallel tool isn't available, so I'm trying to use COPROC.
The resources I found on the web are unclear about whether multiple independent COPROC instances are possible from version >5.0 (despite the warnings that the process still exists). Here it didn't seem to work, so to be on the safe side I've restricted my code to running just one co-process in parallel.
In the following example, an array of values is set up and filtered in two different ways in the function loopedElapsedTimeCalculation().
This function is called twice in succession inside each loop iteration: first with an IS_COP flag inside the standard COPROC, in which case its result times are calculated and collected outside the function (in a nested loop); second by a regular call, separate from the COPROC, in which case the times are calculated and collected inside the function.
The results for the different filtering approaches are finally summed up in their corresponding associative array cells, which should hold the accumulated elapsed times from the function's parallel invocations, both in the main process and in the COPROC:
testMethods[PARAM_SUBST_ARRAY]
testMethods[AWK_REGEX_FROM_ARRAY]
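To make the accumulation step concrete, here is a minimal sketch of how one 'KEY=stop-start' line could be added to such an array cell. The array name `sums`, the helper `addDelta` and the timestamps are made up for illustration and are not part of my script; it assumes an awk that supports `-v` and a locale with `.` as the decimal separator.

```bash
#!/bin/bash
# Hypothetical stand-in for the 'testMethods' array from the script
declare -A sums=([PARAM_SUBST_ARRAY]=0 [AWK_REGEX_FROM_ARRAY]=0)

# addDelta is an illustrative helper, not part of the original code.
# It takes one 'KEY=stop-start' line and adds the delta to sums[KEY].
addDelta() {
    local line=$1 key expr
    key=${line%%=*}    # text before the first '='
    expr=${line#*=}    # the 'stop-start' part after it
    # Values are passed with -v so awk treats them as data, not program text
    sums[$key]=$(awk -v old="${sums[$key]:-0}" -v e="$expr" 'BEGIN {
        split(e, t, "-")
        printf "%f", old + t[1] - t[2]
    }')
}

# Made-up timestamps, chosen only to keep the arithmetic easy to follow
addDelta 'PARAM_SUBST_ARRAY=10.000250-10.000100'
addDelta 'PARAM_SUBST_ARRAY=20.000500-20.000100'
printf 'PARAM_SUBST_ARRAY SUM: %s\n' "${sums[PARAM_SUBST_ARRAY]}"
```

Passing the values via `-v` rather than splicing them into the awk program text is a design choice of this sketch; the script below splices them in directly.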
Here is the example test code:
#! /bin/bash
shopt -s extglob
# Define amount of test array value sets to generate and test iteration loops to run
declare -i i sets=1 loops=2
declare -a tmpArr elapsedTestTimes
# Associative array for collecting measured test approach run time sums
declare -A testMethods
testMethods[PARAM_SUBST_ARRAY]=
testMethods[AWK_REGEX_FROM_ARRAY]=
# Initializing test array
while [[ $((++i)) -le $sets ]]; do
    tmpArr+=("?? exec2.bin[$i]" \
        "A file00 0.bin[$i]" \
        " A file11*1.bin[$i]" \
        "MR file22\03457zwei.bin[$i]" \
        " C file33\t3.bin[$i]" \
        "T file44\$4.bin[$i]" \
        'D file55"$(echo EXE)"5.bin['$i']' \
        " R101 renamedW1[$i]" \
        "R102 renamedI2[$i]" \
        "R104R104 myproject/src/test/util/MyUtil2Test.java[$i]")
done
# Measures the elapsed stop/start times for different filtering approaches on the test array.
# When called without the arg1 flag, it calculates each delta itself and adds it to the
# corresponding 'testMethods' assoc array cell, for each filter measured.
# When called with the flag (COPROC invocation), only the plain
# '<TEST_METHODS_KEY>=$stopTime-$startTime' value string is returned to the caller, which has
# to calculate and add the measured times to the corresponding 'testMethods' assoc array cells.
loopedElapsedTimeCalculation() {
    #[[ $1 ]] && echo -e "$1\n" >/dev/tty
    local startTime1A stopTime1A startTime2A stopTime2A
    local startTime1B stopTime1B startTime2B stopTime2B IFS _IFS
    # COPROC invocation (external calc, sum and set of test run times)
    if [[ $1 ]]; then
        ### Perf-test filtering ARRAY via Pattern Matching in Parameter Substitution
        printf 'COP - PARAM_SUBST_ARRAY__OLD_VAL: %f\n' "${testMethods[PARAM_SUBST_ARRAY]}" >/dev/tty
        startTime1A="$EPOCHREALTIME"
        printf '%s\n' "${tmpArr[@]/#[? ]*}" >/dev/null
        stopTime1A="$EPOCHREALTIME"
        printf '%s\n' "PARAM_SUBST_ARRAY=$stopTime1A-$startTime1A"
        printf 'COP - PARAM_SUBST_ARRAY__NEW_TMP: %f\n' "$(awk 'BEGIN { printf '"$stopTime1A"' - '"$startTime1A"'; }')" >/dev/tty
        ### Perf-test filtering ARRAY via AWK extended RegEx
        printf 'COP - AWK_REGEX_FROM_ARRAY__OLD_VAL: %f\n' "${testMethods[AWK_REGEX_FROM_ARRAY]}" >/dev/tty
        startTime2A="$EPOCHREALTIME"
        _IFS="$IFS"; IFS=$'\n'
        awk '/^[^ ?].*/ { print }' <<<"${tmpArr[*]}" >/dev/null
        IFS="$_IFS"
        stopTime2A="$EPOCHREALTIME"
        printf '%s\n' "AWK_REGEX_FROM_ARRAY=$stopTime2A-$startTime2A"
        printf 'COP - AWK_REGEX_FROM_ARRAY__NEW_TMP: %f\n' "$(awk 'BEGIN { printf '"$stopTime2A"' - '"$startTime2A"'; }')" >/dev/tty
    # Regular invocation (implicit calc, sum and set of test run times)
    else
        ### Perf-test filtering ARRAY via Pattern Matching in Parameter Substitution
        printf 'NO_COP - PARAM_SUBST_ARRAY__OLD_VAL: %f\n' "${testMethods[PARAM_SUBST_ARRAY]}" >/dev/tty
        startTime1B="$EPOCHREALTIME"
        printf '%s\n' "${tmpArr[@]/#[? ]*}" >/dev/null
        stopTime1B="$EPOCHREALTIME"
        read -r PARAM_SUBST_ARRAY < <(
            awk 'BEGIN { printf "%f", '"${testMethods[PARAM_SUBST_ARRAY]}"' + '"$stopTime1B"' - '"$startTime1B"'; }'
        )
        testMethods[PARAM_SUBST_ARRAY]="$PARAM_SUBST_ARRAY"
        printf 'NO_COP - PARAM_SUBST_ARRAY__NEW_TMP: %f\n' "$(awk 'BEGIN { printf '"$stopTime1B"' - '"$startTime1B"'; }')" >/dev/tty
        printf 'NO_COP - PARAM_SUBST_ARRAY__NEW_VAL: %f\n' "${testMethods[PARAM_SUBST_ARRAY]}" >/dev/tty
        ### Perf-test filtering ARRAY via AWK extended RegEx
        printf 'NO_COP - AWK_REGEX_FROM_ARRAY__OLD_VAL: %f\n' "${testMethods[AWK_REGEX_FROM_ARRAY]}" >/dev/tty
        startTime2B="$EPOCHREALTIME"
        _IFS="$IFS"; IFS=$'\n'
        awk '/^[^ ?].*/ { print }' <<<"${tmpArr[*]}" >/dev/null
        IFS="$_IFS"
        stopTime2B="$EPOCHREALTIME"
        read -r AWK_REGEX_FROM_ARRAY < <(
            awk 'BEGIN { printf "%f", '"${testMethods[AWK_REGEX_FROM_ARRAY]}"' + '"$stopTime2B"' - '"$startTime2B"'; }'
        )
        testMethods[AWK_REGEX_FROM_ARRAY]="$AWK_REGEX_FROM_ARRAY"
        printf 'NO_COP - AWK_REGEX_FROM_ARRAY__NEW_TMP: %f\n' "$(awk 'BEGIN { printf '"$stopTime2B"' - '"$startTime2B"'; }')" >/dev/tty
        printf 'NO_COP - AWK_REGEX_FROM_ARRAY__NEW_VAL: %f\n' "${testMethods[AWK_REGEX_FROM_ARRAY]}" >/dev/tty
    fi
}
i=0
# Should repeat test runs $loops times, with 2 independently running async invocations
# of loopedElapsedTimeCalculation() each time, to reduce large-test run times
while [[ $((i++)) -lt $loops ]]; do
    # Initialize COPROC to run test method
    coproc { loopedElapsedTimeCalculation 'IS_COP'; }
    # Retrieve and process KEY/TIME value results while COPROC returns them
    # --> should run asynchronously, but doesn't !
    while read; do
        for m in "${!testMethods[@]}"; do
            if [[ "${REPLY%=*}" == "$m" ]]; then
                printf 'M: %s\n' "$m" >/dev/tty
                sleep 1
                read -d $'\n' -r tmpTime < <(
                    awk 'BEGIN { printf "%f", '"${testMethods[$m]}"' + '"${REPLY#*=}"'; }'
                )
                testMethods["$m"]="$tmpTime"
            fi
        done
        printf 'COP - %s__NEW_VAL: %f\n' "$m" "${testMethods[$m]}" >/dev/tty
    done <&"$COPROC"
    # Regular test method run
    loopedElapsedTimeCalculation
    # Wait for COPROC to finish, before new async test iteration can start
    wait $COPROC_PID
done
echo -e >/dev/tty
for m in "${!testMethods[@]}"; do
    printf '%s SUM: %f\n' "$m" "${testMethods[$m]}" >/dev/tty
done
I've been trying to adapt a modified example with continuous retrieval of COPROC results, like this one:
while read output <&"${COPROC[0]}"; do echo $output; done
Since using this inside the loop (with the looped awk call) results in the error
"$COPROC": Bad file descriptor
I've changed it to the non-failing variant
while read; do ... done <&"$COPROC"
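For reference, here is a minimal sketch of how I understand the COPROC descriptors: `$COPROC` is shorthand for `${COPROC[0]}`, the descriptor connected to the co-process's stdout. As far as I can tell, Bash drops the COPROC variable once the co-process has terminated, which is an assumption on my part and might be related to the "Bad file descriptor" message. The sketch duplicates the descriptor into a variable named `fd` (my own choice) right after start; the `sleep 0.2` and the A=1/B=2 lines are placeholders.

```bash
#!/bin/bash
# Short-lived coprocess: emits two KEY=VALUE lines and exits
coproc { sleep 0.2; printf '%s\n' 'A=1' 'B=2'; }
pid=$COPROC_PID

# Duplicate the read end right away; {fd} lets Bash choose a free number
exec {fd}<&"${COPROC[0]}"

wait "$pid"    # by now the coprocess has exited

# The duplicate stays readable, whatever has happened to COPROC meanwhile
mapfile -t -u "$fd" lines
exec {fd}<&-   # close the duplicate

printf 'got: %s\n' "${lines[@]}"
```

The buffered output is still delivered through the duplicate because the pipe itself lives on until every descriptor referring to it is closed.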
So far, this seems to work, but the asynchronous parallelization does not:
COP - PARAM_SUBST_ARRAY__OLD_VAL: 0.000000
M: PARAM_SUBST_ARRAY
COP - PARAM_SUBST_ARRAY__NEW_TMP: 0.000105
COP - AWK_REGEX_FROM_ARRAY__OLD_VAL: 0.000000
COP - AWK_REGEX_FROM_ARRAY__NEW_TMP: 0.043876
COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000105
M: AWK_REGEX_FROM_ARRAY
COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000105
NO_COP - PARAM_SUBST_ARRAY__OLD_VAL: 0.000105
NO_COP - PARAM_SUBST_ARRAY__NEW_TMP: 0.000100
NO_COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000205
NO_COP - AWK_REGEX_FROM_ARRAY__OLD_VAL: 0.043876
NO_COP - AWK_REGEX_FROM_ARRAY__NEW_TMP: 0.043439
NO_COP - AWK_REGEX_FROM_ARRAY__NEW_VAL: 0.087315
COP - PARAM_SUBST_ARRAY__OLD_VAL: 0.000205
M: PARAM_SUBST_ARRAY
COP - PARAM_SUBST_ARRAY__NEW_TMP: 0.000102
COP - AWK_REGEX_FROM_ARRAY__OLD_VAL: 0.087315
COP - AWK_REGEX_FROM_ARRAY__NEW_TMP: 0.044305
COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000307
M: AWK_REGEX_FROM_ARRAY
COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000307
NO_COP - PARAM_SUBST_ARRAY__OLD_VAL: 0.000307
NO_COP - PARAM_SUBST_ARRAY__NEW_TMP: 0.000090
NO_COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000397
NO_COP - AWK_REGEX_FROM_ARRAY__OLD_VAL: 0.131620
NO_COP - AWK_REGEX_FROM_ARRAY__NEW_TMP: 0.041941
NO_COP - AWK_REGEX_FROM_ARRAY__NEW_VAL: 0.173561
The inserted debug output and the sleep 1 calls inside the nested result retrieval loop for the COPROC show that main and COPROC do not run asynchronously. Instead, main is only invoked after the COPROC has finished each time, so the two run sequentially.
When I put the regular test method call to loopedElapsedTimeCalculation before the nested while loop (also if the loop is extracted into a separate function), the result seems to show the intended asynchronous execution:
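One plausible reading of this output is that a `while read` loop on the co-process's descriptor only ends at end-of-file, that is, once the co-process has closed its stdout by exiting, so anything placed after the loop has to wait. The following reduced sketch illustrates that under stated assumptions: the co-process body (one line, then `sleep 1`) and the variable names `fd`, `lines` and `elapsed` are stand-ins, not taken from my script.

```bash
#!/bin/bash
SECONDS=0                   # reset Bash's built-in seconds counter

# Stand-in coprocess: reports one result immediately, then stays busy for 1 s
coproc { printf '%s\n' 'KEY=1'; sleep 1; }
exec {fd}<&"${COPROC[0]}"   # private copy of the read end

lines=0
while read -r -u "$fd" line; do
    lines=$((lines + 1))    # the single line arrives almost instantly
done                        # ...but the loop only ends at end-of-file
elapsed=$SECONDS            # whole seconds since the reset above
exec {fd}<&-

printf 'lines=%d elapsed=%ds\n' "$lines" "$elapsed"
```

Although only one line is produced right at the start, the loop does not return before the co-process is done, so at least one second has passed by the time the main shell continues.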
NO_COP - PARAM_SUBST_ARRAY__OLD_VAL: 0.000000
COP - PARAM_SUBST_ARRAY__OLD_VAL: 0.000000
COP - PARAM_SUBST_ARRAY__NEW_TMP: 0.000127
COP - AWK_REGEX_FROM_ARRAY__OLD_VAL: 0.000000
NO_COP - PARAM_SUBST_ARRAY__NEW_TMP: 0.000088
NO_COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000088
NO_COP - AWK_REGEX_FROM_ARRAY__OLD_VAL: 0.000000
COP - AWK_REGEX_FROM_ARRAY__NEW_TMP: 0.055613
NO_COP - AWK_REGEX_FROM_ARRAY__NEW_TMP: 0.045002
NO_COP - AWK_REGEX_FROM_ARRAY__NEW_VAL: 0.045002
M: PARAM_SUBST_ARRAY
COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000215
M: AWK_REGEX_FROM_ARRAY
COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000215
NO_COP - PARAM_SUBST_ARRAY__OLD_VAL: 0.000215
COP - PARAM_SUBST_ARRAY__OLD_VAL: 0.000215
COP - PARAM_SUBST_ARRAY__NEW_TMP: 0.000077
COP - AWK_REGEX_FROM_ARRAY__OLD_VAL: 0.100615
NO_COP - PARAM_SUBST_ARRAY__NEW_TMP: 0.000074
NO_COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000289
NO_COP - AWK_REGEX_FROM_ARRAY__OLD_VAL: 0.100615
COP - AWK_REGEX_FROM_ARRAY__NEW_TMP: 0.106964
NO_COP - AWK_REGEX_FROM_ARRAY__NEW_TMP: 0.042208
NO_COP - AWK_REGEX_FROM_ARRAY__NEW_VAL: 0.142823
M: PARAM_SUBST_ARRAY
COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000366
M: AWK_REGEX_FROM_ARRAY
COP - PARAM_SUBST_ARRAY__NEW_VAL: 0.000366
AWK_REGEX_FROM_ARRAY SUM: 0.249787
PARAM_SUBST_ARRAY SUM: 0.000366
But now a race condition (how exactly?) occurs intermittently, sometimes for one and sometimes for both of the accesses to the COPROC:
"$COPROC": Bad file descriptor
Why is that, and how can I change the code so that the asynchronous test time calculation works as intended?
Of course I could move the nested while loop inside the COPROC, but then the changes to the testMethods array wouldn't be propagated to the main process. That propagation was the very reason for using a co-process instead of just detaching the function via &; an explicit global or exported variable in main won't work with sub-processes either.
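The propagation problem can be reduced to the following sketch. The array `sums`, the key `K` and the values are placeholders of mine; the point is only that an assignment made in a background job stays in the child process, while text read back in the main shell can update the array.

```bash
#!/bin/bash
declare -A sums=([K]=1)

# Background job: runs in a child process, so the assignment is lost
{ sums[K]=99; } &
wait
afterJob=${sums[K]}

# Results come back as text; the loop runs in the main shell because the
# producer is attached through process substitution, not a pipeline
while IFS='=' read -r key value; do
    sums[$key]=$value
done < <(printf '%s\n' 'K=42')
afterRead=${sums[K]}

printf 'after background job: %s\nafter reading results: %s\n' "$afterJob" "$afterRead"
```

Here the first value printed is still 1 and the second is 42, which is why the KEY=VALUE lines in my script are read and summed up in the main process.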