edited title (Stéphane Chazelas)
Using parallel processes to speed up an iterative bash loop, but I need to create an associative array inside the loop



Using Bash, I have an indexed array that represents a list of files:

a=("1.json" "2.json" "3.json" ... "5309.json")

I am parsing two data fields from those JSON files into two associative arrays:

declare -A idArr
declare -A valueArr

for i in "${a[@]}"; do
    jqId="$(jq -M ".fileId" <"${i}")"
    jqValue="$(jq -M ".value" <"${i}")"
    # If there are already items in the associative array, add the new items separated by a newline
    idArr[${i}]="${idArr[${i}]}${idArr[${i}]:+$'\n'}${jqId}"
    valueArr[${jqId}]="${valueArr[${jqId}]}${valueArr[${jqId}]:+$'\n'}${jqValue}"
done
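As an aside, each pass of the loop above forks `jq` twice. The two fields can be pulled out in a single `jq` call per file, which roughly halves the per-file process overhead. A minimal sketch, using throwaway sample files in place of the real `1.json` ... `5309.json`:

```shell
#!/usr/bin/env bash
# Sketch: one jq call per file instead of two.
# Sample data stands in for the real files.
tmp=$(mktemp -d)
printf '{"fileId":"id1","value":"v1"}' >"$tmp/1.json"
printf '{"fileId":"id2","value":"v2"}' >"$tmp/2.json"
a=("$tmp/1.json" "$tmp/2.json")

declare -A idArr valueArr
for i in "${a[@]}"; do
    # jq -r prints .fileId and .value on two lines; read both from one fork
    { IFS= read -r jqId; IFS= read -r jqValue; } < <(jq -r '.fileId, .value' "$i")
    idArr[$i]="${idArr[$i]}${idArr[$i]:+$'\n'}$jqId"
    valueArr[$jqId]="${valueArr[$jqId]}${valueArr[$jqId]:+$'\n'}$jqValue"
done
```

This is still sequential, but it cuts the dominant cost (process startup) before any parallelism is added.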

Since I'm iterating through one file at a time, it takes a considerable amount of time to process all the files. I need the associative arrays built inside the loop to persist after the loop has finished.

Is there a method, such as using parallel processing or any other approach, that would allow me to concurrently process multiple array items and still enable them to contribute data to the associative arrays?
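One complication worth noting: a subshell or background job only gets a copy of the parent's variables, so workers started with `&`, `xargs`, or `parallel` cannot write into `idArr`/`valueArr` directly. A common workaround is to have the parallel jobs print one record per file to a single stream, and build the arrays in the parent shell while reading that stream. A sketch assuming an `xargs` that supports `-0` and `-P` (GNU and BSD do), with sample data in place of the real files; records may arrive in any order, and values containing tabs or newlines arrive `@tsv`-escaped:

```shell
#!/usr/bin/env bash
# Sketch: run jq jobs in parallel with xargs -P, then populate the
# associative arrays in the parent shell from the combined TSV output.
tmp=$(mktemp -d)
printf '{"fileId":"id1","value":"v1"}' >"$tmp/1.json"
printf '{"fileId":"id2","value":"v2"}' >"$tmp/2.json"
a=("$tmp/1.json" "$tmp/2.json")

declare -A idArr valueArr
while IFS=$'\t' read -r file jqId jqValue; do
    idArr[$file]="${idArr[$file]}${idArr[$file]:+$'\n'}$jqId"
    valueArr[$jqId]="${valueArr[$jqId]}${valueArr[$jqId]:+$'\n'}$jqValue"
done < <(
    # Each worker emits one line: filename<TAB>fileId<TAB>value
    printf '%s\0' "${a[@]}" |
      xargs -0 -P 4 -n 1 sh -c \
        'jq -r --arg f "$1" "[\$f, .fileId, .value] | @tsv" "$1"' sh
)
```

The arrays are only ever written by the parent shell, so they persist after the loop; the parallelism is confined to the `jq` workers feeding the stream.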

goose