
I have a folder containing many text files with JSON content. With jq, I am able to extract the "commodities" array and write it to a file. "commodities-output.txt" is a temp file that contains the brackets "[", "]" and "null" values in addition to the string values from the array. I want to remove the square brackets and the "null" values and end up with the unique string values in a text file. Is there a way to optimise the sed command so that I don't have to create temporary text files such as "commodities-output.txt" and get just one output file with all the string values I need, unique and (optionally) sorted?

F=foldername
for entry in $F*.json
do
  echo "processing $entry"
  jq '.[].commodities' $entry >> commodities-output.txt
done
sed '/[][]/d' commodities-output.txt | sed '/null/d' commodities-output.txt | sort commodities-output.txt | uniq >> commodities.txt

echo "processing complete!"

4 Comments

  • Does this answer your question? Redirecting output of bash for loop (Commented Aug 29, 2022 at 10:16)
  • Or this? Combining two sed commands (Commented Aug 29, 2022 at 10:34)
  • A better fix altogether is to fix the jq filter so it doesn't output null or empty lists. stackoverflow.com/questions/56692037/… (Commented Aug 29, 2022 at 10:43)
  • sed cmd file | sed cmd file | ... is nonsensical. The first sed reads from the file, but its output is completely ignored as the 2nd sed also reads from the file. (Commented Aug 29, 2022 at 13:17)

2 Answers

1

You can easily do all of this in jq.

files=( "$F"*.json )
echo "$0: processing ${files[0]}" >&2
jq '.[] | select(.commodities != [] and .commodities != null) | .commodities' "${files[0]}"

I refactored to use a Bash array to get the first of the matching files.
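For illustration, here is a sketch of pushing the whole job through a single jq invocation, assuming each file holds a top-level JSON array of objects whose commodities field is either missing, null, or an array of strings (which is what the question's .[].commodities filter implies); jq's unique both sorts and de-duplicates, so the sed/sort/uniq stage disappears:

# Collect every commodities string from all matching files, skip null or
# missing values ([]? suppresses the iteration error on null), sort,
# de-duplicate, and print the raw strings into the final output file.
jq -rn '[inputs[].commodities[]?] | unique | .[]' "$F"*.json > commodities.txt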

If for some reason you can't refactor your code to run entirely in jq, you definitely want to prefer pipes over temporary files.

for entry in "$F"*.json
do
  echo "$0: processing $entry" >&2
  jq '.[].commodities' "$entry"
  break
done |
sed -e '/[][]/d' -e '/null/d' |
sort -u > commodities.txt

Notice also how we take care to print the progress diagnostics to standard error (>&2) and include the name of the script in the diagnostic message. That way, when you have scripts running scripts running scripts, you can see which one wants your attention.

See also: When to wrap quotes around a shell variable


2 Comments

I went with the second, pipe-based approach, which is easy to understand, fits my purpose, and gives the expected result.
You can easily use the loop from the second with the jq expression from the first if you prefer.
-1
...
# write to target file, no temp needed
jq '.[].commodities' $entry >> commodities.txt
...
# Read it with the first sed command and pipe its output into the next sed
# (which reads stdin), and so on through the rest of the pipeline.
# Also, sort has a -u flag that does the same as uniq, so you don't need a separate command.
# At the end, overwrite your target file with the result from sort.
sed '/[][]/d' commodities.txt | sed '/null/d' | sort -u > commodities.txt

2 Comments

Thanks!! This is what I am looking for. Since there are so many files, it's taking time. I will test it with these commands and mark it as the right answer.
I guess there is some error; the last file comes up empty.
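For what it's worth, that empty result is most likely the usual self-redirection pitfall: the shell truncates commodities.txt via > before the first sed ever reads it. A minimal sketch that avoids it by writing to a separate file first (the commodities-unique.txt name is only illustrative):

# Write the cleaned, de-duplicated list to a different file so the input
# isn't truncated by the redirection, then swap it into place.
sed '/[][]/d' commodities.txt | sed '/null/d' | sort -u > commodities-unique.txt
mv commodities-unique.txt commodities.txt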
