I have a bash script which iterates over many files: f1.gz, f2.gz, ..., fn.gz.
Each file contains millions of lines, and each line can match one pattern out of a set: p1, p2, ..., pn.
Depending on which pattern matches, the line should go to a specific output file. The patterns are obtained with date manipulations.
I wrote a couple of versions of this, but I'm not satisfied with any of them, and I would like to ask whether a better way/solution can be achieved without resorting to writing anything in a compiled language.
Here's what I have:
for FILE in f*.gz
do
    echo "uncompressing only once per file -- $FILE: "
    gzcat "$FILE" > .myfile.txt
    while IFS='' read -r LINE || [[ -n "$LINE" ]]; do
        for DATE in "$@"   # I pass several dates to my script, like 20201015, 20201014, etc.
        do
            for i in {0..23}
            do
                p="DATE_PATTERNS_$DATE[$i]"   # I prepared these before to avoid running "date" millions of times
                echo "$LINE" | awk -v pat="${!p}" -F '"' '$1 ~ pat {print $2" "$4" "$6}' >> "$DATE.txt"
            done
        done
    done < .myfile.txt
done
Thanks
A few questions and suggestions from the responses:

How is DATE_PATTERNS_$DATE[$i] generated? How did you prepare it? And in the echo $LINE | awk call, is pat just a constant pattern?

Instead of piping every line through its own awk, run one awk per file: awk '{ .... }' pattern_file myfile.txt. Have awk load the first file into an array, then, while parsing the 2nd file, look the desired field up in that array. A google search on awk load array files FNR==NR NR==FNR will bring up a ton of hits. Net result: one awk per file, each line scanned just once.

There is also no need for gzcat $FILE > tmp. Look at processing like gunzip -c $file | awk -v inputList="....." ' ...', where your awk accepts the list of dates/conditions it should filter for and uses internal print $0 > "/path/to/data/file.txt" redirections to generate your output. Good luck.
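Not from the original answer, but here is a minimal sketch of what that one-pass approach could look like, combining the two ideas above. Assumptions of mine, not from the question: the already-prepared DATE_PATTERNS_* arrays are dumped into a pattern file named patterns.txt with one "date<TAB>pattern" pair per line (a pattern file is used here instead of the -v inputList variable), and the data lines are double-quote-delimited exactly as in the original awk (-F '"', printing fields 2, 4 and 6). awk loads the pattern list first (FNR==NR), then scans each uncompressed file once and routes matching lines to <DATE>.txt itself:

#!/usr/bin/env bash
# Sketch only: assumes the DATE_PATTERNS_<date> arrays from the question already exist.
# The file name patterns.txt and its "date<TAB>pattern" layout are illustrative choices.

# 1) dump every date/pattern pair to a small text file, once
: > patterns.txt
for DATE in "$@"; do                      # e.g. 20201015 20201014 ...
    for i in {0..23}; do
        p="DATE_PATTERNS_$DATE[$i]"
        printf '%s\t%s\n' "$DATE" "${!p}" >> patterns.txt
    done
done

# 2) one awk per compressed file; every data line is scanned exactly once
for FILE in f*.gz; do
    echo "processing $FILE"
    gunzip -c "$FILE" |
    awk -F '"' '
        FNR == NR {                       # first input: the pattern list
            split($0, a, "\t")            # a[1] = date, a[2] = hourly pattern
            if (a[1] in regex) regex[a[1]] = regex[a[1]] "|" a[2]
            else               regex[a[1]] = a[2]
            next
        }
        {                                 # second input: the uncompressed data
            for (d in regex)
                if ($1 ~ regex[d])        # same test as the original: field 1 vs pattern
                    print $2, $4, $6 >> (d ".txt")   # append, like the original >> $DATE.txt
        }
    ' patterns.txt -                      # "-" = read the gunzip output from stdin
done

Each data line is now read by exactly one awk process instead of being piped through echo | awk once per date and hour, and the per-date output files are opened once inside awk instead of being re-opened by the shell for every match. The 24 hourly patterns for a date are OR'ed into a single regex, so each line is tested once per date rather than once per hour.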