Fill an array with all the occurrences of the pattern
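First convert your file to use a meaningful delimiter, e.g. the null byte, for example with GNU sed and its -z switch:
sed -z 's/"\([^"]*\)"[^"]*/\1\x00/g'
I've added the [^"]* at the end so that characters that are not between double quotes are removed. Note the \x00 in the replacement: in GNU sed, \0 refers to the whole match, so \00 would not produce a null byte.
After that, parsing becomes much simpler.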
You can get the first element with:
head -z -n1
Or sort and count the occurrences:
sort -z | uniq -z -c
Or load it into an array with bash's mapfile (the -d option needs bash 4.4 or newer):
mapfile -d '' -t arr < <(sed -z 's/"\([^"]*\)"[^"]*/\1\x00/g' input)
Alternatively, you can use e.g. $'\01' as the separator. As long as that byte never occurs in the data, such a stream is simple to parse in bash.
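A minimal sketch of the $'\01' variant, assuming bash 4.4 or newer (for mapfile -d), GNU sed, and that byte 0x01 never appears inside the quoted strings. The temporary sample file and the variable name input_soh are made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample file with two quoted strings, one spanning two lines.
input_soh=$(mktemp)
printf '%s\n' '"My name' 'is XXX"' '"My name is YYY"' > "$input_soh"

# Same sed expression, but each string is terminated with byte 0x01 instead of NUL.
# -z makes sed read the whole file as one record, so [^"]* can span newlines.
mapfile -d $'\x01' -t arr < <(sed -z 's/"\([^"]*\)"[^"]*/\1\x01/g' "$input_soh")

rm -f "$input_soh"
declare -p arr
```

Unlike a NUL-delimited stream, the 0x01-delimited text can also be kept in an ordinary shell variable before it is split.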
Handling NUL-delimited streams is a bit hard in bash. You can't store a value with an embedded null byte in a shell variable, and command substitutions will sometimes print warnings about it. Usually, when handling data with arbitrary bytes, I convert it to plain ASCII with xxd -p and back with xxd -r -p. With that, it becomes easier.
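A rough sketch of that xxd round trip, assuming xxd is installed (it usually ships with vim) and bash 4.4 or newer. The sample data and the variable name hex are only for illustration.

```shell
#!/usr/bin/env bash
# A NUL-separated stream can't live in a variable, so store it as a hex string.
# xxd -p wraps long output over several lines; tr joins them into one.
hex=$(printf 'My name\nis XXX\0My name is YYY\0' | xxd -p | tr -d '\n')

# The hex form is plain ASCII, safe to store, compare, or pass around.
printf '%s\n' "$hex"

# Decode back to the raw bytes only when a tool needs the original stream.
mapfile -d '' -t arr < <(printf '%s' "$hex" | xxd -r -p)
declare -p arr
```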
The following script:
cat <<'EOF' >input
"My name
is XXX"
"My name is YYY"
"Today
is
the "
EOF
sed -z 's/"\([^"]*\)"[^"]*/\1\x00/g' input > input_parsed
echo "##First element is:"
printf '"'
<input_parsed head -z -n1
printf '"\n'
echo "##Element counts are:"
<input_parsed sort -z | uniq -z -c
echo
echo "##The array is:"
mapfile -d '' -t arr <input_parsed
declare -p arr
will output (the formatting is a bit off because uniq's output is delimited by null bytes rather than newlines):
##First element is:
"My name
is XXX"
##Element counts are:
1 My name
is XXX 1 My name is YYY 1 Today
is
the
##The array is:
declare -a arr=([0]=$'My name\nis XXX' [1]="My name is YYY" [2]=$'Today\nis\nthe ')
Tested on repl.it.
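If the run-together count output above is a problem, one possible approach is to read the NUL-delimited records from uniq one at a time and print each on its own line, shell-quoted. This is only a sketch, assuming GNU sort and uniq (for -z); the records function and its sample data are stand-ins for input_parsed.

```shell
#!/usr/bin/env bash
# Sample NUL-terminated records, standing in for the contents of input_parsed.
records() { printf 'My name\nis XXX\0My name is YYY\0My name is YYY\0'; }

counts=()
while IFS= read -r -d '' rec; do
    rec="${rec#"${rec%%[! ]*}"}"   # drop the padding uniq -c puts before the count
    count="${rec%% *}"             # text up to the first space is the count
    item="${rec#* }"               # everything after that space is the string itself
    # %q shell-quotes the string, so embedded newlines stay on one output line.
    counts+=("$count $(printf '%q' "$item")")
done < <(records | sort -z | uniq -z -c)

printf '%s\n' "${counts[@]}"
```

Splitting the record by hand, rather than with read's own field splitting, keeps any trailing spaces that belong to the string.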
""? Also note that writing a parser with only regular expressions to handle escape sequences is impossible.
grep -Eoz '"[^"]*"' file
"the newline isn't considered" - what do you mean by this? How is it not considered? grep -z should work.