I don't know why everybody posts code in some other language when you specifically asked for bash.
Use bash's built-in parameter expansion for this. It is much faster than calling an external program such as sed for every filename. For only a few names this does not matter much, but it adds up for a large number of files.
The code
#!/bin/bash
for file in "GENOME1_00001 HYPOTHETICAL PROTEIN A" "GENOME1_00002 HYPOTHETICAL PROTEIN B" "GENOME1_00003 HYPOTHETICAL PROTEIN C"
do
printf '%s' "$file"
new_name="${file%_*}|HYPOTHETICAL PROTEIN ${file##*EIN }"
echo " -> ${new_name}"
done
which calls no external tools, yields the output
GENOME1_00001 HYPOTHETICAL PROTEIN A -> GENOME1|HYPOTHETICAL PROTEIN A
GENOME1_00002 HYPOTHETICAL PROTEIN B -> GENOME1|HYPOTHETICAL PROTEIN B
GENOME1_00003 HYPOTHETICAL PROTEIN C -> GENOME1|HYPOTHETICAL PROTEIN C
as you asked for.
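For reference, the two expansions used above belong to a family of four: % and %% strip a matching suffix, # and ## strip a matching prefix, and the doubled form takes the longest match instead of the shortest. Here is a minimal sketch on one of the sample names; the variable names are made up for illustration only.

```bash
#!/bin/bash
# Sample name from the question; all variable names below are illustrative.
file="GENOME1_00001 HYPOTHETICAL PROTEIN A"

shortest_suffix="${file%_*}"      # drop shortest trailing match of "_*"    -> GENOME1
longest_suffix="${file%% *}"      # drop longest trailing match of " *"     -> GENOME1_00001
shortest_prefix="${file#* }"      # drop shortest leading match of "* "     -> HYPOTHETICAL PROTEIN A
longest_prefix="${file##*EIN }"   # drop longest leading match of "*EIN "   -> A

printf '%s\n' "$shortest_suffix" "$longest_suffix" "$shortest_prefix" "$longest_prefix"
```

The patterns are shell globs, not regular expressions, so * here means "any run of characters".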
As explained in the comment, I assumed the '>' at the beginning of each line was some kind of prompt and that only those lines were to be converted. IMHO it is fairly trivial to modify the code to accommodate Sotto Voce's objection, but then again, maybe it is not. Here is a version that handles all lines, as Sotto Voce requests. Note that I have converted the input data to a here-document and that, as before, no external tools are called, for efficiency.
#!/bin/bash
while IFS= read -r line
do
if [ "${line%%GENOME1_*}" = ">" ]; then
line="${line%_*}|HYPOTHETICAL PROTEIN ${line##*EIN }"
fi
echo "${line}"
done << 'etc'
>GENOME1_00001 HYPOTHETICAL PROTEIN A
NQFTIAQSQVGLEDALLDL
>GENOME1_00002 HYPOTHETICAL PROTEIN B
NQFTIAQSQVGLEDALLDL
>GENOME1_00003 HYPOTHETICAL PROTEIN C
NQFTIAQSQVGLEDALLDL
etc
This is the output:
>GENOME1|HYPOTHETICAL PROTEIN A
NQFTIAQSQVGLEDALLDL
>GENOME1|HYPOTHETICAL PROTEIN B
NQFTIAQSQVGLEDALLDL
>GENOME1|HYPOTHETICAL PROTEIN C
NQFTIAQSQVGLEDALLDL
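If the description is not always HYPOTHETICAL PROTEIN, the same idea can be written without hard-coding it. This is only a sketch, assuming every header has the form >ID_NUMBER DESCRIPTION with a space after the number; the function name rename_headers and the file names in the usage comment are made up for the example.

```bash
#!/bin/bash
# rename_headers (hypothetical name): reads FASTA-like text on stdin and
# rewrites ">ID_NUMBER DESCRIPTION" headers to ">ID|DESCRIPTION".
# Assumption: the ID itself contains no underscore or space.
rename_headers() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the || test also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            '>'*_*' '*)
                # ${line%%_*}: everything before the first underscore
                # ${line#* }:  everything after the first space
                line="${line%%_*}|${line#* }"
                ;;
        esac
        printf '%s\n' "$line"
    done
}

# Usage (both file names are placeholders):
#   rename_headers < genomes.fasta > renamed.fasta
```

Sequence lines and any header that does not fit the assumed shape pass through unchanged, and still no external tools are called.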
.* includes everything until the end; you want [0-9]* instead, or [^ ]*