I believe that what you are looking for is a flex regular expression which will match a single shell token which does not contain quotes or other such complications.
Note that the characters which automatically terminate tokens are the following: ();<>&| and whitespace. (The bash manual says space and tab, but I'm pretty sure that newline also separates words.)
Such a regular expression is possible, but (imho) it is of little use, partly because it doesn't take quoting into account (or bracketing: `a$(echo foo)b` is a single word), and partly because the resulting word needs to be rescanned for escape characters. But whatever. Here's a sample flex regex:
([^();<>&|\\[:space:]]|\\(.|\n))+
That matches any number of consecutive instances of:
- anything other than a metacharacter or an escape character, or
- an escape character followed by any single character other than a newline (flex's `.` does not match newline), or
- an escape character followed by a newline.
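To make the pattern concrete, here is a hand-written C sketch of what it matches (this is not flex output). `word_length` is a hypothetical helper that returns the length of the longest prefix of its argument matching the regex above; it assumes `isspace` in the C locale as a stand-in for `[:space:]`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the longest prefix of s matching
   ([^();<>&|\\[:space:]]|\\(.|\n))+   (0 if nothing matches). */
static size_t word_length(const char *s)
{
    assert(s != NULL);
    size_t i = 0;
    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];
        if (c == '\\') {
            /* Escape character: must be followed by some character,
               newline included; a trailing lone backslash ends the word. */
            if (s[i + 1] == '\0')
                break;
            i += 2;
        } else if (strchr("();<>&|", c) != NULL || isspace(c)) {
            break;              /* unescaped metacharacter ends the word */
        } else {
            i += 1;             /* ordinary character */
        }
    }
    return i;
}
```

For example, `word_length("123\\< abc")` is 5, because the escaped `<` stays inside the word, while `word_length("a$(echo foo)b")` is only 2, which is exactly the bracketing limitation noted above.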
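Also note the difference between `echo 123\< abc\\\ efg` and `echo "123\< abc\\\ efg"`: in the first, bash has a chance to process the escaped characters in the string before displaying them, so it prints `123< abc\ efg`. Inside double quotes only `\\` is reduced to a single backslash, so the second prints `123\< abc\\ efg`. So the question is: how are you calling your lexer in order to produce `123< abc\ efg`?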
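The rescanning step for escape characters can also be sketched in C. `unescape_word` is a hypothetical helper that assumes a simplified rule: an unquoted backslash makes the next character literal and is itself dropped. Unlike bash, it does not remove backslash-newline as a line continuation. Applied to the two words of the unquoted `echo` command, it yields the text bash prints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy src to dst, dropping each escaping backslash
   and keeping the character after it literally. dst needs room for
   strlen(src) + 1 bytes. Simplifications: backslash-newline is kept as a
   newline (bash would remove both), and a trailing lone backslash is kept. */
static void unescape_word(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (*src == '\\' && src[1] != '\0')
            src++;              /* skip the escape, keep what follows */
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

With this, the word `123\<` becomes `123<` and the word `abc\\\ efg` becomes `abc\ efg`, matching the unquoted `echo` output.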