From the spec you link to, we have:
3.1 Lexical tokens
Verilog HDL source text files shall be a stream of lexical tokens. A lexical token shall consist of one or more
characters. The layout of tokens in a source file shall be free format; that is, spaces and newlines shall not be
syntactically significant other than being token separators.
So a Verilog file is just a free-format arrangement of tokens. The question is how the text gets split into them. There is basically no looking back: it is simply a stream that is split into tokens one by one.
Let's take an example without whitespace:
a&&b
The tokeniser will see a, then find a character which can't be part of that token (a special character can't be in a name), so it starts the next token. It will run through both & characters, because the second one can extend the first into the valid token &&. Then it reaches b, which is not a special character, so it starts the next token.
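As a rough illustration of that left-to-right splitting, here is a minimal sketch in Python. The token set is a hypothetical, heavily reduced subset chosen for this example (it is not the full Verilog lexical grammar), and the function name tokenize is my own.

```python
import re

# Hypothetical, deliberately tiny token set: identifiers plus a few operators.
# Longer operators are listed first so the regex prefers "&&" over "&".
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*|&&|\|\||&|\||=|;")

def tokenize(text):
    """Split text into tokens in a single forward pass."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace only separates tokens, it is not a token
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens

print(tokenize("a&&b"))  # ['a', '&&', 'b']
```

The scanner never backs up: once a token has been emitted, the next one simply starts at the following character.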
With whitespace, from the spec you link:
3.2 White space
White space shall contain the characters for spaces, tabs, newlines, and formfeeds. These characters shall be
ignored except when they serve to separate other lexical tokens.
Whitespace is ignored except for forcefully separating tokens, so your suggestion that:
When using unary reduction operators whitespace is not allowed between the operator and the operand.
is not actually a requirement. The following works perfectly well:
b = & a;
The whitespace has no impact on the token splitting. The parser infers from context that the & is a unary reduction. The only reason to leave the whitespace out is purely stylistic - it makes the code easier for us to read.
For your example:
a && b
The whitespace is unnecessary - it forces the split into the tokens a, &&, and b, but that split would happen anyway because all three are unambiguous. See later for why && itself is unambiguous.
If you did:
a & & b
Now the whitespace comes into force. It separates the two & into two different tokens. The parser can then work out based on the context that the first is bitwise and the second is reduction. Operator precedence means the unary operation occurs first.
An interesting one is:
a &&&& b
3.4 Operators
Operators are single-, double-, or triple-character sequences and are used in expressions. Clause 5 discusses
the use of operators in expressions
There does seem to be a certain amount of sanity checking in the splitting, although I can't find a reference for it yet: the tokeniser checks that what it is building is still a valid token.
So the four ampersands might theoretically be split into the tokens &&& and &, because an operator can't be more than three characters. Let's look further.
Theoretically this should work for:
a &&& b
The middle token &&& is slightly ambiguous, in that it is a valid token in newer versions of Verilog, so a parser that supports it will not try to decompose it any further. Older parsers may well not know about it, and so would split it after the second ampersand into && and & - a logical AND followed by a unary AND.
So your four ampersands may decode into: a, &&, &&, and b on older parsers that didn't know about the triple ampersand operator. This would result in a syntax error because you can't have two logical ANDs with no other token between.
If we look at the case:
a ||| b
The sanity checks will decompose this into || and |, because ||| is not a valid token anywhere AFAIK, so it stops after || and moves on to the next.
For this weirder example:
a &&~| b
After && the next character ~ would result in an invalid token, so it stops and starts a new token, resulting in the decomposition into tokens: a, &&, ~|, and b. A logical AND followed by unary NOR.
The key thing is the tokeniser will not split until it either reaches whitespace or an invalid character for the current token, so you will not see for example && ever being split because the double ampersand is a valid token - unless you use whitespace to force it to happen.
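To make the examples above concrete, here is a small Python sketch of that rule: keep extending the current operator while the result is still a valid token. The two operator sets are assumptions for illustration (a reduced subset, with OLD_OPS standing in for a tool that does not know &&&), not the complete Verilog operator list.

```python
# Assumed operator subset for a tool that does not know the triple ampersand.
OLD_OPS = {"&", "&&", "|", "||", "~", "~&", "~|", "=", ";"}
# The same subset plus "&&&", standing in for a newer tool.
NEW_OPS = OLD_OPS | {"&&&"}

def tokenize(text, ops):
    """Greedy splitter: a token ends at whitespace or at the first
    character that would make the current token invalid."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # whitespace forces a token boundary and is dropped
            continue
        if ch.isalnum() or ch == "_":
            # Name or number: run until a character that cannot be part of it.
            j = i + 1
            while j < len(text) and (text[j].isalnum() or text[j] in "_$"):
                j += 1
        elif ch in ops:
            # Operator: extend while the longer string is still a known operator.
            j = i + 1
            while j < len(text) and text[i:j + 1] in ops:
                j += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
        tokens.append(text[i:j])
        i = j
    return tokens

print(tokenize("a &&&& b", OLD_OPS))  # ['a', '&&', '&&', 'b']
print(tokenize("a &&&& b", NEW_OPS))  # ['a', '&&&', '&', 'b']
print(tokenize("a ||| b", OLD_OPS))   # ['a', '||', '|', 'b']
print(tokenize("a &&~| b", OLD_OPS))  # ['a', '&&', '~|', 'b']
```

Note that && is only ever split when whitespace sits between the two ampersands, which matches the behaviour described above.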
Another example based on your reply in the comments:
b = &&a;
The token parsing follows exactly the same rules as above. Run forward, splitting whenever you reach either an invalid operator or a character that is not valid for the current token.
You will get four tokens: b, =, && and a. The double && is still not split up, because && is a valid token and there was no white space forcing a new token to begin.
The result is that your synthesis tool issues a syntax error during compilation, because the logical AND (&&) is not a unary operator.
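The "context" step that happens after tokenising can be sketched roughly as follows. This is a simplified Python illustration of the idea, not how any real Verilog parser is written: the function classify and the two operator sets are hypothetical, and it only looks at the tokens of a single expression (the part after the =).

```python
BINARY_ONLY = {"&&", "||"}    # logical operators: never valid as unary
UNARY_OR_BINARY = {"&", "|"}  # reduction when unary, bitwise when binary

def classify(tokens):
    """Label each token of an expression as operand, unary or binary."""
    roles = []
    expect_operand = True  # an expression has to start with an operand
    for tok in tokens:
        if tok in BINARY_ONLY:
            if expect_operand:
                raise SyntaxError(f"{tok} cannot be used as a unary operator")
            roles.append((tok, "binary"))
            expect_operand = True
        elif tok in UNARY_OR_BINARY:
            if expect_operand:
                # An operator where an operand is expected must be unary.
                roles.append((tok, "unary"))
            else:
                roles.append((tok, "binary"))
                expect_operand = True
        else:
            if not expect_operand:
                raise SyntaxError(f"missing operator before {tok}")
            roles.append((tok, "operand"))
            expect_operand = False
    return roles

print(classify(["a", "&", "&", "b"]))
# [('a', 'operand'), ('&', 'binary'), ('&', 'unary'), ('b', 'operand')]
```

With this sketch, classify(["&&", "a"]) raises a SyntaxError, which mirrors the b = &&a; case, and classify(["a", "&&", "&&", "b"]) fails for the same reason at the second &&.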