shell and regex matching spaces

Question

Here's my question:

For example, given:

echo 123\<  abc\\\ efg

The output should be

123< abc\ efg

My regex in the lex file is

[^\n ]*[\\]+[^\n]

If I use this regex, my output is going to be

 123< abc\  efg

which is wrong. Can anybody tell me how to match \(space) and a regular (space) separately?

Thanks!

Note that there is a difference between echo 123\< abc\\\ efg and echo "123\< abc\\\ efg"; in the first, bash has a chance to process the escaped characters in the string before displaying them. So the question is, how are you calling your lexer in order to produce 123< abc\ efg?
– chepner, Commented Mar 16, 2013 at 18:35
I redirected 123\< abc\\\ efg into a file, so bash would not process the backslashes. Anyway, that's not the point. If I use my regex to match the string, \(space) and a regular space are treated the same. I don't know how to write the regex to distinguish the two situations.
– Lamian, Commented Mar 16, 2013 at 19:45

rici · Accepted Answer · 2013-03-16 22:32:49Z

I believe that what you are looking for is a flex regular expression which will match a single shell token which does not contain quotes or other such complications.

Note that the characters which automatically terminate tokens are the following: ();<>&| and whitespace. (The bash manual says space and tab, but I'm pretty sure that newline also separates words.)

Such a regular expression is possible, but (imho) it is of little use, partly because it doesn't take quoting into account (or bracketing: a$(echo foo)b is a single word), and partly because the resulting word needs to be rescanned for escape characters. But whatever. Here's a sample flex regex:

([^();<>&|\\[:space:]]|\\(.|\n))+

That matches one or more consecutive instances of:

anything other than a metacharacter or an escape character, or
an escape character followed by any single character, or
an escape character followed by a newline.
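
To see how this pattern behaves on the string from the question, here is a rough Python sketch. It is a translation rather than flex itself: \s stands in for [:space:], and the names WORD, ESCAPE, split_words, and unescape are made up for illustration. The second function does the rescan for escape characters mentioned above, assuming shell-like rules where a backslash-newline pair disappears and any other escaped character is kept literally.

```python
import re

# Approximate Python translation of the flex pattern above.
# Assumption: \s is used in place of flex's [:space:].
WORD = re.compile(r"(?:[^();<>&|\\\s]|\\(?:.|\n))+")

# A backslash followed by any single character, including a newline.
ESCAPE = re.compile(r"\\(.|\n)")


def split_words(text):
    """Return the raw words, with their backslashes still in place."""
    return WORD.findall(text)


def unescape(word):
    """Rescan one word: drop backslash-newline, keep other escaped characters."""
    return ESCAPE.sub(lambda m: "" if m.group(1) == "\n" else m.group(1), word)


raw = r"123\<  abc\\\ efg"  # the bytes stored in the file
words = split_words(raw)
print(words)
print([unescape(w) for w in words])
```

With the input from the question this yields two raw words, 123\< and abc\\\ efg, which unescape to 123< and abc\ efg.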

chepner · Accepted Answer · 2013-03-16 20:50:25Z

Your regex is correct. When you type at the prompt

echo 123\<  abc\\\ efg

the following happens:

1. bash replaces \< with < (without the backslash, bash would treat < as an input redirection operator).
2. bash replaces \\ with a single literal \.
3. bash replaces \(space) with a single literal space.
4. bash calls the echo command, passing it two arguments: 123< and abc\ efg.
5. echo produces the output 123< abc\ efg, a single string with a single space separating its two arguments.
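
These steps can be approximated outside of bash with Python's shlex module, which splits a string using POSIX-shell-like escaping rules. This is only a sketch: shlex is not bash, and it knows nothing about redirection operators, which does not matter here because the < is escaped. The variable names are just for illustration.

```python
import shlex

# The command line as typed at the prompt, minus the leading "echo".
typed = r"123\<  abc\\\ efg"

# shlex.split applies POSIX-style backslash handling and whitespace splitting;
# it approximates the replacement and splitting steps, it is not bash itself.
args = shlex.split(typed)

# echo joins its arguments with single spaces (the last step).
output = " ".join(args)
print(args)
print(output)
```

The two arguments come out as 123< and abc\ efg, and the joined string is the 13-character line that echo would print.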

Based on your regular expression, it looks like the string output in my step 5 above is what is stored in your file. From those 13 bytes, it would find 3 valid tokens: 123<, abc\, and efg. If it prints them to standard output as a single string with a space separating each token, you would see 123< abc\ efg. (There should be two spaces following that backslash; I can't seem to get multiple spaces to display.)

edited Mar 16, 2013 at 20:50

answered Mar 16, 2013 at 20:42


Lamian Over a year ago

Yes. The lexer will separate 123\< abc\\\ efg into three tokens, which are 123<, abc\ , and efg. I don't know how to make the lexer separate a string like this into two tokens, 123< and abc\ efg. Do you know how to deal with this?
