Here's my question. Given this input, for example:

echo 123\<  abc\\\ efg

The output should be

123< abc\ efg

The regex in my lex file is

[^\n ]*[\\]+[^\n]

If I use this regex, my output is going to be

 123< abc\  efg

which is wrong. Can anybody tell me how to match an escaped space (a backslash followed by a space) and a regular space separately?

Thanks!

  • Note that there is a difference between echo 123\< abc\\\ efg and echo "123\< abc\\\ efg"; in the first, bash has a chance to process the escaped characters in the string before displaying them. So the question is, how are you calling your lexer in order to produce 123< abc\ efg? Commented Mar 16, 2013 at 18:35
  • I redirected 123\< abc\\\ efg into a file, so bash would not process the backslashes. Anyway, that's not the point. If I use my regex to match the string, an escaped space and a regular space are treated the same. I don't know how to write the regex to match the two situations. Commented Mar 16, 2013 at 19:45

2 Answers

I believe that what you are looking for is a flex regular expression which will match a single shell token which does not contain quotes or other such complications.

Note that the characters which automatically terminate tokens are the following: ();<>&| and whitespace. (The bash manual says space and tab, but I'm pretty sure that newlines also separate words.)

Such a regular expression is possible, but (imho) it is of little use, partly because it doesn't take quoting into account (or bracketing: a$(echo foo)b is a single word), and partly because the resulting word needs to be rescanned for escape characters. But whatever. Here's a sample flex regex:

([^();<>&|\\[:space:]]|\\(.|\n))+

That matches any number of consecutive instances of:

  • anything other than a metacharacter or an escape character, or
  • an escape character followed by any single character, or
  • an escape character followed by a newline.
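
To see how that pattern behaves on the input from the question, here is a minimal sketch in Python rather than flex. It assumes that Python's \s is an acceptable stand-in for flex's [:space:], and the names WORD, tokenize and unescape are made up for this illustration. The second function is the "rescan" pass mentioned above; dropping backslash-newline pairs entirely is my assumption about the desired behaviour, modelled on bash's line continuation.

```python
import re

# Python translation of the flex pattern ([^();<>&|\\[:space:]]|\\(.|\n))+
# \s stands in for [:space:]; re.DOTALL lets "." also cover the newline case.
WORD = re.compile(r'(?:[^();<>&|\\\s]|\\.)+', re.DOTALL)

def tokenize(line):
    """Return the raw words, with the backslashes still in place."""
    return WORD.findall(line)

def unescape(word):
    """Second pass: drop each escaping backslash and keep the escaped
    character; a backslash-newline pair is removed entirely."""
    def repl(match):
        ch = match.group(1)
        return '' if ch == '\n' else ch
    return re.sub(r'\\(.)', repl, word, flags=re.DOTALL)

# The input from the question, as it would be stored in a file.
raw = tokenize(r'123\<  abc\\\ efg')
words = [unescape(w) for w in raw]
print(raw)
print(words)
```

The escaped space stays inside the second word because the backslash and the character after it are consumed together, so the space is never seen as a separator.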

Your regex is correct. When you type at the prompt

echo 123\<  abc\\\ efg

the following happens:

  1. bash replaces \< with < (without the backslash, bash would treat < as an input redirection operator).

  2. bash replaces \\ with a single literal \

  3. bash replaces \ followed by a space with a single literal space.

  4. bash calls the echo command, passing it 2 arguments: 123< and abc\ efg.

  5. echo produces the output 123< abc\ efg, a single string with a single space separating its two arguments.
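
The steps above can be roughly reproduced with Python's shlex module, which splits a string using POSIX-shell-like rules. This is only an approximation of bash (shlex.split does not treat < as an operator at all, for instance), and the variable names are chosen just for this sketch.

```python
import shlex

# The command line exactly as typed at the prompt.
line = r'echo 123\<  abc\\\ efg'

# Steps 1-4: backslash escapes are resolved and the line is split into words.
argv = shlex.split(line)

# Step 5: echo joins its arguments with a single space.
output = ' '.join(argv[1:])

print(argv)
print(output)
```

Note that the two spaces between 123\< and abc in the input collapse into one in the output, because unescaped whitespace only separates arguments.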

Based on your regular expression, it looks like the string output in my step 5 above is what is stored in your file. From those 13 bytes, your lexer would find 3 valid tokens: 123<, abc\, and efg. If it prints them to standard output as a single string with a space separating each token, you would see 123< abc\ efg. (There should be two spaces following that backslash; I can't seem to get multiple spaces to display.)

1 Comment

Yes. The lexer will separate 123\< abc\\\ efg into three tokens: 123<, abc\ , and efg. I don't know how to make the lexer split a string like this into two tokens, 123< and abc\ efg. Do you know how to deal with this?
