
I was having a look at the CSS syntax here and here and I was amazed to see both the token productions and the grammar littered with whitespace declarations. Normally whitespace is defined once in the lexer and skipped, never to be seen again. Ditto comments.

I imagine the orientation towards user-agents rather than true compilers is part of the motivation here, and also the requirement to proceed in the face of errors, but it still seems pretty odd.

Are real-life UAs that parse CSS really implemented according to these grammars?

EDIT: the reason for the question is actually the various LESS implementations. less.js doesn't understand consecutive comments, and lessc.exe doesn't understand comments inside selectors. In this respect they are not even able to parse CSS correctly, however that is defined. So I went to see what the actual grammar of CSS was and ...

  • You know how IE has strange bugs with whitespace and comments? Yeah. Commented Aug 17, 2011 at 7:15
  • Oh and I removed the [programming-languages] tag because CSS isn't one. But, never mind, I'll let you keep it. Commented Aug 17, 2011 at 7:17
  • @BoltClock that seems pretty pedantic to me and I am a compiler writer, pedantic by nature and training. CSS is a language used in association with computers, and it has many of the things that programming languages have, including in this case a formal grammar which can be implemented via the same tools that are used for 'real' programming languages. Which is the whole point of the question actually. Commented Aug 18, 2011 at 11:28

1 Answer


CSS, while similar to many programming languages, does have some rare instances where whitespace can be important.


Say we have the following base markup:

<html>
    <head>
        <style type="text/css">
            .blueborder { width:200px; height:200px; border: solid 2px #00f; }
            .redborder  { width:100px; height:100px; margin:50px; border: solid 2px #f00; }
        </style>
    </head>

    <body>
        <div class="blueborder">
            <div class="redborder"></div>
        </div>
    </body>

</html>

There's nothing special here: just a div inside a div, with some styles on each so that you can tell them apart.

Now let's add another class and an ID to the outer div:

<div class="blueborder MyClass" id="MyDiv">
    <div class="redborder"></div>
</div>

If I want to give a background to the outer div in the following manner:

.MyClass#MyDiv { background: #ccc; }

...then the whitespace becomes important. The rule above does style the outer div, because it matches a single element that has both the class MyClass and the ID MyDiv. It is parsed differently from the following:

.MyClass #MyDiv { background: #ccc; }

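...which does NOT style the outer div, because the space is a descendant combinator: this rule looks for an element with the ID MyDiv nested somewhere inside an element with the class MyClass.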
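To make the difference concrete, here is a minimal Python sketch that models just these two rules. It is not how any real browser implements selector matching: the Element class and the matches_compound / matches_descendant functions are hypothetical names for illustration, and the tree is simplified to the two divs, leaving out body and html.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical, minimal element model: classes, an optional ID, a parent link."""
    classes: set[str]
    element_id: Optional[str] = None
    parent: Optional["Element"] = None


def matches_compound(el: Element, cls: str, el_id: str) -> bool:
    """.cls#id with no whitespace: ONE element must carry both the class and the ID."""
    return cls in el.classes and el.element_id == el_id


def matches_descendant(el: Element, cls: str, el_id: str) -> bool:
    """.cls #id with whitespace: the element has the ID and SOME ancestor has the class."""
    if el.element_id != el_id:
        return False
    ancestor = el.parent
    while ancestor is not None:
        if cls in ancestor.classes:
            return True
        ancestor = ancestor.parent
    return False


# The markup from the answer: outer div with two classes and an ID, inner div inside it.
outer = Element(classes={"blueborder", "MyClass"}, element_id="MyDiv")
inner = Element(classes={"redborder"}, parent=outer)

for name, el in [("outer", outer), ("inner", inner)]:
    print(name,
          matches_compound(el, "MyClass", "MyDiv"),    # .MyClass#MyDiv
          matches_descendant(el, "MyClass", "MyDiv"))  # .MyClass #MyDiv
# prints:
# outer True False
# inner False False
```

Only the outer div matches the compound selector, and neither div matches the descendant selector, since no element with the ID MyDiv sits inside an element with the class MyClass.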

To see how these are parsed differently, you can look at how the selectors are tokenized:

Example1:

.MyClass#MyDiv -> DELIM IDENT HASH

Example2:

.MyClass #MyDiv -> DELIM IDENT S HASH

If we were to blindly ignore whitespace (as compilers usually do), we would miss this difference.
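The tokenization above can be reproduced with a small Python sketch. It only approximates the CSS 2.1 core tokenizer: the rules in TOKEN_RULES are hypothetical simplifications covering just S, HASH, IDENT and single-character DELIM tokens, and tokenize() with its skip_whitespace flag is an illustrative helper, not part of any real CSS library. Setting skip_whitespace=True mimics the usual compiler habit of discarding whitespace in the lexer.

```python
import re

# Hypothetical, heavily simplified token rules loosely modeled on the CSS 2.1
# core tokenizer. Alternatives are tried in order at each position.
TOKEN_RULES = [
    ("S", r"[ \t\r\n\f]+"),            # whitespace
    ("HASH", r"#[-\w]+"),              # '#' followed by name characters
    ("IDENT", r"-?[_a-zA-Z][-\w]*"),   # identifier
    ("DELIM", r"."),                   # any other single character, e.g. '.'
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(selector: str, skip_whitespace: bool = False) -> list[str]:
    """Return token type names; optionally drop S tokens as many compiler lexers do."""
    kinds = []
    for match in TOKEN_RE.finditer(selector):
        kind = match.lastgroup  # name of the rule that matched
        if kind == "S" and skip_whitespace:
            continue
        kinds.append(kind)
    return kinds


print(tokenize(".MyClass#MyDiv"))                         # ['DELIM', 'IDENT', 'HASH']
print(tokenize(".MyClass #MyDiv"))                        # ['DELIM', 'IDENT', 'S', 'HASH']
print(tokenize(".MyClass #MyDiv", skip_whitespace=True))  # ['DELIM', 'IDENT', 'HASH']
```

With whitespace skipped, both selectors produce the same token stream, so the S token that signals the descendant combinator is exactly the information a whitespace-blind lexer throws away.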


With that being said, I am not implying that this grammar is good. I also have a fair amount of experience writing grammars, and I cringe when looking at this one. The easiest solution would have been to fold the # and . symbols into the IDENT token, and then everything else would have become much easier.

However, they did not choose to do this, and the need for whitespace is an artifact of that decision.
