35

I see the following HTML code used a lot to load jQuery from a content delivery network, but fall back to a local copy if the CDN is unavailable (e.g. in the Modernizr docs):

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.js"></script>
<script>window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>')</script>

My question is, why is the last < character in the document.write() statement replaced with the escape sequence \x3C? < is a safe character in JavaScript and is even used earlier in the same string, so why escape it there? Is it just to prevent bad browser implementations from thinking the </script> inside the string is the real script end tag? If so are there really any browsers out there that would fail on this?

As a follow-on question, I've also seen a variant using unescape() (as given in this answer) in the wild a couple of times too. Is there a reason why that version always seems to substitute all the < and > characters?

1
  • Would this be true from within a script file or eval'd block? Commented Apr 10, 2015 at 20:13

2 Answers 2

69

When the browser sees </script>, it considers this to be the end of the script block (since the HTML parser has no idea about JavaScript, it can't distinguish between something that just appears in a string, and something that's actually meant to end the script element). So </script> appearing literally in JavaScript that's inside an HTML page will (in the best case) cause errors, and (in the worst case) be a huge security hole.

That's why you somehow have to prevent this sequence of characters to appear. Other common workarounds for this issue are "<"+"/script>" and "<\/script>" (they all come down to the same thing).

While some consider this to be a "bug", it actually has to happen this way, since, as per the specification, the HTML part of the user agent is completely separate from the scripting engine. You can put all kinds of things into <script> tags, not just JavaScript. The W3C mentions VBScript and TCL as examples. Another example is the jQuery template plugin, which uses those tags as well.

But even within JavaScript, where you could suggest that such content in strings could be recognized and thus not be treated as ending tags, the next ambiguity comes up when you consider comments:

<script type="text/javascript">foo(42); // call the function </script>

– what should the browser do in this case?

And finally, what about browsers that don't even know JavaScript? They would just ignore the part between <script> and </script>, but if you gave different semantics to the character sequence </script> based on the browsers knowledge of JavaScript, you'd suddenly have two different results in the HTML parsing stage.

Lastly, regarding your question about substituting all angle brackets: I'd say at least in 99% of the cases, that's for obfuscation, i.e. to hide (from anti-virus software, censoring proxies (like in your example (nested parens are awesome)), etc.) the fact that your JavaScript is doing some HTML-y stuff. I can't think of good technical reasons to hide anything but </script>, at least not for reasonably modern browsers (and by that, I mean pretty much anything newer than Mosaic).

Sign up to request clarification or add additional context in comments.

8 Comments

I just amended the question to that effect, probably as you were typing your answer. Are there really any modern browsers that would fail on this?
Most, I think. I know for sure that Chrome interprets it as a closing tag. (Checked yesterday)
Would this happen with a CDATA section wrapping the script?
@Mark — All browsers would "fail" on that, because </script> is supposed to be treated as an end tag.
@Marcin — Not if the document was served as application/xhtml+xml and parsed as XML, and not if the document was parsed using a real SGML parser … but it would with a tag soup or HTML 5 parser.
|
2

Some parsers handle the < version as the closing tag and interpret the code as

<script>
  window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">
</script>

\x3C is hexadecimal for <. Those are interchangable within the script.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.