21

EDIT: For future reference, I'm using the non-XHTML doctype, <!DOCTYPE html>.

I'm creating a website using Django, and I'm trying to embed arbitrary JSON data in my pages to be used by client-side JavaScript code.

Let's say my JSON object is {"foo": "</script>"}. If I embed this directly,

<script type='text/javascript'>JSON={"foo": "</script>"};</script>

The first </script> ends the script element in the middle of the JSON object. (It also makes the site vulnerable to XSS, since this JSON object is generated dynamically.)

If I use django's HTML escape function, the resulting output is:

<script type='text/javascript'>JSON={&quot;foo&quot;: &quot;&lt;/script&gt;&quot;};</script> 

and the browser cannot interpret the contents of the <script> element, because entity references are not decoded there in regular HTML.

The questions I have are:

  1. Which characters am I supposed to escape / not escape in this situation?
  2. Is there an automated way to perform this in Python / Django?
4
  • You can use entity references (&lt;, &gt;) within <script> only if you are using XHTML. If you're using regular HTML, you can't HTML-escape the script. Instead, follow slebetman's advice and just make sure / is escaped. Commented Nov 14, 2010 at 7:19
  • @yonran, so, escaping only slashes by running string-replacement for / to \/ is good enough? Commented Nov 14, 2010 at 11:21
  • yes, that should be the case. For more information about how browsers parse the script tag, see HTML 5 tokenization: Commented Nov 14, 2010 at 22:27
  • Sorry, I was wrong. Let me clarify. Commented Nov 14, 2010 at 22:35

4 Answers

11

If you were using XHTML, you could use entity references (&lt;, &gt;, &amp;) to escape any string you want within <script>. You would not want to use a <![CDATA[...]]> section, because the sequence "]]>" can't be expressed within a CDATA section, so you would have to change the script to express ]]> some other way.

But you're probably not using XHTML. If you're using regular HTML, the <script> tag acts somewhat like a CDATA section in XML, except that it has even more pitfalls. It ends at the first </script>. There are also arcane rules to allow <!-- document.write("<script>...</script>") --> (the comment and the <script> opening tag must both be present for </script> to be passed through). The compromise that the HTML5 editors adopted for future browsers is described in HTML 5 tokenization and CDATA Escapes.

I think the takeaway is that you must prevent </script> from occurring in your JSON, and to be safe you should also avoid <script>, <!--, and --> to prevent runaway comments or script tags. I think it's easiest just to replace < with \u003c and --> with --\>
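
To address the "automated way in Python" part of the question, here is a minimal sketch of the \u003c replacement using only the standard library. The function name json_for_script is made up for this example, and escaping > and & as well is an extra precaution I'm assuming you want, not something the HTML rules strictly require. It relies on the fact that <, > and & can only appear inside string literals in JSON, where a \uXXXX escape means the same thing.

```python
import json

# Characters that can open or close tags, comments, or entities in HTML.
# Each one is rewritten as the equivalent JSON \uXXXX escape.
_SCRIPT_ESCAPES = {
    "<": "\\u003c",
    ">": "\\u003e",
    "&": "\\u0026",
}


def json_for_script(obj):
    """Serialize obj to JSON that contains no literal <, > or & characters."""
    text = json.dumps(obj)
    for char, escaped in _SCRIPT_ESCAPES.items():
        text = text.replace(char, escaped)
    return text


payload = json_for_script({"foo": "</script>"})
print(payload)
```

The output can be dropped between <script> and </script> as-is, and it still decodes to the original object. Newer Django releases also provide a json_script template filter that applies this kind of escaping for you, so check whether your version has it before rolling your own.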

1 Comment

I'll add that, according to Google's gson library, you need to escape the HTML characters <, >, & and = to make your JSON string safe to embed: google-gson.googlecode.com/svn/trunk/gson/docs/javadocs/…
6

I tried backslash escaping the forward slash and that seems to work:

<script type='text/javascript'>JSON={"foo": "<\/script>"};</script>

Have you tried that?
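
If you want to generate that from Python, something like the following might do. This is just a sketch: json_with_escaped_slashes is a name I made up, and it only takes care of </script>, not the <!-- cases mentioned in the other answer.

```python
import json


def json_with_escaped_slashes(obj):
    # "\/" is a legal JSON escape for "/", so the decoded value is unchanged.
    # Rewriting "</" as "<\/" means "</script>" can never appear literally.
    return json.dumps(obj).replace("</", "<\\/")


snippet = json_with_escaped_slashes({"foo": "</script>"})
print(snippet)
```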


On a side note, I am surprised that the embedded </script> tag in a string breaks the javascript. Couldn't believe it at first but tested in Chrome and Firefox.

3 Comments

An embedded </script> breaking the script is kind of expected (I thought it was strange too), because otherwise JS parsing would have to be done along with the HTML parsing (the HTML parser would have to be aware of the semantics of JavaScript text), which seems very complicated to me.
Yep, HTML parsers as a rule don't speak JavaScript. The contents of the script tags are passed to the interpreter only after the HTML is parsed, and HTML doesn't say anything about tags not being tags when they're between quotation marks!
Yes, that is expected. The usual trick to prevent it is to break up the tag into two: "</scr" + "ipt>"
0

I would do something like this:

<script type='text/javascript'>JSON={"foo": "</" + "script>"};</script>

0

For this case in Python, I have opened a bug in the bug tracker. However, the rules are indeed complicated, as <!-- and <script> play together in quite evil ways even in the adopted HTML5 parsing rules. BTW, "\>" is not a valid JSON escape, so it would be better to replace > with "\u003E". The safest escaping is therefore to write < and > as \u003C and \u003E, AND to escape a couple of other evil characters mentioned in the Python bug...
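
The point about the escape is easy to check with Python's own json module, which is strict about escapes. A quick sketch (the variable names are only for illustration):

```python
import json

# "\>" is not one of JSON's defined escapes, so a strict parser rejects it.
try:
    json.loads('"--\\>"')
    backslash_gt_is_valid = True
except json.JSONDecodeError:
    backslash_gt_is_valid = False

# "\u003E" is an ordinary Unicode escape and decodes to ">".
decoded = json.loads('"--\\u003E"')

print(backslash_gt_is_valid)
print(decoded)
```

A JavaScript string literal is more lenient about \> than a JSON parser is, so the difference only matters if the embedded text also has to be valid JSON.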
