
I was trying to perform a reflected XSS attack on a tutorial website. The webpage consists of a form with an input field and a submit button. On submitting the form, the contents of the input field are displayed on the same webpage.

I figured out that the website blacklists the script tag and some JavaScript methods in order to prevent an XSS attack. So I decided to encode my input and then tried submitting the form. I tried two different inputs; one of them worked and the other didn't.

When I tried:

<body onload="&#97lert('Hi')"></body>

It worked and an alert box was displayed. However, when I also encoded some characters of the HTML tag itself, like this:

&#60body onload="&#97lert('Hi')"&#62&#60/body&#62

It didn't work! It simply printed <body onload="alert('Hi')"></body> as plain text on the webpage!

I know that browsers execute inline JavaScript as they parse an HTML document (please correct me if I'm wrong). But I can't understand why the browser behaved differently for the two inputs I've mentioned.

-------------------------------------------------------------Edit---------------------------------------------------------

I tried the same with a more basic XSS tutorial that has no XSS protection. Again:

<script>alert("Hi")</script> -> Worked!

&#60s&#99ript&#62&#97lert("Hi")&#60/s&#99ript&#62 -> Didn't work! (Got printed as a string on the web page)

So basically, if I encode only characters of the JavaScript code, it works. But if I encode any part of the HTML tag itself, the JavaScript within that HTML is not executed!

  • "I know that the browsers execute inline JavaScript as they parse an HTML document": that is correct, but what you have isn't inline JavaScript; it's an onload event handler. <script>alert("foobar!")</script> would be inline JavaScript. Attribute values do get converted to strings with HTML entities replaced by the actual characters, which is why your alert on page load works. Commented May 14, 2014 at 14:10
  • -1?! Too broad?! Seriously?! I don't think the question is too broad. I just need an answer with-respect to the case mentioned in the question. I'm not asking for the complete rendering process! Commented May 14, 2014 at 15:15

2 Answers

I can't come up with the words to describe this properly, so I'll just give you an example. Let's say we have this string:

<div>Hello World! &lt;span id="foo"&gt;Foobar&lt;/span&gt;</div>

When this gets parsed, you end up with a div element that contains the text:

Hello World! <span id="foo">Foobar</span>

Note that while there is something that looks like HTML inside the text, it is still just text, not HTML. For that text to become HTML, it would have to be parsed again.

Attributes work a little differently: HTML entities inside attribute values do get decoded the first time.
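
The difference between the two inputs from the question can be sketched outside the browser. The snippet below is only an illustration: it uses Python's standard html.parser as a stand-in for a browser's HTML parser (an assumption, since the two are not identical), and the names EventLogger and tokenize are made up for this example. It records which start tags and which text the parser sees for each input.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: records start tags (with attributes) and text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []  # (tag_name, {attribute: value}) for every start tag
        self.text = []  # chunks of character data

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text.append(data)


def tokenize(markup):
    """Parse markup once and return (start_tags, text)."""
    parser = EventLogger()
    parser.feed(markup)
    parser.close()  # flush any text the parser is still buffering
    return parser.tags, "".join(parser.text)


# Literal tag; a character reference appears only inside the attribute value.
real_tags, real_text = tokenize("<body onload=\"&#97lert('Hi')\"></body>")

# The tag delimiters themselves are written as character references.
fake_tags, fake_text = tokenize("&#60body onload=\"&#97lert('Hi')\"&#62&#60/body&#62")

print(real_tags)  # a body start tag whose onload value has been decoded
print(fake_tags)  # no tags at all
print(fake_text)  # decoded text that merely looks like markup
```

In the first case the parser finds a real tag and hands back the decoded attribute value alert('Hi'). In the second case no tag is ever opened, so the onload part is just characters in a text node.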

tl;dr:

If the service you are using strips out tags, there's nothing you can do about it unless the script is poorly written in a way that results in the string getting parsed twice.

Demo: http://jsfiddle.net/W6UhU/ Note how, after setting the div's inner HTML equal to its inner text, the span becomes an HTML element rather than a string.
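
The "parsed twice" case can be sketched the same way. This is a rough illustration under the same assumption as before (Python's html.parser standing in for the browser); TagCollector and start_tags are hypothetical names, and html.unescape plays the role of whatever step turns the displayed text back into markup, such as assigning inner text to inner HTML in the demo.

```python
import html
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: collects every start tag the parser recognizes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def start_tags(markup):
    """Parse markup once and return the start tags that were found."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


payload = "&#60body onload=\"&#97lert('Hi')\"&#62&#60/body&#62"

# First parse: the encoded delimiters are only text, so no tag is created.
first_pass = start_tags(payload)

# The text content after decoding, i.e. what the page displays.
decoded_once = html.unescape(payload)

# Second parse of that decoded text: now it really is markup.
second_pass = start_tags(decoded_once)

print(first_pass)
print(decoded_once)
print(second_pass)
```

Only the second parse produces a body tag with an onload attribute, which is why an encoded payload is harmless unless the application decodes it and feeds the result to the parser again.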

5 Comments

I understand that. But is there any difference in the way the browser parses an HTML tag and an encoded HTML tag? Because the alert in the non-encoded HTML body tag is executed, but the one in the encoded tag is not!
The encoded HTML tag is left as is, as you have already found out by performing said test. Attributes of HTML tags get parsed; strings that are not HTML tags do not.
Okay. I think I got it this time. Since the tag is not encoded, its attribute is parsed, and this executes the alert method. In the other case, since the tag is encoded, the browser leaves the tag as it is, and because of this the attribute is not parsed. Please correct me if I'm wrong.
That is correct. The attribute isn't an attribute if the tag isn't a tag.
Thanks! Let me wait a while longer for more explanations from others. I'll accept your answer if I can't find a better one.

When an HTML page contains &#60body, the browser treats it the same as if it contained &lt;body.

That is, it just displays the decoded characters as text and doesn't parse them as HTML. So you're not creating a new tag with an onload attribute: http://jsfiddle.net/SSfNw/1/

alert(document.body.innerHTML);
// When an HTML page says &lt;body It treats it the same as if it said &lt;body  

So in your case, you're never creating a body tag, just text content that ends up inside the existing body tag: http://jsfiddle.net/SSfNw/2/

alert(document.body.innerHTML)
// &lt;body onload="alert('Hi')"&gt;&lt;/body&gt;  

In the case of <body onload="&#97lert('Hi')"></body>, the parser is able to create the body tag and, once inside the tag, the onload attribute. Inside the attribute value, character references are decoded, so the value becomes the plain string alert('Hi'), and that string is what runs as the event handler.

1 Comment

So the problem is that the encoded HTML characters are not being parsed as markup? However, if I encode the alert method, the browser is still able to detect it as JavaScript. That's what I find a bit confusing.
