My colleagues and I have been debating how best to protect ourselves from XSS attacks while still preserving the HTML characters that users enter into fields in our software.
To me, the ideal solution is to accept the data exactly as the user enters it (turning off ASP.NET request validation) and store it in the database unchanged. Then, whenever you display the data on the web, HTML-encode it. The problem with this approach is that, sooner or later, some developer is likely to forget to HTML-encode a value somewhere it is displayed. Bam! XSS vulnerability.
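To make the first approach concrete, here is a minimal sketch of "store raw, encode on output". It is written in Python purely for illustration, not as our actual ASP.NET code; `render_comment` is a hypothetical helper, and in ASP.NET the encoding step would be something like `HttpUtility.HtmlEncode`.

```python
import html

def render_comment(raw: str) -> str:
    """Hypothetical display helper: encode at output time only.

    The stored value is never modified; encoding happens at the
    moment the value is placed into HTML.
    """
    return "<p>" + html.escape(raw, quote=True) + "</p>"

# The value is kept in the database exactly as the user typed it.
stored = '<script>alert("xss")</script>'

# Encoded on the way out, the browser shows the text instead of running it.
rendered = render_comment(stored)
print(rendered)
```

The weakness is the one described above: the protection only exists on pages where someone remembered to call the encoding step.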
Another proposed solution was to turn request validation off and use a regex to strip out any HTML users enter before it is stored in the database. Devs would still have to HTML-encode things for display, but since the HTML tags have been stripped, we think it would be safe even if a dev forgets. The drawback is that users can't enter HTML tags into descriptions and other fields, even if they explicitly want to, and they may accidentally paste in an email address surrounded by < > that the regex mishandles. It screws with the data, and it's not ideal.
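As a rough illustration of the stripping approach, here is a sketch in Python (again a stand-in, not our ASP.NET code). The pattern `<[^>]*>` and the helper `strip_tags` are assumptions chosen for the example, since the actual regex has not been decided. It shows two things: an email address in angle brackets gets eaten, and a value with no tags at all can still be dangerous if it lands unencoded inside an attribute.

```python
import re

# Assumed naive pattern: anything between "<" and the next ">".
TAG_PATTERN = re.compile(r"<[^>]*>")

def strip_tags(value: str) -> str:
    """Hypothetical input filter: remove anything that looks like a tag."""
    return TAG_PATTERN.sub("", value)

# 1. Legitimate data is damaged: the address looks like a tag and is removed.
email_text = strip_tags("Contact Bob <bob@example.com> today")

# 2. A payload without angle brackets passes through the filter untouched.
payload = 'x" onmouseover="alert(1)'
filtered = strip_tags(payload)

# If a dev then forgets to encode it inside an attribute, markup is injected.
unsafe_html = '<input value="' + filtered + '">'
print(email_text)
print(unsafe_html)
```

So with this kind of filter, stripping tags alone would not make a forgotten encode harmless in every context.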
The other issue we have to keep in mind is that the system was built without ever committing to any one strategy for this. At one point, some devs wrote pages that HTML-encode data before it is stored in the database. So some data in the database is already HTML-encoded and some is not. It's a mess. We can't really trust any data that comes from the database as safe for display in a browser.
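To show why the mixed data is a problem, here is a small Python sketch (illustrative only; the sample values are made up). If we simply start encoding everything on output, rows that a legacy page already encoded come out double-encoded.

```python
import html

# Row saved by a page that stores input as typed.
raw_row = "Tom & <b>Jerry</b>"

# Row saved by a legacy page that encoded before writing to the database.
pre_encoded_row = html.escape(raw_row)

# Encoding both rows uniformly at display time:
shown_raw = html.escape(raw_row)          # correct: browser shows the original text
shown_pre = html.escape(pre_encoded_row)  # double-encoded: browser shows "&amp;" and "&lt;" literally

print(shown_raw)
print(shown_pre)
```

Skipping the encode for those rows is not an option either, since nothing in the stored value tells us which rows were encoded and which were not.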
My question is: what would be the ideal solution if you were building an ASP.NET web app from the ground up, and what would be a good approach for us, given our situation?