Thursday, 11 December 2008

Html/Url/Javascript Encoding made easy

When developing a web application, one of the key security issues to remember is applying the correct encoding to any text that is written to the page, whether it comes from a resources file, a content management system or user-generated input. Without this, two problems generally arise. Firstly, external content can produce invalid markup that, at its worst, breaks the layout of your page or prevents JavaScript from executing. For example, if you forget to escape the single quote in a word such as "won't" when it is written into a single-quoted JavaScript string, the quote ends the string early. That sounds pretty bad, right? Your JavaScript will throw a syntax error and not run.

 

The second problem is much more sinister, however. By exploiting unencoded output, a malicious attacker can post content to your site that will in turn be viewed and executed by unsuspecting users. There are two prominent forms of attack that developers need to be aware of - XSS (cross-site scripting) and XSRF or CSRF (cross-site request forgery). To summarise very briefly, XSS involves the attacker injecting script that is aimed at stealing a user's details for your website or otherwise manipulating the user's browser. XSRF has been getting a bit of media attention lately. Strictly speaking it is a separate class of attack, but in the context of encoding it is closely related to XSS: the attacker uses content injected into your website to make the user's browser send requests to another site that the user may be a registered member of. In this sense your site becomes a vehicle for malicious attacks, and in much the same way that an open SMTP relay will get your mail server blacklisted, you may start to find your site flagged as potentially dangerous. Not good!

 

Anyway, I'm not going to dwell on these topics as there is plenty of good information on the net already - although the statistics on the number of vulnerable sites and the time taken to patch them aren't great. The sites most at risk are the ones that allow users to submit information that other readers can view - which is pretty much all the good ones!

 

So we need an easy way to prevent this, and the first level of protection is to encode all user-supplied text when it is written to the page. Unfortunately ASP.NET is incredibly inconsistent in the way that text is encoded. In general my team and I stay away from ASP.NET web controls (such as Label, LinkButton etc.), and instead favour clean HTML with the ASP.NET HtmlControls namespace (any standard tag with runat="server") and the odd <asp:Literal /> control. This works well and produces markup that is very similar to the final output, which always helps when comparing rendered HTML to an ASPX page, or writing some CSS.

 

This takes us back to manual encoding (or using the InnerText property of an HtmlControl). I was looking at the String.Format method and realised that a good way to implement encoding would be with an implementation of IFormatProvider for each of the encoding types. IFormatProvider supplies an object that controls how a value is formatted to a string - the standard implementations are NumberFormatInfo, DateTimeFormatInfo and CultureInfo. You can build your own by also implementing ICustomFormatter, which requires only one method, Format(). The provider's GetFormat() method then returns that formatter when String.Format asks for an ICustomFormatter.
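
To make the mechanism concrete, here is a minimal sketch of such a provider. The class name HtmlEncodingFormatter and the Demo helper are hypothetical, and System.Net.WebUtility.HtmlEncode is only a stand-in for the encoding library used in this post, so treat it as an illustration rather than the actual implementation.

```csharp
using System;
using System.Net;

// Hypothetical class name; the post does not show the real implementation.
public sealed class HtmlEncodingFormatter : IFormatProvider, ICustomFormatter
{
    // String.Format asks the provider for an ICustomFormatter; hand back this object.
    public object GetFormat(Type formatType)
    {
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    // Called once for each placeholder, with the argument that fills it.
    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        if (arg == null) return string.Empty;
        // Assumption: WebUtility.HtmlEncode stands in for the AntiXSS encoder.
        return WebUtility.HtmlEncode(arg.ToString());
    }
}

public static class Demo
{
    // Only the argument is encoded; the markup in the format string is left alone.
    public static string Render(string userInput)
    {
        return string.Format(new HtmlEncodingFormatter(), "<p>{0}</p>", userInput);
    }
}
```

Calling Demo.Render("<script>") should give "<p>&lt;script&gt;</p>": the template stays as markup while the inserted value is encoded.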

 

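My basic implementation makes use of a modified version of the AntiXSS 2.0 library (I have added extra Unicode characters that are safe). After getting it all up and running it is as simple as String.Format(EncodingInfo.JavascriptEncoder, "function() {{alert('{0}')}}", myString) - and voila - all your text is encoded. Note that the literal braces of the JavaScript function body have to be doubled ({{ and }}), otherwise String.Format treats them as placeholders and throws a FormatException.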
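
A rough sketch of how the EncodingInfo class used above could be put together is shown below. EncodingInfo, HtmlEncoder and JavascriptEncoder are the names from this post, but the EncodingFormatter class and the choice of WebUtility.HtmlEncode and HttpUtility.JavaScriptStringEncode (which needs System.Web on the .NET Framework) are assumptions standing in for the modified AntiXSS library.

```csharp
using System;
using System.Net;
using System.Web;

// Hypothetical helper: applies one encoding function to every format argument.
public sealed class EncodingFormatter : IFormatProvider, ICustomFormatter
{
    private readonly Func<string, string> encode;

    public EncodingFormatter(Func<string, string> encode)
    {
        this.encode = encode;
    }

    public object GetFormat(Type formatType)
    {
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        return arg == null ? string.Empty : encode(arg.ToString());
    }
}

public static class EncodingInfo
{
    // Assumption: framework encoders stand in for the AntiXSS calls.
    public static readonly EncodingFormatter HtmlEncoder =
        new EncodingFormatter(s => WebUtility.HtmlEncode(s));

    public static readonly EncodingFormatter JavascriptEncoder =
        new EncodingFormatter(s => HttpUtility.JavaScriptStringEncode(s));
}

public static class ScriptDemo
{
    public static string Alert(string message)
    {
        // {{ and }} are literal braces; {0} is the encoded argument.
        return string.Format(EncodingInfo.JavascriptEncoder,
            "function() {{ alert('{0}'); }}", message);
    }
}
```

With this in place ScriptDemo.Alert("won't") no longer emits a bare quote inside the JavaScript string, so the apostrophe cannot terminate it early.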

 

There is only one problem, however: String.ToString(IFormatProvider) is a no-op! It simply returns the same string and ignores the provider. This makes sense really when you think about it - why would you need to format a string to a string? Encoding text is the only case I can think of, so this is fair enough, but it would have been great to write "unsafe <script> string".ToString(EncodingInfo.HtmlEncoder)! So I’m left with probably creating an extension method for that case - along the lines of "unsafe <script> string".Encode(EncodingInfo.HtmlEncoder) - which is not too bad either.
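
One possible shape for that extension method is sketched below. The Encode name comes from the idea above; the StringEncodingExtensions and HtmlEncodingFormatter classes are hypothetical, and WebUtility.HtmlEncode again stands in for the real encoder.

```csharp
using System;
using System.Net;

// Hypothetical provider, included only so the sketch is self-contained.
public sealed class HtmlEncodingFormatter : IFormatProvider, ICustomFormatter
{
    public object GetFormat(Type formatType)
    {
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        // Assumption: WebUtility.HtmlEncode stands in for the AntiXSS encoder.
        return arg == null ? string.Empty : WebUtility.HtmlEncode(arg.ToString());
    }
}

public static class StringEncodingExtensions
{
    // Runs the string through the provider's ICustomFormatter, if it has one.
    public static string Encode(this string value, IFormatProvider provider)
    {
        if (provider == null) throw new ArgumentNullException(nameof(provider));

        var formatter = provider.GetFormat(typeof(ICustomFormatter)) as ICustomFormatter;

        // Providers without a custom formatter (CultureInfo, for example)
        // leave the text unchanged.
        return formatter == null ? value : formatter.Format(null, value, provider);
    }
}
```

Usage is then "unsafe <script> string".Encode(new HtmlEncodingFormatter()), which should return "unsafe &lt;script&gt; string", whereas calling ToString with the same provider returns the text untouched.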
