Java – HTML-Entity escaping to prevent XSS

I have some user input. Within my code, I ensure that the following symbols are escaped:

& -> &amp;
< -> &lt; 
> -> &gt;

OWASP states that there are more chars to be escaped.
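For illustration, here is a minimal hand-rolled sketch that escapes the five characters OWASP's XSS prevention guidance lists for HTML content, as far as I know: &, <, >, " and '. The class name HtmlEscaper and the method escapeHtml are hypothetical, made up for this example, and this is not a substitute for a maintained encoder library.

```java
// Hypothetical helper for illustration only; not part of any OWASP library.
final class HtmlEscaper {
    private HtmlEscaper() {}

    /** Escapes &, <, >, " and ' for HTML text and quoted attribute values. */
    static String escapeHtml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break; // numeric form of the single quote
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; &#x27;Jerry&#x27;&lt;/a&gt;
        System.out.println(escapeHtml("<a href=\"x\">Tom & 'Jerry'</a>"));
    }
}
```

This only covers HTML text and quoted attribute values. JavaScript, CSS and URL contexts need their own encoding rules.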

For attributes, I do another kind of escaping:

& -> &amp; 
" -> &quot;

I also make sure that every attribute value is enclosed in double quotes ("). This makes me confident about my HTML attributes, but not about the HTML body itself.

I wonder if my escaping is sufficient. I've read this post, but I'm still not sure about my concern.

(JavaScript is escaped with the OWASP library.)

Best Solution

I use the OWASP ESAPI library as well. To escape strings for the different output contexts, use:

String html = ESAPI.encoder().encodeForHTML("hello < how > are 'you'");
String html_attr = ESAPI.encoder().encodeForHTMLAttribute("hello < how > are 'you'");
String js = ESAPI.encoder().encodeForJavaScript("hello < how > are 'you'");

HTML (assume jsp)

<tag attr="<%= html_attr %>" onclick="alert('<%= js %>')"><%= html %></tag>

Update (2017)

The ESAPI encoders are now considered legacy. A better alternative has been created and is actively maintained, so I strongly recommend using the OWASP Java Encoder instead.

If your project already uses ESAPI, an integration has been added that will allow you to use this library for encoding instead.

The usage is explained on their wiki page, but for the sake of completeness, this is how you can use it to contextually encode your data:

// Requires: import org.owasp.encoder.Encode;

// HTML Context
String html = Encode.forHtml("u<ntrus>te'd'");

// HTML Attribute Context
String htmlAttr = Encode.forHtmlAttribute("u<ntrus>te'd'");

// JavaScript Attribute Context
String jsAttr = Encode.forJavaScriptAttribute("u<ntrus>te'd'");

HTML (assume jsp)

<div data-attr="<%= htmlAttr %>" onclick="alert('<%= jsAttr %>')">
    <%= html %>
</div>

PS: more contexts exist and are supported by the library
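
As one example of another context, a value placed in a URL query parameter needs percent-encoding rather than HTML entities. The sketch below uses the JDK's java.net.URLEncoder as a stand-in, not the OWASP library itself. It assumes Java 10 or later for the Charset overload, and the class name UrlParamEscaper and method forQueryParam are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; uses only the JDK, not the OWASP encoder.
final class UrlParamEscaper {
    private UrlParamEscaper() {}

    /** Form-encodes a single query parameter value (a space becomes '+'). */
    static String forQueryParam(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints: /search?q=a+b%26c%3D%3Cscript%3E
        System.out.println("/search?q=" + forQueryParam("a b&c=<script>"));
    }
}
```

Note that URLEncoder performs form encoding, so its output is not necessarily identical to what a dedicated URI-component encoder would produce.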