Using regular expressions is probably the best way. You can see a bunch of tests here (taken from chromium)
function validateEmail(email) {
const re = /^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
return re.test(String(email).toLowerCase());
}
Here's the example of regular expresion that accepts unicode:
const re = /^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$/i;
But keep in mind that one should not rely only upon JavaScript validation. JavaScript can easily be disabled. This should be validated on the server side as well.
Here's an example of the above in action:
function validateEmail(email) {
const re = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
return re.test(email);
}
function validate() {
const $result = $("#result");
const email = $("#email").val();
$result.text("");
if (validateEmail(email)) {
$result.text(email + " is valid :)");
$result.css("color", "green");
} else {
$result.text(email + " is not valid :(");
$result.css("color", "red");
}
return false;
}
$("#email").on("input", validate);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<label for=email>Enter an email address:</label>
<input id="email">
<h2 id="result"></h2>
I think you are attacking it from the wrong angle by trying to encode all posted data.
Note that a "<
" could also come from other outside sources, like a database field, a configuration, a file, a feed and so on.
Furthermore, "<
" is not inherently dangerous. It's only dangerous in a specific context: when writing strings that haven't been encoded to HTML output (because of XSS).
In other contexts different sub-strings are dangerous, for example, if you write an user-provided URL into a link, the sub-string "javascript:
" may be dangerous. The single quote character on the other hand is dangerous when interpolating strings in SQL queries, but perfectly safe if it is a part of a name submitted from a form or read from a database field.
The bottom line is: you can't filter random input for dangerous characters, because any character may be dangerous under the right circumstances. You should encode at the point where some specific characters may become dangerous because they cross into a different sub-language where they have special meaning. When you write a string to HTML, you should encode characters that have special meaning in HTML, using Server.HtmlEncode. If you pass a string to a dynamic SQL statement, you should encode different characters (or better, let the framework do it for you by using prepared statements or the like)..
When you are sure you HTML-encode everywhere you pass strings to HTML, then set ValidateRequest="false"
in the <%@ Page ... %>
directive in your .aspx
file(s).
In .NET 4 you may need to do a little more. Sometimes it's necessary to also add <httpRuntime requestValidationMode="2.0" />
to web.config (reference).
Best Solution
Shortly after I posted this answer I found xval which is a validation framework for ASP.NET MVC.