Ruby-on-rails – How to delete special characters

regex, ruby, ruby-on-rails

I'm practicing with Ruby and regex to delete certain unwanted characters. For example:

input = input.gsub(/<\/?[^>]*>/, '')

And for special characters, for example ☻ or ™:

input = input.gsub('&#', '')

This leaves only the numbers (plus the trailing semicolon), which is fine. But it only works if the user enters the special character as a numeric code, like this:

&#153;

My question:
How can I delete special characters if the user enters them directly, without a code, like this:

™ ☻

Best Solution

First of all, I think it might be easier to define what constitutes "correct input" and remove everything else. For example:

input = input.gsub(/[^0-9A-Za-z]/, '')
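To see what this allow-list does in practice, here is a minimal sketch. The sample string is made up for illustration. Note that spaces and punctuation disappear too, because they are not in the character class.

```ruby
# Hypothetical sample input mixing letters, digits, symbols and spaces.
input = "Ruby™ ☻ on Rails 101!"

# Keep only ASCII letters and digits; every other character is dropped,
# including the spaces and the exclamation mark.
cleaned = input.gsub(/[^0-9A-Za-z]/, '')

puts cleaned  # prints "RubyonRails101"
```

If you want to keep spaces, add one to the class: `/[^0-9A-Za-z ]/`.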

If that's not what you want (for example, you want to support non-Latin alphabets), then I think you should make a list of the glyphs you want to remove (like ™ or ☻) and remove them one by one, since it's hard to distinguish programmatically between a Chinese or Arabic character and a pictograph.
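A deny-list like that could look roughly like this. The constant name and the sample text are assumptions for illustration, not part of the question.

```ruby
# Hypothetical deny-list: extend it with whatever glyphs you want to strip.
UNWANTED_GLYPHS = "™☻".freeze

# Made-up input containing non-Latin text that should survive.
input = "日本語 ™ and Ünïcode ☻ text"

# String#delete removes every character that appears in its argument
# and leaves all other characters (including non-Latin ones) untouched.
cleaned = input.delete(UNWANTED_GLYPHS)

puts cleaned
```

Be aware that `String#delete` gives `^`, `-` and `\` a special meaning in its argument, so if your list contains those, building a pattern with `Regexp.union` and passing it to `gsub` is probably the safer route.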

Finally, you might want to normalize your input by converting to or from HTML escape sequences.
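As a rough sketch of that normalization step, you could decode numeric escape sequences into real characters first, so a single filter handles both ways of entering a glyph. The helper name `decode_numeric_entities` is hypothetical, and it only covers decimal references with valid code points (no hex or named entities). Also note that `&#153;` is technically a control character in Unicode; browsers show it as ™ only because they reinterpret it as Windows-1252, which this sketch does not do.

```ruby
# Hypothetical helper: turns decimal character references such as "&#8482;"
# into the character they stand for. Assumes the number is a valid code point.
def decode_numeric_entities(text)
  text.gsub(/&#(\d+);/) { [$1.to_i].pack('U') }
end

# Made-up input with the same glyph entered as a code and as a literal.
input = "Ruby&#8482; and Ruby™"

normalized = decode_numeric_entities(input)  # both trademarks are now literal "™"
cleaned = normalized.delete("™☻")            # one filter now catches both forms

puts cleaned  # prints "Ruby and Ruby"
```

Ruby's standard library also has `CGI.unescapeHTML`, which may be a better fit if you need more than decimal references.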