Perl – How to remove all non-word characters except the newline


I have a file like this:

my line - some words & text
oh lóok i've got some characters

I want to 'normalize' it and remove all the non-word characters. I want to end up with something like this:


I'm using Linux on the command line at the moment, and I'm hoping there's some one-liner I can use.

I tried this:

cat file | perl -pe 's/\W//g'

But that removed all the newlines and put everything on one line. Is there some way I can tell Perl not to include newlines in the \W? Or is there some other way?

Best Solution

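This removes every character that matches neither \w nor \n. The -C switch turns on UTF-8 handling for input and output when the locale is UTF-8, so accented letters such as ó count as word characters instead of being stripped: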

cat file | perl -C -pe 's/[^\w\n]//g'