Java – cross-platform Java method to remove filename special chars

cross platformfilenamesfilesystemsjava

I'm making a cross-platform application that renames files based on data retrieved online. I'd like to sanitize the Strings I took from a web API for the current platform.

I know that different platforms have different file-name requirements, so I was wondering if there's a cross-platform way to do this?

Edit: On Windows platforms you cannot have a question mark '?' in a file name, whereas in Linux, you can. The file names may contain such characters and I would like for the platforms that support those characters to keep them, but otherwise, strip them out.

Also, I would prefer a standard Java solution that doesn't require third-party libraries.

Best Answer

As suggested elsewhere, this is not usually what you want to do. It is usually best to create a temporary file using a secure method such as File.createTempFile().

You should not do this with a whitelist and only keep 'good' characters. If the file is made up of only Chinese characters then you will strip everything out of it. We can't use a whitelist for this reason, we have to use a blacklist.

Linux pretty much allows anything which can be a real pain. I would just limit Linux to the same list that you limit Windows to so you save yourself headaches in the future.

Using this C# snippet on Windows I produced a list of characters that are not valid on Windows. There are quite a few more characters in this list than you may think (41) so I wouldn't recommend trying to create your own list.

        foreach (char c in new string(Path.GetInvalidFileNameChars()))
        {
            Console.Write((int)c);
            Console.Write(",");
        }

Here is a simple Java class which 'cleans' a file name.

public class FileNameCleaner {
final static int[] illegalChars = {34, 60, 62, 124, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 58, 42, 63, 92, 47};
static {
    Arrays.sort(illegalChars);
}
public static String cleanFileName(String badFileName) {
    StringBuilder cleanName = new StringBuilder();
    for (int i = 0; i < badFileName.length(); i++) {
        int c = (int)badFileName.charAt(i);
        if (Arrays.binarySearch(illegalChars, c) < 0) {
            cleanName.append((char)c);
        }
    }
    return cleanName.toString();
}
}

EDIT: As Stephen suggested you probably also should verify that these file accesses only occur within the directory you allow.

The following answer has sample code for establishing a custom security context in Java and then executing code in that 'sandbox'.

How do you create a secure JEXL (scripting) sandbox?

Related Topic