3

Recently, someone uploaded a file whose name contained what appeared to be illegal characters (a double hyphen), and afterwards the file could not be downloaded again. In this instance the file name was
Some name -- some other information

On upload, the file name is set from the original file name, which is a business rule.

file.setFileName(file.getFile().getOriginalFilename());

This resulted in the double hyphen becoming two upside-down question marks and, for whatever reason, made it impossible to retrieve the file from the server.

I'm wondering if there is a programmatic way to check the original file name for situations like this.

For transparency, here is the code for uploading the file:

    public void saveOpcertCeuFile(OpcertCeuFileUpload file) {
        UmdContact user = secUtilService.getActiveUser();
        String username = user.getEmail();
        Date now = new Date();

        file.setCreatedTs(now);
        file.setLastUpdatedTs(now);
        file.setCreatedBy(username);
        file.setLastUpdatedBy(username);
        file.setFileName(file.getFile().getOriginalFilename());
        file.setIsApproved(Boolean.FALSE);
        file.setIsDeleted(Boolean.FALSE);

        try {
            file.setByteContents(file.getFile().getBytes());
        } catch (Exception ex) {
            log.info(ex);
            throw new RuntimeException(ex);
        }
        dao.insertOpcertCeuFileUpload(file);

        Path path = this.getOptcertCeuFilePath(file);
        String configF = envService.getServerUrl();
        file.setFilePath(String.valueOf(path));
        dao.updateOpcertCeuFilePath(file);

        try {
            File file1 = path.toFile();
            file1.getParentFile().mkdirs();
            Files.write(path, file.getByteContents(), StandardOpenOption.CREATE_NEW);
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }
2
  • I'd double-check that those were actually hyphens; on any modern operating system I can think of, the hyphen is an allowable filename character. (A search suggests that the hyphen only becomes a problem when burned to a CD-R.) It may be that those characters were really some other Unicode character that merely looked like a hyphen. Commented Nov 24, 2020 at 15:52
  • 1
    Sanitize your inputs. The file is an attack vector, so sanitize its content, but also check the filename, which is another attack vector. Do whatever is necessary in a separate method sanitizeFilename(String) and use it like this: file.setFileName(sanitizeFilename(file.getFile().getOriginalFilename()));. Better: use your own naming system for your filesystem. If you want to retrieve the original filename later, store it as metadata instead of using it as the filename itself. Commented Nov 24, 2020 at 15:53
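
The first comment's suggestion can be checked programmatically. The sketch below lists every code point in a name, so an en dash (U+2013) or em dash (U+2014) that merely looks like a hyphen becomes visible. The class name FileNameInspector and the method describe are hypothetical and not part of the question's code.

```java
import java.util.StringJoiner;

// Hypothetical diagnostic helper, not part of the original upload code.
final class FileNameInspector {

    /** Lists each code point in the name as "U+XXXX UNICODE NAME", comma-separated. */
    static String describe(String fileName) {
        StringJoiner out = new StringJoiner(", ");
        fileName.codePoints().forEach(cp ->
            out.add(String.format("U+%04X %s", cp, Character.getName(cp))));
        return out.toString();
    }
}
```

Logging describe(file.getFile().getOriginalFilename()) for the problem upload would show whether the two characters are really U+002D HYPHEN-MINUS or something else.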

4 Answers

3

Your filesystem, your rules

If you want to store files, name them according to whatever rules you want, but don't let the user dictate the name. Will there be name conflicts? Does a filename contain invalid characters? You never know.

So use your own naming conventions. You say there is a business rule that forces you to keep the original filename, so keep it, just somewhere else.

For instance, when you get the file Hello--World.txt, use the name 20201124-000001.uploaded on your filesystem, and record in some metadata that the original filename is Hello--World.txt. When somebody wants to download that file, provide the original filename as the download name. This way you keep the metadata associated with the file, and you keep your system secure.

Example in your code:

// Name on filesystem.
file.setFileName(date + "-" + orderingNumberForDate(date) + ".uploaded");     

// Name in the metadata (text or db)
file.setOriginalFileName(file.getFile().getOriginalFilename()); 
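
As a slightly fuller sketch of the same idea, the class below pairs a server-generated storage name with the user's original name. It swaps the date-plus-ordering-number scheme above for a random UUID, purely to keep the example self-contained; StoredUpload and forOriginalName are hypothetical names.

```java
import java.util.UUID;

// Hypothetical value class: one name for the filesystem, one kept as metadata.
final class StoredUpload {
    final String storageName;   // generated by the server, safe on any filesystem
    final String originalName;  // user-supplied, stored as metadata only

    private StoredUpload(String storageName, String originalName) {
        this.storageName = storageName;
        this.originalName = originalName;
    }

    /** Generates a storage name that does not depend on the user's input. */
    static StoredUpload forOriginalName(String originalName) {
        // Assumption: a random UUID stands in for the date + ordering number scheme
        String storageName = UUID.randomUUID() + ".uploaded";
        return new StoredUpload(storageName, originalName);
    }
}
```

Only storageName would ever be used to build a path on disk; originalName would be stored with the record and sent back as the download name.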

1 Comment

Thank you - That seems like a no brainer. I will give it a try and see what happens.
1

Use whitelisting.

Make a list of characters you find acceptable. Then run the input file name through a filter. Make a StringBuilder and loop through each character:

  1. If it is on the whitelist, append the character to the builder.
  2. Otherwise, decide if you want to ignore it (append nothing), or append some placeholder such as an underscore, which is definitely fine in a filename on every major filesystem.
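
The two steps above can be sketched as follows. The whitelist here (ASCII letters, digits, underscore, dash, dot, space) and the underscore placeholder are assumptions chosen for illustration; FileNameSanitizer and sanitize are hypothetical names.

```java
// Hypothetical whitelist filter; adjust the allowed set to your own rules.
final class FileNameSanitizer {

    // Assumed whitelist beyond ASCII letters and digits
    private static final String EXTRA_ALLOWED = "_-. ";

    /** Keeps whitelisted characters and replaces everything else with '_'. */
    static String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean asciiLetterOrDigit = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (asciiLetterOrDigit || EXTRA_ALLOWED.indexOf(c) >= 0) {
                out.append(c);    // on the whitelist: keep as-is
            } else {
                out.append('_');  // not on the whitelist: placeholder
            }
        }
        return out.toString();
    }
}
```

Each UTF-16 char is handled separately, so a character outside the Basic Multilingual Plane becomes two underscores. Names such as "." and ".." pass this filter unchanged and would still need their own check.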

If you want to get real fancy you could make a much more involved system that attempts to map any character onto a filesystem-valid character, e.g. trying to map 'é' onto 'e', a non-breaking space onto nothing (no character at all, an empty string), 'ß' onto 'ss', and more. But that doesn't sound like a worthwhile effort here, and is in many ways literally impossible ('ö' in German becomes 'oe', in Swedish it becomes 'o'. How do you know whether the name in the file is Swedish or German? You don't, so there is no foolproof conversion possible in the first place).
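
For a sense of how far a simple mapping gets, here is a minimal sketch that uses java.text.Normalizer to decompose characters and then drops the combining marks. It is not a full transliteration: it turns 'é' into 'e' and 'ö' into 'o' (never 'oe'), and it leaves 'ß' untouched because that character has no decomposition. AccentStripper and stripAccents are hypothetical names.

```java
import java.text.Normalizer;

// Hypothetical helper illustrating the limits of automatic character mapping.
final class AccentStripper {

    /** Decomposes accented characters (NFD) and removes the combining marks. */
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left over after decomposition
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Anything this leaves behind, such as 'ß' or a non-breaking space, would still have to go through a whitelist filter.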

NB: You could put in some effort and figure out which characters are and aren't legal on the filesystem you're on. But then you still end up with files with a name that may be acceptable on the system you're on (and many systems accept almost everything, even real bizarre characters, because filenames are mostly just bags o' bytes, and the only reason you can't put a slash in there is because various tools will interpret it as a separator) - but is hard to move around, and causes issues because browsers don't think such characters are valid for their systems even if they are. Thus, I advise whitelisting only the simple characters: Letters, digits, underscore, maybe dollars, dots, dashes, and spaces.

0
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class FileNameValidator {

    private static final int MAX_COMPONENT_LENGTH = 255;
    
    // Pattern to block control, Bidi, and general unsafe characters (like +, |, <, >)
    // Cntrl: ASCII 0x00-0x1F and 0x7F
    // \u202A-\u202E, \u2066-\u2069: Unicode Bi-Directional (Bidi) characters
    private static final Pattern UNSAFE_CHARS_PATTERN = 
        Pattern.compile("[\\p{Cntrl}\\u202A-\\u202E\\u2066-\\u2069+\\s\"*?:<>|\\\\/]");

    // Windows Reserved Device Names (case-insensitive)
    private static final Set<String> RESERVED_NAMES = Set.of(
        "CON", "PRN", "AUX", "NUL", 
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    );

    /**
     * Validates a string to be safely used as a single file or directory name.
     * Throws an IllegalArgumentException if the name is invalid or unsafe.
     *
     * @param component The file name string to validate.
     */
    public static void validateFileNameForInvalidChars(String component) {
        if (component == null || component.isBlank()) {
            throw new IllegalArgumentException("File name must not be blank.");
        }
        
        // 1. Basic Length Check
        if (component.length() > MAX_COMPONENT_LENGTH) {
            throw new IllegalArgumentException("File name too long.");
        }

        // 2. Traversal and Control Character Checks (Pre-Normalization)
        String lower = component.toLowerCase(Locale.ROOT);
        // Block encoded separators and null byte
        if (lower.contains("%2f") || lower.contains("%5c") || lower.contains("%00")) {
            throw new IllegalArgumentException("Encoded separator/null not allowed.");
        }
        
        // Block control characters, Bidi, and other unsafe characters (using regex)
        if (UNSAFE_CHARS_PATTERN.matcher(component).find()) {
            throw new IllegalArgumentException("File name contains unsafe characters.");
        }
        
        // 3. Path Traversal and Dot Segment Check
        Path p;
        try {
            // Use Paths.get to check OS validity and implicitly block many path errors
            p = Paths.get(component);
        } catch (InvalidPathException e) {
            throw new IllegalArgumentException("Invalid path syntax.", e);
        }
        
        // Must be a single name component (no internal slashes) and not absolute
        if (p.isAbsolute() || p.getNameCount() != 1) {
             throw new IllegalArgumentException("Not a single file name component.");
        }
        
        // Reject a lone "." (normalize() collapses it to an empty path);
        // a lone ".." is left unchanged here and is caught by the trailing-dot check below
        if (!p.normalize().equals(p)) { 
            throw new IllegalArgumentException("Traversal/dot segments not allowed.");
        }
        
        String name = p.getFileName().toString();

        // 4. Windows Compatibility and Reserved Names
        
        // Windows semantics: Disallow trailing space or dot
        if (name.endsWith(" ") || name.endsWith(".")) {
            throw new IllegalArgumentException("Trailing space/dot not allowed.");
        }

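        // Windows reserved device names (case-insensitive check).
        // Windows also reserves these names when an extension follows (e.g. "NUL.txt"),
        // so compare only the part before the first dot.
        String upper = name.toUpperCase(Locale.ROOT);
        int dot = upper.indexOf('.');
        String base = dot >= 0 ? upper.substring(0, dot) : upper;
        if (RESERVED_NAMES.contains(base)) {
            throw new IllegalArgumentException("Reserved device name not allowed.");
        }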
    }
}

1 Comment

Code-only posts are not answers. You will need to explain your code and the reasoning behind it. Even if your code works, without explanation it will not clarify the matter for the person who has asked about this.
0

The first solution is to have a validation method that refuses any filename you don't want to allow. Somewhere in that method you would check for -- or any other pattern you wish, via regex or String methods, and reject the filename if it matches.

The second solution is to compare the filename after

file.setFileName(file.getFile().getOriginalFilename());

to what file.getFile().getOriginalFilename() returned. If they differ, generate your own filename and, if the internal filename is ever shown to users, let them know about the renaming. You can also combine the two solutions.
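
One way to sketch the comparison is a charset round trip. This rests on an assumption the question does not confirm: that the name gets mangled when it is encoded into a charset that cannot represent some of its characters. FileNameRoundTrip, survives and chooseName are hypothetical names, and the UUID fallback is only one possible way to generate a replacement name.

```java
import java.nio.charset.Charset;
import java.util.UUID;

// Hypothetical helper: detects names that would change when stored in a given charset.
final class FileNameRoundTrip {

    /** True if encoding and decoding the name in the charset returns it unchanged. */
    static boolean survives(String name, Charset charset) {
        // getBytes(Charset) substitutes a replacement for unmappable characters,
        // so a lossy encoding shows up as a difference after decoding
        String roundTripped = new String(name.getBytes(charset), charset);
        return roundTripped.equals(name);
    }

    /** Keeps the original name if it survives, otherwise generates a new one. */
    static String chooseName(String original, Charset charset) {
        return survives(original, charset)
                ? original
                : UUID.randomUUID() + ".uploaded"; // assumed fallback scheme
    }
}
```

For example, survives("a\u2013b", StandardCharsets.ISO_8859_1) is false because ISO-8859-1 has no en dash, while a name made of plain ASCII hyphens survives.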
