Sanitize File Path in PHP

Sanitize file path in PHP

realpath() converts a path that may contain relative components (such as ./ or ../) into an absolute, canonical path. You can then check that the resulting path is under the subdirectory you want to allow downloads from. Note that realpath() returns false if the path does not exist.
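A minimal sketch of that check, assuming a hypothetical helper named resolveUnderBase() and a base directory you supply. It compares against the canonical base path plus a trailing separator, so that a sibling directory such as /srv/files2 is not accepted when the base is /srv/files.

```php
<?php
// Hypothetical helper: returns the canonical path if $requested (relative to
// $baseDir) resolves to something inside $baseDir, otherwise null.
function resolveUnderBase(string $baseDir, string $requested): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null; // the base directory itself does not exist
    }

    // realpath() resolves ".", ".." and symlinks; it returns false for missing files.
    $path = realpath($base . DIRECTORY_SEPARATOR . $requested);
    if ($path === false) {
        return null;
    }

    // Require the base plus a separator as prefix, so "/srv/files2" does not
    // pass for base "/srv/files".
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($path, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return $path;
}
```

Because realpath() fails for files that don't exist, this suits serving existing downloads; it cannot be used to validate a path you are about to create.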

Sanitize file path in PHP without realpath()

I'm not sure why you wouldn't want to use realpath(), but path name sanitisation is a fairly simple concept, along the following lines:

  • If the path is relative (does not start with /), prefix it with the current working directory and /, making it an absolute path.
  • Replace all sequences of more than one / with a single one (a).
  • Replace all occurrences of /./ with /.
  • Remove /. if at the end.
  • Replace /anything/../ with /.
  • Remove /anything/.. if at the end.

The text anything in this case means the longest sequence of characters that aren't /.

Note that those rules should be applied continuously until such time as none of them result in a change. In other words, do all six (one pass). If the string changed, then go back and do all six again (another pass). Keep doing that until the string is the same as before the pass just executed.
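The repeated passes described above can be sketched in PHP as follows. This is a minimal sketch, assuming a hypothetical function canonicalizePath() and an explicit $cwd argument standing in for the current working directory; it applies the six rules with preg_replace() as listed, uses [^/]+ for "anything", and does not implement the special leading // case from footnote (a).

```php
<?php
// Hypothetical sketch of the rewrite rules. $cwd plays the role of the
// current working directory in rule 1 (pass getcwd() if that is what you want).
function canonicalizePath(string $path, string $cwd): string
{
    // Rule 1: make relative paths absolute.
    if ($path === '' || $path[0] !== '/') {
        $path = rtrim($cwd, '/') . '/' . $path;
    }

    // Rules 2-6 as pattern => replacement; preg_replace() applies them in order.
    $rules = [
        '#/{2,}#'          => '/', // 2: collapse runs of '/'
        '#/\./#'           => '/', // 3: "/./" -> "/"
        '#/\.\z#'          => '',  // 4: drop a trailing "/."
        '#/[^/]+/\.\./#'   => '/', // 5: "/anything/../" -> "/"
        '#/[^/]+/\.\.\z#'  => '',  // 6: drop a trailing "/anything/.."
    ];

    // One pass = all rules once; repeat until a pass changes nothing.
    do {
        $before = $path;
        $path = preg_replace(array_keys($rules), array_values($rules), $path);
    } while ($path !== $before);

    // Rules 4 and 6 can consume everything (e.g. "/a/.."), which means the root.
    return $path === '' ? '/' : $path;
}
```

Because it works purely on the string, this also handles paths that don't exist yet, but unlike realpath() it does not resolve symbolic links.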

Once those steps are done, you have a canonical path name that can be checked for a valid pattern. Most likely that will be anything that doesn't start with ../ (in other words, it doesn't try to move above the starting point). There may be other rules you want to apply, but that's outside the scope of this question.


(a) If you're working on a system that treats // at the start of a path as special, make sure a path beginning with exactly two / characters keeps both of them. This is the only place where POSIX allows (but does not mandate) special handling for multiples; three or more leading / characters, and multiple / characters anywhere else, are equivalent to a single one.

String sanitizer for filename

Instead of worrying about overlooking characters, how about using a whitelist of characters you are happy to allow? For example, you could allow just good ol' a-z, 0-9, _, and a single instance of a period (.). That's obviously more limiting than most filesystems, but it should keep you safe.
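A minimal sketch of that whitelist as a validator, assuming a hypothetical function isSafeFilename(). It is slightly stricter than the text: the single period may not be the first or last character, so names like .htaccess are rejected. \A and \z anchor the whole string (a plain $ would also match before a trailing newline).

```php
<?php
// Hypothetical validator: lowercase letters, digits and underscores,
// optionally followed by exactly one ".extension" of the same characters.
function isSafeFilename(string $name): bool
{
    return preg_match('/\A[a-z0-9_]+(?:\.[a-z0-9_]+)?\z/', $name) === 1;
}
```

Validating and rejecting, rather than silently stripping characters, means two different inputs can never map to the same filename.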

How to sanitize $_GET variables to prevent path injection?

Here's what I would do after the lines you have there:

if (dirname($path) === $baseDir) {
    // Safe
}

http://php.net/dirname

Basically, before sending anything, check that the file is actually in the one directory you support. Note that you will also have to add your own / before the filename (in $path) and remove the trailing / from your $baseDir definition, as dirname() won't leave a trailing path separator.
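A self-contained sketch of that check, assuming a hypothetical function isFileInBaseDir() and a $baseDir without a trailing slash. The extra '.'/'..' test is my addition: dirname('/srv/files/..') is '/srv/files', so the dirname() comparison alone would accept a filename of "..".

```php
<?php
// Hypothetical helper: true only if $file names an entry directly inside $baseDir.
// $baseDir must not end with '/'.
function isFileInBaseDir(string $baseDir, string $file): bool
{
    $path = $baseDir . '/' . $file;

    // "." and ".." pass the dirname() test but refer to directories, not files.
    $name = basename($path);
    if ($name === '.' || $name === '..') {
        return false;
    }
    return dirname($path) === $baseDir;
}
```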

Do I need to sanitize input to file_exists?

It depends on what you're trying to protect against.

file_exists doesn't do any writing to disk, which means that the worst that can happen is that someone gains some information about your file system or the existence of files that you have.

In practice, however, if you later do something else with the file you checked with file_exists, such as passing it to include, you may wish to perform more stringent checks.

I'm assuming that you may be passing arbitrary values, possibly sourced from user input, into this function.

If that is the case, it somewhat depends on why you actually need to use file_exists in the first place. In general, for any filesystem function that the user can pass values directly into, I'd try to filter the string as much as possible. This is really just being pedantic and on the safe side, and may be unnecessary in practice.

So, for example, if you only ever need to check the existence of a file in a single directory, you should probably strip out directory delimiters of all sorts.

From personal experience, I've only ever passed user input into a file_exists call for mapping to a controller file, in which case, I'd just strip out any non-alphanumeric + underscore character.
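A minimal sketch of that controller mapping, assuming a hypothetical controllerFile() function, a $controllerDir you supply and a ".php" naming convention (none of which come from the answer). Stripping everything except letters, digits and underscores also removes '/', '\' and '.', so the name cannot point outside $controllerDir.

```php
<?php
// Hypothetical lookup: map a user-supplied name to "<dir>/<Name>.php" if it exists.
function controllerFile(string $controllerDir, string $name): ?string
{
    // Keep only ASCII letters, digits and underscores.
    $clean = preg_replace('/[^A-Za-z0-9_]/', '', $name);
    if ($clean === '') {
        return null;
    }

    $file = $controllerDir . '/' . $clean . '.php';
    return file_exists($file) ? $file : null;
}
```

Note that stripping is lossy: "../Home" becomes "Home" and is accepted, which is harmless here but worth knowing if you log or echo the name.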

UPDATE: having read the comments you added recently: no, there aren't any special characters, as this isn't executed in a shell. Even \0 should be harmless, at least on newer PHP versions, which reject paths containing it (I believe older ones would cut the string at the \0 when passing it to the underlying filesystem calls).

Sanitizing strings to make them URL and filename safe?

Some observations on your solution:

  1. 'u' at the end of your pattern means that both the pattern and the text it's matching are treated as UTF-8; if the subject is not valid UTF-8, the preg_* call fails instead of matching.
  2. \w matches the underscore character. You specifically include it for files, which suggests you don't want it in URLs, but with the code you have, URLs will be permitted to include an underscore too.
  3. The inclusion of "foreign UTF-8" seems to be locale-dependent. It's not clear whether this is the locale of the server or client. From the PHP docs:

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.
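To illustrate point 2: \w lets the underscore through. One common way around it (my suggestion, not from your code) is the negated class [^\W_], meaning "not a non-word character and not an underscore", i.e. word characters minus "_". The function names are hypothetical, and the patterns deliberately omit 'u' so they match ASCII only.

```php
<?php
// \w includes '_', so this accepts "my_post".
function matchesWord(string $s): bool
{
    return preg_match('/\A\w+\z/', $s) === 1;
}

// [^\W_] = word characters except the underscore.
function matchesWordWithoutUnderscore(string $s): bool
{
    return preg_match('/\A[^\W_]+\z/', $s) === 1;
}

var_dump(matchesWord('my_post'));                  // bool(true)
var_dump(matchesWordWithoutUnderscore('my_post')); // bool(false)
```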

Creating the slug

You probably shouldn't include accented (or other non-ASCII) characters in your post slug since, technically, they should be percent-encoded (per URL encoding rules), so you'll end up with ugly-looking URLs.

So, if I were you, after lowercasing, I'd convert any 'special' characters to their ASCII equivalents (e.g. é -> e) and replace non-[a-z] characters with '-', collapsing runs to a single '-' as you've done. There's an implementation of converting special characters here: https://web.archive.org/web/20130208144021/http://neo22s.com/slug
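A minimal slug sketch along those lines, assuming a hypothetical slugify() function. The transliteration map is a small illustrative sample, not a complete table (a full version would need many more entries, or the intl extension's Transliterator class). Two deliberate deviations from the text: accented letters are mapped before lowercasing, because strtolower() only changes ASCII letters, and digits are kept.

```php
<?php
// Hypothetical slug function with a deliberately incomplete accent map.
function slugify(string $title): string
{
    $map = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'À' => 'a', 'Á' => 'a', 'Â' => 'a', 'Ä' => 'a',
        'ç' => 'c', 'Ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'È' => 'e', 'É' => 'e', 'Ê' => 'e', 'Ë' => 'e',
        'î' => 'i', 'ï' => 'i', 'Î' => 'i', 'Ï' => 'i',
        'ô' => 'o', 'ö' => 'o', 'Ô' => 'o', 'Ö' => 'o',
        'ù' => 'u', 'û' => 'u', 'ü' => 'u', 'Ù' => 'u', 'Û' => 'u', 'Ü' => 'u',
        'ñ' => 'n', 'Ñ' => 'n',
    ];

    // Map accented letters (both cases) to lowercase ASCII, then lowercase the rest.
    $slug = strtolower(strtr($title, $map));
    // Replace each run of characters outside [a-z0-9] with a single '-'.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Remove dashes left over at either end.
    return trim($slug, '-');
}

echo slugify('Crème Brûlée: A Recipe!'); // creme-brulee-a-recipe
```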

Sanitization in general

OWASP have a PHP implementation of their Enterprise Security API which among other things includes methods for safe encoding and decoding input and output in your application.

The Encoder interface provides:

canonicalize (string $input, [bool $strict = true])
decodeFromBase64 (string $input)
decodeFromURL (string $input)
encodeForBase64 (string $input, [bool $wrap = false])
encodeForCSS (string $input)
encodeForHTML (string $input)
encodeForHTMLAttribute (string $input)
encodeForJavaScript (string $input)
encodeForOS (Codec $codec, string $input)
encodeForSQL (Codec $codec, string $input)
encodeForURL (string $input)
encodeForVBScript (string $input)
encodeForXML (string $input)
encodeForXMLAttribute (string $input)
encodeForXPath (string $input)

https://github.com/OWASP/PHP-ESAPI
https://www.owasp.org/index.php/Category:OWASP_Enterprise_Security_API

PHP - Sanitizing input for file_exists() and include()

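// $file is the user-supplied path.
$path    = realpath($file);  // resolves ./, ../ and symlinks; false if the file doesn't exist
$allowed = '/foo/bar/baz/';

// Reject missing files and anything outside the allowed directory.
if ($path === false || strpos($path, $allowed) !== 0) {
    die('not allowed');
}

// Also reject anything inside the resources/ sub-directory.
if (strpos($path, '/foo/bar/baz/resources/') === 0) {
    die('not allowed');
}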

Undocumented sanitization in PHP file upload

You want to look at rfc1867.c; this seems to be the part you refer to:

SAPI_API SAPI_POST_HANDLER_FUNC(rfc1867_post_handler)

From the comment, it appears that basename() is used to get rid of spurious backslashes, even though they could actually be legitimate (I imagine perhaps "Hello\ World.txt"?). But this exists only because of IE's behaviour, and the comment implies that it might be removed in the future.

So you can't rely on this "sanitization" to keep on being there.

...

/* The \ check should technically be needed for win32 systems only where
 * it is a valid path separator. However, IE in all it's wisdom always sends
 * the full path of the file on the user's filesystem, which means that unless
 * the user does basename() they get a bogus file name. Until IE's user base drops
 * to nill or problem is fixed this code must remain enabled for all systems. */

s = _basename(internal_encoding, filename TSRMLS_CC);
if (!s) {
    s = filename;
}
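Since you can't rely on that internal step, a sketch of doing the equivalent yourself on the client-supplied name (e.g. $_FILES['upload']['name']), assuming a hypothetical uploadBaseName() helper. Like PHP's handler, it treats '\' as a separator on every system, so it also discards legitimate backslashes such as in "Hello\ World.txt".

```php
<?php
// Hypothetical helper: reduce a client-supplied upload name to a bare filename.
function uploadBaseName(string $clientName): ?string
{
    // PHP's basename() only splits on '\' on Windows, so normalise it first.
    $name = basename(str_replace('\\', '/', $clientName));

    // Reject names that are empty or refer to a directory.
    if ($name === '' || $name === '.' || $name === '..') {
        return null;
    }
    return $name;
}
```

You would still want to validate the result (for example against a character whitelist) before using it as a filename on disk.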

Sanitize filepath string and only allow 1 trailing slash at the end

You can use preg_replace with arrays of patterns and replacements; the first to remove non-alphanumeric characters other than _, - and /, and the second to remove all but the last trailing /:

$string = 'controller_123/method///';
echo preg_replace(array('#[^\w/-]+#', '#/+$#'), array('', '/'), $string);

Output:

controller_123/method/


The regex can be improved by noting that we want to remove all / before the one at the end of the line, and using a positive lookahead to match those. Then all matches can simply be replaced with an empty string:

$string = 'contr*@&oller_123////method///';
echo preg_replace('#[^\w/-]+|/(?=/+$)#', '', $string);

Output:

controller_123////method/



