PHP: How to Create Unicode Filenames

It can't currently be done on Windows (possibly PHP 5.4 will support this scenario). In PHP, you can only write filenames using the code page Windows is set to. If that code page does not include a character, you cannot use it. Worse, if you have a file on Windows with such a character in its filename, you'll have trouble accessing it.

On Linux, at least with ext*, it's a different story. You can use whatever filenames you want, because the OS doesn't care about the encoding. So if you consistently use filenames in UTF-8, you should be OK. UTF-16, however, is excluded because filenames cannot include bytes with value 0.
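As a rough illustration of the Linux case, the sketch below checks a name before using it. The helper names `is_usable_utf8_filename` and `write_utf8_named_file` are hypothetical, and the code assumes a Linux filesystem that stores filenames as opaque bytes.

```php
<?php
// Hypothetical helper: returns true if $name is valid UTF-8 and contains
// no byte a Linux filename cannot hold (NUL or "/").
function is_usable_utf8_filename($name) {
    if ($name === '' || $name === '.' || $name === '..') {
        return false;
    }
    // A NUL byte is what rules out UTF-16; "/" is the path separator.
    if (strpos($name, "\0") !== false || strpos($name, '/') !== false) {
        return false;
    }
    // An empty pattern with the /u modifier matches only valid UTF-8 subjects.
    return preg_match('//u', $name) === 1;
}

// Hypothetical helper: writes $contents to $dir/$name if the name passes
// the check above. Returns true on success, false otherwise.
function write_utf8_named_file($dir, $name, $contents) {
    if (!is_usable_utf8_filename($name)) {
        return false;
    }
    $path = rtrim($dir, '/') . '/' . $name;
    return file_put_contents($path, $contents) !== false;
}
```

The check is deliberately strict about encoding: the OS would accept other byte sequences too, but sticking to UTF-8 is what keeps names consistent.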

How to open a file in PHP that has Unicode characters in its name?

These are conclusions so far:

  1. PHP 5 cannot open a file whose name contains Unicode characters unless the PHP source file itself is in Unicode.
  2. PHP 5 (at least on Windows XP) is not able to process PHP source in Unicode.

Thus the conclusion is that this is not doable in PHP 5.

PHP Unicode file name

It is not possible.

Here is the thread explaining why:

Can a PHP file name (or a dir in its full path) have UTF-8 characters?

PHP: creating a ZIP file for files with Unicode names

ZIP files don't have a specified encoding for filenames*. Consequently any use of non-ASCII characters is completely unreliable.

*: Not completely true: there is an extension to the format that allows UTF-8 filenames to be used, and the zip command will use it. But Windows's ZIP interface (“Compressed Folders”) doesn't support it, and always uses the default (“ANSI”) code page to interpret the filename bytes. If you know that your target audience all have Windows boxes with a particular locale then you can target that locale... otherwise, best stick to ASCII.
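A minimal sketch of the "stick to ASCII" advice follows. The helpers `ascii_zip_name` and `zip_with_ascii_names` are hypothetical names introduced for illustration, and the second one assumes the `zip` extension (`ZipArchive`) is available.

```php
<?php
// Hypothetical helper: replaces every run of bytes outside printable ASCII
// with a single underscore, so the entry name reads the same under any
// code page.
function ascii_zip_name($name) {
    return preg_replace('/[^\x20-\x7E]+/', '_', $name);
}

// Hypothetical helper: $files maps a path on disk to the original
// (possibly non-ASCII) name wanted inside the archive.
// Assumes the zip extension is loaded.
function zip_with_ascii_names($zipPath, array $files) {
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }
    foreach ($files as $pathOnDisk => $originalName) {
        $zip->addFile($pathOnDisk, ascii_zip_name($originalName));
    }
    return $zip->close();
}
```

Note that this mapping loses information: two different non-ASCII names can collapse to the same entry name, so a real implementation would need to detect and resolve collisions.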

Can a PHP file name (or a dir in its full path) have UTF-8 characters?

I came across the same problem, did some research, and concluded the following. This is for PHP 5 on Windows; it is probably true on other platforms, but I haven't checked.

  1. ALL PHP file system functions (dir, is_dir, is_file, file, filemtime, filesize, file_exists etc.) only accept and return file names in ISO-8859-1, irrespective of the default_charset set in the program or ini files.

  2. Where a filename contains a Unicode character, dir->read will return it as the corresponding ISO-8859-1 character if there is one; otherwise it will substitute a question mark.

  3. When referencing a file, e.g. in is_file or file, if you pass in a UTF-8 file name, the file will not be found when the name contains any character of two or more bytes. However, is_file(utf8_decode($filename)) etc. will work provided every UTF-8 character is representable in ISO-8859-1.
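Point 3 can be sketched as follows. Because utf8_decode silently substitutes a question mark for anything it cannot represent, this sketch uses a hypothetical stand-in, `utf8_to_latin1_or_null`, which refuses to convert such names instead; it is an illustration, not the built-in function.

```php
<?php
// Hypothetical helper: converts a UTF-8 name to ISO-8859-1, or returns null
// if the name is not valid UTF-8 or contains a character above U+00FF.
function utf8_to_latin1_or_null($name) {
    // Invalid UTF-8 cannot be converted.
    if (preg_match('//u', $name) !== 1) {
        return null;
    }
    // Any code point above U+00FF has no ISO-8859-1 equivalent.
    if (preg_match('/[^\x{00}-\x{FF}]/u', $name) === 1) {
        return null;
    }
    // U+0080..U+00FF are encoded as the two-byte sequences C2 xx / C3 xx;
    // fold each one back into a single ISO-8859-1 byte.
    return preg_replace_callback('/[\xC2\xC3][\x80-\xBF]/', function ($m) {
        return chr(((ord($m[0][0]) & 0x03) << 6) | (ord($m[0][1]) & 0x3F));
    }, $name);
}

// Hypothetical wrapper: is_file() for a UTF-8 name, under the assumption
// (from the list above) that the file system functions expect ISO-8859-1.
function is_file_utf8($utf8Name) {
    $latin1 = utf8_to_latin1_or_null($utf8Name);
    return $latin1 !== null && is_file($latin1);
}
```

Returning null for unrepresentable names makes the failure explicit, rather than looking up a different file whose name happens to contain a question mark.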

In other words, PHP 5 is not capable of addressing files whose names contain characters outside ISO-8859-1 at all.

If a UTF-8 URL with multibyte characters is requested and this corresponds directly to a file, PHP won't be able to open the file because it cannot address it.

If you simply want pretty URLs in your language the suggestion of using mod_rewrite seems like a good one.

But if you are storing and retrieving files uploaded and downloaded by users, this problem has to be resolved. There are several ways:

  1. Use an arbitrary (non UTF-8) file name, such as an incrementing number, on the server and index the files in a database, XML file or some such.

  2. Store the files in the database itself as a BLOB.

  3. Encode the filenames yourself. This makes it perhaps easier to see what is going on, and it is not subject to problems if your index gets corrupted. A good technique is to urlencode all your incoming filenames when storing them on the server disk and urldecode them before setting the filename in the MIME header for the download. All even vaguely unusual characters are then encoded as %nn (a space becomes +), leaving % as the only unusual character in the stored name, so any problems with spaces in file names, cross-platform support and pattern matching are largely avoided.
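The urlencode approach might look like the sketch below. The function names are hypothetical, and the `filename*=UTF-8''...` form of the Content-Disposition header is an assumption about how the decoded name is sent to the browser, not something the original answer specifies.

```php
<?php
// Hypothetical helper: the ASCII-only name used on the server disk.
function stored_name_for($uploadedName) {
    return urlencode($uploadedName);
}

// Hypothetical helper: recovers the original name from the stored one.
function original_name_from($storedName) {
    return urldecode($storedName);
}

// Hypothetical helper: builds a download header carrying the original
// UTF-8 name in percent-encoded form (assumed header style).
function download_disposition_header($storedName) {
    $original = original_name_from($storedName);
    return "Content-Disposition: attachment; filename*=UTF-8''" . rawurlencode($original);
}
```

For example, `stored_name_for("日本語 file.txt")` yields `%E6%97%A5%E6%9C%AC%E8%AA%9E+file.txt`, which contains only ASCII and decodes back to the original name. A name consisting only of dots, such as `..`, passes through urlencode unchanged, so it would still need to be rejected separately.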


