How do I use filesystem functions in PHP, using UTF-8 strings?
Just urlencode the string desired as a filename. All characters returned from urlencode are valid in filenames (NTFS/HFS/UNIX); then you can just urldecode the filenames back to UTF-8 (or whatever encoding they were in).
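Here is a minimal sketch of that round trip. The helpers encodeFilename() and decodeFilename() are hypothetical names made up for illustration; only urlencode() and urldecode() are PHP built-ins. It writes one file into the system temp directory and reads its name back.

```php
<?php
// Hypothetical helpers (not part of PHP) wrapping urlencode()/urldecode().

/** Turn an arbitrary string into an ASCII-only filename. */
function encodeFilename(string $name): string
{
    return urlencode($name);
}

/** Recover the original string from a name made by encodeFilename(). */
function decodeFilename(string $filename): string
{
    return urldecode($filename);
}

$original = "naïve ♫ résumé.txt";
$onDisk   = encodeFilename($original);   // ASCII only: letters, digits, % + . - _

// Create the file in a scratch directory, then read its name back.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'fs_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . $onDisk, 'hello');

$listed       = array_values(array_diff(scandir($dir), ['.', '..']));
$roundTripped = decodeFilename($listed[0]);

// Clean up the scratch directory.
unlink($dir . DIRECTORY_SEPARATOR . $onDisk);
rmdir($dir);
```

Note that urlencode() leaves "." untouched, so this sketch does nothing special for names such as "." or "..".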
Caveats (all apply to the solutions below as well):
- After url-encoding, the filename must be no longer than 255 characters (probably bytes).
- UTF-8 has multiple representations for many characters (using combining characters). If you don't normalize your UTF-8, you may have trouble searching with glob or reopening an individual file.
- You can't rely on scandir or similar functions for alpha-sorting. You must urldecode the filenames, then use a sorting algorithm aware of UTF-8 (and collations).
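The caveats above could be handled roughly like this. It is only a sketch under some assumptions: Normalizer and Collator come from the optional intl extension, so both are guarded with class_exists(); the helper names safeFilename() and sortedDecodedNames() and the default locale are made up for illustration.

```php
<?php
// Sketch only. Normalizer and Collator need the optional intl extension,
// so the code falls back gracefully when it is missing.

/** Normalize to NFC (when intl is available), url-encode, enforce the length limit. */
function safeFilename(string $name): string
{
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($name, Normalizer::FORM_C);
        if ($normalized !== false) {
            $name = $normalized;
        }
    }
    $encoded = urlencode($name);
    // The encoded name is pure ASCII, so strlen() counts characters and bytes alike.
    if (strlen($encoded) > 255) {
        throw new LengthException('Encoded filename is longer than 255 bytes');
    }
    return $encoded;
}

/** Decode url-encoded names first, then sort the decoded strings. */
function sortedDecodedNames(array $encodedNames, string $locale = 'en_US'): array
{
    $names = array_map('urldecode', $encodedNames);
    if (class_exists('Collator')) {
        (new Collator($locale))->sort($names);   // collation-aware ordering
    } else {
        sort($names, SORT_STRING);               // plain byte-order fallback
    }
    return array_values($names);
}
```

Without intl, the fallback sort is byte order only, which is exactly the limitation the third caveat warns about.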
Worse Solutions
The following are less attractive solutions, more complicated and with more caveats.
On Windows, the PHP filesystem wrapper expects and returns ISO-8859-1 strings for file/directory names. This gives you two choices:
1. Use UTF-8 freely in your filenames, but understand that non-ASCII characters will appear incorrect outside PHP. A non-ASCII UTF-8 char will be stored as multiple single ISO-8859-1 characters. E.g. ó will appear as Ã³ in Windows Explorer.
2. Limit your file/directory names to characters representable in ISO-8859-1. In practice, you'll pass your UTF-8 strings through utf8_decode before using them in filesystem functions, and pass the entries scandir gives you through utf8_encode to get the original filenames in UTF-8.
Caveats galore!
- If any byte passed to a filesystem function matches an invalid Windows filesystem character in ISO-8859-1, you're out of luck.
- Windows may use an encoding other than ISO-8859-1 in non-English locales. I'd guess it will usually be one of ISO-8859-#, but this means you'll need to use mb_convert_encoding instead of utf8_decode.
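A rough sketch of the second choice using mb_convert_encoding (which also sidesteps utf8_decode and utf8_encode, both deprecated as of PHP 8.2). The function names are hypothetical, the mbstring extension is assumed to be loaded, and 'ISO-8859-1' is just the encoding named above; substitute whatever your system actually uses.

```php
<?php
// Sketch: convert between UTF-8 (application side) and a single-byte
// filesystem encoding. All three helpers are hypothetical names.

function toFilesystemEncoding(string $utf8, string $fsEncoding = 'ISO-8859-1'): string
{
    return mb_convert_encoding($utf8, $fsEncoding, 'UTF-8');
}

function fromFilesystemEncoding(string $raw, string $fsEncoding = 'ISO-8859-1'): string
{
    return mb_convert_encoding($raw, 'UTF-8', $fsEncoding);
}

/** True when the name survives a round trip, i.e. every character is representable. */
function isRepresentable(string $utf8, string $fsEncoding = 'ISO-8859-1'): bool
{
    $roundTrip = fromFilesystemEncoding(toFilesystemEncoding($utf8, $fsEncoding), $fsEncoding);
    return $roundTrip === $utf8;
}

$name = "canci\u{F3}n.txt";             // "canción.txt" in UTF-8
$raw  = toFilesystemEncoding($name);    // what you would pass to fopen(), mkdir(), ...
$back = fromFilesystemEncoding($raw);   // what you would apply to scandir() entries
```

Checking isRepresentable() before writing avoids silently storing a substitute character for something like ♫, which ISO-8859-1 cannot express.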
This nightmare is why you should probably just transliterate to create filenames.
PHP and Linux filesystem names in utf-8
The important thing to remember is that in Linux, filenames don't have a character encoding and instead are just 8-bit strings.
For example, if you upload a file via FTP and the FTP server uses Windows-1252 character encoding, the filename will be 8-bit Windows-1252. Trying to open the file using UTF-8 characters will fail, no matter what the locale or LANG is.
This is unlike OS X, where the filename is always UTF-8, and Windows where the filename is always UTF-16.
As you've probably found, strings in PHP are also just 8-bit strings, so it's impossible to know for sure what encoding is being used for a string. You can easily have two strings that are encoded in different character sets.
My advice is to ensure that you know the encoding of any string you read or output, including form fields and filenames.
Therefore, make sure the filename on disk is UTF-8 and the filename value you put into the database is UTF-8. Then, when you pull the value from the DB, the file variable should be UTF-8 encoded already and will be ready to pass to the fopen command.
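One way to enforce "know your encoding" is to validate a name before it reaches fopen. This is a sketch, assuming the mbstring extension is loaded; openUtf8File() is a hypothetical helper, and the Windows-1252 sample name simply mirrors the FTP scenario above.

```php
<?php
// Sketch: refuse to touch the filesystem with a path that is not valid UTF-8.
// openUtf8File() is a hypothetical helper, not a PHP built-in.

function openUtf8File(string $path, string $mode = 'rb')
{
    if (!mb_check_encoding($path, 'UTF-8')) {
        throw new InvalidArgumentException('Path is not valid UTF-8');
    }
    return fopen($path, $mode);
}

// A name that arrived as Windows-1252 bytes (e.g. via FTP) fails the check.
$legacy = "caf\xE9.txt";
$isUtf8 = mb_check_encoding($legacy, 'UTF-8');

// Once the real source encoding is known, convert explicitly to UTF-8.
$utf8Name = mb_convert_encoding($legacy, 'UTF-8', 'Windows-1252');
```

The check only proves the bytes are well-formed UTF-8; it cannot tell you what encoding a malformed string was originally in, so that still has to come from context.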
PHP - Upload utf-8 filename
I'm on the Chinese version of Windows 8, and I dealt with a similar problem like this:
$filename = iconv("utf-8", "cp936", $filename);
cp stands for code page, and cp936 stands for code page 936, which is the default code page of the Simplified Chinese version of Windows.
So I think maybe your problem could be solved in a similar way:
$fn2 = iconv("UTF-8","cp1258", $base_dir.$fn);
I'm not quite sure whether the default code page of your OS is 1258 or not; you should check it yourself by opening a command prompt and typing the command chcp. Then change 1258 to whatever the command gives you.
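Since iconv can fail, it may be worth wrapping the conversion. A small sketch, where toCodePage() is a hypothetical helper and 'CP1252' is only an example value; use the code page your own system reports. How unrepresentable characters are treated depends on the iconv implementation PHP was built against.

```php
<?php
// Sketch: convert a UTF-8 name to a Windows code page before a filesystem call.
// toCodePage() is a hypothetical helper, not a PHP built-in.

function toCodePage(string $utf8Name, string $codePage): ?string
{
    // @ silences the notice iconv() raises on bad input; failure is
    // reported through the false return value instead.
    $converted = @iconv('UTF-8', $codePage, $utf8Name);
    return $converted === false ? null : $converted;
}

// Example with code page 1252; replace it with your own code page, e.g. 'CP1258'.
$fsName = toCodePage("r\u{E9}sum\u{E9}.txt", 'CP1252');

if ($fsName === null) {
    // The name could not be converted; pick another name or report an error.
    $fsName = 'unnamed.txt';
}
```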
UPDATE
It seems that PHP filesystem functions can only handle characters that are in system codepage, according to this answer. So you have 2 choices here:
1. Limit the characters in the filename to the system code page; in your case, it's 437. But I'm pretty sure that code page 437 does not include all the Vietnamese characters.
2. Change your system code page to the Vietnamese one, 1258, and convert the filename to cp1258. Then the filesystem functions should work.
Both choices are deficient:
Choice 1: You can't use Vietnamese characters anymore, which is not what you want.
Choice 2: You have to change the system code page, and filename characters are limited to code page 1258.
UPDATE
How to change system code page:
Go to Control Panel > Region > Administrative > Change system locale and select Vietnamese (Vietnam) in the drop-down menu.
PHP: How to create unicode filenames
It can't currently be done on Windows (possibly PHP 5.4 will support this scenario). In PHP, you can only write filenames using the code page Windows is set to. If the code page does not include the character ♫, you cannot use it. Worse, if you have a file on Windows with such a character in its filename, you'll have trouble accessing it.
In Linux, at least with ext*, it's a different story. You can use whatever filenames you want, the OS doesn't care about the encoding. So if you consistently use filenames in UTF-8, you should be OK. UTF-16 is however excluded because filenames cannot include bytes with value 0.
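To see why UTF-16 is ruled out, here is a tiny sketch (assuming the mbstring extension is loaded) that compares the bytes of the same name in both encodings:

```php
<?php
// Sketch: the same name as UTF-8 and as UTF-16LE bytes.
$utf8  = 'a.txt';
$utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

// UTF-8 never uses a zero byte except for U+0000 itself.
$utf8HasNul  = strpos($utf8, "\0") !== false;

// UTF-16 pads every ASCII character with a zero byte ("a" becomes "a\0").
$utf16HasNul = strpos($utf16, "\0") !== false;
```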