PHP - Upload utf-8 filename
I'm on the Chinese version of Windows 8, and I dealt with a similar problem like this:
$filename = iconv("utf-8", "cp936", $filename);
cp stands for code page, and cp936 stands for code page 936, which is the default code page of the Simplified Chinese version of Windows.
So I think your problem could be solved in a similar way:
$fn2 = iconv("UTF-8","cp1258", $base_dir.$fn);
I'm not sure whether the default code page of your OS is 1258 or not. You should check it yourself by opening a command prompt and running the chcp command, then change 1258 to whatever the command gives you.
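The conversion above can be wrapped so that a failed conversion is detected before the result reaches a filesystem function. This is only a minimal sketch: toCodePage is a hypothetical helper name, and CP1252 is used purely as an example target code page.

```php
<?php
// Hypothetical helper (not part of PHP): convert a UTF-8 filename to a
// Windows code page before passing it to filesystem functions.
// Returns null when iconv() reports a failure.
function toCodePage(string $utf8Name, string $codePage): ?string
{
    // iconv() typically returns false (and raises a notice) when a
    // character has no equivalent in the target code page; the @
    // operator silences that notice so the caller can handle it.
    $converted = @iconv('UTF-8', $codePage, $utf8Name);
    return $converted === false ? null : $converted;
}

// Example: "é" is a single byte in code page 1252.
$name = toCodePage("caf\u{E9}.txt", 'CP1252');
if ($name === null) {
    echo "Filename cannot be represented in the target code page\n";
} else {
    echo bin2hex($name), "\n";
}
```

Replace 'CP1252' with the code page of your own system, for example 'CP936' or 'CP1258'.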
UPDATE
It seems that PHP filesystem functions can only handle characters that are in the system code page, according to this answer. So you have 2 choices here:
1. Limit the characters in the filename to the system code page, which in your case is 437. But I'm pretty sure that code page 437 does not include all the Vietnamese characters.
2. Change your system code page to the Vietnamese one, 1258, and convert the filename to cp1258. Then the filesystem functions should work.
Both choices are deficient:
Choice 1: You can't use Vietnamese characters anymore, which is not what you want.
Choice 2: You have to change the system code page, and filename characters are limited to code page 1258.
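Whichever choice you take, it may help to check up front whether a given filename survives the conversion. A rough sketch, assuming a round trip through iconv is an acceptable test; fitsCodePage is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper: report whether a UTF-8 filename can be stored in
// the given code page without losing characters.
function fitsCodePage(string $utf8Name, string $codePage): bool
{
    // Step 1: UTF-8 -> code page. A failure means at least one
    // character has no equivalent there.
    $encoded = @iconv('UTF-8', $codePage, $utf8Name);
    if ($encoded === false) {
        return false;
    }
    // Step 2: convert back and compare. This also catches iconv
    // implementations that substitute a placeholder instead of failing.
    $decoded = @iconv($codePage, 'UTF-8', $encoded);
    return $decoded === $utf8Name;
}

var_dump(fitsCodePage('report.txt', 'CP437'));
var_dump(fitsCodePage("Ti\u{1EBF}ng Vi\u{1EC7}t.txt", 'CP437'));
```

A strict round trip can also reject names that the converter normalizes differently, so treat a false result as "needs review" rather than as proof that the name is unusable.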
UPDATE
How to change system code page:
Go to Control Panel > Region > Administrative > Change system locale and select Vietnamese (Vietnam) in the drop-down menu.
How can i upload a file with utf-8 name in php?
You need to specify the correct character set to iconv from which to convert the string. Something like this:
$fn = iconv("<persian-character-set>", "UTF-8", $file['name']);
You may want to add additional options to the output character set, like TRANSLIT and/or IGNORE:
$fn = iconv("<persian-character-set>", "UTF-8//TRANSLIT//IGNORE", $file['name']);
See http://php.net/manual/en/function.iconv.php for details on these options.
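To illustrate why the source character set matters, here is a small sketch that decodes the same raw byte under two different assumptions. decodeUploadName is a hypothetical helper, and CP1252 and CP1251 are example charsets only; substitute the one your clients actually send.

```php
<?php
// Hypothetical helper: convert a raw uploaded filename to UTF-8 from an
// explicitly named source charset. Returns null if iconv() fails.
function decodeUploadName(string $rawName, string $sourceCharset): ?string
{
    $utf8 = @iconv($sourceCharset, 'UTF-8', $rawName);
    return $utf8 === false ? null : $utf8;
}

// The single byte 0xE9 means different things in different charsets.
$raw = "\xE9";
echo decodeUploadName($raw, 'CP1252'), "\n"; // Latin small letter e with acute
echo decodeUploadName($raw, 'CP1251'), "\n"; // Cyrillic small letter short i
```

If the wrong source charset is named, iconv usually still succeeds but produces the wrong characters, so the result needs to be checked by eye or against known input.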
UTF-8 characters in uploaded file name are jumbled on file upload
UPDATE
Indeed this is a PHP bug on Windows. There are workarounds like the one below, but the best solution I have seen is to use the WFIO extension. This extension provides a new protocol, wfio://, for file streams and allows PHP to properly handle UTF-8 characters on the Windows file system. wfio:// supports a number of PHP functions including fopen, scandir, mkdir, copy, rename, etc.
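A minimal sketch of how the prefix might be applied only when it is usable. wfioPath is a hypothetical helper, and the check assumes the extension registers its stream wrapper under the name wfio, as the protocol name suggests.

```php
<?php
// Hypothetical helper: prefix a path with wfio:// only when running on
// Windows and a "wfio" stream wrapper is registered; otherwise return
// the path unchanged.
function wfioPath(string $path): string
{
    $onWindows = strncasecmp(PHP_OS, 'WIN', 3) === 0;
    $hasWfio   = in_array('wfio', stream_get_wrappers(), true);
    return ($onWindows && $hasWfio) ? 'wfio://' . $path : $path;
}

// Usage: the resulting path goes to the usual stream functions.
$target = wfioPath('uploads/' . "t\u{1EC7}p.txt");
echo $target, "\n";
```

On systems without the extension the helper is a no-op, so the same code can run on Linux and Windows.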
original solution
So this problem is related to a PHP bug on Windows: http://bugs.php.net/bug.php?id=47096
Unicode characters get mangled by PHP on move_uploaded_file, although I have also seen the issue with rename and ZipArchive, so I think it's a general issue with PHP and Windows.
I have adapted a workaround from Wordpress found here. I have to store the file with the mangled file name and then sanitize it on download/email/display.
Here are the adapted methods I'm using in case it's of use to someone in future. This still isn't much use if you're trying to zip files before downloading/emailing or you need to write the files to a network share.
public static function sanitizeFilename($filename, $utf8 = true)
{
    if ( self::seems_utf8($filename) == $utf8 )
        return $filename;

    // On Windows platforms, PHP will mangle non-ASCII characters, see http://bugs.php.net/bug.php?id=47096
    if ( 'WIN' == substr( PHP_OS, 0, 3 ) ) {
        // If the locale is 'C', it has not been set and the default is being used, according to the answer by
        // Colin Morelli at http://stackoverflow.com/questions/13788415/how-to-retrieve-the-current-windows-codepage-in-php
        if ( setlocale( LC_CTYPE, 0 ) == 'C' ) {
            // thus, we force the locale to be explicitly set to the default system locale
            $codepage = 'Windows-' . trim( strstr( setlocale( LC_CTYPE, '' ), '.' ), '.' );
        }
        else {
            $codepage = 'Windows-' . trim( strstr( setlocale( LC_CTYPE, 0 ), '.' ), '.' );
        }
        $charset = 'UTF-8';
        if ( function_exists( 'iconv' ) ) {
            if ( false == $utf8 ) {
                $filename = iconv( $charset, $codepage . '//IGNORE', $filename );
            }
            else {
                $filename = iconv( $codepage, $charset, $filename );
            }
        } elseif ( function_exists( 'mb_convert_encoding' ) ) {
            if ( false == $utf8 )
                $filename = mb_convert_encoding( $filename, $codepage, $charset );
            else
                $filename = mb_convert_encoding( $filename, $charset, $codepage );
        }
    }
    return $filename;
}

public static function seems_utf8($str) {
    $length = strlen($str);
    for ($i=0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) $n = 0; # 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}
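The seems_utf8 check above still accepts 5- and 6-byte sequences, which current UTF-8 (RFC 3629) no longer allows. If a stricter test is wanted, one possible alternative is to let PCRE validate the string. This is a sketch only; isValidUtf8 is a hypothetical name and is not part of the adapted WordPress code.

```php
<?php
// Hypothetical stricter alternative to seems_utf8(): with the /u
// modifier, PCRE validates the subject as UTF-8 before matching, and
// preg_match() returns false when the subject is not valid UTF-8.
function isValidUtf8(string $str): bool
{
    // The empty pattern matches any valid string, so a result of 1
    // means "valid UTF-8"; false means validation failed.
    return preg_match('//u', $str) === 1;
}

var_dump(isValidUtf8("Vi\u{1EC7}t.txt"));      // valid UTF-8
var_dump(isValidUtf8("\xE9"));                 // lone code page byte
var_dump(isValidUtf8("\xF8\x88\x80\x80\x80")); // obsolete 5-byte form
```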
set charset when saving files with php
It sounds like your database doesn't have the correct collation. Make sure the tables/columns are using utf8_general_ci for their collation.
Also extremely important when handling UTF-8 is to use the following two MySQL lines for GET requests...
SET time_zone = '+00:00'
SET CHARACTER SET 'utf8'
...and when you have a POST request use the following two...
SET time_zone = '+00:00'
SET NAMES 'utf8'
These will help ensure that UTF8 characters are maintained correctly.
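If you connect through PDO, the connection character set can also be named in the DSN, which has much the same effect as issuing SET NAMES after connecting. A minimal sketch; buildMysqlDsn is a hypothetical helper, and the host and database names are placeholders.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that names the connection
// charset, so no separate SET NAMES query is needed after connecting.
function buildMysqlDsn(string $host, string $dbName, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = buildMysqlDsn('localhost', 'uploads');
echo $dsn, "\n";

// Usage (placeholder credentials, left commented out on purpose):
// $pdo = new PDO($dsn, $user, $password);
```

The default here is utf8 to match the statements above; pass 'utf8mb4' instead if your tables use it and you need 4-byte characters.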
PHP Uploaded file name: Japanese character encoding
To qualify my answer (to the downvoter):
Q: I have heard that UTF-8 does not support some Japanese characters. Is this correct?
A: There is a lot of misinformation floating around about the support
of Chinese, Japanese and Korean (CJK) characters. The Unicode Standard
supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X
0221, or JIS X 0213, for example, and many more. This is true no
matter which encoding form of Unicode is used: UTF-8, UTF-16, or
UTF-32.
Unicode supports over 80,000 CJK characters right now, and work is
underway to encode further additions. The International Standard
ISO/IEC 10646 and the Unicode Standard are completely synchronized in
repertoire and content. And that means that Unicode has the same
repertoire as GB 18030, since that also is synchronized with ISO 10646
— although with a different ordering and byte format.
From: The Unicode Consortium.
My Answer:
Rather than strpos, use mb_stripos from the PHP Multibyte String functions to find and replace characters. This should help your script detect and translate the non-Latin characters.
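The difference is easiest to see on a short Japanese filename. A small sketch, assuming the mbstring extension is loaded and the string is UTF-8 encoded:

```php
<?php
// "日本語.txt": three characters of three bytes each, then ".txt".
$name = "\u{65E5}\u{672C}\u{8A9E}.txt";

// strpos() counts bytes, so the extension starts at byte offset 9.
$byteOffset = strpos($name, '.txt');

// mb_stripos() counts characters and ignores case, so ".TXT" is found
// at character offset 3.
$charOffset = mb_stripos($name, '.TXT', 0, 'UTF-8');

echo $byteOffset, ' ', $charOffset, "\n";
```

Mixing the two kinds of offsets (for example, passing a byte offset to mb_substr) is a common way to cut a multibyte character in half.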
If the uploaded file name ($_FILES['var']['name']) is already incorrect in the PHP script (from output such as print_r($_FILES)), then you need to ensure you are correctly encoding the HTML form with accept-charset='UTF-8' (or SJIS, etc.). I would hope you're already well ahead of me on this.
Also, it may be advisable to set a few preconditions at the top of your PHP page, again using the PHP mb_ functions:
mb_internal_encoding('UTF-8'); //or whatever character set works for you
mb_http_output('SJIS');
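mb_http_input('UTF-8'); // note: mb_http_input() only reports the detected input encoding for a type such as 'G' or 'P'; it does not set one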
mb_regex_encoding('UTF-8');
Out of interest:
http://www.unicode.org/reports/tr37/
and
http://david.latapie.name/blog/shift-jis-utf-8/
move_uploaded_file does not support utf8 file name
The target filesystem has to support the encoding as well - this may have nothing to do with uploadify or PHP at all.