PHP - Upload Utf-8 Filename

PHP - Upload utf-8 filename

I'm on the Chinese version of Windows 8, and I dealt with a similar problem like this:

$filename = iconv("utf-8", "cp936", $filename);

cp stands for code page, so cp936 means code page 936, the default ANSI code page of the Simplified Chinese version of Windows.


So I think your problem could be solved in a similar way:

$fn2 = iconv("UTF-8","cp1258", $base_dir.$fn);

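I'm not sure whether your system's default code page is 1258, so check it yourself: open a command prompt, run the chcp command, and replace 1258 with the number it reports. Note that chcp shows the console's OEM code page, which can differ from the ANSI code page used by non-Unicode programs (on a US English system, for example, chcp reports 437 while the ANSI code page is 1252).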
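A minimal sketch of that conversion, wrapped in a helper so that a failed conversion is caught before the file is written. It assumes an iconv build that accepts CP-prefixed code page names (glibc's does) and a PHP version whose filesystem functions still use the system code page on Windows. toCodePage(), the "file" upload field and $base_dir in the usage comment are illustrative, not part of the original answer.

```php
<?php
// Hypothetical helper: convert a UTF-8 file name into a Windows code page
// (e.g. 936 for Simplified Chinese, 1258 for Vietnamese) before handing it
// to filesystem functions such as move_uploaded_file().
function toCodePage(string $utf8Name, int $codePage): ?string
{
    // Without //IGNORE or //TRANSLIT, glibc's iconv() returns false when a
    // character cannot be represented in the target code page; the @ hides
    // the notice PHP raises in that case.
    $converted = @iconv('UTF-8', 'CP' . $codePage, $utf8Name);

    // Report failure as null instead of writing a truncated name.
    return $converted === false ? null : $converted;
}

// Usage (field name "file" and $base_dir are assumptions):
// $target = toCodePage($base_dir . $_FILES['file']['name'], 936);
// if ($target !== null) {
//     move_uploaded_file($_FILES['file']['tmp_name'], $target);
// }
```

Returning null rather than the partially converted string makes the caller decide what to do with a name the code page cannot hold.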

UPDATE

It seems that PHP's filesystem functions can only handle characters that are in the system code page, according to this answer. So you have two choices here:

  1. Limit the characters in the filename to the system code page, which in your case is 437. However, code page 437 does not include many Vietnamese letters (ă, đ, ơ and ư, for example).

  2. Change your system code page to the Vietnamese one (1258) and convert the filename to cp1258. The filesystem functions should then work.

Both choices are deficient:

Choice 1: You can't use Vietnamese characters anymore, which is not what you want.

Choice 2: You have to change the system code page, and filename characters are still limited to code page 1258.
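For choice 1, a sketch of how to check in advance whether a UTF-8 name fits a given code page: convert it to the code page and back, then compare. fitsCodePage() is a hypothetical helper, and the check assumes an iconv build that accepts CP-prefixed code page names.

```php
<?php
// Hypothetical check: can this UTF-8 name be stored in the given code page
// without losing characters?
function fitsCodePage(string $utf8Name, int $codePage): bool
{
    $encoded = @iconv('UTF-8', 'CP' . $codePage, $utf8Name);
    if ($encoded === false) {
        return false; // iconv refused a character outright
    }
    // Some iconv implementations substitute a placeholder instead of
    // failing, so also verify that the round trip reproduces the input.
    $decoded = @iconv('CP' . $codePage, 'UTF-8', $encoded);
    return $decoded === $utf8Name;
}
```

With code page 437, for example, café.txt passes while Tiếng Việt.txt fails, which matches the limitation described in choice 1.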

UPDATE

How to change the system code page:

Go to Control Panel > Region > Administrative > Change system locale, select Vietnamese (Vietnam) in the drop-down menu, and restart when Windows prompts you to.

How can I upload a file with a UTF-8 name in PHP?

You need to tell iconv the correct character set to convert the string from. Something like this:

$fn = iconv("<persian-character-set>", "UTF-8", $file['name']);

You may want to add options such as TRANSLIT and/or IGNORE to the output character set:

$fn = iconv("<persian-character-set>", "UTF-8//TRANSLIT//IGNORE", $file['name']);

See http://php.net/manual/en/function.iconv.php for details on these options.
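A sketch of that conversion as a helper, assuming the client sent Windows-1256 (CP1256, a common code page for Persian and Arabic text; the answer leaves the character set open, so substitute whatever your clients actually use). uploadNameToUtf8() is hypothetical, and the preg_match('//u', ...) check that leaves valid UTF-8 untouched is an added idiom, not part of the original answer. It deliberately leaves out //IGNORE so an unconvertible name is reported instead of silently shortened.

```php
<?php
// Hypothetical helper: normalise an uploaded file name to UTF-8.
// $fromCharset is an assumption about what the client sent.
function uploadNameToUtf8(string $rawName, string $fromCharset): ?string
{
    // preg_match() with the /u modifier returns false for invalid UTF-8,
    // so a result of 1 means the name is already valid UTF-8.
    if (preg_match('//u', $rawName) === 1) {
        return $rawName;
    }
    $converted = @iconv($fromCharset, 'UTF-8', $rawName);
    return $converted === false ? null : $converted;
}

// Usage (field name "file" is an assumption):
// $fn = uploadNameToUtf8($_FILES['file']['name'], 'CP1256');
```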

UTF-8 characters in uploaded file name are jumbled on file upload

UPDATE
Indeed, this is a PHP bug on Windows. There are workarounds like the one below, but the best solution I have seen is the WFIO extension. It provides a new stream protocol, wfio://, that lets PHP handle UTF-8 characters properly on the Windows file system. wfio:// supports a number of PHP functions, including fopen, scandir, mkdir, copy and rename.

Original solution

So this problem is related to a PHP bug on Windows: http://bugs.php.net/bug.php?id=47096

Unicode characters get mangled by PHP in move_uploaded_file, although I have also seen the issue with rename and ZipArchive, so I think it's a general issue with PHP on Windows.
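To see what the mangling looks like, this sketch reads the bytes of a UTF-8 name as if they were text in a single-byte Windows code page, which is roughly what happens when PHP passes them to Windows unchanged. asSeenInCodePage() is a hypothetical helper, and iconv is assumed to accept CP-prefixed code page names.

```php
<?php
// Hypothetical helper: show how a UTF-8 file name looks when its bytes are
// interpreted as a single-byte Windows code page (e.g. 1252).
function asSeenInCodePage(string $utf8Name, int $codePage): ?string
{
    // Treat the raw UTF-8 bytes as code page text and convert that text
    // to UTF-8 for display. Bytes the code page leaves undefined may make
    // iconv() fail, which is reported here as null.
    $shown = @iconv('CP' . $codePage, 'UTF-8', $utf8Name);
    return $shown === false ? null : $shown;
}
```

With code page 1252, café.txt comes out as cafÃ©.txt: the two UTF-8 bytes of é are shown as two separate characters.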

I have adapted a workaround from WordPress found here. I have to store the file under the mangled file name and then sanitize the name on download/email/display.

Here are the adapted methods I'm using, in case they are of use to someone in the future. They still aren't much use if you're trying to zip files before downloading/emailing them, or if you need to write the files to a network share.

public static function sanitizeFilename($filename, $utf8 = true)
{
    // Nothing to do if the name is already in the requested form:
    // UTF-8 when $utf8 is true, the system code page when it is false.
    if ( self::seems_utf8($filename) == $utf8 )
        return $filename;

    // On Windows platforms, PHP will mangle non-ASCII characters, see http://bugs.php.net/bug.php?id=47096
    if ( 'WIN' == substr( PHP_OS, 0, 3 ) ) {
        if ( setlocale( LC_CTYPE, 0 ) == 'C' ) {
            // Locale has not been set and the default is being used, according to answer by Colin Morelli at http://stackoverflow.com/questions/13788415/how-to-retrieve-the-current-windows-codepage-in-php
            // thus, we force the locale to be explicitly set to the default system locale
            $codepage = 'Windows-' . trim( strstr( setlocale( LC_CTYPE, '' ), '.' ), '.' );
        }
        else {
            // e.g. 'English_United States.1252' becomes 'Windows-1252'
            $codepage = 'Windows-' . trim( strstr( setlocale( LC_CTYPE, 0 ), '.' ), '.' );
        }
        $charset = 'UTF-8';
        if ( function_exists( 'iconv' ) ) {
            if ( false == $utf8 ) {
                // UTF-8 -> code page, dropping characters it cannot represent
                $filename = iconv( $charset, $codepage . '//IGNORE', $filename );
            }
            else {
                // code page -> UTF-8
                $filename = iconv( $codepage, $charset, $filename );
            }
        } elseif ( function_exists( 'mb_convert_encoding' ) ) {
            if ( false == $utf8 )
                $filename = mb_convert_encoding( $filename, $codepage, $charset );
            else
                $filename = mb_convert_encoding( $filename, $charset, $codepage );
        }
    }

    return $filename;
}

public static function seems_utf8($str) {
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) $n = 0;                  # 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n = 1;    # 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n = 2;    # 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n = 3;    # 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n = 4;    # 111110bb
        elseif (($c & 0xFE) == 0xFC) $n = 5;    # 1111110b
        else return false;                      # Does not match any model
        for ($j = 0; $j < $n; $j++) {           # n bytes matching 10bbbbbb follow?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}
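If the mbstring extension is available, mb_check_encoding() can stand in for seems_utf8(). It follows the current UTF-8 definition, so it should also reject the obsolete 5- and 6-byte patterns that seems_utf8() still accepts. seemsUtf8Mb() is a hypothetical wrapper name.

```php
<?php
// Hypothetical alternative to seems_utf8(), assuming mbstring is loaded.
// Returns true only if the whole string is valid UTF-8.
function seemsUtf8Mb(string $str): bool
{
    return mb_check_encoding($str, 'UTF-8');
}
```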

set charset when saving files with php

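It sounds like your database doesn't have the correct collation. Make sure the tables/columns are using utf8_general_ci for their collation (or utf8mb4_general_ci on MySQL 5.5.3 and later, since MySQL's utf8 character set stores at most three bytes per character and cannot hold characters outside the Basic Multilingual Plane).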

Also extremely important when handling UTF-8 is to use the following two MySQL lines for GET requests...

SET time_zone = '+00:00'
SET CHARACTER SET 'utf8'

...and when you have a POST request use the following two...

SET time_zone = '+00:00'
SET NAMES 'utf8'

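These will help ensure that UTF-8 characters are maintained correctly. Only the second line of each pair concerns encoding; SET time_zone just sets the session time zone to UTC. SET NAMES sets the client, connection and result character sets together, whereas SET CHARACTER SET sets the connection character set to the database's default, so SET NAMES is the more predictable of the two for any request.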
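If you connect through PDO, a sketch of the same idea, assuming the pdo_mysql driver: the charset parameter in the DSN sets the connection character set when the connection is opened, so it applies to every request regardless of method. The function names and the utf8mb4 default are assumptions, not part of the original answer.

```php
<?php
// Hypothetical helpers for a PDO MySQL connection that uses UTF-8 for
// every request. utf8mb4 (MySQL 5.5.3+) covers all of Unicode; pass
// 'utf8' for older servers.
function mysqlDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

function connectUtf8(string $host, string $dbName, string $user, string $password): PDO
{
    $pdo = new PDO(mysqlDsn($host, $dbName), $user, $password);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // Keep the session in UTC, as the statements above do.
    $pdo->exec("SET time_zone = '+00:00'");
    return $pdo;
}
```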

PHP Uploaded file name: Japanese character encoding

To qualify my answer (to the downvoter):

Q: I have heard that UTF-8 does not support some Japanese characters. Is this correct?

A: There is a lot of misinformation floating around about the support
of Chinese, Japanese and Korean (CJK) characters. The Unicode Standard
supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X
0221, or JIS X 0213, for example, and many more. This is true no
matter which encoding form of Unicode is used: UTF-8, UTF-16, or
UTF-32.

Unicode supports over 80,000 CJK characters right now, and work is
underway to encode further additions. The International Standard
ISO/IEC 10646 and the Unicode Standard are completely synchronized in
repertoire and content. And that means that Unicode has the same
repertoire as GB 18030, since that also is synchronized with ISO 10646
— although with a different ordering and byte format.

From: The Unicode Consortium.

My Answer:

Rather than strpos, use the matching functions from PHP's Multibyte String extension (mb_strpos, or mb_stripos for a case-insensitive search) to find and replace characters. This should help your script detect and translate non-Latin characters.
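A small illustration, assuming the mbstring extension is loaded: strpos() returns byte offsets while mb_strpos() returns character offsets, so the mb_ functions are the ones to combine with mb_substr(). splitName() is a hypothetical helper that separates a file name from its extension.

```php
<?php
// Hypothetical helper: split "name.ext" on the last dot, counting in
// characters rather than bytes (assumes the mbstring extension).
function splitName(string $fileName): array
{
    $dot = mb_strrpos($fileName, '.', 0, 'UTF-8');
    if ($dot === false) {
        return [$fileName, '']; // no extension
    }
    return [
        mb_substr($fileName, 0, $dot, 'UTF-8'),
        mb_substr($fileName, $dot + 1, null, 'UTF-8'),
    ];
}
```

For 写真.jpeg, strpos() finds the dot at byte 6, mb_strpos() finds it at character 2, and splitName() returns ['写真', 'jpeg'].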

If the uploaded file name ($_FILES['var']['name']) is already incorrect in the PHP script (as shown by output such as print_r($_FILES)), then you need to make sure the HTML form is submitted in the right encoding by using accept-charset='UTF-8' (or SJIS, etc.). I would hope you're already well ahead of me on this.
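A sketch of such a form rendered from PHP, assuming the page itself is also served as UTF-8. uploadForm() and its default field name are hypothetical; the header() call in the usage comment is the standard way to declare the page encoding.

```php
<?php
// Hypothetical helper that renders an upload form asking the browser to
// submit field values, including the file name, as UTF-8.
function uploadForm(string $action, string $fieldName = 'upload'): string
{
    $a = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');
    $f = htmlspecialchars($fieldName, ENT_QUOTES, 'UTF-8');
    return '<form action="' . $a . '" method="post"'
        . ' enctype="multipart/form-data" accept-charset="UTF-8">'
        . '<input type="file" name="' . $f . '">'
        . '<button type="submit">Upload</button>'
        . '</form>';
}

// In a real page, send the matching header before any output:
// header('Content-Type: text/html; charset=UTF-8');
// echo uploadForm('upload.php');
```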

It may also be advisable to add a few preconditions at the top of your PHP page, again using the PHP mb_ functions:

mb_internal_encoding('UTF-8'); //or whatever character set works for you
mb_http_output('SJIS'); // output is converted only when it goes through ob_start('mb_output_handler')
mb_regex_encoding('UTF-8');
// mb_http_input() is not a setter: it takes a type such as 'P' (POST) and reports the detected input encoding

Out of interest:

http://www.unicode.org/reports/tr37/

and

http://david.latapie.name/blog/shift-jis-utf-8/

move_uploaded_file does not support utf8 file name

The target filesystem has to support the encoding as well; this may have nothing to do with Uploadify or PHP at all.
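One way to test that directly is to create a file with a UTF-8 name and check whether scandir() returns exactly the same bytes. filesystemKeepsName() is a hypothetical helper; point it at a scratch directory you can write to. A false result on some macOS filesystems may only mean the name was normalized (for example to decomposed form) rather than lost.

```php
<?php
// Hypothetical probe: does the filesystem behind $dir store a UTF-8 file
// name byte-for-byte? It creates an empty file, looks for the exact name
// with scandir(), then removes the file again.
function filesystemKeepsName(string $dir, string $utf8Name): bool
{
    $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $utf8Name;
    if (file_exists($path)) {
        // Refuse rather than overwrite or delete an existing file.
        throw new InvalidArgumentException("Already exists: $path");
    }
    if (@file_put_contents($path, '') === false) {
        return false; // the name could not be created at all
    }
    $entries = scandir($dir);
    $found = $entries !== false && in_array($utf8Name, $entries, true);
    @unlink($path); // clean up under the same (possibly mangled) name
    return $found;
}
```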