PHP Readdir Problem with Japanese Language File Name

Working with Japanese filenames in PHP 5.3 and Windows Vista?

You cannot do it in PHP directly. Write a small C program that reads the directory and call that program from PHP.

See also:
http://en.literateprograms.org/Directory_listing_(C,_Windows)
http://www.daniweb.com/forums/thread74944.html
http://forums.devshed.com/c-programming-42/reading-a-directory-in-windows-36169.html
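A minimal sketch of the PHP side of that approach. It assumes a hypothetical helper executable (called `listdir.exe` here; it is not part of PHP or of the links above) that prints one UTF-8 encoded filename per line and exits with code 0 on success. The function names are made up for illustration.

```php
<?php
// Clean up the raw output lines of the (hypothetical) helper program.
function parseHelperOutput(array $lines)
{
    $names = array();
    foreach ($lines as $line) {
        $name = rtrim($line, "\r\n");
        // Skip blank lines and the "." / ".." pseudo-entries.
        if ($name === '' || $name === '.' || $name === '..') {
            continue;
        }
        $names[] = $name;
    }
    return $names;
}

// Run the helper on $dir and return the filenames it reports.
// Assumption: $helper prints UTF-8 names, one per line.
function listDirectoryViaHelper($helper, $dir)
{
    $lines = array();
    $exitCode = 0;
    $command = escapeshellarg($helper) . ' ' . escapeshellarg($dir);
    exec($command, $lines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("Helper exited with code $exitCode");
    }
    return parseHelperOutput($lines);
}
```

Usage would look like `listDirectoryViaHelper('C:\\tools\\listdir.exe', 'C:\\data')`, where both paths are placeholders. Note that the directory argument itself still passes through PHP, so this sketch only helps when the directory path is representable in the current codepage.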

PHP - Read non-latin character dir/file name

Actually, my previous answer was not right. The problem is that PHP 5 does not support UTF-8 filenames in file operations.

A workaround would be to use something like WFIO, which exposes its own protocol for file streams and allows PHP to handle UTF-8 characters in file operations. You can see in the README that the syntax would be:

scandir("wfio://directory")

Good luck!

PHP readdir with european characters

Judging by the U+FFFD marks (which appear in Firefox as diamonds with a question mark inside), you are already not reading the names using the correct encoding (which would be Unicode / UTF-8). As far as I can tell, this bug seems to be related.
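To illustrate: the replacement marks show up when bytes that are not valid UTF-8 are sent to a page declared as UTF-8. A rough sketch of checking for that and converting, assuming the names actually arrive in Windows-1252 (that codepage is a guess for a Western European setup, and `toUtf8` is a made-up helper name):

```php
<?php
// Return $name as UTF-8. If it is not valid UTF-8 already, assume it is
// encoded in $assumedCodepage (an assumption, adjust to your system).
function toUtf8($name, $assumedCodepage = 'CP1252')
{
    // With the /u modifier, preg_match() fails on invalid UTF-8 input.
    if (preg_match('//u', $name) === 1) {
        return $name;
    }
    $converted = iconv($assumedCodepage, 'UTF-8', $name);
    // Fall back to the original bytes if the conversion fails.
    return $converted === false ? $name : $converted;
}
```

This only repairs names whose characters exist in the assumed codepage; characters that were already replaced by `?` at the filesystem API level cannot be recovered this way.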

Here's another SO topic about that: php readdir problem with japanese language file name

To the point: wait until they get PHP 6 stable and then use it. (PHP 6 was never released; UTF-8 path support on Windows eventually arrived in PHP 7.1.)

Unrelated to the problem: the Normalizer is a better tool to get rid of diacritical marks.
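For reference, a small sketch of what that could look like with the `Normalizer` class. It assumes the intl extension is loaded and the input is UTF-8; `stripDiacritics` is a name made up for this example.

```php
<?php
// Remove diacritical marks by decomposing characters (NFD) and then
// dropping the combining marks (Unicode category Mn).
// Assumes the intl extension (Normalizer) is available.
function stripDiacritics($text)
{
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // not valid UTF-8, leave untouched
    }
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);
    return $stripped === null ? $text : $stripped;
}
```

Letters that are not built from a base letter plus a combining mark (for example "ø" or "ß") are left as they are by this approach.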

PHP - Windows - filename incorrect after upload (ü saved as Ã¼ etc.)

In the end I solved it with the following approach:

  1. When uploading the files, I URL-encode the names with rawurlencode().
  2. When fetching the files from the server, the names are URL-encoded, so I use urldecode($filename) to print the correct names.
  3. Links in an a href are decoded automatically (for example, "%20" becomes " "), so the URL ends up pointing to the wrong filename. I decided to encode the names again when printing them, ending up with something like this: print $dirReceived.rawurlencode($file); ($dirReceived is the directory where received files are stored, defined earlier in the code).
  4. I also added a download attribute with urldecode($filename) so the file is saved with its UTF-8 name when needed.

Thanks to this, the files are saved on the server with URL-encoded names. I can open them in the browser (very important, as most of them are *.pdf) and download them with the correct name, which lets me upload and download even files with names written in Arabic, Cyrillic, etc.

So far I have tested it and it looks good. I am thinking of implementing it in production code. Any concerns/thoughts on it?

EDIT.

Since there are no objections, I am selecting my answer as the one that solved my problem. After some testing, everything looks good on both the client and the server side. When the files are saved on the server they are URL-encoded; when they are downloaded they are decoded and saved with the correct names.

At the beginning I was using the code:

    // Move each uploaded file into place under its original client-side name.
    for ($i = 0; $i < count($_FILES['file']['name']); $i++) {
        move_uploaded_file(
            $_FILES['file']['tmp_name'][$i],
            "../filepath/" . $_FILES['file']['name'][$i]
        );
    }

This method caused the problem when saving the file: every UTF-8 special character was replaced with its cp1252 interpretation (ü saved as Ã¼ etc.), so I added one line and replaced that code with the following:

    for ($i = 0; $i < count($_FILES['file']['name']); $i++) {
        // Store the file under a percent-encoded, pure-ASCII name.
        $fname = rawurlencode($_FILES['file']['name'][$i]);
        move_uploaded_file(
            $_FILES['file']['tmp_name'][$i],
            "../filepath/" . $fname
        );
    }

This allows me to save any filename on the server using URL encoding (% followed by two hexadecimal digits). The encoded name is pure ASCII, so it reads the same whether it is interpreted as cp1252 or as UTF-8.
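A quick way to convince yourself of this, as a standalone sketch (the filename is just an example, and iconv() is used here only to simulate the bytes being read as cp1252):

```php
<?php
// "über uns.pdf" written as explicit UTF-8 bytes, so the result does
// not depend on the encoding of this source file.
$original = "\xC3\xBCber uns.pdf";

// The raw name, with its bytes misread as cp1252: this is the corruption.
$misread = iconv('CP1252', 'UTF-8', $original);

// The percent-encoded name contains only ASCII characters...
$stored = rawurlencode($original);

// ...so misreading it as cp1252 changes nothing.
$storedMisread = iconv('CP1252', 'UTF-8', $stored);

// Decoding gives back the original UTF-8 bytes.
$restored = rawurldecode($stored);

echo $misread, "\n";   // Ã¼ber uns.pdf
echo $stored, "\n";    // %C3%BCber%20uns.pdf
var_dump($storedMisread === $stored, $restored === $original);
```

One thing to keep in mind: each non-ASCII byte becomes three characters, so long names in non-Latin scripts can grow considerably on disk.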

To list the saved files, I take the directory paths I have stored in the DB and list the files in each of them. I was using the following code:

    if (is_dir($dir)) {
        if ($dh = opendir($dir)) {
            while (($file = readdir($dh)) !== false) {
                if (is_file($dir . $file)) {
                    echo "<li><a href='" . $dir . $file . "' download='" . $file . "'>" . $file . "</a></li><br />";
                }
            }
            closedir($dh);
        }
    }

Since the browser decodes URL-encoded names in links automatically, I changed it to:

    if (is_dir($dir)) {
        if ($dh = opendir($dir)) {
            while (($file = readdir($dh)) !== false) {
                if (is_file($dir . $file)) {
                    // Encode the stored name again for the URL; decode it for display.
                    echo "<li><a href='";
                    print $dir . rawurlencode($file);
                    echo "' download='" . urldecode($file) . "'>" . urldecode($file) . "</a></li><br />";
                }
            }
            closedir($dh);
        }
    }
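One possible hardening of the loop body above, as a sketch rather than a drop-in replacement: the decoded name is user-supplied, so it is safer to pass it through htmlspecialchars() before printing it into HTML. `renderDownloadLink` is a made-up helper name, and rawurldecode() is used as the direct inverse of rawurlencode().

```php
<?php
// Build one list item for a file stored under a percent-encoded name.
// $dir is assumed to end with a slash, as in the loop above.
function renderDownloadLink($dir, $storedName)
{
    // Human-readable name for the link text and the download attribute.
    $displayName = rawurldecode($storedName);
    // The stored name is encoded once more so the browser's automatic
    // decoding yields the on-disk (encoded) filename again.
    $href = $dir . rawurlencode($storedName);

    return sprintf(
        '<li><a href="%s" download="%s">%s</a></li>',
        htmlspecialchars($href, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($displayName, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($displayName, ENT_QUOTES, 'UTF-8')
    );
}
```

Inside the while loop this would be used as `echo renderDownloadLink($dir, $file);`.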

I don't know if this is the best way to solve it, but it works perfectly. I am also aware that it is generally good practice not to use PHP to generate HTML tags, but at the moment I have some critical bugs that need addressing, so those come first; after that I'll work on the appearance of the code itself.

EDIT2

Another great thing is that I do not have to rename the files that were already uploaded, which in my case is a big advantage.

how to iterate over non-English file names in PHP

This is not possible. It's a limitation of PHP. PHP uses the multibyte (ANSI) versions of the Windows APIs rather than the wide-character ones, so you're limited to the characters your codepage can represent.

See this answer.
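As a rough analogy in PHP itself (this is not the code path Windows takes, just the same kind of lossy conversion): converting a UTF-8 name to a single-byte codepage replaces anything the codepage lacks with a substitute character. The sketch assumes the mbstring extension with its default substitute character, and Windows-1252 as the codepage; `toCodepage` is a made-up name.

```php
<?php
// Convert a UTF-8 filename to a single-byte codepage. Characters that the
// codepage cannot represent are replaced by mbstring's substitute
// character, which is "?" unless configured otherwise.
function toCodepage($utf8Name, $codepage = 'Windows-1252')
{
    return mb_convert_encoding($utf8Name, $codepage, 'UTF-8');
}

// Example: "shima " followed by U+03ED (COPTIC SMALL LETTER SHIMA).
// $ansi = toCodepage("shima \xCF\xAD.txt");
// ord($ansi[6]) is then 63, the same character code seen in the debugger below.
```

Once the name has been reduced to "?", the original character is gone, and the resulting string no longer identifies the file on disk.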

Directory contents:


D:\Users\Cataphract\Desktop\teste2>dir
Volume in drive D is GRANDEDISCO
Volume Serial Number is 945F-DB89

Directory of D:\Users\Cataphract\Desktop\teste2

01-06-2010 17:16 .
01-06-2010 17:16 ..
01-06-2010 17:15 0 coptic small letter shima follows ϭ.txt
01-06-2010 17:18 86 teste.php
2 File(s) 86 bytes
2 Dir(s) 12.178.505.728 bytes free

Test file contents:

<?php
exec('pause');
foreach (new DirectoryIterator(".") as $v) {
echo $v."\n";
}

Test file results:


.
..
coptic small letter shima follows ?.txt
teste.php

Debugger output:

Call stack (PHP 5.3.0):


> php5ts_debug.dll!readdir_r(DIR * dp=0x02f94068, dirent * entry=0x00a7e7cc, dirent * * result=0x00a7e7c0) Line 80 C
php5ts_debug.dll!php_plain_files_dirstream_read(_php_stream * stream=0x02b94280, char * buf=0x02b9437c, unsigned int count=260, void * * * tsrm_ls=0x028a15c0) Line 820 + 0x17 bytes C
php5ts_debug.dll!_php_stream_read(_php_stream * stream=0x02b94280, char * buf=0x02b9437c, unsigned int size=260, void * * * tsrm_ls=0x028a15c0) Line 603 + 0x1c bytes C
php5ts_debug.dll!_php_stream_readdir(_php_stream * dirstream=0x02b94280, _php_stream_dirent * ent=0x02b9437c, void * * * tsrm_ls=0x028a15c0) Line 1806 + 0x16 bytes C
php5ts_debug.dll!spl_filesystem_dir_read(_spl_filesystem_object * intern=0x02b94340, void * * * tsrm_ls=0x028a15c0) Line 199 + 0x20 bytes C
php5ts_debug.dll!spl_filesystem_dir_open(_spl_filesystem_object * intern=0x02b94340, char * path=0x02b957f0, void * * * tsrm_ls=0x028a15c0) Line 238 + 0xd bytes C
php5ts_debug.dll!spl_filesystem_object_construct(int ht=1, _zval_struct * return_value=0x02b91f88, _zval_struct * * return_value_ptr=0x00000000, _zval_struct * this_ptr=0x02b92028, int return_value_used=0, void * * * tsrm_ls=0x028a15c0, long ctor_flags=0) Line 645 + 0x11 bytes C
php5ts_debug.dll!zim_spl_DirectoryIterator___construct(int ht=1, _zval_struct * return_value=0x02b91f88, _zval_struct * * return_value_ptr=0x00000000, _zval_struct * this_ptr=0x02b92028, int return_value_used=0, void * * * tsrm_ls=0x028a15c0) Line 658 + 0x1f bytes C
php5ts_debug.dll!zend_do_fcall_common_helper_SPEC(_zend_execute_data * execute_data=0x02bc0098, void * * * tsrm_ls=0x028a15c0) Line 313 + 0x78 bytes C
php5ts_debug.dll!ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER(_zend_execute_data * execute_data=0x02bc0098, void * * * tsrm_ls=0x028a15c0) Line 423 C
php5ts_debug.dll!execute(_zend_op_array * op_array=0x02b93888, void * * * tsrm_ls=0x028a15c0) Line 104 + 0x11 bytes C
php5ts_debug.dll!zend_execute_scripts(int type=8, void * * * tsrm_ls=0x028a15c0, _zval_struct * * retval=0x00000000, int file_count=3, ...) Line 1188 + 0x21 bytes C
php5ts_debug.dll!php_execute_script(_zend_file_handle * primary_file=0x00a7fad4, void * * * tsrm_ls=0x028a15c0) Line 2196 + 0x1b bytes C
php.exe!main(int argc=2, char * * argv=0x028a14c0) Line 1188 + 0x13 bytes C
php.exe!__tmainCRTStartup() Line 555 + 0x19 bytes C
php.exe!mainCRTStartup() Line 371 C

Is it really a question mark?


dp->fileinfo
{dwFileAttributes=32 ftCreationTime={...} ftLastAccessTime={...} ...}
dwFileAttributes: 32
ftCreationTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
ftLastAccessTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
ftLastWriteTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
nFileSizeHigh: 0
nFileSizeLow: 0
dwReserved0: 3435973836
dwReserved1: 3435973836
cFileName: 0x02f9409c "coptic small letter shima follows ?.txt"
cAlternateFileName: 0x02f941a0 "COPTIC~1.TXT"
dp->fileinfo.cFileName[34]
63 '?'

Yes! It's character #63.

How to open file in PHP that has unicode characters in its name?

These are conclusions so far:

  1. PHP 5 cannot open a file whose name contains Unicode characters unless the source file is itself Unicode.
  2. PHP 5 (at least on Windows XP) is not able to process PHP source in Unicode.

Thus the conclusion is that this is not doable in PHP 5.


