Working with Japanese Filenames in PHP 5.3 and Windows Vista

Working with Japanese filenames in PHP 5.3 and Windows Vista?

PHP 5.3 cannot do this natively, but you can work around it from PHP: write a small C program that reads the directory with the wide-character Windows API and call that program from PHP.

See also:
http://en.literateprograms.org/Directory_listing_(C,_Windows)
http://www.daniweb.com/forums/thread74944.html
http://forums.devshed.com/c-programming-42/reading-a-directory-in-windows-36169.html
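
Calling such a helper from PHP could look like the sketch below. It assumes a hypothetical helper executable (for example a self-written listdir.exe, which is not part of PHP or Windows) that takes a directory path as its only argument and prints one UTF-8 encoded file name per line. The function name listDirectoryViaHelper is likewise made up for illustration.

```php
<?php
// Hypothetical wrapper: runs an external helper that prints one UTF-8
// file name per line and returns those names as an array.
// $helper is the command to run (e.g. a self-written listdir.exe).
function listDirectoryViaHelper($helper, $dir)
{
    $lines = array();
    $status = 0;
    // escapeshellarg() quotes the directory so that spaces survive the shell.
    exec($helper . ' ' . escapeshellarg($dir), $lines, $status);
    if ($status !== 0) {
        return false; // helper missing or failed
    }
    // Drop empty lines; every remaining line is treated as a file name.
    return array_values(array_filter($lines, 'strlen'));
}
```

A call would then look like listDirectoryViaHelper('listdir.exe', 'D:\\some\\dir'). Note that the directory argument itself still passes through PHP, so on PHP 5.3 it probably has to be representable in the current code page.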

How to iterate over non-English file names in PHP

This is not possible; it is a limitation of PHP. PHP uses the multibyte (ANSI, "A") versions of the Windows APIs rather than the wide-character ("W") versions, so you are limited to the characters that your current code page can represent.

See this answer.

Directory contents:

D:\Users\Cataphract\Desktop\teste2>dir
 Volume in drive D is GRANDEDISCO
 Volume Serial Number is 945F-DB89

 Directory of D:\Users\Cataphract\Desktop\teste2

01-06-2010  17:16    <DIR>          .
01-06-2010  17:16    <DIR>          ..
01-06-2010  17:15                 0 coptic small letter shima follows ϭ.txt
01-06-2010  17:18                86 teste.php
               2 File(s)             86 bytes
               2 Dir(s)  12.178.505.728 bytes free

Test file contents:

<?php
exec('pause');
foreach (new DirectoryIterator(".") as $v) {
    echo $v."\n";
}

Test file results:


.
..
coptic small letter shima follows ?.txt
teste.php

Debugger output:

Call stack (PHP 5.3.0):


> php5ts_debug.dll!readdir_r(DIR * dp=0x02f94068, dirent * entry=0x00a7e7cc, dirent * * result=0x00a7e7c0) Line 80 C
php5ts_debug.dll!php_plain_files_dirstream_read(_php_stream * stream=0x02b94280, char * buf=0x02b9437c, unsigned int count=260, void * * * tsrm_ls=0x028a15c0) Line 820 + 0x17 bytes C
php5ts_debug.dll!_php_stream_read(_php_stream * stream=0x02b94280, char * buf=0x02b9437c, unsigned int size=260, void * * * tsrm_ls=0x028a15c0) Line 603 + 0x1c bytes C
php5ts_debug.dll!_php_stream_readdir(_php_stream * dirstream=0x02b94280, _php_stream_dirent * ent=0x02b9437c, void * * * tsrm_ls=0x028a15c0) Line 1806 + 0x16 bytes C
php5ts_debug.dll!spl_filesystem_dir_read(_spl_filesystem_object * intern=0x02b94340, void * * * tsrm_ls=0x028a15c0) Line 199 + 0x20 bytes C
php5ts_debug.dll!spl_filesystem_dir_open(_spl_filesystem_object * intern=0x02b94340, char * path=0x02b957f0, void * * * tsrm_ls=0x028a15c0) Line 238 + 0xd bytes C
php5ts_debug.dll!spl_filesystem_object_construct(int ht=1, _zval_struct * return_value=0x02b91f88, _zval_struct * * return_value_ptr=0x00000000, _zval_struct * this_ptr=0x02b92028, int return_value_used=0, void * * * tsrm_ls=0x028a15c0, long ctor_flags=0) Line 645 + 0x11 bytes C
php5ts_debug.dll!zim_spl_DirectoryIterator___construct(int ht=1, _zval_struct * return_value=0x02b91f88, _zval_struct * * return_value_ptr=0x00000000, _zval_struct * this_ptr=0x02b92028, int return_value_used=0, void * * * tsrm_ls=0x028a15c0) Line 658 + 0x1f bytes C
php5ts_debug.dll!zend_do_fcall_common_helper_SPEC(_zend_execute_data * execute_data=0x02bc0098, void * * * tsrm_ls=0x028a15c0) Line 313 + 0x78 bytes C
php5ts_debug.dll!ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER(_zend_execute_data * execute_data=0x02bc0098, void * * * tsrm_ls=0x028a15c0) Line 423 C
php5ts_debug.dll!execute(_zend_op_array * op_array=0x02b93888, void * * * tsrm_ls=0x028a15c0) Line 104 + 0x11 bytes C
php5ts_debug.dll!zend_execute_scripts(int type=8, void * * * tsrm_ls=0x028a15c0, _zval_struct * * retval=0x00000000, int file_count=3, ...) Line 1188 + 0x21 bytes C
php5ts_debug.dll!php_execute_script(_zend_file_handle * primary_file=0x00a7fad4, void * * * tsrm_ls=0x028a15c0) Line 2196 + 0x1b bytes C
php.exe!main(int argc=2, char * * argv=0x028a14c0) Line 1188 + 0x13 bytes C
php.exe!__tmainCRTStartup() Line 555 + 0x19 bytes C
php.exe!mainCRTStartup() Line 371 C

Is it really a question mark?


dp->fileinfo
{dwFileAttributes=32 ftCreationTime={...} ftLastAccessTime={...} ...}
dwFileAttributes: 32
ftCreationTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
ftLastAccessTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
ftLastWriteTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
nFileSizeHigh: 0
nFileSizeLow: 0
dwReserved0: 3435973836
dwReserved1: 3435973836
cFileName: 0x02f9409c "coptic small letter shima follows ?.txt"
cAlternateFileName: 0x02f941a0 "COPTIC~1.TXT"
dp->fileinfo.cFileName[34]
63 '?'

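Yes, it is character 63 (0x3F), a literal ASCII question mark. The original character was already lost by the time the Windows API handed the file name to PHP.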
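
The same kind of lossy substitution can be reproduced in userland as an analogy. The sketch below is not what PHP does internally (there the conversion happens inside Windows); it only assumes that the mbstring extension is loaded and, for the sake of illustration, that the target code page is Windows-1252, which may differ from the code page of the machine used above.

```php
<?php
// U+03ED COPTIC SMALL LETTER SHIMA, written as UTF-8 bytes.
$shima = "\xCF\xAD";
$name  = "follows " . $shima . ".txt";

// Convert to a single-byte code page (Windows-1252 is an assumption).
// Characters the code page cannot represent are replaced by mbstring's
// substitute character.
$ansi = mb_convert_encoding($name, 'Windows-1252', 'UTF-8');

echo $ansi, "\n";
// Byte value at the position where the shima used to be (offset 8).
echo ord($ansi[8]), "\n";
```

With mbstring's default substitute character this should print the name with a question mark in place of the shima, followed by 63, matching what the debugger shows.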

UTF-8, PHP, Win7 - Is there a solution now to save UTF-8-filenames on Win 7 using php?

PHP starting with 7.1.0alpha2 supports UTF-8 filenames on Windows.
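
A quick way to check this on a given installation is a round trip like the sketch below. It assumes PHP 7.1 or later, that the script's strings are UTF-8, and that default_charset has been left at its default of UTF-8. The scratch directory name is arbitrary.

```php
<?php
// Create a scratch directory so the sketch does not touch real files.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'utf8names_' . uniqid();
mkdir($dir);

// File name containing U+03ED (Coptic small letter shima) as UTF-8 bytes.
$name = "shima \xCF\xAD.txt";
file_put_contents($dir . DIRECTORY_SEPARATOR . $name, 'test');

// Read the directory back and collect the names PHP reports.
$found = array();
foreach (new DirectoryIterator($dir) as $entry) {
    if (!$entry->isDot()) {
        $found[] = $entry->getFilename();
    }
}

// True when the name survived the round trip byte for byte.
var_dump(in_array($name, $found, true));

// Clean up.
unlink($dir . DIRECTORY_SEPARATOR . $name);
rmdir($dir);
```

On an older PHP build for Windows the same script would be expected to report a mangled name instead.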


PHP mb_strpos fails for Greek strings

The graphemes may render the same or similar but they are not represented the same way. For example:

  • ά is represented here as Unicode Character 'GREEK SMALL LETTER ALPHA WITH TONOS' (U+03AC)
  • ά is represented here as Unicode Character 'GREEK SMALL LETTER ALPHA' (U+03B1) followed by Unicode Character 'COMBINING ACUTE ACCENT' (U+0301)

These were copied directly from your comment.


To compare them, first run normalizer_normalize() (provided by the intl extension) on both strings so that both are in the same normalization form. Which form to use is ultimately up to you. There are four:

  1. NFD (Canonical Decomposition)
  2. NFC (Canonical Decomposition, followed by Canonical Composition)
  3. NFKD (Compatibility Decomposition)
  4. NFKC (Compatibility Decomposition, followed by Canonical Composition)

Because the normalized strings are used only internally for the comparison, you can ignore NFC and NFKC; there is no need to recompose. This leaves you with either NFD or NFKD, that is, canonical or compatibility decomposition. The names give you a clue about how strict each one is regarding equivalence.


From Unicode Standard Annex #15, section 1.1, Canonical and Compatibility Equivalence:

Canonical equivalence is a fundamental equivalency between characters or sequences of characters that represent the same abstract character, and when correctly displayed should always have the same visual appearance and behavior.

Compatibility equivalence is a weaker equivalence between characters or sequences of characters that represent the same abstract character, but may have a different visual appearance or behavior.


For searching I would go with the latter, compatibility equivalence (NFKD).
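
To make the difference between the two kinds of decomposition concrete, here is a small sketch using a character that has only a compatibility decomposition. It assumes the intl extension is loaded; the ligature is my own choice of example and does not come from the question.

```php
<?php
// U+FB01 LATIN SMALL LIGATURE FI, written as UTF-8 bytes.
$ligature = "\xEF\xAC\x81";

// Canonical decomposition (NFD) leaves the ligature alone...
$nfd  = normalizer_normalize($ligature, Normalizer::FORM_D);
// ...while compatibility decomposition (NFKD) splits it into "f" and "i".
$nfkd = normalizer_normalize($ligature, Normalizer::FORM_KD);

var_dump($nfd === $ligature);
var_dump($nfkd === 'fi');
```

Both comparisons should be true, which is why a search for "fi" can match the ligature under NFKD but not under NFD.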

Example:

$foo = "παράρτημα";
$bar = "παράρτημα";
var_dump($foo === $bar);
var_dump(
    normalizer_normalize($foo, Normalizer::FORM_KD) ===
    normalizer_normalize($bar, Normalizer::FORM_KD)
);

Output:

bool(false)
bool(true)
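
Applied to the original mb_strpos() problem, the same idea might be wrapped up as in the sketch below. The helper name normalizedStrpos is hypothetical, and the intl and mbstring extensions are assumed to be loaded. The accented letters are written as byte escapes so that the two representations cannot be silently unified by an editor or web page.

```php
<?php
// Hypothetical helper: normalizes both strings to NFKD before searching.
// The returned offset refers to the normalized haystack, in code points.
function normalizedStrpos($haystack, $needle)
{
    $h = normalizer_normalize($haystack, Normalizer::FORM_KD);
    $n = normalizer_normalize($needle, Normalizer::FORM_KD);
    if ($h === false || $n === false) {
        return false; // normalization failed, e.g. on invalid UTF-8
    }
    return mb_strpos($h, $n, 0, 'UTF-8');
}

// Haystack uses the precomposed U+03AC; needle uses U+03B1 + U+0301.
$haystack = "παρ\xCE\xACρτημα";
$needle   = "\xCE\xB1\xCC\x81";

var_dump(mb_strpos($haystack, $needle, 0, 'UTF-8')); // plain search
var_dump(normalizedStrpos($haystack, $needle));      // normalized search
```

The plain search should fail while the normalized search finds the needle. Keep in mind that the offset is relative to the normalized haystack, so it may not line up with positions in the original string when earlier characters decompose.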

