Elegant way to search for UTF-8 files with BOM?
What about this one simple command, which not only finds but also clears the nasty BOM? :)
find . -type f -exec sed '1s/^\xEF\xBB\xBF//' -i {} \;
I love "find" :)
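Warning: the above will also modify binary files that begin with those three bytes, and it relies on GNU sed (BSD/macOS sed treats -i and \x escapes differently).
If you just want to list files that contain a BOM, use this one (note that it matches those bytes anywhere in a file, not only at the start):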
grep -rl $'\xEF\xBB\xBF' .
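If you only want files whose first three bytes are the BOM (the grep above also matches the bytes in the middle of a file), here is a minimal Ruby sketch. The helper names starts_with_bom? and files_with_bom are hypothetical, and it assumes every file under the directory is readable.

```ruby
# The UTF-8 byte order mark as raw bytes.
BOM = "\xEF\xBB\xBF".b

# True if the file's first three bytes are the UTF-8 BOM.
def starts_with_bom?(path)
  File.binread(path, BOM.bytesize) == BOM
end

# Regular files under +root+ (recursively, including dotfiles) that begin with a BOM.
# (starts_with_bom? and files_with_bom are hypothetical helper names.)
def files_with_bom(root = '.')
  Dir.glob(File.join(root, '**', '*'), File::FNM_DOTMATCH)
     .select { |path| File.file?(path) && starts_with_bom?(path) }
end
```

For example, puts files_with_bom('.') prints one path per line, much like grep -rl.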
How do I remove  from the beginning of a file?
Three words for you:
Byte Order Mark (BOM)
That's the representation of the UTF-8 BOM in ISO-8859-1. You have to tell your editor not to write a BOM, or use a different tool to strip it out.
To automate the BOM's removal you can use awk, as shown in this question.
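If awk is not at hand, the same idea can be sketched in Ruby. This is a minimal sketch with a hypothetical helper, strip_bom!; it reads the file as raw bytes and rewrites it only when it actually starts with EF BB BF.

```ruby
# UTF-8 byte order mark as raw bytes.
UTF8_BOM = "\xEF\xBB\xBF".b

# Removes a leading UTF-8 BOM from the file at +path+, in place.
# Returns true if a BOM was removed, false if the file did not start with one.
# (strip_bom! is a hypothetical helper name, not a library method.)
def strip_bom!(path)
  data = File.binread(path)
  return false unless data.start_with?(UTF8_BOM)

  # Rewrite the file with everything after the first three bytes.
  File.binwrite(path, data.byteslice(UTF8_BOM.bytesize..-1))
  true
end
```

Unlike sed -i, which rewrites every file it is given, this leaves files without a BOM untouched.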
As another answer says, the best option would be for PHP to interpret the BOM correctly. For that you can use mb_internal_encoding(), like this:
<?php
//Storing the previous encoding in case you have some other piece
//of code sensitive to encoding and counting on the default value.
$previous_encoding = mb_internal_encoding();
//Set the encoding to UTF-8, so when reading files it ignores the BOM
mb_internal_encoding('UTF-8');
//Process the CSS files...
//Finally, return to the previous encoding
mb_internal_encoding($previous_encoding);
//Rest of the code...
?>
How to avoid tripping over UTF-8 BOM when reading files
With Ruby 1.9.2 and later you can open the file with the mode "r:bom|utf-8":
text_without_bom = nil # define the variable outside the block to keep the data
File.open('file.txt', 'r:bom|utf-8') { |file|
  text_without_bom = file.read
}
or
text_without_bom = File.read('file.txt', encoding: 'bom|utf-8')
or
text_without_bom = File.read('file.txt', mode: 'r:bom|utf-8')
It doesn't matter whether the file contains a BOM or not.
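To see that the mode gives the same result with and without a BOM, here is a small self-contained check. read_utf8 is a hypothetical helper, and the files are throwaway temp files.

```ruby
require 'tmpdir'

# Reads +path+ as UTF-8, silently dropping a leading BOM if there is one.
# (read_utf8 is a hypothetical helper name.)
def read_utf8(path)
  File.read(path, mode: 'r:bom|utf-8')
end

Dir.mktmpdir do |dir|
  with_bom    = File.join(dir, 'with_bom.txt')
  without_bom = File.join(dir, 'without_bom.txt')
  File.binwrite(with_bom, "\xEF\xBB\xBFhello".b)
  File.binwrite(without_bom, 'hello')

  p read_utf8(with_bom)          # → "hello"
  p read_utf8(without_bom)       # → "hello"
  p read_utf8(with_bom).encoding # → #<Encoding:UTF-8>
end
```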
You may also use the encoding option with other methods:
lines_without_bom = File.readlines(@filename, encoding: 'bom|utf-8')
(This returns an array of all lines.)
Or with CSV:
require 'csv'
CSV.open(@filename, 'r:bom|utf-8') { |csv|
  csv.each { |row| p row }
}
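A common place this bites is a CSV header row: read as plain UTF-8, the first header typically comes back with an invisible U+FEFF in front of it. A minimal sketch, assuming the file has a header row; read_csv_table is a hypothetical helper, and CSV.read accepts the same "bom|utf-8" encoding string.

```ruby
require 'csv'
require 'tmpdir'

# Reads a CSV file with a header row, dropping a leading UTF-8 BOM if present.
# (read_csv_table is a hypothetical helper name.)
def read_csv_table(path)
  CSV.read(path, headers: true, encoding: 'bom|utf-8')
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'people.csv')
  # Simulate a CSV saved with a BOM, as some spreadsheet exports do.
  File.binwrite(path, "\xEF\xBB\xBFname,age\nAda,36\n".b)

  table = read_csv_table(path)
  p table.headers    # → ["name", "age"]
  p table[0]['name'] # → "Ada"
end
```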
What makes a file UTF-8?
Text is UTF-8 because it is valid UTF-8 and its author decided it is.
How the author's decision is communicated to the consumer is a different question. It involves convention, guessing, and various in-band or out-of-band signalling schemes, such as the HTTP or HTML charset, a BOM (which improves guessing), an envelope or embedding format, additional data streams, file naming, and many more.
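The validity half of that definition can be checked mechanically; the intent half cannot. A minimal Ruby sketch (describe_bytes is a hypothetical helper) reports whether some bytes form valid UTF-8 and whether they start with a BOM, which is only a hint. Plain ASCII is also valid UTF-8, so validity alone does not reveal what the author meant.

```ruby
# The UTF-8 byte order mark as raw bytes.
UTF8_BOM_SIGNATURE = "\xEF\xBB\xBF".b

# Reports two independent facts about raw bytes:
#   valid_utf8: the bytes decode as UTF-8 (necessary, but not proof of intent)
#   bom:        the bytes start with a UTF-8 BOM (an optional in-band hint)
# (describe_bytes is a hypothetical helper name.)
def describe_bytes(bytes)
  raw = bytes.b # binary copy, so the caller's string is untouched
  {
    valid_utf8: raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?,
    bom: raw.start_with?(UTF8_BOM_SIGNATURE)
  }
end

p describe_bytes("caf\xC3\xA9")    # "café" in UTF-8: valid, no BOM
p describe_bytes("\xEF\xBB\xBFhi") # valid, with BOM
p describe_bytes("caf\xE9")        # "café" in ISO-8859-1: not valid UTF-8
```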
UTF-8 BOM added to downloaded file
You should check for a BOM in the script file too. If your IDE saves UTF-8 files with a BOM, the BOM sits before the opening <?php tag, so PHP treats it as output.
The files CodeSmith generates are UTF-8 with BOM. How do I change them to UTF-8 without BOM?
There are two attributes you can set on the CodeTemplate directive to control this: Encoding and ResponseEncoding. They control how the template is rendered and saved.
https://codesmith.atlassian.net/wiki/display/Generator/The+CodeTemplate+Directive