Convert Charset of an Entire Project to UTF-8

Convert charset of an entire project to UTF-8

In your project's root directory, use find(1) to list all *.php files and combine that with recode(1) to convert those files in place:

find . -type f -name '*.php' -exec recode windows1252..utf8 \{} \;

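As an alternative to recode(1), you could use iconv(1) for the conversion (for use with the find command above: iconv -f windows-1252 -t utf-8 -o \{} \{}). Be careful, though: with some iconv versions, writing the output to the same file that is being read can truncate larger files, so it is safer to write to a temporary file and then move it over the original.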

You need to have either recode or iconv installed for the above to work. Both should be easily installable via a package manager on most modern systems.
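If neither tool is available, the same in-place conversion can be sketched in Python with only the standard library. This is a minimal sketch, assuming every matching file really is Windows-1252; the function name convert_tree is made up for illustration. Files that already decode as UTF-8 are skipped, so running it twice should not double-encode anything.

```python
from pathlib import Path


def convert_tree(root, pattern="*.php", source="cp1252"):
    """Re-encode matching files under root from `source` to UTF-8, in place.

    Returns the list of files that were rewritten.
    """
    converted = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8 (or pure ASCII): leave untouched
        except UnicodeDecodeError:
            pass
        text = raw.decode(source)  # raises if a byte is undefined in `source`
        path.write_bytes(text.encode("utf-8"))
        converted.append(path)
    return converted
```

Calling convert_tree(".") from the project root would mirror the find command above. As with any in-place rewrite, try it on a copy or a clean version-control checkout first.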

Save all files in Visual Studio project as UTF-8

Since you're already in Visual Studio, why not just simply write the code?

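// Needs: using System.IO; using System.Text;
// Assumption: the files are in the system ANSI code page, which is what Encoding.Default
// means on .NET Framework. Without it, ReadAllText decodes BOM-less files as UTF-8
// and mangles non-ASCII characters.
foreach (var f in new DirectoryInfo(@"...").GetFiles("*.cs", SearchOption.AllDirectories)) {
    string s = File.ReadAllText(f.FullName, Encoding.Default);
    File.WriteAllText(f.FullName, s, Encoding.UTF8); // Encoding.UTF8 writes a BOM
}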

Only three lines of code! I'm sure you can write this in less than a minute :-)

Encoding for project set to UTF-8, default charset returns windows-1252

I've spent a few hours trying to find the best solution.

First of all, this is an issue with Maven, which picks up the platform encoding and uses it even though you've specified a different encoding. Maven doesn't seem to care (it even prints to the console that it's using UTF-8, but when you run a file with the code above, it won't display properly).

I've managed to tackle this issue by setting a system variable:

JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8

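Instead of setting a system variable, another option should be to pass the encoding on the command line. Note that javac does not accept -D options directly: use -encoding to set the source file encoding, or -J to pass the property through to the JVM that runs the compiler.

javac -encoding UTF-8
javac -J-Dfile.encoding=UTF8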

Convert any encoding to UTF 8 in Go

I'm using the go-charset project to do this: https://code.google.com/p/go-charset/

It's pretty straightforward: you create a reader for a given charset, and it translates the input to UTF-8 automatically. Here is the example from the library:

// go-charset's NewReader takes the charset name first, then the reader.
r, err := charset.NewReader("latin1", strings.NewReader("\xa35 for Pepp\xe9"))
if err != nil {
    log.Fatal(err)
}
result, err := ioutil.ReadAll(r)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("%s\n", result) // outputs £5 for Peppé
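For comparison, the same idea (wrap a byte stream in a reader that knows the source charset) can be sketched with Python's standard library. This is only an analogue of the Go example, not part of go-charset; the helper name to_utf8 is made up for illustration.

```python
import io


def to_utf8(stream, source_charset):
    """Read bytes from `stream`, decode them as `source_charset`, return UTF-8 bytes."""
    # newline="" disables newline translation so line endings pass through unchanged
    reader = io.TextIOWrapper(stream, encoding=source_charset, newline="")
    return reader.read().encode("utf-8")


data = to_utf8(io.BytesIO(b"\xa35 for Pepp\xe9"), "latin-1")
print(data.decode("utf-8"))  # prints: £5 for Peppé
```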

Now, in my case I know the charset because it comes from web pages and I read the headers/meta tags. If you need to detect the charset automatically by heuristics, you'll need another library for that, such as this one: https://github.com/saintfish/chardet

I haven't used it but it also looks pretty simple to use:

detector := chardet.NewTextDetector()
result, err := detector.DetectBest(some_text)
if err == nil {
    fmt.Printf(
        "Detected charset is %s, language is %s",
        result.Charset,
        result.Language)
}
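When a detection library is not an option, a much cruder heuristic is to try strict UTF-8 first and fall back to an assumed legacy charset. The sketch below is in Python, uses only the standard library, and is not how chardet works; the function name decode_guess and the cp1252 fallback are assumptions for illustration.

```python
def decode_guess(raw, fallback="cp1252"):
    """Return (text, charset_used) for a byte string.

    Strict UTF-8 is tried first because legacy-encoded text with
    non-ASCII bytes rarely forms valid UTF-8 multi-byte sequences.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # errors="replace" keeps going even if a byte is undefined in the fallback
        return raw.decode(fallback, errors="replace"), fallback
```

This cannot tell one single-byte charset from another (Latin-1 versus Windows-1251, say), which is exactly the case where a statistical detector earns its keep.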

convert all C# files to utf-8 encoding

The only files you need to encode are your ASPX files. If you're using master pages in your project, just specify the encoding there, and it will carry through to all pages using that master page.

Project conversion from ISO 8859-1 to UTF-8

You should try using the shell command iconv to convert the PHP files from latin1 (ISO-8859-1) to UTF-8.

After that, make sure that PHP uses UTF-8 as its default encoding (the default_charset directive in php.ini). If it doesn't, you can set it with ini_set() for your project.

After that you should convert your database to UTF-8 or use a quickfix like this (for MySQL):

mysql_query("SET NAMES 'utf8'");

Of course, substitute the equivalent call from whatever database layer or framework you use for mysql_query(); the old mysql_* functions were removed in PHP 7.
Put it into your primary file, the one that includes all your classes.

How to convert an entire MySQL database character set and collation to UTF-8?

Use the ALTER DATABASE and ALTER TABLE commands.

ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Or if you're still on MySQL 5.5.2 or older which didn't support 4-byte UTF-8, use utf8 instead of utf8mb4:

ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
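The difference between the two character sets comes down to how many bytes a character needs in UTF-8: MySQL's utf8 stores at most three bytes per character, so anything outside the Basic Multilingual Plane needs utf8mb4. A small Python sketch (the helper name needs_utf8mb4 is made up for illustration) shows which strings are affected:

```python
def needs_utf8mb4(text):
    """True if any character in `text` takes four bytes in UTF-8."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


print(needs_utf8mb4("Peppé £5"))       # False: each character fits in 1 or 2 bytes
print(needs_utf8mb4("ok \U0001F600"))  # True: U+1F600 takes 4 bytes
```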

Convert all *.cs files to unicode in VisualStudio

Yes, it's possible.

Force UTF-8 on all files

Use .editorconfig as @Richard previously mentioned. Starting from Visual Studio v15.3, .editorconfig support was fixed and improved. This simple .editorconfig at the solution level would be enough to ensure each *.cs is saved in UTF-8 without BOM:

root = true

[*.cs]
charset = utf-8

Moreover, Visual Studio converts any existing file to this encoding when the file is opened and saved manually.
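To check which files still carry a byte order mark after the switch, a short Python sketch like the following can help. It uses only the standard library; the function name files_with_bom is made up for illustration.

```python
import codecs
from pathlib import Path


def files_with_bom(root, pattern="*.cs"):
    """Return matching files under `root` that start with the UTF-8 BOM (EF BB BF)."""
    found = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            if fh.read(3) == codecs.BOM_UTF8:
                found.append(path)
    return found
```

An empty result from files_with_bom(".") at the solution root would suggest that every *.cs file is now saved without a BOM.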

Convert all existing code files to UTF-8

I tested some answers from the thread "Save all files in Visual Studio project as UTF-8" and they worked badly: non-Latin characters (Cyrillic in my case) were converted into unreadable glyphs. By contrast, Visual Studio itself does the "open-save" conversion flawlessly.
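This kind of corruption typically happens when the bytes are decoded with the wrong source charset before being written back out as UTF-8. A minimal Python sketch, assuming the original file was Windows-1251 (a guess based on the Cyrillic text mentioned above, not something the thread confirms):

```python
original = "Привет"
raw = original.encode("cp1251")   # bytes as stored in the legacy file

wrong = raw.decode("cp1252")      # tool assumes the Western European ANSI code page
right = raw.decode("cp1251")      # tool is told the real source charset

print(wrong)              # prints: Ïðèâåò
print(right == original)  # prints: True
```

Once the wrong text has been saved as UTF-8, the file is valid UTF-8 but contains the wrong characters, which is why the damage is easy to miss until someone reads the comments or string literals.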

To automatically open and re-save all code files in a solution, use a simple R# trick:

  1. Set any R# code style rule applicable to all your files to a value that deliberately contradicts your company's code conventions. For example, braces layout is an obvious choice.
  2. Apply it to the whole solution using the Code Cleanup feature (Ctrl+E,C by default). Choose the simplest built-in "Reformat Code" template to minimize changes.
  3. After all files have been formatted and saved, revert the R# rules to their original values and run Code Cleanup once again.

All your *.cs files should be saved in UTF-8 after that (the same idea works for other file types supported by R#). Pretty formatting comes as a bonus.


