How to Change Encoding in Many Files

How to change encoding in many files?

Try this:

find . -type f -print -exec iconv -f iso8859-2 -t utf-8 -o {}.converted {} \; -exec mv {}.converted {} \;

It writes the converted text to a temporary file with a '.converted' suffix and then moves that file over the original name, so be careful if you already have files with a '.converted' suffix (which is unlikely).

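Filenames containing spaces are not a problem here: find passes each {} to the command as a single argument without going through a shell, so writing "{}" instead of {} changes nothing (the shell removes the quotes before find ever sees them). Two portability caveats do apply: substituting {} inside a longer argument such as {}.converted is supported by GNU find but not guaranteed by POSIX, and the -o option is not available in every iconv implementation. Also note that the command converts every regular file it finds, including binaries and files that are already UTF-8.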
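For comparison, here is a minimal Python sketch of the same idea (convert to a temporary file, then move it over the original). The function name convert_tree and its parameters are made up for this illustration, and it assumes every file under the directory really is in the source encoding.

```python
import os
import shutil
import tempfile
from pathlib import Path


def convert_tree(root, src_encoding="iso-8859-2", dst_encoding="utf-8"):
    """Re-encode every regular file under root in place; return the converted paths."""
    converted = []
    # Build the full file list first so temp files created below are never visited.
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        # Work on raw bytes so line endings are left untouched.
        # decode() raises UnicodeDecodeError if the bytes are invalid in src_encoding.
        text = path.read_bytes().decode(src_encoding)
        # Write to a temp file in the same directory, then replace the original.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".converted")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(text.encode(dst_encoding))
        shutil.copymode(path, tmp_name)  # keep the original permission bits
        os.replace(tmp_name, path)
        converted.append(path)
    return converted
```

Calling convert_tree(".") would mirror the find command above. Like that command, it does not check whether a file is binary or already UTF-8, so run it on a copy first.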

How to change encoding of all files at once with a JetBrains IDE like WebStorm

Close every project you have opened by clicking File > Close Project.

Once they are all closed, the startup window appears (the window that lists recently opened projects and offers options such as opening a new project). In that window, the settings are in the bottom right corner.

In Settings > Editor > File Encodings, you can set all the file encoding options to UTF-8. This will apply to new projects from now on.

If you want it for existing projects, follow the same steps, but open the settings while the project is open.

I hope this helps!

How do I correct the character encoding of a file?

EDIT: A simple possibility to eliminate before getting into more complicated solutions: have you tried setting the character set to utf8 in the text editor in which you're reading the file? This could just be a case of somebody sending you a utf8 file that you're reading in an editor set to, say, cp1252.

Just taking the two examples, this is a case of utf8 being read through the lens of a single-byte encoding, likely one of iso-8859-1, iso-8859-15, or cp1252. If you can post examples of other problem characters, it should be possible to narrow that down more.

As visual inspection of the characters can be misleading, you'll also need to look at the underlying bytes: the § you see on screen might be either 0xa7 or 0xc2a7, and that will determine the kind of character set conversion you have to do.
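One way to look at the underlying bytes is sketched below in Python. The helper name non_ascii_runs is hypothetical; it simply lists the non-ASCII byte sequences found in some data as hex, so that 0xa7 and 0xc2a7 can be told apart.

```python
import re


def non_ascii_runs(data, limit=10):
    """Return up to `limit` runs of non-ASCII bytes from `data` as hex strings."""
    return [m.group().hex() for m in re.finditer(rb"[\x80-\xff]+", data)][:limit]


# The same on-screen character has different bytes in different encodings.
print("§".encode("latin-1").hex())  # a7
print("§".encode("utf-8").hex())    # c2a7

# For a real file, read it in binary mode, e.g.:
#   non_ascii_runs(open("some_file.txt", "rb").read())
print(non_ascii_runs("a § b".encode("utf-8")))  # ['c2a7']
```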

Can you assume that all of your data has been distorted in exactly the same way, that it has come from the same source and gone through the same sequence of transformations, so that for example there isn't a single é in your text, it's always Ã©? If so, the problem can be solved with a sequence of character set conversions. If you can be more specific about the environment you're in and the database you're using, somebody here can probably tell you how to perform the appropriate conversion.
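As a rough illustration of such a conversion sequence, the Python sketch below assumes the simplest case: UTF-8 bytes that were decoded once as cp1252. The function name repair_mojibake and its defaults are assumptions for this example. If your data went through a different chain, the two encodings would need to change.

```python
def repair_mojibake(garbled, wrong="cp1252", right="utf-8"):
    """Undo one 'UTF-8 bytes decoded with a single-byte encoding' mistake."""
    # Re-encode with the wrong codec to get the original bytes back,
    # then decode those bytes with the encoding they were written in.
    return garbled.encode(wrong).decode(right)


print(repair_mojibake("cafÃ©"))  # café
print(repair_mojibake("Ã§"))     # ç
```

Text that does not fit the assumed pattern (for example, text that was never garbled) generally raises a UnicodeError instead of being silently changed, which is a useful safety check when the "distorted in exactly the same way" assumption is in doubt.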

Otherwise, if the problem characters only occur in some places in your data, you'll have to take it instance by instance, based on assumptions along the lines of "no author intended to put Ã§ in their text, so whenever you see it, replace it with ç". This option is riskier, firstly because those assumptions about the authors' intentions might be wrong, and secondly because you'll have to spot every problem character yourself, which might be impossible if there's too much text to inspect visually or if it's written in a language or writing system that's foreign to you.
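The instance-by-instance approach could be sketched in Python as follows. The replacement table is hypothetical and would have to be built by hand from the problem sequences you actually find. Returning the counts makes it easier to review what was changed.

```python
# Hypothetical table: garbled sequence -> character the author presumably meant.
REPLACEMENTS = {
    "Ã§": "ç",
    "Ã©": "é",
}


def replace_known(text, table=REPLACEMENTS):
    """Replace only the listed sequences; return the new text and per-sequence counts."""
    counts = {}
    for bad, good in table.items():
        counts[bad] = text.count(bad)
        text = text.replace(bad, good)
    return text, counts


fixed, counts = replace_known("cafÃ© franÃ§ais, café")
print(fixed)   # café français, café
print(counts)  # {'Ã§': 1, 'Ã©': 1}
```

Correct characters such as the final é in the example are left alone, which is the point of this approach when only part of the data is damaged.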

Change text encoding for multiple files at once in Eclipse

I've just solved this problem (Eclipse 3.5.2).

Two steps are required:

  1. Change the text file encoding on the folder's property page (this created a .settings/org.eclipse.core.resources.prefs file in my project)
  2. Change the default encoding for the *.html file content type on the Preferences page General/Content Types

Change files' encoding recursively on Windows?

You can do this easily with Windows PowerShell. Once you have read a file's content, you can pipe it to the Out-File cmdlet, or pass it to Set-Content, specifying UTF8 as the encoding.

Try something like:

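Get-ChildItem -Path . -Filter *.txt -Recurse -File | ForEach-Object {
    # Read the whole file first so it can be safely overwritten afterwards
    $content = Get-Content -LiteralPath $_.FullName
    # Rewrite it as UTF-8 (in Windows PowerShell 5.1, -Encoding UTF8 writes a BOM)
    Set-Content -LiteralPath $_.FullName -Value $content -Encoding UTF8 -Force
}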

Notepad++ convert to UTF-8 multiple files

Found my mistake: my Notepad++ is in German. So check whether the menu is called "Encoding" or, as in my case, "Kodierung", and whether "Convert to UTF-8 without BOM" appears as "Konvertiere zu UTF-8 ohne BOM".

That helped me out!

Change encoding of multiple files with iconv in bash

You must put either \; or + at the end of the -exec action.

find . -type f -name '*.php' -print -exec iconv -f euc-kr -t utf-8 {} -o {}.utf8 \;

PhpStorm: Converting folders encoding to another

AFAIK it's not possible to do this for a whole folder at once, but it can be done for multiple files (e.g. all files in a certain folder):

  1. Select desired files in Project View panel
  2. Use File | File Encoding
  3. When asked -- make sure you choose "convert" and not just "read in another encoding".

You can repeat this procedure for each subfolder (still much faster than doing this for each file individually).

Another possible alternative is to use something like iconv (or any other similar tool) and do it in terminal/console.

Change encoding on a per file or per extension basis

An option to handle the encoding of all files of a given extension on a per open basis can be configured in the Options dialog. See MSDN page on Options, Text Editor, File Extension.

Navigate to Tools > Options > Text Editor > File Extension.

For the bat extension, I selected Source Code (Text) Editor with Encoding. The with Encoding part means that the user will be given options as to what encoding to use when opening the file. The default in this mode is Auto-detect, which preserves the ANSI encoding, if that is what the file already uses. Otherwise, one can explicitly designate it for the individual file.

Unfortunately, it doesn't seem to remember the setting last used when opening a file, and will thus prompt for an encoding setting every time a file is opened.

