Best way to convert text files between character sets?

2018-06-08 06:53:54

What is the fastest, easiest tool or method to convert text files between character sets?

Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.

Anything goes: one-liners in your favorite scripting language, command-line tools or other utilities for any OS, web sites, etc.

Best solutions so far:

On Linux/UNIX/OS X/Cygwin:

GNU iconv, suggested by Troels Arvin, is best used as a filter. It seems to be universally available. Example:

$ iconv -f UTF-8 -t ISO-8859-15 in.txt > out.txt
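If iconv isn't at hand, roughly the same conversion can be sketched in Python. This is only a minimal sketch, not a replacement for iconv: convert_file is a hypothetical helper name, and it assumes every character in the input exists in the target charset (otherwise Python raises UnicodeEncodeError partway through).

```python
import shutil


def convert_file(src_path, dst_path, from_enc="utf-8", to_enc="iso-8859-15"):
    """Re-encode src_path into dst_path, streaming so large files are fine."""
    # newline="" turns off newline translation, so line endings pass through unchanged.
    with open(src_path, "r", encoding=from_enc, newline="") as src, \
         open(dst_path, "w", encoding=to_enc, newline="") as dst:
        shutil.copyfileobj(src, dst)
```

Calling convert_file("in.txt", "out.txt") mirrors the iconv line above; swap the two encoding arguments for the reverse direction.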

As pointed out by Ben, there is an online converter using iconv.

GNU recode (manual), suggested by Cheekysoft, will convert one or several files in place. Example:

$ recode UTF8..ISO-8859-15 in.txt

This one uses shorter aliases:

$ recode utf8..l9 in.txt
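For comparison, an in-place conversion along the same lines could look like this in Python. It is a sketch under assumptions, not how recode itself works: recode_in_place is a made-up name, the whole file is read into memory, and the rewritten file gets the default permissions of a fresh temporary file.

```python
import os
import tempfile


def recode_in_place(path, from_enc="utf-8", to_enc="iso-8859-15"):
    """Rewrite path in to_enc; the original is left untouched if conversion fails."""
    with open(path, "r", encoding=from_enc, newline="") as f:
        text = f.read()
    # Encoding happens before anything is written, so a failure cannot damage the file.
    data = text.encode(to_enc)
    # Write next to the original, then swap the new file in with a single rename.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Writing to a temporary file first means an unconvertible character or a full disk leaves in.txt as it was.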

recode also supports surfaces, which can be used to convert between different line-ending types and encodings:

Convert newlines from LF (Unix) to CR-LF (DOS):

$ recode ../CR-LF in.txt

Base64 encode file:

$ recode ../Base64 in.txt

You can also combine them.

Convert a Base64-encoded UTF-8 file with Unix line endings to a Base64-encoded Latin-1 file with DOS line endings:

$ recode utf8/Base64..l1/CR-LF/Base64 file.txt
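To make the chain of surfaces concrete, here is the same pipeline spelled out step by step in Python. Treat it as an illustration: the function name is invented for this example, and the exact Base64 line wrapping may differ from what recode emits.

```python
import base64


def b64_utf8_lf_to_b64_latin1_crlf(data: bytes) -> bytes:
    """Base64(UTF-8, LF line endings) -> Base64(Latin-1, CR-LF line endings)."""
    # 1. Strip the Base64 surface and decode the UTF-8 charset.
    text = base64.b64decode(data).decode("utf-8")
    # 2. Normalise to LF first so existing CR-LF pairs are not doubled, then go to CR-LF.
    text = text.replace("\r\n", "\n").replace("\n", "\r\n")
    # 3. Encode to Latin-1 and re-apply the Base64 surface (output wrapped at 76 columns).
    return base64.encodebytes(text.encode("iso-8859-1"))
```

Each step corresponds to one piece of the recode request: utf8/Base64 on the way in, l1/CR-LF/Base64 on the way out.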

On Windows with PowerShell (Jay Bazuzi):

PS C:\> gc -en utf8 in.txt | Out-File -en ascii out.txt

(No ISO-8859-15 support though; it says that supported charsets are unicode, utf7, utf8, utf32, ascii, bigendianunicode, default, and oem.)

Edit

Do you mean ISO-8859-1 support? Using "String" as the encoding does this, e.g. for the reverse direction:

gc -en string in.txt | Out-File -en utf8 out.txt

Note: The possible enumeration values are "Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, Ascii".

CsCvt - Kalytta's Character Set Converter is another great command-line conversion tool for Windows.

Stand-alone utility approach

iconv -f UTF-8 -t ISO-8859-1 in.txt > out.txt

-f ENCODING  the encoding of the input
-t ENCODING  the encoding of the output
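One thing worth checking before targeting ISO-8859-1 rather than ISO-8859-15 is whether the text contains characters the target cannot hold, the euro sign being the usual culprit. A small Python sketch for that check follows; unencodable_chars is a hypothetical helper, not part of any tool mentioned here.

```python
def unencodable_chars(text, encoding):
    """Return the distinct characters in text that encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


sample = "5 \u20ac caf\u00e9"  # "5 € café"
print(unencodable_chars(sample, "iso-8859-1"))   # the euro sign is reported
print(unencodable_chars(sample, "iso-8859-15"))  # nothing is reported
```

If the list is non-empty, the conversion is lossy for that target, and you have to decide whether to replace, drop, or transliterate those characters.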

Try Vim

If you have Vim, you can use the command below (not tested for every encoding).

The cool part about this is that you don't have to know the source encoding: Vim tries to detect it when it opens the file.

vim +"set nobomb | set fenc=utf8 | x" filename.txt

Be aware that this command modifies the file in place.

Explanation part!

+ : Used by Vim to run a command directly when opening a file. Usually used to open a file at a specific line: vim +14 file.txt

| : Separator of multiple commands (like ; in bash)

set nobomb : no utf-8 BOM

set fenc=utf8 : set the new file encoding to UTF-8

x : Save and close file

filename.txt : path to the file

" : quotes are needed because of the pipes (otherwise bash would interpret them as shell pipes)

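The "don't have to know the source encoding" part is really a guess: Vim tries the encodings listed in its 'fileencodings' option in order and keeps the first one that decodes without errors. A rough Python sketch of the same idea is below. The candidate list and both function names are assumptions made for this example, and a wrong guess converts the file silently, so keep a backup.

```python
def read_guessing(path, candidates=("utf-8-sig", "iso-8859-15")):
    """Decode path with the first candidate encoding that raises no error."""
    with open(path, "rb") as f:
        data = f.read()
    # Order matters: a single-byte charset accepts any byte sequence, so it goes last.
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


def to_utf8_guessing(path):
    """Rewrite path as UTF-8 without a BOM; return the encoding that was guessed."""
    text, enc = read_guessing(path)  # "utf-8-sig" also drops a leading BOM if present
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return enc
```

Like the Vim command, to_utf8_guessing("filename.txt") overwrites the file directly.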
Under Linux you can use the very powerful recode command to convert between different charsets and to fix line-ending issues. recode -l will show you all of the formats and encodings that the tool can convert between. It is likely to be a VERY long list.
