Best way to convert text files between character sets?

What is the fastest, easiest tool or method to convert text files between character sets?

Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.

Anything goes: one-liners in your favorite scripting language, command-line tools or other OS utilities, web sites, etc.

Best solutions so far:

On Linux/UNIX/OS X/cygwin:

  • GNU iconv, suggested by Troels Arvin, is best used as a filter. It seems to be universally available. Example:

    $ iconv -f UTF-8 -t ISO-8859-15 in.txt > out.txt
    

    As pointed out by Ben, there is an online converter using iconv.

  • GNU recode (manual), suggested by Cheekysoft, will convert one or several files in place. Example:

    $ recode UTF8..ISO-8859-15 in.txt
    

    This one uses shorter aliases:

    $ recode utf8..l9 in.txt
    

    Recode also supports surfaces which can be used to convert between different line ending types and encodings:

    Convert newlines from LF (Unix) to CR-LF (DOS):

    $ recode ../CR-LF in.txt
    

    Base64 encode file:

    $ recode ../Base64 in.txt
    

    You can also combine them.

    Convert a Base64-encoded UTF-8 file with Unix line endings to a Base64-encoded Latin-1 file with DOS line endings:

    $ recode utf8/Base64..l1/CR-LF/Base64 file.txt
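
For readers without recode, the combined conversion above can be approximated in a scripting language. The following is a minimal Python sketch, not recode's implementation; the function name `recode_b64_utf8_to_latin1_crlf` is made up for illustration, and the output is a single unwrapped Base64 line.

```python
import base64


def recode_b64_utf8_to_latin1_crlf(data: bytes) -> bytes:
    """Base64(UTF-8 text, LF) in, Base64(Latin-1 text, CR-LF) out.

    Hypothetical helper for illustration only.
    """
    # Remove the Base64 layer, then decode the UTF-8 bytes to text.
    text = base64.b64decode(data).decode("utf-8")
    # Normalise any existing CR-LF to LF first so they are not doubled,
    # then convert every LF to CR-LF.
    text = text.replace("\r\n", "\n").replace("\n", "\r\n")
    # Raises UnicodeEncodeError if a character has no Latin-1 equivalent.
    latin1 = text.encode("latin-1")
    # Re-apply the Base64 layer (no line wrapping).
    return base64.b64encode(latin1)
```

Doing the steps in this order (strip Base64, decode, fix line endings, encode, re-apply Base64) mirrors how the recode request reads from left to right.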
    
  • On Windows with PowerShell (Jay Bazuzi):

    PS C:\> gc -en utf8 in.txt | Out-File -en ascii out.txt

    (No ISO-8859-15 support though; it says that supported charsets are unicode, utf7, utf8, utf32, ascii, bigendianunicode, default, and oem.)

  • Edit

    Do you mean ISO-8859-1 support? Using the "String" encoding does this, e.g. for the reverse direction:

    gc -en string in.txt | Out-File -en utf8 out.txt
    

    Note: The possible enumeration values are "Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, Ascii".

  • CsCvt - Kalytta's Character Set Converter is another great command line based conversion tool for Windows.

  • Standalone utility approach

    iconv -f UTF-8 -t ISO-8859-1 in.txt > out.txt
    
    -f ENCODING  the encoding of the input
    -t ENCODING  the encoding of the output
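
If iconv is not available, the same from/to conversion can be sketched in Python. This is an illustrative equivalent, not how iconv works internally; `convert_file` and its parameter names are assumptions chosen to mirror the -f and -t options.

```python
def convert_file(src, dst, from_enc="utf-8", to_enc="iso-8859-15"):
    """Read src as from_enc and write it to dst as to_enc.

    Hypothetical helper: from_enc plays the role of -f, to_enc of -t.
    """
    # newline="" keeps the original line endings untouched on read and write.
    with open(src, "r", encoding=from_enc, newline="") as fin:
        text = fin.read()
    # Strict error handling (the default): a character that does not exist
    # in the target charset raises UnicodeEncodeError instead of being dropped.
    with open(dst, "w", encoding=to_enc, newline="") as fout:
        fout.write(text)
```

Swapping the two encoding arguments gives the reverse direction, for example `convert_file("out.txt", "back.txt", "iso-8859-15", "utf-8")`.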
    

    Try Vim

    If you have Vim, you can use the command below. It has not been tested with every encoding.

    The nice part is that you don't have to know the source encoding; Vim tries to detect it when it opens the file.

    vim +"set nobomb | set fenc=utf8 | x" filename.txt
    

    Be aware that this command modifies the file in place.


    Explanation:

  • + : Used by Vim to run a command directly when opening a file. Usually used to open a file at a specific line: vim +14 file.txt
  • | : Separator between multiple commands (like ; in bash)
  • set nobomb : Do not write a UTF-8 BOM
  • set fenc=utf8 : Set the new file encoding to UTF-8 (doc link)
  • x : Save and close the file
  • filename.txt : Path to the file
  • " : The quotes are needed because of the pipes (otherwise bash would treat them as shell pipes)
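
The same idea (drop the BOM, guess the source encoding, save as UTF-8 in place) can be sketched in Python. This only loosely mirrors the Vim command: Vim picks the source encoding from its 'fileencodings' option, while this sketch simply tries UTF-8 and then a single fallback. The name `rewrite_as_utf8` and the choice of ISO-8859-15 as the fallback are assumptions for illustration.

```python
import codecs


def rewrite_as_utf8(path, fallback="iso-8859-15"):
    """Rewrite the file at path in place as UTF-8 without a BOM.

    Hypothetical helper; the fallback encoding is a guess, not detection.
    """
    with open(path, "rb") as f:
        raw = f.read()
    # Drop a leading UTF-8 byte order mark, if present (like "set nobomb").
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        # First guess: the file is already valid UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Second guess: treat the bytes as the fallback 8-bit charset.
        text = raw.decode(fallback)
    # Like the Vim command, this overwrites the original file.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
```

Because any byte sequence decodes as ISO-8859-15, the fallback never fails, but it can silently produce wrong characters if the file was really in another 8-bit encoding. Keep a backup when the source encoding is uncertain.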

  • Under Linux you can use the very powerful recode command to convert between different charsets and to fix line-ending issues. recode -l will show you all of the formats and encodings that the tool can convert between. It is likely to be a VERY long list.
