Charsets of files

  1. How to check the charset of a specified file?

    REF: https://stackoverflow.com/questions/805418/how-can-i-find-encoding-of-a-file-via-a-script-on-linux

    REF: https://stackoverflow.com/a/34766140/5698182

    • On Linux/UNIX/OS X/cygwin:

        file -i xxx

      On OS X, the MIME option is the uppercase one: file -I xxx

      Or use uchardet, an encoding detector library ported from Mozilla that can detect Chinese standard charsets such as GB18030. Various Linux distributions (Debian, Ubuntu, openSUSE, etc.) provide binaries.

        uchardet xxx
      
    • On Windows:

      The (Linux) command-line tool 'file' is available on Windows via GnuWin32:

      http://gnuwin32.sourceforge.net/packages/file.htm

      If you have Git for Windows installed, file.exe is already available in C:\Program Files\git\usr\bin.
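
To check several files at once, the `file` command above can be wrapped in a small shell function. This is only a sketch: the function name `print_charsets` is made up for illustration, and it assumes a `file` build that supports the `-b` (brief) and `--mime-encoding` options, which print just the detected encoding.

```shell
#!/bin/sh
# print_charsets FILE...
# Print "<file>: <encoding>" for each regular file given as an argument.
# Assumption: the installed `file` supports -b and --mime-encoding.
print_charsets() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            # Skip directories, missing paths, etc., but keep going.
            printf '%s: not a regular file\n' "$f" >&2
            continue
        fi
        printf '%s: %s\n' "$f" "$(file -b --mime-encoding -- "$f")"
    done
}
```

Called as `print_charsets *.txt`, it prints one line per file. The reported name is a guess made from the file's bytes, so short or ambiguous files may be misdetected; uchardet can serve as a second opinion.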

  2. How to convert the charset of a specified file?

    REF: https://stackoverflow.com/q/64860/5698182

    • On Linux/UNIX/OS X/cygwin:

      • GNU iconv, suggested by Troels Arvin, is best used as a filter and seems to be universally available. Example:

          $ iconv -f UTF-8 -t ISO-8859-15 in.txt > out.txt
        

        As pointed out by Ben, there is an online converter using iconv.

      • GNU recode (see its manual), suggested by Cheekysoft, converts one or several files in place. Example:

          $ recode UTF8..ISO-8859-15 in.txt
        

        This one uses shorter aliases:

          $ recode utf8..l9 in.txt
        

        Recode also supports "surfaces", which can be used to convert between different line-ending types and encodings:

        Convert newlines from LF (Unix) to CR-LF (DOS):

          $ recode ../CR-LF in.txt
        

        Base64 encode file:

          $ recode ../Base64 in.txt
        

        You can also combine them.

        Convert a Base64-encoded UTF-8 file with Unix line endings to a Base64-encoded Latin-1 file with DOS line endings:

          $ recode utf8/Base64..l1/CR-LF/Base64 file.txt
        
    • On Windows with PowerShell (Jay Bazuzi):

      • PS C:\> gc -en utf8 in.txt | Out-File -en ascii out.txt

        Here gc is an alias for Get-Content and -en abbreviates -Encoding. (There is no ISO-8859-15 support, though; the supported encodings are listed as unicode, utf7, utf8, utf32, ascii, bigendianunicode, default, and oem.)
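
Because iconv works as a filter, redirecting its output straight back onto the input file would truncate the file before it is read. One way to get recode-style in-place behaviour is to go through a temporary file. The following is a minimal sketch, not a definitive tool: the function name `convert_in_place` is hypothetical, and it assumes `iconv` and `mktemp` are installed.

```shell
#!/bin/sh
# convert_in_place FROM TO FILE...
# Convert each FILE from charset FROM to charset TO, replacing its content
# only when iconv succeeds. (Hypothetical helper, for illustration only.)
convert_in_place() {
    from=$1
    to=$2
    shift 2
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if iconv -f "$from" -t "$to" < "$f" > "$tmp"; then
            # Write back with cat instead of mv so the original file keeps
            # its permissions and ownership.
            cat "$tmp" > "$f"
            rm -f "$tmp"
        else
            # Conversion failed (e.g. invalid input): leave the file as it was.
            rm -f "$tmp"
            printf 'conversion failed, left unchanged: %s\n' "$f" >&2
            return 1
        fi
    done
}
```

For example, `convert_in_place ISO-8859-15 UTF-8 in.txt` rewrites in.txt as UTF-8. The function stops at the first file that fails to convert, so files listed after it are left untouched.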
