Name

ltguessencoding — Try to determine the encoding of a file

Synopsis

ltguessencoding [ filename ]

Description

ltguessencoding attempts to determine the encoding of a file. The name of the file may be given as an argument; otherwise standard input is used. The program prints one of the following to standard output:

ascii

if the file contains no bytes with values greater than 127.

utf-8

if the file contains bytes greater than 127, and all are in legal UTF-8 sequences. This is quite a reliable indication that the encoding really is UTF-8.

binary

if the file contains null bytes, and the number of bytes greater than 127 is sufficiently high.

windows

if none of the above apply, and there are bytes in the range 128 to 159, which would be C1 controls if the encoding were one of the ISO Latin encodings. (Several Windows encodings use these values for non-control characters.) No attempt is made to distinguish the various Windows encodings.

iso-latin-1

otherwise. No attempt is made to distinguish the other ISO Latin encodings.
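
The rules above can be sketched in Python as follows. This is an illustration only, not the program's actual source: the function name guess_encoding and the binary_ratio parameter are hypothetical, and the value 0.3 is an assumed stand-in for the "sufficiently high" threshold, which this page does not specify. The rules are tried in the order in which they are listed.

```python
def guess_encoding(data: bytes, binary_ratio: float = 0.3) -> str:
    """Classify a byte string using the rules listed above, in order.

    binary_ratio is an assumed cut-off for "sufficiently high"; the
    real program's threshold is not documented here.
    """
    # Count bytes with values greater than 127.
    high = sum(1 for b in data if b > 127)
    if high == 0:
        return "ascii"

    # Every high byte must belong to a legal UTF-8 sequence.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Null bytes plus a large share of high bytes suggest binary data.
    if b"\x00" in data and high / len(data) >= binary_ratio:
        return "binary"

    # Bytes 128..159 are C1 controls in the ISO Latin encodings, but
    # printable characters in several Windows encodings.
    if any(0x80 <= b <= 0x9F for b in data):
        return "windows"

    return "iso-latin-1"
```

For example, under these assumptions guess_encoding(b"caf\xe9") gives iso-latin-1, while the same word encoded as UTF-8 gives utf-8. To mimic the command, the bytes could be read from the named file opened in binary mode, or from sys.stdin.buffer when no argument is given.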

Bugs

The detection algorithm is simplistic and is really only useful for distinguishing UTF-8 from ISO Latin-1.