[oclug] unicode? file

Adrian Irving-Beer wisq-oclug at wisq.net
Tue Feb 22 10:43:00 EST 2005


On Tue, Feb 22, 2005 at 10:31:21AM -0500, Stephen M. Webb wrote:

> UTF-16 is an encoding scheme that would allow Unicode text to be
> represented in a sequence of 16-bit values
[...]
> The BE is 'big-endian.'

Ah!  Then that explains why iconv and vim were disagreeing when I said
'utf16' alone.  Vim assumes big-endian, iconv assumes little-endian.
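
To see why the two tools disagree, here is a small Python sketch. Python is just my choice for illustration; neither vim nor iconv is involved. It decodes the same BOM-less bytes both ways. Read with the wrong byte order, "Hi" doesn't raise an error. It silently becomes two unrelated CJK ideographs, which is why this kind of mismatch can go unnoticed.

```python
# Same bytes, two byte-order assumptions.  There is no BOM, so nothing in the
# data says which order is right; each decoder just applies the order it's told.
data = "Hi".encode("utf-16-be")       # b'\x00H\x00i'

as_big = data.decode("utf-16-be")     # 'Hi'
as_little = data.decode("utf-16-le")  # '\u4800\u6900': valid, but wrong text

print(repr(as_big), repr(as_little))
```

On the command line the cure is the same idea: name the byte order explicitly, e.g. `iconv -f UTF-16BE -t UTF-8`, rather than relying on each tool's default for plain 'utf16'.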

> UCS-2 is another encoding scheme that represents a subset of Unicode
> as a stream of 16-bit values and is used by Microsoft software.  I
> believe that all UCS-2 strings are a proper subset of UTF-16, just
> as ASCII strings are a proper subset of UTF-8, but I could be wrong.

No, you're correct, according to vim:

	ucs-2       16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
	ucs-2le     like ucs-2, little endian
	utf-16      ucs-2 extended with double-words for more characters
	utf-16le    like utf-16, little endian
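
The "double-words" in that utf-16 line are surrogate pairs. Here is a rough Python sketch of the arithmetic; the function name is my own and doesn't come from vim or iconv. Code points up to U+FFFF are a single 16-bit unit, identical to UCS-2. Anything above that is split into two units taken from the surrogate range.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]                       # BMP: one unit, same as UCS-2
    v = cp - 0x10000                      # remaining 20 bits
    high = 0xD800 + (v >> 10)             # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)            # bottom 10 bits -> low surrogate
    return [high, low]


# U+1F600 needs a pair; 'A' (U+0041) does not.
print([hex(u) for u in utf16_code_units(0x1F600)])
```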

And I was wrong about utf-16 being an alias for ucs-2; that's just what
vim autodetects some files as.  Presumably ones that don't use any of
the UTF-16 extensions (the double-word surrogate pairs).
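
If "UTF-16 extensions" means surrogate pairs, then UTF-16 data can be read as UCS-2 exactly when it contains no code units in the surrogate range 0xD800-0xDFFF. Below is a hypothetical check along those lines in Python. It shows only the idea and is not vim's actual detection logic.

```python
def is_ucs2_compatible(raw: bytes, byteorder: str = "big") -> bool:
    """True if raw UTF-16 data uses no surrogate pairs (i.e. is plain UCS-2)."""
    if len(raw) % 2:
        return False                      # odd length can't be 16-bit units
    for i in range(0, len(raw), 2):
        unit = int.from_bytes(raw[i:i + 2], byteorder)
        if 0xD800 <= unit <= 0xDFFF:      # any surrogate means real UTF-16
            return False
    return True


print(is_ucs2_compatible("Hi".encode("utf-16-be")))
```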

> Anyways, using iconv would be a good solution assuming the utf16be
> locales are installed on the poster's system.

I think this may be separate from locales.  iconv here can convert
between a huge number of encodings, but I only have UTF-8, ISO8859-1,
and EUC-JP locales installed.

The conversion modules are installed by glibc and live in
/usr/lib/gconv on my system.
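
For the original question, here is a minimal Python sketch of converting a UTF-16 file to UTF-8, roughly what `iconv -f UTF-16 -t UTF-8` is being asked to do. When there is no BOM, it assumes big-endian. That default is my own choice, so pass assume="le" for little-endian data. Python ships its own codecs rather than using glibc's gconv modules, so installed locales don't matter here either.

```python
def utf16_to_utf8(raw: bytes, assume: str = "be") -> bytes:
    """Convert UTF-16 bytes to UTF-8, honouring a BOM if one is present."""
    if assume not in ("be", "le"):
        raise ValueError("assume must be 'be' or 'le'")
    if raw.startswith((b"\xfe\xff", b"\xff\xfe")):
        text = raw.decode("utf-16")            # codec reads and drops the BOM
    else:
        text = raw.decode("utf-16-" + assume)  # no BOM: use the stated order
    return text.encode("utf-8")
```

Usage would be something like `open("out.txt", "wb").write(utf16_to_utf8(open("in.txt", "rb").read()))`, where the file names are placeholders.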


More information about the OCLUG mailing list