How to get system character encoding

baconbutty
Posts: 2
Joined: September 24th, 2008, 5:09 am

How to get system character encoding

Post by baconbutty »

Hi

I have been reading this: http://developer.mozilla.org/en/Reading_textual_data.

It says:-

Determining the character encoding of data
...

When reading from a file, the question is harder to answer. Using the system character encoding may work (XXX insert text how to get it), or again the default character encoding from preferences.


Does anyone know how to get the system's default character encoding, particularly when using XPCOM through JavaScript?

I have found this:-

The native character encoding is determined using platform specific methods. As of Mozilla 1.7, it is UTF-8 on Mac OS X. On Linux and other UNIX platforms, it is the value returned from nl_langinfo (CODESET), which usually corresponds to the value of the LC_ALL, LC_CTYPE and LANG environment variables (with the precedence the same as the order they're enumerated). On Win32 platforms, it is the currently selected ANSI codepage (specified by CP_ACP).
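
For comparison, here is a minimal sketch of those same platform calls from a scripting language that does expose them. It is Python rather than JavaScript, and the function name system_charset is made up for illustration. It assumes a desktop Windows or POSIX system and is not how Mozilla itself does it:

```python
import locale
import sys


def system_charset():
    """Return the platform's native character encoding name (hypothetical helper)."""
    if sys.platform == "win32":
        import ctypes
        # GetACP() returns the number of the active ANSI code page,
        # i.e. the code page that CP_ACP refers to.
        return "cp%d" % ctypes.windll.kernel32.GetACP()
    try:
        # Adopt the user's LC_ALL / LC_CTYPE / LANG settings first;
        # otherwise the process may still be in the "C" locale.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # the environment names a locale this system lacks
    return locale.nl_langinfo(locale.CODESET)


print(system_charset())
```

On a typical Western European Windows setup this should report cp1252, and on most current Linux desktops UTF-8, but the result depends entirely on the machine's settings.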


But I have no idea how to get nl_langinfo or CP_ACP when using JavaScript and XPConnect.

Java has http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/Charset.html#defaultCharset()

Microsoft's Scripting.FileSystemObject does it automatically if you have TristateUseDefault set.

It would be useful if I could get the same behaviour when using XPCOM.

Thanks

Julian
Torisugari
Posts: 1634
Joined: November 4th, 2002, 8:34 pm
Location: Kyoto, Nippon (GMT +9)

Re: How to get system character encoding

Post by Torisugari »

Well, I agree that it would be nice to have a scriptable way to get the file-system charset. But it's impossible, afaik. However, there is almost always a way to work around such a situation, so can you let me know what your exact problem is? Why do you need the file-system charset?
baconbutty
Posts: 2
Joined: September 24th, 2008, 5:09 am

Re: How to get system character encoding

Post by baconbutty »

Thank you for your reply. I suspected as much.

For my own use it is not really a problem. I have developed an "outliner" (text editor) application using HTML, contentEditable etc, and I need to load and save the "outlines" as text files.

I want to use an 8-bit character encoding (CP1252), as UTF-16 produces a lot of NUL bytes for Western European text.
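
To make the size difference concrete, here is a small sketch in Python (chosen only for illustration; the sample string is made up). It encodes the same Western European text both ways and counts the zero bytes:

```python
text = "café £5"  # every character here is below U+0100

cp1252_bytes = text.encode("cp1252")    # one byte per character
utf16_bytes = text.encode("utf-16-le")  # two bytes per character, no byte-order mark

print(len(cp1252_bytes), len(utf16_bytes))
print(utf16_bytes.count(0))  # number of zero (NUL) bytes in the UTF-16 form
```

For text like this, the UTF-16 form is twice the size and half of its bytes are NUL, because the high byte of every character below U+0100 is zero.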

I currently use Internet Explorer only with the ActiveX component - Scripting.FileSystemObject.

This has 3 options for saving - ASCII, UTF-16, and System Default (which is CP1252 for me).

I wanted to port my application to Firefox, and can do so relatively easily, as I know there are charset converters for CP1252 in XPCOM.

I then thought about sharing my application on the web, and thought it would be nice if it could auto-detect the system default charset (as Scripting.FileSystemObject does with its System Default option, and Java's Charset class has a method which detects the system default), to save the user having to specify their preferred charset.
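
The behaviour I have in mind could be sketched roughly like this: use an explicit user choice if there is one, otherwise the system default, otherwise CP1252. This is Python rather than XPCOM, pick_charset is a hypothetical name, and the CP1252 fallback is just my own preference:

```python
import codecs
import locale


def pick_charset(user_choice=None, fallback="cp1252"):
    """Pick an encoding: user choice, else system default, else fallback."""
    candidates = [user_choice, locale.getpreferredencoding(False), fallback]
    for name in candidates:
        if not name:
            continue
        try:
            # codecs.lookup() validates the name and returns its canonical form.
            return codecs.lookup(name).name
        except LookupError:
            continue  # unknown to this runtime; try the next candidate
    raise LookupError("no usable encoding found")


print(pick_charset())                # system default on this machine
print(pick_charset("windows-1252"))  # explicit choice, normalised
```

The user only has to specify a charset when the detected default is wrong for their files.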

However, I guess this is a minor inconvenience.