I have been reading this: http://developer.mozilla.org/en/Reading_textual_data.
Determining the character encoding of data
When reading from a file, the question is harder to answer. Using the system character encoding may work (XXX insert text how to get it), or again the default character encoding from preferences.
I have also found this:
The native character encoding is determined using platform specific methods. As of Mozilla 1.7, it is UTF-8 on Mac OS X. On Linux and other UNIX platforms, it is the value returned from nl_langinfo (CODESET), which usually corresponds to the value of the LC_ALL, LC_CTYPE and LANG environment variables (with the precedence the same as the order they're enumerated). On Win32 platforms, it is the currently selected ANSI codepage (specified by CP_ACP).
Java has Charset.defaultCharset() for this: http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/Charset.html#defaultCharset()
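For comparison, here is a minimal sketch of what the Java side looks like: it asks for the platform default charset and uses it explicitly when decoding a file. The class name DefaultCharsetDemo and the helper readWithDefaultCharset are made up for illustration, and the sketch assumes Java 7 or later for java.nio.file.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class DefaultCharsetDemo {

    // Hypothetical helper: read the whole file and decode it with the
    // JVM's default charset, passed explicitly rather than implied.
    static String readWithDefaultCharset(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        return new String(raw, Charset.defaultCharset());
    }

    public static void main(String[] args) throws IOException {
        // Report which charset the JVM considers the default.
        System.out.println("Default charset: " + Charset.defaultCharset().name());

        // Round-trip a short string through a temporary file.
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        try {
            Files.write(tmp, "hello".getBytes(Charset.defaultCharset()));
            System.out.println(readWithDefaultCharset(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Note that on JDK 18 and later the default charset is UTF-8 unless configured otherwise, so on those versions it no longer necessarily reflects the locale or ANSI codepage described above.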
Microsoft's Scripting.FileSystemObject does it automatically: if you open a file with the format argument set to TristateUseDefault, it uses the system default.
It would be useful if I could get behaviour consistent with these when using XPCOM.