Hi,
I'd like to know how to read an input file that may use different encodings.
how to read file in UTF8 or WEISO ?
-
- Posts: 1168
- Joined: September 16th, 2007, 8:01 am
Re: how to read file in UTF8 or WEISO ?
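// Privileged add-on code, where Cu is Components.utils
const {TextDecoder, TextEncoder, OS} = Cu.import('resource://gre/modules/osfile.jsm', {});
// Let OS.File decode for you: the promise resolves to a string
OS.File.read('file path', {encoding: 'utf-8'}).then(text => { /* use text */ });
// Or read the raw bytes (a Uint8Array) and decode them yourself
var myDecoder = new TextDecoder('utf-8');
OS.File.read('file path').then(bytes => { var text = myDecoder.decode(bytes); });
etc
I haven't ever done anything other than UTF-8, so please share how you use these.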
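Since the file may use different encodings, one possible approach (a minimal sketch, not something proposed in this thread) is to try strict UTF-8 first and fall back to a single-byte Western encoding when that fails. `decodeWithFallback` is a hypothetical name. It assumes you already have the raw bytes as a Uint8Array (for example from `OS.File.read(path)` with no encoding option) and that windows-1252 is an acceptable fallback; that is what TextDecoder actually uses for the `iso-8859-1` and `latin1` labels.

```javascript
// Hypothetical helper: decode raw file bytes, trying strict UTF-8 first and
// falling back to windows-1252 (what TextDecoder uses for the 'iso-8859-1'
// and 'latin1' labels) when the bytes are not valid UTF-8.
function decodeWithFallback(bytes, fallback = 'windows-1252') {
  try {
    // fatal: true makes decode() throw a TypeError on malformed input
    // instead of silently inserting U+FFFD replacement characters.
    const text = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return { encoding: 'utf-8', text };
  } catch (e) {
    if (!(e instanceof TypeError)) throw e;
    return { encoding: fallback, text: new TextDecoder(fallback).decode(bytes) };
  }
}

// "café" in UTF-8 (é = C3 A9) and in ISO-8859-1 (é = E9)
const utf8Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xC3, 0xA9]);
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xE9]);
console.log(decodeWithFallback(utf8Bytes));   // { encoding: 'utf-8', text: 'café' }
console.log(decodeWithFallback(latin1Bytes)); // { encoding: 'windows-1252', text: 'café' }
```

This is a heuristic: pure ASCII decodes identically either way, and a Western-encoded file can occasionally happen to be valid UTF-8, so an explicit encoding is still better when you know it.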
-
- Posts: 3664
- Joined: September 15th, 2010, 9:03 am
Re: how to read file in UTF8 or WEISO ?
Presumably you're reading the file the old-fashioned way, with an nsIFile or nsILocalFile? There you just get the raw bytes and no simple way to convert them to anything except plain ASCII. XHR gives you some more options, including an ArrayBuffer response, which is slightly easier to work with than a plain string.
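If the old-style read hands you a string in which each character stands for one raw byte (a so-called binary string; this is an assumption about what your code gets back), converting it to a Uint8Array first lets TextDecoder handle any encoding it supports. A minimal sketch; `binaryStringToText` is a hypothetical helper:

```javascript
// Hypothetical helper: turn a "binary string" (each char code 0-255 stands for
// one raw byte) into a Uint8Array, then decode it with the given encoding.
function binaryStringToText(binary, encoding) {
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    const code = binary.charCodeAt(i);
    // Anything above 0xFF cannot be a single byte, so this is not a binary string.
    if (code > 0xFF) throw new RangeError('not a binary string at index ' + i);
    bytes[i] = code;
  }
  return new TextDecoder(encoding).decode(bytes);
}

// The UTF-8 bytes of "café" as they look when read one byte per character
const raw = 'caf\xC3\xA9';
console.log(raw.length);                       // 5 (shows as "cafÃ©" if used as-is)
console.log(binaryStringToText(raw, 'utf-8')); // café
```

With XHR and `responseType = 'arraybuffer'` the conversion step is unnecessary, since `TextDecoder.prototype.decode()` accepts an ArrayBuffer directly: `new TextDecoder(encoding).decode(xhr.response)`.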
-
- Posts: 1168
- Joined: September 16th, 2007, 8:01 am
Re: how to read file in UTF8 or WEISO ?
Here's someone using TextDecoder for UTF-16: http://stackoverflow.com/q/31968246/1828637
I tried UTF-16 here too and it seems to work awesomely: https://github.com/Noitidart/MailtoWebm ... ap.js#L452
I think we need better docs on all the encodings supported by OS.File and TextDecoder/TextEncoder.
I don't know whether it works without it, but when writing the file with writeAtomic I prepend something called a "BOM". I'm not sure what it is (I got it from the Stack Overflow question above), but things are working as expected:
https://github.com/Noitidart/MailtoWebm ... ap.js#L552
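For reference, a UTF-16LE file with a BOM simply starts with the bytes FF FE (U+FEFF stored little-endian). The current Encoding Standard's TextEncoder only produces UTF-8, so the sketch below builds the UTF-16LE bytes by hand. `encodeUtf16leWithBom` is a hypothetical helper, not what the linked code does.

```javascript
// Hypothetical helper: encode a string as UTF-16LE bytes, prefixed with the
// byte order mark U+FEFF (stored little-endian as FF FE).
function encodeUtf16leWithBom(text) {
  const withBom = '\uFEFF' + text;
  const bytes = new Uint8Array(withBom.length * 2);
  for (let i = 0; i < withBom.length; i++) {
    const unit = withBom.charCodeAt(i); // UTF-16 code unit (surrogates pass through)
    bytes[2 * i] = unit & 0xFF;         // low byte first = little-endian
    bytes[2 * i + 1] = unit >> 8;       // then the high byte
  }
  return bytes;
}

const bytes = encodeUtf16leWithBom('hi');
console.log(Array.from(bytes, b => b.toString(16).padStart(2, '0')).join(' '));
// ff fe 68 00 69 00
console.log(new TextDecoder('utf-16le').decode(bytes)); // hi (the decoder strips the BOM)
```

In an add-on, the resulting Uint8Array could then be passed to `OS.File.writeAtomic(path, bytes)`, which accepts typed arrays as well as strings.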
-
- Posts: 3664
- Joined: September 15th, 2010, 9:03 am
Re: how to read file in UTF8 or WEISO ?
The Byte Order Mark is the character U+FEFF placed at the very start of a file; its encoded bytes identify the endianness (byte order) of the file. As a side effect, it also allows a Unicode file to be identified more reliably (still not 100%). For UTF-8 the Unicode Standard neither requires nor recommends it, but it does appear to help some applications read files in some encodings.
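Reading the BOM back is a simple prefix check. A minimal sketch, assuming the bytes are already in a Uint8Array; `sniffBom` and `decodeSniffed` are hypothetical names. UTF-32 BOMs are not handled (TextDecoder does not support UTF-32), so a UTF-32LE file starting FF FE 00 00 would be misread as UTF-16LE.

```javascript
// Hypothetical helper: look for a byte order mark at the start of the bytes.
// Returns the encoding label it implies, or null when there is no BOM.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) return 'utf-8';
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) return 'utf-16le';
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) return 'utf-16be';
  return null;
}

// Decode with the BOM's encoding if there is one, otherwise a caller-chosen default.
// TextDecoder drops a leading BOM that matches its encoding (ignoreBOM defaults to false).
function decodeSniffed(bytes, defaultEncoding = 'utf-8') {
  const encoding = sniffBom(bytes) || defaultEncoding;
  return new TextDecoder(encoding).decode(bytes);
}

console.log(sniffBom(new Uint8Array([0xEF, 0xBB, 0xBF, 0x41])));      // utf-8
console.log(decodeSniffed(new Uint8Array([0xFF, 0xFE, 0x41, 0x00]))); // A
console.log(sniffBom(new Uint8Array([0x41, 0x42])));                  // null
```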
-
- Posts: 1168
- Joined: September 16th, 2007, 8:01 am
Re: how to read file in UTF8 or WEISO ?
lithopsian wrote:The Byte Order Mark is a short sequence of characters designed to identify the endianness of a file. As a side effect, they also allow a unicode file to be more reliably (still not 100%) identified. The standard recommends not using it, but it does appear to help some applications read files in some encodings.
Thanks, litho! I'm not sure what the BOM I prepended relates to, but it works.
-
- Posts: 3664
- Joined: September 15th, 2010, 9:03 am
Re: how to read file in UTF8 or WEISO ?
The BOM can signal to readers that the file contains Unicode when they might not otherwise know. Unfortunately, it is a poor solution compared to specifying the correct encoding, because the same bytes could also be a valid part of the document (albeit an unusual set of characters): in Windows-1252, for example, the UTF-8 BOM bytes EF BB BF read as "ï»¿".
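A quick way to see that ambiguity, assuming a TextDecoder that supports windows-1252 (browsers do, and so does Node.js built with full ICU): the same four bytes decode differently depending on the encoding you specify.

```javascript
// The three bytes of a UTF-8 BOM followed by "A"
const bytes = new Uint8Array([0xEF, 0xBB, 0xBF, 0x41]);

// Told the file is UTF-8, TextDecoder treats EF BB BF as a BOM and drops it.
console.log(new TextDecoder('utf-8').decode(bytes));                             // A
// With ignoreBOM: true it keeps it as the character U+FEFF instead.
console.log(new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes).length); // 2
// Told the file is windows-1252, the same bytes are three ordinary characters.
console.log(new TextDecoder('windows-1252').decode(bytes));                      // ï»¿A
```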