how to read file in UTF8 or WEISO ?

poleta33
Posts: 120
Joined: October 14th, 2004, 2:06 pm

how to read file in UTF8 or WEISO ?

Post by poleta33 »

Hi

I'd like to know how to read an input file that may be in different encodings.

++
Noitidart
Posts: 1168
Joined: September 16th, 2007, 8:01 am

Re: how to read file in UTF8 or WEISO ?

Post by Noitidart »

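const {TextDecoder, TextEncoder, OS} = Cu.import('resource://gre/modules/osfile.jsm', {});

// Decode raw bytes yourself (bytes is a hypothetical Uint8Array you already have):
var myDecoder = new TextDecoder('utf-8');
myDecoder.decode(bytes);

// Or let OS.File decode for you; this returns a promise that resolves to a string:
OS.File.read('file path', {encoding: 'utf-8'});

etc.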

I haven't ever done non-UTF-8, so please share how you end up using these.
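
For a file whose encoding is not known in advance, one possible approach is to read the raw bytes and try a strict UTF-8 decode first, falling back to a single-byte Western encoding if that fails. This is only a minimal sketch: it assumes windows-1252 is the right fallback, and decodeWithFallback is a hypothetical helper name, not part of OS.File.

```javascript
// Hypothetical helper: decode bytes as UTF-8 if they are valid UTF-8,
// otherwise assume a Western single-byte encoding (windows-1252).
function decodeWithFallback(bytes) {
  try {
    // fatal: true makes decode() throw on malformed UTF-8 instead of
    // silently inserting replacement characters.
    return new TextDecoder('utf-8', {fatal: true}).decode(bytes);
  } catch (e) {
    return new TextDecoder('windows-1252').decode(bytes);
  }
}

// In an add-on, OS.File.read without an encoding option resolves to a Uint8Array:
// OS.File.read('file path').then(bytes => decodeWithFallback(bytes));
```

The fallback is a guess: any byte sequence decodes "successfully" as windows-1252, so this cannot tell one single-byte encoding from another.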
lithopsian
Posts: 3664
Joined: September 15th, 2010, 9:03 am

Re: how to read file in UTF8 or WEISO ?

Post by lithopsian »

Presumably you're reading the file the old-fashioned way, with an nsIFile or nsILocalFile? There you just get the raw bytes, with no simple way to convert them to anything except plain ASCII. XHR gives you some more options, including ArrayBuffer, which is slightly easier to work with than a plain string.
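
A rough sketch of the XHR route, assuming XMLHttpRequest is available in the calling scope and is allowed to load the URL in question (for example a file: URL from privileged code). readViaXhr and decodeBuffer are hypothetical helper names used only for illustration.

```javascript
// Decode an ArrayBuffer using the given encoding label, e.g. 'utf-8' or 'utf-16le'.
function decodeBuffer(buffer, encoding) {
  return new TextDecoder(encoding).decode(new Uint8Array(buffer));
}

// Hypothetical helper: load a URL as raw bytes and decode it with a chosen encoding.
function readViaXhr(url, encoding) {
  return new Promise(function (resolve, reject) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.responseType = 'arraybuffer'; // xhr.response will be an ArrayBuffer
    xhr.onload = function () {
      resolve(decodeBuffer(xhr.response, encoding));
    };
    xhr.onerror = function () {
      reject(new Error('Could not load ' + url));
    };
    xhr.send();
  });
}
```

The caller still has to know or guess the encoding; the ArrayBuffer only guarantees that the bytes arrive untouched.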
Noitidart
Posts: 1168
Joined: September 16th, 2007, 8:01 am

Re: how to read file in UTF8 or WEISO ?

Post by Noitidart »

Here's someone using TextDecoder for UTF-16: http://stackoverflow.com/q/31968246/1828637

I tried UTF-16 here too and it seems to work awesomely: https://github.com/Noitidart/MailtoWebm ... ap.js#L452

I think we need some better docs on all the encodings that are supported by OS.File and TextDecoder/TextEncoder.


I don't know if it works without it, but when writing the file with writeAtomic I prepend something called a "BOM". I'm not sure what it is (I got it from the Stack Overflow topic above), but things are working as expected:

https://github.com/Noitidart/MailtoWebm ... ap.js#L552
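
For readers who cannot follow the link, here is a minimal sketch of what prepending a BOM can look like for UTF-16LE. It is not taken from the linked code; encodeUtf16LeWithBom is a hypothetical helper that builds the bytes by hand so it does not depend on which encodings TextEncoder supports.

```javascript
// Hypothetical helper: encode a string as UTF-16LE bytes with a leading BOM (FF FE).
function encodeUtf16LeWithBom(text) {
  var bytes = new Uint8Array(2 + text.length * 2);
  bytes[0] = 0xFF; // BOM, low byte of U+FEFF
  bytes[1] = 0xFE; // BOM, high byte of U+FEFF
  for (var i = 0; i < text.length; i++) {
    var unit = text.charCodeAt(i);   // one UTF-16 code unit
    bytes[2 + i * 2] = unit & 0xFF;  // low byte first (little-endian)
    bytes[3 + i * 2] = unit >> 8;    // then the high byte
  }
  return bytes;
}

// In an add-on the result could be passed to writeAtomic, for example:
// OS.File.writeAtomic(path, encodeUtf16LeWithBom('hello'), {tmpPath: path + '.tmp'});
```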
lithopsian
Posts: 3664
Joined: September 15th, 2010, 9:03 am

Re: how to read file in UTF8 or WEISO ?

Post by lithopsian »

The Byte Order Mark is a short sequence of bytes (the encoded character U+FEFF) placed at the start of a file to identify its endianness. As a side effect, it also allows a Unicode file to be identified more reliably (still not 100%). For UTF-8 the Unicode standard neither requires nor recommends it, but it does appear to help some applications read files in some encodings.
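
To make that concrete, here is a small sketch of BOM sniffing on the first bytes of a file. sniffBom is a hypothetical helper; it only covers UTF-8 and UTF-16 and deliberately ignores UTF-32, whose little-endian BOM also starts with FF FE.

```javascript
// Hypothetical helper: return the encoding implied by a leading BOM, or null if none.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return 'utf-8';
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return 'utf-16le';
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return 'utf-16be';
  }
  return null; // no BOM: the encoding has to come from somewhere else
}
```

A null result says nothing about the encoding, since plenty of valid UTF-8 and UTF-16 files have no BOM at all.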
Noitidart
Posts: 1168
Joined: September 16th, 2007, 8:01 am

Re: how to read file in UTF8 or WEISO ?

Post by Noitidart »


Thanks litho! I'm not sure what the BOM I prepended relates to, but it works :P
lithopsian
Posts: 3664
Joined: September 15th, 2010, 9:03 am

Re: how to read file in UTF8 or WEISO ?

Post by lithopsian »

The BOM can signal to readers that the file contains Unicode when they might not otherwise know. Unfortunately, it is a poor solution compared to specifying the correct encoding, because the same bytes could actually be a valid part of the document (albeit a slightly unusual set of characters).
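
A short illustration of that ambiguity, assuming a TextDecoder that supports the windows-1252 label: the three bytes of the UTF-8 BOM are also three perfectly legal Western characters.

```javascript
// The UTF-8 BOM (EF BB BF) followed by the letter "A".
var bytes = new Uint8Array([0xEF, 0xBB, 0xBF, 0x41]);

// Decoded as UTF-8, the BOM is consumed and only the text remains.
var asUtf8 = new TextDecoder('utf-8').decode(bytes);           // "A"

// Decoded as windows-1252, the same bytes are ordinary characters.
var asWestern = new TextDecoder('windows-1252').decode(bytes); // "ï»¿A"
```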