Fonts for different languages

Discussion of general topics about Seamonkey
Pim
Posts: 2215
Joined: May 17th, 2004, 2:04 pm
Location: Netherlands

Fonts for different languages

Post by Pim »

Can somebody explain the different fonts for different languages, or point me to a resource that does? I mean the ones in the Edit → Preferences → Appearance → Fonts dialog.

For example, how does the browser decide which font to use? Sometimes an HTML file that declares its language (with, say, lang="ja") is displayed in the expected font, but not always. And why does this depend on the charset?

Also, why do Firefox and SeaMonkey have different languages to choose from? Firefox has "Kannada", "Oriya", "Sinhala" and "Tibetan" that SeaMonkey doesn't have, but SeaMonkey has an extra "User Defined". By the way, I haven't found out how to trigger that last one; unknown languages always trigger the font for "Unicode". So am I missing something?
Then there's the Yoruba language. SeaMonkey displays text in this language with the font for "Western", but Firefox uses the font for "Other Languages", whether or not I include lang="yo". Again, am I missing something?

And HTML files without language identifiers always display in the "Western" font, while emails without language identifiers display in the "Unicode" font. Is that by design (and if so, why?), or is it a bug?
Regards, Pim
mgagnonlv
Posts: 848
Joined: February 12th, 2005, 8:33 pm

Re: Fonts for different languages

Post by mgagnonlv »

I have a partial answer for that.

Language-specific font settings are a remnant of the old days, and web page design should not rely on them. In other words, a well-designed web page should state that it uses lang="en-US" or lang="fr-FR", and charset=UTF-8 or ISO-8859-1 (for Western Europe). It should probably also offer a list of fonts to choose from (that's the current trend), although some purists would say it should not.

If the web page you are loading correctly advertises its language and character set, then that information will be used to display the page. Language information is also essential for screen readers: try reading this page aloud with French or Spanish pronunciation rules and you'll see what I mean. And if the page suggests fonts, they will be tried in the order given. For example, this site asks for "Verdana,Helvetica,Arial,sans-serif". The first of Verdana, Helvetica or Arial that is installed on your system will be used (Verdana being the preferred one here). But if your system (say, a Linux box) has none of them installed, then your default sans-serif font will be used.
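To make that fallback rule concrete, here is a minimal Python sketch. It assumes the browser simply walks the CSS font-family list and takes the first installed font, and that a generic family such as sans-serif maps to the default you picked in the Fonts preferences. resolve_font and the prefs dictionary are hypothetical names for illustration, not anything from Firefox or SeaMonkey. Real browsers also fall back character by character when the chosen font lacks a glyph, which this ignores.

```python
# Minimal sketch (hypothetical helper, not browser code) of how a CSS
# font-family list such as "Verdana,Helvetica,Arial,sans-serif" is resolved.

GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def resolve_font(font_family: str, installed: set, prefs: dict) -> str:
    """Return the font that would be used for a CSS font-family list.

    installed: names of fonts available on the system.
    prefs: the user's default font for each generic family, standing in
           for the Fonts preferences of one language.
    """
    installed_lower = {name.lower(): name for name in installed}
    for entry in font_family.split(","):
        name = entry.strip().strip("\"'")
        if name.lower() in GENERIC_FAMILIES:
            # A generic family always matches: use the user's default for it.
            return prefs.get(name.lower(), prefs["serif"])
        if name.lower() in installed_lower:
            # Font names are matched case-insensitively.
            return installed_lower[name.lower()]
    # No generic family at the end of the list: fall back to a default
    # (assumed to be the serif one here).
    return prefs["serif"]


prefs = {"serif": "Times New Roman", "sans-serif": "Calibri", "monospace": "Consolas"}
stack = "Verdana,Helvetica,Arial,sans-serif"
print(resolve_font(stack, {"Arial", "Calibri"}, prefs))        # Arial
print(resolve_font(stack, {"DejaVu Sans", "Calibri"}, prefs))  # Calibri (the sans-serif default)
```

This is why ending the list with a generic family matters: it hands the final choice back to your own preferences.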

But what happens if you load a web page that has only partial information?
The browser will guess what language it should be, sometimes with help from Google (via the stylesheet, dictionary and assumed location). If nothing can be decided, the default "Character encoding" you have defined in the View menu will be used.
Likewise, the default serif and sans-serif fonts for the language will be chosen from what you have defined under Options → Content → Fonts and Colours.
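Pim's question about why the charset matters fits this fallback order. The Python sketch below is only a model under stated assumptions: it tries the lang attribute first, then the declared charset, then the user's default. The two lookup tables (LANG_TO_GROUP, CHARSET_TO_GROUP) and the function name are made up for illustration and are not the browser's real tables. Under this model, a page whose lang the browser doesn't recognise ends up in whatever font group its charset maps to.

```python
from typing import Optional

# Hypothetical lookup tables for illustration only; the real browser tables differ.
LANG_TO_GROUP = {"en": "Western", "fr": "Western", "es": "Western",
                 "ja": "Japanese", "el": "Greek", "ru": "Cyrillic"}
CHARSET_TO_GROUP = {"iso-8859-1": "Western", "iso-8859-7": "Greek",
                    "koi8-r": "Cyrillic", "shift_jis": "Japanese",
                    "utf-8": "Unicode"}


def pick_font_group(lang: Optional[str], charset: Optional[str],
                    default_group: str = "Western") -> str:
    """Choose which font preferences (language group) to apply to a page."""
    # 1. An explicit, recognised lang attribute wins ("fr-CA" -> "fr").
    if lang:
        primary = lang.split("-")[0].lower()
        if primary in LANG_TO_GROUP:
            return LANG_TO_GROUP[primary]
    # 2. Otherwise fall back on the declared character set.
    if charset:
        group = CHARSET_TO_GROUP.get(charset.lower())
        if group:
            return group
    # 3. Nothing usable: use the user's default.
    return default_group


print(pick_font_group("ja", "utf-8"))      # Japanese
print(pick_font_group(None, "Shift_JIS"))  # Japanese
print(pick_font_group("yo", "utf-8"))      # Unicode
print(pick_font_group(None, None))         # Western
```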

What are the ideal settings?
For web browsers, character encoding has progressed from mostly ISO-8859-1 about 10-12 years ago to mostly UTF-8. ISO-8859-x was language-specific, so there were more than a dozen variants: for Western Europe (+ North and South America), Russia, Greece, etc. UTF-8 can encode any Unicode character, so it covers all European and American languages as well as Arabic and Asian languages.
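A quick way to see the difference is to encode a few words in Python. can_encode is just a hypothetical helper for this post. It shows that ISO-8859-1 stores é as one byte but cannot represent Greek or Japanese at all, ISO-8859-7 handles Greek but not Japanese, and UTF-8 handles all of them.

```python
def can_encode(text: str, encoding: str) -> bool:
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print("café".encode("iso-8859-1"))  # b'caf\xe9'      (é is one byte)
print("café".encode("utf-8"))       # b'caf\xc3\xa9'  (é is two bytes)

# Check a few sample words against three encodings.
for sample in ["café", "Ελληνικά", "日本語", "العربية"]:
    results = {enc: can_encode(sample, enc) for enc in ("iso-8859-1", "iso-8859-7", "utf-8")}
    print(sample, results)
```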
In a nutshell:
– If you use English only, set default encoding to whatever you fancy; it won't make a difference.
– If you use one of the languages of Europe or North or South America, set the default encoding to UTF-8 or leave it at the default "Automatic Detection". I am aware that the European community recommends UTF-8 encoding.
In real life, I notice that with texts in French, and occasionally in Spanish, I have fewer problems with UTF-8 than with any other option. And if you see a page with strange-looking characters instead of normal accented letters, change the encoding.
– If you use Arabic or an Asian language, the default encoding should probably be UTF-8, but I don't really know.

As for the default fonts, I would say it's a totally personal choice. I like Calibri as a sans-serif font for web pages, so I have set it as default. Likewise, if you also read pages in Japanese, select one that you like (and obviously that has Japanese characters in it!).

For email readers: in theory, UTF-8 should be the current standard. Again, a good email client should tag all its messages (except ASCII-only ones, I think) with their encoding (UTF-8, ISO-8859-1...). But email clients tend to be old-school, and I find that most "unidentified" emails come in ISO-8859-1; some versions of Outlook still send Windows-1252!
To make a long story short, I have set Thunderbird to assume that messages I receive use ISO-8859-1. That way, I almost never receive a message with funny-looking accents.
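Here is roughly what such a fallback setting does, as a minimal sketch using Python's standard email package (this is not Thunderbird's code): use the charset the message declares, and only assume ISO-8859-1 when it declares none. body_text and FALLBACK_CHARSET are hypothetical names, and the sketch only handles simple, non-multipart messages.

```python
import email

# Assumed fallback, matching the setting described above.
FALLBACK_CHARSET = "iso-8859-1"


def body_text(raw: bytes, fallback: str = FALLBACK_CHARSET) -> str:
    """Decode the body of a simple (non-multipart) message."""
    msg = email.message_from_bytes(raw)
    # The charset parameter of Content-Type, or None if the sender didn't tag it.
    charset = msg.get_content_charset() or fallback
    # decode=True undoes base64/quoted-printable and returns raw bytes.
    payload = msg.get_payload(decode=True)
    return payload.decode(charset, errors="replace")


tagged = b"Content-Type: text/plain; charset=utf-8\r\n\r\nQu\xc3\xa9bec\r\n"
untagged = b"Subject: test\r\n\r\nQu\xe9bec\r\n"
outlook = b"Content-Type: text/plain; charset=windows-1252\r\n\r\n\x93Qu\xe9bec\x94\r\n"

for raw in (tagged, untagged, outlook):
    print(body_text(raw).strip())
```

A declared charset always wins; the fallback only matters for untagged messages, which is why picking the encoding most of your correspondents actually use pays off.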
Michel Gagnon
Montréal (Québec, Canada)