Punctuation problem - Diamond with question mark

LTHWriter
Posts: 2
Joined: September 8th, 2005, 4:46 pm

Post by LTHWriter »

I've seen a few people here post questions about the problem with Firefox not translating punctuation properly in some web pages and as a result, the "offending" punctuation marks look like black diamonds with a white question mark. It seems to be suggested that it's something to do with improper character encoding, and while that may be the technical explanation, what I want to know is - what's Firefox going to DO about it?

Because realistically speaking, Firefox, while a great browser in other respects, is coming in late in the game. There are millions of existing web pages out there that people have created, properly or improperly, and the reality is that many of these web pages have content that was copied and pasted out of an MS Word file (which is what I believe is creating the problem). MS Word's character set probably IS different from the HTML standard, but the majority of the people on the planet use Word to write up and store their documents - including the web page copy they are writing - and they then copy and paste that text into whatever HTML editor they are using.
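
A minimal Python sketch of the mechanism described here, assuming the pasted text ends up stored as Windows-1252 bytes (where Word's "smart" punctuation usually lives on Western Windows systems) while the page is decoded as UTF-8. The sample string and variable names are made up for illustration.

```python
# Hypothetical sample: a curly apostrophe (U+2019) as stored in Windows-1252.
word_bytes = "It\u2019s".encode("cp1252")   # the apostrophe becomes the single byte 0x92

# Decoded as UTF-8, the lone byte 0x92 is not a valid sequence, so the decoder
# substitutes U+FFFD REPLACEMENT CHARACTER, which browsers typically draw as
# a black diamond with a white question mark.
as_utf8 = word_bytes.decode("utf-8", errors="replace")

# Decoded with the encoding the bytes were actually written in, the text is intact.
as_cp1252 = word_bytes.decode("cp1252")

print(repr(as_utf8))
print(repr(as_cp1252))
```

Under these assumptions the bytes on the page are fine; the mismatch is between those bytes and the encoding the browser was told to use.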

I'm a professional writer, as well as being a web designer since 1992, and other people hire me to write their web copy. I use Word to do this and to deliver the content to the client. If I am not the one actually loading the web copy into their web pages, I have no control over how they get the copy into their pages, but you can bet they or their web designer are doing a copy/paste straight out of the Word file. It's standard operating procedure and no one is about to change that. Plus, when you consider that a lot of small businesses use FrontPage or other template-based web site design services for "do-it-yourself" web sites, they are not going to know enough to know how to fix "character set encoding" -- all they want to do is drop their text into their web pages and get out again. They don't want to know how it all works.

Since a fair percentage of people do this copy and paste thing from Word, wouldn't it make sense for Firefox to be a bit more forgiving in its ability to translate and display punctuation properly - even if it's not coded exactly right in the first place? If IE can do it, why not Firefox? Let's not be so elitist! It's darned annoying to see all those stupid little diamonds all over the pages I'm reading when I'm surfing the web. I can control my own web site but not everyone else's!

Hope someone from Firefox tech support is reading this and taking the HINT!
the-edmeister
Posts: 32249
Joined: February 25th, 2003, 12:51 am
Location: Chicago, IL, USA

Post by the-edmeister »

Forgive me for being so blunt, but any so-called professional who uses Word for creating webpages is a hack. True, there are many pages on the web that are done in the manner that you describe, but they aren't "real web-pages" until they conform to existing W3C standards; they are merely text files with HTML coding.

I disagree with your statement about Firefox "coming out late in the game". Mozilla Firefox is a direct descendant of the original Netscape browser, which I believe was the second internet browser.

There is a revolution starting: open source is "in" and proprietary software is going to be displaced. Without the authors of open source following established standards, "we" would be back in the pre-Windows days where different pieces of software wouldn't be able to communicate with each other.

Adapt or watch your established business slowly disappear to writers and web designers who did adapt.

Ed
A mind is a terrible thing to waste. Mine has wandered off and I'm out looking for it.
LTHWriter
Posts: 2
Joined: September 8th, 2005, 4:46 pm

Post by LTHWriter »

I beg YOUR pardon but you made a big assumption and clearly did not read my words. I did not say I use Word to create web PAGES, I said I use Word to write up the TEXT that goes into a web site for my clients. It's not the same thing. And there are many NON professionals who are creating their own web pages - that's the reality of the world now and the technology is out there to allow them to do that. I know Firefox is a descendant of Netscape - but Netscape doesn't have this punctuation problem and Firefox does. And just because a web designer may not be an expert in all the intricacies and bugs of a browser does not make him or her a "so-called" professional. Get off your high horse and deal in the real world.

What I asked for was someone from Firefox to consider figuring out a solution. I did not, however, recall requesting a dissertation or an attack from a pompous ass (anyone putting "meister" after his name clearly has some serious ego issues). Before you spout off your opinions about the capabilities and credentials of people you don't know, I suggest you tread a bit more cautiously next time. Your own ignorance is showing.

If anyone else other than "edmeister" has anything constructive to add, I'd be more than happy to hear it.
Rishi M.
Folder@Home
Posts: 1294
Joined: April 29th, 2005, 7:36 pm
Location: Toronto, Canada

Post by Rishi M. »

The solution is to conform to W3C standards. Two wrongs do not make a right.
Quidquid latine dictum sit, altum sonatur.
Folding for Team MozillaZine (No. 39340) with 32.4GHz of power. Your machine can make a difference! Join now.
Lost User 36785
Posts: 0
Joined: December 31st, 1969, 5:00 pm

Post by Lost User 36785 »

Kinda weird. I've been using Firefox since 0.6 and I can't remember seeing the problem that you describe, LTHWriter... or maybe I just never paid attention.
James
Moderator
Posts: 27999
Joined: June 18th, 2003, 3:07 pm
Location: Made in Canada

Post by James »

Relax guys...

Moving to Web Dev forum.
jqp
Posts: 5070
Joined: November 17th, 2004, 10:56 am
Location: In a box

Post by jqp »

Firefox has a solution: you can manually change the rendered character encoding from the View menu.

Firefox is going to obey the HTTP standard by obeying the character encoding header that is sent to it.
Firefox is going to obey HTML standards by obeying a META tag, but only when the corresponding HTTP header was not sent by the server.
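
The precedence just described (HTTP header first, META tag only when the header gives no charset) can be sketched roughly as below. This is a simplified illustration, not how Gecko is implemented: the function name `pick_encoding`, the 1024-byte scan window, and the `windows-1252` fallback are all assumptions made for the example, and real browsers also consider things like byte-order marks and user overrides.

```python
import re

def pick_encoding(content_type_header, html_bytes, fallback="windows-1252"):
    """Return the charset label a client might use (hypothetical helper)."""
    # 1. A charset in the HTTP Content-Type header wins if present.
    if content_type_header:
        m = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type_header, re.I)
        if m:
            return m.group(1).lower()

    # 2. Otherwise look for a META declaration near the top of the document.
    head = html_bytes[:1024].decode("ascii", errors="replace")
    m = re.search(r'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', head, re.I)
    if m:
        return m.group(1).lower()

    # 3. Nothing declared: fall back to an assumed default.
    return fallback

page = b'<html><head><meta charset="utf-8"></head><body>...</body></html>'
print(pick_encoding("text/html; charset=ISO-8859-1", page))  # header wins
print(pick_encoding("text/html", page))                      # META is used
```

The practical consequence is that editing the META tag has no effect while the server keeps sending a different charset in the header.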

If you're a professional designer and you don't understand the point of character encoding, then you have learning to do.

If you're not a professional designer and you create/edit pages and are confronted by character encoding issues, then you need to get better tools. Don't blame Firefox for something that Microsoft Word does wrong. Get the right tool for the job.

Of course, odd characters (characters with accents, non-english punctuation) can be fixed by changing about 5 characters on your server to read "UTF-8" instead of what it already says.

Why adhere so strictly to standards? Because if everyone continues to bend the rules, the web is going to continue to depend on browsers that are willing to bend these rules. Then, we end up with browsers that don't agree on how and how much to bend the rules. Heck, some browsers might even make up their own rules. Next thing you know, it's 1997 again. When browsers start to follow the rules, webpages start to follow the rules. When webpages start to follow the rules it ceases to matter what browser you use - making everyone's life easier.

As for all the whining and moaning about FrontPage, Word, and others... If you use the wrong tools for a job, you're going to get the wrong results. I have an appliance here at home that uses manufacturer-proprietary screws. I don't have a screwdriver that fits these screws. The manufacturer intentionally made these screws so that they can only be removed by the manufacturer's screwdrivers. Some screws are made so that they can't be removed at all without destroying them. Do I bitch at my flatheads, hexheads and phillips drivers? No. I bitch at the manufacturer for not following the standards of design. There are decent web authoring tools that make appliances compatible with all sorts of browsers and screwdrivers out there, but none of them are made by Microsoft.
jscher2000
Posts: 11742
Joined: December 19th, 2004, 12:26 am
Location: Silicon Valley, CA USA

Post by jscher2000 »

LTHWriter wrote:I'm a professional writer, as well as being a web designer since 1992, and other people hire me to write their web copy. I use Word to do this and to deliver the content to the client. If I am not the one actually loading the web copy into their web pages, I have no control over how they get the copy into their pages, but you can bet they or their web designer are doing a copy/paste straight out of the Word file. It's standard operating procedure and no one is about to change that.

Do you use bold or italics in your writing? It seems unlikely that the publisher-intermediary really would be pasting the content as plain text. If they are, and then styling after the fact, you could record a few "find and replace" macros to convert the most common special characters to their HTML entity equivalents: "smart quotes", the curly apostrophe, and em and en dashes are the ones that come to mind. You could deliver that "converted" version alongside the original to optimize the display of your writing.

As for the end-user experience, the problem does not seem to affect very many "English-US" pages, but I do see it now and then on "Central European" or "Unicode" pages. The manual end-user fix is, as you point out, somewhat inconvenient (View>Character Encoding>Western), but as mentioned by someone else in this thread, if the author of the page did specify a particular encoding, it seems wrong for the browser to pick and choose certain characters to be exceptions.
jqp
Posts: 5070
Joined: November 17th, 2004, 10:56 am
Location: In a box

Post by jqp »

Thanks for pointing out that we're talking about curly apostrophes and quotes. That's obviously what the OP is talking about, though he has failed to mention it so far. I know what you're talking about now... That is something that can be disabled in Word if it's a real problem.

What character set should be used if we're using those curly-quote characters? I, too, have people email me things that they obviously wrote in Word only to discover after posting stuff to the site that the quotes need to be changed. Of course, I don't blame Firefox... I just fix the quotes. Should I be using a different character set or is that not the answer for this kind of problem? Right now, I'm using UTF-8.
jscher2000
Posts: 11742
Joined: December 19th, 2004, 12:26 am
Location: Silicon Valley, CA USA

Post by jscher2000 »

jonnyq wrote:I just fix the quotes. Should I be using a different character set or is that not the answer for this kind of problem?

I usually use the Western European encoding, but I'm sure there are sites and languages that demand other encodings... Anyway, for special characters, I think the best bet is to use character entities such as:
    curly apostrophe (right single quotation mark) ’ => &rsquo ; or &#8217 ;
    curly left "opening" quotation mark “ => &ldquo ; or &#8220 ;
    curly right "closing" quotation mark ” => &rdquo ; or &#8221 ;
    "em" dash — => &mdash ; or &#8212 ;
Must be a handy list of these somewhere on the web. ;-)

[Note: had to add spaces before the semicolons to prevent the board from "converting" the character]
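
For anyone who would rather script the conversion than do it by hand, here is a minimal Python sketch using the numeric references from the list above. The names `ENTITY_MAP` and `to_entities` are made up for this example, and the map covers only the four characters mentioned in this post.

```python
import html

# Numeric character references from the list above. They are plain ASCII,
# so they survive whatever encoding the page is eventually served in.
ENTITY_MAP = {
    "\u2019": "&#8217;",  # right single quotation mark (curly apostrophe)
    "\u201c": "&#8220;",  # left "opening" double quotation mark
    "\u201d": "&#8221;",  # right "closing" double quotation mark
    "\u2014": "&#8212;",  # em dash
}

def to_entities(text):
    """Replace each mapped punctuation character with its numeric reference."""
    return text.translate(str.maketrans(ENTITY_MAP))

sample = "\u201cIt\u2019s done\u201d"
converted = to_entities(sample)
print(converted)

# Round trip: decoding the references gives back the original characters.
restored = html.unescape(converted)
```

Text that has already been marked up should be handled with more care than this, since a blanket replace knows nothing about tags or attributes.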
jqp
Posts: 5070
Joined: November 17th, 2004, 10:56 am
Location: In a box

Post by jqp »

Looks like this string displays fine in UTF-8 but not in ISO-8859-1: “This is a test”
So it looks like UTF-8 is what I want in the first place. (I believe ISO-8859-1 is a subset of UTF-8.)

No idea why it looks right on this page using ISO-8859-1 and no character entity.
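
One plausible explanation for both observations, sketched in Python: strict ISO-8859-1 has no curly quotes at all, whereas Windows-1252 does, and browsers commonly treat a page labelled ISO-8859-1 as if it were Windows-1252. Whether that is what happens on this particular board is an assumption, not something confirmed in the thread.

```python
quote = "\u201c"  # left curly double quotation mark

# Strict ISO-8859-1 cannot represent the character at all.
try:
    quote.encode("iso-8859-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False

# Windows-1252 stores it as one byte in the 0x80-0x9F range.
cp1252_bytes = quote.encode("cp1252")

# UTF-8 stores it as a three-byte sequence.
utf8_bytes = quote.encode("utf-8")

print(in_latin1, cp1252_bytes, utf8_bytes)
```

So a page can carry the single byte 0x93, declare ISO-8859-1, and still show a curly quote if the browser decodes it as Windows-1252.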
jhaygood86
Posts: 372
Joined: July 18th, 2003, 11:58 am
Location: Marietta, GA

Post by jhaygood86 »

jonnyq wrote:I believe ISO-8859-1 is a subset of UTF-8


Close :-D. ISO-8859-1 is roughly (or is? don't know the details offhand) equivalent to ANSI / ASCII 8-bit text, and UTF-8 is a superset of said encoding, at least according to what I know, and my Comp Sci professors.
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9
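
A quick Python check of the subset question, offered as a sketch rather than the last word. At the byte level only 7-bit ASCII carries over unchanged into UTF-8; ISO-8859-1 characters above 0x7F keep their code point numbers in Unicode but are encoded as two bytes in UTF-8. The sample strings are arbitrary.

```python
ascii_text = "plain"
accented = "caf\u00e9"  # ends in e-acute, U+00E9

# ASCII text produces identical bytes in ASCII and UTF-8.
same_for_ascii = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# The accented letter is one byte in ISO-8859-1 but two bytes in UTF-8.
latin1_bytes = accented.encode("iso-8859-1")
utf8_bytes = accented.encode("utf-8")

# Every ISO-8859-1 byte value maps to the Unicode code point of the same number.
code_points_match = all(
    bytes([i]).decode("iso-8859-1") == chr(i) for i in range(256)
)

print(same_for_ascii, latin1_bytes, utf8_bytes, code_points_match)
```

In other words, the ISO-8859-1 repertoire is a subset of Unicode, but an ISO-8859-1 file containing accented letters is not a valid UTF-8 file.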