MozillaZine

Getting the right Headers for receiving and sending e-mails

User Help for Mozilla Thunderbird
ramasaig
 
Posts: 478
Joined: September 14th, 2004, 3:04 pm
Location: Isle of Mull, Scotland

Posted September 5th, 2013, 2:41 pm

I have a web site which allows guests to make online bookings for our B&B. The booking system sends us (and the guest) a 'plain text' e-mail. This has worked well for several years. I did not specify a charset in the e-mail headers.
The UK pound sign '£' seemed to display without difficulty (at least in the copies I received).

Recently I've updated the site. It's coded in PHP. I have read that one should specify a charset, to ensure characters are correctly displayed (particularly on machines using non-ASCII charsets). So for the headers for 'mail' I have:
$headers = "From: $sender" . "\r\n" . "Reply-to: $sender" . "\r\n" . "Content-Type: text/plain; charset='UTF-8'";

...and further down...

mail ($email, $subject, $message, $headers);

That works OK for the copies I receive from the web site when someone books. The '£' sign is correctly displayed.
At first I didn't have the single quotes around the charset itself, but I found that the '£' sign was either replaced by a black diamond with question mark inside (UTF-8) or preceded by an upper case 'A' with a small circle above it (ISO-8859-1). The single quotes took care of that and the e-mails rendered correctly when they arrived on my computer (using Thunderbird as my e-mail client).

Sorted, I thought, until I responded to a new booking (using 'Reply' in Thunderbird) and discovered that the e-mail that I was sending was blank. The culprit seems to be the single quotes. Thunderbird will send a message with "Content-Type: text/plain; charset=UTF-8" (no single quotes) as normal, but can't cope with "Content-Type: text/plain; charset='UTF-8'", and sends a blank message (although the subject line appears). Thunderbird seems to be using the charset of the incoming e-mail for the outgoing, rather than using a default. I don't know if it's possible to change that; I haven't found a way yet.

I've read a lot of stuff on the Internet, where I've found a fair bit of conflicting advice, but everyone seems to recommend specifying a charset.

Can anyone suggest a solution, please ?
Ramasaig
Maolbhuidhe, Isle of Mull

tanstaafl
Moderator

Posts: 39094
Joined: July 30th, 2003, 5:06 pm
Location: Massachusetts

Posted September 5th, 2013, 5:02 pm

New messages that contain Content-Type: text/plain; charset=UTF-8 or Content-Type: text/plain; charset="UTF-8" seem to work well on my copy of Thunderbird. I've never seen one with single quotes. I don't know if single quotes are legal or not in that header, though I have seen them in X- headers such as X-Spam-charsets: to='UTF-8', subject='UTF-8', plain='UTF-8'. I suggest you drop the single quotes in your PHP code.
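A minimal sketch of what the header might look like with the quotes dropped. The addresses and message text are placeholders, not the real booking script, and the MIME-Version and Content-Transfer-Encoding lines are additions commonly sent with 8-bit text rather than something this thread confirms is required. One detail that may explain the behaviour: in a MIME parameter a single quote is an ordinary token character, so charset='UTF-8' names a charset whose name literally includes the quotes, which no client will recognise.

```php
<?php
// Placeholder values; the real script would fill these in from the booking form.
$sender  = 'bookings@example.com';
$email   = 'guest@example.com';
$subject = 'Booking confirmation';
// \xC2\xA3 is the UTF-8 byte pair for the pound sign, written out explicitly
// so the result does not depend on how this file was saved.
$message = "Deposit: \xC2\xA325.00 due on confirmation";

// The charset value is a bare token (or a double-quoted string), never single-quoted.
$headers = implode("\r\n", [
    "From: $sender",
    "Reply-To: $sender",
    'MIME-Version: 1.0',
    'Content-Type: text/plain; charset=UTF-8',
    'Content-Transfer-Encoding: 8bit',
]);

// mail($email, $subject, $message, $headers); // uncomment on a host with mail configured
echo $headers, "\n";
```

The declared charset only helps if the bytes in $message really are in that encoding, which is why the sketch spells the pound sign out as bytes.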

Tools -> Options -> Display -> Formatting -> Advanced lets you specify the character encoding for both incoming and outgoing mail. Mine are both set to Western ISO-8859-15 (not ISO-8859-1 like yours). There is an adjacent checkbox for "when possible use the default character encoding in replies". I leave that unchecked. You might experiment with checking it.

ISO 8859-15 adds support for a few European characters such as the € and drops support for some characters such as ¼ that are rarely typed at the keyboard. Both -1 and -15 support the £. http://en.wikipedia.org/wiki/ISO/IEC_8859-15

ramasaig
 

Posted September 6th, 2013, 12:19 am

Thank you, Tanstaafl.

I've now found the default character settings in TB. They were the same as yours (Western ISO-8859-1).

I didn't know whether single quotes are legal either, but they appeared to work when tried. After more tests and reading your message, I now think this was only because they were invalid and forced TB to use the default for reading. However, when it came to sending, the invalid setting was used, resulting in a blank e-mail.

When I set the TB default charset to UTF-8 and omitted the settings in PHP, I got the UTF-8 'unknown' (black diamond with white '?') on my message from the web site. But a direct message (TB to TB) to myself showed the '£' correctly.

Setting the TB defaults back to ISO-8859-1 rendered the £ correctly on the web site message (including ones received earlier, from which I deduce that the settings are applied at the time of reading, not arrival).

All very confusing for me, let alone someone else reading this.

I think my provisional conclusion at this point has to be that the problem lies in the PHP (or the mail server), not with TB. The best solution FOR ME appears to be NOT to specify the charset in the PHP, but to rely on TB's default ISO-8859-1 setting. However, that means I've no idea what is being seen by others, and is contrary to recommended practice.

Fortunately perhaps (in this context) most of our guests are from English speaking countries or Europe.
No wonder people fall back on using GBP.
Ramasaig
Maolbhuidhe, Isle of Mull

ramasaig
 

Posted September 6th, 2013, 2:29 am

Another possibility occurs to me: Is it possible Thunderbird doesn't display UTF-8 correctly ? (referring specifically to the '£' sign, which shows as 'unknown')

The e-mail sent from our web site (localhost, Apache on Win7) has the following in the message source :
QUOTE
From: trhdawson@gmail.com
Reply-to: linda@maolbhuidhe.co.uk
Content-Type: text/plain; charset="UTF-8"
UNQUOTE

I've not added any quotation marks of my own in the above, to avoid possible confusion. The charset coding looks correct to me (albeit with double quotes, as inserted by me in the PHP).
'From' seems to have been changed by gmail, but that shouldn't affect anything else.

TB default is Western ISO-8859-1, but 'View > Character Encoding' shows UTF-8, so it has been detected correctly.

We know TB displays the '£' correctly in ISO-8859-1, so I have now tried specifically setting that in the PHP.
First tests (localhost) look OK, so I'll now try it on the remote (public-facing) web site, and report back.
TB 'View > Character Encoding' shows Western ISO-8859-1, and the message source contains:
QUOTE
From: trhdawson@gmail.com
Reply-to: linda@maolbhuidhe.co.uk
Content-Type: text/plain; charset="ISO-8859-1"
UNQUOTE

I HAD tried it before, but probably with the single quotes, because it didn't seem to work then.

I'm hopeful my problem is resolved, but if so there's still the suggestion that TB isn't displaying the '£' sign in UTF-8 correctly.
Ramasaig
Maolbhuidhe, Isle of Mull

ramasaig
 

Posted September 6th, 2013, 3:34 am

No, it didn't work from the remote site. In the TB message source I get:
QUOTE
From: tim@ramasaig.com
Reply-to: linda@maolbhuidhe.co.uk
Content-Type: text/plain; charset="ISO-8859-1"
UNQUOTE

Which is the same as before. And TB 'View > Character Encoding' shows Western ISO-8859-1.

But the '£' displays as 'Â£' (that's A circumflex). (I'm sure it came as A with circle above in an earlier test, but not now).

It's really puzzling that the routing of the messages seems to make this difference. The localhost ones come off my computer, and I'm using gmail for my SMTP. The remote ones come off my remote host's server, and get forwarded via another ISP. On the face of it, that makes a complete nonsense of trying to set the Character Encoding at all.

I shall continue testing, but it more and more looks as if I'll be forced back on 'GBP', which is reliable.
Ramasaig
Maolbhuidhe, Isle of Mull

tanstaafl
Moderator


Posted September 6th, 2013, 5:50 am

Try copying the entire codepage layout section from http://en.wikipedia.org/wiki/ISO/IEC_8859-1 into an HTML message and see if that is displayed correctly in a new message. I have a message with that and the headers Content-Type: text/html; charset=ISO-8859-1 and Content-Transfer-Encoding: 8bit, and it looks fine when I do a side by side comparison with the original web page.

http://stackoverflow.com/questions/2477 ... ce-of-this talks about one scenario where â€™ is displayed instead of ' due to the data actually being encoded using CP-1252 instead of UTF-8. Telling the client to display it using charset=UTF-8 won't fix that.
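To make the mismatch concrete, here is a rough sketch of both failure modes described earlier in the thread, using the pound sign written as raw bytes. It assumes PHP's mbstring extension is available; the variable names are made up for illustration.

```php
<?php
// The pound sign as raw bytes in each encoding.
$poundUtf8   = "\xC2\xA3"; // UTF-8: two bytes
$poundLatin1 = "\xA3";     // ISO-8859-1: one byte

// Case 1: UTF-8 bytes read as ISO-8859-1. Each byte is treated as its own
// character, so C2 becomes "Â" and A3 becomes "£".
$misreadAsLatin1 = mb_convert_encoding($poundUtf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($misreadAsLatin1), "\n"; // c382c2a3, i.e. "Â" followed by "£"

// Case 2: a lone ISO-8859-1 byte declared as UTF-8 is not a valid sequence,
// which is when a client falls back to the replacement character.
var_dump(mb_check_encoding($poundLatin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($poundUtf8, 'UTF-8'));   // bool(true)
```

In both cases the header is only a label; changing it cannot repair bytes that were written in a different encoding.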

I suggest you save one of the messages that is displayed incorrectly to a file and then use a hex editor to look at the actual characters that were in the file.

ramasaig
 

Posted September 6th, 2013, 7:07 am

Hello tanstaafl, Thank you for your reply and suggestions.

tanstaafl wrote:Try copying the entire codepage layout section from http://en.wikipedia.org/wiki/ISO/IEC_8859-1 into a HTML message and see if that is displayed correctly in a new message.

I'm only dealing with a plain text message, but I'll try to do as you suggest (maybe if I drop the table the test will be valid in plain text).
I tried the above (pasting in the data copied from the page, without the HTML table), but NONE of it came through. This may have something to do with the security measures on the textarea, which is designed to prevent code injection. I should explain that the on-line booking form has a textarea for guest comments, and that was where I tried pasting the code. I'll have to look for a cleaner solution.

tanstaafl wrote:http://stackoverflow.com/questions/2477 ... ce-of-this talks about one scenario where â€™ is displayed instead of ' due to the data actually being encoded using CP-1252 instead of UTF-8. Telling the client to display it using charset=UTF-8 won't fix that.

The '£' signs aren't encoded (e.g. in a database), they are only in the PHP code constructing the e-mails. The PHP files are all UTF-8, as are the web pages. I'll re-read this article, but it's really about display on web pages.

tanstaafl wrote:I suggest you save one of the messages that is displayed incorrectly to a file and then use a hex editor to look at the actual characters that were in the file.

The line (from the received e-mail) reads: "Deposit: £25.00 due on confirmation"
The relevant Hex chars are 00 C2 (for the A) and 00 A3 (for the £), which are (of course) correct for those symbols. In the text to the right of the hex block the 00 is rendered as '.' (full stop or period). I did try copying and pasting the entire hex code, but then I only got the text part when I pasted.

The PHP (curiously ?) shows '£' in text mode but in hex mode shows C2 A3 (and A£ in the text block (that's A circumflex)), but there are no '00' symbols. I wonder if it could be the text editor that's causing the problem ? I've never considered that before. I'm using UltraEdit (IDM Computer Solutions, Hamilton, OH). I'll have to ask them, but they don't work weekends, so it'll be Monday now before I get a response.

I have reverted to 'GBP' for the time being.

I'll have to build some on-line testing files, because I can't keep fiddling with my live code and it seems to work OK in the localhost version, as commented previously.

I'm going to work on the text editor possibility next, and will come back to the other stuff if that doesn't produce the answer. Who'd have thought that such an apparently trivial thing could produce so many words.
Ramasaig
Maolbhuidhe, Isle of Mull

ramasaig
 

Posted September 19th, 2013, 11:49 am

Thank you for your help with this problem.

I eventually traced it to an encoding problem with the uploaded PHP files. Although they appeared to be UTF-8 when viewed locally in my text editor, once uploaded they were either ASCII or ANSI Latin1. Hence the inconsistencies.
The clue was the difference in results between messages sent from this computer and those sent from the live web site.
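For anyone wanting to check the same thing on their own host, a rough sketch of how the stored bytes could be inspected. describePoundEncoding is a hypothetical helper, not part of the booking code, and it is only a heuristic: it looks for the pound sign's byte patterns and nothing else.

```php
<?php
// Hypothetical helper: report how the pound sign is stored in some file contents.
function describePoundEncoding(string $bytes): string
{
    // Check the two-byte UTF-8 form first, because it also contains A3.
    if (strpos($bytes, "\xC2\xA3") !== false) {
        return 'UTF-8 (C2 A3)';
    }
    if (strpos($bytes, "\xA3") !== false) {
        return 'single byte A3 (ISO-8859-1 / Windows-1252)';
    }
    return 'no pound sign found';
}

// On the server one would pass in the uploaded script itself, for example
// describePoundEncoding(file_get_contents(__FILE__)).
echo describePoundEncoding("Deposit: \xC2\xA325.00"), "\n"; // UTF-8 (C2 A3)
echo describePoundEncoding("Deposit: \xA325.00"), "\n";     // single byte A3 (ISO-8859-1 / Windows-1252)
```

Running the check against the copy on the live server, rather than the local one, is what would show a difference like the one I found.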
Ramasaig
Maolbhuidhe, Isle of Mull
