MozillaZine

2004-03-12 Weekly Build Now Available (Win32)

Discussion about official Mozilla Thunderbird builds
Michael Buschbeck
 
Posts: 66
Joined: January 19th, 2004, 6:06 am
Location: Frankfurt/Main, Germany

Post Posted March 13th, 2004, 2:53 pm

Michael Buschbeck wrote:Perhaps the "original" non-IDLE auto-checking code is conflicting with the new IDLE code...?

I'll try disabling "Check for new messages every ... minutes" (I just didn't think of that before) and I'll see whether it helps.

No, that's not it... :(

Two minutes ago a new message was delivered to my mail server, causing Thunderbird to (duly) show the mail notification and (unduly) highlight a folder that hasn't received new mail for weeks while leaving the folder which actually received the new mail un-highlighted.

Something still doesn't really work there...

djbrock
 
Posts: 79
Joined: May 2nd, 2003, 8:21 pm

Post Posted March 13th, 2004, 3:03 pm

MScott,
I took your advice and reread the post.
(1) I sorted my trash into SPAM and HAM folders.
(2) SPAM folder had 725 messages.
(3) Reset my training.dat
(4) Deleted my MSF files
(5) Ran Junk Mail filter on SPAM and it caught 720 leaving only 5.
(6) Marked those 5 by hand, then relocated all the Junk to the SPAM box.
(7) HAM folder had 1808 in it (plenty not on my White List, e.g. forum alert mails, newsletters and such). Ran the filter. It caught 967. I unmarked them.
(8) Ran the filter on my Inbox with 25 messages (all HAM or White list stuff, including Postmaster alerts and forum alerts) the filter did not mark any of them, so I consider that a whopping success.
My Training.dat file is 904K now.
(9) I decided to "check" my work, so I deleted the MSF files for my HAM and SPAM folders and I reran the filter on the SPAM folder of 725 messages. It caught only 2.
(10) I then reran the filter on the HAM folder, and it did not misidentify any of those 1808 messages.
(11) I then deleted the MSF file for the Inbox and reran the filter, it worked fine and did not misidentify any of the 25 messages.
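
The catch counts in steps 5, 7 and 9 are easier to compare as percentages. A minimal Python sketch, using only the counts given in the list above (the helper name `rate` is made up for illustration):

```python
def rate(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Step 5: spam caught on the first run after resetting training.dat.
first_run_spam = rate(720, 725)
# Step 7: ham wrongly marked as junk (false positives).
ham_false_positives = rate(967, 1808)
# Step 9: spam caught when the filter was re-run on the same SPAM folder.
second_run_spam = rate(2, 725)

print(first_run_spam, ham_false_positives, second_run_spam)
```

Seen this way, the drop between step 5 and step 9 is from nearly all of the spam to well under one percent of it.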

The only thing that surprised me is that it didn't identify the SPAM on the second try. I went back and reviewed your other posts and then selected the whole batch and marked them as Junk. That boosted the size of my Training Dat to 1,562K.

I have since received a couple of new Spams in my Inbox, and the filter has not caught them. Unless I just screwed up by attempting step 9, it looks like I still have the problem of the filter not working well after manual correction of misidentified mail.

I appreciate all your work, believe me.

Shaddow
 
Posts: 25
Joined: January 20th, 2003, 10:42 am

Post Posted March 13th, 2004, 3:18 pm

djbrock wrote:MScott,
I took your advice and reread the post.
(1) I sorted my trash into SPAM and HAM folders.
(2) SPAM folder had 725 messages.
(3) Reset my training.dat
(4) Deleted my MSF files
(5) Ran Junk Mail filter on SPAM and it caught 720 leaving only 5.
(6) Marked those 5 by hand, then relocated all the Junk to the SPAM box.
(7) HAM folder had 1808 in it (plenty not on my White List, e.g. forum alert mails, newsletters and such). Ran the filter. It caught 967. I unmarked them.
(8) Ran the filter on my Inbox with 25 messages (all HAM or White list stuff, including Postmaster alerts and forum alerts) the filter did not mark any of them, so I consider that a whopping success.
My Training.dat file is 904K now.
(9) I decided to "check" my work, so I deleted the MSF files for my HAM and SPAM folders and I reran the filter on the SPAM folder of 725 messages. It caught only 2.
(10) I then reran the filter on the HAM folder, and it did not misidentify any of those 1808 messages.
(11) I then deleted the MSF file for the Inbox and reran the filter, it worked fine and did not misidentify any of the 25 messages.

The only thing that surprised me is that it didn't identify the SPAM on the second try. I went back and reviewed your other posts and then selected the whole batch and marked them as Junk. That boosted the size of my Training Dat to 1,562K.

I have since received a couple of new Spams in my Inbox, and the filter has not caught them. Unless I just screwed up by attempting step 9, it looks like I still have the problem of the filter not working well after manual correction of misidentified mail.

As for the NSPR logging, I followed the link you gave and I assumed that was what you wanted me to do. I have searched this thread and the site and have not found a reference to the NSPR from you. Could you direct me to it?

I appreciate all your work, believe me.

I get similar results to yours. I mark about 350 messages as junk, and about 350 as not junk, and everything seems to work except for a few false positives. As soon as I correct the false positives and then go back and run the controls against a known set of junk, it hardly marks any of them as junk. Doing this with the spambayes imap filter has always resulted in *very* accurate results for me in the past. I haven't yet tried this with the logging turned on to see if I can tell anything from the new probabilities.

The settings for the NSPR logging are as follows.
Create a batch file in your TB directory and put this in it:
set NSPR_LOG_MODULES=BayesianFilter:5
set NSPR_LOG_FILE=c:\imap\bayes.log
start thunderbird.exe

It will create a log file with a lot to read through :)
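
For anyone who would rather not keep a batch file around, the same setup can be sketched in Python. This is only an illustration: the helper names `build_logging_env` and `launch_thunderbird` are made up here, and it assumes thunderbird.exe is in the current directory, as with the batch file.

```python
import os
import subprocess


def build_logging_env(log_file, level=5, base_env=None):
    """Return a copy of the environment with the two NSPR logging variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # Same values as the "set" lines in the batch file.
    env["NSPR_LOG_MODULES"] = f"BayesianFilter:{level}"
    env["NSPR_LOG_FILE"] = log_file
    return env


def launch_thunderbird(exe="thunderbird.exe", log_file=r"c:\imap\bayes.log"):
    """Start Thunderbird with the logging variables set (exe location is an assumption)."""
    return subprocess.Popen([exe], env=build_logging_env(log_file))
```

Calling `launch_thunderbird()` from the TB directory should then behave like the batch file.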

djbrock
 
Posts: 79
Joined: May 2nd, 2003, 8:21 pm

Post Posted March 13th, 2004, 3:21 pm

Thanks. Glad I'm not the only odd duck out here. I did figure out the NSPR setting and I do have a log of it now. I can't say it changed anything in the way the program functions. I am going to delete the MSF on the SPAM folder and re-mark those messages to see if that helps.

I did that and ran the Junk filter again. It caught 484 and left 228 of my Spam. Previously, on the very first run, it caught 720 out of 725. I think I will try this a couple more times and see if it improves the accuracy.

mscott

 
Posts: 2516
Joined: April 2nd, 2003, 4:10 pm
Location: Thunderbird Research Center, CA

Post Posted March 13th, 2004, 3:33 pm

No, the log won't change the results. It will show you the scores of messages. You can see the numbers it is calculating.

You've got a nice, good training.dat file now. Great. Now delete the log (otherwise it gets too big). Restart. And run the junk controls on a couple of messages it failed to classify that you think it should have. Now open the log, go to the bottom and look at the scores being generated for these messages. Are they close to .90? Lower, higher?
Thunderbirds are Go!

Shaddow
 
Posts: 25
Joined: January 20th, 2003, 10:42 am

Post Posted March 13th, 2004, 3:35 pm

There are people in the original junk mail version test thread who have reported similar things about the filter working until the false positives come into play. I am trying just classifying mail as it comes in, without pretraining, and seeing how that goes for me. I got the original Mozilla bayes stuff working very well that way.

The NSPR thing shouldn't change how anything is behaving. It only logs what it is doing to the file so you can go back and see some of the numbers that result from the logic it is using, rather than just a junk icon or not.

Mscott, any chance of getting some sort of "Spam Suspects" behavior with TB like with spambayes? I always liked that. It seemed to help train the filters better, since you got an idea of what was creeping closer to the spam threshold and could classify accordingly.

djbrock
 
Posts: 79
Joined: May 2nd, 2003, 8:21 pm

Post Posted March 13th, 2004, 3:52 pm

MScott,

As I told Shaddow, I deleted the SPAM folder MSF file and continued training on those 725 messages. After two more trainings it got 518 and left 214.

As per your request, I deleted a new piece of Spam that the filter had missed concerning university degrees. I think the line you're interested in is
probability = (0.896811) HAM SCORE:0.197794 SPAM SCORE:0.991416

I can see that it just missed the threshold of 90. That is a helpful bit of information.
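
A final-score line like the one above can be pulled apart with a few lines of Python. This is a rough sketch: the regular expression is written against the single line quoted in this post, and treating "at or above the threshold" as junk is an assumption about how the filter compares the numbers.

```python
import re

# Matches the tail of a final-score line as quoted in the post.
SCORE_LINE = re.compile(
    r"probability = \((?P<prob>[0-9.]+)\)\s+"
    r"HAM SCORE:(?P<ham>[0-9.]+)\s+"
    r"SPAM SCORE:(?P<spam>[0-9.]+)"
)


def parse_score(line):
    """Return the three numbers on a final-score line, or None if it does not match."""
    match = SCORE_LINE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


line = "probability = (0.896811) HAM SCORE:0.197794 SPAM SCORE:0.991416"
score = parse_score(line)
threshold = 0.90  # the "90" setting expressed as a probability

# Assumption: a message is junk when its probability reaches the threshold.
is_junk = score["prob"] >= threshold
print(score["prob"], "junk" if is_junk else "not junk")
```

For this line the probability is 0.896811, which is below 0.90, so the message is not marked.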

mscott

 
Posts: 2516
Joined: April 2nd, 2003, 4:10 pm
Location: Thunderbird Research Center, CA

Post Posted March 13th, 2004, 3:54 pm

djbrock wrote:MScott,

As I told Shaddow, I deleted the SPAM folder MSF file and continued training on those 725 messages. After two more trainings it got 518 and left 214.

As per your request, I deleted a new piece of Spam that the filter had missed concerning university degrees. I think the line you're interested in is
probability = (0.896811) HAM SCORE:0.197794 SPAM SCORE:0.991416

I can see that it just missed the threshold of 90. That is a helpful bit of information.

Yeah, so if a lot of the messages it doesn't mark as spam are just below the threshold of 90 like this one, you could tweak your settings and lower it to 88 or maybe 85.
Thunderbirds are Go!
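
One way to judge whether lowering the threshold to 88 or 85 is worth it is to count how many of the missed messages each setting would have caught. A small Python sketch follows. Only the first probability comes from this thread; the other values are made up for illustration, as is the helper name `caught_at`.

```python
def caught_at(threshold, probabilities):
    """Count the messages whose probability reaches the given threshold."""
    return sum(1 for p in probabilities if p >= threshold)


# Probabilities of spam the filter missed at the default setting.
# 0.896811 is from the log line in the thread; the rest are hypothetical.
missed = [0.896811, 0.884, 0.87, 0.62]

for percent in (90, 88, 85):
    print(percent, caught_at(percent / 100, missed))
```

The same count run over the scores of known good mail would show how many false positives a lower setting costs.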

Shaddow
 
Posts: 25
Joined: January 20th, 2003, 10:42 am

Post Posted March 13th, 2004, 6:19 pm

mscott wrote:
djbrock wrote:MScott,

As I told Shaddow, I deleted the SPAM folder MSF file and continued training on those 725 messages. After two more trainings it got 518 and left 214.

As per your request, I deleted a new piece of Spam that the filter had missed concerning university degrees. I think the line you're interested in is
probability = (0.896811) HAM SCORE:0.197794 SPAM SCORE:0.991416

I can see that it just missed the threshold of 90. That is a helpful bit of information.
Yeah, so if a lot of the messages it doesn't mark as spam are just below the threshold of 90 like this one, you could tweak your settings and lower it to 88 or maybe 85.

mscott,
I'm having good luck with my filters now, but I'm running into one slight problem. Is it possible that the filter can get overloaded in some cases? I have about a dozen emails for which I get results like this, and I wonder if maybe there are too many tokens or something.

imap-message://myemailaddress%40mydomain.com@mymailhost.mydomain.com/testing#1?fetchCompleteMessage=true is junk probability = (0.500000) HAM SCORE:0.000000 SPAM SCORE:0.000000

I can email you the entire adding tokens and probabilities parts but they are far too long to post here :)

gerbig
 
Posts: 54
Joined: January 23rd, 2003, 4:37 pm

Post Posted March 14th, 2004, 2:18 am

djbrock wrote:MScott,
The only thing that surprised me is that it didn't identify the SPAM on the second try. I went back and reviewed your other posts and then selected the whole batch and marked them as Junk. That boosted the size of my Training Dat to 1,562K.


Did you also manually mark all your ham as Not Junk? If not, that might help the accuracy of the filter.

mjc

 
Posts: 95
Joined: December 2nd, 2002, 8:55 am

Post Posted March 14th, 2004, 6:54 am

Michael Buschbeck wrote:Two minutes ago a new message was delivered to my mail server, causing Thunderbird to (duly) show the mail notification and (unduly) highlight a folder that hasn't received new mail for weeks while leaving the folder which actually received the new mail un-highlighted.

Something still doesn't really work there...


I've been seeing the same thing in recent IDLE-enabled builds... that is, rarely-used folders are occasionally highlighted even though no new mail is in them.

- Mike

mscott

 
Posts: 2516
Joined: April 2nd, 2003, 4:10 pm
Location: Thunderbird Research Center, CA

Post Posted March 14th, 2004, 11:28 am

Shaddow wrote:
imap-message://myemailaddress%40mydomain.com@mymailhost.mydomain.com/testing#1?fetchCompleteMessage=true is junk probability = (0.500000) HAM SCORE:0.000000 SPAM SCORE:0.000000

I can email you the entire adding tokens and probabilities parts but they are far too long to post here :)


That means that none of the words in the message occurred in the training set. Seems like that case would be pretty rare. Right above the line that reports the final score, you should see some lines that show the tokens that were found for this message. An easy way to do this is to remove your log file. Re-run. Run the junk mail controls on JUST one of these messages (otherwise the log file just gets too big). Quit and then go to the bottom of the log file. You should see this score there. Look up above the score at some of the tokens and the scores those tokens generated.
Thunderbirds are Go!
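
The "go to the bottom and look up above the score" step can be sketched in Python. This assumes the log looks like the lines quoted in this thread, with the token lines sitting directly above the final-score line. The function name `last_message_tokens` is made up for illustration.

```python
import re

TOKEN_LINE = re.compile(r"token\.mProbability \((?P<token>.*)\) is (?P<prob>[0-9.]+)")
SCORE_LINE = re.compile(r"probability = \((?P<prob>[0-9.]+)\)")


def last_message_tokens(log_lines):
    """Return (final probability, [(token, probability), ...]) for the last scored message."""
    lines = list(log_lines)
    # Walk backwards to the last line that reports a final score.
    for end in range(len(lines) - 1, -1, -1):
        score = SCORE_LINE.search(lines[end])
        if score:
            break
    else:
        return None, []
    tokens = []
    # Collect the token lines directly above it; stop at the first other line.
    for line in reversed(lines[:end]):
        match = TOKEN_LINE.search(line)
        if not match:
            break
        tokens.append((match.group("token"), float(match.group("prob"))))
    tokens.reverse()
    return float(score.group("prob")), tokens
```

With the log written by the batch file, this could be called as `last_message_tokens(open(r"c:\imap\bayes.log"))` after running the junk controls on a single message.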

Shaddow
 
Posts: 25
Joined: January 20th, 2003, 10:42 am

Post Posted March 14th, 2004, 11:35 am

mscott wrote:
Shaddow wrote:
imap-message://myemailaddress%40mydomain.com@mymailhost.mydomain.com/testing#1?fetchCompleteMessage=true is junk probability = (0.500000) HAM SCORE:0.000000 SPAM SCORE:0.000000

I can email you the entire adding tokens and probabilities parts but they are far too long to post here :)


That means that none of the words in the message occurred in the training set. Seems like that case would be pretty rare. Right above the line that reports the final score, you should see some lines that show the tokens that were found for this message. An easy way to do this is to remove your log file. Re-run. Run the junk mail controls on JUST one of these messages (otherwise the log file just gets too big). Quit and then go to the bottom of the log file. You should see this score there. Look up above the score at some of the tokens and the scores those tokens generated.


That is basically what I've done. There are 488 tokens it is running probabilities on, and all of them have a score.

Such as:
0[293ef0]: token.mProbability (chung) is 0.969799
0[293ef0]: token.mProbability (copied,) is 0.934783
0[293ef0]: token.mProbability (conjunction) is 0.822240
0[293ef0]: token.mProbability (forms) is 0.881183
0[293ef0]: token.mProbability (represent) is 0.958716
0[293ef0]: token.mProbability (seeking) is 0.922578
0[293ef0]: token.mProbability (north) is 0.724323
0[293ef0]: token.mProbability (killing) is 0.969799
0[293ef0]: token.mProbability (patient) is 0.902119
0[293ef0]: token.mProbability (htds) is 0.969799

And then it ends with the HAM SCORE:0.000000 SPAM SCORE:0.000000 line.

Two interesting notes: I have marked this stuff as spam, all 12-15 of them, and, more interesting, they all have something to do with stocks or the stock market.
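
Both final-score lines quoted in this thread happen to fit probability = (SPAM SCORE - HAM SCORE + 1) / 2. Whether the filter really computes it that way is an assumption made here, not something confirmed in the thread. If it does, a result of exactly 0.5 simply reflects both scores coming out as 0, even though every sample token above leans towards spam. A small Python check:

```python
# The ten sample token probabilities from the post above.
token_probs = {
    "chung": 0.969799, "copied,": 0.934783, "conjunction": 0.822240,
    "forms": 0.881183, "represent": 0.958716, "seeking": 0.922578,
    "north": 0.724323, "killing": 0.969799, "patient": 0.902119,
    "htds": 0.969799,
}


def combined(ham_score, spam_score):
    """Final probability as it appears to relate to the two logged scores (assumption)."""
    return (spam_score - ham_score + 1.0) / 2.0


# Tokens whose individual probability is above the neutral 0.5 mark.
spammy = [token for token, p in token_probs.items() if p > 0.5]
print(len(spammy), "of", len(token_probs), "sample tokens lean towards spam")

# The line from djbrock's log and the line from this log.
print(round(combined(0.197794, 0.991416), 6))
print(combined(0.0, 0.0))
```

So the interesting question for the log is why both scores end up at zero for these messages, not why the final number is 0.5.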

JoeS

 
Posts: 2337
Joined: June 8th, 2003, 9:15 am

Post Posted March 14th, 2004, 12:17 pm

I think this is a problem in the Moz trunk build
and not specifically Tbird related.
It has to do with dynamically using javascript with
clip: rect
The image is not displayed at all.
http://bugzilla.mozilla.org/show_bug.cgi?id=237447

Below info from post I made in wrong thread
----------------------------------------------------
I am pretty sure this is unrelated to this test build.
So anyhow, I have a javascript that uses the following
mouse event handlers:
href="#" onclick="return true" onmouseover="status=1;show()"
onmouseout="status=0;hide()" which displays an expanding image clip.

Now the expanding clip is not visible

I suspected this had to do with the bugfix for Bug 205893, which made a major
change in image optimization on Windows systems,
so I attempted to install an earlier version of Thunderbird from the trunk.

Question #1: None of the following versions will unzip properly.

2004-03-09-00-trunk/ 09-Mar-2004 01:49 -
2004-03-10-00-trunk/ 10-Mar-2004 01:49 -
2004-03-11-00-trunk/ 11-Mar-2004 01:51 -
2004-03-11-08-trunk/ 11-Mar-2004 09:39 -
2004-03-11-09-trunk/ 11-Mar-2004 11:20 -

Are these valid Thunderbird builds or not?
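
A quick way to tell a broken download from a broken unzip tool is to test the archive itself. A minimal Python sketch using the standard zipfile module follows. The function name `archive_is_intact` is made up, and the path passed in would be wherever the build was saved.

```python
import zipfile


def archive_is_intact(path):
    """True if path is a readable zip whose members all pass their CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first bad member, or None if all are fine.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        # Not a zip at all, truncated, or unreadable.
        return False
```

If this returns False for a freshly downloaded build, the file on the server or the download is the likely culprit.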

I had to go all the way back to Scott's last weekly 03/05 to get a good d/l
BTW the script works as expected with 0.5+ (20040305)
Last edited by JoeS on March 15th, 2004, 4:13 pm, edited 1 time in total.
JoeS Testing current Thunderbird trunk builds WinXP SP2+
news://news.mozilla.org.mozilla.test.multimedia How to Post

avbohemen
 
Posts: 292
Joined: April 25th, 2003, 6:50 am
Location: The Netherlands

Post Posted March 14th, 2004, 12:55 pm

Thunderbird still crashes when I close thunderbird after installing an extension. Restarting is no problem, and the extension is installed fine but the crash should not occur.

By the way, when is Thunderbird switching to the new extension installer that Firefox uses? Bug #229600 seems fixed now.
