Test Win32 Build Available with Junk Mail Filter Changes

Discussion about official Mozilla Thunderbird builds
Locked
wgianopoulos
Posts: 1746
Joined: July 23rd, 2003, 8:15 am

Post by wgianopoulos »

On a note unrelated to junk filtering: with this build, whenever I install an extension, Thunderbird always crashes when I close it to restart and activate the extension. Both Thunderbird and the extension appear to work fine afterwards, though, and it does not crash on subsequent closes. Is this a known issue?

EDIT: I could have checked this myself in Bugzilla but that appears to be broken today as well. :-(

EDIT2: Bugzilla appears to be fixed now, but I could not find a bug relating to this issue.
mgl
Posts: 29
Joined: August 22nd, 2003, 9:57 am

I guess I've screwed up, then...

Post by mgl »

Tiger Bob wrote:I have approximately 200 junk mails. I marked them all as not junk, installed the new software and deleted the training file. I then ran the junk mail control on the folder from the Tools menu. 100% of the items were identified as junk. A great start!!


Looks like my elaborate routine involving a 4,000-message junk corpus has failed: TB is catching only a small minority of my incoming junk, about 3 out of 11 this morning. Before this test build, I was getting only 10-15% false negatives; now it has almost reversed. Since I installed this build, I've had about 50 spams, and fewer than 10 have been properly identified.

Maybe I'll wipe training.dat and try again without the 4,000 message corpus, though I can't imagine how that would make the filter more effective.
christos
Posts: 126
Joined: September 5th, 2003, 5:42 pm
Location: Geneva

Post by christos »

Hi,

I decided to give the new build a shot today. I was reluctant to do so, because my TB 0.5 junk filtering has been working exceptionally well, and I recall that I had to learn through trial and error how to properly train the Bayesian filter. (I agree that a short description of how to do the training, e.g. give the filter at least 50 spam emails and at least 200 good emails before you start filtering, would be very useful for all the newbies; an automated process, e.g. through a wizard, would be even better if it's easy to implement.)

Not much to report yet; I have 1/1 junk emails and 0/0 good emails filtered (with about 25-30 good emails surviving w/o problems). Will post another note if I see problems.

On a side note: I noticed that the test new-junk build does not display the date on the message pane any more:
Date missing from message pane

Compare this with the 0.5 build where the date is there:
Date shown in message pane

Is this a problem in recent builds?

Cheers,

--Christos
wgianopoulos
Posts: 1746
Joined: July 23rd, 2003, 8:15 am

Post by wgianopoulos »

christos wrote:On a side note: I noticed that the test new-junk build does not display the date on the message pane any more


Very strange, because it does display for me.
mscott
Posts: 2516
Joined: April 2nd, 2003, 4:10 pm
Location: Thunderbird Research Center, CA

Post by mscott »

wgianopoulos@yahoo.com wrote:
christos wrote:On a side note: I noticed that the test new-junk build does not display the date on the message pane any more


Very strange, because it does display for me.
lol... good find. I see this too; I just never noticed. I'll fix it in my junk mail patch.
Thunderbirds are Go!
Kerrick
Posts: 202
Joined: May 30th, 2003, 7:58 am

Post by Kerrick »

While I neglected to have a large spam and ham corpus on hand to immediately train up the filter, over the past few days I have trained it on a few hundred spam messages and a significantly smaller supply of ham (perhaps a hundred at most).

Thus far I have had no false positives. There is a steady but small trickle of false negatives coming in from one particular mailbox, which in the past was ALWAYS the one that gave me the most trouble, with oddly formatted spam that the filter had issues catching.

So far I am pleased with its performance. I can't say statistically whether it is better or not, but it seems to be doing better, especially in recognizing the mailing list I am on and so forth.
Pussycat
Posts: 182
Joined: June 21st, 2003, 8:34 am
Location: Between The Netherlands and Germany

Post by Pussycat »

Thanks for working on the spam filter, it needed it. I'm now testing it.

Perhaps a good time to fix the bug in the spam preferences settings as well?
http://bugzilla.mozilla.org/show_bug.cgi?id=230110
peaveyman
Posts: 341
Joined: June 1st, 2003, 6:24 pm

Post by peaveyman »

I thought this build was working pretty well, but now the training seems to have no effect on the junk mail filter. Every time I run the junk mail controls on my junk mail folder, it marks the same messages as not junk. I manually mark them as junk, then run the controls again, but it still marks several as not junk.
mgl
Posts: 29
Joined: August 22nd, 2003, 9:57 am

Post by mgl »

Yeah, I'm having rotten luck with this build. I've tried training it on a huge corpus of junk, training it only on false positives and negatives, and several points in between, and I just can't get it to recognize more than a small fraction of incoming spam. It only does an initial spate of false positives each time after I delete training.dat, but always has >70% false negatives.

The junk filter in the 0.5 build was much better performing for me, though I may be doing something wrong. I think I may go back to that one, since I saved the training.dat file from that build.
christos
Posts: 126
Joined: September 5th, 2003, 5:42 pm
Location: Geneva

Post by christos »

mgl wrote:Yeah, I'm having rotten luck with this build. I've tried training it on a huge corpus of junk, training it only on false positives and negatives, and several points in between, and I just can't get it to recognize more than a small fraction of incoming spam.


Did you also give it a large amount of good emails? This is important, and not advertised often enough...

--C
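
To illustrate why the good-mail side of the training matters, here is a minimal sketch of how a generic Bayesian token score depends on both corpora. It is not Thunderbird's actual implementation (this thread does not describe it); the function names, the add-one smoothing, and the sample messages are all assumptions made for the example.

```python
from collections import Counter

def train(messages):
    """Count, for each token, how many messages it appears in."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token); add-one smoothing is an illustrative choice."""
    p_given_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_given_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_given_spam / (p_given_spam + p_given_ham)

# Made-up sample messages, for illustration only.
spam = ["cheap pills online", "cheap loans online now"]
ham = ["meeting notes online", "lunch online order", "project notes attached"]

spam_counts = train(spam)
ham_counts = train(ham)

# Trained on spam only: every token seen in spam gets the same score.
cheap_no_ham = token_spam_probability("cheap", spam_counts, Counter(), len(spam), 0)
online_no_ham = token_spam_probability("online", spam_counts, Counter(), len(spam), 0)

# Trained on ham as well: "cheap" looks more spammy, "online" less so.
cheap_with_ham = token_spam_probability("cheap", spam_counts, ham_counts, len(spam), len(ham))
online_with_ham = token_spam_probability("online", spam_counts, ham_counts, len(spam), len(ham))

print(cheap_no_ham, online_no_ham)
print(cheap_with_ham, online_with_ham)
```

With no good mail, the sketch cannot tell a word that is common everywhere from one that is specific to spam; once good mail is added, the two scores move apart.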
mgl
Posts: 29
Joined: August 22nd, 2003, 9:57 am

Post by mgl »

And here's something else weird ... sometimes TB doesn't create or update training.dat. For instance, I delete training.dat, then open TB and futz around with the Junk Mail settings, marking and unmarking messages etc., but none of this is recorded because TB doesn't create a training.dat. Similarly, if training.dat exists, I can mess around in TB with "Run Junk Mail Controls on Folder" etc, but training.dat isn't updated, so none of it changes anything.

This could help to explain some difficulties. Using win32 build on XP Pro.
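
One way to confirm whether training.dat is really being created or updated is to record its size and modification time before and after a training session. The sketch below does that; the helper names are made up for this example, and the path to training.dat is an assumption you would replace with the file in your own profile folder. Whether Thunderbird writes the file immediately or only at certain moments is not something this thread establishes.

```python
import os

def snapshot(path):
    """Return (size, mtime_ns) for path, or None if the file does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_size, st.st_mtime_ns)

def describe_change(before, after):
    """Summarize what happened to the file between two snapshots."""
    if before is None and after is None:
        return "missing"
    if before is None:
        return "created"
    if after is None:
        return "deleted"
    return "unchanged" if before == after else "updated"

# Example usage (hypothetical path; point it at your own profile folder):
#   path = r"C:\path\to\your\profile\training.dat"
#   before = snapshot(path)
#   ... mark some messages as junk / not junk, then close Thunderbird ...
#   print(describe_change(before, snapshot(path)))
```

If the result is "missing" or "unchanged" after a session of marking messages, that supports the observation that the training was not recorded.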
mgl
Posts: 29
Joined: August 22nd, 2003, 9:57 am

Post by mgl »

christos wrote:
mgl wrote:Yeah, I'm having rotten luck with this build. I've tried training it on a huge corpus of junk, training it only on false positives and negatives, and several points in between, and I just can't get it to recognize more than a small fraction of incoming spam.


Did you also give it a large amount of good emails? This is important, and not advertised often enough...

--C


I tried giving it all my Inbox and sub-folder messages as Not Junk, and it didn't appear to improve things at all. But this could be because on my machine, training.dat is only haphazardly updating with changes to junk mail settings.
christos
Posts: 126
Joined: September 5th, 2003, 5:42 pm
Location: Geneva

Post by christos »

mgl wrote:I tried giving it all my Inbox and sub-folder messages as Not Junk, and it didn't appear to improve things at all. But this could be because on my machine, training.dat is only haphazardly updating with changes to junk mail settings.


What is the size of training.dat? I've noticed that it has to get > 120 kB before it becomes effective (maybe not the same case for everyone though)

Another thing is that if all the good emails you train your filter with are from people on your address book, the filter *may* not consider them for spam (I'm not really sure about the implementation, just speculating here), so you are not really training it! Just another thing to consider...

--C
mgl
Posts: 29
Joined: August 22nd, 2003, 9:57 am

Post by mgl »

Nah, training.dat immediately jumps to about 217 kB when it starts working. And I train it on all the messages in my Inbox and sub-folders, not just from people in my address book.

In any case, some of us had major problems with earlier builds when we fed the junk mail filter a set of non-junk e-mails; in a lot of cases, it would cause a properly working filter to go haywire and mess things up. On my good junk filter prior to this build, I only trained it on false negatives (spam mistakenly identified as good) and a small handful of false positives I got over a few months. I never gave it a substantial corpus of non-junk, for fear it would break things. There are several discussions of these issues on the TB Bugs and Features forums.

What really comes through to me is that people seem to get wildly different results with the same junk filters, and that there are no good answers to why this is so. Everybody (it seems) uses a different training method, and so the filter works for some and not for others. We really need a standardized, idiot-proof step-by-step guide to this thing.
DurianCS
Posts: 767
Joined: June 5th, 2003, 6:17 am
Location: The Netherlands

Post by DurianCS »

christos wrote:Another thing is that if all the good emails you train your filter with are from people on your address book, the filter *may* not consider them for spam (I'm not really sure about the implementation, just speculating here), so you are not really training it! Just another thing to consider...


That's an interesting point. As far as I know from SpamBayes, whitelisting should not be necessary for this type of Bayesian filter; if anything, it may be counterproductive. So perhaps the whitelist option should not be checked.

CS
Durian, King of Fruits
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8a6) Gecko/20041227 Firefox/1.0+ (bangbang023)
TB version 1.0 (20041225)