Test Win32 Build Available with Junk Mail Filter Changes
- Moonwolf
- Posts: 531
- Joined: December 7th, 2003, 2:50 pm
- Location: Hertfordshire, England
- Contact:
Yesterday it was working at around 95% for me. Then I marked a false positive as not junk. This morning it missed 95% of my spam. Still some work to do, I think.
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.6) Gecko/20050223 Firefox/1.0.1
Thunderbird 1.0 (20041206)
EMbuttons: Buttons & options for the Extension Manager. Easy Get Mail Button is here too.
- Moonwolf
- Posts: 531
- Joined: December 7th, 2003, 2:50 pm
- Location: Hertfordshire, England
- Contact:
Yes, it has been checked into the trunk.
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.6) Gecko/20050223 Firefox/1.0.1
Thunderbird 1.0 (20041206)
EMbuttons: Buttons & options for the Extension Manager. Easy Get Mail Button is here too.
-
- Posts: 202
- Joined: May 30th, 2003, 7:58 am
Scott:
I notice that in one particular mailbox I get lots and lots of spam that TB steadfastly refuses to catch. Would getting your hands on a few hundred (or thousand, it wouldn't take long) pieces of it help you at all in analyzing why the filter won't catch it? I'd be more than pleased to post it or something if it would help your analysis.
-
- Posts: 29
- Joined: August 22nd, 2003, 9:57 am
Kylotan wrote: Whereas I just got another 4 spams that Thunderbird totally missed and Popfile picked up. How can people get such totally different results?
This is a classic issue with TB's junk mail filter, as detailed in several threads in the TB forums. Some people get POPFile-quality filtering from TB with very little effort, while others (like you, and me when I had this special build installed) consistently fail to achieve even a mild level of success with it. I suspect part of the answer has to do with the filter being very sensitive to the training method used--do it even slightly wrong, and you're all messed up.
For instance, I thought that training TB on my collected Junk Mail corpus of about 4,000 messages would be sure to help it learn what spam is. After all, the corpus contains just about every permutation of "Viagra" and "mortgage" you can imagine, so what better way to teach the filter than to provide it with thousands of messages which are all unambiguously junk? Turns out it doesn't work that way, for some reason: it <b>still</b> failed to pick up even the most obvious incoming spam.
I deleted training.dat several times--I tried different strategies, marking all my good messages as Not Junk, then I tried not doing that, <i>then</i> I tried marking only false positives. I tried these things without training the new build on the large corpus, but only on incoming spam. None of it helped--the filter was still failing to pick up most of the incoming junk.
So I went back to the nightly, restored my old training.dat, and I'm back to 85-90% performance--good enough for me, but not competitive with POPFile etc. This thing is a mystery to me.
-
- Posts: 478
- Joined: July 21st, 2003, 4:45 am
- Location: Nottingham, UK
- Contact:
Yeah. As I understand it, a Bayesian classifier trained on 4000 junk mails should easily catch <i>every</i> junk mail you throw at it, except for utterly novel ones. The problem then would be false positives, but you'd quickly fix that by correcting the mistakes. Because the classifier is probabilistic, about 10 corrected false positives should be enough to fix this.
It's also interesting that we're encouraged to train the filter on previous mails, when in fact these systems tend to work better when only trained on errors.
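To make the train-on-errors idea concrete, here is a rough Python sketch. It is a toy naive Bayes filter, not Thunderbird's actual algorithm: the names `TinyBayes` and `train_on_errors`, the add-one smoothing, the choice to skip unseen words, and the sample messages are all assumptions made up for illustration. It starts from a junk-only corpus, then trains only on messages it gets wrong.

```python
import math
from collections import Counter

LABELS = ("junk", "good")


class TinyBayes:
    """Toy naive Bayes over whitespace-separated words (illustrative only)."""

    def __init__(self):
        self.tokens = {label: Counter() for label in LABELS}
        self.messages = {label: 0 for label in LABELS}

    def train(self, text, label):
        self.tokens[label].update(text.lower().split())
        self.messages[label] += 1

    def _log_score(self, words, label, vocab_size):
        # Log prior plus log likelihoods, both with add-one smoothing.
        total_msgs = sum(self.messages.values())
        score = math.log((self.messages[label] + 1) / (total_msgs + len(LABELS)))
        total_tokens = sum(self.tokens[label].values())
        for word in words:
            count = self.tokens[label][word]
            score += math.log((count + 1) / (total_tokens + vocab_size))
        return score

    def classify(self, text):
        vocab = set(self.tokens["junk"]) | set(self.tokens["good"])
        # Assumption: words never seen in training carry no evidence, so skip them.
        words = [w for w in text.lower().split() if w in vocab]
        junk = self._log_score(words, "junk", len(vocab))
        good = self._log_score(words, "good", len(vocab))
        return "junk" if junk > good else "good"


def train_on_errors(clf, stream):
    """Train only on misclassified messages; return how many were corrected."""
    corrections = 0
    for text, label in stream:
        if clf.classify(text) != label:
            clf.train(text, label)
            corrections += 1
    return corrections


# Start from a junk-only corpus, as described in the posts above.
clf = TinyBayes()
for _ in range(50):
    clf.train("cheap viagra mortgage offer", "junk")

newsletter = "mortgage offer from the bank"
first_guess = clf.classify(newsletter)  # a false positive in this toy setup

stream = [
    (newsletter, "good"),
    ("cheap viagra offer", "junk"),
    ("notes from the bank meeting", "good"),
]
corrections = train_on_errors(clf, stream)
```

In this tiny setup a single corrected false positive is enough to flip the newsletter back to good while the obvious spam stays junk. That says nothing about how many corrections a real mailbox needs, or about how TB's own filter weighs tokens.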
- DizzyWeb
- Posts: 637
- Joined: March 27th, 2003, 9:56 am
mscott wrote: btw, none of this is checked into the trunk. It's an experiment.
Also, there were some bugs in the new algorithm that someone is fixing for me.
Those bugs would be the ones where it doesn't catch anything after you mark anything as junk or not junk?
The author can never, in no way, be held responsible for any harm caused, mental or physical, by reading this post.