Test Win32 Build Available with Junk Mail Filter Changes

Discussion about official Mozilla Thunderbird builds
mgl
Posts: 29
Joined: August 22nd, 2003, 9:57 am

Post by mgl »

mscott wrote:hopefully you aren't using all 4,000 messages as indicators of junk. that'd be a pretty big training set. :)

junk status doesn't get updated "live" if you change the status of junk messages by hand while you are inside the view. Toggling back to all and then back to the not junk view should reset things.

your issue with the filter un-marking messages that are marked as junk because it thinks it is a false negative is worrisome.


Well, I did use all 4,000 junk messages to train it, and ran the "Run Junk Mail Controls on Folder" repeatedly until it returned no more false negatives. But yes, I'd manually mark them all as Junk, then TB would un-mark a (decreasing) number of them each time. But I'm probably doing it wrong--I got 9 spam messages this morning, and TB only caught one or two. Perhaps by the time we get to release 1.0, someone could put some detailed instructions into the release notes: how to train most efficiently, whether to mark non-spam messages as Not Junk, etc. From reading these forums over the last six months or so, it seems like everyone has a slightly different method, with quite different success rates.

Thanks on the junk status button--I noticed that it seemed to fix things by leaving the folder and coming back to it.
DurianCS
Posts: 767
Joined: June 5th, 2003, 6:17 am
Location: The Netherlands

Post by DurianCS »

You should also run the junk mail controls on a sufficient number of good mails (ham). This is important for good training. I only used a small set of spam (500) and ham (300), but the score on the next set of spam was rather good: it missed one of 500. For the score on new good mails, I don't have sufficient information (no errors in about 50 mails).
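To illustrate why ham matters, here is a rough sketch of a per-token score of the kind a Bayesian filter builds up. It is not Thunderbird's actual code; the function names, the 0.5 score for unseen tokens, and the tiny sample messages are all made up for illustration.

```python
from collections import Counter

def token_counts(messages):
    """Count how many messages contain each token (hypothetical helper)."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def token_spamminess(token, spam_counts, ham_counts, n_spam, n_ham):
    """Share of a token's evidence that comes from spam; 0.5 if never seen."""
    spam_freq = spam_counts[token] / n_spam if n_spam else 0.0
    ham_freq = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # assumption: no evidence either way
    return spam_freq / (spam_freq + ham_freq)

# Made-up training data, for illustration only.
spam = ["free offer click now", "free pills order now"]
ham = ["order status for your book", "free parking at the office", "see you now"]

spam_counts = token_counts(spam)
ham_counts = token_counts(ham)

for token in ("free", "order", "pills"):
    spam_only = token_spamminess(token, spam_counts, Counter(), len(spam), 0)
    both = token_spamminess(token, spam_counts, ham_counts, len(spam), len(ham))
    print(f"{token}: spam-only training {spam_only:.2f}, spam+ham training {both:.2f}")
```

With spam-only training, every word the filter has seen looks 100% spammy, including ordinary words like "free" and "order". Once ham is added, those words drop toward neutral while "pills" stays at the top.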

CS
Durian, King of Fruits
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8a6) Gecko/20041227 Firefox/1.0+ (bangbang023)
TB version 1.0 (20041225)
alecf
Posts: 11
Joined: November 22nd, 2003, 5:54 pm
Location: San Francisco, CA

Post by alecf »

is there a bug which describes some of the token-parsing enhancements? I have a few ideas I'd like to contribute...
sasquatch
Posts: 6022
Joined: November 25th, 2003, 8:56 am

Post by sasquatch »

Dumb question: Are SPAM and HAM acronyms for something? Thanks.
jedbro
Posts: 1899
Joined: November 10th, 2002, 12:35 pm
Location: Mexico / Boulder Co.

Post by jedbro »

Nice, thank you Scott.
I know I bugged you about this during pre-0.5, thanks for letting us test it out :)

Thanks again!
humpton
Posts: 38
Joined: September 7th, 2003, 8:38 am

SPAM & HAM

Post by humpton »

sasquatch wrote:Dumb question: Are SPAM and HAM acronyms for something? Thanks.


SPAM & HAM are not acronyms for anything.

In reality Spam is a really, really crappy version of ham. Most would argue ham really isn't involved in Spam at all. It's definitely available in England and Australia; I have to confess to not seeing (or hearing of) it in the States.

Spam is called Spam thanks to the Monty Python's Flying Circus series. There is a skit in which I think it's fair to say the word 'Spam' is used about 95% of the time - along the lines of "Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, everybody loves Spam" to a nice little tune. It can be very trying to hear the same thing over and over and over and over - the point of the skit. Some computer geeky types who were also Monty Python geeky types noticed the similarity, hence Spam = spam.

I believe HAM is a very recent addition to show the opposite of SPAM, and doesn't have Monty Python linked to it.

A simple answer for you...

Stay JOLLY!
H
I got nothing clever to put here :(
Jaska
Posts: 113
Joined: January 6th, 2004, 12:54 pm

Post by Jaska »

I was just thinking earlier today that Scott should contact the SpamBayes people. They have really got Bayesian filtering working. It was nice to read that Scott has already done that.
R4F
Posts: 999
Joined: December 7th, 2003, 12:13 pm
Location: Netherlands

Post by R4F »

djst wrote:Ah, this proves that I'm full of bright ideas and at the same time very ignorant. :D
Let me think up a suitable reply for this one..
Firefox Help | Nederlands
Thunderbird Help | Nederlands
Jubijub
Posts: 135
Joined: April 11th, 2003, 10:25 am
Location: Lyon, France

I think a very good point has been pointed out

Post by Jubijub »

-->maybe the filter is already perfect, but we're all using it in a wrong way.

I don't know at all how a bayesian filter works, I don't even know what bayesian is :D
I'm also ready to bet that I'm not the only one.

Since most of us don't know the principles of this filter, it's hard to make sure we're using it well. So maybe a short note on how to properly train this kind of filter could be a good idea, especially to make sure the filter is really working as intended...plus, it could be a perfect opportunity to make sure the UI is self-explanatory enough that even perfect noobs when it comes to mail clients can use it properly too...
chuonthis
Posts: 519
Joined: July 23rd, 2003, 10:17 am

Post by chuonthis »

mgl wrote:Well, I did use all 4,000 junk messages to train it, and ran the "Run Junk Mail Controls on Folder" repeatedly until it returned no more false negatives. But yes, I'd manually mark them all as Junk, then TB would un-mark a (decreasing) number of them each time. But I'm probably doing it wrong--I got 9 spam messages this morning, and TB only caught one or two.

Maybe you can try lowering your junk_threshold pref.

2) For those who actually understand the math behind bayesian based junk algorithms, you can fine tune the sensitivity of the filter by changing the following pref. Otherwise I suggest leaving it alone.

// the probability threshold over which messages are classified as junk
// this number is divided by 100 before it is used. The classifier can be fine tuned
// by changing this pref. Typical values are .99, .95, .90, etc.
pref("mail.adaptivefilters.junk_threshold", 90);

be warned that the lower the number, the higher the false positive rate (messages that are incorrectly marked as junk), although the higher the percentage of spam it catches.
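A rough sketch of that trade-off, assuming the pref is used as the comment describes (divided by 100, then a message is junk when its score is over the result). The function names and the scores are invented for illustration; this is not the Thunderbird source, and whether the real comparison is strict is an assumption here.

```python
def is_junk(probability, junk_threshold_pref=90):
    """Apply the pref as described: divide by 100, then compare (assumed strict)."""
    return probability > junk_threshold_pref / 100

def tally(spam_scores, ham_scores, junk_threshold_pref):
    """Return (spam caught, false positives) for a given pref value."""
    caught = sum(is_junk(p, junk_threshold_pref) for p in spam_scores)
    false_positives = sum(is_junk(p, junk_threshold_pref) for p in ham_scores)
    return caught, false_positives

# Hypothetical classifier scores for five spam and four good messages.
spam_scores = [0.999, 0.97, 0.93, 0.85, 0.60]
ham_scores = [0.01, 0.05, 0.40, 0.88]

for pref in (99, 90, 80):
    caught, false_positives = tally(spam_scores, ham_scores, pref)
    print(f"junk_threshold={pref}: caught {caught}/5 spam, {false_positives} false positive(s)")
```

On these made-up scores, dropping the pref from 99 to 90 catches more spam at no cost, but going down to 80 also flags one good message as junk.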
avbohemen
Posts: 292
Joined: April 25th, 2003, 6:50 am
Location: The Netherlands

Post by avbohemen »

Results after 1 day of training and testing:

Trained the filter with about 30-35 spam mails, all I got yesterday after installing this build.
Trained the filter with about 80 non-spam mails, a small subset of my normal inbox, containing various types of mails (long, short, html, plaintext, etc).

Result this day:

91 spam mails received (which btw all got caught by my ISP's spam filter, Brightmail) of which:
73 immediately marked as spam
18 still left unharmed.

That's about the same result as the "old" spam filter, which I trained every day with some of the spam that still made it through my inbox.
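For reference, the counts above work out as follows. This is just the arithmetic on the numbers in this post; the function name is made up.

```python
def rate(part, total):
    """Fraction of the total that the part represents."""
    return part / total

received, caught, missed = 91, 73, 18
assert caught + missed == received  # the two groups account for every message

summary = f"caught {rate(caught, received):.1%}, missed {rate(missed, received):.1%}"
print(summary)  # caught 80.2%, missed 19.8%
```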

By the way, I haven't run the spam filter against my normal inbox to see if that would result in false positives. I won't do that either. In my opinion, spam should be caught at the time I receive a new message; when it's already in my inbox it's too late, and I can manually delete it or mark it as spam. The filter doesn't need to do that for me.

I'd say, after the first day, this looks very promising. I think I trained the filter rather effectively. I tried several ways of training before, marking different groups and quantities of mail, and this way worked best for me. With the new filter, it only took one day of training, compared to several weeks on the old filter. Let's see what tomorrow brings.

Maybe it's an idea to make an article on how to train the spam filter for best effectiveness on texturizer.net?
mscott
Posts: 2516
Joined: April 2nd, 2003, 4:10 pm
Location: Thunderbird Research Center, CA

Post by mscott »

that's encouraging news!! Thanks for the feedback.
Thunderbirds are Go!
fishbert
Posts: 941
Joined: November 29th, 2002, 12:02 am

Post by fishbert »

You know, it's times like these when I'm sorry that I only get a couple of spam emails a month.
Tiger Bob
Posts: 53
Joined: November 29th, 2003, 12:53 pm
Location: South Africa

Post by Tiger Bob »

I have approximately 200 junk mails. I marked them all as not junk, installed the new software and deleted the training file. I then ran the junk mail controls on the folder from the Tools menu. 100% of the items were identified as junk. A great start!!
mlcampbe
Posts: 46
Joined: May 21st, 2003, 7:45 am

Post by mlcampbe »

Ok, something odd is going on here.

I have 2 email accounts - 1 is work-related and is IMAP, the other is personal and is a Yahoo account. I have set up TB to connect directly to my IMAP server and use yahoopops to download the Yahoo email.

I have enabled 'adaptive junk mail' on both accounts with totally mixed results.

For the IMAP account it seems to be catching ALL of the junk mail. I've not seen anything get through with the newest junk filter.

For the yahoopops account it seems that very little is getting tagged as spam. I have enabled the junk mail log and can see that some things are properly tagged but a lot is not.

Is there something that would cause the Bayes filtering not to work after mail is retrieved via yahoopops?