Test Win32 Build Available with Junk Mail Filter Changes

Discussion about official Mozilla Thunderbird builds
avbohemen
Posts: 292
Joined: April 25th, 2003, 6:50 am
Location: The Netherlands

Post by avbohemen »

OK, did that. After a bit of training, it is working better than I expected. Every new message is marked as spam (or not) correctly. However, the same thing happened the first time I ever used Thunderbird. It will probably take a week before I can really come to a conclusion. To be continued...
SarCaSM
Posts: 14
Joined: October 21st, 2003, 12:19 pm

Post by SarCaSM »

OK, I deleted training.dat. Shouldn't Thunderbird have stopped moving junk to my Junk folder by now? Shouldn't I have to train it all over again before it starts doing that again?
mgl
Posts: 29
Joined: August 22nd, 2003, 9:57 am

Post by mgl »

Well, TB's Junk filter has been working OK for me recently, but I'm giving this new version a try. I'm currently training it on my Junk corpus of about 4,000 messages, and have already told it that my Inbox messages are Not Junk.

I'm never sure if this is the right method, but this is what I do to train the filter:

1) Select "Run Junk Mail Controls on Folder" for my Junk folder. The Junk filter will often un-mark some junk messages (false negatives).
2) Re-mark the false negatives, then select "Run Junk Mail Controls on Folder" again.
3) Repeat this until no false negatives appear.
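
The three steps above amount to a loop: classify the Junk folder, re-mark whatever the filter missed, and repeat until nothing is missed. Below is a minimal sketch of that loop. `ToyFilter` is a made-up word-counting stand-in and not Thunderbird's actual classifier, and `trainUntilClean` and `maxRounds` are hypothetical names used only for illustration.

```javascript
// Toy stand-in for the junk filter: counts how often each word was seen in
// junk vs. non-junk training messages. NOT Thunderbird's actual algorithm.
class ToyFilter {
  constructor() {
    this.junkCounts = new Map();
    this.hamCounts = new Map();
  }

  // Split a message into lowercase words.
  static words(text) {
    return text.toLowerCase().split(/\s+/).filter(Boolean);
  }

  // Record every word of the message under the label the user gave it.
  train(text, isJunk) {
    const counts = isJunk ? this.junkCounts : this.hamCounts;
    for (const word of ToyFilter.words(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }

  // Call a message junk when its words were seen more often in junk.
  looksLikeJunk(text) {
    let junk = 0;
    let ham = 0;
    for (const word of ToyFilter.words(text)) {
      junk += this.junkCounts.get(word) || 0;
      ham += this.hamCounts.get(word) || 0;
    }
    return junk > ham;
  }
}

// Steps 1-3: run the filter on the junk folder, re-mark the misses, repeat.
// Returns how many re-marking rounds were needed (capped at maxRounds).
function trainUntilClean(toyFilter, junkFolder, maxRounds = 10) {
  for (let round = 1; round <= maxRounds; round++) {
    const falseNegatives = junkFolder.filter((msg) => !toyFilter.looksLikeJunk(msg));
    if (falseNegatives.length === 0) return round - 1;
    for (const msg of falseNegatives) toyFilter.train(msg, true);
  }
  return maxRounds;
}

const toyFilter = new ToyFilter();
toyFilter.train("meeting notes attached", false); // Inbox mail marked Not Junk
const junkFolder = ["cheap pills online", "cheap meeting pills"];
const roundsNeeded = trainUntilClean(toyFilter, junkFolder);
console.log(`clean after ${roundsNeeded} re-marking round(s)`);
```

The `maxRounds` cap is there because, as described in this post, a real filter may keep rejecting some messages no matter how often they are re-marked.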

Using the new TB build, I'm noticing that it is extremely unwilling to accept about one-third of my junk mail corpus as junk, and keeps marking them as Not Junk, even after numerous re-markings. I will persist until it gets the idea, I guess.

Another thing is that the "Sort by Junk Status" button doesn't work very well on my PC (WinXP SP1). After more than one re-marking cycle, it mixes them all up. I'm not very good at navigating Bugzilla, so I'm not sure if this has been listed.

But thanks, Scott, for working on improving the filter.
mscott
Posts: 2516
Joined: April 2nd, 2003, 4:10 pm
Location: Thunderbird Research Center, CA

Post by mscott »

hopefully you aren't using all 4,000 messages as indicators of junk. that'd be a pretty big training set. :)

junk status doesn't get updated "live" if you change the status of junk messages by hand while you are inside the view. Toggling back to all and then back to the not junk view should reset things.

your issue with the filter un-marking messages that are marked as junk because it thinks it is a false negative is worrisome.

So far the filter is running at 100% for me over the last 3 days, but I've only gotten ~ 50 spam messages over the course of that time.
Thunderbirds are Go!
yusufg
Posts: 88
Joined: December 2nd, 2002, 11:13 pm
Location: Hong Kong

Post by yusufg »

mscott, what's the bug number where the new algorithm is being discussed? Have you talked to the bogofilter/spambayes guys to see if their tokenisation/classification stuff can be incorporated? The bogofilter guys have done a lot of work on building test harnesses and looking at various corpora.
mscott
Posts: 2516
Joined: April 2nd, 2003, 4:10 pm
Location: Thunderbird Research Center, CA

Post by mscott »

all of this work is based on what the spam bayes folks did. Including the new algorithm (the same one they use) and the new tokenizer (still a work in progress)
Thunderbirds are Go!
Roti
Posts: 40
Joined: July 17th, 2003, 4:33 am

Post by Roti »

Hi!

Is there any handling of messages where the header contains a false X-Mailer, like:
X-Mailer: khdehz ljrbo brdeq xhxfwtm
These are all spam.
DurianCS
Posts: 767
Joined: June 5th, 2003, 6:17 am
Location: The Netherlands

Post by DurianCS »

mscott wrote:all of this work is based on what the spam bayes folks did. Including the new algorithm (the same one they use) and the new tokenizer (still a work in progress)


That's great. I'm pretty impressed by the results I get with Spambayes.

CS
Durian, King of Fruits
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8a6) Gecko/20041227 Firefox/1.0+ (bangbang023)
TB version 1.0 (20041225)
djst
Moderator
Posts: 2826
Joined: November 5th, 2002, 1:34 am
Location: Sweden

Post by djst »

I just deleted training.dat and started the test build. Let's see if this one is any better.

One nice feature I have been thinking about is to automatically move a message to the junk folder if you manually mark it as junk. Right now I always have to 1) mark it as junk and 2) move it to trash. It would be much more convenient if Thunderbird did step 2 automatically (but moved to junk folder instead of trash). The same goes for the other way around with false positives: If you are in the junk folder and mark a message as not junk, it would be nice if that message was moved back to the inbox (and maybe run through the mail control filters so it was placed correctly). Just a thought.
ferdinand
Posts: 87
Joined: August 17th, 2003, 1:30 pm
Location: Netherlands

Re: Warning: random thought

Post by ferdinand »

nicklott wrote:Would it be worthwhile to tune the junk threshold automatically d'you think? Perhaps by measuring the number of messages corrected by the user (so every corrected false positive nudges the value up and every spam missed nudges it down)?
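
The nudging idea in the quote could look roughly like the sketch below. It assumes the threshold lives on a 0-100 scale; the step size, the bounds, and the names `nudgeThreshold`, `STEP`, `MIN_THRESHOLD` and `MAX_THRESHOLD` are all invented for illustration and are not part of Thunderbird.

```javascript
// Hypothetical auto-tuning of a junk threshold on a 0-100 scale.
// Step size and bounds are made-up values, not Thunderbird settings.
const STEP = 1;
const MIN_THRESHOLD = 50;
const MAX_THRESHOLD = 99;

function nudgeThreshold(current, correction) {
  let next = current;
  if (correction === "falsePositive") {
    // User rescued a good message: raise the bar so less mail counts as junk.
    next += STEP;
  } else if (correction === "missedSpam") {
    // User junked a message the filter let through: lower the bar.
    next -= STEP;
  }
  // Keep the value inside the allowed range.
  return Math.min(MAX_THRESHOLD, Math.max(MIN_THRESHOLD, next));
}

let tunedThreshold = 90;
for (const correction of ["falsePositive", "falsePositive", "missedSpam"]) {
  tunedThreshold = nudgeThreshold(tunedThreshold, correction);
}
console.log(tunedThreshold);
```

Clamping to a fixed range is one way to stop a run of corrections from pushing the threshold to a point where everything, or nothing, is treated as junk.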

Maybe make an intelligent slider in the spam menu? With a safe default setting that calculates when you would get false positives (real mail marked as spam). So I don't mean a slider from 0.0 to 1.0. You could set it to never, sometimes, often or something.
Also, it would be nice if a spam with a high score were put in the junk mail folder AND used to train the junk mail filter. A maybe-spam folder would be nice too :)
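
A rough sketch of the "maybe-spam folder" idea: route each message by its junk score into one of three folders, and only train automatically on the confident cases. Both cut-off values, the folder names, and `routeMessage` are assumptions made up for this illustration.

```javascript
// Hypothetical cut-offs on a 0.0-1.0 junk score. Neither value comes from Thunderbird.
const MAYBE_CUTOFF = 0.5;
const JUNK_CUTOFF = 0.95;

function routeMessage(score) {
  if (score >= JUNK_CUTOFF) {
    // Confident enough to file as junk and feed back into training.
    return { folder: "Junk", autoTrain: true };
  }
  if (score >= MAYBE_CUTOFF) {
    // Unsure: park it for the user to review, and do not train on it.
    return { folder: "Maybe Junk", autoTrain: false };
  }
  return { folder: "Inbox", autoTrain: false };
}

for (const score of [0.99, 0.7, 0.1]) {
  console.log(score, routeMessage(score));
}
```

Skipping automatic training for the middle band means a borderline guess is never fed back into the filter as if the user had confirmed it.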

2) For those who actually understand the math behind bayesian based junk algorithms, you can fine tune the sensitivity of the filter by changing the following pref. Otherwise I suggest leaving it alone.

+// the probability threshold over which messages are classified as junk
+// this number is divided by 100 before it is used. The classifier can be fine tuned
+// by changing this pref. Typical values are .99, .95, .90, etc.
+pref("mail.adaptivefilters.junk_threshold", 90);

Where? How?

Add the last line of your quote (without the "+" sign) to a file called "user.js" in your profile.

Thanks :)
Last edited by ferdinand on February 17th, 2004, 5:25 am, edited 1 time in total.
lauwersw
Posts: 22
Joined: July 2nd, 2003, 1:01 am

Post by lauwersw »

djst wrote:One nice feature I have been thinking about is to automatically move a message to the junk folder if you manually mark it as junk. Right now I always have to 1) mark it as junk and 2) move it to trash. It would be much more convenient if Thunderbird did step 2 automatically (but moved to junk folder instead of trash).


It does! Check in Tools -> Junk Mail Controls: Settings: Handling. There's a checkbox saying "When I manually mark messages as Junk:" and then the ability to choose your folder.

djst wrote:The same goes for the other way around with false positives: If you are in the junk folder and mark a message as not junk, it would be nice if that message was moved back to the inbox (and maybe run through the mail control filters so it was placed correctly). Just a thought.


I second that. Right now you unmark it, drag it back to your Inbox, and it even appears there twice! Once marked as read, once as unread.
avbohemen
Posts: 292
Joined: April 25th, 2003, 6:50 am
Location: The Netherlands

Post by avbohemen »

Ferdinand: Add the last line of your quote (without the "+" sign) to a file called "user.js" in your profile. :)
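
To make the "divided by 100" comment concrete, here is a small sketch of how such a pref value would translate into a junk decision. The function `exceedsJunkThreshold` is hypothetical and based only on the quoted pref comment, not on Thunderbird's source. The `user_pref(...)` spelling in the comment is the form normally used in Mozilla user.js files; the thread itself only says to paste the `pref(...)` line as-is, so treat the exact spelling as something to check against your build.

```javascript
// In user.js (in the profile folder), the entry would conventionally look like:
//   user_pref("mail.adaptivefilters.junk_threshold", 95);
//
// Hypothetical illustration of how the stored value is applied, based only on
// the quoted comment "this number is divided by 100 before it is used".
function exceedsJunkThreshold(score, thresholdPref = 90) {
  const threshold = thresholdPref / 100; // e.g. 90 -> 0.90
  return score > threshold; // scores over the threshold are classified as junk
}

console.log(exceedsJunkThreshold(0.93, 90)); // score is over 0.90
console.log(exceedsJunkThreshold(0.93, 95)); // score is not over 0.95
```

Raising the value makes the filter more cautious: the same message scoring 0.93 counts as junk at 90 but not at 95.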
Quark
Posts: 173
Joined: December 10th, 2002, 8:19 am

Post by Quark »

I have a question about tricking the filter. At my school, we have a filter that gives mail its own spam rating, then creates a header labelled X-Perlmx-Spam. Under this it rates the percentage chance that the message is spam, and if it's high enough the filter modifies the subject with something like "High Spam Probability 73%:::: Subject". Occasionally Thunderbird misses these. Is it possible to trick it into catching them? Will this particular build help with that?
ferdinand
Posts: 87
Joined: August 17th, 2003, 1:30 pm
Location: Netherlands

Post by ferdinand »

Quark wrote:I have a question about tricking the filter. At my school, we have a filter that gives mail its own spam rating, then creates a header labelled X-Perlmx-Spam. Under this it rates the percentage chance that the message is spam, and if it's high enough the filter modifies the subject with something like "High Spam Probability 73%:::: Subject". Occasionally Thunderbird misses these. Is it possible to trick it into catching them? Will this particular build help with that?

Maybe you could just make a filter that reads the header?
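
In Thunderbird itself, an ordinary message filter on "Subject begins with" would probably cover this without any code. For illustration only, the matching logic could be sketched as below. The subject format is taken from the single example in Quark's post and may differ in practice; the 70% cut-off and the names `parseGatewayRating` and `gatewaySaysJunk` are assumptions.

```javascript
// Assumed subject format, taken from the example in the post:
//   "High Spam Probability 73%:::: Original subject"
// The real gateway may format it differently.
const SPAM_SUBJECT = /^High Spam Probability (\d{1,3})%:::: ?(.*)$/;

// Extract the gateway's rating and the untouched subject, or null if untagged.
function parseGatewayRating(subject) {
  const match = SPAM_SUBJECT.exec(subject);
  if (!match) return null;
  return { percent: Number(match[1]), originalSubject: match[2] };
}

// Hypothetical cut-off: treat anything the gateway rates at 70% or more as junk.
function gatewaySaysJunk(subject, minPercent = 70) {
  const rating = parseGatewayRating(subject);
  return rating !== null && rating.percent >= minPercent;
}

console.log(parseGatewayRating("High Spam Probability 73%:::: Cheap watches"));
console.log(gatewaySaysJunk("Lunch on Friday?"));
```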

Ah, this proves that I'm full of bright ideas and at the same time very ignorant.

It would have been a bright idea to look for the option you want before asking for it....
Last edited by ferdinand on February 17th, 2004, 6:43 am, edited 1 time in total.
djst
Moderator
Posts: 2826
Joined: November 5th, 2002, 1:34 am
Location: Sweden

Post by djst »

lauwersw wrote:
djst wrote:One nice feature I have been thinking about is to automatically move a message to the junk folder if you manually mark it as junk. Right now I always have to 1) mark it as junk and 2) move it to trash. It would be much more convenient if Thunderbird did step 2 automatically (but moved to junk folder instead of trash).


It does! Check in Tools -> Junk Mail Controls: Settings: Handling. There's a checkbox saying "When I manually mark messages as Junk:" and then the ability to choose your folder.

Ah, this proves that I'm full of bright ideas and at the same time very ignorant. :biggrin: