Spam Filter ISP Support Forum


Bayesian filter

LogSat
Admin Group

Joined: 25 January 2005
Location: United States
Status: Offline
Points: 4068
    Posted: 22 December 2005 at 5:43pm
Zoro,

All new tokens from incoming emails are cached for several minutes, and the main corpus database is updated with the new tokens at regular intervals (I believe it's around 30 minutes or so). Because of this, you will get misleading results if you send emails and then check the corpus right away, as the content of those emails will not yet have been added to the main corpus database.
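
To illustrate why an immediate corpus dump can look stale, here is a minimal sketch of a cache that is only merged into the main store at fixed intervals. It is not SpamFilter's actual code: the class `CachedCorpus`, its method names, and the exact 30-minute interval are all assumptions made for the example.

```python
class CachedCorpus:
    """Toy model (hypothetical): new tokens wait in a cache until the next flush."""

    def __init__(self, flush_interval_s=30 * 60):
        self.flush_interval_s = flush_interval_s  # assumed: roughly 30 minutes
        self.corpus = {}    # token -> spam count, what a corpus dump would show
        self.pending = {}   # token -> spam count, cached but not yet written
        self.last_flush = 0.0

    def record_spam_token(self, token):
        # New observations only touch the cache.
        self.pending[token] = self.pending.get(token, 0) + 1

    def tick(self, now):
        """Merge the cache into the corpus once the interval has elapsed."""
        if now - self.last_flush >= self.flush_interval_s:
            for token, count in self.pending.items():
                self.corpus[token] = self.corpus.get(token, 0) + count
            self.pending.clear()
            self.last_flush = now

    def dump(self):
        return dict(self.corpus)


store = CachedCorpus()
store.record_spam_token("*Subject*viagra")
store.tick(now=60)             # one minute later: interval not reached, no flush
early = store.dump()           # the token is still missing from the dump
store.record_spam_token("*Subject*viagra")
store.tick(now=31 * 60)        # past the interval: both mails are merged at once
late = store.dump()
print(early, late)
```

In this model a dump taken right after sending a test mail shows nothing new, and both test mails only become visible after the next scheduled update.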

From the spam/good numbers you posted, however, we see that your percentage of spam compared to good emails is very, very low, meaning that you receive very little spam (around 5%) in your emails. Most installations instead see between 60% and 80% spam, which is significantly higher. The Bayesian filter usually catches only a very small percentage of spam, as it's the last filter to be applied. Since in your case the percentage of spam is so low, the percentage of emails blocked by the Bayesian filter will likely be a small fraction of that, so it is possible you will not see any blocks from that filter.
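
For reference, the "around 5%" figure follows directly from the two counts quoted in the original post (spam 16453, good 289412). The variable names below are just for illustration.

```python
spam, good = 16453, 289412           # counts quoted in the original post
spam_share = spam / (spam + good)    # fraction of all mail classified as spam
print(f"{spam_share:.1%}")           # prints 5.4%
```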
Roberto Franceschetti

LogSat Software

Spam Filter ISP
Zoro
Newbie


Joined: 27 October 2005
Status: Offline
Points: 7
Posted: 22 December 2005 at 1:01pm
I've been running SpamFilter for 2 months (spam 16453, good 289412), but I've never seen any mail blocked by the Bayesian filter. Thinking something was wrong in the corpus db, I deleted it to reinitialize.
With the new corpus db I tried this test:
- I set the keyword filter to reject 'viagra'
- I sent a mail to SpamFilter containing 'viagra' in the subject
- I dumped the corpus and found the token
  *Subject*viagra,0,1,0,400000005960464,22/12/2005
- I sent another mail like the 1st one
- I dumped the corpus and found the same entry again
   *Subject*viagra,0,1,0,400000005960464,22/12/2005
As you can see, the data are the same, while I expected to get spam counter=2 and an increased probability value.
Notice that most of the tokens have the same probability value (0,400000005960464), as they also did in the former, much larger corpus.
- As a last check I copied the quarantined mail and pasted it into the Bayes prob box: the result is 0% spam.
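
The two dump lines above can be compared mechanically. The sketch below assumes the fields are token, good count, spam count, probability, and date, with the probability written using a decimal comma. That layout is inferred from this post, not from any documented format, and `parse_corpus_line` is a hypothetical helper. It also checks that the probability matches 0.4 as it appears after being stored in single precision.

```python
import struct
from datetime import datetime


def parse_corpus_line(line):
    # Assumed layout: token, good count, spam count, probability, date.
    # The probability uses a decimal comma, so it spans two comma-separated parts.
    token, good, spam, prob_int, prob_frac, date = line.strip().rsplit(",", 5)
    return {
        "token": token,
        "good": int(good),
        "spam": int(spam),
        "probability": float(f"{prob_int}.{prob_frac}"),
        "date": datetime.strptime(date, "%d/%m/%Y").date(),
    }


first = parse_corpus_line("*Subject*viagra,0,1,0,400000005960464,22/12/2005")
second = parse_corpus_line("   *Subject*viagra,0,1,0,400000005960464,22/12/2005")

# 0.4 after a round trip through a 32-bit float
single_precision_04 = struct.unpack("f", struct.pack("f", 0.4))[0]
matches_float32_04 = abs(first["probability"] - single_precision_04) < 1e-12

print(first == second)        # both dumps carry identical data
print(first["spam"], first["probability"], matches_float32_04)
```

Under these assumptions the spam counter is 1 in both dumps, and the repeated probability is consistent with a plain 0.4 held in a single-precision field.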

Is something wrong in my configuration?
Thank you for any suggestions.