
Bayesian filter

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=5427
Printed Date: 09 May 2025 at 1:23am


Topic: Bayesian filter
Posted By: Zoro
Subject: Bayesian filter
Date Posted: 22 December 2005 at 1:01pm
I've been running SpamFilter for 2 months (spam 16453, good 289412), but I've never seen any mail blocked by the Bayesian filter. Thinking something was wrong in the corpus db, I deleted it to reinitialize.
With the new corpus db I tried this test:
- I set the keyword filter to reject 'viagra'
- I sent a mail to SpamFilter containing 'viagra' in the subject
- I dumped the corpus and found the token
  *Subject*viagra,0,1,0,400000005960464,22/12/2005
- I sent another mail like the first one
- I dumped the corpus and again found
  *Subject*viagra,0,1,0,400000005960464,22/12/2005
As you can see, the data are the same, while I expected to get spam counter=2 and an increased probability value.
Note that most of the tokens have the same probability value (0,400000005960464), and this was also the case in the former, much larger corpus.
- As a last test I copied the quarantined mail and pasted it into the Bayes prob box: the result is 0% spam.
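
For readers comparing dumps, here is a minimal sketch of how a dumped line could be read. The field layout (token, good count, spam count, probability written with a decimal comma, date as dd/mm/yyyy) is an assumption inferred from the line above, not a documented format, and parse_corpus_line is a hypothetical helper.

```python
from datetime import datetime

def parse_corpus_line(line):
    # ASSUMED layout: token, good count, spam count, probability, date.
    # The probability uses a decimal comma, so it spans two comma-separated parts.
    token, good, spam, prob_int, prob_frac, date = line.strip().rsplit(",", 5)
    return {
        "token": token,
        "good": int(good),
        "spam": int(spam),
        "probability": float(f"{prob_int}.{prob_frac}"),
        "last_seen": datetime.strptime(date, "%d/%m/%Y").date(),
    }

entry = parse_corpus_line("*Subject*viagra,0,1,0,400000005960464,22/12/2005")
print(entry)
```

Read this way, the probability is 0.400000005960464, which is what 0.4 looks like when stored as a single-precision float. In Paul Graham's "A Plan for Spam", 0.4 is the value given to tokens seen too rarely to score; whether SpamFilter does the same is an assumption.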

Is something wrong in my configuration?
Thank you for any suggestions.



Replies:
Posted By: LogSat
Date Posted: 22 December 2005 at 5:43pm
Zoro,

All new tokens from incoming emails are cached for several minutes, and the main corpus database is updated with the new tokens at regular intervals (I believe about every 30 minutes). Because of this, you will get misleading results if you send emails and then check the corpus right away, as the content of those emails will not yet have been added to the main corpus database.
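
The caching behavior described above can be pictured with a rough sketch. This is not SpamFilter's actual code: the TokenCache class, its method names, and the exact 30-minute interval are all assumptions made for illustration.

```python
import time
from collections import Counter

class TokenCache:
    """Hypothetical model: tokens wait in a cache and reach the corpus only on flush."""

    def __init__(self, flush_interval_s=30 * 60, clock=time.monotonic):
        self.corpus = Counter()    # stands in for the main corpus database
        self.pending = Counter()   # tokens seen since the last flush
        self.flush_interval_s = flush_interval_s  # assumed interval, about 30 minutes
        self.clock = clock
        self.last_flush = clock()

    def add_spam_tokens(self, tokens):
        self.pending.update(tokens)
        self.maybe_flush()

    def maybe_flush(self):
        # Merge cached counts into the corpus once the interval has elapsed.
        if self.clock() - self.last_flush >= self.flush_interval_s:
            self.corpus.update(self.pending)
            self.pending.clear()
            self.last_flush = self.clock()

# Simulated clock so the example does not have to wait 30 minutes.
now = [0.0]
cache = TokenCache(clock=lambda: now[0])
cache.add_spam_tokens(["*Subject*viagra"])
cache.add_spam_tokens(["*Subject*viagra"])
count_before = cache.corpus["*Subject*viagra"]  # corpus not updated yet

now[0] += 30 * 60
cache.maybe_flush()
count_after = cache.corpus["*Subject*viagra"]   # both mails are now counted
print(count_before, count_after)
```

In this model, a dump taken between the two test mails and the next flush shows unchanged counters, which matches what was observed in the test.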

From the spam/good numbers you posted, however, we see that your percentage of spam compared to good emails is very low, meaning that only a small share of your email (around 5%) is spam. Most installations see between 60% and 80% spam instead, which is significantly higher. The Bayesian filter usually catches a very small percentage of spam, as it's the last filter to be applied. Since in your case the percentage of spam is so low, the percentage of emails blocked by the Bayesian filter will likely be a small fraction of that, so it is possible you will not see any blocks from that filter.
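
As a quick check of the "around 5%" figure, the share can be worked out from the two counters posted in the question (the variable names are only illustrative):

```python
spam, good = 16453, 289412          # counters from the original post
spam_share = spam / (spam + good)   # fraction of all mail counted as spam
print(f"{spam_share:.1%}")          # prints 5.4%
```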


-------------
Roberto Franceschetti

LogSat Software: http://www.logsat.com

Spam Filter ISP: http://www.logsat.com/sfi-spam-filter.asp


