Spam Filter ISP Support Forum


Bayesian doesn't seem to work

peet
Newbie

Joined: 01 August 2007
Location: United States
Status: Offline
Points: 21
Posted: 19 October 2009 at 10:55pm
I'm not sure why, but the Bayesian filter doesn't seem to filter out any e-mails.
I've lowered the threshold a little at a time over the weeks. I'm now at 13.929% and still nothing seems to get caught.

I took an e-mail's raw content, pasted it into the Bayesian Probability screen, clicked Show Bayes Prob, and on the Corpus Database tab I got:


10/19/09 19:50:56:203 -- **** R E S U L T S *********
10/19/09 19:50:56:203 -- passes Bayesian filter - 0% spam
10/19/09 19:51:23:968 -- **** R E S U L T S *********
10/19/09 19:51:23:968 -- passes Bayesian filter - 0% spam
10/19/09 19:51:49:781 -- **** R E S U L T S *********
10/19/09 19:51:49:781 -- passes Bayesian filter - 0% spam

I have the "learn new incoming" option enabled.
The folder \SpamFilter\corpus contains:
db.dat at 77 MB
db.dat.prb at 61 MB

The Corpus.ini file says (I'm not sure what this means):
[Messages]
Spam=503515
Good=1452

Any thoughts?
LogSat
Admin Group

Joined: 25 January 2005
Location: United States
Status: Offline
Points: 4104
Posted: 20 October 2009 at 8:45pm
peet,

The Bayesian filter will start blocking emails only after 5000 valid emails and 5000 spam emails have been received, because it needs enough initial data on the incoming traffic to make accurate predictions about future messages. In your case, you have received only 1452 good emails, so the Bayesian filter is still in "learning" mode: it analyzes the incoming traffic without stopping any of it.
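To illustrate the check described above, here is a minimal Python sketch that reads the counts from a Corpus.ini in the format peet posted and compares them against the kick-in threshold. The function name and the returned fields are hypothetical, and treating the threshold as "greater than or equal to" on both counts is an assumption; this is not SpamFilter's own code.

```python
import configparser


def bayes_filter_status(corpus_ini_text, min_emails=5000):
    """Report whether both corpus counts have reached the kick-in threshold.

    Assumption: the filter becomes active once BOTH the spam count and the
    good count are at least `min_emails`.
    """
    parser = configparser.ConfigParser()
    parser.read_string(corpus_ini_text)
    spam = parser.getint("Messages", "Spam")
    good = parser.getint("Messages", "Good")
    return {
        "spam": spam,
        "good": good,
        "active": spam >= min_emails and good >= min_emails,
        "spam_still_needed": max(0, min_emails - spam),
        "good_still_needed": max(0, min_emails - good),
    }


# Counts taken from the Corpus.ini quoted earlier in this thread.
sample = "[Messages]\nSpam=503515\nGood=1452\n"
status = bayes_filter_status(sample)
print(status)
```

With the counts from this thread, the sketch reports the filter as inactive, with 3548 more good emails needed.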

Also please note that, once primed, this filter will be very selective. Most emails will score either around 0.001% (clean) or around 99.99% (spam), so you will see very clear-cut probabilities that an email is or is not spam. In addition, this filter is applied last, after all the other filters, so most of the spam will already have been caught and there will be very little left for it to stop. Often the Bayesian filter will block only around 0.1% to 1% of the spam, or less, compared with the other filters.
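As a rough illustration of why Bayesian scores tend to be so clear-cut, the sketch below applies the generic naive Bayes combining rule, which treats tokens as independent. It is not confirmed to be the formula SpamFilter uses, and the function name `combine` and the sample token probabilities are hypothetical.

```python
import math


def combine(token_probs):
    """Combine per-token spam probabilities, treating tokens as independent.

    Each probability must be strictly between 0 and 1.
    """
    # Work in log space so long messages do not underflow to 0.0.
    log_spam = sum(math.log(p) for p in token_probs)
    log_ham = sum(math.log(1.0 - p) for p in token_probs)
    diff = log_ham - log_spam
    # Guard against overflow in math.exp for overwhelmingly clean messages.
    if diff > 700:
        return 0.0
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)).
    return 1.0 / (1.0 + math.exp(diff))


# Fifteen mildly spammy tokens already push the score very close to 100%.
print(combine([0.9] * 15))
# Fifteen mildly clean tokens push it very close to 0%.
print(combine([0.1] * 15))
```

Because the per-token evidence multiplies, even moderately informative tokens drive the combined score toward one extreme or the other, which matches the behaviour described above.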
Roberto Franceschetti

LogSat Software

Spam Filter ISP
peet
Newbie

Joined: 01 August 2007
Location: United States
Status: Offline
Points: 21
Posted: 20 October 2009 at 9:04pm
Roberto,
Thanks!

Can this be expedited? For example, could the 5000 be reduced to 2000?

Also, will the Bayesian filter add the spam probability (as a %) to the e-mail headers?
In the Web quarantine review, I'd like to show, based on some header data, whether an e-mail has a low, medium, or high probability of being spam.
LogSat
Admin Group

Joined: 25 January 2005
Location: United States
Status: Offline
Points: 4104
Posted: 20 October 2009 at 9:13pm
Yes, this can be changed, but we don't recommend it, as it may cause inaccurate results. If you still wish to proceed, look for the setting:

MinEmailsForBayesKickIn=5000

in the SpamFilter.ini file. There's no need to restart SpamFilter after the change.
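For anyone who wants to confirm the value currently in effect before editing it, here is a small Python sketch that scans INI-style text for the key. It deliberately ignores section headers, since the section that holds this setting is not stated in this thread. The function name is hypothetical, and falling back to 5000 when the key is absent is an assumption.

```python
def read_kick_in_threshold(ini_text, key="MinEmailsForBayesKickIn", default=5000):
    """Return the integer value of `key` from INI-style text.

    Assumption: if the key is missing, the threshold is `default`.
    """
    for line in ini_text.splitlines():
        name, sep, value = line.partition("=")
        # Only consider lines of the form name=value; match the key case-insensitively.
        if sep and name.strip().lower() == key.lower():
            return int(value.strip())
    return default


# Example with the reduced value peet asked about.
print(read_kick_in_threshold("MinEmailsForBayesKickIn=2000\n"))
```

In practice the text would come from reading the SpamFilter.ini file on the server.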

The Bayesian filter does not log its value in the headers. As I mentioned earlier, it is only applied to less than 1% of the incoming emails: more than 99% will already have been blocked before the Bayesian filter has a chance to look at them, so any stats it computed for the remaining 1% would not be very useful.
Roberto Franceschetti

LogSat Software

Spam Filter ISP