
Bayesian filter is not working properly

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: http://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=3990
Printed Date: 18 October 2017 at 2:37pm


Topic: Bayesian filter is not working properly
Posted By: Guests
Subject: Bayesian filter is not working properly
Date Posted: 16 July 2004 at 7:55pm

Hello,

I've just downloaded the trial version of SPAMFilter ISP and installed it successfully. I've also configured it to work with an SMTP server listening on another IP on the same machine. Everything seems to be working fine: messages are being accepted by SPAMFilter and forwarded to the SMTP server.

Since the Bayesian filter comes empty, I was wondering how SPAMFilter ISP knows whether an incoming message is spam or ham. Furthermore, when I viewed the log file, I saw that it always writes "passes Bayesian filter - 0% spam  (0ms)" for each message that it forwards.

The corpus folder is also empty, and I don't know how to train SPAMFilter ISP's Bayesian filter.

Any ideas how to proceed?

Thanks. Nadir.


Replies:
Posted By: LogSat
Date Posted: 17 July 2004 at 10:29am
Nadir,

The following is an excerpt from the readme.html file in the SpamFilter directory, which explains how the Bayesian filter trains itself. If the corpus folder is empty, please check that the box labeled "learn new incoming emails" under the Settings - Bayesian Filter tab is checked.

Roberto F.
LogSat Software

Bayesian Statistical Filtering
The new v2 release of SpamFilter ISP features statistical DNA fingerprinting of incoming emails. The statistical analysis is performed using Bayesian rules. Tokens within incoming emails are scanned and categorized in a corpus file. The content of all new incoming email is fingerprinted and checked against the historical data. If there is a high statistical probability that the email is spam, it is rejected.
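The excerpt describes the idea only at a high level and does not give SpamFilter ISP's actual tokenizer or scoring formula. As an illustration only, here is a minimal sketch of a generic naive-Bayes token scorer, assuming lower-cased word tokens, per-token probabilities clamped to the 1%-99% range, and hypothetical names (`Corpus`, `learn`, `spam_probability`) that are not part of the product:

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: tokens are lower-cased runs of letters, digits, $, ' and -
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class Corpus:
    """Hypothetical corpus: token counts learned from spam and ham messages."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.spam_msgs = 0
        self.ham_msgs = 0

    def learn(self, text, is_spam):
        # Each distinct token is counted once per message
        tokens = tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_msgs += 1

    def token_prob(self, token):
        # Share of spam vs. ham messages that contained this token
        spam_freq = self.spam_tokens[token] / max(self.spam_msgs, 1)
        ham_freq = self.ham_tokens[token] / max(self.ham_msgs, 1)
        if spam_freq + ham_freq == 0:
            return 0.5  # never-seen token is neutral
        p = spam_freq / (spam_freq + ham_freq)
        return min(max(p, 0.01), 0.99)  # clamp so one token cannot decide alone

    def spam_probability(self, text):
        probs = [self.token_prob(t) for t in tokenize(text)]
        if not probs:
            return 0.5
        # Naive-Bayes combination prod(p) / (prod(p) + prod(1 - p)),
        # computed in log space and through a stable logistic to avoid under/overflow
        diff = sum(math.log(p) for p in probs) - sum(math.log(1 - p) for p in probs)
        if diff >= 0:
            return 1.0 / (1.0 + math.exp(-diff))
        z = math.exp(diff)
        return z / (1.0 + z)


corpus = Corpus()
corpus.learn("cheap viagra buy now", is_spam=True)
corpus.learn("meeting agenda for monday", is_spam=False)
print(corpus.spam_probability("buy cheap viagra"))
print(corpus.spam_probability("monday meeting"))
```

The clamping is a common design choice in such filters: without it, a single token seen only in spam would force the whole score to 100%.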

The statistical engine kicks in after 5,000 non-spam and 5,000 spam emails have been received (values customizable by editing the SpamFilter.ini file). This is done to build a valid statistical base before any emails are rejected. During this period, it is critical to avoid false positives. If a good email is quarantined, forcing its redelivery either through the web interface or the SpamFilter GUI will "teach" SpamFilter that the fingerprint in that email is a "good" one, and the statistical DNA database will adapt itself to it. Initially, it is very important to check the quarantine often and force delivery of legitimate email that has been blocked by the "regular" filtering rules.
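The warm-up rule can be pictured as a simple gate. This sketch assumes the statistical score is treated as 0% until both message counters reach their minimums; only the 5,000 defaults come from the text, while the constant and function names are hypothetical, since the post does not name the SpamFilter.ini keys:

```python
from dataclasses import dataclass

# Defaults taken from the readme excerpt; these names are hypothetical
MIN_HAM_MESSAGES = 5000
MIN_SPAM_MESSAGES = 5000


@dataclass
class CorpusStats:
    """Number of messages learned so far in each category."""
    ham_msgs: int = 0
    spam_msgs: int = 0


def engine_active(stats: CorpusStats,
                  min_ham: int = MIN_HAM_MESSAGES,
                  min_spam: int = MIN_SPAM_MESSAGES) -> bool:
    """True once both categories have reached their minimum sample size."""
    return stats.ham_msgs >= min_ham and stats.spam_msgs >= min_spam


def effective_spam_probability(stats: CorpusStats, raw_probability: float) -> float:
    """Probability actually used for filtering.

    During the warm-up period the statistical score is ignored (reported as 0.0),
    so no message is rejected on Bayesian grounds alone.
    """
    if not engine_active(stats):
        return 0.0
    return raw_probability


# An empty corpus never rejects; a fully trained one passes the score through
print(effective_spam_probability(CorpusStats(), 0.999))
print(effective_spam_probability(CorpusStats(5000, 5000), 0.999))
```

Under this reading, a freshly installed filter with an empty corpus would report 0% for every message, which is consistent with the log lines in the question, although the excerpt does not confirm that this is exactly how the product computes that figure.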

A slider is used to control the accuracy of the statistical filter. Incoming emails are assigned a probability of being Spam, ranging from 0% (most likely a valid email) to 100% (most likely Spam). Any emails that have a probability of being spam above the value you set will be rejected. Typical threshold values are in the 99.9% range.
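The slider comparison reduces to a single check. This is a minimal sketch, assuming probabilities are expressed as percentages and that a score exactly equal to the threshold is delivered (the text only says emails "above" the value are rejected); `is_rejected` is a hypothetical name:

```python
def is_rejected(spam_probability_pct: float, threshold_pct: float = 99.9) -> bool:
    """Return True when a message's spam probability is above the slider value.

    Assumption: a score exactly equal to the threshold is delivered.
    """
    if not 0.0 <= spam_probability_pct <= 100.0:
        raise ValueError("spam probability must be between 0 and 100 percent")
    return spam_probability_pct > threshold_pct


for score in (0.0, 85.0, 99.9, 99.97):
    print(score, "rejected" if is_rejected(score) else "delivered")
```

With a 99.9% threshold, only messages the statistics consider almost certainly spam are rejected, which is why such high values help keep false positives rare.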

-------------
Roberto Franceschetti

LogSat Software: http://www.logsat.com

Spam Filter ISP: http://www.logsat.com/sfi-spam-filter.asp


