
bayesian filter kicked in after 5000 mails, and is now stopping much n

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: http://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=3761
Printed Date: 22 October 2017 at 6:20am


Topic: bayesian filter kicked in after 5000 mails, and is now stopping much n
Posted By: Guests
Date Posted: 11 June 2004 at 9:18am

We put LogSat into production 2 days ago. Today we passed 5000 emails and Bayesian filtering kicked in.
Since then a whole lot of normal messages are being blocked because they get a score of 100% spam.

Did something go wrong with the learning process?

What can I do to stop this?

thanks, igor L.




Replies:
Posted By: LogSat
Date Posted: 12 June 2004 at 12:08am

Igor,

As more and more emails are received, SpamFilter adapts to your kind of email traffic and recognizes more and more spam as it comes in. During the initial training period (more or less the first 24 hours / 10000 emails), however, it is important to keep the number of false positives to a minimum so that the learning process is accurate. When an email is taken out of the quarantine, SpamFilter will know and will learn that similar emails are probably going to be legitimate.

We'd recommend you stop SpamFilter, delete the SpamFilter\corpus directory, then restart SpamFilter. This clears your existing statistical database so that you can start from scratch. You may also want to do this on the morning of a regular workday (not a weekend), since most legitimate email is usually sent during the daytime. That way more "clean" emails go through the statistical filter during the initial training period, which makes for a better learning process.

Roberto F.
LogSat Software



Posted By: Guests
Date Posted: 16 June 2004 at 8:54am

Hello,

I did as suggested on Monday morning: deleted the corpus folder and restarted the service.

I then closely monitored for false positives; there were maybe 4.

Today we passed 5000 mails again, and Bayesian filtering started.

Now nothing is being stopped by the Bayesian filter. Also, when I paste an obvious spam mail into the Bayesian probability test, it says 0% spam.

This is the opposite of the previous behaviour.

As suggested, I reduced false positives to a minimum.
I do have a whole lot of keyword filtering, which stops most of the spam.

- Did something go wrong with training the filter again?
- What is the correct way to train this filter?

thanks, igor



Posted By: LogSat
Date Posted: 16 June 2004 at 11:18pm

Igor,

Can you please zip us your corpus directory so we can take a look at your corpus database?

Can you also please run the following query on the SpamFilter database and let us know the results?

SELECT tblQuarantine.RejectID, tblRejectCodes.RejectDesc, COUNT(tblQuarantine.RejectID) AS Total
FROM tblQuarantine
    INNER JOIN tblRejectCodes ON tblQuarantine.RejectID = tblRejectCodes.RejectID
GROUP BY tblQuarantine.RejectID, tblRejectCodes.RejectDesc
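
As a rough illustration of what this query reports (one row per reject reason, with the number of quarantined messages for that reason), here is a minimal Python sketch that runs the same query against an in-memory SQLite database. Only the table names and the RejectID / RejectDesc columns come from the query above. The MsgID column, the two reject descriptions and the sample rows are made up for the example, and the real SpamFilter database may be laid out differently.

```python
import sqlite3

# In-memory stand-in for the SpamFilter database. The schema is an assumption:
# only the table names and the RejectID / RejectDesc columns appear in the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblRejectCodes (RejectID INTEGER PRIMARY KEY, RejectDesc TEXT);
    CREATE TABLE tblQuarantine (MsgID INTEGER PRIMARY KEY, RejectID INTEGER);
    -- Hypothetical reject reasons and quarantined messages, for illustration only.
    INSERT INTO tblRejectCodes VALUES (1, 'Keyword filter'), (2, 'Bayesian filter');
    INSERT INTO tblQuarantine (RejectID) VALUES (1), (1), (1), (2);
""")

QUERY = """
SELECT tblQuarantine.RejectID, tblRejectCodes.RejectDesc,
       COUNT(tblQuarantine.RejectID) AS Total
FROM tblQuarantine
    INNER JOIN tblRejectCodes ON tblQuarantine.RejectID = tblRejectCodes.RejectID
GROUP BY tblQuarantine.RejectID, tblRejectCodes.RejectDesc
"""

# Sort in Python, because the query itself has no ORDER BY clause.
rows = sorted(conn.execute(QUERY).fetchall())
for reject_id, description, total in rows:
    print(reject_id, description, total)
```

A breakdown like this shows how much of the quarantine was produced by each filter, which is the mix of messages the statistical filter has been treating as spam.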

Roberto F.
LogSat Software



Posted By: nippe
Date Posted: 23 June 2004 at 1:49pm

You wrote:

When an email is taken out of the quarantine, SpamFilter will know and will learn that similar emails are probably going to be legitimate.

Q: Are deliver and delete the same thing (in this case)?



Posted By: LogSat
Date Posted: 24 June 2004 at 12:58am

Not sure I understood the question.

Email in the quarantine has already been "learned" by the filter as being bad. If it's a false positive, the only way for the filter to "unlearn" about it being bad is when a user (or an admin) forces that email to be delivered to the end user. Only when that happens does the filter reverse the score; from then on it assigns different probabilities to the tokens, so that they are more likely to be considered "good" in the future.
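
The reversal described above can be pictured with a small token-counting sketch in Python. This is a generic toy Bayesian filter, not SpamFilter's actual code: the class, its method names, the tokenizer and the smoothing formula are all assumptions made for illustration.

```python
import re
from collections import Counter


class TokenFilter:
    """Toy Bayesian token filter (hypothetical, not SpamFilter's implementation)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of good messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    @staticmethod
    def tokens(text):
        # Each distinct token counts once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def learn(self, text, is_spam):
        counts = self.spam_tokens if is_spam else self.ham_tokens
        counts.update(self.tokens(text))
        if is_spam:
            self.spam_msgs += 1
        else:
            self.ham_msgs += 1

    def release_from_quarantine(self, text):
        """Undo an earlier 'spam' verdict, then learn the message as legitimate."""
        self.spam_tokens.subtract(self.tokens(text))
        self.spam_tokens = +self.spam_tokens  # unary plus drops zero/negative counts
        self.spam_msgs = max(0, self.spam_msgs - 1)
        self.learn(text, is_spam=False)

    def token_spam_probability(self, token):
        # Add-one smoothing keeps unseen tokens away from exactly 0% or 100%.
        p_spam = (self.spam_tokens[token] + 1) / (self.spam_msgs + 2)
        p_ham = (self.ham_tokens[token] + 1) / (self.ham_msgs + 2)
        return p_spam / (p_spam + p_ham)


f = TokenFilter()
f.learn("cheap pills offer", is_spam=True)
f.learn("meeting agenda invoice", is_spam=False)
f.learn("invoice attached for your order", is_spam=True)  # a false positive

before = f.token_spam_probability("invoice")
f.release_from_quarantine("invoice attached for your order")
after = f.token_spam_probability("invoice")
print(before > after)  # the token now leans toward "good"
```

In this sketch, forcing delivery both removes the message's tokens from the spam counts and adds them to the good counts, which is why the spam probability of its tokens drops afterwards.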

Roberto F.
LogSat Software



Posted By: nippe
Date Posted: 27 June 2004 at 8:10am

the only way for the filter to "unlearn" about it being bad is when a user (or an admin) forces that email to be delivered

OK - and deleting a mail from the quarantine does not teach the filter anything (or make it unlearn anything)?



