Spam Filter ISP Support Forum


bayesian filter kicked in after 5000 mails, and is now stopping much n

igor.L
Guest Group
Posted: 11 June 2004 at 9:18am

We put LogSat into production 2 days ago. Today the 5000th email went through and Bayesian filtering kicked in.
Since then, a whole lot of normal messages are being blocked because they get a value of 100% spam.

Did something go wrong with the learning process?

What can I do to stop this?

Thanks, Igor L.

LogSat
Admin Group
Joined: 25 January 2005
Location: United States
Posted: 12 June 2004 at 12:08am

Igor,

As more and more email is received, SpamFilter adapts to your kind of email traffic and recognizes more and more spam as it comes in. During the initial training period, however (more or less the first 24 hours / 10000 emails), it is important that the number of false positives be kept to a minimum so that the learning process is accurate. When an email is taken out of the quarantine, SpamFilter will know and will learn that similar emails are probably going to be legitimate.
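To illustrate why false positives during training matter, here is a minimal sketch of a token-counting Bayesian scorer. It is not SpamFilter's actual implementation, which is not shown in this thread; the class name `TokenStats`, the smoothing, and the way token probabilities are combined are all assumptions chosen for illustration.

```python
from collections import Counter
import math


class TokenStats:
    """Toy Bayesian scorer (hypothetical, not SpamFilter's real algorithm)."""

    def __init__(self):
        self.spam_tokens = Counter()  # messages containing each token, labeled spam
        self.ham_tokens = Counter()   # messages containing each token, labeled legitimate
        self.spam_msgs = 0
        self.ham_msgs = 0

    def learn(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_msgs += 1

    def token_prob(self, token):
        # Smoothed estimate of how "spammy" a token is, assuming equal priors.
        p_spam = (self.spam_tokens[token] + 1) / (self.spam_msgs + 2)
        p_ham = (self.ham_tokens[token] + 1) / (self.ham_msgs + 2)
        return p_spam / (p_spam + p_ham)

    def spam_score(self, text):
        # Combine per-token probabilities in log-odds space.
        log_odds = 0.0
        for token in set(text.lower().split()):
            if self.spam_tokens[token] + self.ham_tokens[token] == 0:
                continue  # token never seen during training: no evidence
            p = self.token_prob(token)
            log_odds += math.log(p) - math.log(1 - p)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    stats = TokenStats()
    stats.learn("cheap pills buy now", True)
    # A legitimate mail that was quarantined and never released
    # is counted as spam by the learner:
    stats.learn("meeting agenda for monday", True)
    stats.learn("lunch on friday", False)
    print(stats.spam_score("agenda for monday meeting"))
```

In this sketch, the single uncorrected false positive is enough to push a similar legitimate message above 0.5. The same mechanism at larger scale is one plausible reason a badly trained corpus scores normal mail as spam.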

We'd recommend you stop SpamFilter, delete the SpamFilter\corpus directory, then restart SpamFilter. This will clear your existing statistical database so you can start from scratch. It is best to do this on the morning of a regular workday (not a weekend), as most legitimate email is usually sent during the daytime. This will allow more "clean" email to go through the statistical filter during the initial training period, allowing a better learning process.
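The delete step can be scripted. The sketch below assumes the service has already been stopped by hand and will be restarted by hand afterwards. The function name `clear_corpus` and its argument are hypothetical, and the install path is left to the caller because it is not given in this thread.

```python
import shutil
from pathlib import Path


def clear_corpus(spamfilter_dir):
    """Delete the 'corpus' subdirectory of the given SpamFilter directory.

    Returns True if a directory was removed, False if none existed.
    Stopping and restarting the service is left to the caller.
    """
    corpus = Path(spamfilter_dir) / "corpus"
    if not corpus.is_dir():
        return False
    shutil.rmtree(corpus)  # removes the directory and everything in it
    return True
```

Calling it a second time on the same directory simply returns False, so it is safe to rerun.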

Roberto F.
LogSat Software

igor L.
Guest Group
Posted: 16 June 2004 at 8:54am

Hello,

I did as suggested on Monday morning: deleted the corpus folder and restarted the service.

I then closely monitored for false positives; there were maybe 4.

Today we passed 5000 mails again, and Bayesian filtering started.

Now nothing is being stopped by the Bayesian filter. Also, when I paste an obvious spam mail into the Bayesian probability test, it says 0% spam.

This is like the opposite of the previous behaviour.

As suggested, I reduced false positives to a minimum.
I do have a whole lot of keyword filtering, which stops most of the spam.

- Did something go wrong with training the filter again?
- What is the correct way to train this filter?

Thanks, Igor

LogSat
Admin Group
Posted: 16 June 2004 at 11:18pm

Igor,

Can you please zip us your corpus directory so we can take a look at your corpus database?

Can you also please run the following query on the SpamFilter database and let us know the results?

SELECT tblQuarantine.RejectID, tblRejectCodes.RejectDesc, COUNT(tblQuarantine.RejectID) AS Total
FROM tblQuarantine
INNER JOIN tblRejectCodes ON tblQuarantine.RejectID = tblRejectCodes.RejectID
GROUP BY tblQuarantine.RejectID, tblRejectCodes.RejectDesc
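The query counts quarantined messages per reject reason, which shows how much of the quarantine each filter is responsible for. The sketch below runs the same query against an in-memory SQLite database as a stand-in. Only the table and column names that appear in the query come from this thread; the `MsgID` column, the sample rows, and the reject descriptions are made up for illustration, and the real SpamFilter database may use a different engine.

```python
import sqlite3

QUERY = """
SELECT tblQuarantine.RejectID, tblRejectCodes.RejectDesc,
       COUNT(tblQuarantine.RejectID) AS Total
FROM tblQuarantine
INNER JOIN tblRejectCodes ON tblQuarantine.RejectID = tblRejectCodes.RejectID
GROUP BY tblQuarantine.RejectID, tblRejectCodes.RejectDesc
"""


def demo_connection():
    """Build a tiny in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblRejectCodes (RejectID INTEGER PRIMARY KEY, RejectDesc TEXT);
        CREATE TABLE tblQuarantine (MsgID INTEGER PRIMARY KEY, RejectID INTEGER);
        INSERT INTO tblRejectCodes VALUES (1, 'keyword filter'), (2, 'bayesian filter');
        INSERT INTO tblQuarantine (RejectID) VALUES (1), (1), (1), (2);
    """)
    return conn


def reject_totals(conn):
    """Return (RejectID, RejectDesc, Total) rows, sorted by RejectID."""
    return sorted(conn.execute(QUERY).fetchall())


if __name__ == "__main__":
    for reject_id, desc, total in reject_totals(demo_connection()):
        print(reject_id, desc, total)
```

With the sample rows above, most of the quarantine is attributed to the keyword filter and very little to the Bayesian filter, which is the kind of imbalance the query is meant to reveal.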

Roberto F.
LogSat Software

nippe
Newbie
Joined: 03 February 2005
Posted: 23 June 2004 at 1:49pm

You wrote:

When an email is taken out of the quarantine, SpamFilter will know and will learn that similar emails are probably going to be legitimate.

Q: Are "deliver" and "delete" the same thing (in this case)?

LogSat
Admin Group
Posted: 24 June 2004 at 12:58am

Not sure I understood the question.

Email in the quarantine has already been "learned" by the filter as being bad. If it's a false positive, the only way for the filter to "unlearn" about it being bad is when a user (or an admin) forces that email to be delivered to the end user. Only when that happens does the filter reverse the score; it then goes on to assign different probabilities to the tokens so that they are more likely to be considered "good" in the future.
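The reversal described above can be pictured as moving token counts from the "bad" side to the "good" side. This is only a sketch under that assumption; the class `Corpus` and its methods are hypothetical, and SpamFilter's real bookkeeping is not shown in this thread.

```python
from collections import Counter


class Corpus:
    """Toy token store (hypothetical) with separate spam and legitimate counts."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def learn_spam(self, tokens):
        # Called when a message is quarantined: its tokens count as "bad".
        self.spam.update(set(tokens))

    def release_from_quarantine(self, tokens):
        # Called when a user or admin forces delivery: undo the "bad" count
        # and record the same tokens as "good" instead.
        for token in set(tokens):
            if self.spam[token] > 0:
                self.spam[token] -= 1
            self.ham[token] += 1


if __name__ == "__main__":
    corpus = Corpus()
    message = ["invoice", "attached", "monday"]
    corpus.learn_spam(message)
    corpus.release_from_quarantine(message)
    print(corpus.spam["invoice"], corpus.ham["invoice"])
```

After the release, later messages that share those tokens have less evidence against them and more evidence in their favour.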

Roberto F.
LogSat Software

nippe
Newbie
Posted: 27 June 2004 at 8:10am

the only way for the filter to "unlearn" about it being bad is when a user (or an admin) forces that email to be delivered

OK, and deleting a mail from the quarantine does not teach the filter anything (or make it unlearn anything)?
