
another bayesian question

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
Printed Date: 18 July 2018 at 4:13pm

Topic: another bayesian question
Posted By: johnsm
Subject: another bayesian question
Date Posted: 20 June 2005 at 3:33pm

I downloaded your software this weekend in an effort to find a decent solution to my spam woes. Everything seems to work fine so far. The documentation leaves a lot to be desired, but I managed nonetheless.

From what I have read on your website and in this forum, your Bayesian filter learns only in real time, from actual messages passing through it, and you say this is done so that the filter is customized to my environment. I have a collection of emails (about 10,000 each of spam and ham) that I would use to train any Bayesian filter I might try. They are my emails, hence my environment. As I understand it, these emails are useless to your filter as it stands (yes, I realize there are ways of filtering other than relying on the Bayesian filter exclusively), but I would like to train the Bayesian filter from them.

So, I have two questions, which are basically the same. Question 1: Would I be wasting my time by writing a script to "redeliver" all of these messages through your program to a dummy account? When I deliver the spam, I would tell the filter to consider all of it spam and then delete the messages; then I would deliver the ham and set the filter to let all of it pass through.

Question 2: When you force a quarantined message to be delivered, does the filter learn anything more than the "From" and "To" message fields that are stored in the autoforcewhitelist file? Would it be better (though very time-consuming) to redeliver all of my ham messages, let them get quarantined, and then force-deliver them, or is that useless as far as training the Bayesian filter goes?





Posted By: LogSat
Date Posted: 21 June 2005 at 7:17pm

Assuming that you are redelivering *exactly* the same emails, with the original email source and headers (keep in mind that some email clients completely change the email), then yes, the process in your first question would work and would train the Bayesian filter correctly.
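For what it's worth, a redelivery script along those lines could look roughly like the sketch below. It is only an illustration, not something shipped with Spam Filter ISP: it assumes the saved messages are raw `.eml` files in one folder, that the filter accepts SMTP on the host and port you pass in, and that a fixed envelope sender is acceptable. The names `load_raw_messages`, `redeliver`, and `smtp_factory` are made up for this example. The files are read and sent as bytes so that the original source and headers are not rewritten along the way.

```python
import smtplib
from pathlib import Path


def load_raw_messages(folder):
    """Yield (path, raw_bytes) for every .eml file in the folder, unmodified."""
    for path in sorted(Path(folder).glob("*.eml")):
        yield path, path.read_bytes()


def redeliver(folder, sender, recipient, host="127.0.0.1", port=25,
              smtp_factory=smtplib.SMTP):
    """Send each stored message through the filter and return how many were sent.

    host, port, sender and recipient are assumptions: point them at your own
    filter and at the dummy account. smtp_factory is only there so the
    function can be tried out without a live server.
    """
    sent = 0
    with smtp_factory(host, port) as smtp:
        for _path, raw in load_raw_messages(folder):
            # Passing bytes avoids re-encoding the stored message source.
            smtp.sendmail(sender, [recipient], raw)
            sent += 1
    return sent
```

You would run it once against the spam folder while the filter is set to treat everything as spam, for example `redeliver("spam", "trainer@example.com", "dummy@example.com")`, and then once against the ham folder with the filter set to let everything through. The folder names and addresses here are placeholders.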

For your second question: when a false positive is force-delivered, the email is processed again by the Bayesian filter to retrain it and let it know that the email was actually good. This updates the corpus database with the new information so that similar emails are treated differently in the future.
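To give a rough feel for why retraining on a force-delivered message changes later decisions, here is a toy token-count corpus. It is a generic textbook-style sketch and does not describe Spam Filter ISP's actual corpus database or scoring; the class `ToyCorpus` and its methods are invented for illustration.

```python
from collections import Counter


class ToyCorpus:
    """Generic token-count corpus; NOT Spam Filter ISP's real data structure."""

    def __init__(self):
        self.spam = Counter()   # number of spam messages containing each token
        self.ham = Counter()    # number of ham messages containing each token
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Count each distinct token of the message once, as spam or as ham."""
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def token_spam_probability(self, token):
        """Per-token estimate from relative frequencies; 0.5 for unseen tokens."""
        s = self.spam[token] / max(self.n_spam, 1)
        h = self.ham[token] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.5
        return s / (s + h)
```

In this toy model, a token seen only in spam scores 1.0. Once a good message containing the same token is trained as ham, which is roughly what a force-delivery does, the token's score moves toward neutral, so similar mail is less likely to be quarantined.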

Roberto Franceschetti
LogSat Software - Spam Filter ISP
