LogSat Software



johnsm (Newbie, joined 20 June 2005) | Topic: another bayesian question | Posted: 20 June 2005 at 3:33pm
I downloaded your software this weekend in an effort to find a decent solution to my spam woes. Everything seems to work fine so far. The documentation leaves a lot to be desired, but I managed nonetheless.

From what I read on your web site and in this forum, your Bayesian filter only learns in real time, from actual messages passing through it, and you say this is done so that the filter is customized to my environment. I have a collection of emails (about 10,000 each of spam and ham) that I would use to train any Bayesian filter I might try. They are my emails, hence my environment. From my understanding these emails are useless to your filter (yes, I realize there are ways of filtering other than the Bayesian filter alone), but I would like to train the Bayesian filter from them.

So, I have two questions which are basically the same.

Question 1: Am I wasting my time by writing a script to "redeliver" all of these messages through your program to a dummy account and then just deleting them? When I deliver the spam, I would tell the filter to consider it all spam and delete the messages; then I would deliver the ham and set the filter to let all of it pass through.

Question 2: When you force a quarantined message to be delivered, does the filter learn anything more than the "From" and "To" fields that are stored in the autoforcewhitelist file? Would it be better (though very time consuming) to redeliver all of my ham messages, let them get quarantined, and then force-deliver them, or is that useless as far as training the Bayesian filter goes?

Thanks.

LogSat (Admin Group, joined 25 January 2005, United States) | Posted: 21 June 2005 at 7:17pm
John,

Assuming that you are re-delivering exactly the same emails, with the original email source and headers (remember that some email clients completely change the email), then yes, the process in your 1st question would work and would train the Bayesian filter correctly.

For your 2nd question: when a false positive is force-delivered, the email is processed again by the Bayesian filter to re-train it and let it know that the email was actually good. This updates the corpus database with the new information so that similar emails are treated differently in the future.
	Roberto Franceschetti LogSat Software Spam Filter ISP
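
For anyone wanting to try the redelivery approach from question 1, here is a minimal sketch in Python. It is not part of the product and makes several assumptions: the saved messages are raw, unmodified `.eml` files in one directory, the filter accepts SMTP on the host and port shown, and the envelope addresses are placeholders. The names `FILTER_HOST`, `DUMMY_RCPT`, `normalize_eols`, `load_messages` and `redeliver` are all made up for illustration. The message bytes are passed to `smtplib` as-is, so the original headers and source are kept; only bare line endings are converted to CRLF for transport.

```python
import re
import smtplib
from pathlib import Path

# Hypothetical values: adjust to your own setup.
FILTER_HOST = "127.0.0.1"              # where the spam filter listens (assumed)
FILTER_PORT = 25
ENVELOPE_FROM = "trainer@example.com"  # placeholder envelope sender
DUMMY_RCPT = "bayes-dummy@example.com" # placeholder dummy account


def normalize_eols(raw: bytes) -> bytes:
    """Convert bare CR or LF line endings to CRLF, as SMTP expects."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", raw)


def load_messages(directory):
    """Yield (path, raw_bytes) for every .eml file, in sorted order."""
    for path in sorted(Path(directory).glob("*.eml")):
        yield path, normalize_eols(path.read_bytes())


def redeliver(directory, smtp):
    """Send each stored message through `smtp`; return the number sent."""
    sent = 0
    for _path, raw in load_messages(directory):
        # Passing bytes means smtplib does not re-encode the message,
        # so the original headers and body reach the filter unchanged.
        smtp.sendmail(ENVELOPE_FROM, [DUMMY_RCPT], raw)
        sent += 1
    return sent
```

A run might look like `with smtplib.SMTP(FILTER_HOST, FILTER_PORT) as smtp: redeliver("spam_corpus", smtp)`, done once for the spam directory and once for the ham directory, with the filter configured for each run as described in the first post. The directory name here is also just an example.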

