Another Bayesian question
johnsm (Newbie, joined 20 June 2005)
Posted: 20 June 2005 at 3:33pm
I downloaded your software this weekend in an effort to find a decent solution to my spam woes. Everything seems to work fine so far. The documentation leaves a lot to be desired, but I managed nonetheless.

From what I read on your web site and in this forum, your Bayesian filter only learns in real time, from actual messages passing through it, and you say this is done so that the filter is customized to my environment. I have a collection of emails (about 10,000 each of spam and ham) that I would use to train any Bayesian filter I might try. They are my emails, hence my environment. As I understand it, these emails are useless to your filter as they stand (yes, I realize there are ways of filtering other than the Bayesian filter alone), but I would like to train the Bayesian filter from them. So I have two questions, which are basically the same:

Question 1: Am I wasting my time by writing a script to "redeliver" all of these messages through your program to a dummy account and then just delete them? When I deliver the spam, I would tell the filter to consider it all spam and then delete the messages; then I would deliver the ham and set the filter to let all of it pass through.

Question 2: When you force a quarantined message to be delivered, does the filter learn anything more than the "From" and "To" message fields that are stored in the autoforcewhitelist file? Would it be better (though very time consuming) to redeliver all of my ham messages, let them get quarantined, and then force-deliver them, or is that useless as far as training the Bayesian filter goes?

Thanks.
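For readers considering the same approach, the redelivery script described in Question 1 might look roughly like the sketch below. It is only an illustration under stated assumptions: the saved messages are assumed to be individual `.eml` files with CRLF line endings, the helper names `iter_raw_messages` and `redeliver` are made up for this example, and the host, port, and addresses in the usage comment are placeholders. The raw bytes are sent unchanged so that the original source and headers reach the filter as they were received.

```python
import smtplib
from pathlib import Path
from typing import Iterable, Iterator


def iter_raw_messages(folder: Path) -> Iterator[bytes]:
    """Yield every saved message in `folder` as raw, unmodified bytes.

    Assumption: one message per *.eml file, stored with CRLF line endings.
    """
    for path in sorted(folder.glob("*.eml")):
        yield path.read_bytes()


def redeliver(smtp, messages: Iterable[bytes], sender: str, recipient: str) -> int:
    """Send each raw message through an already-connected SMTP session.

    `smtp` only needs a sendmail(from_addr, to_addrs, msg) method, as
    smtplib.SMTP provides. Passing bytes means smtplib does not re-encode
    the message or alter its headers. Returns the number of messages sent.
    """
    sent = 0
    for raw in messages:
        smtp.sendmail(sender, [recipient], raw)
        sent += 1
    return sent


# Hypothetical usage (host, port, folder, and addresses are placeholders):
#
#   with smtplib.SMTP("localhost", 25) as smtp:
#       redeliver(smtp, iter_raw_messages(Path("corpus/spam")),
#                 "trainer@example.com", "dummy@example.com")
```

One caveat with this design: the envelope sender and recipient are supplied by the script, not taken from the stored message, so any rule that keys on the envelope will see the script's addresses and not the original ones.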
LogSat (Admin Group, joined 25 January 2005, United States)
John,

Assuming that you are redelivering *exactly* the same emails, with the original email source and headers (remember that some email clients completely change the email), then yes, the process in your first question would work and would train the Bayesian filter correctly.

For your second question: when a false positive is force-delivered, the email is processed again by the Bayesian filter to retrain it and let it know that the email was actually good. This updates the corpus database with the new information so that similar emails are treated differently in the future.
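To make the retraining idea concrete, here is a toy sketch of how a token-count corpus could shift after a false positive is relearned as good mail. This is not the product's implementation: the `TokenCorpus` class, its tokenizer, and the smoothed naive Bayes scoring are all assumptions chosen for illustration. In particular, the thread does not say whether the real filter also removes an earlier spam count, so the sketch only adds the ham observation.

```python
import math
import re
from collections import Counter


class TokenCorpus:
    """Toy per-token message counts for spam and ham (illustrative only)."""

    def __init__(self) -> None:
        self.spam = Counter()   # token -> number of spam messages containing it
        self.ham = Counter()    # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    @staticmethod
    def tokens(text: str) -> set:
        # Very naive tokenizer: lowercase alphanumeric runs, counted once per message.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def train(self, text: str, is_spam: bool) -> None:
        """Record one message in the corpus under the given label."""
        if is_spam:
            self.spam.update(self.tokens(text))
            self.n_spam += 1
        else:
            self.ham.update(self.tokens(text))
            self.n_ham += 1

    def spam_probability(self, text: str) -> float:
        """Naive Bayes estimate with add-one smoothing, combined in log-odds."""
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for tok in self.tokens(text):
            p_tok_spam = (self.spam[tok] + 1) / (self.n_spam + 2)
            p_tok_ham = (self.ham[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_tok_spam / p_tok_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


corpus = TokenCorpus()
corpus.train("cheap pills offer", is_spam=True)
corpus.train("meeting agenda tomorrow", is_spam=False)

false_positive = "quarterly offer from supplier"
before = corpus.spam_probability(false_positive)

# Forced delivery, modeled here as learning the message as ham.
corpus.train(false_positive, is_spam=False)
after = corpus.spam_probability(false_positive)
```

In this toy corpus the score for the forced message drops after retraining, and other messages sharing its tokens would also score lower, which matches the described effect of future similar emails being treated differently.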