Spam Filter ISP Support Forum


another bayesian question

johnsm

Joined: 20 June 2005
Status: Offline
Points: 2
    Posted: 20 June 2005 at 3:33pm

I downloaded your software this weekend in an effort to find a decent solution to my spam woes. Everything seems to work fine so far. The documentation leaves a lot to be desired, but I managed nonetheless.

From what I have read on your web site and in this forum, your Bayesian filter learns only in real time, from actual messages passing through it, and you say this is done so that the filter is customized to my environment. I have a collection of emails (about 10,000 each of spam and ham) that I would use to train any Bayesian filter I might try. They are my emails, hence my environment. As I understand it, these emails are useless to your filter as they stand (yes, I realize there are ways of filtering other than relying on the Bayesian filter exclusively), but I would like to train the Bayesian filter from them.

So, I have two questions which are basically the same. Question 1: Would I be wasting my time by writing a script to "redeliver" all of these messages through your program to a dummy account? When I deliver the spam, I would tell the filter to treat it all as spam and then delete it; then I would deliver the ham and set the filter to let all of it pass through.

Question 2: When you force a quarantined message to be delivered, does the filter learn anything more than the "From" and "To" message fields that are stored in the autoforcewhitelist file? Would it be better (though very time consuming) to redeliver all of my ham messages, let them get quarantined, and then force deliver them, or is that useless as far as training the Bayesian filter goes?





LogSat
Admin Group

Joined: 25 January 2005
Location: United States
Status: Offline
Points: 4077
Posted: 21 June 2005 at 7:17pm

Assuming that you are re-delivering *exactly* the same emails, with the original email source and headers (remembering that some email clients completely change the email), then yes, the process in your 1st question would work and would train the Bayesian filter correctly.
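A redelivery script along these lines might look like the following minimal sketch. It assumes the saved messages sit one per file as raw RFC 822 source, and that the filter accepts SMTP on a host and port you supply; the directory names, addresses, and function names are all hypothetical. The raw bytes are passed to `smtplib` untouched, since a byte string given to `sendmail` is not re-encoded, which keeps the original source and headers intact.

```python
import smtplib
from pathlib import Path


def load_raw_messages(directory):
    """Yield (path, raw_bytes) for every file in `directory`, in name order.

    The bytes are read as-is; nothing is parsed or re-encoded, so the
    original headers and body stay exactly as they were saved.
    """
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            yield path, path.read_bytes()


def redeliver(directory, host, port, sender, recipient):
    """Send every saved message through the filter and return the count sent.

    `sender` and `recipient` are SMTP envelope addresses only (assumed
    values, e.g. a dummy mailbox); the message headers are not rewritten.
    """
    sent = 0
    with smtplib.SMTP(host, port) as smtp:
        for _path, raw in load_raw_messages(directory):
            smtp.sendmail(sender, [recipient], raw)
            sent += 1
    return sent


# Hypothetical usage: one pass per corpus, switching the filter's
# handling between passes as described in the question.
#   redeliver("corpus/spam", "127.0.0.1", 25, "trainer@example.com", "dummy@example.com")
#   redeliver("corpus/ham",  "127.0.0.1", 25, "trainer@example.com", "dummy@example.com")
```

Note that the envelope sender here is a single fixed address, so any filter rule that looks at the envelope rather than the headers would see that address and not the original one.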

For your 2nd question: when a false positive is force-delivered, the email is processed again by the Bayesian filter to retrain it and let it know that the email was actually good. This updates the corpus database with the new info so that future similar emails are treated differently.
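To illustrate the general idea of retraining on a false positive, here is a toy sketch of a token-count corpus. It is not Spam Filter ISP's actual corpus format or scoring, which this thread does not describe; the `ToyCorpus` class, its tokenizer, and the smoothing are assumptions chosen for brevity, as is the choice to undo the earlier spam learning before relearning the message as ham.

```python
import math
import re
from collections import Counter


class ToyCorpus:
    """Toy naive Bayes corpus: per-token message counts for spam and ham."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    @staticmethod
    def tokens(text):
        # Distinct lowercase words; a real filter tokenizes far more carefully.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, is_spam):
        (self.spam if is_spam else self.ham).update(self.tokens(text))
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def retrain_as_ham(self, text):
        """Undo an earlier spam classification, then learn the message as ham."""
        self.spam.subtract(self.tokens(text))
        self.spam += Counter()  # in-place add drops zero and negative counts
        self.n_spam = max(0, self.n_spam - 1)
        self.train(text, is_spam=False)

    def spam_probability(self, text):
        # Sum of log-odds with add-one smoothing, mapped back to a probability.
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for tok in self.tokens(text):
            p_spam = (self.spam[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))
```

In this toy model, a message that was first learned as spam scores lower once `retrain_as_ham` has been called on it, and so do later messages that share its tokens.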
Roberto Franceschetti

LogSat Software

Spam Filter ISP