How Does Bayesian Learn?
Lee
Groupie
Joined: 04 February 2005 Location: United States Status: Offline Points: 50
Topic: How Does Bayesian Learn? Posted: 13 December 2004 at 4:03pm

I am trying to better understand how SpamFilter's Bayesian filter works and whether it is actually improving over time. My question is: where does the corpus db actually learn new words, and how does it weight them? Does it do this only from the keywords that are entered? It would seem to me that the Bayesian filter should scan every rejected email and consider it to be spam. For example, if spam is sent to an unknown address, shouldn't the filter add words from those emails to its db?

Lee
|
LogSat
Admin Group
Joined: 25 January 2005 Location: United States Status: Offline Points: 4106
Posted: 13 December 2004 at 10:11pm

Lee,

SpamFilter examines every single email that it receives (good and spam) and breaks it apart into tokens. Tokens from spam emails are weighted and assigned one score; tokens from clean emails are weighted differently and assigned a different score. All tokens are inserted into the corpus database, and as new emails arrive, the corpus is updated in real time and reloaded by SpamFilter every 10 minutes or so.

The tokens are retrieved from the email's source, not from any of the keywords or filters you have configured. The SpamFilter keywords and filters simply help the statistical filter determine what is spam and what is not; they are not added to the corpus.

Furthermore, when a user force-delivers a valid email that was mistakenly placed in quarantine, that email is also processed further by SpamFilter: its tokens are "tagged" to reflect a false positive, and the corpus database is updated to account for that fact, reducing the chance of a similar mistake happening in the future.

Roberto F.
LogSat Software
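To make the token-scoring idea above more concrete, here is a minimal sketch of a generic Bayesian token filter in Python. It is not SpamFilter's implementation: the tokenizer, the add-one smoothing, and the naive-Bayes combination with equal priors are all assumptions, and the `Corpus`, `train`, `reclassify`, and `score` names are hypothetical. `reclassify` loosely mimics the force-delivery case by moving a message's tokens from the spam counts to the good counts.

```python
import math
import re
from collections import Counter

# Assumption: a very simple tokenizer; SpamFilter's real tokenizer is not described here.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")

def tokenize(source: str) -> set[str]:
    """Split a message's raw source into its set of distinct lowercase tokens."""
    return set(TOKEN_RE.findall(source.lower()))

def _sigmoid(x: float) -> float:
    """Numerically safe logistic function (log-odds -> probability)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

class Corpus:
    """Toy corpus: for each token, how many spam and good messages contained it."""

    def __init__(self) -> None:
        self.spam, self.good = Counter(), Counter()
        self.n_spam = self.n_good = 0

    def train(self, source: str, is_spam: bool) -> None:
        tokens = tokenize(source)
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.good.update(tokens)
            self.n_good += 1

    def reclassify(self, source: str, now_spam: bool) -> None:
        """Undo a wrong verdict (e.g. a force-delivered false positive) and retrain."""
        tokens = tokenize(source)
        if now_spam:
            self.good.subtract(tokens)
            self.n_good -= 1
        else:
            self.spam.subtract(tokens)
            self.n_spam -= 1
        self.train(source, now_spam)

    def token_prob(self, token: str) -> float:
        """P(spam | token) from add-one smoothed per-class frequencies."""
        s = (self.spam[token] + 1) / (self.n_spam + 2)
        g = (self.good[token] + 1) / (self.n_good + 2)
        return s / (s + g)

    def score(self, source: str) -> float:
        """Naive-Bayes combination of known tokens, assuming equal class priors."""
        log_odds = 0.0
        for t in tokenize(source):
            if self.spam[t] + self.good[t] > 0:  # ignore tokens never seen before
                p = self.token_prob(t)
                log_odds += math.log(p) - math.log(1.0 - p)
        return _sigmoid(log_odds)

c = Corpus()
c.train("Cheap pills, buy now! Free offer", is_spam=True)
c.train("Buy cheap watches now, limited offer", is_spam=True)
c.train("Meeting agenda for Monday, see attached notes", is_spam=False)
c.train("Lunch on Monday? Notes from the meeting attached", is_spam=False)
print(round(c.score("cheap offer, buy now"), 3))  # 0.988
print(round(c.score("monday meeting notes"), 3))  # 0.036
```

The add-one smoothing keeps a token seen only in spam from reaching a probability of exactly 1, so no single word can force a verdict on its own.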
|
mikek
Senior Member
Joined: 22 February 2005 Location: Switzerland Status: Offline Points: 133
Posted: 16 December 2004 at 3:55am

I am still convinced that for the Bayesian filter to be effective, there should be some way to report false negatives. Right now, if you don't keep a lengthy list of keywords, too much spam still gets through and is tagged as "good" email in the Bayesian filter, which renders the Bayesian filter useless.
|
keizersozay
Groupie
Joined: 26 January 2005 Location: United States Status: Offline Points: 77
Posted: 16 December 2004 at 2:37pm

I've been thinking about what you said, and I am going to try something. Hopefully Roberto can tell me whether this will work. What about adding an email address such as 'reportspam@spam.mycompanyname' to the 'blacklist to' file in SpamFilter, with the ':nondr' option? Notice that I am not using a real domain. Now set up a distribution list on your email server called 'reportspam' and include the above email address. (You may have to first set up the above email address as a contact, then add the contact to the distribution list.) Then set up your email server to forward email for that address to SpamFilter using a smarthost... I think that can be done. This would automatically add the contents of the email to the Bayesian filter and help it be more effective.
|
LogSat
Admin Group
Joined: 25 January 2005 Location: United States Status: Offline Points: 4106
Posted: 17 December 2004 at 12:08am

If I understood your idea correctly, end users would be forwarding the spam to the reporting address. If so, unfortunately I do not believe that will yield accurate results. The statistical filter works on the email's source, and if a user has Microsoft Outlook, for example, that client *completely* changes the email's source. When the user forwards a message, it is completely different from the original, so new messages similar to the original won't even contain the tokens the end user has forwarded. Unless the user is able to forward the *full*, unmodified source of the original email, this will not work as expected.

Roberto F.
LogSat Software
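A rough illustration of why forwarding loses information: the Python sketch below tokenizes a hypothetical raw spam message and a hypothetical plain-text forward of it, then measures how many distinct tokens they share (Jaccard similarity). The messages, host names, and tokenizer are invented for illustration only; they do not reproduce what Outlook or any other client actually emits.

```python
import re

def tokens(source: str) -> set[str]:
    """Distinct lowercase tokens; the character class is an assumed, simple rule."""
    return set(re.findall(r"[a-z0-9.$'-]+", source.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens two messages have in common (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical raw spam as received by the server (all hosts are made up).
ORIGINAL = """Received: from mx.spam-host.test ([203.0.113.7])
X-Mailer: BulkBlaster 3
Content-Type: text/html

<html><font color=red>CHEAP PILLS</font> <a href="http://pills.spam-host.test">buy now</a></html>
"""

# Hypothetical plain-text forward of the same message by an end user.
FORWARDED = """From: Jane User <jane@mycompany.test>
Subject: FW: CHEAP PILLS
Content-Type: text/plain

-----Original Message-----
From: Deals
CHEAP PILLS buy now
"""

orig, fwd = tokens(ORIGINAL), tokens(FORWARDED)
print(f"shared: {jaccard(orig, fwd):.2f}")  # prints "shared: 0.23"
print("lost:", sorted(orig - fwd))          # relay IP, spam host names, HTML markup
print("added:", sorted(fwd - orig))         # the forwarder's own headers and text
```

In this toy case the tokens that best identify the spam source (the relay IP and the spammer's host names) are exactly the ones the forward drops.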
|
LogSat
Admin Group
Joined: 25 January 2005 Location: United States Status: Offline Points: 4106
Posted: 17 December 2004 at 12:13am

Mike,

While your statement is correct (SpamFilter's statistical filter could be trained better if we could feed it false negatives), please note that a properly configured SpamFilter will block huge amounts of spam. Over the past months, for example, our own server has recorded the following email counts:

[Messages]
Spam=9105146
Good=559984

We receive roughly 16 times more spam than good email. Still, under these conditions, during the past 3 days we blocked 100,000 emails using the MAPS filter, 90,000 using blacklisted countries, and only 1,000 using Bayesian filtering.

The statistical filter is one of the last ones applied as email arrives; other filters run first. And even though 1,000 may seem small compared to 100,000, it still means that we received one thousand fewer spam emails in three days.

Roberto F.
LogSat Software
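The point about filter order can be sketched as a chain in which the first rejecting filter gets the credit, so a filter late in the chain only sees what earlier ones let through. The Python below is a hypothetical model: the message fields, the placeholder country code "XX", and the 0.9 Bayesian threshold are assumptions, not SpamFilter settings. It also computes the ratios from the counts quoted above.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical message representation: a dict of precomputed facts about the email.
Message = dict
Filter = tuple[str, Callable[[Message], bool]]

# Assumed chain order: list/country checks first, the statistical filter last.
CHAIN: list[Filter] = [
    ("MAPS", lambda m: m["listed_in_maps"]),
    ("Country", lambda m: m["country"] in {"XX"}),   # "XX" is a placeholder code
    ("Bayesian", lambda m: m["bayes_score"] > 0.9),  # assumed threshold
]

def first_rejecting_filter(msg: Message, chain: list[Filter]) -> Optional[str]:
    """Return the name of the first filter that rejects msg, or None if it passes."""
    for name, rejects in chain:
        if rejects(msg):
            return name
    return None

SAMPLE = [
    {"listed_in_maps": True, "country": "XX", "bayes_score": 0.99},   # credited to MAPS
    {"listed_in_maps": False, "country": "XX", "bayes_score": 0.99},  # credited to Country
    {"listed_in_maps": False, "country": "US", "bayes_score": 0.95},  # credited to Bayesian
    {"listed_in_maps": False, "country": "US", "bayes_score": 0.10},  # delivered
]
tally = Counter(first_rejecting_filter(m, CHAIN) for m in SAMPLE)

# Ratios from the counts in the post above.
spam_to_good = 9105146 / 559984                   # about 16.3 spam per good email
bayes_share = 1_000 / (100_000 + 90_000 + 1_000)  # Bayesian share of the 3-day blocks
print(tally, round(spam_to_good, 1), f"{bayes_share:.2%}")
```

Three of the four sample messages would score above the Bayesian threshold, yet Bayesian filtering is credited with only one rejection, which is why its count looks small next to MAPS even when it is doing useful work.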
|