The keyword filter now also searches in the "Received:" headers |
| Author | |
Alan
Guest Group
|
Topic: The keyword filter now also searches in the "Received:" headers
Posted: 28 April 2004 at 6:09pm |
|
I noticed in the newer pre-release versions: // New to VersionNumber = '2.0.1.333'; I think this will be a big plus for the Bayesian filtering, in that spam routed through the same open relays and/or sent from the same sources will be filtered. Roberto, can you clarify the limitations of this new feature and how you see it being best used? |
|
|
LogSat
Admin Group
Joined: 25 January 2005 Location: United States Status: Offline Points: 4106 |
Posted: 28 April 2004 at 11:48pm |
|
Alan, How to use the ability to search the headers is something we'll leave to the user's inventiveness. What SpamFilter does is retrieve all the "Received:" header values and add them to the body of the email, so that the keyword filter scans through them as well. We have not included them in the Bayes analysis yet, because during our initial testing (which included all the other headers as well) we were losing some performance. This is something we may revisit in the near future, however. Roberto F. |
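To make the mechanism described above concrete, here is a minimal Python sketch of the idea: collect every "Received:" header value, append it to the body text, and run a plain keyword match over the result. This is not SpamFilter's actual code. The function names `searchable_text` and `keyword_hits` are hypothetical, and the sketch assumes a simple, non-multipart message and case-insensitive substring matching.

```python
from email import message_from_string


def searchable_text(raw_message: str) -> str:
    """Return the body text with all Received: header values appended."""
    msg = message_from_string(raw_message)
    # get_all() returns None when the header is absent, hence the fallback.
    received = msg.get_all("Received") or []
    payload = msg.get_payload()
    # Assumption: only plain (non-multipart) bodies are handled in this sketch.
    body = payload if isinstance(payload, str) else ""
    return body + "\n" + "\n".join(received)


def keyword_hits(raw_message: str, keywords: list[str]) -> list[str]:
    """Return the keywords found in the body or in any Received: header."""
    text = searchable_text(raw_message).lower()
    return [k for k in keywords if k.lower() in text]
```

With something like this, a keyword such as the host name of a known open relay would match even when it appears only in the routing headers and never in the message body.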
|
|
Alan
Guest Group
|
Posted: 29 April 2004 at 12:15pm |
|
Roberto, can you make it so that the header info (or just parts of it, such as "Received:") can be included in Bayesian filtering as an option? Maybe using a check box? I think it would be a really powerful new tool for catching spam that passed through some of the known open relays that spammers find and continue to reuse. That way, those who have powerful hardware can take advantage of the feature, and those who don't need or want a performance hit can leave it turned off. |
|
|
LogSat
Admin Group
Joined: 25 January 2005 Location: United States Status: Offline Points: 4106 |
Posted: 30 April 2004 at 1:04am |
|
Alan, We're testing build 2.0.1.345 which, as you requested, looks at all the Received: headers in the Bayesian filtering. More testing will be needed to see the effect this has on performance and on the average size increase of the corpus database. This build is available for download in the registered user area on our website. If you'd like to test it, we'd like to hear back from you on how it's performing. Roberto F. |
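As a rough illustration of what feeding the Received: headers into a Bayesian filter can look like, the Python sketch below turns header values into tokens alongside the body tokens. How build 2.0.1.345 actually tokenizes is not stated in this thread, so everything here is an assumption: the function name `bayes_tokens`, the `received:` token prefix, the regular expression, and the `include_received` switch (modeled on the check box requested earlier).

```python
import re
from email import message_from_string

# Assumption: a token is a run of letters, digits, dots and hyphens,
# which keeps host names and IP addresses together as single tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.\-]*")


def _tokens(text: str) -> list[str]:
    """Lower-case tokens with trailing punctuation stripped."""
    return [t.lower().rstrip(".-") for t in TOKEN_RE.findall(text)]


def bayes_tokens(raw_message: str, include_received: bool = True) -> list[str]:
    """Return body tokens, optionally followed by prefixed Received: tokens."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload()
    # Assumption: only plain (non-multipart) bodies are handled in this sketch.
    body = payload if isinstance(payload, str) else ""
    tokens = _tokens(body)
    if include_received:
        for header in msg.get_all("Received") or []:
            # The prefix keeps header tokens separate from identical body words.
            tokens += ["received:" + t for t in _tokens(header)]
    return tokens
```

Every distinct relay name and IP address becomes an extra token, which is one plausible reason the corpus database would grow when headers are included.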
|
|
LogSat
Admin Group
Joined: 25 January 2005 Location: United States Status: Offline Points: 4106 |
Posted: 30 April 2004 at 1:08am |
|
As a follow-up to the previous answer: if you use the new 345 build, you will probably need to start with a fresh corpus for accurate results, so that the Received headers carry the proper weight in the corpus database. Roberto F. |
|
|
Alan
Guest Group
|
Posted: 30 April 2004 at 12:13pm |
|
I am giving the 345 release a try. It may take a few days to build up tokens. It looks like you haven't implemented any way to turn the feature on or off? One possible problem did come to mind. If you use backup spooling servers in your MX record, some spammers target them as a secondary entryway. If you get a lot of spam sent this way, I suspect the spooling servers could eventually be detected as spam sources by the Bayesian filtering. Roberto, does this sound correct? |
|
|
LogSat
Admin Group
Joined: 25 January 2005 Location: United States Status: Offline Points: 4106 |
Posted: 30 April 2004 at 11:02pm |
|
I would let statistics do their work... If spammers send mail to your backup MX server, most of the email you receive from it will be spam. The Received headers will contain your backup's IP, and they will then be taken into consideration. When SpamFilter receives email from your backup, it will see the IP/server name in the Received headers, which will slightly increase the probability that the message is spam, but this is correct since most email from your backup is spam. If the message is good to begin with, statistically the number of "good" tokens will likely make up for the "bad" score caused by the IP. This is all theory, however; actual use will prove its validity. Roberto F. |
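The argument above can be checked with a little arithmetic. The Python sketch below combines per-token spam probabilities using the naive Bayes formula popularized by Paul Graham, P = prod(p) / (prod(p) + prod(1 - p)). SpamFilter's actual formula is not given in this thread, so this is only an assumption chosen for illustration; the function name `combined_spam_probability`, the clamping bounds, and the token probabilities are all made up.

```python
from math import prod


def combined_spam_probability(token_probs: list[float]) -> float:
    """Combine per-token spam probabilities, Graham-style naive Bayes."""
    # Clamp so that no single token can force a 0 or 1 result
    # (and so the denominator can never be zero).
    clamped = [min(0.99, max(0.01, p)) for p in token_probs]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


# Hypothetical numbers: the backup MX's IP token leans spammy (0.8),
# while three ordinary tokens in a legitimate message lean good (0.1 each).
good_mail_via_backup = combined_spam_probability([0.8, 0.1, 0.1, 0.1])

# The same IP token next to two spammy body tokens (0.9 each).
spam_via_backup = combined_spam_probability([0.8, 0.9, 0.9])
```

Under these made-up numbers, the legitimate message scores about 0.005 and the spam message about 0.997, so a single "bad" IP token is easily outweighed by a few "good" tokens, which is consistent with the reasoning in the post.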
|
|