Bayesian Filter "Tip"
Desperado, Senior Member. Joined: 27 January 2005. Location: United States. Points: 1143.
Posted: 26 July 2004 at 4:14pm
All,

On my higher-volume ISP mail server, I was getting enough false positives after a couple of days of running that I had disabled the filter due to customer complaints. About two weeks ago, I changed the INI setting "CleanUpCorpusIntervalDays=1", cleared the corpus, and started over. The result is that I am not getting nearly as many false positives (none that I have personally seen), and I have not heard a peep out of my customers. I am also getting LOTS of Bayesian matches, so it has not hurt my filtering at all.

I am not 100% sure, but I think that because I am also scanning the headers with the setting "ScanReceivedHeaders=1", the Bayesian filter was getting overly aggressive with messages whose headers had some of the same components as some of the spam. I may, in fact, be "all wet" on this point, but the end results seem to be fantastic, so I am happy.

On my lower-traffic servers, I did not have this issue.

Regards,
Dan S.
bpogue99, Groupie. Joined: 26 January 2005. Points: 59.
I had noticed on one of my servers that I get unusually large numbers of false positives. I've tried clearing and restarting the corpus but it doesn't seem to change the numbers. I'll try this tip. Thanks for the pointers Dan!!
fdickey, Guest Group.
One thing I am wondering now about the Bayesian filter is this: when it checks the headers, how does it classify the parts that are constantly repetitive, such as the entry for our mail server and the entry for the spam filter itself? The reason I ask is that I have been getting a huge number of false positives over the past couple of days, since upgrading to 2.1.1.367. I turned on the option to show the tokens in the logs so I could get a better idea of what the heck is going on, and noticed that these entries were tokens in the database. Now I'm wondering if I should turn off the function that checks the headers. I want to do whatever is most efficient and effective.
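
To make the question about repetitive header entries concrete, here is a minimal sketch of how a generic textbook Bayesian filter might score a single token from its counts in the spam and good-mail corpora. It is not SpamFilter's actual formula, which this thread does not show; the function name, counts, and corpus sizes are all hypothetical. In this sketch a value near 1.0 means "spam-like" and 0.5 means "neutral".

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Estimate how spam-like one token is from per-corpus counts.

    Generic textbook estimate (assumption), not LogSat's implementation.
    spam_count / ham_count: messages in each corpus containing the token.
    n_spam / n_ham: total messages in each corpus.
    """
    spam_freq = spam_count / n_spam  # share of spam messages with the token
    ham_freq = ham_count / n_ham     # share of good messages with the token
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen before: treat as neutral
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical corpus sizes: 3000 spam messages, 1000 good messages.
# A header entry for the local mail server appears in every message of
# both corpora, so it comes out neutral even though the corpora differ in size.
always_present = token_spam_probability(3000, 1000, 3000, 1000)

# A hypothetical token seen in 40% of spam but only 0.5% of good mail.
mostly_spam = token_spam_probability(1200, 5, 3000, 1000)
```

Under these assumptions, a token that is present in every message carries no evidence either way. A repetitive header entry would only skew results if it showed up much more often in one corpus than in the other.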
LogSat, Admin Group. Joined: 25 January 2005. Location: United States. Points: 4105.
Fred,

The statistical filter examines both good and bad emails and assigns scores to each word accordingly. If an entry is in a "good" email it will be assigned a much higher score than the same entry in a "bad" email. It's not only headers that are repetitive; so are email addresses and IP addresses, for example. The combination of *all* these scores counts towards the final outcome, so a single possibly high spam score for a server name will not make the difference.

Roberto F.
LogSat Software
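
The point about combining *all* the scores can be illustrated with a minimal sketch of the standard naive Bayes combination used by many statistical spam filters. This is an assumption for illustration only, not SpamFilter's actual code: the function name and the per-token values are hypothetical, and here a higher value means more spam-like, which may not match the direction of the scores SpamFilter logs.

```python
import math


def combine_token_scores(probabilities):
    """Combine per-token spam probabilities into one message score.

    Standard naive Bayes combination (assumption, not LogSat's code):
        P = prod(p) / (prod(p) + prod(1 - p))
    computed in log space to avoid underflow on long messages.
    """
    # Clamp so that log(0) can never occur.
    clamped = [min(max(p, 0.01), 0.99) for p in probabilities]
    log_spam = sum(math.log(p) for p in clamped)
    log_ham = sum(math.log(1.0 - p) for p in clamped)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Hypothetical message: one server-name token that looks very spam-like
# (0.95) plus five tokens that lean towards good mail (0.2 each).
message_score = combine_token_scores([0.95, 0.2, 0.2, 0.2, 0.2, 0.2])

# The same spam-like token on its own, for comparison.
single_token_score = combine_token_scores([0.95])
```

With these made-up numbers, the lone spam-like token would score 0.95 by itself, but the combined message score falls well below 0.5 once the other tokens are counted. That is the behaviour described above: one high-scoring server name does not decide the outcome.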