
Bayesian Filter "Tip"

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=4051
Printed Date: 31 July 2025 at 4:56pm


Topic: Bayesian Filter "Tip"
Posted By: Desperado
Subject: Bayesian Filter "Tip"
Date Posted: 26 July 2004 at 4:14pm

All,

On my higher-volume ISP mail server, I was getting enough false positives after a couple of days of running that I disabled the filter due to customer complaints.

About 2 weeks ago, I changed the INI setting to "CleanUpCorpusIntervalDays=1".

I cleared the corpus and started over.  The result is that I am not getting nearly as many false positives (none that I have personally seen), and I have not heard a peep out of my customers.  I am also getting LOTS of Bayesian matches, so it has not hurt my filtering at all.  I am not 100% sure, but I think that because I am also scanning the headers with the setting "ScanReceivedHeaders=1", the Bayesian filter was getting overly aggressive with messages whose headers had some of the same components as some of the spam.

I may, in fact, be "all wet" on this point, but the end results seem fantastic, so I am happy.  On my lower-traffic servers, I did not have this issue.

Regards,

Dan S.




Replies:
Posted By: bpogue99
Date Posted: 29 July 2004 at 11:55am
I had noticed on one of my servers that I get unusually large numbers of false positives. I've tried clearing and restarting the corpus but it doesn't seem to change the numbers. I'll try this tip. Thanks for the pointers Dan!!


Posted By: Guests
Date Posted: 21 September 2004 at 5:46pm

One thing I am wondering now about the Bayesian filter: when it checks the headers, how does it decide to classify the parts that are constantly repetitive, such as the entry for our mail server and the entry for the spam filter itself?

The reason I ask is that I have been getting a huge number of false positives over the past couple of days, since upgrading to 2.1.1.367.  I turned on the option to show the tokens in the logs so I could get a better idea of what the heck is going on, and noticed that these entries were tokens in the database.

Now I'm wondering if I should turn off the function to check the headers.  I want to do whatever is most efficient and effective.



Posted By: LogSat
Date Posted: 23 September 2004 at 12:16am
Fred,

The statistical filter examines both good and bad emails and assigns scores to each word accordingly. If an entry is in a "good" email, it will be assigned a much higher score than the same entry in a "bad" email. It's not only headers that are repetitive; so are email addresses and IP addresses, for example. The combination of *all* these scores counts towards the final outcome, so a single, possibly high, spam score for a server name will not make the difference.
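
To make the point about combined scores concrete, here is a minimal Python sketch of a generic, Paul Graham-style naive Bayes combination. It is an illustration only and not necessarily how Spam Filter ISP computes its scores: the token names, counts, corpus sizes, and the 0.01/0.99 clamping are all assumptions made up for the example. It shows that a token seen equally often in good and bad mail (such as your own mail server's name in the headers) comes out neutral, and that one spammy-looking server name is outweighed by the other tokens in the message.

```python
from math import prod


def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Estimate how 'spammy' one token is from its corpus counts."""
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: treat as neutral
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so that no single token can be treated as absolute proof.
    return min(0.99, max(0.01, p))


def combined_spam_probability(probs):
    """Combine per-token probabilities into one message-level score."""
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical corpus: token -> (times seen in spam, times seen in good mail)
N_SPAM = N_HAM = 500
corpus = {
    "mail.example.com": (500, 500),   # our own server: appears in every header
    "relay.example.net": (450, 50),   # a server name that leans towards spam
    "invoice": (5, 120),
    "meeting": (2, 150),
}


def score(tokens):
    """Score a message given as a list of tokens; unknown tokens are skipped."""
    probs = [
        token_spam_probability(*corpus[t], N_SPAM, N_HAM)
        for t in tokens
        if t in corpus
    ]
    return combined_spam_probability(probs) if probs else 0.5


# The ever-present server name is neutral and does not move the result.
neutral = token_spam_probability(*corpus["mail.example.com"], N_SPAM, N_HAM)
without_header = score(["invoice", "meeting"])
with_header = score(["mail.example.com", "invoice", "meeting"])

# One spam-leaning server name is outweighed by the other tokens.
with_spammy_relay = score(["relay.example.net", "invoice", "meeting"])
```

In this toy corpus, "mail.example.com" gets a probability of exactly 0.5, so adding it to a message leaves the combined score essentially unchanged, and the message containing "relay.example.net" still scores well below 0.5 because "invoice" and "meeting" are strongly associated with good mail.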

Roberto F.

LogSat Software


