
Bayesian filter - How can I tell if it's working?

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=4067
Printed Date: 09 May 2025 at 1:18am


Topic: Bayesian filter - How can I tell if it's working?
Posted By: CyberBob
Subject: Bayesian filter - How can I tell if it's working?
Date Posted: 27 July 2004 at 4:01pm

I know this may sound crazy, but should we see emails in the quarantine that are marked "Bayesian Filtered" or something like that? I stopped the filter on Monday morning, deleted the corpus files, and restarted so we can "retrain" the filter. It has now been a couple of days, and the activity log shows Bayesian filtering at 0%, but I cannot find a single quarantined message. The slide bar is set near the middle, showing 98.75% filtering.

Am I overlooking the obvious here?

Thanks in advance,

Bob




Replies:
Posted By: Desperado
Date Posted: 27 July 2004 at 8:06pm

Bob,

First, the obvious: is "Learning" on? If so, look in the Corpus folder and open the corpus.ini file. It should look something like:

[Messages]
Spam=244275
Good=68470

The Bayesian filter will not start quarantining messages until both values are above 5000. This threshold is set in the SpamFilter.ini file as:

MinEmailsForBayesKickIn=5000

You can lower this number, but doing so increases the probability of false positives.

Let me know if this helps or if you are still having issues.
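The check described above can be sketched in a few lines of Python. This is only an illustration, assuming corpus.ini has the [Messages] layout quoted in this thread; the function name `bayes_ready` and its `min_emails` parameter are hypothetical, and the threshold is passed in by hand rather than read from SpamFilter.ini, since the thread does not show where in that file the setting lives.

```python
import configparser


def bayes_ready(corpus_ini_text: str, min_emails: int = 5000) -> bool:
    """Return True when both corpus counts are above the kick-in threshold.

    min_emails mirrors the MinEmailsForBayesKickIn value quoted in the thread.
    """
    parser = configparser.ConfigParser()
    parser.read_string(corpus_ini_text)
    spam = parser.getint("Messages", "Spam")
    good = parser.getint("Messages", "Good")
    # Both counts must be above the threshold, not just one of them.
    return spam > min_emails and good > min_emails


# Sample content copied from the corpus.ini example in this thread.
sample = """[Messages]
Spam=244275
Good=68470
"""

print(bayes_ready(sample))
```

To check a live install, the contents of corpus.ini from the Corpus folder could be read with `open(...).read()` and passed in the same way.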

Regards,

Dan S. (User)



Posted By: CyberBob
Date Posted: 28 July 2004 at 10:02am

Dan,

Thanks for the quick reply.

Yes, learning is on, and the corpus.ini file shows over 10,000 for both Spam and Good.

So, next question: if the Bayesian filter quarantines a message, is that stated in the log file? I've searched our logs and only found messages blocked by IP, keyword, attachments, etc. I'm looking for something that says "blocked by Bayesian filter" or something like that. Am I dreaming?

In the corpus directory there are a couple of .tmp files, a .dat, and a .prg that are all over 3 MB in size, so something is working. I'm just not sure I'm looking for the right results in the log file.

 



Posted By: CyberBob
Date Posted: 28 July 2004 at 10:12am

I finally found a few messages filtered by the Bayesian filter. They appear to have an ID of 14 in the reject code, but there are very few of them among over 250,000 quarantined messages. The Bayesian Filter Threshold seems to adjust itself. Could that be correct? It's currently set at 92.xx%. Is that too high? What is the recommended level to set it at?

I think at this point the filter is working; is it just set at too high a level to detect much?

 

Thanks,

Bob



Posted By: Desperado
Date Posted: 28 July 2004 at 10:57am

Hey,

The filter will get better and better as time goes on, depending on how good your standard filters are, because those are what the filter bases its statistics on. Also, I have my filter set at 99.something % in all cases.

Dan



Posted By: CyberBob
Date Posted: 28 July 2004 at 11:39am

Sorry for the continued questions, but are you saying that the higher you set the %, the better the filtering?

Also, the more we update our keyword filters, attachment filters, etc., the better the Bayesian filter becomes?

I thought the filter was there to lessen the number of manual filters we have to create, but you say it builds its list from the filters we have manually put in place?

I created a query, and since Monday, when I deleted the Corpus DB files and restarted, the Bayesian filter has only blocked 10 messages. We get tens of thousands per day, so something doesn't seem to be set up correctly yet?
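For anyone wanting to run the same kind of tally, here is a rough sketch in Python. It assumes only that the reject codes can be exported from the quarantine as a list of integers, and that code 14 marks Bayesian blocks, as observed earlier in this thread. The function name and the sample data are made up for illustration; this is not the actual query used above.

```python
from collections import Counter

# Assumption taken from this thread: reject code 14 appears to mark
# messages quarantined by the Bayesian filter.
BAYES_REJECT_CODE = 14


def bayes_share(reject_codes):
    """Return (bayes_count, total, fraction) for an iterable of reject codes."""
    counts = Counter(reject_codes)
    total = sum(counts.values())
    bayes = counts[BAYES_REJECT_CODE]
    share = bayes / total if total else 0.0
    return bayes, total, share


# Made-up sample codes, one per quarantined message.
codes = [3, 3, 14, 7, 3, 14, 9, 3]
bayes, total, share = bayes_share(codes)
print(f"{bayes} of {total} quarantined messages ({share:.1%}) have reject code 14")
```

A very small share soon after a retrain would be consistent with the filter still building its corpus rather than with a misconfiguration.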



Posted By: Desperado
Date Posted: 28 July 2004 at 12:02pm

First, no. The percentage is the level at which the filter blocks, and the recommended value is in the 99% range so that the filter does not get overly aggressive.
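A small Python sketch may make the threshold point concrete. It assumes the filter assigns each message a spam score in percent and blocks when the score reaches the threshold; whether SpamFilter uses "greater than" or "greater than or equal" is not stated here, and the scores below are invented.

```python
def should_quarantine(spam_score_percent: float, threshold_percent: float) -> bool:
    """Block only when the message's spam score reaches the threshold."""
    return spam_score_percent >= threshold_percent


def count_blocked(scores, threshold_percent):
    """Count how many scores would be blocked at the given threshold."""
    return sum(should_quarantine(s, threshold_percent) for s in scores)


# Made-up spam scores (in percent) for five messages.
scores = [50.0, 93.0, 98.9, 99.6, 99.99]

for threshold in (92.0, 98.75, 99.5):
    print(threshold, count_blocked(scores, threshold))
```

Raising the threshold blocks fewer messages, so a setting in the 99% range trades some missed spam for fewer false positives.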

Second, the Bayesian filter builds its information from what the other filters teach it about the content of the messages that are blocked, and as time goes on it learns what the spam looks like. In theory, once a good database is built, you could remove filters, but I would not do that, because I think that after a few days its ability to detect garbage would start to decline.
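To illustrate the idea of the standard filters "teaching" the Bayesian one, here is a generic textbook-style sketch in Python. It is not LogSat's implementation: the class name, the whitespace tokenizer, and the per-token formula are all assumptions chosen to show how verdicts from other filters can serve as training labels.

```python
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase words, counted once per message."""
    return set(text.lower().split())


class TinyCorpus:
    """Toy corpus trained from the verdicts of the other filters."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.good_tokens = Counter()
        self.spam_msgs = 0
        self.good_msgs = 0

    def learn(self, text, blocked_by_filters):
        # A message blocked by keyword/IP/attachment filters is treated as
        # spam; a message that passed them is treated as good.
        tokens = tokenize(text)
        if blocked_by_filters:
            self.spam_tokens.update(tokens)
            self.spam_msgs += 1
        else:
            self.good_tokens.update(tokens)
            self.good_msgs += 1

    def token_spam_probability(self, token):
        """Share of a token's relative frequency that comes from spam."""
        if not self.spam_msgs or not self.good_msgs:
            return 0.5  # nothing learned yet, so stay neutral
        spam_rate = self.spam_tokens[token] / self.spam_msgs
        good_rate = self.good_tokens[token] / self.good_msgs
        if spam_rate + good_rate == 0:
            return 0.5  # token never seen
        return spam_rate / (spam_rate + good_rate)


corpus = TinyCorpus()
corpus.learn("cheap pills online", blocked_by_filters=True)
corpus.learn("cheap watches", blocked_by_filters=True)
corpus.learn("meeting notes attached", blocked_by_filters=False)
corpus.learn("cheap flights for the meeting", blocked_by_filters=False)

for word in ("pills", "cheap", "meeting"):
    print(word, round(corpus.token_spam_probability(word), 3))
```

In a scheme like this, weak standard filters mean poorly labeled training data, which is one way to read the advice against removing them.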

dan



Posted By: Guests
Date Posted: 02 August 2004 at 4:27pm
How do I know learning is on? So far my blocked email count is over 20,000, but corpus.ini only shows 1520.


