SURBL and MAPS good for bayesian learning |
Ivan82 (Guest Group)
Posted: 24 October 2005 at 11:45am
The Bayesian filter requires 5000 spam and non-spam e-mails to function. To speed up the learning process, I've set SpamFilter to quarantine mail blocked by the MAPS and SURBL blacklists, and I then manually delete or deliver the mail in the quarantine folder. My question is: does manually deleting MAPS/SURBL-blocked email speed up the Bayesian learning process, or is this manual method entirely unnecessary?
|
|
LogSat (Admin Group) | Joined: 25 January 2005 | Location: United States | Points: 4104
Ivan82,
Deleting entries in the database has no effect on the Bayesian filter. However, if a good email was incorrectly blocked by one of the filters, forcing its delivery will resubmit the email to the Bayesian filter, tagged as "good". This helps train the Bayesian filter to recognize the "good" tokens within that email, which lessens the chances of mistakes in the future.
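For readers who want to picture the difference between deleting and forcing delivery, here is a minimal Python sketch of token-count learning. It is not SpamFilter's actual code: the class `TinyBayesStore`, its method names, and the choice to subtract the earlier "spam" counts when delivery is forced are all assumptions made for illustration only.

```python
from collections import Counter
import re


class TinyBayesStore:
    """Toy token store (hypothetical; not SpamFilter's real implementation)."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.good_tokens = Counter()

    @staticmethod
    def tokenize(text):
        # Lowercased word tokens; real filters use richer tokenizers.
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text, is_spam):
        # Called when a message is first processed (e.g. when it is quarantined).
        target = self.spam_tokens if is_spam else self.good_tokens
        target.update(self.tokenize(text))

    def delete_from_quarantine(self, text):
        # Deleting touches nothing here: the message was already learned.
        pass

    def force_delivery(self, text):
        # Assumption: undo the earlier "spam" learning, then relearn as good.
        tokens = Counter(self.tokenize(text))
        self.spam_tokens.subtract(tokens)
        self.spam_tokens = +self.spam_tokens  # drop zero and negative counts
        self.good_tokens.update(tokens)


store = TinyBayesStore()
spam = "cheap pills online"
good = "Your mortgage statement is ready"

store.learn(spam, is_spam=True)   # blocked and quarantined
store.learn(good, is_spam=True)   # wrongly blocked and quarantined

store.delete_from_quarantine(spam)  # counts unchanged
store.force_delivery(good)          # tokens move from spam to good

print(dict(store.spam_tokens))
print(dict(store.good_tokens))
```

In this toy model only the forced delivery changes what the store knows; the deleted message stays counted as spam exactly as it was when it was first processed.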
|
|
Ivan82 (Guest Group)
OK, so I've removed the MAPS & SURBL quarantine and kept the quarantine only for SPF issues. I've also added some keywords (e.g., cialis, viagra, penis enlargement, morgage, etc.). When I check the quarantined e-mails, I find plenty that have SPF issues and also contain those blacklisted keywords. If I pass these emails on to be delivered, will they get blocked because of the keywords they contain? Or must I delete them from the quarantine manually?
|
Ivan82 (Guest Group)
Oh, and the Bayesian filter just isn't kicking in, not even after 60,000+ mail attempts discarded, 8000+ mails forwarded, and 17000+ spam blocked...
|
Ivan82 (Guest Group)
Possible new feature request:
I have a list of authorized e-mail addresses for which I receive mail; mail to addresses not on this list is dropped by SpamFilter. How about another check, so that mail sent by unknownusername@mydomain.com to knownusername@mydomain.com gets blocked automatically? I suspect domain spoofing would be detected by SPF, but what about a user inside my domain who is attempting to spam all the users in my domain?
|
|
LogSat (Admin Group)
Ivan,
Please note that the Bayesian filter "learns" emails as they are quarantined. Deleting them from the quarantine has no effect on the Bayesian learning process, as they have already been processed. If, however, you force the delivery of quarantined emails, you are telling the Bayesian filter "this spam that you blocked is not really spam; reprocess the emails and re-learn that the content is actually good". The Bayesian filter will thus re-examine the email, learn from its contents so that similar emails are passed next time, and then deliver the current email to the intended recipient.

RE: your statement "so I've removed the MAPS & SURBL quarantine", we're not sure we understood. The more filters you have active, the better the chances that SpamFilter will block spam. Both the MAPS and SURBL filters are very effective, and will thus greatly help the Bayesian filter in the learning process.

As for the Bayesian filter not kicking in, please note that it is used as a last resort to check for spam, after all the other filters have had a chance to do so. Only if none of them blocks the email is the Bayesian filter used. As such, it will have mostly pre-screened emails to check, and will only tag a very small percentage of them. As an example, we provided a snapshot of our filter stats for 3 days' worth of emails on the forum, as follows:

94,828  IP found in MAPS search
74,161  IP address is from a blacklisted country
10,810  Invalid sender domain MX record
7,896  SPF Sender Policy Framework match
3,044  Keywords found in content
763  Exceeded maximum number of RCPT TO
526  Mail From and Mail To domains are equal
345  Statistical filter match
27  Mail From and Mail To are equal

According to the above, the Bayesian statistical filter on our own server only blocked about 0.2% of the spam found by the other filters. However, that is still 345 spam emails that were successfully blocked.
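As a quick sanity check on the figures above, the short Python snippet below adds up the posted counts and works out the statistical filter's share. The counts are copied from the stats as posted; the dictionary and variable names are just for illustration.

```python
# Blocked-mail counts copied from the 3-day stats posted above.
blocked = {
    "IP found in MAPS search": 94828,
    "IP address is from a blacklisted country": 74161,
    "Invalid sender domain MX record": 10810,
    "SPF Sender Policy Framework match": 7896,
    "Keywords found in content": 3044,
    "Exceeded maximum number of RCPT TO": 763,
    "Mail From and Mail To domains are equal": 526,
    "Statistical filter match": 345,
    "Mail From and Mail To are equal": 27,
}

total = sum(blocked.values())
bayes_share = blocked["Statistical filter match"] / total * 100

# Prints: 192,400 blocked in total; statistical filter share: 0.18%
print(f"{total:,} blocked in total; statistical filter share: {bayes_share:.2f}%")
```

That works out to roughly 0.18% of all blocked mail, in line with the "0.2%" quoted above.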
To answer your last posting, "if mail is being sent by unknownusername@mydomain.com to knownusername@mydomain.com": as you mentioned, the SPF filter will take care of that. There is also another option that rejects emails if the "FROM" domain is the same as the "TO" domain. In the scenario you described, that option will also cause a reject when the sending user is within your domain.
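To make the FROM-equals-TO domain rule concrete, here is a rough Python sketch of such a comparison. The thread does not say how SpamFilter implements this option, so the function names `domain_of` and `same_domain` and the case-insensitive matching are assumptions for illustration.

```python
def domain_of(address):
    # Everything after the last "@", lowercased (domains are case-insensitive).
    return address.rsplit("@", 1)[-1].strip().lower()


def same_domain(mail_from, rcpt_to):
    # True when sender and recipient share a domain, i.e. the case
    # the option described above would reject.
    return domain_of(mail_from) == domain_of(rcpt_to)


print(same_domain("unknownusername@mydomain.com", "knownusername@mydomain.com"))
print(same_domain("someone@example.org", "knownusername@mydomain.com"))
```

Note that a rule like this cannot tell a spoofed sender from a genuine local user; it rejects both, which is worth bearing in mind if your own users send mail to each other through the same server.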
|
|
Ivan82 (Guest Group)
Thanks for the replies. What I meant when I said I disabled the MAPS and SURBL quarantine is that I unchecked the option to quarantine mail blocked by those filters, so now it just deletes that mail automatically. We've been running SpamFilter for 4 days now, and it was too time-consuming to manually sort through the quarantined list. All filters are still active, though. My question was: without the manual process of deleting/delivering quarantined possible spam (assuming all email blocked by SPF/MAPS/SURBL is spam), does the Bayesian filter still get the data it needs to detect possible spam?
|
LogSat (Admin Group)
Ivan82,
Yes, SpamFilter will still pass the emails on to the Bayesian learning engine, even if you disable the quarantining of emails for some filters.
|