Spam Filter ISP Support Forum


SURBL and MAPS good for bayesian learning

Ivan82
Guest Group
Posted: 24 October 2005 at 11:45am

The bayesian filter requires 5000 spam and non-spam e-mails to function. To speed up the learning process, I've set SpamFilter to quarantine mail blocked by the MAPS and SURBL blacklists, and I then manually delete or deliver the mail in the quarantine folder. My question is: does manually deleting MAPS/SURBL-blocked email speed up the bayesian learning process, or is this manual step entirely unnecessary?

LogSat
Admin Group
Joined: 25 January 2005
Location: United States
Posted: 25 October 2005 at 6:56am
Ivan82,

Deleting entries in the database has no effect on the bayesian filter. However, if a good email was incorrectly blocked by one of the filters, forcing its delivery will resubmit the email to the bayesian filter, tagging it as "good", and this will help train the bayesian filter in recognizing "good" tokens within that email, so as to lessen the chances of mistakes in the future.
Roberto Franceschetti

LogSat Software

Spam Filter ISP

Ivan82
Guest Group
Posted: 25 October 2005 at 12:25pm
OK, so I've removed the MAPS & SURBL quarantine and kept quarantine only for SPF issues. I've also added some keywords (e.g., cialis, viagra, penis enlargement, morgage, etc.). When checking the quarantined e-mails, I get plenty that have SPF issues and also contain those blacklisted keywords. If I pass these emails on to be delivered, will they get blocked due to the keywords they contain? Or must I delete them from quarantine manually?

Ivan82
Guest Group
Posted: 26 October 2005 at 10:05am
Oh, and the bayesian filter just isn't kicking in, not even after 60,000+ mail attempts discarded, 8,000+ mails forwarded, and 17,000+ spam blocked...

Ivan82
Guest Group
Posted: 26 October 2005 at 11:45am
Possible new feature request:

I have a list of authorized e-mail addresses for which I receive mail; mail to addresses not on this list is dropped by SpamFilter. How about another check so that mail sent by unknownusername@mydomain.com to knownusername@mydomain.com gets blocked automatically? I suspect domain spoofing would get detected by SPF, but what about a user inside my domain who is attempting to spam all users in my domain?


LogSat
Admin Group
Posted: 26 October 2005 at 4:14pm
Ivan,

Please note that the bayesian filter "learns" emails as they are quarantined. If you delete them from the quarantine, this has no effect on the bayesian learning process, as they have already been processed. If, however, you force the delivery of those quarantined emails, you will be telling the bayesian filter "this spam that you blocked is not really spam; reprocess the emails and re-learn that the content is actually good". The bayesian filter will thus re-examine the email, try to learn from its contents so that similar emails will be passed next time, and then deliver the current email to the intended recipient.
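
To make the learn / re-learn behavior above concrete, here is a minimal Python sketch. It is only an illustration and not SpamFilter's actual code: the `TokenStore` class, its method names, and the whitespace tokenizer are all assumptions made for the example.

```python
from collections import Counter


class TokenStore:
    """Hypothetical token-count store (not SpamFilter's real implementation)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.good = Counter()  # token -> number of good messages containing it

    def learn(self, text, is_spam):
        # Learning happens when the message is processed/quarantined.
        # Each distinct token is counted once per message.
        target = self.spam if is_spam else self.good
        target.update(set(text.lower().split()))

    def relearn_as_good(self, text):
        # Forced delivery: undo the earlier "spam" counts, then count as "good".
        tokens = set(text.lower().split())
        self.spam.subtract(tokens)
        self.spam += Counter()  # in-place add drops zero/negative counts
        self.good.update(tokens)


store = TokenStore()
message = "monthly invoice attached"

store.learn(message, is_spam=True)  # blocked and learned as spam
# Deleting the quarantined copy would call nothing here: the counts are unchanged.
store.relearn_as_good(message)      # admin forces delivery instead

print(store.spam["invoice"], store.good["invoice"])  # prints: 0 1
```

The point of the sketch is that deletion never touches the counts, while a forced delivery moves the message's tokens from the "spam" side to the "good" side.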

RE: your statement "so I've removed the MAPS & SURBL quarantine", we're not sure we understood. The more filters you have active, the better the chances that SpamFilter will block spam. Both the MAPS and SURBL filters are very effective, and will thus greatly help the bayesian filter in the learning process.

As for the Bayesian filter not kicking in, please note that the Bayesian filter is used as a last resort to check for spam, after all the other filters have had a chance to do so. Only if they all fail is the Bayesian filter used. As such, it will have mostly pre-screened emails to check, and will only tag a very small percentage of them.
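
The "last resort" ordering described above can be sketched as follows. This is a hedged illustration only: the `classify` function, the stand-in checks, and the example addresses are hypothetical and do not reflect how SpamFilter is implemented internally.

```python
def classify(message, filters, bayes_check):
    """Return the name of the first filter that blocks the message, else None.

    The Bayesian check is consulted only if every other filter lets the
    message through.
    """
    for name, check in filters:
        if check(message):
            return name
    if bayes_check(message):
        return "Statistical filter match"
    return None


# Hypothetical stand-in checks, for illustration only.
def in_maps(message):
    return message["ip"] in {"192.0.2.1"}


def has_keyword(message):
    return "viagra" in message["body"].lower()


def bayes(message):
    return "refinance" in message["body"].lower()


filters = [
    ("IP found in MAPS search", in_maps),
    ("Keywords found in content", has_keyword),
]

caught_early = classify({"ip": "192.0.2.1", "body": "refinance now"}, filters, bayes)
caught_late = classify({"ip": "198.51.100.7", "body": "refinance now"}, filters, bayes)
passed = classify({"ip": "198.51.100.7", "body": "lunch at noon?"}, filters, bayes)

print(caught_early)  # IP found in MAPS search
print(caught_late)   # Statistical filter match
print(passed)        # None
```

The first message would also have matched the statistical check, but it never gets that far, which is why the statistical filter's own block count stays small.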

As an example, here is a snapshot of our filter stats for three days' worth of email, as we posted on the forum:

94,828 IP found in MAPS search
74,161 IP address is from a blacklisted country
10,810 Invalid sender domain MX record
7,896 SPF Sender Policy Framework match
3,044 Keywords found in content
763 Exceeded maximum number of RCPT TO
526 Mail From and Mail To domains are equal
345 Statistical filter match
27 Mail From and Mail To are equal

According to the above, the Bayesian statistical filter on our own server only blocked 0.2% of the spam found by the other filters. However, that is still 345 spam emails that were successfully blocked.
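
For anyone who wants to check the percentage, here is a short Python calculation using the counts exactly as listed in the snapshot above. The variable names are just for this example.

```python
# Block counts per filter, copied from the three-day snapshot above.
stats = {
    "IP found in MAPS search": 94828,
    "IP address is from a blacklisted country": 74161,
    "Invalid sender domain MX record": 10810,
    "SPF Sender Policy Framework match": 7896,
    "Keywords found in content": 3044,
    "Exceeded maximum number of RCPT TO": 763,
    "Mail From and Mail To domains are equal": 526,
    "Statistical filter match": 345,
    "Mail From and Mail To are equal": 27,
}

total = sum(stats.values())
share = 100 * stats["Statistical filter match"] / total

print(f"{total:,} blocked in total; statistical filter share: {share:.2f}%")
# prints: 192,400 blocked in total; statistical filter share: 0.18%
```

That 0.18% is the figure rounded to 0.2% in the text.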


To answer your last posting, "if mail is being sent by unknownusername@mydomain.com to knownusername@mydomain.com", as you mentioned the SPF filter will take care of that. There is also another option that rejects emails if the "FROM" domain is the same as the "TO" domain. That will also cause a reject if the user is within your domain in the scenario you described.
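
As a rough illustration of the "FROM domain equals TO domain" idea, here is a minimal Python sketch. The function names are hypothetical and the address parsing is deliberately simplistic; it is not how SpamFilter itself performs the check.

```python
def domain_of(address):
    # Take everything after the last "@" and normalize case and whitespace.
    return address.rsplit("@", 1)[-1].strip().lower()


def same_domain(mail_from, rcpt_to):
    """True if the envelope sender and recipient share a domain."""
    return domain_of(mail_from) == domain_of(rcpt_to)


print(same_domain("unknownusername@mydomain.com", "knownusername@mydomain.com"))  # True
print(same_domain("someone@example.org", "knownusername@mydomain.com"))           # False
```

Note that in this sketch the check looks only at the domain, so any sender in the same domain matches, whether or not the username is known.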
Roberto Franceschetti

Ivan82
Guest Group
Posted: 27 October 2005 at 5:39am
Thanks for the replies. What I meant when I said I disabled the MAPS and SURBL quarantine is that I unchecked the option to quarantine mail blocked by those filters, so now it just deletes that mail automatically. We've been running SpamFilter for 4 days now, and it was too time-consuming to manually sort through the quarantined list. All filters are still active, though. My question was: without the manual process of deleting/delivering quarantined possible spam (assuming all email blocked by SPF/MAPS/SURBL is spam), does the Bayesian filter still get the data it needs to detect possible spam?

LogSat
Admin Group
Posted: 27 October 2005 at 8:03pm
Ivan82,

Yes, SpamFilter will still pass on the emails to the Bayesian learning engine, even if you disable the quarantining of emails for some filters.
Roberto Franceschetti
