Spam Filter ISP Support Forum


Beta version of new SpamFilter v2.0 is available

LogSat (Admin Group)
Joined: 25 January 2005
Location: United States
Posted: 08 October 2003 at 11:37pm

We have released to the public the beta for the new version of SpamFilter ISP v2.0. This release features statistical DNA fingerprinting of emails, which should allow greater accuracy in fighting SPAM.

The beta can be obtained from SpamFilter's download page at http://www.logsat.com/sfi-download.asp.

Please read the beta notes carefully. The first hundreds or thousands of emails received are critical to building an accurate statistical database. It is important that, while you build your first database, the number of false positives (good emails classified as spam) be kept to a minimum. For this reason it may be a good idea to start running the new version during the day, when there is usually a higher volume of legitimate email.

Roberto F.
LogSat Software

kspare (Senior Member)
Joined: 26 January 2005
Location: Canada
Posted: 11 October 2003 at 2:37pm

Hi Roberto, got the beta version running.

I have a question though. Is it possible to have the corpus data stored in MySQL instead of on each computer? Companies that have a primary and a backup SMTP server will end up collecting and building a separate corpus file on each computer instead of using shared info.

Just curious.

Kevin

eric (Guest Group)
Posted: 11 October 2003 at 8:42pm

It's a beta... :-)

Is it possible to tweak it to use a path like \\servername\sharename\corpusfilename.name?

And have two servers share that?

Ric Marques (Guest Group)
Posted: 14 October 2003 at 12:56pm

Roberto -

I'm just thinking out loud here... but as the DNA technology develops, would there be a way to share the statistical corpus between different SpamFilterISP users?  Build in a Morpheus or Kazaa type peer-to-peer network (that is optional) between users to share each other's SPAM fingerprints so that every user benefits from the SPAM that we all receive?

Again - just thinking out loud...

-Ric

LogSat (Admin Group)
Posted: 15 October 2003 at 12:53am

Performing real-time statistical analysis of emails is very processor-intensive. We fought hard to achieve acceptable performance, and to do so we had to stay away from storing the corpus in a database.

In a way, it's not a bad thing to have different corpora for different servers. Unless they are load-balanced, the backup SMTP servers usually take a lower load, and only certain spammers will send email to them directly, bypassing the primary SMTP server. This causes the email arriving at the secondary to be statistically different from the email going to the primary. It is better in this case to have separate statistical data, as this will improve accuracy.

Roberto F.
LogSat Software

LogSat (Admin Group)
Posted: 15 October 2003 at 12:57am

Not really... To optimize for speed, each copy of SpamFilter maintains an in-memory copy of the corpus database, which is saved to disk at intervals. The disk file is only read when SpamFilter starts, not thereafter. So each server would have the same corpus only at startup; as time goes by, the in-memory copies on the various servers will diverge.

But the beauty is that, since it is all based on statistics, once the corpus grows to a MB or so, the differences are irrelevant...
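
The behavior described above (read the file once at startup, keep the working copy in memory, write it back at intervals) can be sketched roughly as follows. This is only an illustration in Python, not SpamFilter's code: the class name `InMemoryCorpus`, the JSON file format, and the 300-second default interval are all assumptions.

```python
import json
import os
import tempfile
import time


class InMemoryCorpus:
    """Hypothetical corpus: token counts live in memory, disk is a snapshot."""

    def __init__(self, path, save_interval_s=300.0):
        self.path = path
        self.save_interval_s = save_interval_s  # assumed value, not from the product
        self.counts = {}  # token -> [good_count, spam_count]
        self._last_save = time.monotonic()
        # The disk file is read only here, at startup, never again.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.counts = json.load(fh)

    def learn(self, tokens, is_spam):
        """Update the in-memory counts, then save if the interval has elapsed."""
        idx = 1 if is_spam else 0
        for tok in tokens:
            self.counts.setdefault(tok, [0, 0])[idx] += 1
        self.maybe_save()

    def maybe_save(self, force=False):
        """Write a snapshot to disk; returns True if a save happened."""
        now = time.monotonic()
        if not force and now - self._last_save < self.save_interval_s:
            return False
        # Write to a temporary file and rename it, so a crash mid-save
        # cannot leave a half-written corpus behind.
        dir_name = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=dir_name)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.counts, fh)
        os.replace(tmp, self.path)
        self._last_save = now
        return True
```

Two instances pointed at the same file would start out identical and then drift apart as each one learns from its own traffic, which is the effect described in this post.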

Roberto F.
LogSat Software

LogSat (Admin Group)
Posted: 15 October 2003 at 1:05am

Ric,

We had thought of that at the beginning, but then discovered that each company/provider receives different emails. We may have 3-4 users who subscribe to a dating service. Their emails will cause the statistical database to "sway" in a certain way to accommodate their needs. This will cause very different results if the same database is used by another company, which could instead be a local government...

A very, very effective statistical corpus can be obtained from scratch just by having SpamFilter running for 24 hours. Once it's built, the corpus is tailored to that company and reflects pretty accurately the kind of email traffic that is expected of it. There should be no need to import other users' results.

Roberto F.
LogSat Software

kspare (Senior Member)
Posted: 16 October 2003 at 10:11am

The DNA filtering doesn't seem to be working for me.

For example, it shows that it is scanning a message (19 ms) and adding it to the Bayes corpus file, but every message reads 0% spam, and this includes actual spam messages.

I'm sitting at 863 forwarded messages and 1133 blocked.

Any ideas?

Trinidad (Guest Group)
Posted: 16 October 2003 at 3:52pm

From what I understand, this new version sounds as if everything will be blocked until the users start forwarding the legitimate emails in from their quarantine area. Am I reading this correctly?

LogSat (Admin Group)
Posted: 17 October 2003 at 12:07am

Kevin,

The statistical filter kicks in once you have received 500 good + 500 spam emails. Its accuracy will be very low at the beginning, but will improve dramatically as more emails arrive. This is a beta, so tests are still ongoing, but we see that when the corpus reaches 2-4 MB in size, with a few thousand emails in each group, it will catch spam at its steady-state rate.
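
To make the 500 + 500 threshold concrete, here is a minimal Python sketch of a gated statistical score. It is not SpamFilter's actual algorithm, which is not described in this thread: the naive Bayes formula with add-one smoothing, the function name `spam_probability`, and the `counts` layout are all assumptions chosen for illustration.

```python
import math

MIN_GOOD = 500  # thresholds quoted in the post above
MIN_SPAM = 500


def spam_probability(tokens, counts, n_good, n_spam):
    """Return P(spam) in [0, 1], or None while the corpus is too small.

    counts maps token -> (good_message_count, spam_message_count).
    The scoring below is generic naive Bayes, an assumption for this sketch.
    """
    if n_good < MIN_GOOD or n_spam < MIN_SPAM:
        return None  # the statistical filter has not kicked in yet

    # Start from the prior odds, then add one log-likelihood ratio per token.
    log_odds = math.log(n_spam / n_good)
    for tok in set(tokens):
        good, spam = counts.get(tok, (0, 0))
        # Add-one smoothing keeps unseen tokens from producing log(0).
        p_tok_spam = (spam + 1) / (n_spam + 2)
        p_tok_good = (good + 1) / (n_good + 2)
        log_odds += math.log(p_tok_spam / p_tok_good)

    # Clamp so math.exp cannot overflow on very long messages.
    log_odds = max(-50.0, min(50.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With only a few hundred messages on either side the function yields no score at all, and a token the corpus has never seen leaves the score at the prior.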

Roberto F.
LogSat Software

LogSat (Admin Group)
Posted: 17 October 2003 at 12:44am

Not exactly. At first, the statistical engine will determine what is good email and what is bad email by looking at what SpamFilter's other filters do. As more and more emails are received, SpamFilter will adapt to the kind of email traffic and recognize more and more spam as it comes in. But during the initial training period, more or less the first 24 hours, it is important that the number of false positives be kept to a minimum so the learning process is accurate. When an email is taken out of the quarantine, SpamFilter will know and will learn that similar emails are probably going to be legitimate.
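
The training flow described above can be sketched as follows: the verdict of the other filters labels each message, and a quarantine release relabels it as good. This is a hypothetical Python illustration, not SpamFilter's implementation. The class and method names are invented, and removing the message from the spam side on release is an assumption (the post only says the engine learns that similar emails are probably legitimate).

```python
from collections import Counter


class BootstrapTrainer:
    """Hypothetical trainer that learns from the other filters' verdicts."""

    def __init__(self):
        self.good = Counter()  # token -> number of good messages containing it
        self.spam = Counter()  # token -> number of spam messages containing it
        self.n_good = 0
        self.n_spam = 0

    def observe(self, tokens, blocked_by_other_filter):
        """Label a message using the verdict of the non-statistical filters."""
        if blocked_by_other_filter:
            self.spam.update(set(tokens))
            self.n_spam += 1
        else:
            self.good.update(set(tokens))
            self.n_good += 1

    def release_from_quarantine(self, tokens):
        """Relabel a wrongly blocked message as good (a false positive)."""
        toks = set(tokens)
        for tok in toks:
            if self.spam[tok] > 1:
                self.spam[tok] -= 1
            else:
                del self.spam[tok]  # Counter ignores keys that are absent
        self.n_spam = max(0, self.n_spam - 1)
        self.good.update(toks)
        self.n_good += 1
```

Every false positive that stays in the quarantine keeps counting as spam in a scheme like this, which is why the first 24 hours matter so much.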

Roberto F.
LogSat Software

Trinidad (Guest Group)
Posted: 17 October 2003 at 10:32am

I have another question. Will the corpus file override any other settings that I'm using? For example, I have plenty of regex and keyword settings. If I catch a legit email and the user forwards it in, will it then not get caught by my settings the second time around, because the statistical engine has learned that it was a legitimate email?

Ric Marques (Guest Group)
Posted: 17 October 2003 at 1:14pm

Roberto -

Will you be developing a way to report false negatives?  My users are CONSTANTLY wanting to send me the SPAM that currently gets through the filter.

-Ric

LogSat (Admin Group)
Posted: 18 October 2003 at 10:51am

We've given this plenty of thought in the past. It would be very easy if the users could simply forward their spam to a special email address that SpamFilter knows about.

Unfortunately the problem with that is the Outlook client. It reformats the emails so much that they are at times completely different from the original message. The header information is also stripped out.

Our plan is to create an Outlook plugin, to be installed on the clients, that will let them report spam directly to SpamFilter, but it will be a few months before we can have something ready for that.

Roberto F.
LogSat Software

LogSat (Admin Group)
Posted: 18 October 2003 at 11:00am

Brian,

If an email is quarantined, and a user (or the admin thru the GUI) selects to force-deliver it, it will bypass all checks and be delivered no matter what.

Please note that the email will be temporarily saved to the queue directory before being delivered. If the email contains a virus and you have anti-virus software running, that software may delete the file before SpamFilter can send it. SpamFilter caches emails on disk partly for this reason, to allow anti-virus software to catch/clean/delete any infected files. SpamFilter is designed not to "break" should an expected email file "disappear".
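
As an illustration of the "don't break if the file disappears" idea, a delivery step can simply treat a missing queue file as a normal outcome. The Python sketch below is hypothetical: the function name `deliver_from_queue` and the `send` callback are invented for this example and say nothing about how SpamFilter is actually written.

```python
import contextlib
import os


def deliver_from_queue(path, send):
    """Read a queued message and pass its bytes to send().

    Returns True if the message was sent, False if the file was already
    gone (for example, removed by an on-access virus scanner).
    """
    try:
        with open(path, "rb") as fh:
            data = fh.read()
    except FileNotFoundError:
        return False  # nothing to deliver; not an error

    send(data)

    # The scanner may also remove the file between the read and this
    # cleanup, so a missing file here is ignored as well.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
    return True
```

The caller can log the False case and move on to the next queued message instead of stopping the whole queue.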

Roberto F.
LogSat Software

Ric (Guest Group)
Posted: 19 October 2003 at 11:34pm

What about a web-based interface? Users could copy and paste the entire message into a page that would then dump the message into a table that SpamFilterISP would check periodically. It could easily be built into the same web interface that is in use now to check on quarantined messages... just submit the form, and voilà!
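
A rough sketch of that idea, in Python with SQLite standing in for whatever database the web interface would use. Everything here is hypothetical: the table name `spam_reports`, the function names, and the `learn_spam` callback are assumptions for illustration, not an existing SpamFilterISP feature.

```python
import sqlite3
from email import message_from_string


def open_reports(db_path=":memory:"):
    """Open the database and create the (hypothetical) reports table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spam_reports ("
        "id INTEGER PRIMARY KEY, "
        "raw TEXT NOT NULL, "
        "processed INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def submit_report(conn, raw_message):
    """What the web form handler would call with the pasted message source."""
    conn.execute("INSERT INTO spam_reports (raw) VALUES (?)", (raw_message,))
    conn.commit()


def poll_reports(conn, learn_spam):
    """What the filter would call periodically; returns how many it handled."""
    rows = conn.execute(
        "SELECT id, raw FROM spam_reports WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for row_id, raw in rows:
        # Parse headers and body from the pasted source before learning.
        learn_spam(message_from_string(raw))
        conn.execute(
            "UPDATE spam_reports SET processed = 1 WHERE id = ?", (row_id,)
        )
    conn.commit()
    return len(rows)
```

Because the user pastes the full message source, the headers survive, which sidesteps the reformatting problem that forwarding from a mail client causes.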

There are a LOT of clients that (similar to Outlook) reformat the messages and strip out the original header information - 95% of my staff would have the same problem - and the plug-in wouldn't help us... (we decided against using ANY MS client years ago - and we have enjoyed NOT having most of the email virus headaches that many others have suffered through...)

Just a thought...

-Ric
