Spam Filter ISP Support Forum

corpus database

kspare (Senior Member)
Posted: 14 January 2004 at 9:13pm

Hi Roberto, I know this has been brought up before, but are there still any plans to put the corpus database into the actual database of choice? I ask because I am now running redundant servers. If I were to keep all traffic on one server, it would maximize the effectiveness of the filter. But if that server were to go down and traffic came across the second spam server, the filter would not work nearly as well. It would be nice to see this implemented.

Kevin

ASB (Guest Group)
Posted: 15 January 2004 at 10:24pm

Why not just periodically copy the corpus database over to the other server?

LogSat (Admin Group)
Posted: 15 January 2004 at 10:44pm

Kevin,

We had actually tried implementing a fully database-based corpus during our alpha development. That would have been a preferred solution for us as well, but performance testing showed HUGE problems. Neither MS SQL nor MySQL was even close to being able to handle the massive quantity of queries needed to scan and update the corpus.

Regarding the issue of having two separate corpus databases, we actually encourage that, since separate servers, especially in a primary/secondary MX situation, receive different emails and thus their statistics are different. But you do bring up a very valid point: if the backup never receives traffic except when the primary goes down, its effectiveness would be reduced.

An option would be to not have the backup write to and update the corpus database, but to have it only reload the corpus from file at regular intervals. Basically it's just a read-only copy. The administrator would have to use a scheduled file replication to copy it from the primary server. This is something we can add as an option in the ini file rather easily; we'll see if we can have it done before the beta period is over.

Roberto F.
LogSat Software
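
The read-only idea described above can be sketched roughly as follows. This is only an illustration, not SpamFilter's code: the JSON file layout, the `ReadOnlyCorpus` class, and the `reload_if_changed` method are all hypothetical, since the thread does not describe the real corpus format or the ini option. The sketch reloads the file only when its modification time changes and never writes it back.

```python
import json
import os
import tempfile


class ReadOnlyCorpus:
    """Holds token counts in memory and never writes them back to disk."""

    def __init__(self, path):
        self.path = path          # hypothetical corpus file, JSON for illustration
        self.tokens = {}
        self._loaded_mtime = None

    def reload_if_changed(self):
        """Reload when the file's modification time differs; return True if reloaded."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._loaded_mtime:
            return False
        with open(self.path, "r", encoding="utf-8") as fh:
            self.tokens = json.load(fh)
        self._loaded_mtime = mtime
        return True


def demo():
    """Write a tiny corpus file, then poll it twice without changing it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "corpus.json")
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"viagra": [40, 1]}, fh)
        corpus = ReadOnlyCorpus(path)
        first = corpus.reload_if_changed()    # file is new to us, so it loads
        second = corpus.reload_if_changed()   # unchanged, so nothing happens
        return first, second, corpus.tokens
```

On the backup server, something like `reload_if_changed()` would be called on a timer, while a separate scheduled job replaces the file with the primary's copy.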

LogSat (Admin Group)
Posted: 15 January 2004 at 10:47pm

That can't currently be done because SpamFilter updates the file every 10 minutes with the new tokens it "learned", thus overwriting any changes that may have been copied over. The file is loaded back into memory only right after SpamFilter updates it.

Please see my next posting in this thread for more info though, as there may be a solution for this.

Roberto F.
LogSat Software
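
A small simulation of the overwrite problem described above, for readers who want to see it concretely. Everything here is assumed for illustration: the JSON format and the `flush` and `simulate_overwrite` names are hypothetical, and the real 10-minute timer is replaced by direct calls.

```python
import json
import os
import tempfile


def flush(path, in_memory_tokens):
    """Write the in-memory corpus to disk, replacing whatever the file held."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(in_memory_tokens, fh)


def simulate_overwrite():
    """Show that a file copied in from outside does not survive the next flush."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "corpus.json")
        local_tokens = {"local": 1}   # what this server has learned so far
        flush(path, local_tokens)

        # An external job copies the other server's corpus over the file.
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"remote": 99}, fh)

        # The next periodic flush writes memory back out; the copy is lost.
        flush(path, local_tokens)
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
```

After the second flush the file holds only the local tokens again, which is why a plain file copy on its own would not be enough.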

Kevin (Guest Group)
Posted: 16 January 2004 at 12:00am

That is a good possibility. I currently have both servers run a script to copy down all the white/black lists, so that is just one extra task. Easily done.

There has to be some sort of solution for the servers to share the corpus database in real time, though?

What if you could specify a path for the corpus data? If that could be done, could both servers share the database?
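
For the scheduled copy task mentioned in this post, a minimal sketch might look like the following. The file names and the `replicate` function are hypothetical, and this is not the script actually in use. It copies to a temporary file in the destination folder first and then renames it into place, so a reader should not see a half-written file. On Windows the rename can fail if another process has the destination open, so a real job would need to handle that case.

```python
import os
import shutil
import tempfile


def replicate(src, dst):
    """Copy src to dst via a temporary file in the destination directory."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copy2(src, tmp_path)   # copy contents and timestamps
        os.replace(tmp_path, dst)     # rename over the old copy
    except BaseException:
        # Clean up the temporary file if the copy or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def demo():
    """Replicate a small stand-in file and return what the backup now holds."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "primary_corpus.dat")
        dst = os.path.join(tmp, "backup_corpus.dat")
        with open(src, "wb") as fh:
            fh.write(b"token-data")
        replicate(src, dst)
        with open(dst, "rb") as fh:
            return fh.read()
```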

 

LogSat (Admin Group)
Posted: 16 January 2004 at 7:53am

That's not as simple as it would seem, since both SpamFilters would be trying to write to the same files. These operations need to occur extremely fast, and they were the cause of the previous memory leaks and performance issues we were experiencing. Adding routines to coordinate locking and sharing of the files among multiple instances of SpamFilter would greatly affect performance, and we do not want to do that just now. After this new version is stable and reliable, it will be something we'll take a look at again.

Roberto F.
LogSat Software

<<
What if you could specify a path for the corpus data, if that could be done, could both servers share the database?
>>

kspare (Senior Member)
Posted: 16 January 2004 at 8:20am

Fair enough.

ASB (Guest Group)
Posted: 17 January 2004 at 8:31pm

This sounds like a good option.

As a variation, would it be possible to have a primary database which behaves as normal, and a second file which is a read-only copy?

This way clustered mail servers could provide their primary files as a secondary, read-only file for each other...
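
One way this variation could work is sketched below, purely as an assumption: the post does not say how the two files would be combined, and the per-token (spam, ham) count layout and both function names are hypothetical. Lookups add the counts from both sources, while learning only ever touches the primary.

```python
def combined_counts(token, primary, secondary):
    """Return (spam, ham) counts for a token, summed across both corpora."""
    p_spam, p_ham = primary.get(token, (0, 0))
    s_spam, s_ham = secondary.get(token, (0, 0))
    return p_spam + s_spam, p_ham + s_ham


def learn(token, is_spam, primary):
    """Record one observation in the writable primary corpus only."""
    spam, ham = primary.get(token, (0, 0))
    primary[token] = (spam + 1, ham) if is_spam else (spam, ham + 1)


# Usage: the secondary stands in for another server's file, loaded read-only.
primary = {"offer": (3, 1)}
secondary = {"offer": (10, 0), "invoice": (0, 5)}
learn("invoice", False, primary)
```

Summing is only one possible rule; weighting the secondary lower than the primary would be another.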
