corpus database |
kspare | Senior Member | Joined: 26 January 2005 | Location: Canada | Status: Offline | Points: 334 |
Posted: 14 January 2004 at 9:13pm |
Hi Roberto, I know this has been brought up before, but are there still any plans to put the corpus database into the actual database of choice? I ask because I am now running redundant servers. If I were to keep all traffic on one server, it would maximize the effectiveness of the filter. But if that server were to go down and traffic came across the 2nd spam server, the filter would not work nearly as well. It would be nice to see if this could be implemented. Kevin |
|
ASB | Guest Group |
Why not just periodically copy the Corpus database over to the other server?
|
LogSat | Admin Group | Joined: 25 January 2005 | Location: United States | Status: Offline | Points: 4104 |
Kevin, We had actually tried implementing a fully database-based corpus during our alpha development. That would have been a preferred solution for us as well, but performance testing showed HUGE problems. Neither MS SQL nor MySQL was even close to being able to handle the massive quantity of queries needed to scan for and update the corpus. Regarding the issue of having two separate corpus databases, we actually encourage that to be the case, since separate servers, especially in a primary/secondary MX situation, receive different emails and thus the statistics are different. But you do bring up a very valid point: if the backup never receives traffic except when the primary goes down, that would reduce its effectiveness. An option would be to have the backup not write to or update the corpus database, but only reload it from file at regular intervals. Basically it's just a read-only copy. The administrator would have to use a scheduled file replication to copy it from the primary server. This is something we can add as an option in the ini file rather easily; we'll see if we can have it done before the beta period is over. Roberto F. |
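For readers who want to picture the read-only arrangement described above, here is a minimal Python sketch. It is not SpamFilter code: the tab-separated `token<TAB>count` file layout, the file names, and the `replicate_corpus` and `ReadOnlyCorpus` names are all assumptions made for illustration, since the thread does not describe the real corpus format or the planned ini option. The copy goes through a temporary file so a reader on the backup never sees a half-written corpus.

```python
import os
import shutil
import tempfile


def replicate_corpus(src_path, dst_path):
    """Copy the corpus file so that readers never see a partial copy.

    Hypothetical helper: a scheduled job on the backup server could call it.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    # Write to a temporary file in the destination directory first.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".corpus-", suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(src_path, tmp_path)
        # os.replace swaps the file in one step on the same filesystem.
        os.replace(tmp_path, dst_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


class ReadOnlyCorpus:
    """Hypothetical read-only corpus: reloads from disk, never writes back."""

    def __init__(self, path):
        self.path = path
        self.tokens = {}
        self._loaded_mtime_ns = None

    def reload_if_changed(self):
        """Reload the file if its modification time changed; return True if reloaded."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns == self._loaded_mtime_ns:
            return False
        tokens = {}
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if not line:
                    continue
                # Assumed layout: token, a tab, then an integer count.
                token, count = line.rsplit("\t", 1)
                tokens[token] = int(count)
        self.tokens = tokens
        self._loaded_mtime_ns = mtime_ns
        return True
```

The backup would call `reload_if_changed()` on a timer and never save, which matches the "read-only copy" idea; how the real product would schedule this is not stated in the thread.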
|
LogSat | Admin Group |
That can't currently be done because SpamFilter updates the file every 10 minutes with the new tokens it has "learned", thus overwriting any changes that may have been copied over. The file is loaded back into memory only right after SpamFilter updates it. Please see my next posting in this thread for more info though, as there may be a solution for this. Roberto F. |
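The overwrite problem described above can be reproduced with a small Python sketch. This is only an illustration under assumptions: `LearningCorpus`, the `token<TAB>count` layout, and the token names are hypothetical stand-ins, not SpamFilter internals. The point is that a process which keeps its corpus in memory and periodically saves it will replace whatever an external copy job put on disk.

```python
import os
import tempfile


class LearningCorpus:
    """Hypothetical stand-in for a filter that keeps its corpus in memory."""

    def __init__(self, path):
        self.path = path
        self.tokens = {}

    def learn(self, token):
        self.tokens[token] = self.tokens.get(token, 0) + 1

    def flush(self):
        # Periodic save: the in-memory state replaces whatever is on disk.
        with open(self.path, "w", encoding="utf-8") as fh:
            for token, count in sorted(self.tokens.items()):
                fh.write(f"{token}\t{count}\n")


def demo():
    """Return the file contents after an external copy is followed by a flush."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "corpus.txt")
        corpus = LearningCorpus(path)
        corpus.learn("local-token")
        corpus.flush()
        # An external job copies another server's corpus over the file.
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("remote-token\t5\n")
        # The next periodic flush writes the in-memory state over the copy.
        corpus.learn("local-token")
        corpus.flush()
        with open(path, encoding="utf-8") as fh:
            return fh.read()
```

After `demo()` runs, the copied `remote-token` entry is gone and only the locally learned token remains, which is why a plain file copy onto an actively learning server does not stick.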
|
Kevin | Guest Group |
That is a good possibility. I currently have both servers run a script to copy down all the white/black lists, so that is just one extra task, easily done. There has to be some sort of solution for the servers to share the corpus database in real time, though? What if you could specify a path for the corpus data? If that could be done, could both servers share the database?
|
LogSat | Admin Group |
That's not as simple as it would seem, since both SpamFilters would be trying to write to the same files. These operations need to occur extremely fast, and they were the cause of the previous memory leaks and performance issues we were experiencing. Adding routines that would prevent locking/sharing conflicts on the files between multiple instances of SpamFilter would greatly affect performance, and we do not want to do that just now. After this new version is stable and reliable, it will be something we'll take a look at again. Roberto F. |
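To make the locking cost concrete, here is a rough Python sketch of what serializing two writers on one shared file could look like. It is an assumption-laden illustration, not how SpamFilter works: `exclusive_lock`, `add_token`, the `.lock` file convention, and the `token<TAB>count` layout are all hypothetical. Every update has to acquire the lock, re-read the file, and rewrite it, which hints at why this would be slow for a filter that updates tokens constantly.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path, timeout=5.0, poll_interval=0.05):
    """Hold a simple lock file; wait until it is free or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def add_token(corpus_path, token):
    """Increment one token count while holding the lock (hypothetical layout)."""
    with exclusive_lock(corpus_path + ".lock"):
        counts = {}
        if os.path.exists(corpus_path):
            with open(corpus_path, encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():
                        name, count = line.rstrip("\n").rsplit("\t", 1)
                        counts[name] = int(count)
        counts[token] = counts.get(token, 0) + 1
        with open(corpus_path, "w", encoding="utf-8") as fh:
            for name, count in sorted(counts.items()):
                fh.write(f"{name}\t{count}\n")
```

A second instance calling `add_token` while the first holds the lock simply waits, so throughput drops to one writer at a time. Whether lock-file creation behaves reliably on a given network share is a separate question this sketch does not settle.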
|
kspare | Senior Member |
Fair Enough. |
|
ASB | Guest Group |
This sounds like a good option. As a variation, would it be possible to have a primary database which behaves as normal, and a second file which is a read-only copy? This way clustered mail servers could provide their primary files as a secondary, read-only file for each other... |
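One way to picture this variation is the Python sketch below. It is purely hypothetical: the `CombinedCorpus` class, the `(spam_count, ham_count)` pairs, and the idea of summing the two sources are assumptions for illustration, and nothing in the thread says SpamFilter would combine statistics this way. The primary is updated by learning as usual, while the peer server's data is held as a snapshot that cannot be modified.

```python
from types import MappingProxyType


class CombinedCorpus:
    """Hypothetical corpus: a writable primary plus a read-only peer snapshot."""

    def __init__(self, secondary_counts):
        # Primary: token -> (spam_count, ham_count), updated locally.
        self.primary = {}
        # Secondary: a copy of the peer's data behind a read-only view.
        self.secondary = MappingProxyType(dict(secondary_counts))

    def learn(self, token, is_spam):
        """Record one local observation; only the primary changes."""
        spam, ham = self.primary.get(token, (0, 0))
        self.primary[token] = (spam + 1, ham) if is_spam else (spam, ham + 1)

    def counts(self, token):
        """Return combined (spam, ham) counts from both sources."""
        p_spam, p_ham = self.primary.get(token, (0, 0))
        s_spam, s_ham = self.secondary.get(token, (0, 0))
        return (p_spam + s_spam, p_ham + s_ham)
```

Because only the primary is ever written, each server keeps exclusive ownership of its own file, which sidesteps the shared-write problem while still letting a rarely used backup benefit from its peer's statistics.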
|