Spam Filter ISP Support Forum

Bayesian Filter Test Not working

Eric (Guest Group)
Posted: 27 November 2004 at 3:49pm

I am trying to run a test of the Bayesian filter. I paste in the contents of a known spam message, and under Corpus DB it says the DB is locked and that the message passes with 0%.

I have a feeling this is why I get a ton of spam, and it only seems to be getting worse rather than better.

Also, the learning status is "Inactive" even though the "Learn new incoming emails" box is checked.

Thanks ya!

Eric

LogSat (Admin Group)
Joined: 25 January 2005
Location: United States
Posted: 02 December 2004 at 7:04pm
Eric,

There may be a problem with the statistical corpus database. Can you please try to stop SpamFilter, delete the SpamFilter\corpus directory, then restart SpamFilter? Please note that this will reset your statistical database, and SpamFilter will again need to receive the initial 5,000 good and 5,000 spam emails to "prime" the database.

Roberto F. LogSat Software

Clutcher (Guest Group)
Posted: 15 December 2004 at 12:58pm

I have also tried the method you suggested (deleting the corpus) and read almost the whole forum, but I still can't find any email with a spam probability other than 0%, and I receive a lot of spam.

The program has added a lot of words to the corpus, but they all seem to be "good" ones. In fact, I can't understand how to give SpamFilter an example of spam to trigger the Bayesian filter.

By the way, the program is great indeed, and with a few quick and simple settings it still blocks thousands of viruses and other things.

TIA

MArco

LogSat (Admin Group)
Posted: 16 December 2004 at 11:00pm
Marco,

In the corpus you'll find entries for both "good" and "bad" words; it is the score assigned to them that determines how they are used to check for spam. Each token in the corpus has statistics attached to it indicating how many times that token has appeared in spam emails and how many times it was included in a good email. Using statistical math, a score is given to it, and an analysis of the higher scores in an email determines whether the email is good or not.
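
To make the idea of per-token statistics concrete, here is a minimal sketch in Python of a Paul Graham-style calculation. It is only an illustration: the function names, the 0.4 default for rarely seen tokens, the minimum-occurrence cutoff, and the clamping range are all assumptions, and SpamFilter's actual formula is not documented in this thread.

```python
def token_spam_probability(good_count, spam_count, total_good, total_spam,
                           unknown=0.4, min_seen=5):
    """Estimate how strongly one token indicates spam (0.0 to 1.0).

    Hypothetical Graham-style estimate, not SpamFilter's real formula.
    """
    seen = good_count + spam_count
    # Tokens seen too rarely carry little evidence: return a default score.
    if seen == 0 or seen < min_seen:
        return unknown
    good_freq = min(1.0, good_count / max(total_good, 1))
    spam_freq = min(1.0, spam_count / max(total_spam, 1))
    prob = spam_freq / (good_freq + spam_freq)
    # Clamp so that no single token is ever treated as absolute proof.
    return max(0.01, min(0.99, prob))


def combine(probabilities, top_n=15):
    """Combine the most 'interesting' token scores into one message score."""
    # The most interesting tokens are those farthest from the neutral 0.5.
    interesting = sorted(probabilities, key=lambda p: abs(p - 0.5),
                         reverse=True)[:top_n]
    spam_product = 1.0
    good_product = 1.0
    for p in interesting:
        spam_product *= p
        good_product *= 1.0 - p
    return spam_product / (spam_product + good_product)
```

Under these assumed rules, a message whose tokens all sit at the 0.4 default combines to a score well below 1%, so a corpus with very few strongly scored tokens would be expected to rate almost everything as close to 0% spam.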

If you have received more emails than the threshold of 5,000 good and 5,000 spam, can you confirm that not a single email has triggered the Bayesian filter? You can check that using the new statistical pie chart available in the latest version that shows the emails blocked by the various filters.

Roberto F. LogSat Software

Clutcher (Guest Group)
Posted: 20 December 2004 at 10:23am

>In the corpus you'll find entries for both "good" and "bad" words, it is the score that is
>assigned to them which determines how they are used to check for spam.

In fact, all the words in the corpus have the same "good" value, and therefore, as I stated before, the spam probability is always 0%.

>If you have received more emails than the threshold of 5,000 good and 5,000 spam, can
>you confirm that not a single email has triggered the Bayesian filter?

Yes. I think the problem is that nothing and no one has told SpamFilter what spam is, so it can't assign bad values to bad words. And I don't know how to let it learn.

>You can check that using the new statistical pie chart available in the latest version that
> shows the emails blocked by the various filters.

27431 emails blocked, none of them as SPAM

TIA

Ciao

MArco

 

LogSat (Admin Group)
Posted: 20 December 2004 at 10:57pm
Marco,

It's very unusual that *all* the words in the corpus have the same value of good. To verify, can you please go to the "Settings - Bayesian Filter - Corpus Database" tab in SpamFilter, then click on the "Dump Corpus" button? That will generate a listing of all entries in the corpus database, along with the number of times each token appeared in a good email and in a spam email, the probability that an email containing that token is spam, and the last time an email arrived with that token.

In that list, look for the following entries, and please paste the results in the forum so we can take a look at your values: *sex, *SEX, *Subject*viagra, *unsubscribe.

As a comparison, these are the ones right now in our own corpus database.

*Token,Good,Spam,ProbSpam,ModDate
*sex,449,13120,0.473827302455902,12/20/2004
*SEX,71,515,0.182700991630554,12/20/2004
*Subject*viagra,16,2053,0.79815948009491,12/20/2004
*unsubscribe,4828,54253,0.257227122783661,12/20/2004
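
For anyone who wants to inspect a dump outside SpamFilter, the values quoted above can be read with a few lines of Python. This is only a sketch: it assumes the dump is plain comma-separated text with exactly five fields per row, that the probability uses a dot as decimal separator, and that tokens themselves contain no commas. The names `parse_dump` and `SAMPLE_DUMP` are made up for the example.

```python
import csv
import io

# Rows copied from the values quoted in this post.
SAMPLE_DUMP = """\
*sex,449,13120,0.473827302455902,12/20/2004
*SEX,71,515,0.182700991630554,12/20/2004
*Subject*viagra,16,2053,0.79815948009491,12/20/2004
*unsubscribe,4828,54253,0.257227122783661,12/20/2004
"""


def parse_dump(text):
    """Parse 'token,good,spam,prob,date' rows into dictionaries."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != 5:
            continue  # skip blank or malformed lines rather than guessing
        token, good, spam, prob, mod_date = fields
        rows.append({
            "token": token,
            "good": int(good),
            "spam": int(spam),
            "prob_spam": float(prob),
            "mod_date": mod_date,
        })
    return rows


rows = parse_dump(SAMPLE_DUMP)
# The token with the highest spam probability in the sample.
spammiest = max(rows, key=lambda r: r["prob_spam"])
```

A row that does not split into exactly five fields is skipped here, which is also a quick way to spot a dump whose numbers were written with an unexpected format.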

<< Yes. I think the problem is that nothing or noone told SpamFilter what is SPAM so it can't assign bad values to bad words. And I don't know how to let it learn. >>

Actually, every time any of SpamFilter's filters finds a spam email, it updates the statistical corpus with the email's tokens and assigns them "spam scores". Every single email that SpamFilter receives goes through this process.

Please go into more detail on your statement "27431 emails blocked, none of them as SPAM". Do you mean that you do not have a single email in the quarantine database, or showing on the statistical pie chart, and yet SpamFilter shows 27431 emails as blocked? If so, this means that SpamFilter blocked 27431 attempts by spammers to "relay" using your SMTP server, but that none of the "incoming" email addressed to your users was blocked. This would tend to indicate a misconfiguration of SpamFilter, as with its default filters SpamFilter will indeed block a huge amount of email addressed to your domains.

Roberto F. LogSat Software

Clutcher (Guest Group)
Posted: 21 December 2004 at 8:01am

>It's very unusual that *all* the words in the corpus have the same value of good. To verify,
>can you please go to the "Settings - Bayesian Filter - Corpus Database" tab in SpamFilter, >then click on the "Dump Corpus" button.

In fact, that's what I did.

>In that list, look for the following, and please paste the results in the forum so we can take
> a look at your values. . *sex, *SEX, *Subject*viagra, *unsubscribe,

*sex,1,1,0,400000005960464,16/12/2004
*unsubscribe,1,0,0,400000005960464,21/12/2004

I could continue but all the words have the same values, apart from the date.

>Actually everytime any of SpamFilter's filters finds a spam email

Which filters? How can they say it's spam?

>Do you mean that you do not have a single email in the quearantine database, or showing
>on the statistical pie chart, but yet SpamFilter shows 27431 emails as blocked?

I'm full of blocked emails but none because SPAM

14000 Exceed RCPT
9000 IP in MAPS
5000 Keyword found
4000 Invalid MX
1000 SPF
800  Domain local blacklist
etc

>This would tend to indicate a misconfiguration of SpamFilter, as with its default filters
> SpamFilter will indeed block a huge amount of emails addressed to your domains.

Yes, but, again, how could it block a message as spam if none of the words are classified as spam? I'm sorry, but I still can't understand how.

If I paste a message that passed:

REPLICA WATCH MODELS

Rolex, Patek Philippe, Bvlgari
Cartier, Gucci, Franck Muller

.. and 25 other most famous manufacturers

I obtain:

12/21/04 14.01.40.109 -- ()  Token Good Spam Prob is Spam
12/21/04 14.01.40.125 -- ()  and 1 0 0,4
12/21/04 14.01.40.125 -- ()  Bvlgari 0 0 0,2
12/21/04 14.01.40.125 -- ()  Cartier 0 0 0,2
12/21/04 14.01.40.125 -- ()  famous 1 0 0,4
12/21/04 14.01.40.125 -- ()  Franck 0 0 0,2
12/21/04 14.01.40.125 -- ()  From* 0 3 0,4
12/21/04 14.01.40.125 -- ()  Gucci 0 0 0,2
12/21/04 14.01.40.140 -- ()  manufacturers 1 0 0,4
12/21/04 14.01.40.140 -- ()  MODELS 0 0 0,2
12/21/04 14.01.40.140 -- ()  most 1 0 0,4
12/21/04 14.01.40.140 -- ()  Muller 0 0 0,2
12/21/04 14.01.40.140 -- ()  other 1 0 0,4
12/21/04 14.01.40.140 -- ()  Patek 0 0 0,2
12/21/04 14.01.40.140 -- ()  Philippe 1 0 0,4
12/21/04 14.01.40.140 -- ()  Received* 0 2 0,4
12/21/04 14.01.40.140 -- ()  REPLICA 0 0 0,2
12/21/04 14.01.40.140 -- ()  Return 3 4 0,3371
12/21/04 14.01.40.140 -- ()  Return-Path*Path* 3 4 0,3371
12/21/04 14.01.40.140 -- ()  Rolex 0 0 0,2
12/21/04 14.01.40.140 -- ()  Subject* 2 0 0,4
12/21/04 14.01.40.140 -- ()  To* 1 0 0,4
12/21/04 14.01.40.140 -- ()  WATCH 0 0 0,2
12/21/04 14.01.40.140 -- () ------------------------------------------------------------
12/21/04 14.01.40.140 -- ()  Cartier 0,2
12/21/04 14.01.40.140 -- ()  Bvlgari 0,2
12/21/04 14.01.40.140 -- ()  Franck 0,2
12/21/04 14.01.40.156 -- ()  WATCH 0,2
12/21/04 14.01.40.156 -- ()  REPLICA 0,2
12/21/04 14.01.40.156 -- ()  MODELS 0,2
12/21/04 14.01.40.156 -- ()  Gucci 0,2
12/21/04 14.01.40.156 -- ()  Rolex 0,2
12/21/04 14.01.40.156 -- ()  Muller 0,2
12/21/04 14.01.40.156 -- ()  Patek 0,2
12/21/04 14.01.40.156 -- ()  Return-Path*Path* 0,3371
12/21/04 14.01.40.156 -- ()  Return 0,3371
12/21/04 14.01.40.156 -- ()  Subject* 0,4
12/21/04 14.01.40.156 -- ()  most 0,4
12/21/04 14.01.40.156 -- ()  Philippe 0,4
12/21/04 14.01.40.156 -- ()  other 0,4
12/21/04 14.01.40.156 -- ()  famous 0,4
12/21/04 14.01.40.156 -- ()  and 0,4
12/21/04 14.01.40.156 -- ()  manufacturers 0,4
12/21/04 14.01.40.156 -- ()  From* 0,4
12/21/04 14.01.40.156 -- ()  Received* 0,4
12/21/04 14.01.40.156 -- ()  To* 0,4
12/21/04 14.01.40.156 -- **** R E S U L T S *********
12/21/04 14.01.40.156 -- passes Bayesian filter - 0% spam

To be clear: how could I or could it increase the value of "Patek"?

Thanks again.

Ciao

MArco

LogSat (Admin Group)
Posted: 21 December 2004 at 8:27pm

Marco,

I'll continue in Italian so that things may be clearer (translated here into English).

The first two numbers in the corpus database indicate how many emails classified as "good" and how many emails marked as "spam" have arrived containing that word.

In your two examples:

*sex,1,1,0,400000005960464,16/12/2004
*unsubscribe,1,0,0,400000005960464,21/12/2004

this means that:

  • you received 2 emails in total containing the word "sex"; one was marked as spam, the other as good.
  • you received only one email containing the word "unsubscribe", and it was marked as "good".

Now, it is very strange that from 16 December until today you received only two emails containing the word "sex". It is a word used extremely often in spam, which is why I think there is a misconfiguration in your SpamFilter setup.

As for your question <<Which filters? How can they say it's spam?>>: SpamFilter uses several different techniques to catch spam. When you say:

I'm full of blocked emails but none because SPAM

14000 Exceed RCPT
9000 IP in MAPS
5000 Keyword found
4000 Invalid MX
1000 SPF
800  Domain local blacklist

in reality all the emails you mention were blocked because they were SPAM. The filters used to decide that a given email was spam are the ones named in your list above. Please see the readme.htm file in the SpamFilter directory for what these filters are and how they work. The Bayesian statistical filter we have been discussing throughout this thread is just one of the many filters SpamFilter uses to find spam.

I repeat that I think there is a problem with your configuration, not because you have no emails blocked by the statistical filter (that happens often, since some of the other techniques SpamFilter uses are much better than the Bayesian filter), but because the numbers you mention do not look right to me. To simplify things, if you can send us a zip containing the following:

the SpamFilter.ini file
all of the blacklist/whitelist files you have configured
your SpamFilter\corpus directory with the files it contains

we will try to work out what is wrong.

Regarding your question about "Patek": you cannot change the score that SpamFilter assigns. It is updated automatically by SpamFilter every time an email containing that word arrives, depending on how that email is classified by the other filters.

Roberto F.
LogSat Software

Paul D (Guest Group)
Posted: 27 December 2004 at 8:50am

I am running v395 and have yet to see anything blocked by the Bayesian filter; every message reports 0% spam.

I just made a copy of my corpus folder, stopped SpamFilter, and deleted the folder so it can start from scratch.

I find it hard to believe that, out of all these emails, less than 1% is being blocked.

Any help would be appreciated.

Thanks

[Messages]
Spam=514936
Good=1274498

LogSat (Admin Group)
Posted: 27 December 2004 at 9:54pm
Paul,

It is indeed strange, and it's possible that SpamFilter is not configured properly and/or mail is being routed to SpamFilter in such a way as to mask the original source IP of the sender. Without knowing the source IP, SpamFilter will not be able to use many of its filters. Can you please zip and email us a copy of your SpamFilter.ini file, your blacklist and whitelist files, and one of your latest SpamFilter logfiles so we can try to see what is happening?

Roberto F. LogSat Software

Paul D (Guest Group)
Posted: 28 December 2004 at 9:19am
sent attachment to support@logsat.com ?

LogSat (Admin Group)
Posted: 28 December 2004 at 10:06pm

Paul,

We received your files, and your settings appear to work fine. The activity logfile you sent shows that on the 27th SpamFilter blocked about 41% of your total incoming emails. The average for the previous days was a bit lower, showing that around 30% of your total email is spam.

From your post, it seems you are saying that SpamFilter is blocking only 1% of your emails. As that was not the case, we may have misunderstood. Did you mean that you find it hard to believe that the Bayesian filter only stopped 1% of spam? If so, then that is actually perfectly normal. The Bayesian filter is used as a last resort to check for spam, after all the other filters have had a chance to do so. Only if they all fail is the Bayesian filter used. As such, it will indeed have mostly pre-screened emails to check, and will only tag a very small percentage of them.

As an example, we provided on the forum a snapshot of our filter stats for 3 days' worth of emails, as follows:

94,828 IP found in MAPS search
74,161 IP address is from a blacklisted country
10,810 Invalid sender domain MX record
7,896 SPF Sender Policy Framework match
3,044 Keywords found in content
763 Exceeded maximum number of RCPT TO
526 Mail From and Mail To domains are equal
345 Statistical filter match
27 Mail From and Mail To are equal

According to the above, the Bayesian statistical filter on our own server only blocked 0.2% of the spam found by the other filters. However, that is still 345 spam emails that were successfully blocked.
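
The percentage can be checked directly from the counts in the list above. The short Python sketch below just redoes that arithmetic; the dictionary and variable names are made up for the example.

```python
# Counts copied from the 3-day filter statistics listed above.
filter_counts = {
    "IP found in MAPS search": 94828,
    "IP address is from a blacklisted country": 74161,
    "Invalid sender domain MX record": 10810,
    "SPF Sender Policy Framework match": 7896,
    "Keywords found in content": 3044,
    "Exceeded maximum number of RCPT TO": 763,
    "Mail From and Mail To domains are equal": 526,
    "Statistical filter match": 345,
    "Mail From and Mail To are equal": 27,
}

total_blocked = sum(filter_counts.values())

# Share of all blocked emails attributed to each filter, in percent.
shares = {name: 100.0 * count / total_blocked
          for name, count in filter_counts.items()}

bayes_share = shares["Statistical filter match"]
print(f"{total_blocked} blocked in total, Bayesian share {bayes_share:.1f}%")
```

With these counts the Bayesian filter accounts for roughly 0.2% of everything blocked, while the MAPS lookup alone accounts for about half.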

Roberto F.
LogSat Software

 

omaits (Newbie)
Joined: 25 February 2005
Location: United States
Posted: 28 February 2005 at 9:33am

I have a question about this old topic....

I am going to restart our database as you described above, because I set the system up wrong and screwed up my database. I tried your technique, but I can't figure out how to stop the service! The STOP SERVICE button is grayed out in the application. I also tried stopping it with Ctrl+Alt+Del, and it told me I wasn't allowed. How can I stop it?

Sorry if the question is stupid. I'm a rookie.

LogSat (Admin Group)
Posted: 28 February 2005 at 4:05pm
Either type "net stop spamfilter" at a DOS prompt, or follow these instructions from symantec.com:

How to start or stop a service in Windows

Situation:
You have Windows NT, Windows 2000, or Windows XP, and you want to know how to start or stop a service.

Solution:
Log on as Administrator, and then follow the instructions for the operating system that is installed on the computer.

Windows NT 4.0:

  1. Click Start.
  2. Point to Settings, and choose Control Panel.
  3. Double-click the Services icon.
  4. Select the service that you want to start or stop.
  5. Click Startup. The Service Properties window opens.
  6. Start or stop the service and then close the Properties window.

    To start the service:
    1. Check Automatic.
    2. Click OK.
    3. Click Start.

    To stop the service:
    1. Check Manual.
    2. Click OK.
    3. Click Stop.

Windows 2000:
  1. Click Start, point to Settings, and then click Control Panel. The Control Panel appears.
  2. Double-click Administrative Tools.
  3. Double-click the Services icon. The Services window appears.
  4. Double-click the service that you want to stop or start. The Service Properties window appears.
  5. Click Start to start the service, or click Stop to stop the service.
  6. Click OK and close the Services window.

Windows XP:
  1. Click Start.
  2. Click Control Panel.
  3. Double-click Administrative Tools.
  4. Double-click the Services icon.
  5. Double-click the service that you want to stop or start. The Service Properties window appears.
  6. Look in the upper-left corner. If the service is running, click "Stop the service."
    If the service is not running, click "Start the service."
  7. Click OK and close the Services window.



Roberto Franceschetti

LogSat Software

Spam Filter ISP

omaits (Newbie)
Posted: 28 February 2005 at 4:33pm
Thanks Roberto... stupid question, I know. I apologize. Anyway, I deleted the directory and my Bayesian filter is busy learning what is and isn't spam. Thanks again.

Sundance (Newbie)
Joined: 18 July 2006
Location: Hungary
Posted: 18 July 2006 at 4:53am

Hi Guys!

I think I HAD the same problem as a few others on the forum, and I know the solution. I live in Hungary and the server's regional settings are Hungarian. The decimal separator, therefore, is a comma (,). So the corpus looks like:

*sex,1,1,0,400000005960464,16/12/2004
*unsubscribe,1,0,0,400000005960464,21/12/2004

So poor SpamFilter cannot understand its own corpus: instead of the value 0.400000005960464, it reads ZERO because of the comma, and it cannot compute and store real probability values. In my corpus there were commas, and EVERY probability value was the same.
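
The ambiguity described above is easy to reproduce in a few lines of Python. This only illustrates the text-format problem: `format_row` is a made-up helper, and nothing here shows how SpamFilter itself reads or writes its corpus.

```python
def format_row(token, good, spam, prob, date, decimal_sep="."):
    """Write one comma-separated dump row (hypothetical helper).

    decimal_sep imitates the regional "decimal separator" setting.
    """
    prob_text = repr(prob).replace(".", decimal_sep)
    return ",".join([token, str(good), str(spam), prob_text, date])


dot_row = format_row("*sex", 1, 1, 0.4, "16/12/2004", decimal_sep=".")
comma_row = format_row("*sex", 1, 1, 0.4, "16/12/2004", decimal_sep=",")

# Splitting on commas gives 5 fields with a dot separator...
dot_fields = dot_row.split(",")
# ...but 6 fields with a comma separator: the probability is cut in two.
comma_fields = comma_row.split(",")

prob_from_dot = float(dot_fields[3])      # the intended 0.4
prob_from_comma = float(comma_fields[3])  # only the leading "0" survives
```

Any reader that splits such a row on commas and takes the fourth field would see 0 instead of 0.4, which is the effect suspected here.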

So what did I do? I went to Settings/International settings and changed the "decimal separator" from a comma (,) to a dot (.). Then I terminated SpamFilter (stopping it is not enough), deleted the entire corpus directory, restarted SpamFilter, and waited for a few emails. Then I dumped the corpus.

sex,2,1,0.21334443560464,16/12/2004

Yes, there are REAL probability values this time! When testing, everything is still 0% spam :( But I can only hope that when I reach the 5000/5000 count and the Bayesian filter kicks in, everything will be OK.

WebGuyz (Senior Member)
Joined: 09 May 2005
Location: United States
Posted: 18 July 2006 at 6:07pm
If this is true then it should be a quick fix for Roberto.
http://www.webguyz.net

LogSat (Admin Group)
Posted: 18 July 2006 at 9:47pm
Sundance,

That is an excellent report, and it had us scrambling all afternoon to double-check the code behind the Bayesian calculations.

It does seem however that the "bug" is limited to the dump of the Bayesian corpus to screen. SpamFilter's internal probability data is stored in binary format in the db.dat.prb file. As it's stored in native binary format, there is no issue with comma/dot international headaches when reading/writing the file. So far, all internal calculations also appear to be using the binary format. The only time we convert the binary probability to text (thus falling victim to the dot/comma problem) is when we output the data on screen.
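
As a rough illustration of why binary storage is immune to the separator issue, here is a small Python sketch. The on-disk layout of db.dat.prb is not described in this thread, so the little-endian 8-byte double used here is purely an assumption, and `display` is a made-up helper that imitates locale-dependent screen output.

```python
import struct

prob = 0.4

# Binary round trip: the 8 raw bytes of an IEEE 754 double contain no
# decimal separator, so regional settings cannot change what is read back.
packed = struct.pack("<d", prob)
restored = struct.unpack("<d", packed)[0]


def display(value, decimal_sep):
    """Format a probability for the screen with a locale-style separator."""
    return f"{value:.4f}".replace(".", decimal_sep)


us_text = display(restored, ".")  # dot separator
hu_text = display(restored, ",")  # comma separator, as on a Hungarian system
```

The stored value is identical in both cases; only the text produced for display differs.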

We'll be going over the code one more time to be more certain, but so far it does seem as though the Bayesian filter itself does not have a problem with the decimal separator.
Roberto Franceschetti

LogSat Software

Spam Filter ISP

Sundance (Newbie)
Posted: 19 July 2006 at 9:29am

Thank you for your answer!

Hmmmmm, then I've got a more serious problem. My Bayesian filter shows 0% spam for all messages (even before I deleted the folder, when it was way over 5000/5000 messages). When I tried the Bayesian filter on test messages in the Settings/Bayesian window, the filter also reported 0% spam for everything.

If it is not the corpus, then what could cause this bug?

LogSat (Admin Group)
Posted: 19 July 2006 at 10:19pm
Sundance,

Have you checked our explanation, earlier in this thread, of why the Bayesian filter will have a lower "hit" ratio than the other filters?
http://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=4647#4885

Once you reach the 5000/5000 emails, can you please check the Statistics tab in SpamFilter and post the results of how many emails are stopped by the various filters (this only works if you have enabled the quarantine database)?
Roberto Franceschetti

LogSat Software

Spam Filter ISP

Sundance (Newbie)
Posted: 20 July 2006 at 11:18am

Roberto,

Yes, I have read the forums, and your explanation too. But my Bayesian filter doesn't have a 'lower' hit ratio; it does not catch _anything_.

All mails are 0% spam. The 5000/5000 threshold passed and everything was just as before. Nothing changed. It is as if the Bayesian filter never even started. There is no sign of it at all.

In Stats, the MAPS, SURBL and keyword filters are at about 60%, 30% and 10%. Bayes is not mentioned there. :(

 

LogSat (Admin Group)
Posted: 20 July 2006 at 4:22pm
Can you zip and email us your corpus directory, and one of SpamFilter's activity files, once you reach the 5000/5000 count?
Roberto Franceschetti

LogSat Software

Spam Filter ISP

Sundance (Newbie)
Posted: 21 July 2006 at 3:43am
Okay. Just a few days till 5000/5000 and I'll send them.

Sundance (Newbie)
Posted: 24 August 2006 at 9:33am

Roberto!

Yesterday I sent it to you at support@logsat.com.

Did you get it?

LogSat (Admin Group)
Posted: 24 August 2006 at 6:47pm
Yes, we've been analyzing it. The contents are rather strange: while there are 42,738 entries, only FIVE tokens have a spam probability of .9 or higher. There are about 370 with a spam probability of .1 or less. The other 42,000 all have a probability of .4.

This is rather unusual. However, from your corpus.ini file I see that you received about the same number of good emails as spam emails. This is also unusual, as normally the amount of spam is much higher than the amount of clean email. This may be causing the Bayesian filter some problems, as the numbers are too similar.

Can you please also zip up one of SpamFilter's latest activity logfiles so we can take another look?
Roberto Franceschetti

LogSat Software

Spam Filter ISP

Sundance (Newbie)
Posted: 28 August 2006 at 3:26am

Roberto,

Sadly enough, logging was not enabled :( Until your post....

It has been enabled since then. What should I do now?

A. Send you the logfiles generated since Aug 24?

B. Send the logfiles and the corpus (which contains data from before Aug 24 as well)?

C. Erase the log and the corpus NOW, and send both, say, a week later, so that they contain data for the same period of time?

regards,

Sundance

LogSat (Admin Group)
Posted: 28 August 2006 at 4:24pm
Let's start with the simplest, which is D: just zip and email us one day's worth of logs (today's or yesterday's, for example) so we can see if there are any major issues immediately visible.
Roberto Franceschetti

LogSat Software

Spam Filter ISP

Sundance (Newbie)
Posted: 29 August 2006 at 3:06am

Okay. I've sent it.

Thank you for your help!

Sundance (Newbie)
Posted: 12 September 2006 at 10:59am

Hello....

Any ideas? Did you get my email????

Sundance

LogSat (Admin Group)
Posted: 12 September 2006 at 11:34am
Actually, on the 29th I sent you an email to say that we had not received the email with the corpus file, and were waiting to hear back from you... sorry. Can you please re-send it?
Roberto Franceschetti

LogSat Software

Spam Filter ISP