
Statistics: keyword

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=5755
Printed Date: 31 July 2025 at 11:19am


Topic: Statistics: keyword
Posted By: meatboy
Subject: Statistics: keyword
Date Posted: 15 August 2006 at 2:21am

Hi,

As a suggestion to improve SpamFilter, would it be possible to add statistics on how often each keyword has been used to catch spam? That is, a count showing each keyword's/regex's effectiveness?

I suspect the order in which the keywords are scanned means that keywords "higher up" in the list would tend to score more, but the information would at least show which keywords are of no use. The idea is to reduce the number of useless keywords.

Could this be implemented and would it be of any use?
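
The ordering effect mentioned above can be illustrated with a small sketch. It assumes, as suspected in this post but not confirmed anywhere in the thread, that a filter stops at the first matching keyword and credits only that one. The keyword list, the sample messages, and the `count_hits` helper are all made up for illustration and are not part of SpamFilter.

```python
import re
from collections import Counter

# Hypothetical keyword list, in scan order (not taken from any real filter).
KEYWORDS = [r"(?i)pre-approved", r"(?i)r0lex", r"(?i)this is not spam"]

# Made-up sample messages, for illustration only.
MESSAGES = [
    "You are pre-approved! This is not spam.",
    "Cheap r0lex watches. This is not spam.",
    "This is not spam, honest.",
]


def count_hits(keywords, messages):
    """Return (first_match, any_match) counters keyed by pattern.

    first_match credits only the first keyword in scan order that hits a
    message; any_match credits every keyword that hits it.
    """
    first_match, any_match = Counter(), Counter()
    for msg in messages:
        credited = False
        for pattern in keywords:
            if re.search(pattern, msg):
                any_match[pattern] += 1
                if not credited:
                    first_match[pattern] += 1
                    credited = True
    return first_match, any_match


first, total = count_hits(KEYWORDS, MESSAGES)
for pattern in KEYWORDS:
    print(pattern, "first-match:", first[pattern], "any-match:", total[pattern])
```

In this toy data the last keyword matches every message but is credited only once under first-match counting, so a low count for a keyword far down the list does not by itself prove the keyword is useless.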

Tim




Replies:
Posted By: sgeorge
Date Posted: 16 August 2006 at 11:40am
Hi meatboy. Actually, this is very possible if you are using a quarantine database and quarantine all messages that are blocked because of keyword matches. Here's some SQL that should give you a list of the keywords that sent messages to the quarantine, sorted by the number of occurrences per keyword, highest first:

SELECT rejectdetails, COUNT(*)
FROM tblQuarantine
WHERE rejectid = 13
GROUP BY rejectdetails
ORDER BY COUNT(*) DESC
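
For anyone who wants to try the query before pointing it at a live quarantine, here is a minimal sketch that runs it against an in-memory SQLite table using Python's standard `sqlite3` module. The two-column table is a trimmed stand-in (the real tblQuarantine presumably has more columns), the sample rows are invented, and reject code 5 is just a hypothetical non-keyword code used to show the WHERE filter at work.

```python
import sqlite3

# In-memory stand-in for the quarantine database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblQuarantine (rejectid INTEGER, rejectdetails TEXT)")

# Invented sample rows: rejectid 13 marks keyword matches, as in the query
# above; 5 is a hypothetical code for some other kind of rejection.
rows = [
    (13, "r0lex"),
    (13, "r0lex"),
    (13, "pre-approved"),
    (5, "some other rule"),
    (13, "r0lex"),
]
conn.executemany("INSERT INTO tblQuarantine VALUES (?, ?)", rows)

QUERY = """
SELECT rejectdetails, COUNT(*)
FROM tblQuarantine
WHERE rejectid = 13
GROUP BY rejectdetails
ORDER BY COUNT(*) DESC
"""

# Each result row is (keyword, number of quarantined messages it caught).
results = conn.execute(QUERY).fetchall()
for keyword, hits in results:
    print(keyword, hits)

conn.close()
```

Keep in mind that the counts only cover messages still sitting in the quarantine table at the time the query runs.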


Stephen



Posted By: LogSat
Date Posted: 16 August 2006 at 4:13pm
Thanks sgeorge, excellent idea, we'll be using that ourselves !!

-------------
Roberto Franceschetti

http://www.logsat.com - LogSat Software

http://www.logsat.com/sfi-spam-filter.asp - Spam Filter ISP


Posted By: sgeorge
Date Posted: 16 August 2006 at 4:57pm
Oh my, well thanks!

Stephen


Posted By: meatboy
Date Posted: 16 August 2006 at 7:15pm

Hi Sgeorge,

I have tried a quarantine DB, but only the Access one that LogSat provides. I like your idea though. Perhaps I can swap over to a SQL DB instead. Thanks for the idea!

Tim



Posted By: Desperado
Date Posted: 18 August 2006 at 9:09am

Hmmm ... I still think Sawmill works better for this, as it looks at ALL the blocked items rather than just the ones that are still in quarantine, as below (not sure how this will post):

    #  Messages  %        Bytes      Keywords
    1  1,863     33.8 %   14.58 M    [((?i)<div><font face=arial size=2><img alt="" hspace=0)]
    2  912       16.6 %   1.77 M     [((?i)Subject:=\?ISO\-\d*\-\1?.*\?[a-z0-9]{20,})]
    3  874       15.9 %   1.90 M     [((?i)\<(font|span)[^>]+style[^>]+float[^>]*:[^>]*right)]
    4  434       7.9 %    7.57 M     [(con\$ign\-net)]
    5  319       5.8 %    639.00 k   [((?i)(((been|are) pre\-(approved|qualified))|(email is a commercial adv)))]
    6  261       4.7 %    677.00 k   Found prohibited attachment
    7  215       3.9 %    583.00 k   [((?i)<.?object)]
    8  148       2.7 %    1.04 M     Exceeded max spaces in subject
    9  110       2.0 %    181.00 k   [((?i)Subject:=\?utf\-\d*\?.*\?[a-z0-9]{20,})]
   10  109       2.0 %    279.00 k   [((?i)<.?iframe)]
   11  65        1.2 %    179.00 k   [Found Content-Transfer-Encoding=base64 and Content-Type=text/html/plain]
   12  57        1.0 %    116.00 k   [((?i)((want watch)|(need watch)|(r0lex)|(bom\.evif)|(/replica/)|(z\.php)|(/r/sales)|(/rep/sales)|((fogw)|(eank)|(toels)...
   13  37        0.7 %    77.00 k    [((?i)\<\!\[cdata\[)]
   14  29        0.5 %    5.00 k     [((?i)Subject:(([\s]|[\!-\xB4]){0,10}[\|]){2})]
   15  26        0.5 %    27.00 k    [((?i)style>(.){5,30}visibility: hidden;)]
   16  25        0.5 %    45.00 k    [((?i)((ivacy is extremely import)|(this is not spam)|(not wanting to receiv)|(killers without prescrip)))]
   17  11        0.2 %    13.00 k    [((?i)subject:.*(@.+@))]
   18  7         0.1 %    0 b        [((?i)subject:.*((ڨ){3,}.*))]
   19  3         0.1 %    3.00 k     [((?i)https://www\.surepayroll\.com)]
Total  5,505     100 %    29.61 M
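
The percentage column above appears to be each entry's share of the 5,505 blocked messages. As a rough sketch of that arithmetic (not of how Sawmill itself computes it), the snippet below reproduces the top three shares from the message counts and total listed in the table. The short labels and the `share` helper are placeholders introduced here.

```python
# Message counts for the top three entries and the overall total, copied
# from the table above. The labels are placeholders for the full keywords.
counts = {"entry 1": 1863, "entry 2": 912, "entry 3": 874}
TOTAL_MESSAGES = 5505


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(count, TOTAL_MESSAGES) for label, count in counts.items()}
for label, pct in shares.items():
    print(label, pct, "%")
```

The same calculation can be applied to per-keyword counts from a quarantine query to get a comparable percentage view.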




-------------
The Desperado
Dan Seligmann.
Work: http://www.mags.net
Personal: http://www.desperado.com



Posted By: sgeorge
Date Posted: 18 August 2006 at 10:08am
meatboy, I use an Access DB as well and that query does the trick for me... I believe the query should work unchanged in SQL and MySQL as well.

Desperado, good point. The only reason I don't use Sawmill is that my evaluation expired. Hey, glad to see that some of my keywords are in your top 5 list.

Stephen


Posted By: Desperado
Date Posted: 18 August 2006 at 10:24am

Stephen,

Top 1 even!

Thanks



-------------
The Desperado
Dan Seligmann.
Work: http://www.mags.net
Personal: http://www.desperado.com



