Spam Filter ISP Support Forum


Keyword may have not been scanned

sgeorge (Senior Member; joined 23 August 2005)
Posted: 16 February 2007 at 9:58am
Hi All, long time no see. :)

One message came through that I was hoping a RegEx blacklist keyword would match.  I've checked my logs to see if there was any whitelisting, or if part of the message was skipped for being over the max scan size, and from the logs it looks like neither was the case.

Here's the RegEx keyword:
((?i)\w ?\w ?\w ?\w ?\. ?p ?k)
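
For reference, a minimal sketch of what that keyword is meant to catch, using Python's re module rather than SpamFilter's own RegEx engine (so it says nothing about how SpamFilter evaluates it). The helper name find_ticker is made up for illustration, and the (?i) modifier is expressed as the re.IGNORECASE flag because recent Python versions reject an inline global flag that is not at the very start of the pattern.

```python
import re

# Same keyword as above, minus the inline (?i); case-insensitivity is
# passed as a flag instead (an adaptation for Python, not SpamFilter syntax).
TICKER = re.compile(r"\w ?\w ?\w ?\w ?\. ?p ?k", re.IGNORECASE)


def find_ticker(text):
    """Return the first matching substring, or None if nothing matches."""
    m = TICKER.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    samples = [
        "CHINA BIOLIFE ENTERP (CBFE.PK)",  # ticker as it appears in the spam
        "buy c b f e . p k now",           # spaced-out, lower-case variant
        "see CBFE.COM",                    # wrong suffix, should not match
    ]
    for sample in samples:
        print(repr(sample), "->", find_ticker(sample))
```

The optional spaces in the pattern are what let it catch the spaced-out variant as well as the plain ticker.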


Here is a copy of the plain-text content of the message:

front of get-smart playtime can create has a A lack of spontaneous three =
mornings and parents alike.=20

Nothing could be better than=20
 ● CHINA BIOLIFE ENTERP (CBFE.PK) ● STOCK!!!

New CBFE.PK STOCK this is GREAT OPPORTUNITY  to BE a rich man!!!
Forecasts for YOU is only positive just purchase this CBFE.PK SHARE!!!

Trust us cause we ASSURE U the real profit!!!
For more info about CBFE.PK check brokers web-site!!!

Hurry up U must buy this CBFE.PK SHARE on FRIDAY: 02/16/07

 a lack of playtime  contribute to depression become creative, videos, =
enrichment


And here's the relevant snippet from the log files (i.p.s and addresses have been changed...):

02/16/07 04:25:25:984 -- (6628) Connection from: 123.123.123.123  -  Originating country : United States
02/16/07 04:25:26:343 -- (6628) Resolving 123.123.123.123 - intrepid.xo.com
02/16/07 04:25:26:718 -- (6628) - SPF analysis for spam.com done: - none
02/16/07 04:25:26:718 -- (6628) Mail from: sender@spam.com
02/16/07 04:25:27:640 -- (6628) - MAPS search done...
02/16/07 04:25:27:640 -- (6628) RCPT TO: recipient@mydomain.com accepted
02/16/07 04:25:27:890 -- (6628) EMail from sender@spam.com to recipient@mydomain.com passes Bayesian filter - 22.3561% spam  (31ms)
02/16/07 04:25:27:890 -- (6628) EMail from sender@spam.com to recipient@mydomain.com was queued. Size: 1 KB, 1024 bytes
02/16/07 04:25:27:906 -- (7496) Sending email from sender@spam.com to recipient@mydomain.com --
02/16/07 04:25:27:953 -- (6392) Time to add Msg to Bayes corpus:0
02/16/07 04:25:28:047 -- (6628) Disconnect
02/16/07 04:25:28:218 -- (7496) EMail from sender@spam.com to recipient@mydomain.com --  was forwarded to 10.10.10.1:26


I am running v 3.1.3.615.  Also, my max scan setting in SpamFilter.ini is:
MaxMsgSizeForKeywordScan=64

Thanks for your help.  I'm hoping that I'm just missing something, but it seems kind of funky.

Stephen


Edited by sgeorge

sgeorge (Senior Member; joined 23 August 2005)
Posted: 16 February 2007 at 10:04am
Also, I meant to mention something interesting I noticed in the "RegEx Test" tab in SpamFilter.  If I enter the RegEx search string "(?i)\w ?\w ?\w ?\w ?\. ?p ?k" (no quotes), I get the following results...

The pattern was found in this text:
All work and no play makes Jack a dull boy.
All work and no play makes Jack a dull boy.
All work and no play makes Jack a dull boy.  TEST.PK
All work and no play makes Jack a dull boy.
All work and no play makes Jack a dull boy.


But it was not found in this text:
All work and no play makes Jack a dull boy.
All work and no play makes Jack a dull boy.
All work and no play makes Jack a dull boy.
All work and no play makes Jack a dull boy.  TEST.PK
All work and no play makes Jack a dull boy.
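
As a cross-check outside SpamFilter, the two test texts above can be rebuilt and run through a different engine. This is a minimal sketch with Python's re module; build_text and is_found are illustrative names, and (?i) is expressed as the re.IGNORECASE flag.

```python
import re

PATTERN = re.compile(r"\w ?\w ?\w ?\w ?\. ?p ?k", re.IGNORECASE)
FILLER = "All work and no play makes Jack a dull boy."


def build_text(hit_line, total_lines=5):
    """Build the filler text with '  TEST.PK' appended to one 1-based line."""
    lines = [FILLER] * total_lines
    lines[hit_line - 1] += "  TEST.PK"
    return "\n".join(lines)


def is_found(text):
    """True if the keyword pattern matches anywhere in the text."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    for hit_line in (3, 4):
        found = is_found(build_text(hit_line))
        print("TEST.PK on line", hit_line, "found:", found)
```

In Python both texts match, which would suggest the pattern itself is not at fault and that the difference comes from how the engine copes with the extra text before the hit.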


Thanks for listenin'. :)

Stephen

sgeorge (Senior Member; joined 23 August 2005)
Posted: 23 February 2007 at 6:49pm
Just a mini-update...

I tried doing a full uninstall & reinstall of v 3.1.3.615.  Oddly, it did not fix the problem.

Stephen

ImInAfrica (Groupie; joined 27 June 2006; location: FL, USA)
Posted: 25 February 2007 at 4:20pm
I tested this on 650 and can confirm the same issue.
It looks like the regex fails once more than a certain number of characters come before the regex hit.

Amir

sgeorge (Senior Member; joined 23 August 2005)
Posted: 26 February 2007 at 5:11pm
Hey, thanks for testing it, man.

Edited by sgeorge

mikek (Senior Member; joined 22 February 2005; location: Switzerland)
Posted: 08 March 2007 at 10:54am
I can confirm this. I was always wondering why so many spams with inline images came through, even though I had the correct "src=cid:..." keywords set.

Just tested my keyword with a mail that came through. If I paste the whole email, the regex test outputs "not found". If I just paste a few lines around the src=cid, it will output "found", like it should...

This is a serious issue that has to be looked into!

Cheers,

Mike

LogSat (Admin Group; joined 25 January 2005; location: United States)
Posted: 08 March 2007 at 11:05am
Mike,

Can you please forward us the whole email (headers and email body included)?
Roberto Franceschetti

LogSat Software

Spam Filter ISP

mikek (Senior Member; joined 22 February 2005; location: Switzerland)
Posted: 08 March 2007 at 11:06am
Just did some more tests and it looks like it has something to do with the regex that is used...

For me, the error shows with this regex: src="cid:(.)*\$(.)*@(.)*"

E-Mail is on its way...



Edited by mikek

LogSat (Admin Group; joined 25 January 2005; location: United States)
Posted: 09 March 2007 at 11:34pm
Everyone,

It seems that some of your RegEx keywords are complex enough to cause a stack overflow. SpamFilter recovers from the error, but it then misses the keyword match in that particular string.

We're currently looking at the "greedy" option in RegEx, which is enabled by default in SpamFilter. In the sample mikek provided, we added the modifier (?-g) at the beginning of his expression. This disables "greedy" mode in RegEx, and the string is then detected successfully.

Mike, if you change your string from:

((?i)(src="cid:(.)*\$(.)*@(.)*"))

to

((?-gi)(src="cid:(.)*\$(.)*@(.)*"))
or
((?-g)(?i)(src="cid:(.)*\$(.)*@(.)*"))

your expression will work.

Unfortunately this means you may have to add the (?-g) modifier in all your RegEx. We're looking into what side-effects we'd have if we were to disable greedy mode by default in SpamFilter...
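
To illustrate the greedy vs. non-greedy difference in general terms, here is a small sketch in Python. Python's re module has no (?-g) modifier, so the non-greedy form is written with lazy quantifiers (*?) instead. The sample tag and the example.com address are made up, and .* stands in for (.)* to keep it short.

```python
import re

# Hypothetical inline-image tag; not taken from a real message.
SAMPLE = 'src="cid:part1$abc@example.com" alt="x"'

# Greedy: each .* takes as much as it can, then backs off until the rest fits.
GREEDY = re.compile(r'src="cid:.*\$.*@.*"', re.IGNORECASE)
# Lazy: each .*? takes as little as it can, growing only when it has to.
LAZY = re.compile(r'src="cid:.*?\$.*?@.*?"', re.IGNORECASE)


def matched_text(pattern, text):
    """Return the substring the pattern matched, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print("greedy:", matched_text(GREEDY, SAMPLE))
    print("lazy:  ", matched_text(LAZY, SAMPLE))
```

The greedy form runs on to the last quote in the text, while the lazy form stops at the first closing quote. Both find the keyword, but on a long message the greedy form has much more text to back off over, which may be why switching it off avoids the overflow here.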
Roberto Franceschetti

LogSat Software

Spam Filter ISP

mikek (Senior Member; joined 22 February 2005; location: Switzerland)
Posted: 12 March 2007 at 6:52am
Hi Roberto

turning off "greedy" mode worked!

personally, i would not change the default behaviour, but maybe update the documentation to state that greedy mode is on by default (as it is with most regex implementations) and mention the -g parameter.

it would also be nice if an exception caused by a regex were logged...
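
Something along these lines, as a rough sketch in Python rather than anything SpamFilter actually does: wrap the keyword match so that a failure inside the regex engine is logged instead of silently counting as "no match". The function name safe_search and the logger name are hypothetical.

```python
import logging
import re

log = logging.getLogger("keyword_filter")  # hypothetical logger name


def safe_search(pattern, text):
    """Return True or False for match / no match, or None if the regex failed."""
    try:
        return re.search(pattern, text) is not None
    except (re.error, RecursionError, MemoryError) as exc:
        # Record which keyword failed so the miss is visible in the logs.
        log.error("RegEx keyword %r failed: %s", pattern, exc)
        return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(safe_search(r"src=cid:", "img src=cid:123"))  # valid pattern
    print(safe_search(r"(unclosed", "some text"))        # invalid pattern
```

Returning None separately from False keeps "the keyword did not match" apart from "the keyword could not be evaluated".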

Cheers,

Mike

sgeorge (Senior Member; joined 23 August 2005)
Posted: 12 March 2007 at 10:46am
...Nice detective work.  Thanks, you two!

Stephen