Spam Filter ISP Support Forum


RegEx, Line breaks, and Case Sensitive Keywords

DigitalMan (Guest Group)
Posted: 29 July 2003 at 7:09pm

I've been reading over the Regular_Expressions.htm file that installs with SpamFilter and have been trying to figure out how to make my keywords do two things, but I keep failing miserably (due to being a novice and not great with programming).  Any help would be grand.  I think I just don't know how to construct regular expressions at all.

1) I'd like to make a keyword string case-insensitive.  Currently, mixed case is getting through.  For example, if I have "human growth hormone" as a keyword string and "Human Growth Hormone" is a string in the email, it goes through because the cases don't match.

2) Similarly, if a string has a line break in it, it too is getting through.  For example:

human
growth
hormone

is getting through because the message has hard-coded line breaks.
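A minimal sketch of both fixes, assuming a Perl-style engine such as Python's `re` module (SpamFilter's own engine may behave differently): `\s+` between the words matches any run of spaces, tabs, or line breaks, and the `re.IGNORECASE` flag ignores case. The names `HGH_PATTERN` and `matches_hgh` are hypothetical.

```python
import re

# \s+ matches one or more whitespace characters, including \r and \n,
# so words separated by hard-coded line breaks still match.
# re.IGNORECASE lets "Human Growth Hormone" match the lowercase pattern.
HGH_PATTERN = re.compile(r"human\s+growth\s+hormone", re.IGNORECASE)


def matches_hgh(text: str) -> bool:
    """Return True if the phrase appears anywhere in text."""
    return HGH_PATTERN.search(text) is not None


print(matches_hgh("Try our Human Growth Hormone today"))  # True
print(matches_hgh("human\r\ngrowth\r\nhormone"))          # True
print(matches_hgh("human growth factor"))                 # False
```

Note that `\s+` only tolerates whitespace between the words; a word split by an HTML comment would still slip through.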

Before you all flame me, I did read the Regular Expressions file several times and spent a couple of hours trying to do these otherwise simple operations.  Apparently I just don't get it, so I beg your collective forgiveness and earnestly request whatever help you can give.

Thanks,
--DM

Desperado (Senior Member)
Joined: 27 January 2005
Location: United States
Posted: 29 July 2003 at 7:55pm

DM,

First, my view is just that ... my view.  You may want to read through all the recent posts on RegEx for some ideas, but if the RegEx has a "literal" word in all lower case, it will detect ANY case in the scanned message.  My experience says that this behavior is a function of the specific RegEx "engine."  In this case (and I do not know this for a fact), it acts so much like Delphi's engine that it must be modeled after that compiler's interpreter.

Now that I have possibly made a fool of myself ... my view is that you do not want to look for specific words, but rather for the techniques that spammers use to obscure the text itself.  If you look at some of my more recent posts, you will see that this is what I am trying to do ... and for the most part, it does a good job.

If you take a look at the actual source of a message, NOT the "rendered" version that you see in your mail client, you will see that most spam is riddled with strange HTML comments, % signs, and all sorts of crap.  That's what you want to build a filter to find.
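A minimal sketch of that idea in Python's `re` (an assumption; SpamFilter's rule syntax may differ): rather than matching a keyword, it counts HTML comments wedged directly between two word characters, such as `hu<!-- x7q -->man`, in the raw message source. The names `SPLIT_WORD_COMMENT` and `count_split_word_comments` are hypothetical, and any threshold you apply to the count is your own choice.

```python
import re

# An HTML comment with a word character immediately on each side,
# e.g. "gro<!-- abc -->wth". The tempered dot (?:(?!-->).)* keeps each
# match inside a single comment, and re.DOTALL lets it span line breaks.
SPLIT_WORD_COMMENT = re.compile(r"(?<=\w)<!--(?:(?!-->).)*-->(?=\w)", re.DOTALL)


def count_split_word_comments(raw_source: str) -> int:
    """Count comments that split a word in the raw (unrendered) source."""
    return len(SPLIT_WORD_COMMENT.findall(raw_source))


print(count_split_word_comments("hu<!-- x7q -->man gro<!--\n-->wth"))  # 2
print(count_split_word_comments("<!-- layout --> Hello there"))         # 0
```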

Again ... my opinion only.  Now everyone can focus on shooting me down ... rather than you!

Dan S.
DigitalMan (Guest Group)
Posted: 30 July 2003 at 1:33pm

Dan et al.,

Thanks for your reply.  I'm definitely going to start implementing some of the more advanced techniques posted here on the site.  I've put one filter in place that targets the eleven-character comment tags, but some crap still gets through.

I'd still like to know how to make keywords case-insensitive, though: a lot of the spam that reaches my inbox has certain keywords in the subject line, and filtering those case-insensitively could reduce spam further.

Thanks again,
--DM

Desperado (Senior Member)
Posted: 30 July 2003 at 8:50pm

DM,

Actually, I did answer it ... if you use all lower case in your RegEx (except for special characters that require capitals), then the match will work for BOTH upper and lower case.  Example:

(<html>) will match <html>, <Html>, <HtMl>, etc.
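For comparison, in Perl-style engines such as Python's `re` module, matching is case-sensitive by default, so the lowercase-matches-any-case behavior described above appears to be specific to SpamFilter's engine. In those engines the usual fix is the inline `(?i)` modifier at the start of the pattern; whether SpamFilter accepts `(?i)` is not confirmed here. A minimal sketch, assuming Python:

```python
import re

samples = ["<html>", "<Html>", "<HtMl>"]
pattern = r"(<html>)"

# Default: case-sensitive, so only the exact lowercase tag matches.
print([bool(re.search(pattern, s)) for s in samples])            # [True, False, False]

# The inline (?i) modifier turns on case-insensitive matching for the whole pattern.
print([bool(re.search(r"(?i)" + pattern, s)) for s in samples])  # [True, True, True]
```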

Dan S.
