Spam Filter ISP Support Forum


Help with multiple RegEx search strings and using

CyberBob
Groupie
Joined: 26 January 2005
Posted: 27 December 2004 at 5:15pm

OK, let me set up a simple example to ask my question. If you want to block "bob", you put that in your keyword list.

Now if I wanted to block "bob" and "b o b" I put this line in my keywords:

(b\s*o\s*b) where \s matches a whitespace character and * allows zero or more of them between letters

I've tested this and it works well but you have to be very careful using it.
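
A quick way to check a rule like this outside the filter is Python's `re` module. This is only a sketch: it assumes the rule is `b\s*o\s*b` and that SpamFilter's engine treats `\s` and `*` the same way Python does. The helper name `blocks` is made up for illustration.

```python
import re

# Assumed form of the keyword rule: b, optional whitespace, o, optional whitespace, b.
PATTERN = re.compile(r"b\s*o\s*b")

def blocks(text: str) -> bool:
    """Return True if the pattern is found anywhere in text (hypothetical helper)."""
    return PATTERN.search(text) is not None

for sample in ["bob", "b o b", "b  o   b", "b-o-b"]:
    print(sample, "->", blocks(sample))
```

Because `*` allows zero spaces, the plain word "bob" is caught as well as the spaced-out forms, while "b-o-b" is not.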

Now I wanted to take it one step further, using "|" to stop things like b-o-b, so I tried:

(b(\s|-*)o(\s|-*)b) and it worked great, BUT it's not only going to block "b o b", it will block "bob" as well. So I tried to use a comma to add a keyword to the front or back of this, and I cannot get it to work, e.g.:

(b(\s|-*)o(\s|-*)b),test

So it SHOULD block "bob test" or "b o b test"

But it's not working.

Am I using the proper syntax or the correct number of brackets or what?

I lost my mind this holiday season so any help would be appreciated.

Thanks,

Bob

Desperado
Senior Member
Joined: 27 January 2005
Location: United States
Posted: 28 December 2004 at 11:30am

Hmmm ... I am slightly confused.  EXACTLY what do you want blocked and *not* want blocked?

As an example: ((?i)((b.o.b)))   Will block B O B and b-o b etc but not Bob.

Is this the sort of thing you are looking for?

WARNING ... this will also block baoab (not that that is a word!)
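
The behaviour described here can be reproduced with Python's `re` module, as a rough sketch. It assumes SpamFilter's engine treats `.` and `(?i)` the way Python does. The inline flag is moved to the front of the pattern because recent Python versions reject a global flag in the middle of an expression.

```python
import re

# (?i) makes the match case-insensitive; each "." matches any single character.
PATTERN = re.compile(r"(?i)b.o.b")

samples = ["B O B", "b-o b", "Bob", "baoab"]
results = {s: bool(PATTERN.search(s)) for s in samples}

for s, hit in results.items():
    print(f"{s!r}: {hit}")
```

Of these samples only "Bob" is not matched, because each `.` must consume exactly one character.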

Dan S.

CyberBob
Posted: 29 December 2004 at 11:07am

Sorry for my rambling. I think I'm getting close to an answer, but here's another way to say it.

Say we are going to block "online drugs" we add:

online,drugs

to the keyword list but we also need the variations using cheap drugs and online pharmacy so it now would read

cheap|online,drugs|pharmacy or we could use brackets

(cheap|online),(drugs|pharmacy) now if I wanted to look for upper/lower case

((?i)(cheap|online),(drugs|pharmacy))

At this point everything is OK, but if we take it to the next level and block anyone putting a space or hyphen between letters, then I start getting string errors. I'll just use one word as an example so this doesn't get too long:

((?i)(c(\s|-*)h(\s|-*)e(\s|-*)a(\s|-*)p|online),(drugs|pharmacy))

When I do this I start getting string errors in my log, and I'm pretty sure I'm not using the proper number of brackets in the correct places.

Once I start adding (\s|-*) between the letters of "cheap", do I need another set of brackets to enclose the word and then another set to enclose "cheap|online"?

So, back to my original post: I have found this to work well between letters

(\s|-*) since it matches a single whitespace character or zero or more hyphens, but you have to be careful with it. If you put this in your keywords by itself

c(\s|-*)h(\s|-*)e(\s|-*)a(\s|-*)p

it will not only block any variation of "c h e A p" or "c-h-e-a-p" but it will also block the plain word "cheap" and now you have people upset with you.
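
The reason the plain word gets caught is that `-*` can match nothing at all, so every gap is allowed to be empty. The sketch below shows this with Python's `re` module, assuming the rule uses `\s` for whitespace. The `STRICT` pattern is an illustrative alternative that does not come from this thread: it demands at least one space or hyphen in every gap.

```python
import re

# Rule as described in the post: each gap is one whitespace character
# OR zero-or-more hyphens, so an empty gap is allowed.
LOOSE = re.compile(r"(?i)c(\s|-*)h(\s|-*)e(\s|-*)a(\s|-*)p")

# Illustrative alternative (an assumption, not from the thread): require
# one or more spaces/hyphens in every gap, so the plain word cannot match.
STRICT = re.compile(r"(?i)c[-\s]+h[-\s]+e[-\s]+a[-\s]+p")

def hit(pattern: re.Pattern, text: str) -> bool:
    """Return True if pattern matches anywhere in text."""
    return pattern.search(text) is not None

for sample in ["cheap", "c h e A p", "c-h-e-a-p"]:
    print(f"{sample!r}: loose={hit(LOOSE, sample)} strict={hit(STRICT, sample)}")
```

The trade-off is that the strict form misses partially obfuscated text such as "c heap", where only some of the gaps are filled.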

So I tried to use this rule with "|" and "," to put more keywords together and that's where my troubles started.

Sorry for the long post but I hope this makes more sense.

Thanks,

Bob

Desperado
Posted: 29 December 2004 at 11:31am

First, what's with the commas?  I do not think you want any commas in your RegExes.

Let's start there and I will look at your expressions early this afternoon.

Dan

 

CyberBob
Posted: 29 December 2004 at 11:44am

Now we are on the same wavelength, and I wonder if this is the problem. You use commas to separate keywords you want the filter to find anywhere in the email body. This is a function of SpamFilter coding, not RegEx.

So the million dollar question :-) that I'm trying to get to:

Can you use two different RegEx rules on the same line AND separate them by commas so the rule will find two keywords in an email?

Does this help?

Desperado
Posted: 29 December 2004 at 12:14pm

I have been wrong before, and I hope Roberto will correct me if I misstate this, but ... *do not* use commas.  They will be treated as literal commas and will not have the effect you think.  Also, the SpamFilter implementation of RegEx cannot "look ahead", so the "AND" function is not valid.  What this means is that you can find 2 words in a message using a variety of ways to separate them, but *only* in the order they are presented in the expression.
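
As a rough illustration of "two words, in a fixed order", here is a sketch using Python's `re` module. The words, the 40-character window and the helper name `hits` are all assumptions chosen for the example, and SpamFilter's engine may differ from Python's in details.

```python
import re

# "cheap" followed by "drugs" with at most 40 characters in between.
# The 40-character window is an arbitrary illustrative limit.
ORDERED = re.compile(r"(?i)cheap.{0,40}drugs")

# Without lookahead, covering both orders means spelling out each order.
EITHER_ORDER = re.compile(r"(?i)(cheap.{0,40}drugs|drugs.{0,40}cheap)")

def hits(pattern: re.Pattern, text: str) -> bool:
    """Return True if pattern matches anywhere in text (hypothetical helper)."""
    return pattern.search(text) is not None

for sample in ["Cheap online drugs here", "drugs, really cheap"]:
    print(f"{sample!r}: ordered={hits(ORDERED, sample)} either={hits(EITHER_ORDER, sample)}")
```

The first pattern only fires when the words appear in the written order. The second fires for either order, at the cost of a longer expression.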

An additional issue is, if the expression is written to look for 2 words separated by too many characters, the expression may fail with a "Loop Stack Exceeded" error.  Search your logs for "String Match" and you will locate any RegEx's that are failing.  Often, however, a RegEx works in some cases but causes a "Loop Stack Exceeded" error in others.   In fact, I just spent many hours "adjusting" 2 expressions that were causing that error way too often.  I feel that it is probably a good idea to limit the "Search Scope" of an expression to avoid those errors as with the expression example below:

((?i)Subject:(([\s]|[\!-\xB4]){0,10}[\|]){2})

I would prefer to *not* use the "{0,10}" clause, but if I don't, I get many failures.

BTW, if anyone has a less complex way of doing the above RELIABLY, please chime in.  This expression very simply catches any instance of 2 or more "Pipe" characters anywhere in the subject even if the pipes are separated by any of the first 115 ASCII characters.
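
For anyone who wants to experiment with the expression above, here is a sketch in Python's `re` module. It assumes Python reads the character classes the same way SpamFilter does, and the `(?i)` flag is moved to the front because recent Python versions require that. The sample subject lines are invented.

```python
import re

# The expression from the post, with the inline flag moved to the start.
PIPES = re.compile(r"(?i)Subject:(([\s]|[\!-\xB4]){0,10}[\|]){2}")

near = "Subject: buy | now | today"                   # two pipes, short gaps
one_pipe = "Subject: only one | here"                 # a single pipe
far = "Subject: " + "x" * 20 + " | and | more"        # first pipe beyond the window

for subject in (near, one_pipe, far):
    print(f"{subject!r}: {bool(PIPES.search(subject))}")
```

Only the first sample matches. The third fails because `{0,10}` lets each pipe sit at most 10 characters after the previous anchor point, which is the scope limit described above.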

Dan S.

CyberBob
Posted: 31 December 2004 at 11:36am

Roberto,

Can you give us some input here? I think Dan is saying the same thing as me, but please clarify:

Is it OK to use commas when separating keywords EXCEPT when you use RegEx on both sides of the comma?

This is one of my top rules and obviously it's working well:

((?i)cheap|online,(d(\^|\s|-|\.*)r(\^|\s|-|\.*)u(\^|\s|-|\.*)g(\^|\s|-|\.*)s))

This rule first finds cheap or online THEN looks for any variation of the word "drug"

d r u g

d.r.u.g

d-r-u-g

or any variation

d.r u-g
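
If the whole line is evaluated as one regular expression, as Dan describes earlier in the thread, the top-level `|` splits it into two alternatives: `cheap` on its own, or `online,` (with a literal comma) followed by the obfuscated "drugs". The sketch below checks that reading with Python's `re` module. Whether SpamFilter pre-processes commas before the RegEx engine sees the line is not something this sketch can show, and the sample strings are invented.

```python
import re

# The rule from the post, with the inline flag moved to the front for Python.
SEP = r"(\^|\s|-|\.*)"
RULE = re.compile(
    r"(?i)(cheap|online,(d" + SEP + "r" + SEP + "u" + SEP + "g" + SEP + "s))"
)

def hits(text: str) -> bool:
    """Return True if the rule matches anywhere in text (hypothetical helper)."""
    return RULE.search(text) is not None

for sample in ["cheap flights", "online d.r u-g.s", "online,d-r-u-g-s"]:
    print(f"{sample!r}: {hits(sample)}")
```

Under this reading, any message containing "cheap" matches regardless of what follows, which may be why the rule ranks so high.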

But as soon as I try to add any RegEx to the "cheap or online" the rule stops working?

I'm still doing some testing and I think this will work but I'm not getting my escape brackets in the correct spot or enough/too many of them.

I'll post more as I find more but Roberto may have some better insight for us?

LogSat
Admin Group
Joined: 25 January 2005
Location: United States
Posted: 02 January 2005 at 11:14pm

Dan,

That was a good call: the comma in a RegEx *will* be interpreted as a literal; it cannot be used to "and" two expressions. RegEx is as powerful as it is complicated, so we did that by design, since with RegEx it should usually be possible to construct a single expression that does the work of "and-ing" two or more expressions.

Roberto F. LogSat Software

LogSat
Posted: 02 January 2005 at 11:16pm

Bob,

Dan is correct. Please see http://www.logsat.com/spamfilter/forums/showmessage.asp?messageID=4921 for a confirmation.

Roberto F. LogSat Software

CyberBob
Posted: 03 January 2005 at 11:56am

OK guys, thanks for the feedback, but through this process I've figured out how to make it work. I have multiple RegEx rules separated by commas, including some plain words, and it appears to be working; it is becoming one of the top rules in my "Top Keywords" report.

Let me monitor this for the next week or so and I'll report back as it appears to be a very powerful way to use RegEx multiple times in complicated rules.

Bob

CyberBob
Posted: 03 January 2005 at 5:56pm

Dan or Roberto,

What RegEx would you use to replace a comma in the fashion I'm trying to use it to "AND" two "expressions" or keywords together in an email?

I'd love to use (\s*) between each letter of mortgage but using the * will also block the actual word "mortgage."

So I need to combine a couple keywords like

approved,((m(\s*)o(\s*)r(\s*)t(\s*)g(\s*)a(\s*)g(\s*)e))

What would you use to replace the comma?

Thanks in advance,

Bob
