
"RegEx for fun & profit"

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: http://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1355
Printed Date: 18 October 2017 at 9:08am


Topic: "RegEx for fun & profit"
Posted By: Desperado
Subject: "RegEx for fun & profit"
Date Posted: 11 July 2003 at 8:45am
 
All,
 
I thought I would throw out a few ideas to show SpamFilter ISP users the potential power of regular expressions (RegEx).  Perhaps this will help when you are trying to nail some of the more ingenious spam techniques without "throwing out the baby with the bath water".
 
Let me preface this with a disclaimer.  I am no expert with regular expressions, but having a fair amount of experience with Perl, I have been forced to learn and use them over the years.  Each software package has its own engine to interpret the expressions, so you always have to play with them to get them right.  I make no claims whatsoever about the accuracy of the information below.  DO NOT USE the expressions I have here as they stand ... use them only as a starting point.  I should also state that I am in no way affiliated with LogSat and, as such, LogSat cannot take any responsibility for any of my stupid mistakes!
 
I feel that anything that knocks out a few spams here and a few there eventually adds up.  It is important, though, to make sure that every filter is actually doing something useful, because the longer your blacklists are, the harder the software has to work.  I do a log parse run each day to see if my filters are effective, and I take out anything that is not helping.
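 
The daily check can be sketched roughly as follows. This is only an illustration in Python: it assumes the log has already been reduced to a plain list of sender addresses, and the names `count_hits` and `unused` are made up for this example, not part of SpamFilter ISP.

```python
import re

def count_hits(patterns, senders):
    """Count how many sender addresses each named pattern matches."""
    hits = {name: 0 for name in patterns}
    for sender in senders:
        for name, rx in patterns.items():
            if re.search(rx, sender):
                hits[name] += 1
    return hits

def unused(hits):
    """Return the names of patterns that matched nothing (candidates for removal)."""
    return sorted(name for name, n in hits.items() if n == 0)

# Made-up sample data; a real run would read addresses out of the day's log.
patterns = {
    "digit-hotmail": r"^[0-9][^@]*@hotmail\.com$",
    "test-at-test": r"^test[^@]*@test\.com$",
}
senders = ["4u@hotmail.com", "9lives@hotmail.com", "jane@example.com"]
hits = count_hits(patterns, senders)
print(hits, unused(hits))
```

Anything that shows up in `unused` day after day is just making the software work harder for nothing.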
 
OK ... One thing I did was come up with a "standard" expression that describes a generic email address construct:
 
(([\-a-zA-Z0-9_\.\+])+@([\-a-zA-Z0-9_\.\+]+\.)+[a-z]{2,6})
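 
To get a feel for what this expression accepts, here is a quick sketch that runs it through Python's `re` module. Python's engine is not the one SpamFilter ISP uses, so treat the result as a rough guide only; the sample text and the `find_addresses` helper are made up for the example.

```python
import re

# The generic address expression, copied unchanged.
EMAIL = re.compile(r"(([\-a-zA-Z0-9_\.\+])+@([\-a-zA-Z0-9_\.\+]+\.)+[a-z]{2,6})")

def find_addresses(text):
    """Return every substring of text that the expression treats as an address."""
    return [m.group(1) for m in EMAIL.finditer(text)]

sample = "From: bob.smith+news@mail.example.com, reply to 123@aol.com"
print(find_addresses(sample))
```

Note that the final `[a-z]{2,6}` only accepts a lowercase top-level domain, so input may need lowercasing first unless the engine matches case-insensitively.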
 
Once you have this, you should be able to use the format to kill off "bad" addresses.  As an example, Hotmail has announced that any address starting with a digit is not valid.  Therefore, I can construct an expression such as:
 
(\b[\d+]+([\-a-zA-Z0-9_\.\+])+@hotmail\.com)
 
to detect and block it.  WARNING:  I believe that if there is one bad address in the "To" field, the entire message gets blocked, so this should only be used in the "From" field.
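 
A small test in Python shows what the Hotmail expression catches and one thing to watch out for. Again, this is Python's engine, not SpamFilter ISP's, and the addresses and the `is_blocked_sender` helper are invented for the example.

```python
import re

# The Hotmail expression, copied unchanged.
HOTMAIL_DIGIT = re.compile(r"(\b[\d+]+([\-a-zA-Z0-9_\.\+])+@hotmail\.com)")

def is_blocked_sender(from_address):
    """True if the expression matches anywhere in the given From address."""
    return HOTMAIL_DIGIT.search(from_address) is not None

print(is_blocked_sender("4ever_deals@hotmail.com"))  # localpart starts with a digit
print(is_blocked_sender("deals4ever@hotmail.com"))   # digit in the middle of a word
print(is_blocked_sender("bob.4ever@hotmail.com"))    # digit right after a dot
```

In Python the third address also matches, because `\b` sees a word boundary between the dot and the digit and the expression is not anchored to the start of the address. If that is not what you want, anchoring with `^` (where the engine supports it) is one option to try.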
 
Here is a list I have come up with that describes some known "bad" email constructs:
 
  • numeric-only localparts at aol.com, msn.com, bellsouth.net, brandeis.edu
  • localparts starting with a digit at juno.com and hotmail.com
  • localparts longer than 16 characters at aol.com, hotbot.com or canada.com
  • localparts with an underscore, longer than 16 characters and with at least 1 digit @(hotbot|juno|rocketmail|excite|hotmail|mail).com
  • test*@test.com
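 
One way to read the list above is as one small expression per rule. The sketch below is an interpretation in Python, not the filters as actually deployed: the rule names, the `matching_rules` helper, the reading of `test*@test.com` as "localpart beginning with test", and the use of lookaheads (which SpamFilter ISP's engine may or may not support) are all assumptions.

```python
import re

# One hypothetical expression per rule in the list, applied to a bare address.
RULES = {
    "numeric-only localpart":
        re.compile(r"^[0-9]+@(aol\.com|msn\.com|bellsouth\.net|brandeis\.edu)$"),
    "leading digit":
        re.compile(r"^[0-9][^@]*@(juno|hotmail)\.com$"),
    "localpart over 16 chars":
        re.compile(r"^[^@]{17,}@(aol|hotbot|canada)\.com$"),
    "long localpart with _ and digit":
        re.compile(r"^(?=[^@]*_)(?=[^@]*[0-9])[^@]{17,}"
                   r"@(hotbot|juno|rocketmail|excite|hotmail|mail)\.com$"),
    "test address":
        re.compile(r"^test[^@]*@test\.com$"),
}

def matching_rules(address):
    """Return the names of all rules that the (lowercased) address trips."""
    addr = address.lower()
    return [name for name, rx in RULES.items() if rx.search(addr)]

for addr in ["12345@aol.com", "7days@hotmail.com",
             "cheap_printer_ink_4u@hotmail.com", "jane.doe@example.com"]:
    print(addr, matching_rules(addr))
```

Keeping the rules separate like this also makes it easy to see which one fired, which helps when weeding out filters that never match.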
 
For a good laugh, this is the regular expression that I used on my Sendmail server to attempt to slow the flood down.  I AM NOT RECOMMENDING THIS!  This EXACT RegEx does, in fact, work with ActiveState Perl!
 
 ^(mailer\-daemon[0-9]+.*<@.*|.*([0-9].*prsesly|discounts|software[0-9])<@yahoo\.com|.*(saveonink|printsupplies|inkjet|toner_).*<@.*|subscriber_services[0-9]+<@.*|test.*<@test.*\.com|[0-9]+<@(aol\.com|msn\.com|bellsouth\.net|brandeis\.edu)|[0-9][^<]*<@(hotmail|juno)\.com|.{16}[^<]+<@(canada|aol|hotbot)\.com|.{10}.*_.{2}.*[0-9].{2}.*<@(hotmail|juno|rocketmail|hotbot|excite|yahoo|msn|mail)\.com|.*free4you<@.*|.*_...._._._.<@.*brandeis\.edu|INVESTMENT_ALERT-.*|xtrafreeporn.*|Nasdaq_Newsdesk.*|ListsOnSale.*|InvestorInsights__.*|subscriptionssavings_.*|MarketingLists.*[0-9].*<@.*)\.?>
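 
The `<@` and the trailing `\.?>` suggest the expression was matched against an address written in a form like `user<@host.>` rather than plain `user@host`; that reading is an assumption. Under that assumption, the sketch below takes four of the alternatives, joins them the same way, and tries them in Python. The `to_focus_form` and `would_reject` helpers are hypothetical and only exist to feed the expression input in that shape.

```python
import re

# Four alternatives copied from the long expression, joined the same way.
SUBSET = re.compile(
    r"^("
    r"test.*<@test.*\.com"
    r"|[0-9]+<@(aol\.com|msn\.com|bellsouth\.net|brandeis\.edu)"
    r"|[0-9][^<]*<@(hotmail|juno)\.com"
    r"|.{16}[^<]+<@(canada|aol|hotbot)\.com"
    r")\.?>"
)

def to_focus_form(address):
    """Hypothetical helper: rewrite user@host as user<@host.>"""
    local, _, host = address.rpartition("@")
    return f"{local}<@{host}.>"

def would_reject(address):
    """True if the trimmed expression matches the rewritten address."""
    return SUBSET.search(to_focus_form(address.lower())) is not None

for addr in ["12345@aol.com", "7days@hotmail.com", "jane@aol.com"]:
    print(addr, would_reject(addr))
```

Trying a handful of known-good and known-bad addresses like this before adding an alternative to the big expression is a cheap way to catch surprises.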
 
Wasn't that fun?
 
Dan S.
 


