Filtering l33t speak
Would anyone know what regex I could use to match any word that is a combination of letters and numbers, or letters and special characters, used to spell out words? That is, something similar to l33t speak (though not entirely l33t speak). For example, when $ is substituted for S, 3 for E, 1 for I or L, 0 for O, etc.
June 30th, 2015 10:03am

Best place is Google for this:

Use this to help test your RegEx - http://www.regexr.com/

The last link is especially helpful for testing before something goes into production.  I've used it for creating custom DLP Templates or even for querying text with PowerShell.  Let me know if you need any further information.

July 1st, 2015 11:45am

I've changed this post from what it originally said because I've moved on from that point. Basically, I originally thought Exchange regular expressions were different and that an online regex tester therefore wouldn't work. Anyway, just for the heck of it, I tried something on the test site you suggested and then used it in a test rule. The rule seemed to work, so I did a few more, and Exchange does seem to behave the same.

Now that I know Exchange works the same way, and that what is listed in 'Regular Expressions in Transport Rules' is not the complete list, I can create better patterns.

I think creating one or two regexes to find all l33t speak would be too complicated, or beyond me, so I've decided to create a few small ones to match specific patterns. I think I have an idea how to do that, but I still have issues with the logic. For example:

\$.*\D

I read that as: it should have a $ followed by zero or more of any character and then a non-digit character. That doesn't seem to be the case, though, as it matches $ on its own. I think I have a problem with what some of the syntax actually means, or with its range.
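
One way to see what this pattern really accepts is to run it against a handful of strings. The following is only a minimal sketch in Python, assuming Python's `re` module treats `\$`, `.*` and `\D` the same way as the engine behind the transport rule for these simple tokens (that has not been verified here). The sample strings are made up for illustration.

```python
import re

# The pattern from the post: a literal $, then any characters (greedy),
# then exactly one character that is not a digit.
pattern = re.compile(r"\$.*\D")

# Hypothetical sample strings, chosen only for illustration.
samples = ["$abc", "$10", "$10 ", "$ ", "$"]

# Map each sample to the text that matched, or None if nothing matched.
results = {}
for text in samples:
    match = pattern.search(text)
    results[text] = match.group(0) if match else None
    print(repr(text), "->", repr(results[text]))
```

In this sketch `$10` and a bare `$` at the very end of the input do not match, but `$10 ` and `$ ` (each followed by a space) do, because `\D` means "any character that is not a digit", and a space qualifies. In running text a `$` is almost always followed by a space or punctuation, which may be why it looks as though `$` matches on its own.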


July 2nd, 2015 5:16am

I find that when helping someone with RegEx, it's easier if I know the pattern you are building the query for. Is there a specific word or pattern you are looking for? Most queries I've worked with are for bank numbers and SSNs, and there are plenty of code samples out there for those.

Take a look at some RegEx tutorial sites that give real examples and that might help:

If those don't clear things up then just give me an example of what you want the RegEx to match and I will see what I can do to help.

July 2nd, 2015 8:32am


Thanks for that, it's certainly helpful. The more definitions and explanations I get the better, as the issue I was having had more to do with the definitions, or my understanding of them. In the example I gave above, I misread the definition of \D and didn't realise that \D includes whitespace, which is why I was getting unexpected matches. To answer your question about what I was trying to match: I was trying to find any words that contain $ and letters. At least, I thought my example would find any word that begins with $, or has $ in the middle, and has a letter within the word or at the end. That is, it should find $abc, $4bc, $a4bc etc., but not $10 or $ on its own.

I think I'm getting my head around it slowly.

 
July 3rd, 2015 3:25am

Here's what I've come up with thus far:

\$[^\,\.\s]*[a-zA-Z]|[a-zA-Z][^\,\.\s]*\$

It seems to be working as expected, i.e. it finds any word that substitutes $ for s but does not match a valid use of $ ($10, for example).
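
For anyone wanting to check the pattern outside of Exchange, here is a rough sketch in Python that runs it against the examples mentioned earlier in the thread plus a few extra strings. It assumes Python's `re` module behaves like the transport rule engine for this pattern, which has not been confirmed. The extra test strings and the helper name `contains_dollar_word` are made up for illustration.

```python
import re

# The combined pattern from the post, copied unchanged.
# First alternative: $ followed by non-separator characters ending in a letter.
# Second alternative: a letter, then non-separator characters, then $.
leet_dollar = re.compile(r"\$[^\,\.\s]*[a-zA-Z]|[a-zA-Z][^\,\.\s]*\$")


def contains_dollar_word(text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return leet_dollar.search(text) is not None


# Examples from the thread, plus a few illustrative extras.
should_match = ["$abc", "$4bc", "$a4bc", "pa$$word", "ca$h"]
should_not_match = ["$10", "$", "costs $10 today"]

for text in should_match + should_not_match:
    print(repr(text), "->", contains_dollar_word(text))
```

One thing this sketch brings out: because the pattern only asks for a letter somewhere next to the `$` within the same word, strings such as `US$10` or `$10k` also match, so a rule built on it may flag some legitimate currency amounts.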

July 3rd, 2015 7:23am


This topic is archived. No further replies will be accepted.
