Common Regular Expressions

I’ve gotten better and more comfortable with regular expressions over time, but I still sometimes spend time wading through Google for a common pattern I want to put into use, because I’m sure someone has already created it. Well, that isn’t always true (or the pattern isn’t easy to find), so I decided to collect some common regular expressions that may benefit readers. It’s a good idea to keep two things handy if you want to play with regular expressions yourself:

Common Regular Expressions:

Date and Time RegEx

Time format (12-hour, no seconds, no leading zero on the hour):
H:MM am/pm

^([1-9]|1[012]):(0[0-9]|[1-5][0-9])\s?(am|AM|pm|PM)$
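A quick way to see what this pattern accepts is to run it over a few samples. The sketch below uses Python's re module, which is an assumption (the post itself targets PHP's preg functions), and `is_valid_time` is a hypothetical helper name.

```python
import re

# The 12-hour time pattern, compiled once for reuse.
TIME_12H = re.compile(r"^([1-9]|1[012]):(0[0-9]|[1-5][0-9])\s?(am|AM|pm|PM)$")

def is_valid_time(text: str) -> bool:
    """Return True if text looks like a 12-hour time such as '9:05 am'."""
    return TIME_12H.match(text) is not None

for sample in ["9:05 am", "12:59PM", "13:00 pm", "09:05 am", "9:60 am"]:
    print(sample, is_valid_time(sample))
```

Only the first two samples pass: hour 13 and minute 60 are out of range, and the hour may not carry a leading zero.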

Date in mm/dd/yyyy format, with an option for m/d/yyyy (leading zeros omitted)

^(0?[1-9]|1[012])[ \/.-](0?[1-9]|[12][0-9]|3[01])[ \/.-](19|20)\d\d$

Date in dd/mm/yyyy format, with an option for d/m/yyyy (leading zeros omitted)

^(0?[1-9]|[12][0-9]|3[01])[ \/.-](0?[1-9]|1[012])[ \/.-](19|20)\d\d$
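The date patterns check only the shape of the input, so a string like 02/31/2020 still matches. One way to close that gap is to pair the pattern with a calendar check. The sketch below does this in Python (an assumption; the post targets PHP) for the mm/dd/yyyy pattern. `parse_mdy` is a hypothetical helper, and the last group is widened to capture the full four-digit year.

```python
import re
from datetime import date

# mm/dd/yyyy pattern; the year group is widened to capture all four digits.
MDY = re.compile(
    r"^(0?[1-9]|1[012])[ \/.-](0?[1-9]|[12][0-9]|3[01])[ \/.-]((?:19|20)\d\d)$"
)

def parse_mdy(text):
    """Return a date for mm/dd/yyyy input, or None if the format
    or the calendar date is invalid."""
    m = MDY.match(text)
    if m is None:
        return None
    month, day, year = (int(g) for g in m.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. February 31
        return None

print(parse_mdy("02/29/2020"))
print(parse_mdy("2/31/2020"))
```

Note that the pattern allows the two separators to differ, so an input such as 12-25.1999 is also accepted.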

Demographics RegEx

Age in years – max 122

^([0-9]|[1-9][0-9]|1[01][0-9]|12[0-2])$
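Range-limited patterns like this are easy to get subtly wrong, so it helps to test every value around the limit. The sketch below, in Python (an assumption; the post targets PHP), uses the form of the pattern with the hundreds digit pinned to 1 and checks it against every integer from 0 to 999. `is_valid_age` is a hypothetical helper name.

```python
import re

# 0-9, 10-99, 100-119, 120-122
AGE = re.compile(r"^([0-9]|[1-9][0-9]|1[01][0-9]|12[0-2])$")

def is_valid_age(text: str) -> bool:
    """Return True if text is a whole number from 0 to 122 with no leading zeros."""
    return AGE.match(text) is not None

# Exhaustive check: the pattern should accept exactly 0 through 122.
accepted = [n for n in range(1000) if is_valid_age(str(n))]
print(accepted == list(range(123)))
```

If the hundreds digit were allowed to be [1-9] instead, values such as 519 or 922 would slip through, which is exactly what the exhaustive loop is meant to catch.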

Height in Feet and Inches:
6'3"

^([1-8]')?\s?([1-9]|1[01])"?$
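To turn a matched height into a number, the feet and inches need their own capture groups. The sketch below, in Python (an assumption; the post targets PHP), uses a variant of the pattern that captures the feet digit on its own and tolerates a trailing inch mark. `to_inches` is a hypothetical helper name.

```python
import re

# Variant of the height pattern: group 1 is feet (optional), group 2 is inches.
HEIGHT = re.compile(r'''^(?:([1-8])')?\s?([1-9]|1[01])"?$''')

def to_inches(text):
    """Return the height in total inches, or None if the text does not match."""
    m = HEIGHT.match(text)
    if m is None:
        return None
    feet = int(m.group(1)) if m.group(1) else 0
    return feet * 12 + int(m.group(2))

print(to_inches('6\'3"'))
print(to_inches("6'12"))
```

Because the inches group stops at 11, an input like 6'12 is rejected rather than silently read as 7 feet.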

Contact Information RegEx

U.S. Phone Number – parentheses, periods, dashes, underscores, and spaces are allowed:
(123)456-7890
(123) 456 - 7890
( 123 )456-7890
1234567890
123.456.7890
123-456-7890
123 456 7890

^[\(\s\._-]*\d{3}[\)\s\._-]*\d{3}[\s\._-]*\d{4}$
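Once a number passes this pattern, it is usually convenient to store it in one canonical form. The sketch below, in Python (an assumption; the post targets PHP), validates with the pattern and then strips everything except the digits. `normalize_phone` is a hypothetical helper name.

```python
import re

PHONE = re.compile(r"^[\(\s\._-]*\d{3}[\)\s\._-]*\d{3}[\s\._-]*\d{4}$")

def normalize_phone(text):
    """Return the ten digits of a U.S. phone number, or None if it does not match."""
    if PHONE.match(text) is None:
        return None
    return re.sub(r"\D", "", text)  # drop parentheses, dots, dashes, spaces

for sample in ["(123)456-7890", "( 123 )456-7890", "123.456.7890", "123-45-67890"]:
    print(sample, normalize_phone(sample))
```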

U.S. Zip Code – 5 or 9 digit with dash

^\d{5}([\-]\d{4}){0,1}$
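As a small usage sketch in Python (an assumption; the post targets PHP), the optional group can be used to split a ZIP+4 code into its two parts. `split_zip` is a hypothetical helper name. The re.ASCII flag is added because, in Python 3, \d otherwise also matches non-ASCII digits.

```python
import re

ZIP = re.compile(r"^\d{5}([\-]\d{4}){0,1}$", re.ASCII)

def split_zip(text):
    """Return (zip5, plus4) where plus4 may be None, or None if text is not a ZIP code."""
    m = ZIP.match(text)
    if m is None:
        return None
    plus4 = m.group(1)[1:] if m.group(1) else None  # strip the leading dash
    return text[:5], plus4

print(split_zip("12345"))
print(split_zip("12345-6789"))
print(split_zip("123456789"))
```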

Email Address – (use preg_match) credit goes to fightingforalostcause.net

/^[-_a-z0-9\'+*$^&%=~!?{}]++(?:\.[-_a-z0-9\'+*$^&%=~!?{}]+)*+@(?:(?![-.])[-a-z0-9.]+(?

Currency RegEx

Currency – U.S. Dollars and Cents with commas for multiples of 1000 and a period for the decimal:
$12,000.23

^\$?([1-9]{1}[0-9]{0,2}(\,[0-9]{3})*(\.[0-9]{0,2})?|[1-9]{1}[0-9]{0,}(\.[0-9]{0,2})?|0(\.[0-9]{0,2})?|(\.[0-9]{1,2})?)$
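A validated amount usually needs to become a number next. The sketch below, in Python (an assumption; the post targets PHP), writes the pattern with no whitespace inside the alternation and converts matches to Decimal. `parse_usd` is a hypothetical helper name. Because the last alternative is entirely optional, the pattern also accepts an empty string or a lone dollar sign, so the helper rejects those explicitly.

```python
import re
from decimal import Decimal

USD = re.compile(
    r"^\$?([1-9]{1}[0-9]{0,2}(\,[0-9]{3})*(\.[0-9]{0,2})?"
    r"|[1-9]{1}[0-9]{0,}(\.[0-9]{0,2})?"
    r"|0(\.[0-9]{0,2})?"
    r"|(\.[0-9]{1,2})?)$"
)

def parse_usd(text):
    """Return the amount as a Decimal, or None if text is not a valid dollar amount."""
    if USD.match(text) is None:
        return None
    digits = text.lstrip("$").replace(",", "")
    if not digits:  # the pattern alone lets "" and "$" through
        return None
    return Decimal(digits)

print(parse_usd("$12,000.23"))
print(parse_usd("1,00"))
```

Decimal is used instead of float so that cents are not subject to binary rounding.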

Currency – British Pounds with commas for multiples of 1000 and a period for the decimal:
£12,000.23

^\u00A3?([1-9]{1}[0-9]{0,2}(\,[0-9]{3})*(\.[0-9]{0,2})?|[1-9]{1}[0-9]{0,}(\.[0-9]{0,2})?|0(\.[0-9]{0,2})?|(\.[0-9]{1,2})?)$

Currency – Euros with periods for multiples of 1000 and a comma for the decimal:
€12.000,23

^\u20AC?([1-9]{1}[0-9]{0,2}(\.[0-9]{3})*(\,[0-9]{0,2})?|[1-9]{1}[0-9]{0,}(\,[0-9]{0,2})?|0(\,[0-9]{0,2})?|(\,[0-9]{1,2})?)$

Currency – Euros, French style, with spaces for multiples of 1000 and a comma for the decimal:
€12 000,23

^\u20AC?([1-9]{1}[0-9]{0,2}(\s[0-9]{3})*(\,[0-9]{0,2})?|[1-9]{1}[0-9]{0,}(\,[0-9]{0,2})?|0(\,[0-9]{0,2})?|(\,[0-9]{1,2})?)$
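The four currency patterns share one template and differ only in the symbol, the thousands separator, and the decimal mark. As a sketch of that idea in Python (an assumption; the post targets PHP), the hypothetical helper `currency_pattern` below builds the regex from those three pieces. It uses a literal separator character, so the French variant matches a plain space rather than any whitespace.

```python
import re

def currency_pattern(symbol: str, thousands: str, decimal: str) -> re.Pattern:
    """Build a currency regex from a symbol, a thousands separator, and a decimal mark."""
    sym, sep, dec = (re.escape(part) for part in (symbol, thousands, decimal))
    return re.compile(
        rf"^(?:{sym})?("
        rf"[1-9][0-9]{{0,2}}({sep}[0-9]{{3}})*({dec}[0-9]{{0,2}})?"  # grouped thousands
        rf"|[1-9][0-9]*({dec}[0-9]{{0,2}})?"                         # no grouping
        rf"|0({dec}[0-9]{{0,2}})?"                                   # zero units
        rf"|({dec}[0-9]{{1,2}})?"                                    # fraction only, or empty
        rf")$"
    )

USD = currency_pattern("$", ",", ".")
GBP = currency_pattern("£", ",", ".")
EUR = currency_pattern("€", ".", ",")
EUR_FR = currency_pattern("€", " ", ",")

print(bool(EUR.match("€12.000,23")))
print(bool(EUR.match("€12,000.23")))
```

Building the variants from one function keeps them in sync, so a fix to one alternative does not have to be repeated four times.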

Testing Regular Expressions with Color Highlighting

I discovered a great website today while working on a complex regular expression: RegExPal.com. The site provides color-coded highlighting for RegEx syntax, along with real-time evaluation of test data. If you, for instance, forget to include a closing or opening parenthesis for a group inside the regular expression, the orphaned parenthesis will be highlighted in red for easy identification of the error.

[Screenshot: RegExPal highlighting in action]

Be advised that this is based on JavaScript regular expressions, which are comparable to PHP’s Perl-compatible RegEx functions, such as preg_replace() and preg_match(). It’s a good idea to get familiar with this syntax, as the ereg() and eregi() style functions were deprecated in PHP 5.3 and removed in PHP 7. Make sure to also check out the Regular Expressions Cookbook, co-authored by the author of RegExPal.com.

Regular Expression Syntax (RegEx) Sample

Here is the handy regular expression syntax reference from the PHP book (pages 149-150). Check out the free PDF if you don’t have it already!

Regular Expression Syntax

^ – Start of string
$ – End of string
. – Any single character
( ) – Group of expressions
[] – Item range ( e.g. [afg] means a, f, or g )
[^] – Items not in range ( e.g. [^cde] means not c, d, or e )
- (dash) – Character range within an item range ( e.g. [a-z] means a through z )
| (pipe) – Logical or ( e.g. (a|b) means a or b )
? – Zero or one of preceding character/item range
* – Zero or more of preceding character/item range
+ – One or more of preceding character/item range
{integer} – Exactly integer of preceding character/item range ( e.g. a{2} )
{integer,} – Integer or more of preceding character/item range ( e.g. a{2,} )
{integer,integer} – From integer to integer ( e.g. a{2,4} means 2 to 4 of a )
\ – Escape character
[:punct:] – Any punctuation
[:space:] – Any whitespace character
[:blank:] – Any space or tab
[:digit:] – Any digit: 0 through 9
[:alpha:] – All letters: a-z and A-Z
[:alnum:] – All digits and letters: 0-9, a-z, and A-Z
[:xdigit:] – Hexadecimal digit
[:print:] – Any printable character
[:upper:] – All uppercase letters: A-Z
[:lower:] – All lowercase letters: a-z
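A few of the constructs listed above can be checked directly. The sketch below uses Python's re module (an assumption; the post itself targets PHP), and `matches` is a hypothetical helper. It sticks to constructs that Python shares with the list; the bracketed [:name:] classes are left out because Python's re module does not support them.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found in the text."""
    return re.search(pattern, text) is not None

examples = [
    (r"^[afg]$", "f"),        # item range: a, f, or g
    (r"^[^cde]$", "d"),       # negated range: d is excluded
    (r"^[a-z]+$", "regex"),   # character range with + (one or more)
    (r"^(a|b)$", "b"),        # logical or
    (r"^colou?r$", "color"),  # ? makes the u optional
    (r"^a{2,4}$", "aaaaa"),   # 2 to 4 of a: five is too many
    (r"^1\.5$", "1x5"),       # escaped dot matches only a literal dot
]

for pattern, text in examples:
    print(f"{pattern!r:14} {text!r:9} {matches(pattern, text)}")
```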

Perl-Compatible (PCRE) only ( preg_*() )

/ – delimiter before and after the expression

Character classes:

\c – Control character
\s – Whitespace
\S – Not whitespace
\d – Digit (0-9)
\D – Not a digit
\w – Letter, Digit, Underscore [a-zA-Z0-9_]
\W – Not a word character (anything other than a letter, digit, or underscore)
\xhh – Character with hexadecimal code hh
\0dd – Character with octal code dd

Modifiers:

i – Case-insensitive
s – Period matches newline
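m – ^ and $ match at the start and end of each line
U – Ungreedy matching
e – Evaluate replacement as PHP code (preg_replace() only; deprecated in PHP 5.5, removed in PHP 7)
x – Extended: whitespace and # comments in the pattern are ignored, so it can span several lines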
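Most of these modifiers have close counterparts in other engines. As a rough comparison only, the sketch below shows how Python's re module (an assumption; the post targets PHP) expresses the same ideas with flags. Python has no flag for U or e; a lazy quantifier and a replacement function play those roles.

```python
import re

text = "first line\nsecond line"

# i: case-insensitive
case_insensitive = re.findall(r"LINE", text, re.IGNORECASE)

# s: period matches newline
dot_all = re.search(r"first.*second", text, re.DOTALL)

# m: ^ and $ match at line boundaries
multi_line = re.findall(r"^\w+", text, re.MULTILINE)

# x: whitespace and comments allowed inside the pattern
verbose = re.compile(r"""
    ^\d{5}        # five-digit ZIP
    (-\d{4})?     # optional +4 suffix
    $""", re.VERBOSE)

# U: no flag in Python; a lazy quantifier (+?) gives ungreedy matching
lazy = re.search(r"<.+?>", "<a><b>").group()

# e: no flag in Python; pass a function as the replacement instead
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")

print(case_insensitive, multi_line, lazy, doubled)
```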