
This article details some of the more complicated regular expressions I’ve needed to compose for various projects. It is a living document that I’ll update whenever I have to develop another complicated regular expression.
UK Post Codes
Based on Official UK Post Code Standards.
How the expression is composed…
GIR 0AA is an historical post code that does not conform to the current rules, but it is still valid and in use, so it needs to be explicitly catered for:
"[gG]{1}[iI]{1}[rR]{1} ?[0]{1}[aA]{2}"
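To show how the per-letter character classes and the optional space behave, here is a minimal C# sketch that tests the GIR 0AA sub-expression on its own. It assumes .NET's System.Text.RegularExpressions and C# top-level statements; IsGirPostCode is a hypothetical helper name, and the pattern is anchored with ^ and $ as it is in the full expression.

```csharp
using System;
using System.Text.RegularExpressions;

// The GIR 0AA sub-expression, anchored with ^ and $ as in the full expression.
const string GirRegEx = "^[gG]{1}[iI]{1}[rR]{1} ?[0]{1}[aA]{2}$";

// Hypothetical helper: true only if the whole string is GIR 0AA (any case, optional single space).
bool IsGirPostCode(string candidate) => Regex.IsMatch(candidate, GirRegEx);

foreach (var candidate in new[] { "GIR 0AA", "gir0aa", "GIR  0AA", "GIR 0AB" })
{
    // " ?" allows at most one space; [gG] etc. accept either case for each letter.
    Console.WriteLine($"{candidate}: {IsGirPostCode(candidate)}");
}
```

The first two candidates match; the double space and the wrong final letter do not.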
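The different prefix formats (the outward code, where A is a letter and N is a digit) are detected individually. Q, V and X are never used as the first letter and I, J and Z are never used as the second letter, so these are excluded using .NET character class subtraction (e.g. "[a-zA-Z-[qvxQVX]]" matches any letter except Q, V or X). The final letter of the ANA and AANA formats is restricted in the same way: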
AN prefix: "([a-zA-Z-[qvxQVX]]{1}[0-9]{1})"
ANN prefix: "([a-zA-Z-[qvxQVX]]{1}[0-9]{2})"
AAN prefix: "([a-zA-Z-[qvxQVX]]{1}[a-zA-Z-[ijzIJZ]]{1}[0-9]{1})"
AANN prefix: "([a-zA-Z-[qvxQVX]]{1}[a-zA-Z-[ijzIJZ]]{1}[0-9]{2})"
ANA prefix: "([a-zA-Z-[qvxQVX]]{1}[0-9]{1}[a-zA-Z-[ilmnoqrvxyzILMNOQRVXYZ]]{1})"
AANA prefix: "([a-zA-Z-[qvxQVX]]{1}[a-zA-Z-[ijzIJZ]]{1}[0-9]{1}[a-zA-Z-[cdfgijkloqstuzCDFGIJKLOQSTUZ]]{1})"
The suffix (inward code) is always the same format, NAA. The final two letters never use C, I, K, M, O or V, so that they can’t be mistaken for digits or for each other when handwritten:
"[0-9]{1}[a-zA-Z-[cikmovCIKMOV]]{2}"
The full expression is therefore:
const string PostCodeRegEx = "(^[gG]{1}[iI]{1}[rR]{1} ?[0]{1}[aA]{2}$)|(^(([a-zA-Z-[qvxQVX]]{1}[0-9]{1})|([a-zA-Z-[qvxQVX]]{1}[0-9]{2})|([a-zA-Z-[qvxQVX]]{1}[a-zA-Z-[ijzIJZ]]{1}[0-9]{1})|([a-zA-Z-[qvxQVX]]{1}[a-zA-Z-[ijzIJZ]]{1}[0-9]{2})|([a-zA-Z-[qvxQVX]]{1}[0-9]{1}[a-zA-Z-[ilmnoqrvxyzILMNOQRVXYZ]]{1})|([a-zA-Z-[qvxQVX]]{1}[a-zA-Z-[ijzIJZ]]{1}[0-9]{1}[a-zA-Z-[cdfgijkloqstuzCDFGIJKLOQSTUZ]]{1})){1} ?[0-9]{1}[a-zA-Z-[cikmovCIKMOV]]{2}$)";
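A minimal usage sketch, assuming .NET's System.Text.RegularExpressions and C# top-level statements. The character class subtraction syntax ("[a-zA-Z-[qvxQVX]]") is .NET-specific, so the pattern won’t behave the same in most other regex engines. IsValidPostCode and the sample strings are illustrative, and the ANA third-position class here allows only the letters the official rules permit in that position (A to H, J, K, P, S, T, U and W).

```csharp
using System;
using System.Text.RegularExpressions;

// Full UK post code expression. Character class subtraction is .NET-specific syntax.
const string PostCodeRegEx = "(^[gG]{1}[iI]{1}[rR]{1} ?[0]{1}[aA]{2}$)|(^(([a-zA-Z-[qvxQVX]]{1}[0-9]{1})|([a-zA-Z-[qvxQVX]]{1}[0-9]{2})|([a-zA-Z-[qvxQVX]]{1}[a-zA-Z-[ijzIJZ]]{1}[0-9]{1})|([a-zA-Z-[qvxQVX]]{1}[a-zA-Z-[ijzIJZ]]{1}[0-9]{2})|([a-zA-Z-[qvxQVX]]{1}[0-9]{1}[a-zA-Z-[ilmnoqrvxyzILMNOQRVXYZ]]{1})|([a-zA-Z-[qvxQVX]]{1}[a-zA-Z-[ijzIJZ]]{1}[0-9]{1}[a-zA-Z-[cdfgijkloqstuzCDFGIJKLOQSTUZ]]{1})){1} ?[0-9]{1}[a-zA-Z-[cikmovCIKMOV]]{2}$)";

// Hypothetical helper: true if the whole string has a valid post code format.
bool IsValidPostCode(string candidate) => Regex.IsMatch(candidate, PostCodeRegEx);

string[] samples =
{
    "SW1A 1AA", // AANA prefix
    "W1A 0AX",  // ANA prefix
    "M1 1AE",   // AN prefix
    "B33 8TH",  // ANN prefix
    "CR2 6XH",  // AAN prefix
    "DN55 1PT", // AANN prefix
    "GIR 0AA",  // historical special case
    "sw1a1aa",  // lower case, no space
    "QA1 1AA",  // Q is not allowed as the first letter
    "SW1A 1AC", // C is not allowed in the last two letters
};

foreach (var sample in samples)
{
    Console.WriteLine($"{sample,-10} {IsValidPostCode(sample)}");
}
```

A match only means the string has a valid post code format; it doesn’t confirm that the post code actually exists.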
Email Addresses
This attempts to support the common valid forms of email address defined by RFC 5322.
It targets the basic form local@domain.
The local part can be a combination of a-z, A-Z, 0-9 and the characters with ASCII codes 33, 35–39, 42, 43, 45, 47, 61, 63, 94–96 and 123–126. Periods are also allowed, but not as the first or last character and not two in a row.
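As a quick cross-check of that list, this small C# sketch (assuming C# top-level statements and LINQ) converts the listed ASCII codes back into characters. The result is the same set of symbols that appears alongside \w in the local-part character class.

```csharp
using System;
using System.Linq;

// ASCII codes listed above for the local part (in addition to letters, digits and the period).
int[] codes = { 33, 35, 36, 37, 38, 39, 42, 43, 45, 47, 61, 63, 94, 95, 96, 123, 124, 125, 126 };

// Convert each code to its character so the list can be compared with the regex character class.
string specials = new string(codes.Select(code => (char)code).ToArray());

Console.WriteLine(specials); // !#$%&'*+-/=?^_`{|}~
```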
NOTE: ASCII character codes 32, 34, 40, 41, 44, 58, 59, 60, 62, 64 and 91–93 are also allowed, but only inside a quoted local part (e.g. "john smith"@example.com), where the speech mark and back-slash themselves must additionally be escaped with a back-slash. Supporting these would complicate the regex to the point where it would be nearly impossible to author, so they are omitted here.
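The domain part can be a single word of letters, digits or underscores (e.g. localhost), or a series of period-separated labels made up of letters, digits, underscores and hyphens (the initial character cannot be a period) where the final label contains letters only. An IPv4 address written as a dotted quad (e.g. 192.168.0.1) is also accepted, although each number is only checked for being 1–3 digits, not for being in the range 0–255. IPv6 addresses and the bracketed domain-literal form (e.g. [192.168.0.1]) are not matched.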
Local part: "^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
Domain part: "(\w+|(([\-\w]+\.)+[a-zA-Z]*)|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"
The full expression is therefore:
const string EmailRegEx = @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@(\w+|(([\-\w]+\.)+[a-zA-Z]*)|(([0-9]{1,3}\.){3}[0-9]{1,3}))$";
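A minimal usage sketch, assuming .NET's System.Text.RegularExpressions and C# top-level statements; IsValidEmail and the sample addresses are illustrative. It shows which of the local and domain forms described above the expression accepts and which it rejects.

```csharp
using System;
using System.Text.RegularExpressions;

// The full email expression from above, unchanged.
const string EmailRegEx = @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@(\w+|(([\-\w]+\.)+[a-zA-Z]*)|(([0-9]{1,3}\.){3}[0-9]{1,3}))$";

// Hypothetical helper: true if the whole string matches the expression.
bool IsValidEmail(string candidate) => Regex.IsMatch(candidate, EmailRegEx);

string[] samples =
{
    "john.smith@example.com",       // dotted local part, domain with letters-only final label
    "first_name+tag@example.co.uk", // special characters in the local part
    "user@localhost",               // single-word domain
    "user@192.168.0.1",             // unbracketed IPv4 dotted quad
    ".john@example.com",            // leading period: rejected
    "john..smith@example.com",      // consecutive periods: rejected
    "user@[192.168.0.1]",           // bracketed domain-literal: not supported
    "user@2001:db8::1",             // IPv6: not supported
};

foreach (var sample in samples)
{
    Console.WriteLine($"{sample,-30} {IsValidEmail(sample)}");
}
```

The first four samples match and the last four do not.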