Regular Expression Alternations
I promised in my last article that I'd have more to say about regular expressions, so here it is. Today we look at another tough problem for regular expressions. Consider writing a pattern to validate that a string contains at least one digit, at least one uppercase letter, and at least 6 characters. One would think that this would be easy, but here we’re faced with an AND situation, and regular expression syntax doesn't provide an AND operator. Take, for example, any validation problem that has the form…
A and B and C
Without an AND operator in the regex syntax, you are almost forced to go outside of the pattern and implement the test as multiple patterns applied in multiple passes of your validator. That may well be the best approach, but it is impossible if you are working with a black-box validator and must provide a single regex pattern.
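For comparison, the multi-pass approach is easy to sketch in C# when you do control the validation code. The class and method names below are hypothetical, and the three patterns are simply the shortest ones that express each requirement.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultiPassValidator
{
    // Each requirement gets its own small pattern; the AND is done in C#.
    public static bool IsValid(string input)
    {
        return Regex.IsMatch(input, "[A-Z]")   // at least one uppercase letter
            && Regex.IsMatch(input, "[0-9]")   // at least one digit
            && Regex.IsMatch(input, ".{6,}");  // at least 6 characters
    }

    public static void Main()
    {
        foreach (string sample in new[] { "Secret1", "secret1", "S1" })
        {
            Console.WriteLine("{0}: {1}", sample, IsValid(sample));
        }
    }
}
```

This is the version that a black-box validator rules out, because it needs three patterns and code to combine them.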
I've found two approaches that can be applied to solve such problems, and we'll discuss one of them today: the “alternation” construct. The MSDN documentation lists three alternation constructs: the simple OR (the vertical bar), the "expression" alternation and the "name" alternation. Our problem requires the "expression" version.
(?(expression)yes|no)
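This construct works like an IF-THEN-ELSE statement: the engine tests expression as a zero-width assertion, then continues with yes if the test succeeded and with no if it did not. In an IF-THEN-ELSE statement, the THEN portion can function like an AND. We could rewrite our test above as…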
IF A THEN IF B THEN IF C THEN MATCH
But the alternation construct isn’t as flexible as a programming language like C# or VB.Net. We can’t simply drop the ELSE as we did above, because a failed test has to make the whole match fail. So, our pseudo code above has to take the form…
IF A THEN
    IF B THEN
        IF C THEN
            MATCH
        ELSE
            FAIL MATCH
    ELSE
        FAIL MATCH
ELSE
    FAIL MATCH
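To see the construct on its own before nesting it, here is a minimal single-level sketch. The pattern and the names are made up for illustration: if the text starts with a digit, it must be exactly three digits, otherwise it must be one or more lowercase letters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // IF the next character is a digit THEN require exactly three digits,
    // ELSE require one or more lowercase letters.
    public const string Pattern = @"^(?(\d)\d{3}|[a-z]+)$";

    public static bool Matches(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        foreach (string sample in new[] { "123", "abc", "12", "1ab" })
        {
            Console.WriteLine("{0}: {1}", sample, Matches(sample));
        }
    }
}
```

Because the test is zero-width, the yes branch starts matching at the same position where the test was tried, which is what lets the nested version chain several tests over the same input.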
The pseudo-regex pattern using our original requirements looks something like this...
(?(does string have an uppercase letter)
    (?(does string have a digit)
        (?(does string have at least 6 characters)
            (match the whole string)
            | (fail the match))
        | (fail the match))
    | (fail the match))
This should look very similar to the IF-THEN-ELSE statement above. All we have left now is to write the individual patterns. We need a pattern for each test, a pattern to match the whole string and a pattern that will never match any string.
(.*[A-Z].*) matches if there's at least one uppercase letter in the string
(.*[0-9].*) matches if there's at least one digit in the string
(.{6,}) matches if there are 6 or more characters in the string
(.*) matches the entire string
([^\W\w]) won't match anything
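Before combining them, it can help to try each building block in isolation. The sketch below does that with Regex.IsMatch; the class name, constant names and sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BuildingBlocks
{
    public const string HasUpper = @"(.*[A-Z].*)";
    public const string HasDigit = @"(.*[0-9].*)";
    public const string MinLength = @"(.{6,})";
    public const string WholeString = @"(.*)";
    public const string NeverMatches = @"([^\W\w])";

    public static void Main()
    {
        foreach (string sample in new[] { "Secret1", "secret", "S1", "" })
        {
            // Print the result of every individual test for this sample.
            Console.WriteLine(
                "'{0}': upper={1} digit={2} length={3} whole={4} never={5}",
                sample,
                Regex.IsMatch(sample, HasUpper),
                Regex.IsMatch(sample, HasDigit),
                Regex.IsMatch(sample, MinLength),
                Regex.IsMatch(sample, WholeString),
                Regex.IsMatch(sample, NeverMatches));
        }
    }
}
```

The last column should stay False for every input, since no character is both a word character and a non-word character.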
Notice how each test is written in such a way that the whole string is selected. That way all tests are operating on the exact same input. Now piece it together and it looks like this...
(?(.*[A-Z].*)(?(.*[0-9].*)(?(.{6,})(.*)|([^\W\w]))|([^\W\w]))|([^\W\w]))
It's not very pretty, but it's conceptually straightforward. It follows the pseudo-regex precisely. See if you can format it nicely like the pseudo-pattern above.
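One possible answer to the formatting exercise, assuming you are free to pass RegexOptions.IgnorePatternWhitespace (which lets the pattern contain layout whitespace and # comments), is sketched below. The class and member names are hypothetical; the program only checks that the formatted and compact patterns agree on a few samples.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormattedPattern
{
    public const string Compact =
        @"(?(.*[A-Z].*)(?(.*[0-9].*)(?(.{6,})(.*)|([^\W\w]))|([^\W\w]))|([^\W\w]))";

    // Same pattern, laid out like the pseudo-regex.
    public const string Formatted = @"
        (?(.*[A-Z].*)               # IF there is an uppercase letter
            (?(.*[0-9].*)           #   IF there is a digit
                (?(.{6,})           #     IF there are 6 or more characters
                    (.*)            #       match the whole string
                  | ([^\W\w])       #     ELSE fail the match
                )
              | ([^\W\w])           #   ELSE fail the match
            )
          | ([^\W\w])               # ELSE fail the match
        )";

    public static bool IsValidFormatted(string input)
    {
        return Regex.IsMatch(input, Formatted, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        foreach (string sample in new[] { "Secret1", "secret1", "SECRET", "S1" })
        {
            Console.WriteLine("{0}: compact={1} formatted={2}",
                sample, Regex.IsMatch(sample, Compact), IsValidFormatted(sample));
        }
    }
}
```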
The final step is to verify our pattern with some code. We’ll use C# for that. Just create a new Windows Forms Application project in C#. At the top of the form's code file, add a using directive for the regular expressions namespace.
using System.Text.RegularExpressions;
Next, drop a Label and a TextBox on the form. Clear the text in label1. Then double-click the text box and insert the following code into the textBox1_TextChanged handler…
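// Nested conditionals: uppercase letter AND digit AND 6 or more characters.
string pattern = @"(?(.*[A-Z].*)(?(.*[0-9].*)(?(.{6,})(.*)|([^\W\w]))|([^\W\w]))|([^\W\w]))";
if (Regex.IsMatch(textBox1.Text, pattern))
{
    // All three tests passed.
    label1.Text = "Valid";
    label1.ForeColor = Color.Black;
}
else
{
    // At least one test failed, so the never-match branch was taken.
    label1.Text = "Invalid Entry";
    label1.ForeColor = Color.Red;
}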
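If you would rather not build a form, the same check can be tried from a console program. This is only a sketch: the PasswordPattern class, the Describe method and the sample inputs are hypothetical, while the pattern and the two messages are the ones used in the handler.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordPattern
{
    public const string Pattern =
        @"(?(.*[A-Z].*)(?(.*[0-9].*)(?(.{6,})(.*)|([^\W\w]))|([^\W\w]))|([^\W\w]))";

    // Returns the same text the form would show in label1.
    public static string Describe(string input)
    {
        return Regex.IsMatch(input, Pattern) ? "Valid" : "Invalid Entry";
    }

    public static void Main()
    {
        // One passing sample, then one sample failing each individual test.
        foreach (string sample in new[] { "Secret1", "secret1", "SECRET", "S1" })
        {
            Console.WriteLine("{0}: {1}", sample, Describe(sample));
        }
    }
}
```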
The program will display “Valid” or “Invalid Entry” depending upon the input in the text box. This is a fairly simple password validator, but now you have the tools to expand upon it if you like. There is one caveat though. This pattern will not work in the ASP.Net RegularExpressionValidator component. In my next article on Regular Expressions, I will explain why. I will also explain my second approach mentioned above which will work in the RegularExpressionValidator situation.