Sunday, November 16, 2008

.NET 2.0 Extension Methods

I regularly write programs in C# on Visual Studio 2008. The default Framework for the environment is 3.5. But sometimes I must re-target 2008 to Framework 2.0. Is it possible to take advantage of the new "Extension Method" language feature of 2008 when required to target 2.0? This is a good question and applies to more than the extension methods. But for now, let's answer the question for extensions.

Yes.

But, you need to do some of the work yourself. The compiler's support for extension methods is really unconnected with Framework 3.5. But maybe you've seen the error...


Cannot define a new extension method because the compiler required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be found. Are you missing a reference to System.Core.dll?


To write an extension method, the compiler needs to find the ExtensionAttribute somewhere, anywhere. It looks for it in the System.Runtime.CompilerServices namespace, which normally lives in System.Core.dll. That DLL ships with Framework 3.5 and is not available when you target 2.0. The good news is you don't need the whole 3.5 DLL; you just need the ExtensionAttribute class, and you can actually write it yourself.

So, here I have written the class you need in C#.

namespace System.Runtime.CompilerServices
{
    [AttributeUsage(
        AttributeTargets.Assembly
        |AttributeTargets.Class
        |AttributeTargets.Method,
        Inherited=false,
        AllowMultiple=false)
    ]
    public class ExtensionAttribute : Attribute
    {
    }
}

Add this to a new class file or an existing file. Make sure you add it outside of all other namespace definitions since you will be adding to the System.Runtime.CompilerServices namespace.
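With the attribute class in the project, an extension method is written the usual way. Here is a minimal sketch; the StringExtensions class and its IsBlank method are made-up names for illustration, not framework APIs. When targeting Framework 2.0, it assumes the ExtensionAttribute class shown above is compiled into the same project.

```csharp
using System;

public static class StringExtensions
{
    // IsBlank is a hypothetical example method, not part of the framework.
    // The "this" modifier on the first parameter makes it an extension method.
    public static bool IsBlank(this string s)
    {
        return s == null || s.Trim().Length == 0;
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        string name = "   ";
        // Instance-style call, resolved by the compiler at compile time.
        Console.WriteLine(name.IsBlank());
        // The same call written as the plain static call it really is.
        Console.WriteLine(StringExtensions.IsBlank("abc"));
    }
}
```

The first call prints True and the second prints False. Nothing here depends on the 3.5 runtime; the extension call is compiled into an ordinary static method call.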

That's all it should take. Good luck with your projects and using extension methods in C# with Framework 2.0.

Sunday, November 9, 2008

An Interesting Delegate Usage

An interesting thing happened while I was surfing the MSDN C# forums. I was completely thrown for a loop when I came across some code posted by a regular contributor. It went something like this...

private void ACallBackFunction()
{
    if(this.InvokeRequired)
        this.Invoke(new ThreadStart(ACallBackFunction));
    else
        this.label1.Text = "Something or other";
}

The programming pattern should be very familiar to those using threads, because this is how a child thread updates the GUI (only the thread on which the GUI is created can update the UI). But, what seemed very strange was the use of "new ThreadStart(...)". I thought, what does a ThreadStart() delegate have to do with this operation?

The answer is ... nothing. But the application is nonetheless interesting. Here, the programmer decided to draw from a handy delegate in the C# tool chest just to get the delegate that "Invoke()" requires. "ThreadStart" is a delegate type for a void method taking no arguments. It saved having to write a delegate type for the operation that might have looked like this...

delegate void ACallBackFunctionDelegate();

...just so that the program could later write...

private void ACallBackFunction()
{
    if(this.InvokeRequired)
        this.Invoke(new ACallBackFunctionDelegate(ACallBackFunction));
    else
        this.label1.Text = "Something or other";
}

So, by borrowing the pre-existing delegate declaration, namely ThreadStart, a few lines of coding were avoided. Now, I don't particularly like the implementation (it might lead a newbie astray). But, I do appreciate the awakening it caused in my thinking about delegates. Sometimes I would get all hung up in the declaration of the delegate and forget that the delegate need not be so tightly coupled to the method I'm intending it for. The delegate describes a well-typed variable for safely calling any number of methods which adhere to the prescribed signature.

Okay, so now I am more open minded about the re-use of my delegates as well. In fact, .NET Framework 3.5 adds a few new delegates that address the common signatures of many of our functions. The delegate types are "Action" and "Func". Both are available as generic types, making it very easy to construct a delegate variable on the fly similar to the example above. For example, one can write...

private void ACallBackFunction()
{
    if(this.InvokeRequired)
        this.Invoke(new Action(ACallBackFunction));
    else
        this.label1.Text = "Something or other";
}

The benefit of using "Action" instead of "ThreadStart", in my opinion, is that it's a little less confusing.
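To see the point outside of a form, here is a small console sketch. It wraps one method in three different delegate types that share the same signature. The names DelegateReuseDemo, MyCallback, Work and Add are invented for this illustration.

```csharp
using System;
using System.Threading;

public static class DelegateReuseDemo
{
    // A hand-written delegate type with the same shape: void, no arguments.
    public delegate void MyCallback();

    public static int Calls = 0;

    public static void Work()
    {
        Calls++;
        Console.WriteLine("Work called {0} time(s)", Calls);
    }

    public static int Add(int a, int b)
    {
        return a + b;
    }

    public static void Main()
    {
        // Three delegate types, one target method. Only the signature matters.
        ThreadStart viaThreadStart = new ThreadStart(Work);
        MyCallback viaCustom = new MyCallback(Work);
        Action viaAction = new Action(Work);

        viaThreadStart();
        viaCustom();
        viaAction();

        // Func covers methods that return a value; the last type argument
        // is the return type.
        Func<int, int, int> add = new Func<int, int, int>(Add);
        Console.WriteLine(add(2, 3));
    }
}
```

Work runs three times, once per delegate, and the last line prints 5. No thread is started anywhere, which is exactly why "ThreadStart" reads oddly in this role.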

Saturday, October 25, 2008

Regular Expressions in C# - Password Validator Revisited

Sometimes we make life more difficult than it needs to be.

Lately, I've learned a little more about Regular Expressions and "Negative Look-Around". I've used it a lot, but it only recently dawned upon me that I was not making full use of it.

Case in point, take a look at my earlier regular expression article where I explain how to validate a password with multiple requirements (Regular Expression Alternations). This article discusses the absence of the boolean AND in regular expressions and provides a complex IF-THEN-ELSE approach to test a string for conforming to multiple "password" constraints. And my subsequent article (Regular Expression Double Negatives) gets more complex with an "inside-out" approach to compensate for the lack of an AND operator.

But, let's go back even further to the article where I "explain" Negative Look-ahead. In this article, I provide the simple Negative Look-ahead pattern...


“(?!pattern)”

... and proceed to show the unexpected behavior and put it in perspective so that you can understand the behavior better. Though the pattern looks simple, its behavior is not. And frankly, I'm beginning to consider it a misuse of look-around. Even my standard pattern for any sort of look-around goes something like this...


[look-behind]pattern[look-ahead]

... where either "look-around" pattern is optional. This is certainly a valid use of the look-around as it creates required bounds around the pattern to be found. Or, to put it another way, the positive look-behind and look-ahead here describe the "context" of the pattern to match.

But in my earlier articles, I suggested that multiple constraints on a pattern were a lot more difficult. Yes, it's more difficult, but not really as difficult as I first believed. Once you understand the look-around pattern better than I did, in particular, once you understand the construction...


[look-ahead]pattern[look-behind]

... you can begin writing more understandable multiple-constraint patterns.

Let's dig in. I've introduced two terms that need explanation. Context and Constraint are the terms I used for describing two important aspects of the pattern matching requirements.

When requirements dictate that we are looking for occurrences of a particular pattern in a string, very often the match will be dependent upon the non-matched terms around it. This is the context of the match. For example, if we are looking for the word "an" in a paragraph, the implied context is that the two characters "an" form a word only if they are preceded by whitespace or start the paragraph, and are followed by whitespace or end the paragraph. The simple pattern for the word "an" would be...


\b[Aa]n\b

... but look-around syntax can also be used...


[look-behind]pattern[look-ahead]
(?<=^|\s+)[Aa]n(?=$|\s)

... thus expressing the context of the pattern that makes it a word. (Actually, word boundary must take into account words at the end of phrases and sentences which are followed with punctuation. For now, we'll keep it simple.)

Expanding the context, let's say we want to find the word "an" wherever it is misused. That would be every occurrence of "an" that is not followed by a word beginning with a vowel. (Again, there are some exceptions. When an acronym begins with an F, H, L, M, N, R, S, or X, and we vocalize the letters as in FTP, we precede the acronym with the article "an" rather than "a". But when vocalizing the acronym as a word as in SCSI (pronounced scuzzy), we use the article "a". Again, let's keep it simple.) So, the pattern would look like this...


\b[Aa]n\b(?=\s+[^\sAaEeIiOoUu])

... finding all occurrences of the word "an" followed by a word that should use the article "a" instead. Note the \s+ and the \s inside the negated set. With \s* and a plain [^AaEeIiOoUu], the set could match the space itself, and every "an" followed by a space would qualify, correct or not. A simple Regex.Replace() will fix such occurrences as in this example...


string test =
@"This is a test of a string that mis-uses the word ""an"".
An helicopter is incorrect in American English.
But an elephant is correct usage.";
string pattern = @"\b[Aa]n\b(?=\s+[^\sAaEeIiOoUu])";
Console.WriteLine("{0}", Regex.Replace(test, pattern, "A"));

... Yes, it will replace "An" or "an" with the uppercase "A". An "Evaluator" parameter would make great sense here. But rather than getting bogged down in the details, I want to look at constraints right now.
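For the curious, an evaluator version might look like the sketch below. It uses the Regex.Replace() overload that takes a MatchEvaluator, so each match can choose its own replacement and the capitalization survives. The pattern is the "an followed by whitespace and a non-vowel" variant, and ArticleFixer and FixArticles are names I made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ArticleFixer
{
    // "an" as a whole word, followed by whitespace and then a non-vowel.
    const string Pattern = @"\b[Aa]n\b(?=\s+[^\sAaEeIiOoUu])";

    // FixArticles is a hypothetical helper name, not a library method.
    public static string FixArticles(string text)
    {
        // The lambda is converted to a MatchEvaluator. It is called once per
        // match and returns the replacement text for that match only.
        return Regex.Replace(text, Pattern,
            m => m.Value[0] == 'A' ? "A" : "a");
    }

    public static void Main()
    {
        Console.WriteLine(FixArticles("An helicopter and an elephant and an dog."));
        // prints: A helicopter and an elephant and a dog.
    }
}
```

The look-ahead still only decides whether "an" is a match. Because it consumes nothing, the evaluator sees just the two letters being replaced.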

Constraints generally provide the exceptions to the rule. Your search requirements may be to find words, but not all words. Constraints limit your matches to more specific values or exclude certain values. By their very nature, constraints are often best described with the boolean AND operator. According to my previous articles, you might conclude that there is no hope but to write a very obscure pattern using double negative and OR pattern matching. I'm happy to say, you have a few more options based upon the constraining construction using look-around. That construction would be...


[look-ahead]pattern[look-behind]

... In this construction, when implemented correctly, the look-ahead and look-behind will both examine the same characters that pattern examines. But, look-around is non-consumptive. That means a look-around expression that matches does not appear in the Match.Value property of a Regex.Match() operation. For example...


string test = "the end.";
string pattern = "(?<=the )end"; //[look-behind]pattern
Console.WriteLine("what'd I find ({0})",Regex.Match(test,pattern).Value);
// Outputs: what'd I find (end)

... So, the pattern "the " was found but was not considered part of the match.

The non-consumptive behavior comes in very handy when placing a look-ahead in front of a pattern. That's because the look-ahead AND the pattern must both match to have a match. Let's say we want to validate that a string contains only upper and lowercase letters and has at least one lowercase letter. You could write this pattern without look-around as follows...


string pattern = "^[a-zA-Z]*[a-z]+[a-zA-Z]*$";

... or with look-ahead as ...


string pattern = "^(?=[a-zA-Z]*$).*[a-z].*$";

... Notice that the look-ahead makes a general check that the string contains only letters. But, because it does not consume the letters, the pattern following examines the same characters to make sure that the string contains a lowercase letter. The judicious placement of ^ and $ guarantees that both patterns examine the same set of characters. When arranged like this, the construction says that the look-ahead pattern AND the capture pattern must match or nothing matches.

Now, if you want to validate that the letter string has at least one uppercase and one lowercase letter, it gets awkward without look-around. A single pattern has to spell out each possible ordering (uppercase first or lowercase first) as a separate alternative, and the alternatives multiply with every added requirement. But with look-around, you only need to add additional look-around patterns like so...


string pattern = "^(?=.*[A-Z].*$)(?=[a-zA-Z]*$).*[a-z].*$";

... Here's some code to demonstrate...


string [] tests = {
    "abcdEFG",
    "ABCDEFG",
    "Abcd$fg",
    "abcdefg",
};
string pattern1 = "^[a-zA-Z]*[a-z]+[a-zA-Z]*$";
string pattern2 = "^(?=[a-zA-Z]*$).*[a-z].*$";
string pattern3 = "^(?=.*[A-Z].*$)(?=[a-zA-Z]*$).*[a-z].*$";
foreach (string test in tests)
{
    Console.WriteLine("pattern1 {0} {1}", Regex.IsMatch(test, pattern1).ToString(), test);
    Console.WriteLine("pattern2 {0} {1}", Regex.IsMatch(test, pattern2).ToString(), test);
    Console.WriteLine("pattern3 {0} {1}", Regex.IsMatch(test, pattern3).ToString(), test);
}

So then, we do have an AND operation implied in the way we combine the look-ahead and pattern. Similarly, the AND operation is implied when combining a pattern followed by the look-behind. But, in order for these to operate like an AND operation, the patterns in each look-ahead and the final pattern must overlap the string space exactly. I accomplished the overlap by placing the "^" start-of-string and "$" end-of-string in such a way that each individual pattern match referred to the same characters. Of course, you don't have to make them overlap precisely if you are looking for more exotic behavior than AND.

Here is another example of a pattern that's very difficult (if not impossible) to write without look-around. Consider a string with words in it. You want to capture all words except a few choice words, let's say "and", "but" and "or". You need a pattern that finds all words AND a pattern that rejects specific words. You can only reject specific words using negative look-around. Here is the pattern and some code to demonstrate...


string pattern = @"\b\w+(?<!\band|\bbut|\bor)\b";
string test =
"This is a test string, and it is good for finding words, but "
+"it will not find all words. If you want to find all words, "
+"modify this or write your own. Band is a word to make sure "
+"words are words, butter is another and so is order.";
MatchCollection mx = Regex.Matches(test, pattern);
foreach (Match m in mx)
Console.WriteLine("{0}", m.Value);

... The pattern first finds the words, then the negative look-behind tosses out the ones we don't want.

Finally, let's look at one more pattern. This would be the password validator pattern I covered in previous articles. But, this time we shall use a straightforward AND construction with look-ahead. A valid password will contain only letters and digits, will have at least one uppercase letter, one lowercase letter and one digit, and be at least 6 characters long. Here's the pattern...


string pattern = "^(?=[a-zA-Z0-9]{6,}$)(?=.*[a-z].*$)(?=.*[A-Z].*$).*[0-9].*$";
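Here is a small console sketch to exercise that pattern. The class name PasswordCheck, the IsValid helper and the sample strings are my own choices for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordCheck
{
    // Three look-aheads plus the final pattern, all anchored to the same
    // characters: only letters/digits and 6+ long, AND a lowercase letter,
    // AND an uppercase letter, AND a digit.
    const string Pattern =
        "^(?=[a-zA-Z0-9]{6,}$)(?=.*[a-z].*$)(?=.*[A-Z].*$).*[0-9].*$";

    public static bool IsValid(string candidate)
    {
        return Regex.IsMatch(candidate, Pattern);
    }

    public static void Main()
    {
        string[] samples =
        {
            "Passw0rd",   // meets every requirement
            "password1",  // no uppercase letter
            "PASSWORD1",  // no lowercase letter
            "Passwd",     // no digit
            "Pw0rd",      // only 5 characters
            "Passw0rd!"   // contains a character that is not a letter or digit
        };
        foreach (string s in samples)
            Console.WriteLine("{0,-10} {1}", s, IsValid(s));
    }
}
```

Only the first sample reports True. Each of the others fails exactly one requirement, which shows that every look-ahead is acting as its own AND term.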

That's it for now. I hope you've found this useful.

Wednesday, October 15, 2008

Callbacks And Delegates

It's been a couple of weeks since last writing. Besides being very busy, I've had a touch of "writer's block". Nothing too serious, I should get over it soon.

Lately, I've been intrigued by the abundant use of callback functions and delegates in the .NET Framework. If you want to do anything interesting, you gotta know callbacks. I'll try to describe callbacks and delegates in simplistic terms. Maybe it will help you over the hump.

Consider the following code...


List<MyClass> list = new List<MyClass>();
//Populate list with some MyClass objects
for(int i=0; i<10; ++i)
    list.Add(new MyClass(){ ID=i, strValue=i.ToString() });
//Sort the list
list.Sort();
foreach(MyClass c in list)
    Console.WriteLine("{0}",c.strValue);

...

public class MyClass
{
    public int ID;
    public string strValue;
    public override string ToString() { return strValue; }
}


This code throws an exception (an InvalidOperationException) at the line that says Sort(). That's because the default List<T>.Sort() expects the objects it contains to implement IComparable. Now, we could quickly add IComparable to the class and implement the interface. But, if you don't have control of the class, what would you do then? Or if the built-in IComparable interface didn't sort on the field you want, what then?

You're in luck, because Sort() is overloaded and will accept as a parameter an IComparer<T> or a Comparison<T> object.


public class CompareMyClass : IComparer<MyClass>
{
    public int Compare(MyClass x, MyClass y)
    {
        return x.strValue.CompareTo(y.strValue);
    }
}


With this piece of code, you can now call Sort() with a new CompareMyClass object and the exception goes away and the list comes out sorted. This is pretty cool, but it's also pretty static. If I wanted to sort by the ID field instead, I would have to change my class or write a new class. We can reduce some of that by going to the Comparison<T> type. This is a generic delegate type whose constructor takes as an argument a callback function. The Comparison<T> constructor has the calling signature...



Comparison<T>.Comparison(int (T,T) target)

The generic parameter T can be anything. But when you see the signature in the constructor's parameter list, you might start scratching your head. The best way to read the parameter signature is to start with the word "target". It could have been anything, but "target" suggests that it is the target of some operation. In fact, it is. It is the target function that Sort() will call over and over to determine the correct order of the list elements. The "int (T,T)" is the "type" of the parameter. This means that "target" is a function that takes two parameters of type T and returns an "int" value. Since T is a generic parameter we can replace it with MyClass as we do below. We no longer need the CompareMyClass class, but we do need a function to call.


int MyCompare(MyClass x, MyClass y)
{ return x.strValue.CompareTo(y.strValue); }

// and Sort looks like this...
list.Sort(new Comparison<MyClass>(MyCompare));

You could have two different functions to compare MyClass objects two different ways.



int MyCompare2(MyClass x, MyClass y)
{ return x.ID.CompareTo(y.ID); }

But, then you have to plug in the correct function when you want a different sort behavior. If the comparison type might change at runtime based upon user input, you can set a "delegate" variable and provide that to Comparison() instead.


// at class scope
delegate int MyCompareDelegate(MyClass x, MyClass y); // declares a delegate type

...
// at method scope
MyCompareDelegate dlgt; // declares a delegate variable
if(radioButton1.Checked == true)
    dlgt = MyCompare;
else
    dlgt = MyCompare2;

list.Sort(new Comparison<MyClass>(dlgt));

The delegate lets us treat the functions as objects. Since a delegate for the callback will work just as well as the callback, we can write the comparison code right at the place where we use it with "anonymous delegates"...


MyCompareDelegate dlgt;
if (radioButton1.Checked == true)
    dlgt = delegate(MyClass x, MyClass y)
    {
        return x.ID.CompareTo(y.ID);
    };
else
    dlgt = delegate(MyClass x, MyClass y)
    {
        return x.strValue.CompareTo(y.strValue);
    };
//Sort the list
list.Sort(new Comparison<MyClass>(dlgt));

Here we have set the "dlgt" delegate variable to one of two "anonymous" functions. Creating an anonymous function returns a delegate that can be assigned to a variable like any other function, as long as the signatures match. Well if that's the case, then the Lambda syntax should also work, shouldn't it?


MyCompareDelegate dlgt;
if (radioButton1.Checked == true)
    dlgt = (x,y) => x.ID.CompareTo(y.ID);
else
    dlgt = (x,y) => x.strValue.CompareTo(y.strValue);
//Sort the list
list.Sort(new Comparison<MyClass>(dlgt));

Yes. The Lambda Expression implicitly determines the types of x, y and the return value from the delegate assignment and context. Pretty cool!
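Pulling the pieces together, here is a self-contained console sketch of the same idea. As a shortcut, it declares the variable as Comparison<T> directly rather than going through a custom delegate type, which is one possible simplification. Item, SortDemo, SortBy and Names are hypothetical names, and the byId flag stands in for the radio button.

```csharp
using System;
using System.Collections.Generic;

public class Item
{
    public int ID;
    public string Name;
}

public static class SortDemo
{
    public static List<Item> MakeList()
    {
        List<Item> list = new List<Item>();
        list.Add(new Item { ID = 3, Name = "b" });
        list.Add(new Item { ID = 1, Name = "c" });
        list.Add(new Item { ID = 2, Name = "a" });
        return list;
    }

    // byId plays the role of radioButton1.Checked in the article.
    public static void SortBy(List<Item> list, bool byId)
    {
        Comparison<Item> compare;
        if (byId)
            compare = (x, y) => x.ID.CompareTo(y.ID);
        else
            compare = (x, y) => string.CompareOrdinal(x.Name, y.Name);
        list.Sort(compare);   // Sort calls back into the chosen lambda
    }

    public static string Names(List<Item> list)
    {
        List<string> names = new List<string>();
        foreach (Item i in list)
            names.Add(i.Name);
        return string.Join(",", names.ToArray());
    }

    public static void Main()
    {
        List<Item> list = MakeList();
        SortBy(list, true);
        Console.WriteLine(Names(list));   // c,a,b
        SortBy(list, false);
        Console.WriteLine(Names(list));   // a,b,c
    }
}
```

Because the variable already has the type Sort() wants, no "new Comparison<Item>(...)" wrapper is needed at the call site.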

I'll stop there. This is starting to be too much fun. The power of the delegate and callback is huge. Keep playing with these and you'll get the hang of it.

Monday, September 22, 2008

A Simple Task Queue

I’ve spent the last 4 posts talking about Regular Expressions and some difficult patterns. But, this is a C# blog, so I really want to be talking about C#. Today, I hope to provide you with a nice little start on a multi-threading "Task Queue" application. A Task Queue will place task requests in a queue that will be serviced asynchronously and in the order received.

Rather than keep you in suspense, here’s the code up front. If you'd like an explanation, I've attempted that below. (Update 10/6/2008: Sorry folks about the bug below. The Enqueue method must set the Busy field to true when queuing the first task in order to avoid the thread race. It's now been fixed.)


using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Threading;

namespace TaskQueuePOC
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        int counter = 0;
        private void button1_Click(object sender, EventArgs e)
        {
            // Each click queues one more task; tasks run one at a time, in order.
            new MyTaskQueue(counter++).Enqueue();
        }
    }

    public class MyTaskQueue : TaskQueue
    {
        public MyTaskQueue(object UserData)
            : base(UserData)
        {
        }
        protected override void Task()
        {
            Thread.Sleep(2000);
            Console.WriteLine(UserData.ToString());
        }
    }

    public abstract class TaskQueue
    {
        public object UserData { get; private set; }
        public TaskDelegate TaskDlgt { get; private set; }
        protected abstract void Task();

        public void Enqueue()
        {
            TaskDlgt = new TaskDelegate(Task);
            lock(lockObject)
            {
                if(Busy)
                    _q.Enqueue(TaskDlgt);
                else {
                    Busy = true;
                    TaskDlgt.BeginInvoke(new AsyncCallback(this.TaskCallback), TaskDlgt);
                }
            }
        }

        // Shared by all tasks: the pending delegates, the busy flag and the lock.
        private static Queue<TaskDelegate> _q = new Queue<TaskDelegate>();
        private static bool Busy = false;
        private static object lockObject = new object();

        public TaskQueue(object Data)
        {
            UserData = Data;
        }

        public delegate void TaskDelegate();

        private void TaskCallback(IAsyncResult ar)
        {
            // AsyncState is the delegate that just finished. It may belong to a
            // different TaskQueue instance, because NextTask() starts queued
            // tasks with this instance's callback. Every BeginInvoke needs a
            // matching EndInvoke, so call it unconditionally.
            TaskDelegate dlgt = (TaskDelegate)ar.AsyncState;
            dlgt.EndInvoke(ar);
            NextTask();
        }
        private void NextTask()
        {
            TaskDelegate dlgt;
            lock(lockObject)
            {
                if(_q.Count > 0)
                {
                    dlgt = _q.Dequeue();
                    dlgt.BeginInvoke(TaskCallback,dlgt);
                }
                else
                    Busy = false;
            }
        }
    }
}


I've titled the article "A Simple Task Queue", but simple is a little misleading. That's because it's difficult for the beginner to tell by looking where the threads are or even how it works. The key to understanding this implementation is to understand the "delegate BeginInvoke" call. I've covered that in other articles on threading; in particular, I refer you to "Threading with .NET ThreadPool Part 4" for a deeper discussion.

Overall, the workhorse of this application is the abstract class TaskQueue. The class implements all that's needed to queue tasks and execute them in order. All the user need supply is a derived class that overrides "Task()" and a constructor that passes in some UserData. The application would instantiate a new derived TaskQueue object with the data needed for the task and then call the Enqueue() method.

The TaskQueue works by creating 3 private static control members. There is the queue which holds delegates to run. There's also a Busy member to say when there is an active thread running. Finally, there is a lockObject used internally to synchronize thread access to the queue and the Busy indicator. This is needed because as one thread completes and tries to update the Busy indicator or take a new delegate off of the queue for execution, the "producer" thread (our UI in this case) may be trying to enqueue another delegate. Since two threads could be accessing these variables simultaneously, we synchronize them with a lock statement (a Monitor, not a Mutex) on lockObject.

The multi-threading comes into play when a delegate's BeginInvoke(...) method is called. This method will allocate a thread from the ThreadPool and execute the Task in that thread. BeginInvoke is provided a TaskCallback() function and some state information. In this case, the state information is a reference to the delegate.

The Callback function is responsible for checking the queue and launching the next Task or setting Busy to false.

The magic in all of this is that the delegate is a reference to a specific instance of a TaskQueue object's Task() method. That way, the delegate's Task() method has access to local information about that specific task. So, each delegate will operate with its own version of UserData. Notice, this is also the reason that NextTask() dequeues a delegate and places the reference in a local variable rather than the TaskDlgt member. TaskDlgt is a reference to one's self, while the delegate taken from the queue is a different delegate. So, each task's completion callback is responsible for starting the next task (if any).

Also, notice that the Tasks are highly encapsulated. Once the task is on the queue, all that is available is the delegate. The rest of the task is somewhat hidden, though reflection can be used to make tweaks to the tasks and task data if necessary.

There are many improvements that can be made. Most notably, one could add a way to stop the queue and cancel remaining tasks. One could add controls to their derived class and overridden Task() member to allow individual task control or gross control over the entire queue.

There you have it, a "simple" Task Queue for your amusement and edification. Hopefully, you find it as useful as I have. Also, I have written several other articles on threads and threading. Please feel free to poke around my archives, you may find these articles useful as well.

10/6/2008: Recently, a kind reader pointed out a bug in my code (since fixed) where "Busy" was not being set to "true" anywhere. The obvious behavior was that all tasks ran immediately; no queuing was done at all.

Saturday, September 20, 2008

Regular Expression Double Negatives

In my last article titled "Regular Expression Alternations" we discussed Alternation as one means of performing the AND operation in a regular expression pattern.

(?(expression)yes|no)

But, what if you don't have this pattern in your version of RegEx? Some of you may be reading to gain knowledge that you can apply in other programming languages besides C# or the .NET environment. The pattern I described in the last article does not work in all regular expression dialects. In particular, it does not work in JScript Regular Expression Syntax. And, when you use tools like ASP.NET’s RegularExpressionValidator, the client-side check implicitly uses JScript. So then, what do you do?

To restate the original requirement, we want to validate a string that has at least one digit, at least one uppercase letter, and at least 6 characters. The problem statement is clearly an AND problem, so it begs to be written with the Alternation pattern. But in situations where we can’t use the Alternation, we can opt for my second approach. We can use the Negative Look-Ahead pattern.

(?!expression)

The negative look-ahead pattern is “non-consuming” and matches strings (or parts of a string) that are not followed by the “expression”. You may wish to brush up on negative look-ahead with my article “Regular Expressions in C# - Negative Look-Ahead”. With negative look-ahead and using the OR pattern, we can use double-negative logic to express the original problem. This approach works based upon the logic axiom known as De Morgan's law…

( A AND B ) == NOT ( (NOT A) OR (NOT B) )

Using this rule, the problem can be stated differently. We can state our problem in terms of ORs instead of ANDs. In other words...


If it is not the case that our string is devoid of digits or devoid of uppercase letters or short of 6 characters, then it is a valid string.


Notice I chose terms that suggest negative tests. Take a moment and think about it.

IF ( NOT ( NOT(Has Digit) OR NOT(Has Uppercase) OR NOT(6 or more chars) ) )

This has the same logic result as…

IF ( (Has Digit) AND (Has Uppercase) AND (6 or more chars) )

If you can’t get your mind around this basic programming concept, then re-writing your pattern requirement in terms of ORs will be difficult. So, experiment on paper and with generic logic variables to prove to yourself that your re-written pattern logic means the same thing as the original.

Now let’s look at the actual Regular Expression pattern. The component patterns from which we build look like this...

(^[^\d]*$) string is devoid of digits
(^[^A-Z]*$) string is devoid of uppercase letters
(^.{0,5}$) string has 5 or fewer characters

String them together to get the inner test ...

(^[^\d]*$)|(^[^A-Z]*$)|(^.{0,5}$)

Sometimes you can pause here and simply test for your string to match. If you supply an invalid string, it should MATCH. If you provide a valid string, it should NOT match. So, at this point, we have the opposite of what we really want. (A validator may not allow you to test this way; you would have to write a C# program to use as a “test jig”.) So, we now negate the inner pattern with the negative look-ahead syntax. The look-ahead must itself be anchored with a leading ^ so that it is only tested at the start of the string. Without that anchor, the engine retries the look-ahead at position 1, where the inner ^ can never match, and the negated test succeeds for almost any string...

^(?!(^[^\d]*$)|(^[^A-Z]*$)|(^.{0,5}$))

This works for the IsMatch() approach, but it only matches an empty string at position 0, so it won't quite work for the Match() approach. We have to consume the string after the look-ahead...

^(?!(^[^\d]*$)|(^[^A-Z]*$)|(^.{0,5}$)).*$

This pattern is much harder to decipher than the AND pattern using Alternation. But now, armed with the knowledge from these last two articles, you should be able to discern the meaning of such patterns. We should be able to plug this pattern into all of the algorithms we covered in the previous article and have it validate strings according to the requirement.
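Here is a small “test jig” sketch for the double-negative approach. It assumes the form of the pattern in which the negative look-ahead sits directly after a leading ^, so the look-ahead is tested only at position 0 and the trailing .*$ consumes the string. DoubleNegativeCheck and IsValid are names made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DoubleNegativeCheck
{
    // NOT( no digits OR no uppercase OR 5 or fewer characters ),
    // tested once at the start of the string, then consume the string.
    const string Pattern = @"^(?!(^[^\d]*$)|(^[^A-Z]*$)|(^.{0,5}$)).*$";

    public static bool IsValid(string candidate)
    {
        return Regex.IsMatch(candidate, Pattern);
    }

    public static void Main()
    {
        string[] samples =
        {
            "Passw0rd",   // digit, uppercase, 8 characters
            "password1",  // devoid of uppercase letters
            "Password",   // devoid of digits
            "Pw0rd"       // short of 6 characters
        };
        foreach (string s in samples)
            Console.WriteLine("{0,-10} {1}", s, IsValid(s));
    }
}
```

Only "Passw0rd" reports True. Each of the other three trips exactly one of the inner "devoid of" tests, which makes the look-ahead fail and with it the whole match.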

That’s all for now, I hope you enjoyed the discussion and find these examples useful.

Sunday, September 14, 2008

Regular Expression Alternations

I promised in my last article that I'd have more to say about regular expression. So here you have it. Today, we look at another tough problem for Regular Expressions. Let’s consider writing a pattern to validate that a string contains at least one digit, at least one uppercase letter, and at least 6 characters. One would think that this would be easy. But here we’re faced with an AND situation and the regular expression syntax doesn't provide an AND operator. Take, for example, any validation problem that has the form…

A and B and C

Without an AND operation in Regex, you are almost forced to go outside of the pattern and implement the test in multiple patterns and multiple passes of your validator. That may be the best approach, or it may be impossible if you are working with a black-box validator and must provide a single Regex pattern.

I've found two approaches that can be applied to solve such problems. We'll discuss one of those approaches today. The approach is to use the “Alternation” pattern. MSDN documentation lists 3 Alternations: the simple OR (the vertical bar), the "expression" alternation and the "name" alternation. Our problem requires the "expression" version.

(?(expression)yes|no)

This pattern is used like an IF-THEN-ELSE programming pattern. In the IF-THEN-ELSE pattern, the THEN portion can function like an AND. We could rewrite our test above as…

IF A THEN IF B THEN IF C THEN MATCH

But the alternation pattern isn’t as flexible as a programming language like C# or VB.Net. We can’t easily drop the ELSE as we did above. So, our pseudo code above would have to take the form…

IF A THEN
    IF B THEN
        IF C THEN
            MATCH
        ELSE
            FAIL MATCH
    ELSE
        FAIL MATCH
ELSE
    FAIL MATCH

The pseudo regex pattern using our original requirements looks something like this...

(?(does string have an uppercase letter)
    (?(does string have a digit)
        (?(does string have at least 6 characters)
            (match the whole string)
          | (fail the match))
      | (fail the match))
  | (fail the match))

This should look very similar to the IF-THEN-ELSE statement above. All we have left now is to write the individual patterns. We need a pattern for each test, a pattern to match the whole string and a pattern that will never match any string.

(.*[A-Z].*) matches if there's at least one upper case letter in the string
(.*[0-9].*) matches if there's at least one digit in the string
(.{6,}) matches if there are 6 or more characters in the string
(.*) matches the entire string
([^\W\w]) won't match anything

Notice how each test is written in such a way that the whole string is selected. That way all tests are operating on the exact same input. Now piece it together and it looks like this...

(?(.*[A-Z].*)(?(.*[0-9].*)(?(.{6,})(.*)|([^\W\w]))|([^\W\w]))|([^\W\w]))

It's not very pretty, but it's conceptually straightforward. It follows the pseudo-regex precisely. See if you can format it nicely like the pseudo-pattern above.

The final step is to verify our pattern with some code. We’ll use C# for that. Just create a new Windows Forms Application project in C#. At the top of the form's code file, add a using directive for the Regular Expression namespace.

using System.Text.RegularExpressions;

Next drop a Label and a Textbox on the form. Clear the text in label1. Then double click the textbox and insert into the textBox1_TextChanged handler the following code…

string pattern = @"(?(.*[A-Z].*)(?(.*[0-9].*)(?(.{6,})(.*)|([^\W\w]))|([^\W\w]))|([^\W\w]))";
if(Regex.IsMatch(textBox1.Text,pattern))
{
label1.Text = "Valid";
label1.ForeColor = Color.Black;
}
else
{
label1.Text = "Invalid Entry";
label1.ForeColor = Color.Red;
}


The program will display “Valid” or “Invalid Entry” depending upon the input in the text box. This is a fairly simple password validator, but now you have the tools to expand upon it if you like. There is one caveat though. This pattern will not work in the ASP.Net RegularExpressionValidator component. In my next article on Regular Expressions, I will explain why. I will also explain my second approach mentioned above which will work in the RegularExpressionValidator situation.

Friday, September 12, 2008

Regular Expressions in C# - Negative Look-ahead

In my last article on Regular Expressions, we looked at a couple of simple expressions and 3 algorithms to use the expressions for “validating” strings. The main point of the article is that Regular Expression behavior will be confusing if not considered in the context of the algorithm being used. Please take a look at Regular Expressions. What’s that got to do with C#? to get the background for this article.

In this article, we will discuss one of the more difficult Regular Expression problems. How do you use “Negative Look-ahead”? To use Regular Expression “Negative Look-Ahead” or “Negative Look-Behind”, you have to change the way you think about pattern matching. First, the negative look-ahead takes the syntax…

“(?!pattern)”

In the words of the MSDN documentation…

(Zero-width negative lookahead assertion.) Continues match only if the subexpression does not match at this position on the right. For example, \b(?!un)\w+\b matches words that do not begin with un.


This expression will “find” parts of the supplied string that are not followed by the “pattern”. One might erroneously think that this pattern will refuse to match strings that have the “pattern” in them. It does not. In fact, the following example “finds” some strange results.

string pattern = @"(?!invalid)"; //negative lookahead
string test = "invalid";
Regex rx = new Regex(pattern);
Console.WriteLine("Match:\t\t\t{0}\t({1})", rx.Match(test).Success.ToString(), test);
Match mx = rx.Match(test);
while (mx.Success)
{
foreach (Group g in mx.Groups)
{
Console.WriteLine("\t\t\t\t({0}) ({1}): {2}", mx.Value, g.Value, g.Index);
}
mx = mx.NextMatch();
}

The result is that there are actually 7 matches in the test string. The confusing matter is that each of the matches is an empty string, and the index for each match increments, from 1 to the number of characters in the test string. To understand this behavior, let’s think about the pattern differently. Let’s break it apart into some simple components. You could consider the pattern to be equivalent to…

“” + “(?!invalid)” + “”

This is a pattern to match an empty string, followed by a pattern to reject the string “invalid”, followed by a pattern to match an empty string. In short, without the negative look-ahead, this is a pattern to match an empty string. If you were to use just the simple empty string pattern on the test string, there would be 8 matches with indices for each match incrementing from 0 to the number of characters in the test string. There are 7 matches with the negative look-ahead and 8 matches without. The indices start at 1 with the negative look-ahead, and they start at 0 without. So, the negative look-ahead is causing one of the potential matches to fail. It’s rejecting the empty string match that occurs just prior to the first character in the test string. I.e. it is rejecting the only empty string match that is followed by “invalid” and matching all the others.
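The 8-versus-7 comparison is easy to reproduce. Here is a small sketch that counts the matches for the empty pattern and for the negative look-ahead; the class and method names are my own and are not part of the earlier example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyMatchCount
{
    // Returns the start index of every match of pattern in input.
    public static int[] MatchIndexes(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        int[] indexes = new int[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            indexes[i] = matches[i].Index;
        return indexes;
    }

    public static void Main()
    {
        string test = "invalid";
        int[] plain = MatchIndexes(test, "");               // empty-string pattern
        int[] guarded = MatchIndexes(test, "(?!invalid)");  // same, minus one position

        Console.WriteLine("empty pattern:   {0} matches, first at {1}", plain.Length, plain[0]);
        Console.WriteLine("with look-ahead: {0} matches, first at {1}", guarded.Length, guarded[0]);
    }
}
```

The only position missing from the second list is index 0, the one empty match that is followed by “invalid”.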

Oh my head is spinning! Why did I eliminate the negative look-ahead above? I did so to understand its effect on the entire pattern matching process. And I learned that the negative look-ahead is actually finding a pattern to eliminate. In other words, it’s doing its job. It is eliminating from the set of matches the one match that is followed by the string “invalid”. But, because there are many other “empty-string” matches, IsMatch() returns “true” and Match() finds 7 matches. I also eliminated the negative look-ahead because “look-around” patterns do not “consume” characters. They will never show up in the match result, so by temporarily removing the non-consuming pattern, I can see the real pattern that the look-around pattern constrains.

The Regular Expression philosophy is to find things, not to eliminate things. Since Regex works so hard at finding matches, it becomes difficult to write patterns whose job is to exclude strings. One could just write the pattern to find the strings that are unwanted and negate the match in program logic. Negative matching syntax is available, but Regex treats such syntax as a means of reducing the number of matches while trying to find ANYTHING that would otherwise match.

So how do I make this work? Consider the requirement to match only those strings that do not have the sequence “invalid” in them. First, you must define a pattern to match any otherwise valid string. You must define the pattern in such a way that it will match the string using the most restrictive form of the 3 programming approaches described in my last article. In particular, your pattern must pass the following test…

Match mx = Regex.Match(test, pattern);
if(mx.Success && (mx.Groups[0].Value == test))

The test above makes sure that not only does the test string “have” a match but the test string “is” the match. The empty string pattern will find many matches, but the whole match will not equal the test string for any test string other than an empty string. You have to watch out for the asterisk (*) and plus (+) which will consume as much as they can, or as little as necessary to achieve a match. Such patterns will work in the logic above, but they can change behavior as you start adding look-around patterns. In the case of our requirement, the all encompassing pattern would be “^.*”. Leave off the “$” because we will be using look-ahead. When looking ahead, we do not want to anchor the end of the string unless absolutely necessary.

Next, define a pattern to find whole strings that you’d like to exclude. “.*invalid.*” works. This pattern matches any string containing the sequence “invalid”. Next, wrap this pattern with the negative look-ahead syntax “(?!.*invalid.*)”. And finally, insert it into the first pattern after the first anchor (^ in our case). Our resulting pattern would be “^(?!.*invalid.*).*”. Use the following test jig to prove this pattern to yourself.

string pattern = @"^(?!.*invalid.*).*";//negative look-ahead
string[] tests = {
"invalid",
"",
"this is also invalid",
"but this is okay",
"but this invalid string is not",
};

Regex rx = new Regex(pattern);
foreach (string test in tests)
{
Console.WriteLine("Match:\t\t\t{0}\t({1})", rx.Match(test).Success.ToString(), test);
Match mx = rx.Match(test);
while (mx.Success)
{
foreach (Group g in mx.Groups)
{
Console.WriteLine("\t\t\t\t({0}) ({1}): {2}", mx.Value, g.Value, g.Index);
}
mx = mx.NextMatch();
}
}
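For comparison, here is a sketch that puts the look-ahead pattern next to the simpler alternative mentioned earlier: find the unwanted text and negate the result in program logic. The class and method names are assumptions of mine, and the test strings are single-line, so the dot's refusal to match a newline does not come into play.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExcludeInvalid
{
    // Approach 1: the negative look-ahead pattern, checked with the
    // "the test string is the match" logic shown above.
    public static bool ByLookahead(string test)
    {
        Match mx = Regex.Match(test, @"^(?!.*invalid.*).*");
        return mx.Success && mx.Value == test;
    }

    // Approach 2: find the unwanted sequence and negate the result in code.
    public static bool ByNegation(string test)
    {
        return !Regex.IsMatch(test, "invalid");
    }

    public static void Main()
    {
        string[] tests = {
            "invalid",
            "",
            "this is also invalid",
            "but this is okay",
            "but this invalid string is not",
        };
        foreach (string test in tests)
            Console.WriteLine("{0}\t{1}\t({2})", ByLookahead(test), ByNegation(test), test);
    }
}
```

Both columns should agree for every test string. The look-ahead form earns its keep when you cannot change the surrounding code and can only supply a pattern.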

I’ll stop here for today and let you digest what’s going on. Certainly, the pattern can be optimized and some characters reduced. Experiment with changes and observe the effects. I am not through with the subject of Regular Expressions and Negative type matching. Check back later as I hope to post on the subject again soon.

Friday, September 5, 2008

Regular Expressions? What's That Got To Do With C#?

...Only that I often need to know Regular Expressions for my C# work. However, the online help and resources seem to come up a little short. So, today I diverge a little and discuss this cryptic yet valuable ancillary topic to try to help you through your next Regex dilemma.

I'm not going to waste time and internet bandwidth explaining what a Regular Expression is; there are plenty of sites for that. But, I will give special thanks here to OmegaMan who compiled and posted the following on the MSDN Regex Forum...

OmegaMan's .Net Regex Resources Reference

I refer to it often.

So let's dive right in. Whenever you are considering using regular expressions, you need to determine what kind of pattern matching problem you are trying to solve.

1) Do I want a regular expression to check a string for validity?
2) Do I want a regular expression to find certain things in my string?
3) Do I want a regular expression so that I can replace patterns in my string?

There can be some overlap in #1 and #2 since validity may depend upon the string containing "certain things". Certainly, if you want to replace something in #3, you need to find it in the string with #2. But, many of the problems people run into with regular expressions can be traced to not having identified the problem properly.

Take the problem, "make sure my string contains only letters and digits". Sounds simple enough, and one may write the pattern ...


"[\da-zA-Z]*"
\d = digit
a-z = lowercase letter
A-Z = uppercase letter
[]* = zero or more of the characters in the brackets


This pattern might "find" letters and digits in a string, but it doesn't say that there are ONLY letters and digits in that string. Regex.IsMatch(...) tests a string with a pattern and returns true if the string contains a match. Regex.Match(...) tests a string with a pattern and returns a Match object indicating if the string contains matches and details, if any, about each match.

So if you are doing a validity check with the pattern above, your results will depend on which tools you use and how you use them. Given that pattern, both IsMatch() and Match() will find matches in a string, even if it contains undesirable characters. In fact, because of the asterisk (*), the string doesn't have to contain any of the pattern characters for there to be "a match" (it matches the empty string). These functions, given this pattern, are simply indicating whether or not any part of the string matches the pattern. Here's some code to demonstrate...

Example 1:

string pattern = @"[\da-zA-Z]*"; //use the @ to tell c# to leave \ alone
string[] tests = {
"containsOnlyLettersAnd01234",
"contains letters And 01234, but also spaces",
"!@#$%", // contains none of the desired characters
"", // a completely empty string
};
foreach(string test1 in tests)
Console.WriteLine("IsMatch:\t{0}\t({1})",Regex.IsMatch(test1,pattern).ToString(),test1);
foreach(string test2 in tests)
Console.WriteLine("Match:\t{0}\t({1})",Regex.Match(test2,pattern).Success.ToString(), test2);

...outputting...

IsMatch: True (containsOnlyLettersAnd01234)
IsMatch: True (contains letters And 01234, but also spaces)
IsMatch: True (!@#$%)
IsMatch: True ()
Match: True (containsOnlyLettersAnd01234)
Match: True (contains letters And 01234, but also spaces)
Match: True (!@#$%)
Match: True ()


Obviously, several of these strings are not valid by our requirements. So, what went wrong, and how would you "validate" the string? In order to do the desired validity check, one must consider both the pattern and the Regex method that will be used. For instance, the pattern as written will validate if you write the supporting code to accommodate it. For example...

Example 2:

Regex rx = new Regex(pattern);
foreach(string test1 in tests)
Console.WriteLine("Modified Match:\t{0}\t({1})",
(rx.Match(test1).Success && (rx.Match(test1).Value == test1)).ToString(),
test1);

...outputting...

Modified Match: True (containsOnlyLettersAnd01234)
Modified Match: False (contains letters And 01234, but also spaces)
Modified Match: False (!@#$%)
Modified Match: True ()


Now that looks much better. Our pattern match now "works" except for the empty string. The requirement might be considered vague and allow for such a match. Many applications will have an input text box that starts out blank. When the user enters characters, the text is then validated. These apps usually explicitly test for empty text. Case in point, the ASP.NET RegularExpressionValidator states that it will not validate the empty string, i.e., empty strings will PASS. It is up to the programmer to require some input. By the way, RegularExpressionValidator does pattern matching validation on both the client and the server. On the client, it uses JScript Regular Expression syntax, which has a smaller feature set and syntax than the server uses. It also uses the same program construction as the second example.


if(match != null && (match[0] == value)) // valid


If you use other tools, you must know how the validity test is done. With the construction in the last example, patterns that have look-ahead or look-behind will often fail. They will look perfectly valid, and they will match using IsMatch(), but they require that you add some additional pattern to consume those characters that look-around does not consume. We'll get into that in a future article.
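To make that concrete, here is a small sketch of a look-around pattern that passes IsMatch() but fails the "match equals the whole string" test until a consuming pattern is added. The negative look-ahead syntax (?!...), the sample string and the helper name IsWholeMatch are my choices for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadValidation
{
    // The "whole string is the match" test from Example 2.
    public static bool IsWholeMatch(string test, string pattern)
    {
        Match mx = Regex.Match(test, pattern);
        return mx.Success && mx.Value == test;
    }

    public static void Main()
    {
        string test = "but this is okay";
        string lookOnly = @"^(?!.*invalid)";     // asserts only, consumes nothing
        string consuming = @"^(?!.*invalid).*";  // same assertion, then consumes the string

        Console.WriteLine("IsMatch, look-around only:     {0}", Regex.IsMatch(test, lookOnly));
        Console.WriteLine("whole match, look-around only: {0}", IsWholeMatch(test, lookOnly));
        Console.WriteLine("whole match, with consumer:    {0}", IsWholeMatch(test, consuming));
    }
}
```

The look-around-only pattern succeeds, but its match is the empty string at index 0, so it can never equal a non-empty input.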

Now the pattern could have been written differently. Using the next pattern example, you can use any of the three methods to validate that the string contains only letters and digits. This time, I also require that the input string not be blank.

"^[\da-zA-Z]+$"
^ = match the beginning of string/line, zero width pattern
$ = match the end of the string/line, zero width pattern
[]+ = 1 or more of the characters in the brackets

Example 3:

string pattern2 = @"^[\da-zA-Z]+$";
Regex rx = new Regex(pattern2);
foreach (string test in tests)
{
Console.WriteLine("Modified Match:\t{0}\t({1})",
(rx.Match(test).Success && (rx.Match(test).Value == test)).ToString(),
test);
Console.WriteLine("IsMatch:\t\t{0}\t({1})", rx.IsMatch(test).ToString(), test);
Console.WriteLine("Match:\t\t\t{0}\t({1})", rx.Match(test).Success.ToString(), test);
}

...outputting...

Modified Match: True (containsOnlyLettersAnd01234)
IsMatch: True (containsOnlyLettersAnd01234)
Match: True (containsOnlyLettersAnd01234)
Modified Match: False (contains letters And 01234, but also spaces)
IsMatch: False (contains letters And 01234, but also spaces)
Match: False (contains letters And 01234, but also spaces)
Modified Match: False (!@#$%)
IsMatch: False (!@#$%)
Match: False (!@#$%)
Modified Match: False ()
IsMatch: False ()
Match: False ()


In other words, the pattern and the approach must work together. When you have control of both, then solving the problem is easier. But as you can see in the last example, all 3 approaches can validate according to our requirements by tweaking the pattern. The trick is often to find the tweak that works in all cases. When you can't change the underlying programming, like in the case of RegularExpressionValidator, you have to be able to write your pattern to "match" the underlying approach.

Now these are simple examples and I would have liked to get into some real meaty Regex expressions, but I've run out of time, and this posting is late. I'll be back though with more on Regular Expressions in the next article. For now, I hope this is of some use to you.

Monday, September 1, 2008

Enumerations and Strings

It has been a while since my last post. Things get crazy at times. But, here is another bit of sample code for you to muse over. We will look at enumerations and strings.

In C#, an enumeration is NOT a string, nor can you define it to be one. An enumeration can be defined as any of a number of integer types. You can leave it as its default type...


public enum unspecifiedTypeEnum
{
one = 1, two, three,
}


In which case you get named 32-bit values. Or you can specify the type of the underlying value as byte, sbyte, ushort, short, uint, int, ulong or long...


public enum byteTypeEnum : byte
{
one = 1, two, three,
}
// or...
public enum ushortTypeEnum : ushort
{
one = 1, two, three,
}
// etc...


... But, you can't declare it as a string.
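If you ever want to confirm which integer type sits underneath an enumeration, Enum.GetUnderlyingType() will tell you. A quick sketch using the enumeration names from above (the UnderlyingTypes class is just a wrapper for the demonstration):

```csharp
using System;

public enum unspecifiedTypeEnum
{
    one = 1, two, three,
}

public enum byteTypeEnum : byte
{
    one = 1, two, three,
}

public static class UnderlyingTypes
{
    public static void Main()
    {
        // With no type specified, the underlying type is int (System.Int32).
        Console.WriteLine(Enum.GetUnderlyingType(typeof(unspecifiedTypeEnum)));

        // With ": byte", the underlying type is System.Byte.
        Console.WriteLine(Enum.GetUnderlyingType(typeof(byteTypeEnum)));

        // Either way, the named values convert to their integers with a cast.
        Console.WriteLine((int)byteTypeEnum.three);
    }
}
```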

That doesn't mean you can't use strings at all. When I want to save data to a file for future reference, I like to store my enumerations in the file in human-readable form. It would be a shame to use enumerations to write clear code only to store them as very cryptic values in my maintenance and configuration files. Getting the string representation of an enumeration value is easy. We just use the overridden ToString() method. For enumerations, this returns the name of the value as a string...


public enum Animals { cat, mouse, bird, dog, }
//....

Animals myAnimal = Animals.cat;
Console.WriteLine("CurrentAnimal={0}",myAnimal.ToString());


When I load a configuration value back into my program, I want to work with the value as the original enumeration type. That's a little trickier, but with the help of "reflection", it can be done. This little routine shows how...


using System.Reflection;
// ...

Animals FromString(string animal)
{
Type t = typeof(Animals);
FieldInfo[] fi = t.GetFields();

try
{
foreach (FieldInfo f in fi)
{
if (f.Name.Equals(animal))
{
return (Animals)f.GetRawConstantValue();
}
}
}
catch
{
}
throw new Exception("Not an Animal");
}

Actually, it's not so tricky. A reader (thank you, Paul) pointed me to the "static" Enum functions which do much of the hard work for you. The function above can be more simply written as...

Animals FromString(string animal)
{
try
{
return (Animals)Enum.Parse(typeof(Animals), animal);
}
catch (ArgumentException ex)
{
throw new Exception("(" + animal + ") is not in the Animals enumeration.", ex);
}
}


This gives me an enumeration value for my string, assuming the string matches one of the enumeration names. Enhancements could be made to this code to store and retrieve the fully qualified name. I'll leave that as an exercise for the reader.

You may be wondering about the try and catch above. The GetRawConstantValue() returns a value that we cast to our enumeration type. If that value is invalid for our enumeration, then the cast throws an exception. Also, understand that there are other values in the FieldInfo array besides just enumeration fields. Enumerations have hidden fields that are accessible through reflection. Under contrived circumstances, the caller may pass in the name of one of these hidden fields. The "animal" name will match a field in the FieldInfo array, but the cast will throw an exception as we want it to.
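You can see the hidden field for yourself by listing what reflection returns for the enumeration. This sketch also shows one possible alternative to the try and catch: the FieldInfo.IsLiteral property is true for the named constants and false for the hidden field. The EnumFields class and its LiteralNames helper are hypothetical names of mine.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public enum Animals { cat, mouse, bird, dog, }

public static class EnumFields
{
    // Hypothetical helper: names of the fields that are compile-time constants.
    public static List<string> LiteralNames(Type enumType)
    {
        List<string> names = new List<string>();
        foreach (FieldInfo f in enumType.GetFields())
        {
            if (f.IsLiteral)
                names.Add(f.Name);
        }
        return names;
    }

    public static void Main()
    {
        // Lists the four animals plus the hidden "value__" field.
        foreach (FieldInfo f in typeof(Animals).GetFields())
            Console.WriteLine("{0,-8} IsLiteral={1}", f.Name, f.IsLiteral);
    }
}
```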

Now, with the ability to convert between string and enumeration, you can write your code so that everything internal is performed on the enumeration type, while persistence and other external representations can be strings. But what if we want a list of all possible values? We might let the user choose from a list. How might we get that? Here's a way to create a string list for the enumeration...


// also requires
using System.Collections.Generic;
using System.Reflection;
// ...

IEnumerable<string> AnimalsList()
{
Type t = typeof(Animals);
foreach (FieldInfo f in t.GetFields())
{
try
{
if (f.GetRawConstantValue() is Animals)
;
}
catch
{
continue;
}
yield return f.Name;
}
}

...or simply ...

IEnumerable<string> AnimalsList()
{
return Enum.GetNames(typeof(Animals)).ToList();
}


Now in your Form constructor you can write the following to show your enumerations as strings.


listBox1.Items.AddRange(AnimalsList().ToArray());


Someone will invariably want to associate a different string to their enumeration than the name. Maybe they want to obscure the meaning. More likely, the programmer wants to change the enumeration after there are already configuration files in production using older enumeration names. Changing the enumeration could break existing configuration files. But you have to deal with that regardless of how you store the data. I won't go into code examples, but a Dictionary object could be used internally. Another possibility is the string name and the enumeration name may differ by case only, but it should be a trivial modification to deal with that condition, so I leave that also as an exercise for the reader.
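For readers who would rather not do the exercise, here is one way it might look. It is a sketch only: the alias names ("kitty", "hound") are invented stand-ins for whatever old names your configuration files contain, and the AnimalAliases class is hypothetical. The case-insensitive part relies on the Enum.Parse() overload that takes an ignoreCase argument.

```csharp
using System;
using System.Collections.Generic;

public enum Animals { cat, mouse, bird, dog, }

public static class AnimalAliases
{
    // Hypothetical legacy names that older configuration files might contain.
    private static readonly Dictionary<string, Animals> Aliases =
        new Dictionary<string, Animals>(StringComparer.OrdinalIgnoreCase)
        {
            { "kitty", Animals.cat },
            { "hound", Animals.dog },
        };

    public static Animals FromString(string text)
    {
        // Check the alias table first, then fall back to the enumeration names.
        Animals aliased;
        if (Aliases.TryGetValue(text, out aliased))
            return aliased;

        // The third argument asks Enum.Parse to ignore case.
        return (Animals)Enum.Parse(typeof(Animals), text, true);
    }

    public static void Main()
    {
        Console.WriteLine(FromString("Kitty")); // found through the alias table
        Console.WriteLine(FromString("DOG"));   // found through case-insensitive parsing
    }
}
```

A string that is neither an alias nor an enumeration name still ends up as an ArgumentException from Enum.Parse(), the same as before.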

Finally, these enumeration functions are perfect for making Generic. They could be used on several enumerations in our code, and we wouldn't want to re-implement it each time. So, I will leave you with the following generic example of Enumerations and Strings.


using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Reflection;

namespace EnumExample
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
listBox1.Items.AddRange(EnumList<Animals>().ToArray());
}

public enum Animals { dog = 1, cat, mouse, bird, }

private void button1_Click(object sender, EventArgs e)
{
Animals myAnimal = Animals.bird;
Console.WriteLine("CurrentAnimal={0}", myAnimal.ToString());

myAnimal = FromString<Animals>("cat");
try {
myAnimal = FromString<Animals>("value__");
} catch (Exception ex) {
Console.WriteLine("error: {0}", ex.Message);
}
}

T FromString<T>(string animal) where T : struct
{
Type t = typeof(T);
FieldInfo[] fi = t.GetFields();

try
{
foreach (FieldInfo f in fi)
{
if (f.Name.Equals(animal))
{
return (T)f.GetRawConstantValue();
}
}
}
catch
{
}
throw new Exception("Not a Type " + typeof(T).Name);
}

IEnumerable<string> EnumList<T>()
{
Type t = typeof(T);
foreach (FieldInfo f in t.GetFields())
{
try
{
if (f.GetRawConstantValue() is T)
;
}
catch
{
continue;
}
yield return f.Name;
}
}
}
}
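The same two generic functions can also be built on the static Enum functions instead of reflection. This is a sketch under my own naming (the EnumStrings class), not a drop-in replacement for the form above. It uses Enum.IsDefined() to check the name before parsing, so that names like "value__" and numeric text are turned away.

```csharp
using System;
using System.Collections.Generic;

public enum Animals { dog = 1, cat, mouse, bird, }

public static class EnumStrings
{
    public static T FromString<T>(string name) where T : struct
    {
        // IsDefined with a string argument checks the enumeration names,
        // so "value__" and numeric text such as "7" are rejected here.
        if (!typeof(T).IsEnum || !Enum.IsDefined(typeof(T), name))
            throw new ArgumentException(
                "(" + name + ") is not in the " + typeof(T).Name + " enumeration.");
        return (T)Enum.Parse(typeof(T), name);
    }

    public static IEnumerable<string> EnumList<T>() where T : struct
    {
        // GetNames returns the names ordered by their underlying values.
        return Enum.GetNames(typeof(T));
    }

    public static void Main()
    {
        Console.WriteLine(FromString<Animals>("cat"));
        Console.WriteLine(string.Join(", ", EnumList<Animals>()));
        try
        {
            FromString<Animals>("value__");
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine("error: {0}", ex.Message);
        }
    }
}
```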

Friday, August 22, 2008

Threading with .NET ThreadPool Part 4

Suppose you've been given the task to write a function that copies the contents of one folder to another. So you set off on your merry way and come up with something like the following.


using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;

namespace ThreadPoolPart4
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}

private void button1_Click(object sender, EventArgs e)
{
// Get the Folder names, copy contents from one to the other
FolderBrowserDialog fb = new FolderBrowserDialog();
fb.ShowDialog();
string src = fb.SelectedPath;
fb.ShowDialog();
string dst = fb.SelectedPath;
// no error checking on the names, this is an example only

if (dst != src)
{
if (!Directory.Exists(dst))
Directory.CreateDirectory(dst);
RecurseCopyFolder(new DirectoryInfo(src), new DirectoryInfo(dst), MyCopyCallback);
}
}
private void RecurseCopyFolder(DirectoryInfo src, DirectoryInfo dst, CopyCallbackDelegate cb)
{
bool cancelled = false;
CopyArgs ca;
try
{
foreach (FileInfo fi in src.GetFiles())
{
string newfile = Path.Combine(dst.FullName, fi.Name);

ca = new CopyArgs() { IsDir = false, CurrentObject = newfile };
cb(ref cancelled, ca);
if (cancelled)
break;
if (!ca.Skip)
{
if (Directory.Exists(ca.CurrentObject))
Directory.Delete(ca.CurrentObject);
fi.CopyTo(ca.CurrentObject, true);
}

}
if (!cancelled)
foreach (DirectoryInfo subsrc in src.GetDirectories())
{
ca = new CopyArgs() { IsDir = true, CurrentObject = Path.Combine(dst.FullName, subsrc.Name) };
cb(ref cancelled, ca);

if (cancelled)
break;
if (ca.Skip)
continue;
DirectoryInfo subdst;
if (!Directory.Exists(ca.CurrentObject))
subdst = dst.CreateSubdirectory(subsrc.Name);
else
subdst = new DirectoryInfo(ca.CurrentObject);

RecurseCopyFolder(subsrc, subdst, cb);
}
}
catch (Exception e)
{
throw; // rethrow without resetting the stack trace
}
}

public class CopyArgs : EventArgs
{
public bool IsDir;
public bool Skip;
public string CurrentObject;
}

private delegate void CopyCallbackDelegate(ref bool Cancel, CopyArgs args);
private void MyCopyCallback(ref bool Cancel, CopyArgs args)
{
if (args.IsDir)
label1.Text = args.CurrentObject;
else
label2.Text = args.CurrentObject;
}

}
}


But when you test this on a somewhat larger folder, your application appears to hang. What gives? Well, it's very common for programmers to hang up their GUI by waiting, sleeping, reading the network, or in this case, doing intensive file I/O operations on the GUI thread. It's been said hundreds if not thousands of times on the forums, "don't Sleep() in the GUI thread". Because, if your GUI gets stuck in a Sleep() or anywhere else, it does not get a chance to pump its message loop. A GUI that does not service its message queue looks like it is dead. That's why sending your GUI down a recursive file copy excursion like we just did is a really bad idea.

That is also why threading has become a best practice for C# application development. And today, we take another look at the System.Threading.ThreadPool. In my previous articles on the ThreadPool ("Threading with .NET ThreadPool, Part 1, Part 2 and Part 3"), I spent the entire time talking about ThreadPool.QueueUserWorkItem(). But, there are other ways within your power to put the ThreadPool to work for you. Specifically, I am talking about the delegate.BeginInvoke() call. Let us see how it is used.

First off, to solve our dead looking GUI application (if yours doesn't look dead, you didn't try copying a big enough folder), we need to put the "heavy lifting" into a thread. All the hard work occurs in RecurseCopyFolder(), so we will put that call into a thread using BeginInvoke(). We do it by declaring a delegate for the function, then calling the delegate's BeginInvoke() like so...


private delegate void RecurseCopyFolderDelegate(DirectoryInfo src, DirectoryInfo dst, CopyCallbackDelegate cb);
private void RecurseCopyFolder(DirectoryInfo src, DirectoryInfo dst, CopyCallbackDelegate cb)
{ //...
}


You should be familiar with delegates, but if not, just note how the delegate call signature is exactly the same as the function it will delegate for. I name them similarly for the sake of self-documenting code.

Next, we create an instance of the delegate. I create a class scope variable to hold the delegate instance because I need it later. I also create an IAsyncResult variable because I will need that, too, when the thread completes.


RecurseCopyFolderDelegate dlgt;
IAsyncResult asyncResult;


Now, we instantiate the delegate and get a new thread started with BeginInvoke(). Notice that BeginInvoke() takes the same parameters as the original function (plus two, we'll come back to those later). The "delegate" is implemented by the compiler, so the compiler also does us the favor of letting us call the BeginInvoke() almost like we would call the delegate itself.


// RecurseCopyFolder(new DirectoryInfo(src), new DirectoryInfo(dst), MyCopyCallback);
dlgt = new RecurseCopyFolderDelegate(RecurseCopyFolder);
asyncResult = dlgt.BeginInvoke(new DirectoryInfo(src), new DirectoryInfo(dst), MyCopyCallback, null, null);


We are almost ready to run, but notice we are using a callback function to update our GUI. You never update the GUI directly from a different thread; it's not thread-safe. So we will add the code to make the callback check the form's InvokeRequired property and Invoke() the call if necessary. It is common to have one function check the InvokeRequired and call itself again with Invoke(), so I've modified MyCopyCallback() to do just that.


private delegate void CopyCallbackDelegate(ref bool Cancel, CopyArgs args);
private void MyCopyCallback(ref bool Cancel, CopyArgs args)
{
if (this.InvokeRequired)
{
this.Invoke(new CopyCallbackDelegate(MyCopyCallback), new object[] { Cancel, args });
}
else
{
if (args.IsDir)
label1.Text = args.CurrentObject;
else
label2.Text = args.CurrentObject;
}
}
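One thing to keep in mind with the callback above: the ref parameter is passed to Invoke() inside an object array, which holds a boxed copy. If the invoked method changes the ref value, the change lands in the array, not in the caller's variable. The sketch below shows the effect with Delegate.DynamicInvoke(), which I am using as a stand-in for Control.Invoke() so that it runs without a form. That substitution, and all the names in it, are assumptions for illustration.

```csharp
using System;

public static class RefThroughInvoke
{
    public delegate void CancelCallback(ref bool cancel);

    // Stand-in for a callback that asks the caller to stop.
    private static void RequestCancel(ref bool cancel)
    {
        cancel = true;
    }

    // Returns { the caller's variable, the value found in the argument array }.
    public static bool[] Demo()
    {
        bool cancel = false;
        object[] args = new object[] { cancel };  // boxes a copy of cancel
        Delegate d = new CancelCallback(RequestCancel);
        d.DynamicInvoke(args);                    // late-bound call through the array
        return new bool[] { cancel, (bool)args[0] };
    }

    public static void Main()
    {
        bool[] result = Demo();
        Console.WriteLine("caller's variable: {0}", result[0]);
        Console.WriteLine("args[0]:           {0}", result[1]);
    }
}
```

So if the callback is meant to hand a value back through a ref parameter, keep a reference to the array and copy the element back after the call, or read the value from somewhere both threads can see.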


There's still one more requirement for using BeginInvoke(). When using the "delegate" BeginInvoke(), you are required to call the delegate's EndInvoke() when the thread completes. So, in order to know when the function completes, you can use a timer (remember to enable it) and check the asyncResult variable...


private void timer1_Tick(object sender, EventArgs e)
{
if(asyncResult != null)
if (asyncResult.IsCompleted)
{
dlgt.EndInvoke(asyncResult);
asyncResult = null;
timer1.Enabled = false;
}
}


But, even better than a timer would be to use those two nice little parameters at the end of BeginInvoke(). They are the "Thread Completion Callback" and the "User Data Object". You supply a callback that will get called when the thread completes. It is called with the user data object passed in inside of the IAsyncResult parameter. We normally pass the delegate itself as the user data object so that we can use it to call the EndInvoke() with. This allows us to eliminate those class scope variables we added earlier (though I don't, I'll leave it as an exercise). So we still have to write a completion routine.


public void RecurseCopyDone(IAsyncResult result)
{
if (this.InvokeRequired)
{
this.Invoke(new AsyncCallback(RecurseCopyDone), new object[] { result });
}
else
{
RecurseCopyFolderDelegate d = result.AsyncState as RecurseCopyFolderDelegate;
d.EndInvoke(result);
label1.Text = "Done";
label2.Text = "Done";
}
}


The Invoke() above is not necessary to call the delegate's EndInvoke(), but it is needed to update my labels, so I opt to keep all the code together. EndInvoke() can be run from either the GUI thread or the callback thread.

So, there you have it. The complete source that follows of the Form1.cs and the Form1.Designer.cs has an additional button allowing us to cancel the operation. Copying files the way I am doing it cannot be interrupted, so cancellation occurs between files. The final version uses an AsyncCallback instead of a timer. I hope this has been of use to you. Here's the code...

Form1.cs

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;

namespace ThreadPoolPart4
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}

private bool CancelCopyFlag = false;
private IAsyncResult asyncResult = null;
private RecurseCopyFolderDelegate dlgt;
private void button1_Click(object sender, EventArgs e)
{
CancelCopyFlag = false;

// Get the Folder names, copy contents from one to the other
FolderBrowserDialog fb = new FolderBrowserDialog();
fb.ShowDialog();
string src = fb.SelectedPath;
fb.ShowDialog();
string dst = fb.SelectedPath;
// no error checking on the names, this is an example only

if (dst != src)
{
if (!Directory.Exists(dst))
Directory.CreateDirectory(dst);
// RecurseCopyFolder(new DirectoryInfo(src), new DirectoryInfo(dst), MyCopyCallback);
dlgt = new RecurseCopyFolderDelegate(RecurseCopyFolder);
asyncResult = dlgt.BeginInvoke(new DirectoryInfo(src), new DirectoryInfo(dst), MyCopyCallback, new AsyncCallback(RecurseCopyDone), dlgt);
}
}
private delegate void RecurseCopyFolderDelegate(DirectoryInfo src, DirectoryInfo dst, CopyCallbackDelegate cb);
private void RecurseCopyFolder(DirectoryInfo src, DirectoryInfo dst, CopyCallbackDelegate cb)
{
bool cancelled = false;
CopyArgs ca;
try
{
foreach (FileInfo fi in src.GetFiles())
{
string newfile = Path.Combine(dst.FullName, fi.Name);

ca = new CopyArgs() { IsDir = false, CurrentObject = newfile };
cb(ref cancelled, ca);
if (cancelled)
break;
if (!ca.Skip)
{
if (Directory.Exists(ca.CurrentObject))
Directory.Delete(ca.CurrentObject);
fi.CopyTo(ca.CurrentObject, true);
}

}
if (!cancelled)
foreach (DirectoryInfo subsrc in src.GetDirectories())
{
ca = new CopyArgs() { IsDir = true, CurrentObject = Path.Combine(dst.FullName, subsrc.Name) };
cb(ref cancelled, ca);

if (cancelled)
break;
if (ca.Skip)
continue;
DirectoryInfo subdst;
if (!Directory.Exists(ca.CurrentObject))
subdst = dst.CreateSubdirectory(subsrc.Name);
else
subdst = new DirectoryInfo(ca.CurrentObject);

RecurseCopyFolder(subsrc, subdst, cb);
}
}
catch (Exception e)
{
throw; // rethrow without resetting the stack trace
}
}

public void RecurseCopyDone(IAsyncResult result)
{
if (this.InvokeRequired)
{
this.Invoke(new AsyncCallback(RecurseCopyDone), new object[] { result });
}
else
{
RecurseCopyFolderDelegate d = result.AsyncState as RecurseCopyFolderDelegate;
d.EndInvoke(result);
label1.Text = "Done";
label2.Text = "Done";
}
}

public class CopyArgs : EventArgs
{
public bool IsDir;
public bool Skip;
public string CurrentObject;
}

private delegate void CopyCallbackDelegate(ref bool Cancel, CopyArgs args);
private void MyCopyCallback(ref bool Cancel, CopyArgs args)
{
if (this.InvokeRequired)
{
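// Invoke writes "ref" results back into the argument array, so copy the flag out afterwards
object[] invokeArgs = new object[] { Cancel, args };
this.Invoke(new CopyCallbackDelegate(MyCopyCallback), invokeArgs);
Cancel = (bool)invokeArgs[0];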
}
else
{
if (args.IsDir)
label1.Text = args.CurrentObject;
else
label2.Text = args.CurrentObject;
Cancel = CancelCopyFlag;
if (Cancel)
{
label1.Text = "Cancelled";
label2.Text = "Cancelled";
}
}
}

private void button2_Click(object sender, EventArgs e)
{
CancelCopyFlag = true;
}
}
}


Form1.Designer.cs

namespace ThreadPoolPart4
{
partial class Form1
{
private System.ComponentModel.IContainer components = null;

protected override void Dispose(bool disposing)
{
if (disposing && (components != null))
{
components.Dispose();
}
base.Dispose(disposing);
}

#region Windows Form Designer generated code

private void InitializeComponent()
{
this.components = new System.ComponentModel.Container();
this.button1 = new System.Windows.Forms.Button();
this.label1 = new System.Windows.Forms.Label();
this.label2 = new System.Windows.Forms.Label();
this.button2 = new System.Windows.Forms.Button();
this.SuspendLayout();
//
// button1
//
this.button1.Location = new System.Drawing.Point(18, 17);
this.button1.Name = "button1";
this.button1.Size = new System.Drawing.Size(75, 23);
this.button1.TabIndex = 0;
this.button1.Text = "button1";
this.button1.UseVisualStyleBackColor = true;
this.button1.Click += new System.EventHandler(this.button1_Click);
//
// label1
//
this.label1.AutoSize = true;
this.label1.Location = new System.Drawing.Point(22, 71);
this.label1.Name = "label1";
this.label1.Size = new System.Drawing.Size(35, 13);
this.label1.TabIndex = 1;
this.label1.Text = "label1";
//
// label2
//
this.label2.AutoSize = true;
this.label2.Location = new System.Drawing.Point(22, 113);
this.label2.Name = "label2";
this.label2.Size = new System.Drawing.Size(35, 13);
this.label2.TabIndex = 2;
this.label2.Text = "label2";
//
// button2
//
this.button2.Location = new System.Drawing.Point(169, 22);
this.button2.Name = "button2";
this.button2.Size = new System.Drawing.Size(75, 23);
this.button2.TabIndex = 3;
this.button2.Text = "button2";
this.button2.UseVisualStyleBackColor = true;
this.button2.Click += new System.EventHandler(this.button2_Click);
//
// Form1
//
this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F);
this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
this.ClientSize = new System.Drawing.Size(292, 266);
this.Controls.Add(this.button2);
this.Controls.Add(this.label2);
this.Controls.Add(this.label1);
this.Controls.Add(this.button1);
this.Name = "Form1";
this.Text = "Form1";
this.ResumeLayout(false);
this.PerformLayout();

}

#endregion

private System.Windows.Forms.Button button1;
private System.Windows.Forms.Label label1;
private System.Windows.Forms.Label label2;
private System.Windows.Forms.Button button2;
}
}

Monday, August 18, 2008

Generics Fast

You are trying to come up to speed in C# fast. Then, all of a sudden, you come across the C# Generic Type and wonder, "How do I master that?"

Well, I've got some good news for you. You are in good company. You thought I was going to say I'll help you master generics. No, that's beyond a simple blog like mine. But, maybe I can help you become proficient. We just have to get the main learning hurdles out of the way.

Hurdle #1 - Rationale
Generic Types were added to C# in version 2.0. Now, I am not going to go into any detail on the history of C# or generics. I just mention this because the C# team decided they had a serious enough shortcoming in 1.1 to add this major feature. That is, many, many programmers were opting to ditch type-safety and decent performance in order to write more versatile data structures. How's that? Well, folks (MS included) were writing things like List and Queue and Stack to take "objects". Then their particular data structure could work on anything, and that would maximize code reuse (and likewise productivity).

The problem is, though reuse is greatly increased, productivity may be closer to a "wash" than a gain. That's because in implementing a "generic" structure that takes only "object" types, type-safety is lost. With lost type-safety, there's an increase in coding errors and run-time bugs, and thus more time is spent testing and fixing the code. Okay, reusability is a win, productivity is a tie maybe, and performance is a what? Well, performance is a loser. Using such a data structure with integers requires every value to be "boxed" to make it an object, and unboxed again on the way out. For reference types, extra time is spent on cast operations. One article states the performance of the object data structure is about 200% slower than if it had been implemented for a specific value type.

Generic types solve the performance issue while giving the programmer back the type safety. And, clearly generic types provide for substantial code reuse.

Hurdle #2 - How to Use a Generic Type
Alright. It's good to know why, but it's better to know how. Let's start by using a generic type defined by the framework. Let's start with List<T>.


// instantiate a List of Strings
List<string> strList = new List<string>();
strList.Add("One");
strList.Add("Two");
foreach (string s in strList)
    Console.WriteLine(s);


Rather than cut and paste this code into your IDE, go ahead and type it out. IntelliSense will help you. In declaring your "strList" as a "List<string>", everything about the list is now "string" oriented: the constructor, the Add(), the enumeration, etc. It's as if a wholesale find and replace has been done on the type placeholder. Maybe you saw <T> in the IntelliSense; T is the placeholder. More precisely, T is the "generic parameter" to the List<> type. And though a macro-like replacement is done, it's very different from the C++ template, where the compiler does the replacement. In C#, generics are implemented in the CLR, so it's the JIT that does the final replacement. It will be good to keep this in mind when you "type-cast" your way around compiler errors only to have your code throw an exception at run time on a bad conversion.
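
To make the contrast with the old "object" collections from Hurdle #1 concrete, here is a small sketch. The class and method names (BoxingDemo, SumObjects, SumGeneric) are made up for illustration; ArrayList and List<int> are the framework types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Sums integers stored as objects: every element was boxed on Add
    // and must be cast (unboxed) on the way out.
    public static int SumObjects(ArrayList items)
    {
        int total = 0;
        foreach (object o in items)
            total += (int)o;   // throws InvalidCastException if o is not an int
        return total;
    }

    // Sums integers stored in a generic list: no boxing, no casts.
    public static int SumGeneric(List<int> items)
    {
        int total = 0;
        foreach (int i in items)
            total += i;
        return total;
    }

    public static void Main()
    {
        ArrayList loose = new ArrayList();
        loose.Add(1);
        loose.Add(2);
        // loose.Add("three");  // compiles, then fails at run time in SumObjects

        List<int> strict = new List<int>();
        strict.Add(1);
        strict.Add(2);
        // strict.Add("three"); // rejected by the compiler

        Console.WriteLine(SumObjects(loose));
        Console.WriteLine(SumGeneric(strict));
    }
}
```

Both calls print the same sum. The difference is where a mistake gets caught: at run time for the ArrayList, at compile time for the List<int>.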

Hurdle #3 - How to Write a Generic Type
Which brings us to writing our own generic types. This next bit of example code makes use of several applications of the generic type syntax.


public class GenericClass<T>
{
    private T genProp; // a generic field
    public T GenProp { // a generic property
        get { return genProp; }
        set { genProp = value; }
    }
    public GenericClass() // a generic constructor
    {
    }
    public T GenMethod() // a generic method
    {
        return genProp;
    }
    public string GenMethodT(T val) // you get the picture
    {
        return val.ToString();
    }
    // and so on...
}
// ...
GenericClass<int> gc = new GenericClass<int>();
Console.WriteLine(gc.GenMethodT(5));


You can see that the generic parameter T is all over the place, and each occurrence, if you can imagine, will be filled in with whatever type you end up instantiating this class with. Its behavior will be appropriate for whatever "generic argument" you pass.

So, we've got the "generic type", the "generic parameter" and the "generic argument". What are each of these? In the example above, the "generic type" is "GenericClass<T>", the "generic parameter" is "T", and the "generic argument" is "int". There can be multiple generic parameters, and the name of the variable is arbitrary (within naming rule limits, of course). To specify multiple parameters, simply separate them with commas.


Dictionary<int, string> dict = new Dictionary<int, string>();
dict.Add(1, "One");
dict.Add(2, "Two"); // and so on...

//...
public class MyNextGeneric<T, U>
{
    public T myMethod(U arg)
    {
        //...
        return default(T); // placeholder so the sketch compiles
    }
}


Hurdle #4 - Using Generic Constraints
You are now well on your way to writing generic classes and methods. It won't take long before you start seeing some interesting compiler errors.


public bool GenMethodCompare(T val1, T val2)
{
    return (val1 == val2) ? true : false;
}


This bit of code produces the error "Operator '==' cannot be applied to operands of type 'T' and 'T'". You may ask why?! That might be because you are thinking like C++ instead of C#. In C++, maybe you have instantiated the class with "int", which is very easy to compare, and the C++ compiler can make the final call. But with C#, the JIT makes the final call, and though the compiler knows what generic arguments you are passing, it does not know what might be passed by an external assembly, and not every type defines an "==" operator. The C# team could probably have figured something out, but with deadlines looming, who knows. Regardless, it's situations like these for which "Generic Constraints" were invented.

There are three kinds of Generic Constraint: "Derivation Constraints", "Construction Constraints", and "Reference/Value Type Constraints".

In the last example, the compiler "could" compile the generic method if, instead of "==", we used CompareTo() and if the compiler could just be assured that any generic argument supplied for T will implement IComparable. This would be a "Derivation Constraint". Notice in this next example that I use the generic IComparable<T> interface as the constraint. Any interface or non-sealed class can be used as a constraint, even other generic parameters.


public class MyClass<T> where T : IComparable<T>
{
    public bool GenMethodCompare(T val1, T val2)
    {
        // CompareTo() is available because of the constraint
        return val1.CompareTo(val2) == 0;
    }
}
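
As a further sketch of a "Derivation Constraint" at work, here is a hypothetical Max() helper (the names CompareDemo and Max are mine, not from the framework). It is written as a generic method rather than a generic class; constraints on a method use the same "where" syntax, placed after the parameter list.

```csharp
using System;

public static class CompareDemo
{
    // The constraint guarantees that CompareTo(T) exists on every T,
    // so the compiler accepts the call below.
    public static T Max<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) >= 0 ? a : b;
    }

    public static void Main()
    {
        Console.WriteLine(Max(3, 7));              // int implements IComparable<int>
        Console.WriteLine(Max("apple", "banana")); // string implements IComparable<string>
        // Max(new object(), new object());        // would not compile: object is not IComparable<object>
    }
}
```

The generic argument is inferred from the call, so there is no need to write Max<int>(3, 7).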


All the generic constraints use the syntax "where T : constraint, constraint" at the end of the type declaration (after any base class and/or interfaces) and before the open curly brace. If there are multiple generic parameters that each need constraints, the syntax would be


public class GClass<T, U> : SomeBaseClass, ISomeInterface
    where T : IComparable
    where U : IEnumerable
{ ... }


"Construction Constraints" are used when your generic class must construct objects of type T. You may get the error "Cannot create an instance of the variable type 'T' because it does not have the new() constraint". This is a very self-explanatory message in terms of what you've got to do. It occurs because your code is trying to instantiate an object of type "T", and many types do not have a default (i.e., parameter-less) constructor. Without the "new()" constraint, the compiler will not allow code that instantiates a generic parameter to compile. You can combine new() with other constraints by separating with a comma and placing it after the others.


public class GClass<T> where T : IComparable, new()
{
    public T factory() { return new T(); }
}
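
Here is a slightly larger sketch of the new() constraint in use. Factory<T> and CreateMany() are hypothetical names chosen for the example; StringBuilder is used only because it has a public parameter-less constructor.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public class Factory<T> where T : new()
{
    // Builds "count" fresh instances using T's parameter-less constructor.
    public List<T> CreateMany(int count)
    {
        List<T> items = new List<T>();
        for (int i = 0; i < count; i++)
            items.Add(new T());
        return items;
    }
}

public static class FactoryDemo
{
    public static void Main()
    {
        Factory<StringBuilder> factory = new Factory<StringBuilder>();
        List<StringBuilder> builders = factory.CreateMany(3);
        Console.WriteLine(builders.Count);

        // Factory<string> would not compile:
        // string has no public parameter-less constructor.
    }
}
```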


"Reference/Value Type Constraints" tell the compiler that only reference types or only value types are allowed as the generic argument. "class" indicates a reference type constraint, while "struct" indicates a value type constraint. When used, either one must come first in the constraint list. Notice that neither can be combined with a base class "Derivation Constraint": for "struct" the two are incompatible, and for "class" the combination is redundant, so the compiler rejects both. Interface constraints can still follow either one.


public class RTypeClass<T> where T : class { ... }
public class VTypeClass<T> where T : struct { ... }
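
What do these two constraints buy you? A minimal sketch, with made-up names (ConstraintDemo, AsOrNull, FirstOrNull): the "class" constraint lets the code use the "as" operator and null with T, and the "struct" constraint lets it use the nullable form T?.

```csharp
using System;
using System.Collections.Generic;

public static class ConstraintDemo
{
    // "class": T is a reference type, so "as" and null are legal for T.
    public static T AsOrNull<T>(object o) where T : class
    {
        return o as T;
    }

    // "struct": T is a non-nullable value type, so T? (Nullable<T>) is legal.
    public static T? FirstOrNull<T>(List<T> items) where T : struct
    {
        if (items.Count == 0)
            return null;
        return items[0];
    }

    public static void Main()
    {
        Console.WriteLine(AsOrNull<string>("hello") != null);  // the object is a string
        Console.WriteLine(AsOrNull<string>(42) != null);       // the object is not a string

        List<int> empty = new List<int>();
        Console.WriteLine(FirstOrNull(empty).HasValue);
    }
}
```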


Finish Line
That's about all I can cover in an evening. You should have everything you need to start using and creating your own generic types. My article "Thread Synchronized Queing" is a reasonable working example of a generic Queue class, if you think that would be of any value to you. Just don't try writing anything too complex right away; I wouldn't want you to hurt yourself.

Useful Resources
An Introduction to C# Generics
Generics (C#)
Using C# Generics on Static Classes - Wes' Puzzling Blog
C# Frequently Asked Questions - How do C# generics compare to C++ templates?

Tuesday, August 12, 2008

Sleep

When I first started programming, I thought "Sleep" to be one of the most useless calls I could make. No, I'm not talking about personal habits, I'm talking about the Sleep() function.


using System.Threading;
Thread.Sleep(int milliSeconds)

We find this little "do nothing" function in C, C++, C#, VB, VB.Net, WScript, and similar functions in other languages. And I have since come to understand how useful and confusing "doing nothing" can be. So, let's take a look at it; I'll be focusing on the C# Sleep() function.

Sleep() simply suspends the execution of your thread for a specific period of time. The thread relinquishes the remainder of its time-slice and becomes "unrunnable" for that period. If the time period is zero, it relinquishes its time-slice only if another thread is ready and waiting to run. If there are no waiting threads, and the time span is zero, the calling thread remains ready to run and therefore returns from Sleep() immediately.

So, Sleep() is useful for delaying a thread, or for getting a thread out of the way of other threads. Let's say a user enters a password incorrectly, a delay at this point is useful to help dissuade a rapid fire brute force attack on your password input algorithm. Keep in mind that Sleep() also takes away the thread's responsiveness. So, delaying operations should only be implemented with Sleep() if you can tolerate or mitigate the total lack of responsiveness from the sleeping thread.

Sleep() is also useful for making a thread behave nicely by not hogging the CPU. The Operating System will preempt threads for you and let other threads run. But you may still see your thread taking up way too much processor resources and not allowing other threads enough time to run. A Sleep() for some very short interval could be useful in improving overall application responsiveness. Here again, calling Sleep(0) on a thread that is very busy processing messages is about useless because each call to the message queue already allows other waiting threads to run.

You can use Sleep() for very simple timed delays. A pattern where there is a producer and several consumers might use a Sleep() to throttle the producer to let the consumers keep up or catch up. That is, the producer might fill a queue and then stop and Sleep() when the queue gets too full. The amount of time the producer thread sleeps determines how much the consumers have a chance to process. But even so, a more robust implementation would have the producer "wait" for space available on the queue, or even for the queue to reach a "low water mark", rather than blindly sleeping.

The single most common misuse of Sleep() in C# is when a thread that has created a window then calls Sleep(). Threads that create windows, whether directly or indirectly, must remain awake to process messages. When a thread sleeps, it cannot process those messages and the application appears to hang. The C/C++ documentation for SleepEx() and Sleep() in fact warns that should a thread sleep indefinitely when it should be processing messages, the system will deadlock, because message broadcasts are sent to all windows in the system.
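
The "wait for space" idea might look something like the following sketch. BoundedQueue<T> is a hypothetical class written for this illustration (it is not a framework type); it uses Monitor.Wait() and Monitor.PulseAll() so the producer blocks only while the queue is actually full.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class BoundedQueue<T>
{
    private readonly Queue<T> queue = new Queue<T>();
    private readonly object sync = new object();
    private readonly int capacity;

    public BoundedQueue(int capacity) { this.capacity = capacity; }

    // Blocks the producer while the queue is full.
    public void Enqueue(T item)
    {
        lock (sync)
        {
            while (queue.Count >= capacity)
                Monitor.Wait(sync);        // releases the lock while waiting
            queue.Enqueue(item);
            Monitor.PulseAll(sync);        // wake any waiting consumers
        }
    }

    // Blocks the consumer while the queue is empty.
    public T Dequeue()
    {
        lock (sync)
        {
            while (queue.Count == 0)
                Monitor.Wait(sync);
            T item = queue.Dequeue();
            Monitor.PulseAll(sync);        // wake a producer waiting for space
            return item;
        }
    }
}

public static class BoundedQueueDemo
{
    // Pushes 1..10 through a queue of capacity 2 and returns the consumer's total.
    public static int Run()
    {
        BoundedQueue<int> q = new BoundedQueue<int>(2);
        int sum = 0;
        Thread consumer = new Thread(delegate()
        {
            for (int i = 0; i < 10; i++)
                sum += q.Dequeue();
        });
        consumer.Start();
        for (int i = 1; i <= 10; i++)
            q.Enqueue(i);                  // blocks whenever two items are already queued
        consumer.Join();
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

No Sleep() interval has to be guessed here: the producer resumes as soon as a consumer makes room, and not before.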

We are not discussing C/C++ and we certainly are not going to sleep indefinitely. So, C# should keep us out of such a catastrophic situation, but the C# programmer must still avoid putting "GUI" threads to sleep to avoid hanging up his application. The C# documentation for Sleep() simply says "This method does not perform standard COM and SendMessage pumping". This hardly suggests how ugly the application behavior can be if you use Sleep() on your GUI thread. In short, just keep your Sleep() off of the GUI threads, and keep your COM and DDE calls (unless absolutely sure there is no GUI involved) on the GUI threads.

What are the alternatives to calling Sleep() on a GUI thread? First, and foremost, your best alternative is a good architecture starting out. GUI threads never "need" to sleep because they should normally be waiting for messages. If a GUI thread is so busy that it can't service the messages as they arrive, the GUI should offload the work to a separate thread. But, if your GUI is so busy, it's probably not going to sleep anyway. If a GUI needs to wait for something to happen that won't occur on its message/event loop, it should delegate that wait to a thread as well, and then use state variables to modify its behavior until the desired event occurs.

For example, reading from a network could wait for a very long time. A GUI would act like it's hung if it got stuck at a network read for any length of time. The same goes for reading streams. Often a stream may resolve to a network resource. Or, if you are reading a lot of data from a file, the application will similarly appear hung. By delegating the operation to a separate thread, the GUI is free to respond to other events. It's tempting at this point for the GUI to Sleep(), but don't do it. Instead, set a state variable that indicates an operation is in progress and prevent your other dependent GUI operations from proceeding until the operation finishes. Better yet, try to eliminate the GUI dependency on the completion of the read altogether. Architect your application with threads well chosen for the task, especially if you must wait for various kinds of system events.

Another alternative to Sleep() is a "Timer" function (see my articles on timers). With a timer function, a thread simply schedules a time when it will get back to the item it's waiting for. In the meantime, the GUI continues to receive messages and events. In fact, the timer expiration is just an event which you write a handler for.

If your thread must wait for something, your best performance will be achieved through the proper use of "Wait" type calls. WaitOne() and WaitAny() on an appropriate "WaitHandle" are generally better performance-wise than Sleep() because your thread waits only as long as it takes for the event to fire and the context switch to occur. Furthermore, there is no risk of the thread returning earlier than the event (unless you provide a timeout to "Wait").
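
As a rough sketch of a "Wait" type call with a timeout, the hypothetical WaitForWorker() helper below starts a worker thread and waits on a ManualResetEvent. The Sleep() inside the worker is only a stand-in for real work.

```csharp
using System;
using System.Threading;

public static class WaitDemo
{
    // Returns true if the worker signaled before the timeout expired.
    public static bool WaitForWorker(int workMs, int timeoutMs)
    {
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            Thread worker = new Thread(delegate()
            {
                Thread.Sleep(workMs);      // stand-in for real work
                done.Set();                // signal completion
            });
            worker.Start();

            // Wakes as soon as Set() is called, or gives up after timeoutMs.
            bool signaled = done.WaitOne(timeoutMs, false);

            worker.Join();                 // let the worker finish before the event is disposed
            return signaled;
        }
    }

    public static void Main()
    {
        Console.WriteLine(WaitForWorker(50, 5000));   // worker finishes well inside the timeout
        Console.WriteLine(WaitForWorker(500, 50));    // timeout expires first
    }
}
```

In the first call the waiting thread resumes shortly after the worker signals, not after the full five seconds. That is the latency advantage over a fixed Sleep().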

How accurate is Sleep()? The Sleep time interval is specified to the millisecond. But Sleep() expiration is serviced by the Operating System based upon its timer "tick" size. Ticks occur at a constant interval that is coarser than 1 millisecond (on Windows, typically somewhere around 10 to 16 milliseconds by default). So, threads will not wake at exactly the time span specified, due to tick granularity and scheduling priorities. Sleeping threads become "ready to run" only on tick boundaries, so the actual delay depends on how the requested interval lines up with the tick size, and they then wake at their first scheduled opportunity after that. You can control the "tick" size (and thus the accuracy of Sleep) in C/C++ with timeGetDevCaps, timeBeginPeriod and timeEndPeriod.

By the way, while we are on the topic of C/C++, the call to SleepEx() is worth mentioning. SleepEx() is an alertable sleep function that wakes before the time period elapses when the same thread's I/O completion callback is called or when an APC function is queued to it. The lines between "Sleep" and "Wait" begin to blur with this function.
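
If you want to see the granularity on your own machine, a quick measurement like this one will do. It is only a sketch: the MeasureSleep() helper is a made-up name, and the numbers it prints will vary with the operating system, the current timer resolution, and the system load.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class SleepAccuracy
{
    // Returns how long Thread.Sleep(requestedMs) actually took, in milliseconds.
    public static double MeasureSleep(int requestedMs)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Thread.Sleep(requestedMs);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        int[] requests = { 0, 1, 10, 50 };
        foreach (int ms in requests)
            Console.WriteLine("requested {0} ms, slept {1:F2} ms", ms, MeasureSleep(ms));
    }
}
```

Short requests such as 1 ms are where the tick size tends to show up most clearly, since the thread cannot become runnable again until the next tick.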

In summary, use Sleep() as the "poor man's" Wait() function. Setting up a WaitHandle, synchronizing correctly, and handling timeouts with a "Wait" can be complex. Certainly, the "Wait" type calls would provide much lower latency, but if a few seconds of latency does not adversely affect your application, you don't need that added complexity. Sleep() is simple, use it simply. Never sleep on the GUI, or on the job, the consequences are not pretty.

That about wraps it up. Enjoy.