.NET Tutorials – 9 – Strings and Regex

Strings are a heavily used data type for holding textual data. Although many developers think of .NET’s System.String data type as a primitive, it is a reference type and Microsoft doesn’t class it as one. It does have its own C# keyword (string), which often leads to the confusion. This article introduces strings in detail.

Introduction to .NET Strings

Strings are a reference type. That means they are treated the same way as objects, and not like value types would be. A string variable is essentially a pointer to a location in memory (more specifically on the Heap) where textual data is stored.

Unlike in earlier programming languages (such as C and C++), where a string is a null-terminated array of 8-bit characters, .NET’s implementation is much more complicated.

Here are some facts about .NET strings:

  • They are immutable. (More about this later on in the article.)
  • They hold 16-bit Unicode (UTF-16) data, instead of 8-bit ASCII.
  • A string variable is a reference to a string object on the managed heap, which records the string’s length and holds its characters in a single block of memory.
  • .NET does all the memory management.

Handing allocation, copying, and clean-up over to the runtime might seem unnecessary and over-engineered, but the flexibility it introduces is significant.

In older programming languages, such as C and C++, if you wanted to create a string and assign a value to it, you would typically need to make a request to reserve a region of contiguous (co-located) memory of sufficient size for your string, and then you could assign the text value.

If you wanted to add more text to the value, and you’d only reserved enough for the initial value being stored, you would need to request that the contiguous region you are using be extended. If the memory immediately after your block was already allocated to something else, the block could not be extended in place.

This happened so often that C/C++ programmers would usually create a new string variable, requesting enough space to store the larger text value, then copy the initial value and the new data to the new variable without even trying to request the memory extension. The programmer would also have to request release of the resources the old variable was using afterwards.

On top of the memory allocation issues, C/C++ strings only supported the 8-bit ASCII character set, which is heavily focused on western alphabets.

So, to get around these issues Microsoft made the C# string much more flexible and adaptable. In fact, a single string variable in .NET can theoretically hold up to 2GB of data.

System.String is Immutable

Strings in .NET differ from those in languages such as C and C++ in that they are immutable. This means that once a string has been created and its value stored in memory, that value cannot be modified.

This might not make sense to you since we all know you can create a string variable in .NET, assign a value to it, concatenate it with more text, remove some text from it again, etc.

The thing is, what happens in .NET when you modify a string value is that a completely new string object is created: enough memory for the new value is reserved, and the new value is written to a completely separate memory region. Once the activity has completed, the variable refers to the new object and the original memory is left for the .NET garbage collector to de-allocate and free at its convenience.

Although this might once again seem unnecessary and a disadvantage, consider how strings work in the likes of C/C++. In those languages, you could create a string variable (a pointer), write some data to its allocated memory range, then create a second string variable and make the second variable equal the first.

What this actually does is give you two string pointers both pointing to the same area of memory. If you modified the second variable in code then you would have inadvertently changed what the first variable stored too.

So, in an effort to avoid all these problems, Microsoft made a conscious decision to implement strings the way they have in .NET, ensuring that a value held by one variable can never be changed through another.
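The effect of immutability can be seen in a short sketch. The variable names are illustrative, and the snippet assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;

string first = "Potato";
string second = first;            // Both variables now reference the same string object.

second += " salad";               // Creates a new string object; 'first' is untouched.

Console.WriteLine(first);         // Potato
Console.WriteLine(second);        // Potato salad
Console.WriteLine(object.ReferenceEquals(first, second)); // False
```

Appending to second did not alter the value seen through first, which is the behaviour the design is intended to guarantee.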

Creating Strings

The most basic way to define a System.String and assign it a value in C# is using the string keyword with a variable name and then assigning a value encapsulated in speech marks to it. For example:

string myText = "This is text in a string variable!";

Microsoft recommends using language-specific primitives and keywords in .NET, so the string keyword is preferred to using the System.String type that it aliases. So, these are equivalent definitions:

System.String myText = "This is declared using the type name";
string myText2 = "This is declared using the language-specific alias";

The System.String / string type also has a few constructor overloads defined that may be useful when initialising your variables, such as:

string hyphens = new string('-', 50);   // Creates a string of 50 hyphens.

char[] alphabet = { 'A', 'B', 'C', 'D', 'E', 'F' /* etc. */ };

// This returns "Easy as ABC" (3 characters, starting at index 0).
string easy = "Easy as " + new string(alphabet, 0, 3); 

// This returns all the characters in the array.
string letters = new string(alphabet);

All types in .NET inherit from the System.Object base class (even value types). As this class exposes a ToString() method, all types can be converted to a string value.

int number = 12345;
string numberAsText = number.ToString();

At this point it is worth noting that while all .NET types do expose the ToString() method, the default implementation in System.Object is to return the type name, so it is only where the default behaviour has been overridden (e.g. in all primitive value types) that the method is useful.

In your custom types, to ensure a meaningful string representation of the object is returned, you should override the default behaviour with your own ToString implementation wherever possible/reasonable to do so.
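As a minimal sketch of such an override, consider the hypothetical Temperature type below (it is not part of .NET and is used only for illustration). The invariant culture is an assumption made here so the output does not vary between machines.

```csharp
using System;
using System.Globalization;

var reading = new Temperature(21.5);
Console.WriteLine(reading.ToString());   // 21.5 °C

// 'Temperature' is a hypothetical type used only for illustration.
public class Temperature
{
    private readonly double celsius;

    public Temperature(double celsius)
    {
        this.celsius = celsius;
    }

    // Without this override, ToString() would return the type name ("Temperature").
    public override string ToString()
    {
        return celsius.ToString("F1", CultureInfo.InvariantCulture) + " °C";
    }
}
```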

Escape Characters

We’ve mentioned escape characters briefly in relation to the primitive char type in a previous tutorial, but let’s revisit them now and consider them in more detail when working with strings.

Strings support the same set of escape characters as char does. The one you will need most often in a string is the speech mark (\"), which would otherwise end the string value.

Consider the code below:

string prompt = "Click "OK" to continue...";

If you try to compile this code it will result in compilation errors. This is because the first speech mark opens the string value and the second speech mark closes it again. So everything from OK onwards is considered invalid C# syntax.

To fix our code we need to ‘escape’ the speech mark in the value, like so:

string prompt = "Click \"OK\" to continue...";

Now the code is valid and the compiler won’t complain. Here are the most commonly used escape characters:

Escape Code    Meaning
\'             Single quote (inverted comma).
\"             Speech mark (double-quotes).
\\             Backslash.
\0             Null.
\a             Alert (aka Bell).
\b             Backspace.
\f             Form feed.
\n             New line.
\r             Carriage return.
\t             Horizontal tab.
\v             Vertical tab.
\xNN           Character code, where NN is hexadecimal (e.g. \x5A = 'Z').
\uNNNN         Unicode character, where NNNN is hexadecimal (e.g. \u005A = 'Z').

Something you may have noticed is that there is an escape code for backslash (which is itself used to indicate an escape code). This makes sense because, if we are defining a value including backslashes, then the compiler could mistake a genuine backslash for an escape character. For example:

string filePath = "C:\\Users\\joe\\Desktop\\myFile.txt";

.NET provides another way to declare this kind of string (i.e. one with backslashes in it), and that’s the ‘verbatim string format’. Prefixing the value with an ‘at’ character (@) informs the compiler that this should be treated differently to the default declaration style and that escape codes should be ignored. The one exception is the speech mark, which is written twice ("") to include it in a verbatim string. Another benefit of using the verbatim formatting style is that a string value can span multiple lines without causing compiler errors. For example:

// This...
string escapedPath = "C:\\Users\\joe\\Desktop\\myFile.txt";

// ...is equivalent to this...
string verbatimPath = @"C:\Users\joe\Desktop\myFile.txt";

// This would cause a compiler error (so it is commented out here)...
// string myComments = "Some comments that span 
//     multiple lines and causes a compilation error.";

// But, this would not...
string myComments = @"Some comments that span
    multiple lines and the compiler accepts it 
    without any issues.";

Comparing Strings

One way in which strings differ from most other reference types is that they overload the equality (==) and inequality (!=) operators. For example:

string first = "Potato";
string second = "Potato";

if (first == second)
{
    // TODO: do something.
}
else if (first != second)
{
    // TODO: do something else.
}

When the comparison operators are executed, the runtime is comparing the strings at character-level instead of doing the default for reference types by checking if the memory references match.

The string type also implements IComparable, so the CompareTo() method is available as an alternative, and the type also defines a static Compare() method as well. For example:

string first = "Potato";
string second = "Potato";

if (first.CompareTo(second) == 0)
{
    Console.WriteLine("The values are equal.");
}
else if (string.Compare(first, second) > 0)
{
    Console.WriteLine("The first value is after the second, in ascending sort order.");
}
else 
{
    Console.WriteLine("The first value is before the second, in ascending sort order.");
}

Notice that where the equality (==) and inequality (!=) operators return a Boolean value, the compare methods return a numeric value where: 0 indicates equality; a negative value (typically -1) indicates the first variable is before the second (in ascending sort order); and a positive value (typically 1) indicates the first variable is after the second.
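Only the sign of the result is guaranteed, so it is safer to test for less than or greater than zero than for exactly -1 or 1. The sketch below does that. StringComparison.Ordinal is a choice made here (not something the examples above use) so that the outcome does not depend on culture settings.

```csharp
using System;

// Ordinal comparison looks at the raw character values, independent of culture.
int result = string.Compare("Apple", "Banana", StringComparison.Ordinal);

if (result < 0)
{
    Console.WriteLine("\"Apple\" sorts before \"Banana\".");
}
else if (result > 0)
{
    Console.WriteLine("\"Apple\" sorts after \"Banana\".");
}
else
{
    Console.WriteLine("The values are equal.");
}
```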

The string type also has other methods that can be useful when searching for identifiers in values, but this is covered later on in this tutorial, so won’t be explained now.

Modifying Strings

As previously mentioned, the string type overloads several C# operators to add value and make the data type easier to use. This extends beyond just comparison to building and modifying values too.

Appending Strings

For example:

// Concatenating strings using operators.
string text = "Something ";
text = text + "to ";
text += "think about!"; // After this assignment, text = "Something to think about!".

// After this assignment, text = "Something else to think about!".
text = "Something else " + "to " + "think " + "about!";

// You can also use string's static 'Concat' method.
text = string.Concat("That ", "worked", ", and ", "so ", "does ", "this!");

// Calling Concat on a string instance binds to LINQ's Enumerable.Concat
// (it needs 'using System.Linq;'), which returns a collection of chars,
// so this assignment would not compile:
text = "That worked";
// text = text.Concat(", but this doesn't!");

// ToString() doesn't help here (it returns a type name, not the text).
// Build a new string from the characters instead...
text = "That worked";
text = new string(text.Concat(", and so did this!").ToArray());

In the code example above, we use operator-based concatenation to append values to a string variable, and also use the string type’s built-in Concat() method. You may have noticed that the inline comments mention an inconsistency between the two forms of Concat. This is indeed the case. The static method returns a string, but the ‘instance version’ is really a LINQ extension method that treats the string as a sequence of characters and returns an enumerable collection of char. You can still use it, but you need to turn the result back into a string (for example, via the string constructor), because calling ToString() on it only returns a type name.

You can also build a string value using string.Format(). This supports a formatting string parameter with placeholders of the form {n} where n is numeric, going from 0-n, and then the replacement value arguments are provided in order as additional parameters. The method also implicitly calls ToString() on the arguments list so you don’t have to do that explicitly in most cases. For example:

string text = string.Format(
    "The temperature is {0}C, the wind direction is {1}°, and the windspeed is {2}mph",
    currentTemperatureInCelsius, windDirectionInDegrees, windSpeedInMph);

There may be situations where you want to pad out your text value. There are methods for that too:

string name = "Joe Buck";

// Let's say name is a fixed field size of 24 characters, we can pad using...
string frontPaddedName = name.PadLeft(24);
string backPaddedName = name.PadRight(24);

// frontPaddedName = "                Joe Buck"
// backPaddedName  = "Joe Buck                "

// The above pads with a space character, but you can specify the character to use...
string highlighted = name.PadLeft(((24 - name.Length) / 2) + name.Length, '*');
highlighted = highlighted.PadRight(24, '*');

// highlighted = "********Joe Buck********"

Replacing Text in Strings

The string.Replace() method allows you to match instances of a value in a string variable, and replace them. For example:

string passage = "It was a cold day. The wind blew hard and chilled us " +
    "to the bone. Jane asked if she could borrow my scarf, so I passed it " +
    "to her to help her keep warm. It felt like winter was never going to end.";

string revisedToFiona = passage.Replace("Jane", "Fiona");

// The 'revisedToFiona' variable now holds...
// It was a cold day. The wind blew hard and chilled us to the bone. Fiona 
// asked if she could borrow my scarf, so I passed it to her to help her keep 
// warm. It felt like winter was never going to end.

string revisedToGraham = passage
    .Replace("Jane", "Graham")
    .Replace("her", "him")
    .Replace("she", "he");

// The 'revisedToGraham' variable now holds...
// It was a cold day. The wind blew hard and chilled us to the bone. Graham 
// asked if he could borrow my scarf, so I passed it to him to help him keep 
// warm. It felt like winter was never going to end.

Replace() finds and replaces all instances of the search term with the new value supplied. As you can see in the code example above, the method can be chained so that multiple changes can be made in one statement.
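One thing to watch for is that Replace() is case-sensitive and matches the search term anywhere it appears, including inside longer words. The following sketch (the sample sentence is made up for illustration) shows both effects.

```csharp
using System;

string rhyme = "She sells seashells";

// "She" is left alone (different case), but the "she" inside "seashells" is replaced.
string changed = rhyme.Replace("she", "he");

Console.WriteLine(changed); // She sells seahells
```

When only whole words should change, a regular expression (covered later in this tutorial) is usually a better fit.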

Inserting Text into Strings

You can insert text into a specified position in an existing string using the Insert() method. For example:

string killerName = "Lee Oswald";
killerName = killerName.Insert(4, "Harvey ");

// killerName now contains "Lee Harvey Oswald".

Removing Text From Strings

Complementary to inserting text into strings, you may instead want to remove text. This is easily achieved using the Remove() method. For example:

string name = "George 'The Legend' Parker";
string actualName = name.Remove(7, 13);

// actualName contains "George Parker".

Searching Strings

There will be many times when you will need to check if a string value contains some identifier or when you need to be able to extract a sub-string from the current value. The string type provides a few methods to help in this regard, such as Contains(), IndexOf(), LastIndexOf(), IndexOfAny(), StartsWith(), EndsWith(), etc.

string text = "Some days are better than others.";

bool outcome = text.Contains("better"); // This would return true.
outcome = text.Contains("people");      // This would return false.

outcome = text.StartsWith("Some");      // This would return true.
outcome = text.EndsWith("!");           // This would return false.

int index = text.IndexOf("Sunday");           // This would return -1 (i.e. not found).
index = text.IndexOf("day");                  // This would return 5.
index = text.IndexOf('a', 7);                 // This would return 10.
index = text.IndexOfAny(new[] { 'a', 'b' });  // This would return 6.
index = text.LastIndexOf("e");                // This would return 29.

Substrings

There are many times when you want to extract part of a string value from its whole. The string type offers a few different ways to do this, and we’ll look at the Substring() and Split() methods here.

First, let’s look at the Substring() method. It allows you to extract a single string from a starting position in the current string object, optionally with a specified length. For example:

string text = "It was the best of times,    it was the worst of times";

// This will return everything after the comma.
string subText = text.Substring(text.IndexOf(',') + 1);

// This will return 'was'.
int start = text.IndexOf(' ') + 1;
subText = text.Substring(start, text.IndexOf(' ', start + 1) - start);

While the Substring() method only returns a single string value, the Split() method can be used to extract an array of sub-strings from the current value. For example:

string text = "It was the best of times,    it was the worst of times";

// 'words' would contain "It", "was", "the", "best", "of", "times", etc.
// 'words' would contain a few blank elements after the comma.
string[] words = text.Split(' ');

// In this revised call, blank entries are removed so all the elements in 
// the returned array would be words (albeit with 'times' including the 
// trailing comma).
words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// This is also valid, and would produce:
// "It "
// " the best of times,    it "
// " the worst of times"
string[] parts = text.Split(new[] { "was" }, StringSplitOptions.None);

Complementary to the Split() method(s), the string type provides a set of Join() methods too.

Although at first glance it may look like the string type has instance Join() methods (if you are using IntelliSense in Visual Studio and you type your string variable name and hit the period key), this is not the case. Those Join methods are extension methods provided by LINQ and don’t work the same way as the static versions, so only use the static string.Join() methods. For example:

// After this join, text = "one,two,three,four".
string text = string.Join(",", new[] { "one", "two", "three", "four" });
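Split() and Join() are often used together. The sketch below, using made-up sample data, splits a comma-separated value, discards the blank entries, and joins what is left with a different separator.

```csharp
using System;
using System.Collections.Generic;

string csv = "apples, pears,  ,bananas";

// Split on commas, then trim each piece and drop any that end up blank.
string[] pieces = csv.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
List<string> cleaned = new List<string>();
foreach (string piece in pieces)
{
    string trimmed = piece.Trim();
    if (trimmed.Length > 0)
    {
        cleaned.Add(trimmed);
    }
}

string rejoined = string.Join(" | ", cleaned);
Console.WriteLine(rejoined); // apples | pears | bananas
```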

Trimming Strings

The string type supports several trimming methods to make it easy to strip extraneous whitespace and other characters from your values. For example:

string text = "     something with a blank prefix and blank suffix      ";

// This would return "something with a blank prefix and blank suffix".
string trimmed = text.Trim();

// This would return "something with a blank prefix and blank suffix      ".
trimmed = text.TrimStart();

// This would return "thing with a blank prefix and blank suffix      ".
trimmed = text.TrimStart(' ', 's', 'o', 'm', 'e');

// This would return "     something with a blank prefix and blank suffix".
trimmed = text.TrimEnd();

// This would return "     something with a blank prefix and blank".
trimmed = text.TrimEnd(' ', 's', 'u', 'f', 'i', 'x');

Converting to a Character Array

Strings can be converted to an array of Unicode characters using the ToCharArray() method. This may be useful in a number of situations. Here, we’ll specify a set of alternative delimiters as a string value and then convert that to a char array for use with the string.Split() method, etc. For example:

// Support delimiting by comma, semi-colon, pipe, or h-tab.
const string delimiters = ",;|\t";

string text = "one|two|three|four|five,once;I;caught;a;fish;alive";
string[] results = text.Split(delimiters.ToCharArray());

The above code is not as efficient as declaring the delimiters as a char array in the first place, but it is a simple example that demonstrates the process in principle.
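For comparison, here is a sketch of the same split with the delimiters declared as a char array from the outset, which avoids the conversion step.

```csharp
using System;

// Support delimiting by comma, semi-colon, pipe, or h-tab.
char[] delimiters = { ',', ';', '|', '\t' };

string text = "one|two|three|four|five,once;I;caught;a;fish;alive";
string[] results = text.Split(delimiters);

Console.WriteLine(results.Length);  // 11
Console.WriteLine(results[5]);      // once
```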

The StringBuilder Class

Up until now, we’ve focused on the string type, and we explained that it was immutable and what that meant. There are obvious performance impacts of creating completely new string objects each time we modify a variable’s value, as well as the overhead of the runtime having to release and reclaim all that memory.

For situations where the immutable paradigm is not a good fit (e.g. large string values with frequent changes), the .NET Framework offers a specialised StringBuilder class that can be used instead.

The StringBuilder creates mutable objects, but they are managed by the runtime so you still don’t need to worry about memory allocation or release.

We can use the StringBuilder like this:

// Build a sentence (StringBuilder lives in the System.Text namespace)...
StringBuilder builder = new StringBuilder();
builder.Append("What ");
builder.Append("the ");
builder.Append("heart ");
builder.Append("wants");
builder.Append(", ");
builder.Append("the ");
builder.Append("heart ");
builder.Append("wants");
builder.Append(".");

// Copy the output to a string...
string text = builder.ToString();

The StringBuilder will automatically resize as data is appended to it. By default, it will initialise with a small amount of initial storage. As it reaches capacity, it will automatically request and reserve more space, and this happens in blocks.

When instantiating the class, you can pre-initialise StringBuilder to an initial capacity via the constructor’s parameters, removing the need for, and performance overhead of, multiple requests for capacity increases.

Unless a capacity is requested, the class will initially reserve 16 characters worth of memory (i.e. 32 bytes). As capacity is needed the reservation is doubled, so that the total size of the memory footprint might grow like this: 16, 32, 64, 128, 256, 512, 1024, etc. Knowing this is useful when setting the initial capacity when instantiating the object as you can aim for one of the typical capacity thresholds.
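One way to see how the reservation grows on your own runtime is to watch the Capacity property while appending. This is only a sketch: the exact growth steps are an implementation detail and may differ between .NET versions, so no expected output is shown.

```csharp
using System;
using System.Text;

StringBuilder builder = new StringBuilder();
int lastCapacity = builder.Capacity;
Console.WriteLine("Initial capacity: " + lastCapacity);

for (int i = 0; i < 200; i++)
{
    builder.Append('x');
    if (builder.Capacity != lastCapacity)
    {
        // The capacity only changes when the current reservation is exhausted.
        lastCapacity = builder.Capacity;
        Console.WriteLine("Length " + builder.Length + ", capacity now " + lastCapacity);
    }
}
```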

For example:

// Instantiate the variable, immediately reserving ~4k (characters) worth of storage.
StringBuilder builder = new StringBuilder(4096);

The class can also be instantiated with an initial value; and, optionally, a starting capacity too.

StringBuilder builder = new StringBuilder("What do we want with a light?");
builder.AppendLine();
builder.AppendLine("In the dark we can manage alright!");

// Or (because the above code is likely to have multiple capacity requests)...

builder = new StringBuilder("What do we want with a light?", 100);
builder.AppendLine();
builder.AppendLine("In the dark we can manage alright!");

StringBuilder has a few of the same methods as string, such as Insert(), Replace(), and Remove(). As you have seen, it also contains an Append() method and an AppendLine() method, and it includes an AppendFormat() method too. This can be used like string.Format(), where you specify a formatting string with placeholders and the arguments to replace them with.
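A short sketch of AppendFormat() in use follows. The readings are made-up values, and the invariant culture is passed explicitly (an assumption made here) so the number formatting is the same on every machine.

```csharp
using System;
using System.Globalization;
using System.Text;

StringBuilder report = new StringBuilder();

// The values below are made up for the example.
report.AppendFormat(CultureInfo.InvariantCulture, "Temperature: {0:F1}C", 18.5);
report.AppendLine();
report.AppendFormat(CultureInfo.InvariantCulture, "Wind: {0}mph from {1} degrees", 12, 270);

// First line:  Temperature: 18.5C
// Second line: Wind: 12mph from 270 degrees
Console.WriteLine(report.ToString());
```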

You can pre-size the object before modifying it using the EnsureCapacity() method. Similar to initialising a capacity in the constructor, this resizes the existing object’s capacity as necessary. For example:

StringBuilder twoHundredLines = new StringBuilder(1024);
for (int i = 0; i < 100; i++)
{
    twoHundredLines.Append("Line ");
    twoHundredLines.AppendLine((i + 1).ToString()); 
}

StringBuilder oneHundredLines = new StringBuilder(1024);
for (int i = 0; i < 100; i++)
{
    oneHundredLines.Append("Line ");
    oneHundredLines.AppendLine((i + 101).ToString()); 
}

// Pre-size to reduce the number of memory capacity requests that are required.
twoHundredLines.EnsureCapacity(twoHundredLines.Length + oneHundredLines.Length);
twoHundredLines.Append(oneHundredLines);

String Performance Considerations

It is probably fairly easy to infer from the tutorial so far that, while it is very flexible and powerful, the immutable string type has the potential to be a performance and memory hog.

Whenever you are programming complex or repetitive operations with string data, take a moment to decide if the string data type is best for the scenario or if StringBuilder might be more appropriate.

A common piece of advice is to avoid declaring string variables inside loop constructs, on the assumption that each iteration creates a separate variable instance that is not released until the loop is exited. In C# this is not really the case: a local variable declared inside the loop body compiles to much the same code as one declared outside it, and either way each iteration creates a new string object. The two forms below therefore behave the same, and the choice is mainly about scope and readability. For example:

// Variable declared inside the loop (scoped to a single iteration).
foreach (Account account in Accounts)
{
    string summary = "Name: " + account.HolderName + ", Account No. " + 
        account.AccountNumber.ToString().PadLeft(8, '0') + ", Balance: " +
        account.Balance.ToString("F2");
    Console.WriteLine(summary);
}

// Variable declared outside the loop (still in scope afterwards).
string summary;
foreach (Account account in Accounts)
{
    summary = "Name: " + account.HolderName + ", Account No. " + 
        account.AccountNumber.ToString().PadLeft(8, '0') + ", Balance: " +
        account.Balance.ToString("F2");
    Console.WriteLine(summary);
}

In both cases a new string object is created on every iteration, and the string from the previous iteration is no longer referenced once the variable is reassigned, so the garbage collector can clean it up if it wants/needs to. The real savings come from reducing the number of intermediate strings that are created, for example by using a single StringBuilder when text is accumulated across iterations.
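Where the text from every iteration is needed as one value, a single StringBuilder can collect it without building a progressively larger string each time. The sketch below assumes some made-up account data (held in tuples rather than a real Account type) purely for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical account data, used only for illustration.
var accounts = new[]
{
    (HolderName: "A. Smith", AccountNumber: 1234, Balance: 100.50m),
    (HolderName: "B. Jones", AccountNumber: 98765, Balance: 2500.00m)
};

StringBuilder report = new StringBuilder();
foreach (var account in accounts)
{
    // Each Append() writes into the builder's buffer instead of creating a new string.
    report.Append("Name: ").Append(account.HolderName)
          .Append(", Account No. ").Append(account.AccountNumber.ToString().PadLeft(8, '0'))
          .Append(", Balance: ")
          .Append(account.Balance.ToString("F2", CultureInfo.InvariantCulture))
          .AppendLine();
}

Console.Write(report.ToString());
```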

Regular Expressions

Introduction to Regular Expressions

Regular expressions (often shortened to RegEx or Regex) can be thought of as a mini-language in their own right. The concept has been in use for many years in the UNIX operating system, and many other platforms and languages now support regular expressions too.

Most have informally agreed on a common set of terms and rules. There will be minor differences between, say, C#’s implementation and JavaScript’s one, but for the most part expressions can be interchanged between languages (if you take a little care; and, in some scenarios, do a lot of testing!)

A regular expression can be thought of as a set of terms/conditions that can be applied to a string value, almost like masking them.

Matching Expressions

Regular expressions can be used to find matching values in strings. For example, here’s a simple expression for evaluating a person’s full name (as first name and last name only):

string expression = "^[a-zA-Z]+ [a-zA-Z]+$";

What’s going on there?

  • ^. This anchors the evaluation to the start of the string.
  • [a-zA-Z]. This signifies any characters within that range (alphabetic characters in this case).
  • +. Applied after a constraint (such as [a-z]), this requires one or more matches.
  • [a-zA-Z]+. Combining the previous two parts, we require one or more alphabetic characters.
  • ' ' (space). A space between the two name parts.
  • $. This anchors the evaluation to the end of the string.

If we used this expression to test against “John Smith”, “P Diddy”, “Mr McG”, “Mr T”, “Jeffery Stewart”, etc. it would find a match.

If we tested the following names with the expression, they would all fail: “Madonna”, “Joe ‘The Killer’ Murphy”, “Simon D Graham”, “S P Harvey”, etc.

How can we use this expression to test our strings, and find matches? Let’s find out…

Regex tester = new Regex("^[a-zA-Z]+ [a-zA-Z]+$");
string name1 = "John Smith";
string name2 = "Dave";
string name3 = "Fiona Knowles";
string name4 = "J R Hartley";

bool outcome = tester.IsMatch(name1);  // Returns true (match)
outcome = tester.IsMatch(name2);       // Returns false (no match)
outcome = tester.IsMatch(name3);       // Returns true (match)
outcome = tester.IsMatch(name4);       // Returns false (no match)

That is just the start of what we can do with regular expressions though. Regex is much more powerful as we’ll now find out.

Extracting Data Using Expressions

The Regex class can also be used to find and extract content from a string using expression matching. This means we can specify a pattern like we did above for a ‘full name’, and then we could look for matches on that name. For example, let’s write an expression that identifies words from a passage and then use it to extract those words:

Regex extractor = new Regex("[a-zA-Z]{2,}");
string passage = "Well, Prince, so Genoa and Lucca are now just family estates " +
    "of the Buonapartes. But I warn you, if you don't tell me that this means " + 
    "war, if you still try to defend the infamies and horrors perpetrated by " + 
    "that Antichrist — I really believe he is Antichrist — I will have nothing " +
    "more to do with you and you are no longer my friend, no longer my " +
    "'faithful slave', as you call yourself! But how do you do? I see I " + 
    "have frightened you — sit down and tell me all the news.";

MatchCollection matches = extractor.Matches(passage);

foreach (Match word in matches)
{
    // Match.ToString() is overridden in the class to return the matched text.
    Console.WriteLine(word.ToString()); 
}

You probably realise the [a-zA-Z] portion indicates any alphabetic character, but you might be wondering what the {2,} portion is all about. Well, it’s a minimum size with no maximum size. If a number had been included after the comma that would constrain the maximum length of a match. Since we are treating a word as 2 letters or more, we only need to constrain the minimum (which also means single-letter words such as “I” and “a” are skipped).

You may also have noticed we’ve dropped the ^ and $ parts. This is because we want to detect words anywhere in the passage, not anchored to the start and end of the value.

What is interesting to note here is what will be output to the console. It would be something like…

Well
Prince
so
Genoa
and
Lucca
are
now
... etc.

The punctuation will be ignored. Also, we might have some broken words too. For example, since don't is a contraction (of do not), and we aren’t catering for that, it will be matched as simply don and not the full word.
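One possible way to keep contractions whole is to allow an optional apostrophe followed by more letters. The pattern below is a sketch of that idea rather than a complete word matcher, and unlike the earlier expression it also matches single-letter words such as "I".

```csharp
using System;
using System.Text.RegularExpressions;

// One or more letters, optionally followed by an apostrophe and more letters.
Regex extractor = new Regex(@"\b[a-zA-Z]+(?:'[a-zA-Z]+)?\b");
string sample = "But I warn you, if you don't tell me";

foreach (Match word in extractor.Matches(sample))
{
    // Outputs: But, I, warn, you, if, you, don't, tell, me (one per line).
    Console.WriteLine(word.Value);
}
```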

Replacing Data Using Expressions

Instead of just being able to find and extract data within strings, the Regex class can also manipulate the data. For example, thinking about our word finder/extractor code above, we could turn it on its head and match on punctuation. That way we can remove all punctuation from the passage. Let’s give it a go…

Regex replacer = new Regex("[,.?!]+");
string passage = "Well, Prince, so Genoa and Lucca are now just family estates " +
    "of the Buonapartes. But I warn you, if you don't tell me that this means " + 
    "war, if you still try to defend the infamies and horrors perpetrated by " + 
    "that Antichrist — I really believe he is Antichrist — I will have nothing " +
    "more to do with you and you are no longer my friend, no longer my " +
    "'faithful slave', as you call yourself! But how do you do? I see I " + 
    "have frightened you — sit down and tell me all the news.";

passage = replacer.Replace(passage, "");

// The passage would now be:
//    Well Prince so Genoa and Lucca are now just family estates 
//    of the Buonapartes But I warn you if you don't tell me that this means
//    war if you still try to defend the infamies and horrors perpetrated by 
//    that Antichrist — I really believe he is Antichrist — I will have nothing 
//    more to do with you and you are no longer my friend no longer my 
//    'faithful slave' as you call yourself But how do you do I see I 
//    have frightened you — sit down and tell me all the news

Customising Matching With Options

The .NET Framework allows configuration options to be passed to the Regex class using the RegexOptions enumerated type. This enumeration is bit-masked so you can combine the options to customise how the Regex class behaves. The following options are available:

  • IgnoreCase. Specifies case-insensitive matching.
  • Multiline. Changes the behaviour of ^ and $ so they match at the beginning/end of any line.
  • Compiled. Specifies the expression is compiled to MSIL instead of being interpreted.
  • Singleline. Changes the behaviour of . (dot) so it matches every char (instead of omitting \n).
  • IgnorePatternWhitespace. Eliminates unescaped white space from the pattern and enables comments marked with #.
  • RightToLeft. Specifies that the search will be from right to left instead of from left to right.
  • ECMAScript. Enables ECMAScript-compliant behavior for the expression.
  • CultureInvariant. Specifies that cultural differences in language are ignored.
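Because the enumeration is bit-masked, options are combined with the bitwise OR operator (|). The following sketch, using made-up sample text, combines IgnoreCase and Multiline and compares the result with the default behaviour.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Line one\nline two\nLINE three";

// Options are combined with the bitwise OR operator.
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Multiline;
Regex lineStart = new Regex("^line", options);

Console.WriteLine(lineStart.Matches(text).Count);            // 3
Console.WriteLine(new Regex("^line").Matches(text).Count);   // 0
```

Without the options, ^ only matches at the very start of the value and the match is case-sensitive, so "Line" is not found.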

If we revisit our name matching expression, we could re-do it like this:

// The options are supplied when the Regex object is created.
RegexOptions options = RegexOptions.IgnoreCase;
Regex tester = new Regex("^[a-z]+ [a-z]+$", options);

string name1 = "John Smith";
string name2 = "Dave";

bool outcome = tester.IsMatch(name1);  // Returns true (match).
outcome = tester.IsMatch(name2);       // Returns false (no match).