A regular expression is a pattern containing specific characters to search for sub-strings in strings. The patterns are at first glance hard to understand but with a little bit of practice very useful.

Imagine you want to extract phone numbers from a text file or a worksheet, however the issue here is that there is also text combined with phone numbers and that makes it hard to filter out. Here a regular expression is coming to rescue.

Looking for 555-412395 in a cell range is easy in excel but looking for phone numbers with 3 digits then a hyphen and then 6 more digits is a lot harder. Each digit can be anything between 0 and 9.

You will find a guide later in this post to most of the characters you can use in regular expressions and how to solve the problem with phone numbers.

What do I need to do to use regular expressions?

Nothing, you don't need to reference anything in VB Editor \ Tools for this to work. I will now demonstrate a simple UDF that counts matching strings in a cell range.


Function CountStringsCellRange(c As Range, pttrn As String)
Rng = c.Value
regexpattern = pttrn
With CreateObject("vbscript.regexp")
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = regexpattern
If c.Cells.CountLarge > 1 Then
    For rr = LBound(Rng, 1) To UBound(Rng, 1)
        For cc = LBound(Rng, 2) To UBound(Rng, 2)
            Set Results = .Execute(Rng(rr, cc))
            i = i + Results.Count
        Next cc
    Next rr
    CountStringsCellRange = i
Else
    Set Results = .Execute(Rng)
    CountStringsCellRange = Results.Count
End If
End With
End Function

 

  1. Copy UDF above
  2. Start/Open Excel
  3. Go to VB Editor (Alt+F11)
  4. Click "Insert" on the menu at the top
  5. Click "Module"
  6. Paste code to code module

How do I start building regular expressions?

I will use the UDF above to show you how to work with patterns using specific characters. It is important to know which characters you can use and what they do.

Search is case sensitive 

AbC - matches this exact string. A upper case letter A, a smaller case letter b and a upper case letter C

A range of letters

A hyphen allows you to search for a range of characters or numbers. [] brackets allow you to search for exactly one character.

[A-Z] finds a single upper case letter between A to Z.

[A-Z][a-z] matches a single uppercase letter between A to Z and then a lower case letter between a to z.

Match at least one or more characters

A+ matches a single A and multiple A's.

A+ matches A and AA and AAA but not B, three matches in total displayed in cell C9 in picture above.

Match 0 or one character before it

A? matches an empty string or one A.

Match 0 or more characters

A* matches an empty string or one or more A.

Match any character except new line

. matches any character

A.A matches AbA and ACA

Multiple matches

Curly brackets allows you to match multiple characters.

[A-Z]{4} is the same as [A-Z][A-Z][A-Z][A-Z] but easier to write. [A-Z]{4} matches 4 upper case letters between A-Z.

ABCD is the only match in string ABCD abcd aBcD to pattern [A-Z]{4}.

NOT operator

^ allows you to exclude certain characters

[^A] matches B, C and D because they are not equal to A.

OR operator

| is an OR operator.

A|C matches A and C in string ABCD.

Escape characters

\ (backslash) escapes characters.

If you are looking for question marks in a string the backslash allows you to escape certain characters that would otherwise have a different functionality.

Match start and end of string

^ matches the start of a string.

$ matches the end of a string.

^ABC matches the three first characters ABC DCE

DCE$ matches the three last characters in string ABC DCE

BC$ does not match ABC DCE because it is not at the end of the string

Group patterns

Parentheses allows you to group characters

W(AB){2} is the same as WABAB

What about phone numbers?

In the beginning of this post I said that regular expressions are good for matching phone numbers.

The phone number I am looking for begins with three digits 0 to 9, the pattern becomes [0-9]{3}.

Next is a hyphen and then 6 more digits between 0 to 9, the pattern becomes -[0-9]{6} and combined [0-9]{3}-[0-9]{6}.

Download excel *.xlsm file

Count matching substrings using reg exp.xlsm