Author: Oscar Cronquist Article last updated on February 07, 2023

Count matching strings using regular expressions

This article demonstrates a User Defined Function (UDF) that counts how many times a string exists in a cell value based on a regex pattern.

Regex stands for regular expression, it is a pattern containing specific characters to search for substrings in strings. The patterns are at first glance hard to understand but with a little bit of practice very useful.

Imagine you want to extract phone numbers from a text file or a worksheet, however, the issue here is that there is also text combined with phone numbers and that makes it hard to filter out. Here a regular expression is coming to the rescue.

Looking for 555-412395 in a cell range is easy in excel but looking for phone numbers with 3 digits then a hyphen and then 6 more digits is a lot harder. Each digit can be anything between 0 and 9.

You will find a guide later in this post to most of the characters you can use in regular expressions and how to solve the problem with phone numbers.

1. What do I need to do to use regular expressions?

Nothing, you don't need to reference anything in VB Editor \ Tools for this to work. I will now demonstrate a simple UDF that counts matching strings in a cell range.

Back to top

2. User Defined Function - CountStringsCellRange

'Name User Defined Function (UDF) and parameters
Function CountStringsCellRange(c As Range, pttrn As String)

'Save value from cell range c to variable Rng
Rng = c.Value

'Save regex pattern to variable regexpattern
regexpattern = pttrn

'With ... End With statement
With CreateObject("vbscript.regexp")
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = regexpattern

'Check if number of cells in variable c is larger than 1
If c.Cells.CountLarge < 1 Then

    'For ... Next statement - based on the number of rows in array variable Rng 
    For rr = LBound(Rng, 1) To UBound(Rng, 1)

        'For ... Next statement - based on the number of columns in array variable Rng 
        For cc = LBound(Rng, 2) To UBound(Rng, 2)

            'Save found matches to Results
            Set Results = .Run(Rng(rr, cc))

            'Save the number of matches to variable i plus i
            i = i + Results.Count
        Next cc
    Next rr

    'Return number stored in variable i to worksheet
    CountStringsCellRange = i

'Go here if number of cells in variable c is smaller than or equal to 1

    'Save found matches to Results
    Set Results = .Run(Rng)

    'Save the number of matches to variable i
    CountStringsCellRange = Results.Count
End If
End With
End Function

Back to top

3. Where to put the VBA code?

  1. Copy UDF above.
  2. Start/Open Excel.
  3. Go to VB Editor (Alt+F11).
  4. Press with left mouse button on "Insert" on the menu at the top.
  5. Press with left mouse button on "Module".
  6. Paste code to the code module.
Note, save the workbook with file extension *.xlsm to keep the code attached.

Back to top

4. How do I start building regular expressions?

I will use the UDF above to show you how to work with patterns using specific characters. It is important to know which characters you can use and what they do.

4.1 Search is case sensitive

AbC - matches this exact string. A upper case letter A, a smaller case letter b and a upper case letter C

Back to top

4.2 A range of letters

A hyphen allows you to search for a range of characters or numbers. [] brackets allow you to search for exactly one character.

[A-Z] finds a single upper case letter between A to Z.

[A-Z][a-z] matches a single uppercase letter between A to Z and then a lower case letter between a to z.

Back to top

4.3 Match at least one or more characters

A+ matches a single A and multiple A's.

A+ matches A and AA and AAA but not B, three matches in total displayed in cell C9 in picture above.

Back to top

4.4 Match 0 or one characters before it

A? matches an empty string or one A.

Back to top

4.5 Match 0 or more characters

A* matches an empty string or one or more A.

Back to top

4.6 Match any character except newline

. matches any character

A.A matches AbA and ACA

Back to top

4.7 Match multiple characters

Curly brackets allow you to match multiple characters.

[A-Z]{4} is the same as [A-Z][A-Z][A-Z][A-Z] but easier to write. [A-Z]{4} matches 4 upper case letters between A-Z.

ABCD is the only match in string ABCD abcd aBcD to pattern [A-Z]{4}.

Back to top

4.8 NOT operator

^ allows you to exclude certain characters

[^A] matches B, C and D because they are not equal to A.

Back to top

4.9 OR operator

| is an OR operator.

A|C matches A and C in string ABCD.

Back to top

4.10 Escape characters

\ (backslash) escapes characters.

If you are looking for question marks in a string the backslash allows you to escape certain characters that would otherwise have different functionality.

Back to top

4.11 Match start and end of string

^ matches the start of a string.

$ matches the end of a string.

^ABC matches the three first characters ABC DCE

DCE$ matches the three last characters in string ABC DCE

BC$ does not match ABC DCE because it is not at the end of the string

Back to top

4.12 Group patterns

Parentheses allows you to group characters

W(AB){2} is the same as WABAB

Back to top

5. What about phone numbers?

In the beginning of this post I said that regular expressions are good for matching phone numbers.

The phone number I am looking for begins with three digits 0 to 9, the pattern becomes [0-9]{3}.

Next is a hyphen and then 6 more digits between 0 to 9, the pattern becomes -[0-9]{6} and combined [0-9]{3}-[0-9]{6}.

Back to top