Count matching strings using regular expressions
This article demonstrates a User Defined Function (UDF) that counts how many times a string exists in a cell value based on a regex pattern.
Regex stands for regular expression, it is a pattern containing specific characters to search for substrings in strings. The patterns are at first glance hard to understand but with a little bit of practice very useful.
Imagine you want to extract phone numbers from a text file or a worksheet, however, the issue here is that there is also text combined with phone numbers and that makes it hard to filter out. Here a regular expression is coming to the rescue.
Looking for 555-412395 in a cell range is easy in excel but looking for phone numbers with 3 digits then a hyphen and then 6 more digits is a lot harder. Each digit can be anything between 0 and 9.
You will find a guide later in this post to most of the characters you can use in regular expressions and how to solve the problem with phone numbers.
Table of Contents
1. What do I need to do to use regular expressions?
Nothing, you don't need to reference anything in VB Editor \ Tools for this to work. I will now demonstrate a simple UDF that counts matching strings in a cell range.
2. User Defined Function - CountStringsCellRange
'Name User Defined Function (UDF) and parameters Function CountStringsCellRange(c As Range, pttrn As String) 'Save value from cell range c to variable Rng Rng = c.Value 'Save regex pattern to variable regexpattern regexpattern = pttrn 'With ... End With statement With CreateObject("vbscript.regexp") .Global = True .MultiLine = True .IgnoreCase = False .Pattern = regexpattern 'Check if number of cells in variable c is larger than 1 If c.Cells.CountLarge < 1 Then 'For ... Next statement - based on the number of rows in array variable Rng For rr = LBound(Rng, 1) To UBound(Rng, 1) 'For ... Next statement - based on the number of columns in array variable Rng For cc = LBound(Rng, 2) To UBound(Rng, 2) 'Save found matches to Results Set Results = .Run(Rng(rr, cc)) 'Save the number of matches to variable i plus i i = i + Results.Count Next cc Next rr 'Return number stored in variable i to worksheet CountStringsCellRange = i 'Go here if number of cells in variable c is smaller than or equal to 1 Else 'Save found matches to Results Set Results = .Run(Rng) 'Save the number of matches to variable i CountStringsCellRange = Results.Count End If End With End Function
3. Where to put the VBA code?
- Copy UDF above.
- Start/Open Excel.
- Go to VB Editor (Alt+F11).
- Press with left mouse button on "Insert" on the menu at the top.
- Press with left mouse button on "Module".
- Paste code to the code module.
4. How do I start building regular expressions?
I will use the UDF above to show you how to work with patterns using specific characters. It is important to know which characters you can use and what they do.
4.1 Search is case sensitive
AbC - matches this exact string. A upper case letter A, a smaller case letter b and a upper case letter C
4.2 A range of letters
A hyphen allows you to search for a range of characters or numbers. [] brackets allow you to search for exactly one character.
[A-Z] finds a single upper case letter between A to Z.
[A-Z][a-z] matches a single uppercase letter between A to Z and then a lower case letter between a to z.
4.3 Match at least one or more characters
A+ matches a single A and multiple A's.
A+ matches A and AA and AAA but not B, three matches in total displayed in cell C9 in picture above.
4.4 Match 0 or one characters before it
A? matches an empty string or one A.
4.5 Match 0 or more characters
A* matches an empty string or one or more A.
4.6 Match any character except newline
. matches any character
A.A matches AbA and ACA
4.7 Match multiple characters
Curly brackets allow you to match multiple characters.
[A-Z]{4} is the same as [A-Z][A-Z][A-Z][A-Z] but easier to write. [A-Z]{4} matches 4 upper case letters between A-Z.
ABCD is the only match in string ABCD abcd aBcD to pattern [A-Z]{4}.
4.8 NOT operator
^ allows you to exclude certain characters
[^A] matches B, C and D because they are not equal to A.
4.9 OR operator
| is an OR operator.
A|C matches A and C in string ABCD.
4.10 Escape characters
\ (backslash) escapes characters.
If you are looking for question marks in a string the backslash allows you to escape certain characters that would otherwise have different functionality.
4.11 Match start and end of string
^ matches the start of a string.
$ matches the end of a string.
^ABC matches the three first characters ABC DCE
DCE$ matches the three last characters in string ABC DCE
BC$ does not match ABC DCE because it is not at the end of the string
4.12 Group patterns
Parentheses allows you to group characters
W(AB){2} is the same as WABAB
5. What about phone numbers?
In the beginning of this post I said that regular expressions are good for matching phone numbers.
The phone number I am looking for begins with three digits 0 to 9, the pattern becomes [0-9]{3}.
Next is a hyphen and then 6 more digits between 0 to 9, the pattern becomes -[0-9]{6} and combined [0-9]{3}-[0-9]{6}.
Regular expressions category
This article demonstrates a User Defined Function that allows you to extract cell references from a formula in a given […]
Picture above shows cheap chinese IoT modules: a temp sensor and an esp8266-01 (25 mm x 15 mm) Thingspeak […]
The LIKE operator allows you to match a string to a pattern using Excel VBA. The image above demonstrates a […]
Excel categories
One Response to “Count matching strings using regular expressions”
Leave a Reply
How to comment
How to add a formula to your comment
<code>Insert your formula here.</code>
Convert less than and larger than signs
Use html character entities instead of less than and larger than signs.
< becomes < and > becomes >
How to add VBA code to your comment
[vb 1="vbnet" language=","]
Put your VBA code here.
[/vb]
How to add a picture to your comment:
Upload picture to postimage.org or imgur
Paste image link to your comment.
[…] https://www.get-digital-help.com/2017/05/24/count-matching-strings-using-regular-expressions/ […]