Count matching strings using regular expressions
A regular expression is a pattern containing specific characters to search for sub-strings in strings. The patterns are at first glance hard to understand but with a little bit of practice very useful.
Imagine you want to extract phone numbers from a text file or a worksheet, however the issue here is that there is also text combined with phone numbers and that makes it hard to filter out. Here a regular expression is coming to rescue.
Looking for 555-412395 in a cell range is easy in excel but looking for phone numbers with 3 digits then a hyphen and then 6 more digits is a lot harder. Each digit can be anything between 0 and 9.
You will find a guide later in this post to most of the characters you can use in regular expressions and how to solve the problem with phone numbers.
What do I need to do to use regular expressions?
Nothing, you don't need to reference anything in VB Editor \ Tools for this to work. I will now demonstrate a simple UDF that counts matching strings in a cell range.
Function CountStringsCellRange(c As Range, pttrn As String) Rng = c.Value regexpattern = pttrn With CreateObject("vbscript.regexp") .Global = True .MultiLine = True .IgnoreCase = False .Pattern = regexpattern If c.Cells.CountLarge > 1 Then For rr = LBound(Rng, 1) To UBound(Rng, 1) For cc = LBound(Rng, 2) To UBound(Rng, 2) Set Results = .Execute(Rng(rr, cc)) i = i + Results.Count Next cc Next rr CountStringsCellRange = i Else Set Results = .Execute(Rng) CountStringsCellRange = Results.Count End If End With End Function
- Copy UDF above
- Start/Open Excel
- Go to VB Editor (Alt+F11)
- Click "Insert" on the menu at the top
- Click "Module"
- Paste code to code module
How do I start building regular expressions?
I will use the UDF above to show you how to work with patterns using specific characters. It is important to know which characters you can use and what they do.
Search is case sensitive
AbC - matches this exact string. A upper case letter A, a smaller case letter b and a upper case letter C
A range of letters
A hyphen allows you to search for a range of characters or numbers. [] brackets allow you to search for exactly one character.
[A-Z] finds a single upper case letter between A to Z.
[A-Z][a-z] matches a single uppercase letter between A to Z and then a lower case letter between a to z.
Match at least one or more characters
A+ matches a single A and multiple A's.
A+ matches A and AA and AAA but not B, three matches in total displayed in cell C9 in picture above.
Match 0 or one character before it
A? matches an empty string or one A.
Match 0 or more characters
A* matches an empty string or one or more A.
Match any character except new line
. matches any character
A.A matches AbA and ACA
Multiple matches
Curly brackets allows you to match multiple characters.
[A-Z]{4} is the same as [A-Z][A-Z][A-Z][A-Z] but easier to write. [A-Z]{4} matches 4 upper case letters between A-Z.
ABCD is the only match in string ABCD abcd aBcD to pattern [A-Z]{4}.
NOT operator
^ allows you to exclude certain characters
[^A] matches B, C and D because they are not equal to A.
OR operator
| is an OR operator.
A|C matches A and C in string ABCD.
Escape characters
\ (backslash) escapes characters.
If you are looking for question marks in a string the backslash allows you to escape certain characters that would otherwise have a different functionality.
Match start and end of string
^ matches the start of a string.
$ matches the end of a string.
^ABC matches the three first characters ABC DCE
DCE$ matches the three last characters in string ABC DCE
BC$ does not match ABC DCE because it is not at the end of the string
Group patterns
Parentheses allows you to group characters
W(AB){2} is the same as WABAB
What about phone numbers?
In the beginning of this post I said that regular expressions are good for matching phone numbers.
The phone number I am looking for begins with three digits 0 to 9, the pattern becomes [0-9]{3}.
Next is a hyphen and then 6 more digits between 0 to 9, the pattern becomes -[0-9]{6} and combined [0-9]{3}-[0-9]{6}.
Download excel *.xlsm file
Count matching substrings using reg exp.xlsm
Extract cell references from a formula
I am trying to build a regular expression that matches cell references in a formula. A regular expression is a […]
Working with the LIKE OPERATOR
The LIKE operator allows you to match a string to a pattern in excel vba. The following characters are specifically […]
Fetching values from ThingSpeak using vba
Thingspeak is web service that allows you to upload and store data from IoT devices. Picture above shows cheap chinese […]
One Response to “Count matching strings using regular expressions”
Leave a Reply
How to comment
How to add a formula to your comment
<code>Insert your formula here.</code>
Convert less than and larger than signs
Use html character entities instead of less than and larger than signs.
< becomes < and > becomes >
How to add VBA code to your comment
[vb 1="vbnet" language=","]
Put your VBA code here.
[/vb]
How to add a picture to your comment:
Upload picture to postimage.org or imgur
Paste image link to your comment.
[…] https://www.get-digital-help.com/2017/05/24/count-matching-strings-using-regular-expressions/ […]