Filter strings containing a given substring in a cell range
This post describes ways to extract all matching strings from cells in a given cell range if they contain a specific string. The first one is an Excel 365 LAMBDA function and the second one is a User Defined Function (UDF).
The strings in a cell are separated by a delimiter that you can choose, the examples here use a space character to split a cell value. This is done to all cells specified in the argument. Strings that contain a specific sub-string are extracted and returned, in this example a @ character is used.
This article also demonstrates how to filter unique distinct strings using a user defined function and an Excel 365 formula, with a blank or space character to split cell values.
What is the difference between a cell value and a string in these examples described above?
A cell value may contain multiple strings, it depends on the delimiting characters. For example, consider this string "Hello world". If you use a space character as a delimiting character the strings are "Hello" and "world", however, if the delimiting character is "l" then the strings are "He", "o wor", and "d".
Table of contents
1. Filter words containing a given string in a cell range - Excel 365 LAMBDA function
The cells in the specified range contain strings separated by a delimiter of your choice, with space being used as an example here to split strings in a cell value. This operation is performed on all cells mentioned in the argument.
The objective is to extract and return strings that contain a specific substring, denoted by the "@" character in this example.
Excel 365 LAMBDA function in cell B3:
Change @ in the formula above to whatever search string you want to find. Also change " " in TEXTSPLIT(b,," ") to change the delimiter if you don't want to use the space character. Perhaps you want to split strings by a comma, semicolon or a new row. Use char(10) to find new rows.
Explaining formula
Step 1 -
2. Filter words containing a given string in a cell range - UDF
The image above demonstrates a User Defined Function that extracts all words containing a given string that you can specify. In this case it is a @ sign. A User Defined Function is a custom function that you can build yourself in the visual basic editor.
Example, cell range B1:M50 contains random sentences, I have inserted some random emails in this range, see image above.
Array formula in cell range B3:B7:
To enter an array formula, type the formula in a cell then press and hold CTRL + SHIFT simultaneously, now press Enter once. Release all keys.
The formula bar now shows the formula with a beginning and ending curly bracket telling you that you entered the formula successfully. Don't enter the curly brackets yourself.
VBA code
'Name function Function FilterWords(rng As Range, str As String) As Variant() 'Declare variables Dim x As Variant, Wrds() As Variant, Cells_row As Long Dim Cells_col As Long, Words As Long, y() As Variant 'Redimension variable ReDim y(0) 'Save values in range to array variable Wrds = rng.Value 'Iterate through array variable For Cells_row = LBound(Wrds, 1) To UBound(Wrds, 1) For Cells_col = LBound(Wrds, 2) To UBound(Wrds, 2) 'Extract words in cell to an array x = Split(Wrds(Cells_row, Cells_col)) 'Iterate through word array For Words = LBound(x) To UBound(x) 'Check if value in array is equal to the given string If InStr(x(Words), str) Then 'Save value to another array y(UBound(y)) = x(Words) 'Increase containers in array by 1 ReDim Preserve y(UBound(y) + 1) End If Next Words Next Cells_col Next Cells_row 'Decrease containers in array by 1 ReDim Preserve y(UBound(y) - 1) 'Return array FilterWords = Application.Transpose(y) End Function
Where to do I copy the code?
- Press Alt-F11 to open visual basic editor
- Press with left mouse button on Module on the Insert menu
- Copy and paste the user defined function to module
- Exit visual basic editor
3. Filter unique distinct strings from a cell range - Excel 365
Excel 365 dynamic array formula in cell D3:
Explaining formula in cell D3
Step 1 - Split cell values based on a delimiting space character
The TEXTSPLIT function splits a string into an array based on delimiting values.
Function syntax: TEXTSPLIT(Input_Text, col_delimiter, [row_delimiter], [Ignore_Empty])
TEXTSPLIT(x, , " ", 1)
Step 2 - Stack arrays vertically
The VSTACK function combines cell ranges or arrays. Joins data to the first blank cell at the bottom of a cell range or array (vertical stacking)
Function syntax: VSTACK(array1,[array2],...)
VSTACK(TEXTSPLIT(x, , " ", 1), TEXTSPLIT(y,, " ", 1))
Step 3 - A LAMBDA function is required with the REDUCE function
The LAMBDA function build custom functions without VBA, macros or javascript.
Function syntax: LAMBDA([parameter1, parameter2, …,] calculation)
LAMBDA(x, y, VSTACK(TEXTSPLIT(x, , " ", 1), TEXTSPLIT(y,, " ", 1)))
Step 4 - Send cell values to LAMBDA function
The REDUCE function shrinks an array to an accumulated value, a LAMBDA function is needed to properly accumulate each value in order to return a total.
Function syntax: REDUCE([initial_value], array, lambda(accumulator, value))
REDUCE(, B3:B15, LAMBDA(x, y, VSTACK(TEXTSPLIT(x, , " ", 1), TEXTSPLIT(y,, " ", 1))))
Step 5 - Extract unique distinct strings
The UNIQUE function returns a unique or unique distinct list.
Function syntax: UNIQUE(array,[by_col],[exactly_once])
UNIQUE(REDUCE(, B3:B15, LAMBDA(x, y, VSTACK(TEXTSPLIT(x, , " ", 1), TEXTSPLIT(y,, " ", 1)))))
4. Filter unique distinct strings from a cell range - UDF
This section describes how to create a list of unique distinct strings from a cell range. Unique distinct words are all strings but duplicate strings are only listed once.
Cell range A2:A14 contains values, see the picture above. Values are split into strings using the space character as a delimiter. For example, the value in cell A2 is "3M - Asia". The value becomes the following strings: "3M", "-", and "Asia". Three strings in total.
Now consider all values in cell range A2:A14, some strings will be duplicates and some not.
Rick Rothstein (MVP - Excel) helped me out here with a powerful user defined function (udf).
Array formula in cell B2:B23
=ListOfWords($A$2:$A$18, TRUE) + CTRL + SHIFT + ENTER
Select all the cells to be filled, then type the above formula into the Formula Bar and press CTRL+SHIFT+ENTER
Array formula in cell C2:C23
=ListOfWords($A$2:$A$18, FALSE) + CTRL + SHIFT + ENTER
Select all the cells to be filled, then type the above formula into the Formula Bar and press CTRL+SHIFT+ENTER
User defined function
Instructions
- You can select far more cells to load the formulas in than are required by the list. The empty text string will be displayed for cells not having an entry.
- You can specify a larger range than the there are filled in cells as the argument to these macros to allow for future entries in the column.
- You can specify whether the listing is to be case sensitive or not via the optional second argument with the default value being FALSE, meaning duplicated entries with different casing like One, one, ONE, onE, etc.. will all be treated as if they were the same word with the same spelling. If you pass TRUE for that optional second argument, then those words would all be treated as if they were different words.
- For all the "Case Insensitive" listing, the words are listed in Proper Case (first letter upper case, remaining letters lower case). The reason being if you had One, one and ONE then there is not reason to prefer one version over another, so I solved the problem by using Proper Case throughout.
VBA Code:
Function ListOfWords(Rng As Range, Optional CaseSensitive As Boolean) As Variant Dim X As Long, Index As Long, List As String, Words() As String, LoW As Variant With WorksheetFunction Words = Split(.Trim(Replace(Join(.Transpose(Rng)), Chr(160), " "))) LoW = Split(Space(.Max(UBound(Words), Application.Caller.Count) + 1)) For X = 0 To UBound(Words) If InStr(1, Chr(1) & List & Chr(1), Chr(1) & Words(X) & Chr(1), 1 - Abs(CaseSensitive)) = 0 Then List = List & Chr(1) & Words(X) If CaseSensitive Then LoW(Index) = Words(X) Else LoW(Index) = StrConv(Words(X), vbProperCase) End If Index = Index + 1 End If Next ListOfWords = .Transpose(LoW) End With End Function
How to copy above code to your workbook
- Press Alt-F11 to open visual basic editor
- Press with left mouse button on Module on the Insert menu
- Copy and paste above user defined function to code module
- Exit visual basic editor
- Select a sheet
- Select a cell range
- Type =ListOfWords($A$2:$A$18, TRUE) into formula bar and press CTRL+SHIFT+ENTER
Get Rick Rothstein´s excel example file
Many thanks to Rick Rothstein (Mvp - Excel)!!
Filter emails category
Question: How to extract email addresses from this sheet? Answer: It depends on how the emails are populated in your worksheet? […]
Excel categories
14 Responses to “Filter strings containing a given substring in a cell range”
Leave a Reply
How to comment
How to add a formula to your comment
<code>Insert your formula here.</code>
Convert less than and larger than signs
Use html character entities instead of less than and larger than signs.
< becomes < and > becomes >
How to add VBA code to your comment
[vb 1="vbnet" language=","]
Put your VBA code here.
[/vb]
How to add a picture to your comment:
Upload picture to postimage.org or imgur
Paste image link to your comment.
Contact Oscar
You can contact me through this contact form
I notice that your UDF requires spaces to delimit the email address. For example, if the email address is surrounded by parentheses or adjacent to punctuation marks, those will remain attached to the returned email addresses. This is because you used a general word parser as the basis for your UDF. I had a robust email address parser function that I wrote awhile ago, so I wrote a front-end function (the UDF) that repeatedly calls it as needed; doing this makes the UDF return only the email addresses no matter what other delimiting characters surround them. Just copy the following two functions into a standard Module, then select a column of cells (more than you think email addresses exist), enter this formula in the Formula Bar (change the range as necessary)...
=FindEmailAddresses(B1:M50)
and then press CTRL+SHIFT+ENTER to commit the array formula. By the way, the GetEmailAddress function below can be used as a stand-alone function by itself... it returns a single email address (the first it finds in the text passed to it). Okay, here are the functions...
Function FindEmailAddresses(Rng As Range) As Variant()
Dim Temp As String, Cell As Range, EM() As Variant
ReDim EM(0)
For Each Cell In Rng
Temp = Cell.Value
Do While InStr(Temp, "@")
EM(UBound(EM)) = GetEmailAddress(Temp)
Temp = Replace(Temp, "@", "", 1, 1)
ReDim Preserve EM(UBound(EM) + 1)
Loop
Next
ReDim Preserve EM(UBound(EM) - 1)
FindEmailAddresses = WorksheetFunction.Transpose(EM)
End Function
Function GetEmailAddress(ByVal S As String) As String
Dim X As Long, AtSign As Long
Dim Locale As String, DomainPart As String
Locale = "[A-Za-z0-9.!#$%&'*/=?^_`{|}~+-]"
Domain = "[A-Za-z0-9._-]"
AtSign = InStr(S, "@")
For X = AtSign To 1 Step -1
If Not Mid(" " & S, X, 1) Like Locale Then
S = Mid(S, X)
If Left(S, 1) = "." Then S = Mid(S, 2)
Exit For
End If
Next
AtSign = InStr(S, "@")
For X = AtSign + 1 To Len(S) + 1
If Not Mid(S & " ", X, 1) Like Domain Then
S = Left(S, X - 1)
If Right(S, 1) = "." Then S = Left(S, Len(S) - 1)
GetEmailAddress = S
Exit For
End If
Next
End Function
Rick Rothstein (MVP - Excel),
Thank you for your valuable comment! I tried your functions and they work as you described!
Hi,
I followed all these steps and ended up with the first word of the first cell of the selected column. Any suggestions to extract all of the unique words? The column I want to extract unique words from has approx. 225,000 rows.
Thanks,
Diana
Diana Bubser,
I am using the example in this blog post.
How to create an array formula.
1. Select cell range B2:B14
2. Type =ListOfWords($A$2:$A$18, FALSE)
3. Press and hold Ctrl + Shift
4. Press Enter once
5. Release all kyes
Greetings,
Are there known limitations to the size of the data in the column that ListofWords can handle? The script works fine when I use it on small columns, but when using it on larger columns, I get #VALUE! errors in my output array. Specifically, Excel reports "A value used in the formula is of the wrong data type." errors when I increase the number of characters past a pre-determined size.
What's odd is that ListofWords runs successfully up until a point. And then if I add an additional character to my list--either in a new row or to an existing word already on my list--the #VALUE! occurs.
I can successfully filter a 3361-row column that consists of 4464 words, 849 of which are unique, so I don't think I'm testing the limits of the Excel memory. I also tested the VBA code in multiple versions of Excel on different computers, all with the same result. Is there a way of tweaking the VBA code to avoid these constraints?
Thanks,
Tom
Tom,
Get the Excel *.xls file
Tom.xls
Excuse me, but are there any limitations towards special characters? (-/_=). I have several cells that contain words "merged together", for example, "Asia-Pacific", and I'm getting the #VALUE! response Tom is mentioning in his post.
These are the great UDF's
Thank you so much
I was trying to modify for the same to get the unique list by cells ( not by words)
Can you please help me out
Chinna,
https://www.get-digital-help.com/how-to-extract-a-unique-list-and-the-duplicates-in-excel-from-one-column/#vba
I'm having the same problem with the limitation, after a certain amount of words everything turns to #VALUE. Has anyone found a solution to this?
Jose,
Transpose function has a limit of 65536 array values.
https://stackoverflow.com/questions/20055784/best-workaround-for-vba-transpose-array-length-limit
I receive an "Ambiguous name detected: ListOfWords" error in VB Any suggestions?
Thomas,
You have two "user defined functions" with the same name in your workbook.
Hello,
I am constantly getting #NAME? this error when I want to use the formula.
WHy is this happening?