Filter unique distinct words from a cell range in excel (udf)
Overview
This blog post describes how to create a list of unique distinct words from a cell range. Unique distinct words are all words but duplicate words are only listed once.
Cell range A2:A14 contains words, see picture.
Rick Rothstein (MVP - Excel) helped me out here with a powerful user defined function (udf).
Array formula in cell B2:B23
=ListOfWords($A$2:$A$18, TRUE) + CTRL + SHIFT + ENTER
Select all the cells to be filled, then type the above formula into the Formula Bar and press CTRL+SHIFT+ENTER
Array formula in cell C2:C23
=ListOfWords($A$2:$A$18, FALSE) + CTRL + SHIFT + ENTER
Select all the cells to be filled, then type the above formula into the Formula Bar and press CTRL+SHIFT+ENTER
User defined function
Instructions
- You can select far more cells to load the formulas in than are required by the list. The empty text string will be displayed for cells not having an entry.
- You can specify a larger range than the there are filled in cells as the argument to these macros to allow for future entries in the column.
- You can specify whether the listing is to be case sensitive or not via the optional second argument with the default value being FALSE, meaning duplicated entries with different casing like One, one, ONE, onE, etc.. will all be treated as if they were the same word with the same spelling. If you pass TRUE for that optional second argument, then those words would all be treated as if they were different words.
- For all the "Case Insensitive" listing, the words are listed in Proper Case (first letter upper case, remaining letters lower case). The reason being if you had One, one and ONE then there is not reason to prefer one version over another, so I solved the problem by using Proper Case throughout.
VBA Code:
Function ListOfWords(Rng As Range, Optional CaseSensitive As Boolean) As Variant
Dim X As Long, Index As Long, List As String, Words() As String, LoW As Variant
With WorksheetFunction
Words = Split(.Trim(Replace(Join(.Transpose(Rng)), Chr(160), " ")))
LoW = Split(Space(.Max(UBound(Words), Application.Caller.Count) + 1))
For X = 0 To UBound(Words)
If InStr(1, Chr(1) & List & Chr(1), Chr(1) & Words(X) & Chr(1), 1 - Abs(CaseSensitive)) = 0 Then
List = List & Chr(1) & Words(X)
If CaseSensitive Then
LoW(Index) = Words(X)
Else
LoW(Index) = StrConv(Words(X), vbProperCase)
End If
Index = Index + 1
End If
Next
ListOfWords = .Transpose(LoW)
End With
End FunctionHow to implement user defined function in excel
- Press Alt-F11 to open visual basic editor
- Click Module on the Insert menu
- Copy and paste the above user defined function
- Exit visual basic editor
- Select a sheet
- Select a cell range
- Type =ListOfWords($A$2:$A$18, TRUE) into formula bar and press CTRL+SHIFT+ENTER
Download excel example file
Unique distinct words.xls
(Excel 97-2003 Workbook *.xls)
Download Rick Rothstein´s excel example file
Word Listing Code.xls
(Excel 97-2003 Workbook *.xls)
Many thanks to Rick Rothstein (Mvp - Excel)!!









January 5th, 2012 at 9:04 pm
Hi,
I followed all these steps and ended up with the first word of the first cell of the selected column. Any suggestions to extract all of the unique words? The column I want to extract unique words from has approx. 225,000 rows.
Thanks,
Diana
January 10th, 2012 at 10:47 am
Diana Bubser,
I am using the example in this blog post.
How to create an array formula.
1. Select cell range B2:B14
2. Type =ListOfWords($A$2:$A$18, FALSE)
3. Press and hold Ctrl + Shift
4. Press Enter once
5. Release all kyes
January 31st, 2012 at 3:40 pm
Greetings,
Are there known limitations to the size of the data in the column that ListofWords can handle? The script works fine when I use it on small columns, but when using it on larger columns, I get #VALUE! errors in my output array. Specifically, Excel reports "A value used in the formula is of the wrong data type." errors when I increase the number of characters past a pre-determined size.
What's odd is that ListofWords runs successfully up until a point. And then if I add an additional character to my list--either in a new row or to an existing word already on my list--the #VALUE! occurs.
I can successfully filter a 3361-row column that consists of 4464 words, 849 of which are unique, so I don't think I'm testing the limits of the Excel memory. I also tested the VBA code in multiple versions of Excel on different computers, all with the same result. Is there a way of tweaking the VBA code to avoid these constraints?
Thanks,
Tom
February 3rd, 2012 at 10:35 am
Tom,
Download *.xls file
Tom.xls