Excel udf: Remove duplicates from a large dataset
I have a sheet with 3000 rows of invoice dates that are out of order. There could be a maximum of 6 dates and a minimum of 1 in each row. The invoice dates are in row A1:F1 and I would like the ordered, unique distinct list to be in G1:L1, and also display a blank once the data runs out. How do I alter your formula to accomplish this?
Answer:
Using the array formula in post: How to extract a unique distinct list from a column in excel is too slow, in my opinon.
Advanced filter does not remove the blanks or work with a cell range containing multiple columns.
I found some useful vba code here: http://www.j-walk.com/ss/excel/tips/tip47.htm and created this simple udf:
Public Function UniqueValues(rng As Variant) As Variant Dim Test As New Collection Dim Value As Variant Dim Item As Variant Dim temp() As Variant ReDim temp(0) rng = rng.Value On Error Resume Next For Each Value In rng If Len(Value) > 0 Then Test.Add Value, CStr(Value) Next Value On Error GoTo 0 For Each Item In Test temp(UBound(temp)) = Item ReDim Preserve temp(UBound(temp) + 1) Next Item UniqueValues = Application.Transpose(temp) End Function
This udf doesn´t sort the values or display blanks once the data runs out.
How to sort the values (Excel 2007):
- Select values
- Copy values (Ctrl + C)
- Right click on an empty column and select "Paste Special.."
- Click "Values"
- Click OK!
- Click "Data" tab on the ribbon
- Click "Sort smallest to largest" button
How to implement user defined function in excel
- Press Alt-F11 to open visual basic editor
- Click Module on the Insert menu
- Copy and paste the above user defined function
- Exit visual basic editor
- Select a sheet
- Select a cell range
- Type =UniqueValues($A$1:$F$3000) into formula bar and press CTRL+SHIFT+ENTER
Download excel sample file for this tutorial.
unique distinct dates large data set.xls (1,1 MB)
(Excel 97-2003 Workbook *.xls)
Related posts:
Excel udf:Filter duplicates in a large dataset in excel
Excel udf: Count unique distinct values in a large dataset
Count unique distinct values in a large dataset with a date criterion


















If you slightly adjust the second loop as below you can get rid of the last entry to temp() which always ends up empty and UDF prints it to spreadsheet as 0.
Dim i As Integer
For i = 1 To Test.Count
temp(UBound(temp)) = Test(i)
If i < Test.Count Then ReDim Preserve temp(UBound(temp) + 1)
Next i
also, following should allow you to pad any value you like to "left-over" cells in the range function is entered to. If you do not provide the second parameter then it should behave just the same (pad #N/As)
Public Function UniqueValues(rng As Variant, Optional strEmptyCell As Variant = CVErr(xlErrNA)) As Variant
...
Dim iRows As Integer ' number of rows in the formula entered area
...
iRows = Range(Application.Caller.Address).Rows.Count
...
For i = Test.Count To iRows
ReDim Preserve temp(UBound(temp) + 1)
temp(UBound(temp)) = strEmptyCell
Next i
This loop needs to go as last bit just before:
UniqueValues = Application.Transpose(temp)
Jan,
Thanks for your contribution!
#1 Another option is: ReDim Preserve temp(UBound(temp) - 1 after my last loop.
#2 Thank you for a valuable comment!