## Remove duplicates from a large dataset [UDF]

*Article updated on January 12, 2018*

I have a sheet with 3000 rows of invoice dates that are out of order. There could be a maximum of 6 dates and a minimum of 1 in each row. The invoice dates are in row A1:F1 and I would like the ordered, unique distinct list to be in G1:L1, and also display a blank once the data runs out. How do I alter your formula to accomplish this?

**Answer:**

Using the array formula in post: How to extract a unique distinct list from a column in excel is too slow, in my opinon.

Advanced filter does not remove the blanks or work with a cell range containing multiple columns.

I found some useful vba code here: http://www.j-walk.com/ss/excel/tips/tip47.htm and created this simple udf:

Public Function UniqueValues(rng As Variant) As Variant Dim Test As New Collection Dim Value As Variant Dim Item As Variant Dim temp() As Variant ReDim temp(0) rng = rng.Value On Error Resume Next For Each Value In rng If Len(Value) > 0 Then Test.Add Value, CStr(Value) Next Value On Error GoTo 0 For Each Item In Test temp(UBound(temp)) = Item ReDim Preserve temp(UBound(temp) + 1) Next Item UniqueValues = Application.Transpose(temp) End Function

This udf doesn´t sort the values or display blanks once the data runs out.

**How to sort the values (Excel 2007):**

- Select values
- Copy values (Ctrl + C)
- Right click on an empty column and select "Paste Special.."
- Click "Values"
- Click OK!
- Click "Data" tab on the ribbon
- Click "Sort smallest to largest" button

**How to implement user defined function in excel**

- Press Alt-F11 to open visual basic editor
- Click Module on the Insert menu
- Copy and paste the above user defined function
- Exit visual basic editor
- Select a sheet
- Select a cell range
- Type =UniqueValues($A$1:$F$3000) into formula bar and press CTRL+SHIFT+ENTER

### Download excel sample file for this tutorial.

unique distinct dates large data set.xls (1,1 MB)

(Excel 97-2003 Workbook *.xls)

Extract a list of duplicates from three columns combined using array formula in excel

Question: I have three ranges or lists and I need to extract duplicates from those ranges combined using excel array […]Extract unique distinct values from a filtered table [udf and array formula]

Robert Jr asks: Oscar, I am using the VBA code & FilterUniqueSort array to generate unique lists that drive Selection […]### 3 Responses to “Remove duplicates from a large dataset [UDF]”

### Leave a Reply

**How to add a formula to your comment:**

<code>your formula</code>

Remember to convert less than and larger than signs to html character entities before you post your comment.

**How to add VBA code to your comment:**

[vb 1="vbnet" language=","]

VBA code

[/vb]

**How to add a picture to your comment:**

Upload picture to postimage.org

Add picture link to comment.

**Contact Oscar**

You can contact me through this webpage

If you slightly adjust the second loop as below you can get rid of the last entry to temp() which always ends up empty and UDF prints it to spreadsheet as 0.

Dim i As Integer

For i = 1 To Test.Count

temp(UBound(temp)) = Test(i)

If i < Test.Count Then ReDim Preserve temp(UBound(temp) + 1)

Next i

also, following should allow you to pad any value you like to "left-over" cells in the range function is entered to. If you do not provide the second parameter then it should behave just the same (pad #N/As)

Public Function UniqueValues(rng As Variant, Optional strEmptyCell As Variant = CVErr(xlErrNA)) As Variant

...

Dim iRows As Integer ' number of rows in the formula entered area

...

iRows = Range(Application.Caller.Address).Rows.Count

...

For i = Test.Count To iRows

ReDim Preserve temp(UBound(temp) + 1)

temp(UBound(temp)) = strEmptyCell

Next i

This loop needs to go as last bit just before:

UniqueValues = Application.Transpose(temp)

Jan,

Thanks for your contribution!

#1 Another option is: ReDim Preserve temp(UBound(temp) - 1 after my last loop.

#2 Thank you for a valuable comment!