Overview

This article describes how to create an array formula to filter duplicate values in a column.

extract-a-list-of-duplicates-from-a-column

Excel 2007 array formula in C2:

=INDEX(List1, MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0)) + CTRL + SHIFT + ENTER.

Copy cell C2 and paste it down as far as needed.

Remove #num errors:

=IFERROR(INDEX(List1, MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0)), "") + CTRL + SHIFT + ENTER.

Copy cell C2 and paste it down to D20.

Earlier Excel versions, array formula in C2:

=IF(ISERROR(INDEX(List1, MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0))), "", INDEX(List1, MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0)))

How this array formula works

Step 1 - Show a duplicate value only once

=INDEX(List1, MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0))

COUNTIF(C1:$C$1, List1) contains both a relative and absolute reference (C1:$C$1) to a range.

When you copy a cell reference like this the cell reference expands.

COUNTIF(C1:$C$1, List1) returns this array in cell C2: (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

Step 2 - Filter values in List1 having duplicates

=INDEX(List1, MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0))

COUNTIF(List1, List1) counts the number of cells within a range that meet the given condition. The returning array is (2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1).

COUNTIF(List1, List1)>1 returns this array: (TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE).

IF(COUNTIF(List1, List1)>1, 0, 1) returns (1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0).

Step 3 - Calculate arrays combined

=INDEX(List1, MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0))

COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1)

(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) + (1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0) equals

(1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0)

Step 4 - Identify duplicates

=INDEX(List1, MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0))

MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0) becomes

MATCH(0, (1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0), 0) is 2.

Match returns the relative position of an item in an array that matches a specified value

Step 5 - Return duplicates

=INDEX(List1, MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0))

becomes

=INDEX(List1, 2) is Federer, Roger.

Index returns a value or reference of the cell at the intersection of a particular row and column, in a given range

Final thoughts

When formula in c2 is copied to c3 the refence changes.

Example

The formula in c2: =INDEX(List1, MATCH(0, COUNTIF(C1:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0))

Then copy the formula to C3.

The formula references changes: =INDEX(List1, MATCH(0, COUNTIF(C2:$C$1, List1)+IF(COUNTIF(List1, List1)>1, 0, 1), 0))

Read more about absolute and relative cell references.

This makes it possible to avoid previous cell values (C2) and only calculate remaining values.

Named ranges

List1 (A2:A20)
What is named ranges?

How to implement array formula to your workbook

Change named ranges. If your duplicates list starts at, for example, F3. Change C1:$C$1 in the above formulas to F2:$F$2.

Download excel sample file for this tutorial.
how-to-extract-a-list-of-duplicates-from-a-column-in-excel.xlsx
(Excel 2007 Workbook *.xlsx)

Functions in this article:

MATCH(lookup_value;lookup_array; [match_type]
Returns the relative position of an item in an array that matches a specified value

INDEX(array,row_num,[column_num])
Returns a value or reference of the cell at the intersection of a particular row and column, in a given range

COUNTIF(range,criteria)
Counts the number of cells within a range that meet the given condition

IFERROR(value;value_if_error) Returns value_if_error if expression is an error and the value of the expression itself otherwise

This blog article is one out of six articles on the same subject.