Extract duplicate records
This article describes how to filter duplicate rows with the use of a formula. It is, in fact, an array formula which is demonstrated below. The function that makes this all possible is the COUNTIFS function, introduced in Excel 2007.
The COUNTIFS function evaluates criteria to cells across multiple ranges and counts the number of times all criteria are met.
Array formula in cell A30:
Copy cell A30 and paste it to the right as far as needed. Then copy cells and paste them down as far as needed.
How this formula works in cell A30
Step 1 - Find duplicates
=INDEX($A$2:$D$25, SMALL(IF(COUNTIFS($A$2:$A$25, $A$2:$A$25, $B$2:$B$25, $B$2:$B$25, $C$2:$C$25, $C$2:$C$25, $D$2:$D$25, $D$2:$D$25)>1, ROW($A$2:$A$25)-MIN(ROW($A$2:$A$25))+1), ROW(A1)), COLUMN(A1)
COUNTIFS($A$2:$A$25, $A$2:$A$25, $B$2:$B$25, $B$2:$B$25, $C$2:$C$25, $C$2:$C$25, $D$2:$D$25, $D$2:$D$25)>1
returns this array:
{False, False, True, False, False, False, True, True, True, True, True, True, True, False, True, True, True, True, True, False, True, True, True, True}
Step 2 - Use array to extract row numbers
IF({False, False, True, False, False, False, True, True, True, True, True, True, True, False, True, True, True, True, True, False, True, True, True, True}, ROW($A$2:$A$25)-MIN(ROW($A$2:$A$25))+1)
becomes
IF({False, False, True, False, False, False, True, True, True, True, True, True, True, False, True, True, True, True, True, False, True, True, True, True}, {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24})
and returns this array:
{False, False, 3, False, False, False, 7, 8, 9, 10, 11, 12, 13, False, 15, 16, 17, 18, 19, False, 21, 22, 23, 24}
Step 3 - Return the k-th smallest value
SMALL(IF(COUNTIFS($A$2:$A$25, $A$2:$A$25, $B$2:$B$25, $B$2:$B$25, $C$2:$C$25, $C$2:$C$25, $D$2:$D$25, $D$2:$D$25)>1, ROW($A$2:$A$25)-MIN(ROW($A$2:$A$25))+1), ROW(A1))
becomes
SMALL({False, False, 3, False, False, False, 7, 8, 9, 10, 11, 12, 13, False, 15, 16, 17, 18, 19, False, 21, 22, 23, 24}, ROW(A1))
and returns 3.
Step 4 - Return a value of the cell at the intersection of a particular row and column
=INDEX($A$2:$D$25, SMALL(IF(COUNTIFS($A$2:$A$25, $A$2:$A$25, $B$2:$B$25, $B$2:$B$25, $C$2:$C$25, $C$2:$C$25, $D$2:$D$25, $D$2:$D$25)>1, ROW($A$2:$A$25)-MIN(ROW($A$2:$A$25))+1), ROW(A1)), COLUMN(A1))
becomes
=INDEX($A$2:$D$25, 3, COLUMN(A1))
becomes
=INDEX($A$2:$D$25, 3, 1) and returns "Sample1"
Final notes
The formula uses relative and absolute cell references.
In cell B30 the array formula changes cell references to:
=INDEX($A$2:$D$25, SMALL(IF(COUNTIFS($A$2:$A$25, $A$2:$A$25, $B$2:$B$25, $B$2:$B$25, $C$2:$C$25, $C$2:$C$25, $D$2:$D$25, $D$2:$D$25)>1, ROW($A$2:$A$25)-MIN(ROW($A$2:$A$25))+1), ROW(A1)), COLUMN(B1))
In cell A31 the formula changes cell references to:
=INDEX($A$2:$D$25, SMALL(IF(COUNTIFS($A$2:$A$25, $A$2:$A$25, $B$2:$B$25, $B$2:$B$25, $C$2:$C$25, $C$2:$C$25, $D$2:$D$25, $D$2:$D$25)>1, ROW($A$2:$A$25)-MIN(ROW($A$2:$A$25))+1), ROW(A2)), COLUMN(A2))
Get excel sample file for this tutorial.
Filter duplicate rows in excel 2007.xlsx
(Excel 2007 Workbook *.xlsx)
Recommended blog post
Automatically filter unique distinct row records
Highlight duplicate rows in excel 2007
Functions used in this formula:
IF(logical_test, [value_if_true], [value_if_false])
Checks whether a condition is met, and returns one value if TRUE, and another value if FALSE
MATCH(lookup_value;lookup_array; [match_type])
Returns the relative position of an item in an array that matches a specified value
INDEX(array,row_num,[column_num])
Returns a value or reference of the cell at the intersection of a particular row and column, in a given range
COUNTIFS(criteria_range1,criteria1, criteria_range2, criteria2...)
Counts the number of cells specified by a given set of conditions or criteria
SMALL(array,k) returns the k-th smallest row number in this data set.
ROW(reference) returns the rownumber of a reference
MIN(number1,[number2])
Returns the smallest number in a set of values. Ignores logical values and text
Functions in this article
More than 1300 Excel formulas
Excel categories
2 Responses to “Extract duplicate records”
Leave a Reply
How to comment
How to add a formula to your comment
<code>Insert your formula here.</code>
Convert less than and larger than signs
Use html character entities instead of less than and larger than signs.
< becomes < and > becomes >
How to add VBA code to your comment
[vb 1="vbnet" language=","]
Put your VBA code here.
[/vb]
How to add a picture to your comment:
Upload picture to postimage.org or imgur
Paste image link to your comment.
Hi
How would i go about just looking if data matched in just one column ?
so get the row extracted based on matching values in column 3?
richard reeves
Try this:
=INDEX($A$2:$D$25, SMALL(IF(COUNTIF($C$2:$C$25,$C$2:$C$25)>1,ROW($A$2:$A$25)-MIN(ROW($A$2:$A$25))+1),ROW(A1)),COLUMN(A1))
My example has only 10 and 11 in column C so all records are extracted. (10 and 11 are duplicates.)