Label groups of duplicate records
Michael asks:
I need to identify the duplicates based on the Columns D:H and put in Column C a small “d” + a running number for the duplicates which are duplicated the same.
If no duplicates only to put in a “d0” for instance.
Answer
The following array formula assigns unique records the label "d0" and duplicate records the same label
Array formula in cell B3:
To enter an array formula, type the formula in a cell then press and hold CTRL + SHIFT simultaneously, now press Enter once. Release all keys.
The formula bar now shows the formula with a beginning and ending curly bracket telling you that you entered the formula successfully. Don't enter the curly brackets yourself.
How to copy array formula
- Select cell B3.
- Copy cell (Ctrl + c).
- Select cell range B4:B11.
- Paste to cell range (Ctrl + v).
Explaining formula in cell B3
Step 1 - Identify if record is unique
The COUNTIFS function allows you to count cells based on multiple conditions.
COUNTIFS($D$3:$D$11, D3, $E$3:$E$11, E3, F$3:$F$11, F3, $G$3:$G$11, G3, $H$3:$H$11, H3)=1
becomes
COUNTIFS({"Europe"; "Europe"; "Europe"; "Europe"; "Europe"; "Europe"; "Europe"; "Europe"; "Europe"},"Europe",{"Spain"; "Spain"; "Spain"; "Italy"; "Italy"; "Italy"; "Italy"; "Italy"; "Spain"},"Spain",{"Apples"; "Apples"; "Tomato"; "Tomato"; "Tomato"; "Cucumber"; "Carrot"; "Cucumber"; "Tomato"},"Apples",{"Green"; "Green"; "Red"; "Green"; "Red"; "Green"; "Red"; "Green"; "Red"},"Green",{10; 10; 10; 10; 10; 10; 10; 10; 10},10)=1
becomes
2=1
and returns the boolean value FALSE. It means that the record is not unique, in other words, there is at least one other duplicate record.
Step 2 - Return "d0" if record is unique
The IF function has three arguments, the first one must be a logical expression. If the expression evaluates to TRUE then one thing happens (argument 2) and if FALSE another thing happens (argument 3).
The previous step returns TRUE or FALSE an we are going to use that in our IF function.
IF(COUNTIFS($D$3:$D$11, D3, $E$3:$E$11, E3, F$3:$F$11, F3, $G$3:$G$11, G3, $H$3:$H$11, H3)=1, "d0", remaining_formula)
becomes
IF(FALSE, "d0", remaining_formula)
and the calculation continues in the last IF argument.
Step 3 - Is record the first of all duplicate records?
COUNTIFS($D$3:D3, D3, $E$3:E3, E3, $F$3:F3, F3, $G$3:G3, G3)=1
becomes
1=1
and returns TRUE.
Step 4 - Calculate number if record is first record
IF(COUNTIFS($D$3:D3, D3, $E$3:E3, E3, $F$3:F3, F3, $G$3:G3, G3)=1, "d"&(SUM(1/COUNTIFS($D$3:D3, $D$3:D3, $E$3:E3, $E$3:E3, F$3:F3, $F$3:F3, $G$3:G3, $G$3:G3, $H$3:H3, $H$3:H3))-SUM(IF(COUNTIFS($D$3:$D$11, $D$3:D3, $E$3:$E$11, $E$3:E3, F$3:$F$11, $F$3:F3, $G$3:$G$11, $G$3:G3, $H$3:$H$11, $H$3:H3)=1, 1, 0))), remaining_formula)
becomes
IF(TRUE, "d"&(SUM(1/COUNTIFS($D$3:D3, $D$3:D3, $E$3:E3, $E$3:E3, F$3:F3, $F$3:F3, $G$3:G3, $G$3:G3, $H$3:H3, $H$3:H3))-SUM(IF(COUNTIFS($D$3:$D$11, $D$3:D3, $E$3:$E$11, $E$3:E3, F$3:$F$11, $F$3:F3, $G$3:$G$11, $G$3:G3, $H$3:$H$11, $H$3:H3)=1, 1, 0))), remaining_formula)
becomes
IF(TRUE, "d"&(SUM(1/1)-SUM(IF(COUNTIFS($D$3:$D$11, $D$3:D3, $E$3:$E$11, $E$3:E3, F$3:$F$11, $F$3:F3, $G$3:$G$11, $G$3:G3, $H$3:$H$11, $H$3:H3)=1, 1, 0))), remaining_formula)
becomes
IF(TRUE, "d"&(SUM(1/1)-SUM(IF(2=1, 1, 0))), remaining_formula)
becomes
IF(TRUE, "d"&(SUM(1/1)-SUM(IF(FALSE, 1, 0))), remaining_formula)
becomes
IF(TRUE, "d"&(1-0), remaining_formula)
becomes
IF(TRUE, "d"&1, remaining_formula)
and returns "d1" in cell B3.
Step 5 - Get value from the first duplicate record
This part of the formula is rund if the record is NOT the first duplicate of a specific record in the list, it finds the first record and returns that value.
In cell B4 the formula changes to the following line because of absolute and relative cell references.
INDEX($B$3:B4, MIN(IF(COUNTIFS(D4, $D$3:D4, E4, $E$3:E4, F4, $F$3:F4, G4, $G$3:G4, H4, $H$3:H4), MATCH(ROW($D$3:D4), ROW($D$3:D4)), ""))))
becomes
INDEX($B$3:B4, MIN(IF({1;1}, MATCH(ROW($D$3:D4), ROW($D$3:D4)), ""))))
becomes
INDEX($B$3:B4, MIN(IF({1;1}, {1; 2}, ""))))
becomes
INDEX($B$3:B4, MIN({1; 2}))
becomes
INDEX($B$3:B4, 1)
becomes
INDEX({"d1";"d1"}, 1)
and returns "d1" in cell B4.
The INDEX function returns a value based on a cell reference and column/row numbers.
Duplicate values category
The array formula in cell C2 extracts duplicate values from column A. Only one duplicate of each value is displayed […]
This webpage demonstrates formulas that merge three non-adjacent cell ranges. What's on this webpage Extract a list of duplicates from […]
This article explains how to extract values that exist in three different columns, they must occur in each of the […]
Table of Contents How to find the largest duplicate number How to find the largest duplicate number - Excel 365 […]
The image above demonstrates a formula in cell E3 that extracts duplicate items if they are on the same date. […]
The following regular formula extracts duplicates from column B (List1) and column D (List2) combined, the result is shown in […]
This article demonstrates formulas that extract duplicate values from a column, however, column D contains values that you don't want […]
The following array formula extracts duplicate values sorted from A to Z from cell range B3:B21. Excel array formula in […]
Array formula in D2: =INDEX($A$2:$A$21, MATCH(SMALL(IF(COUNTIF($B$2:$B$21, $B$2:$B$21)>1, COUNTIF($A$2:$A$21, "<"&$A$2:$A$21), ""),ROWS($A$1:A1)), COUNTIF($A$2:$A$21, "<"&$A$2:$A$21), 0)) Array formula in E2: =INDEX($B$2:$B$21, MATCH(SMALL(IF(COUNTIF($B$2:$B$21, $B$2:$B$21)>1, […]
The following image shows you a data set in columns B and C. The formula in cell E2 extracts a […]
The array formula in cell B10 extracts duplicate values from cell range B2:D4 if they contain string specified in cell […]
The array formula in cell B10 extracts duplicate values from cell range B2:D4 if they begin with the condition specified […]
This article describes two formulas that extract duplicates from a multi-column cell range, the first one is built for Excel […]
This article demonstrates formulas and Excel tools that extract duplicates based on three conditions. The first and second condition is […]
Question: How do I filter duplicates with a condition? Answer: Column B contains category and column C contains Items. Only […]
Functions in this article
More than 1300 Excel formulas
Excel formula categories
Excel categories
3 Responses to “Label groups of duplicate records”
Leave a Reply
How to comment
How to add a formula to your comment
<code>Insert your formula here.</code>
Convert less than and larger than signs
Use html character entities instead of less than and larger than signs.
< becomes < and > becomes >
How to add VBA code to your comment
[vb 1="vbnet" language=","]
Put your VBA code here.
[/vb]
How to add a picture to your comment:
Upload picture to postimage.org or imgur
Paste image link to your comment.
Hello Mr. Oscar,
I modified your formula according to my need. I have ~800 row so it takes too much to calculate. Is there a way to speed it up ?
Dear
Thanks for the function! I tried it on m excel and it seems to work. Yet I do not understand how the sum part works. How does it give an other value than one? Or how does it know it can't be one? In my excel it gives negative numbers back as well. They are still grouped but with more illogival denominators.
Kind regards
Josse
Josse,
How does it give an other value than one?
The COUNTIFS functions use relative and absolute cell references that grow when the cell is copied to cells below. In other words, the formula checks only the current row and rows above.
In my excel it gives negative numbers back as well. They are still grouped but with more illogival denominators.
Make sure the cell references contains both absolute and relative cell references.