Extract unique distinct values from a relational table
In this post, I am going to show you how to extract unique distinct values and duplicates using a formula, from a relational table. The image above shows a search value in cell B19, it is used by the formula to look for matches in A2:A16.
The formula finds a match in cell A16 and uses the corresponding value in cell B16 to do a second search in D2:D16, the two tables share values defined in B2:B16 and in D2:D16.
This makes it possible to perform searches across data sets, the value in cell B16 is found in cells D2, D8, D12 and D14. The corresponding cells in E2:E16 are E2, E8, E12 and E14, however, cell E2 and E8 contains duplicate values and the formula returns only one instance of those two values in cell range B21:B23.
In a previous post I described how to do lookups in a related table.
Table of Contents
What is a relational table?
In a relational database (Microsoft Access), the data in one table is related to the data in other tables. In general, tables can be related in one of three different ways: one-to-one, one-to-many or many-to-many. The relationship is used to cross-reference information between tables. Source: University of Sussex
Unique distinct values
Array formula in cell B21:
To enter an array formula you copy above formula and paste to a cell. Press and hold CTRL and Shift simultaneously, then press Enter once. Release all keys.
The formula is now surrounded with curly brackets, don't enter these characters yourself, they appear automatically. {=formula}
Copy cell B21 and paste to cells below or simply select cell B21 and press and hold with left mouse button on the black dot located at the bottom right corner of the cell. Then drag with mouse down to cells below as far as needed, release left mouse button.
Explaining formula in cell B21
The "Evaluate formula" feature lets you examine a formula calculation in more detail, simply select cell B21 and then go to tab "Formulas" on the ribbon.
Press with mouse on "Evaluate formula" button to open the Evaluate Formula dialog box, the "Evaluate" button located on the dialog box takes you through the calculation step by step which is great if you want to troubleshoot or simply understand a formula.
(The image above does not show the actual formula used in this example.)
Step 1 - Find values equal to search value
The IF function has three arguments, IF(logical_test, [value_if_true], [value_if_false]). The first argument contains a logical expression, it returns an array containing values TRUE or FALSE.
The second argument will run if the logical expression returns TRUE and the third will run if the value is FALSE.
IF($B$19=Table1[Item], Table1[Company], "")
becomes
IF("Biscuit joiner"=Table1[Item], Table1[Company], "")
becomes
IF("Biscuit joiner"={"Drill driver"; "Angle grinder"; "Jig saw"; "Gas frame nailer"; "Table saw"; "Router"; "Mitre saw"; "Jointing machine"; "Random orbit sander"; "Planer"; "Combi drill"; "Impact wrench"; "Circlar saw"; "Rotary hammer drill"; "Biscuit joiner"}, Table1[Company], "")
becomes
IF("Biscuit joiner"={"Drill driver"; "Angle grinder"; "Jig saw"; "Gas frame nailer"; "Table saw"; "Router"; "Mitre saw"; "Jointing machine"; "Random orbit sander"; "Planer"; "Combi drill"; "Impact wrench"; "Circlar saw"; "Rotary hammer drill"; "Biscuit joiner"}, {"Kumatsu Motors"; "Roboto Industries"; "Galaxy Corp"; "Galaxy Corp"; "Galaxy Corp"; "Roboto Industries"; "Kumatsu Motors"; "Roboto Industries"; "Roboto Industries"; "Kumatsu Motors"; "Kumatsu Motors"; "Kumatsu Motors"; "Roboto Industries"; "Kumatsu Motors"; "Galaxy Corp"}, "")
and returns
{"";"";"";"";"";"";"";"";"";"";"";"";"";"";"Galaxy Corp"}
Step 2 - Find relational values that corresponds to search value
The MATCH function returns the relative position of an item in an array or cell range that matches a specified value in a specific order.
MATCH(Table2[Company], IF($B$19=Table1[Item], Table1[Company], ""), 0)
becomes
MATCH(Table2[Company], {"";"";"";"";"";"";"";"";"";"";"";"";"";"";"Galaxy Corp"}, 0)
becomes
MATCH({"Galaxy Corp"; "Kumatsu Motors"; "Kumatsu Motors"; "Roboto Industries"; "Roboto Industries"; "Kumatsu Motors"; "Galaxy Corp"; "Kumatsu Motors"; "Roboto Industries"; "Roboto Industries"; "Galaxy Corp"; "Kumatsu Motors"; "Galaxy Corp"; "Kumatsu Motors"; "Roboto Industries"}, {"";"";"";"";"";"";"";"";"";"";"";"";"";"";"Galaxy Corp"}, 0)
and returns
{15; #N/A; #N/A; #N/A; #N/A; #N/A; 15; #N/A; #N/A; #N/A; 15; #N/A; 15; #N/A; #N/A}
This array shows us where the matches are in the column. 15 is in first, seventh, eleventh and thirteenth position.
Step 3 - Identify errors
The formula errors out if we don't get rid of the error values, the ISERROR function returns TRUE if the value is an error and FALSE if not.
ISERROR(MATCH(Table2[Company], IF($B$19=Table1[Item], Table1[Company], ""), 0))
becomes
ISERROR({15; #N/A; #N/A; #N/A; #N/A; #N/A; 15; #N/A; #N/A; #N/A; 15; #N/A; 15; #N/A; #N/A})
and returns
{FALSE; TRUE; TRUE; TRUE; TRUE; TRUE; FALSE; TRUE; TRUE; TRUE; FALSE; TRUE; FALSE; TRUE; TRUE}
step 4 - Avoid duplicate values
The COUNTIF function has two arguments COUNTIF(range, criteria), the first argument has an expanding cell reference that grows when the cell is copied to cells below. This makes it aware of previously displayed values and we can now avoid duplicates.
COUNTIF($B$20:B20,Table2[Sales persons])
becomes
COUNTIF("",Table2[Sales persons])
and returns
{0;0;0;0;0;0;0;0;0;0;0;0;0;0;0}
This array tells us that no values have been shown above cell B21.
Step 5 - Add arrays (OR logic)
By adding the two arrays we apply OR logic meaning that TRUE + TRUE = TRUE. FALSE + TRUE = TRUE. TRUE + FALSE = TRUE and FALSE + FALSE = FALSE.
Excel automatically converts boolean values to their numerical equivalents when the four basic arithmetic operations are preformed. Addition, subtraction, multiplication, and division.
(ISERROR(MATCH(Table2[Company], IF($B$19=Table1[Item], Table1[Company], ""), 0)))+COUNTIF($B$20:B20,Table2[Sales persons])
becomes
{FALSE; TRUE; TRUE; TRUE; TRUE; TRUE; FALSE; TRUE; TRUE; TRUE; FALSE; TRUE; FALSE; TRUE; TRUE} + {0;0;0;0;0;0;0;0;0;0;0;0;0;0;0}
and returns
{0;1;1;1;1;1;0;1;1;1;0;1;0;1;1}
Step 6 - Replace boolean values with row numbers and blanks
IF((ISERROR(MATCH(Table2[Company], IF($B$19=Table1[Item], Table1[Company], ""), 0)))+COUNTIF($B$20:B20,Table2[Sales persons]), "", MATCH(ROW(Table2[Company]), ROW(Table2[Company])))
becomes
IF({0;1;1;1;1;1;0;1;1;1;0;1;0;1;1}, "", MATCH(ROW(Table2[Company]), ROW(Table2[Company])))
becomes
IF({0;1;1;1;1;1;0;1;1;1;0;1;0;1;1}, "", {1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15})
and returns
{1;"";"";"";"";"";7;"";"";"";11;"";13;"";""}
Step 7 - Extract smallest row number
The SMALL function returns the smallest row number in the array.
SMALL(IF((ISERROR(MATCH(Table2[Company], IF($B$19=Table1[Item], Table1[Company], ""), 0)))+COUNTIF($B$20:B20,Table2[Sales persons]), "", MATCH(ROW(Table2[Company]), ROW(Table2[Company]))), 1)
becomes
SMALL({1;"";"";"";"";"";7;"";"";"";11;"";13;"";""}, 1)
and returns 1.
Step 8 - Return value from Table2[Sales persons]
The INDEx function returns a value from an array or cell range based on a row and column number.
INDEX(Table2[Sales persons], SMALL(IF((ISERROR(MATCH(Table2[Company], IF($B$19=Table1[Item], Table1[Company], ""), 0)))+COUNTIF($B$20:B20,Table2[Sales persons]), "", MATCH(ROW(Table2[Company]), ROW(Table2[Company]))), 1))
becomes
INDEX(Table2[Sales persons], 1)
becomes
INDEX({"Cyril Strong"; "Lewis Moya"; "Kason Leonard"; "Triston Rigsby"; "Jaden Bonds"; "Kason Leonard"; "Cyril Strong"; "Lewis Moya"; "Hija Hanna"; "Simeon Fernandes"; "Dana Leak"; "Corydon Keck"; "Doloris Hawley"; "Corydon Keck"; "Hija Hanna"}, 1)
and returns "Cyril Strong" in cell B21.
Step 9 - Remove error values
The IFERROR function removes errors that will show up when the formula runs out of values to display.
IFERROR(INDEX(Table2[Sales persons], SMALL(IF((ISERROR(MATCH(Table2[Company], IF($B$19=Table1[Item], Table1[Company], ""), 0)))+COUNTIF($B$20:B20,Table2[Sales persons]), "", MATCH(ROW(Table2[Company]), ROW(Table2[Company]))), 1)), "")
Duplicate values
Array formula in cell B21:
How to create an array formula
- Copy above array formula
- Select cell B21
- Press with left mouse button on in formula bar
- Paste (Ctrl + v)
- Press and hold Ctrl + Shift
- Press Enter
- Release all keys
The formula in the formula bar is now surrounded by curly brackets: {=array_formula}
How to copy an array formula
- Select cell B21
- Copy cell (Ctrl + c)
- Select cell range B22:B23
- Paste (Ctrl + v)
Related tables category
I will in this article demonstrate four formulas that do lookups, extract unique distinct and duplicate values and sums numbers […]
This article demonstrates how to merge two relational data sets before creating a Pivot table. A Pivot Table is limited […]
Excel 2010 has a PowerPivot feature and DAX formulas that let you work with multiple tables of data. You can […]
Excel categories
10 Responses to “Extract unique distinct values from a relational table”
Leave a Reply
How to comment
How to add a formula to your comment
<code>Insert your formula here.</code>
Convert less than and larger than signs
Use html character entities instead of less than and larger than signs.
< becomes < and > becomes >
How to add VBA code to your comment
[vb 1="vbnet" language=","]
Put your VBA code here.
[/vb]
How to add a picture to your comment:
Upload picture to postimage.org or imgur
Paste image link to your comment.
Contact Oscar
You can contact me through this contact form
in excel sheet
i need to search three different word in eight sheet in particular cell and paste if present other wise blank
Example EE2203A,EE2204B,ME2201C
if present of any one paste EE2204B, other wise blank
karthikeyan,
can you explain in greater detail?
Hi Oscar,
You helped me out in the past, and I'm trying to understand the =Index(Match functions in an Array formula.
I have a range of data in 2 different columns on Sheet 2.
The ranges are almost identical, so much so that a simple "If" statement can show me differences.
. If they are equal display 1 else dispay 0. I don't want to filter as there is other data records in the sheet that are used in a vlookup formula.
So what would be the easiest way to have my first sheet "Sheet 1(A2)" find the "0" cells on Sheet 2 and display the values which are offset by 1 row?
In other words, for each two records on Sheet 2 that don't match, create a list of the 2nd record on "Sheet 1" with no blanks in between them.
Thanks in advance!
cwrbelis
cwrbelis,
Check the attached sheet:
cwrbelis.xlsx
Hi Oscar,
This is a question specific to your other article "How to extract a unique distinct list from a column in excel," but there wasn't a section to leave a reply, so I will ask it here.
I'm using your vba code for a user-defined function to extract unique distinct sorted values, but I want to use it in a table. As you probably know, multi-cell array formulas are not allowed in tables, so is there a way around this?
Thanks!
Hi Oscar,
I used your formula above, i have a question, it is possible to have a two search criteria?
Julius,
Yes it is.
Array formula in cell B21:
=IFERROR(INDEX(Table2[Sales persons], SMALL(IF((ISERROR(MATCH(Table2[Company], IF(COUNTIF($B$19:$C$19, Table1[Item]), Table1[Company], ""), 0)))+COUNTIF($B$20:B20,Table2[Sales persons]), "", MATCH(ROW(Table2[Company]), ROW(Table2[Company]))), 1)), "")
Hi Oscar,
I have two different tables, some row in both tables have same data, i want to extract unique rows data by comparing both tables... How can I do this...?
Hello Kamran Mumtaz
Great question, see this article:
https://www.get-digital-help.com/2018/06/11/extract-unique-distinct-records-from-two-data-sets/
If I have a Matrix 2x10 in the following way
1 2 3 4 5 6 7 8 9 10
11 12 13 14 5 16 17 18 19 20
and want two arrange in the following way:
1 2
3 4
5 6
7 8
9 10
11 12
13 14
15 16
17 18
19 20
I have a large data file 8000x10 and the numbers are pair the first two number in first row will become first 2 numbers in first row , third and four in row 1 will be first and second in row 2, etc.