How to compare two data sets
Table of Contents
1. How to compare two data sets - Excel Table and autofilter
This article demonstrates how to quickly compare two data sets in Excel using a formula and Excel defined Tables. The formula will return TRUE if a record is found in the other data set and FALSE if not.
The image above shows you the first data set: Table1 The image below shows you the second data set: Table2
Excel defined tables has many advantages, one is that you only need to enter a formula in one cell and Excel automatically enters the formula in the remaining cells in the same column.
Another one is that it expands accordingly when new data is added, no need to adjust the cell references. You can also easily filter the data using the arrows in the top row of the Excel table.
How to convert a data set to an Excel defined Table:
Type the following formula in the next column adjacent to the Excel defined Table:
The Excel defined Table will automatically expand shown in the image below and enter the formula in all cells below in that column.
Explaining formula in cell E3
The COUNTIFS function calculates the number of cells across multiple ranges that equals all given conditions. There is a criteria range and a condition forming a pair, our Excel table has three columns so wee need three pairs in order to find matching records in the other table.
COUNTIFS(criteria_range1, criteria1, [criteria_range2, criteria2]…)
Step 1 - Build COUNTIFS function
The first argument is a cell reference to the entire column in table2 and the second argument is a cell reference to the cell on the same row as the formula but in column GivenName. Excel automatically enters these cell references (structured references) if you select the cells with your mouse when you build the formula.
You can also type the table name and column name in brackets, the formula bar will guide you in this process.
COUNTIFS(Table2[GivenName],[@GivenName]
Complete the function by adding the remaining cell references to the COUNTIFS function.
COUNTIFS(Table2[GivenName],[@GivenName],Table2[StreetAddress],[@StreetAddress],Table2[City],[@City])
Step 2 - Logical expression
The COUNTIFS function returns a number representing how many records that match the current record in the other table. A number larger than zero indicates that there is at least one match.
COUNTIFS(Table2[GivenName],[@GivenName],Table2[StreetAddress],[@StreetAddress],Table2[City],[@City])>0
TRUE - At least one match.
FALSE - No match.
Filter records occuring in both tables
- Select sheet 1 (Table1)
- Press with left mouse button on black arrow in column Compare
- Disable FALSE
- Press with left mouse button on OK.
The image below shows a filtered table with records that exists in both tables.
Filter records existing in only one table
- Select a sheet
- Press with left mouse button on black arrow in column Compare
- Enable only FALSE
- Press with left mouse button on OK.
Table1
Table2
2. Filter shared records from two tables
I will in this section demonstrate a formula that extracts common records (shared records) from two data sets in Excel. I have demonstrated how to compare two columns and today I want to show you how to filter records that exists in both tables.
You can also use conditional formatting to highlight shared records. If you are looking for records that exist only in one out of two tables then read this article: Filter records occurring in only one table
The first data set is in worksheet: List 1, see image above. The second data set is in worksheet: List 2, see image below. These two data sets have three columns each that I want to compare. One column has a different header name.
The columns I am going to compare are these:
- List 1 : Description - List 2 : Asset
- List 1 : Cost - List 2 : Cost
- List 1 : Acquisition year - List 2 : Acquisition year
The following picture shows the columns I am going to compare. Keep in mind that records must be exactly the same in both data tables to be filtered, except that letter case may differ.
COUNTIFS is the core in the formula I will construct, it is here all the comparisons will take place. It is important to understand what is going on so you can use this technique to create your own powerful array formulas.
The COUNTIFS function may have up to 255 arguments leaving you to compare up to 127 columns, however, if you have that many columns to compare perhaps a UDF (User defined function) is a better option. Array formulas are often quite slow dealing with lots of data.
COUNTIFS(criteria_range1,criteria1, criteria_range2, criteria2...)
Counts the number of cells specified by a given set of conditions or criteria
Here are the cell ranges that I am going to use in the COUNTIFS function:
List 2 - Aquisition year - 'List 2'!$F$3:$F$14
List 1 - Aquisition year - 'List 1'!$B$3:$B$12
List 2 - Asset - 'List 2'!$C$3:$C$14
List 1 - Description - 'List 1'!$C$3:$C$12
List 2 - Cost - 'List 2'!$E$3:$E$14
List 1 - Cost - 'List 1'!$D$3:$D$12
COUNTIFS('List 2'!$F$3:$F$14,'List 1'!$B$3:$B$12,'List 2'!$C$3:$C$14,'List 1'!$C$3:$C$12,'List 2'!$E$3:$E$14,'List 1'!$D$3:$D$12)
Is it important to begin with the second data set in the first argument? No, you can begin with the first data set if you like, remember to change cell reference in the INDEX function so you fetch the right values.
With the COUNTIFS function complete we can now construct the array formula.
Formulas
The image above shows the third worksheet named: Common records
Excel 365 dynamic array formula in cell B3:
Array formula in cell B3:
How to create an array formula
- Select cell B3
- Press with left mouse button on in formula bar
- Copy the array formula above and paste to formula bar
- Press and hold Ctrl + Shift simultaneously
- Press Enter
- Release all keys
You can check using the formula bar that you did above steps right, Excel tells you if a cell contains an array formula by surrounding the formula with a beginning and ending curly brackets, like this: {=array_formula}.
Don't enter these characters yourself they show up automatically if you did above steps correctly.
Copy cell B3 and paste it to the right as far as needed. Copy cell B3:D3 and paste down as far as needed.
Explaining the array formula in cell B3
You can easily examine a formula (or array formula) that you don't understand, select the cell containing the formula. Go to tab "Formulas", press with left mouse button on "Evaluate Formula".
The "Evaluate" button above lets you go to the next "calculation" step.
Step 1 - Find common records
COUNTIFS('List 2'!$F$3:$F$14,'List 1'!$B$3:$B$12,'List 2'!$C$3:$C$14,'List 1'!$C$3:$C$12,'List 2'!$E$3:$E$14,'List 1'!$D$3:$D$12)
returns this array: {1, 1, 1, 0, 1, 1, 1, 1, 0, 0}
The array is shown in column F below, the first 1 in the array means that the three corresponding values on row 3 (B3, C3 and D3) have a match somewhere on the other data table, on the same row.
Recommended articles
Checks multiple conditions against the same number of cell ranges and counts how many times all criteria are met.
Step 2 - Return corresponding relative row numbers
IF(COUNTIFS('List 2'!$F$3:$F$14,'List 1'!$B$3:$B$12,'List 2'!$C$3:$C$14,'List 1'!$C$3:$C$12,'List 2'!$E$3:$E$14,'List 1'!$D$3:$D$12),MATCH(ROW('List 1'!$B$3:$B$12),ROW('List 1'!$B$3:$B$12)),"")
returns {1,2,3,"",5,6,7,"",""}
The image below shows you relative row numbers for records that exist on the other data table.
Now it is really easy for the INDEX function to fetch the values it need, first I need to make this array formula return a single number. The SMALL function helps me with that.
Recommended articles
Checks if a logical expression is met. Returns a specific value if TRUE and another specific value if FALSE.
Step 3 - Return the k-th smallest value
I don't want the SMALL function to extract the same value in every cell, I want it to change so a new row number is extracted in the cell below. This is repeated in every new cell below until all values have been extracted.
SMALL({1,2,3,"",5,6,7,"",""}, ROWS($A$1:A1))
returns 1.
The ROWS function return the number of rows a certain cell range has. If that cell range expands every time I copy the formula to new cells below, the ROWS function will return the old number + 1.
In cell B3 ROWS($A$1:A1) returns 1. In cell B4 it changes to ROWS($A$1:A2) and returns 2. Read more about absolute and relative cell references here.
Why use the ROWS function and not ROW? If you insert new rows above the formula, the ROW function will return the wrong value because the relative cell reference changed. It will change in the ROWS function as well but so will also the absolute cell reference.
Recommended articles
The SMALL function lets you extract a number in a cell range based on how small it is compared to the other numbers in the group.
Step 4 - Return a value of the cell at the intersection of a particular row and column
INDEX('List 1'!$B$3:$D$12,SMALL(IF(COUNTIFS('List 2'!$F$3:$F$14,'List 1'!$B$3:$B$12,'List 2'!$C$3:$C$14,'List 1'!$C$3:$C$12,'List 2'!$E$3:$E$14,'List 1'!$D$3:$D$12),MATCH(ROW('List 1'!$B$3:$B$12),ROW('List 1'!$B$3:$B$12)),""),ROWS($A$1:A1)),COLUMNS($A$1:A1))
returns 1997 in cell B3.
Recommended articles
Gets a value in a specific cell range based on a row and column number.
Table category
This article demonstrates different ways to reference an Excel defined Table in a drop-down list and Conditional Formatting. There are […]
This article demonstrates two formulas that extract distinct values from a filtered Excel Table, one formula for Excel 365 subscribers […]
The image above demonstrates a macro linked to a button. Press with left mouse button on the button and the […]
How to use Excel Tables
5 Responses to “How to compare two data sets”
Leave a Reply
How to comment
How to add a formula to your comment
<code>Insert your formula here.</code>
Convert less than and larger than signs
Use html character entities instead of less than and larger than signs.
< becomes < and > becomes >
How to add VBA code to your comment
[vb 1="vbnet" language=","]
Put your VBA code here.
[/vb]
How to add a picture to your comment:
Upload picture to postimage.org or imgur
Paste image link to your comment.
Contact Oscar
You can contact me through this contact form
Oscar,
As usual, another great article. Just wanted to say thanks for all the awesome articles you publish. I always learn something from your work--it is appreciated.
Michael Pennington,
Thank you for commenting!!
More complicated if we have 3 columns to compare the duplicates, you have the sample cases or formula for 3 column to retrieve duplicates?
Thanks
Rizky,
The example in this post compares 3 columns. Can you explain in greater detail?
Im sorry not inform clear to you, I mean compare 3 list, and is it possible retrieve common records with criteria? For example I want to retrieve common records between 2 or more list with criteria year 1997?