Compare two columns and extract differences
This article demonstrates formulas that extract the values that exist in only one of two columns.
There are text values in column B and column C.
Update!
Excel 365 formula in cell E3:
Excel 365 formula in cell F3:
The formulas above are entered like regular formulas. They contain the SEQUENCE function that older Excel versions are missing.
Copy cells E3 and F3 and paste them to the cells below as far as needed.
Array formula for older Excel versions
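The array formula in cell E3 extracts values existing only in column B, compared to column C:
=INDEX($B$3:$B$15, SMALL(IF(COUNTIF($C$3:$C$11, $B$3:$B$15)=0, MATCH(ROW($B$3:$B$15), ROW($B$3:$B$15)), ""), ROWS($A$1:A1)))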
The array formula in cell F3 extracts values existing only in column C, compared to column B:
How to enter array formula in cell E3
- Copy above array formula (Ctrl + c).
- Select cell E3.
- Click in the formula bar.
- Paste array formula (Ctrl + v) to the formula bar.
- Press and hold CTRL + SHIFT simultaneously.
- Press Enter once.
- Release all keys.
The formula is now an array formula. Excel shows it enclosed in curly brackets in the formula bar, which tells you it is an array formula. Don't type the curly brackets yourself; they appear automatically when you enter the formula correctly.
How to copy array formula
- Select cell E3.
- Copy (Ctrl + c).
- Select cell range E4:E8.
- Paste (Ctrl + v).
Explaining array formula in cell E3
I recommend the "Evaluate Formula" tool when you want to understand, troubleshoot or examine a specific formula.
Select the cell containing the formula you want to evaluate. Go to tab "Formulas" on the ribbon and click the "Evaluate Formula" button, see image above.
A dialog box appears showing the formula. The "Evaluate" button below the formula lets you go through the formula calculations step by step.
Step 1 - Count values in column C based on values in column B
The COUNTIF function counts the cells in a range that meet a condition. If you pass several conditions (here, a cell range) in the criteria argument, the function returns an array with one count per condition instead of a single value.
This is what makes the formula an array formula. Here are the arguments in the COUNTIF function:
COUNTIF(range, criteria)
COUNTIF($C$3:$C$11, $B$3:$B$15)
becomes
COUNTIF({"BB"; "DD"; "EE"; "HH"; "II"; "JJ"; "KK"; "VV"; "PP"}, $B$3:$B$15)
becomes
COUNTIF({"BB"; "DD"; "EE"; "HH"; "II"; "JJ"; "KK"; "VV"; "PP"}, {"AA"; "CC"; "DD"; "EE"; "GG"; "HH"; "II"; "JJ"; "KK"; "MM"; "NN"; "OO"; "PP"})
and returns the following array of values:
{0; 0; 1; 1; 0; 1; 1; 1; 1; 0; 0; 0; 1}
The position of each value in the array is very important: it makes it possible to identify and extract the values we want. Each position in the array corresponds to the value in the same position in column B, see image above.
A 0 (zero) means that the value in column B is not found in column C. A 1 means that the value in column B is found once in column C.
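For readers who prefer to trace the logic outside Excel, here is a minimal Python sketch of this step. The list names col_b and col_c are hypothetical stand-ins for B3:B15 and C3:C11. Note that list.count compares exactly, while COUNTIF ignores case and accepts wildcards, so this is only an approximation of the worksheet function.

```python
# Hypothetical stand-ins for the worksheet ranges used in this article.
col_b = ["AA", "CC", "DD", "EE", "GG", "HH", "II", "JJ", "KK", "MM", "NN", "OO", "PP"]  # B3:B15
col_c = ["BB", "DD", "EE", "HH", "II", "JJ", "KK", "VV", "PP"]  # C3:C11

# One count per value in column B: how many times that value occurs in column C.
counts = [col_c.count(value) for value in col_b]

print(counts)  # [0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1]
```

The result has 13 entries, one for each cell in B3:B15, in the same order as column B.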
Step 2 - Check if they are equal to 0 (zero)
The equal sign checks if the values are equal to 0 (zero) and returns the boolean values TRUE or FALSE.
COUNTIF($C$3:$C$11, $B$3:$B$15)=0
becomes
{0; 0; 1; 1; 0; 1; 1; 1; 1; 0; 0; 0; 1}=0
and returns
{TRUE; TRUE; FALSE; FALSE; TRUE; FALSE; FALSE; FALSE; FALSE; TRUE; TRUE; TRUE; FALSE}
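The same comparison can be sketched in Python. This is only an illustration of the logic, not part of the Excel formula; the name is_missing is made up for this example.

```python
# Array returned by COUNTIF in Step 1 (one count per value in column B).
counts = [0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1]

# Compare every count with 0, like {array}=0 does in the formula.
is_missing = [count == 0 for count in counts]

print(is_missing)
# [True, True, False, False, True, False, False, False, False, True, True, True, False]
```

Six of the thirteen entries are True, so six values in column B have no match in column C.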
Step 3 - If they are equal to zero, return the corresponding relative row number
The IF function allows you to return a specific value if the logical test is TRUE and another value if FALSE.
IF(logical_test, [value_if_true], [value_if_false])
IF(COUNTIF($C$3:$C$11, $B$3:$B$15)=0, MATCH(ROW($B$3:$B$15), ROW($B$3:$B$15)), "")
becomes
IF({TRUE; TRUE; FALSE; FALSE; TRUE; FALSE; FALSE; FALSE; FALSE; TRUE; TRUE; TRUE; FALSE}, MATCH(ROW($B$3:$B$15), ROW($B$3:$B$15)), "")
The MATCH and ROW functions create an array from 1 to 13, one number for each row in cell range B3:B15, which we will then use to extract the correct value from that range.
MATCH(ROW($B$3:$B$15), ROW($B$3:$B$15))
becomes
MATCH({3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15}, {3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15})
and returns
{1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13}.
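A rough Python equivalent of this step, assuming the same data as above, shows both the relative row numbers and what the IF function keeps. The variable names are hypothetical.

```python
# ROW($B$3:$B$15) returns the worksheet row numbers 3..15.
sheet_rows = list(range(3, 16))

# MATCH(ROW(...), ROW(...)): 1-based position of each row number in the same list.
relative_rows = [sheet_rows.index(row) + 1 for row in sheet_rows]

# Logical test from Step 2.
is_missing = [True, True, False, False, True, False, False,
              False, False, True, True, True, False]

# IF(test, relative_row, ""): keep the row number where the test is TRUE.
if_result = [row if flag else "" for row, flag in zip(relative_rows, is_missing)]

print(relative_rows)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(if_result)      # [1, 2, '', '', 5, '', '', '', '', 10, 11, 12, '']
```

Using relative row numbers instead of worksheet row numbers means the formula keeps working if the list is moved to another place on the sheet.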
Step 4 - Return the k-th smallest row number
The SMALL function returns the k-th smallest number from an array or cell range.
SMALL(IF(COUNTIF($C$3:$C$11, $B$3:$B$15)=0, MATCH(ROW($B$3:$B$15), ROW($B$3:$B$15)), ""), ROWS($A$1:A1))
becomes
SMALL({1; 2; ""; ""; 5; ""; ""; ""; ""; 10; 11; 12; ""}, ROWS($A$1:A1))
The SMALL function ignores the empty text values in the array. The ROWS function counts the number of rows in a given cell reference. The cell reference in this example expands when you copy the cell and paste it to the cells below. This makes the SMALL function return a new number in each cell.
SMALL({1; 2; ""; ""; 5; ""; ""; ""; ""; 10; 11; 12; ""}, ROWS($A$1:A1))
becomes
SMALL({1; 2; ""; ""; 5; ""; ""; ""; ""; 10; 11; 12; ""}, 1)
and returns 1.
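The following Python sketch imitates SMALL on the array produced by the IF function in Step 3 (relative row numbers where the test is TRUE, empty strings elsewhere). The helper small is a hypothetical stand-in written for this illustration; it skips text values the way SMALL does in an array.

```python
def small(values, k):
    """Return the k-th smallest number in values, skipping text (k starts at 1)."""
    numbers = sorted(
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    )
    return numbers[k - 1]

# Result of the IF function in Step 3.
if_result = [1, 2, "", "", 5, "", "", "", "", 10, 11, 12, ""]

# ROWS($A$1:A1) is 1 in E3, 2 in E4, and so on as the formula is copied down.
first_six = [small(if_result, k) for k in range(1, 7)]

print(first_six)  # [1, 2, 5, 10, 11, 12]
```

With k = 7 the sketch raises an IndexError because only six numbers exist; in Excel, SMALL returns a #NUM! error in that case.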
Step 5 - Return value
The INDEX function returns a value or multiple values based on a row and/or column number.
INDEX($B$3:$B$15, SMALL(IF(COUNTIF($C$3:$C$11, $B$3:$B$15)=0, MATCH(ROW($B$3:$B$15), ROW($B$3:$B$15)), ""), ROWS($A$1:A1)))
becomes
INDEX($B$3:$B$15, 1)
and returns "AA" in cell E3.
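Putting the five steps together, here is a compact Python sketch of the whole calculation, under the assumption that exact string comparison is good enough (COUNTIF itself ignores case). The function name only_in_first is made up for this example.

```python
def only_in_first(first, second):
    """Return the values of first that never occur in second, in original order."""
    # Step 1: COUNTIF - count each value of first in second.
    counts = [second.count(value) for value in first]
    # Steps 2 and 3: keep the relative row number where the count is 0.
    rows = [pos + 1 for pos, count in enumerate(counts) if count == 0]
    # Steps 4 and 5: SMALL walks the rows in ascending order, INDEX fetches the value.
    return [first[row - 1] for row in sorted(rows)]

col_b = ["AA", "CC", "DD", "EE", "GG", "HH", "II", "JJ", "KK", "MM", "NN", "OO", "PP"]  # B3:B15
col_c = ["BB", "DD", "EE", "HH", "II", "JJ", "KK", "VV", "PP"]  # C3:C11

only_b = only_in_first(col_b, col_c)  # corresponds to column E
only_c = only_in_first(col_c, col_b)  # corresponds to column F

print(only_b)  # ['AA', 'CC', 'GG', 'MM', 'NN', 'OO']
print(only_c)  # ['BB', 'VV']
```

Swapping the two arguments gives the values that exist only in column C, which is what the formula in cell F3 is described as doing.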
If you want to compare two cell ranges, read this article:
Filter values existing in range 1 but not in range 2 using array formula
If you want to compare text values in two cell ranges, read this article:
Filter text values existing in range 1 but not in range 2 using array formula
I have also written an article about comparing records between two data tables:
Compare two lists of data: Filter records existing in only one list
3 Responses to “Compare two columns and extract differences”
Hi Oscar,
I started with the solution provided here for obtaining values existing only in one of two lists. I know that, to keep things orderly, one should use Excel in a 'database-like' fashion, with columns as fields and rows for data, and so I do.
Anyway, it happened that I needed to compare two lists of data that were spread horizontally. I also read your solutions for filtering values existing in different ranges, but since I was in a hurry, I adapted the formulas provided here and wanted to share my solution. Here are the two alternative formulas that do the job in a 'column fashion':
Let's say we have two lists to compare in ranges G1:V1 and G2:V2 respectively. In the result range, I put the formula:
={INDEX($G$1:$V$1;;SMALL(IF(COUNTIF($G$2:$V$2;$G$1:$V$1)=0;MATCH(COLUMN($G$1:$V$1);COLUMN($G$1:$V$1));"");COLUMN(A1)))}
or, alternatively (thanks to another solution found here):
={INDEX($G$1:$V$1; SMALL(IF(ISERROR(MATCH($G$1:$V$1; $G$2:$V$2; 0)); (COLUMN($G$1:$V$1)-MIN(COLUMN($G$1:$V$1))+1); ""); COLUMN(A$1:A$65536)))}
I noticed that if I use the same size for all three ranges (lists and results), I end up with some zeroes padding the 2nd result range (Missing data in List 1), whether I use vertical or horizontal lists, as you can see from the image I provide here:
https://s12.postimg.org/jp0p9y6st/Filter_values_existing_in_column_1_but_not_in_co.jpg
I am wondering why those zeroes appear?
I uploaded the example excel file.
Bruno,
You are comparing 4 blank cells in $G$2:$V$2 with the values in cell range $G$1:$V$1. Since there are no blank cells in $G$1:$V$1, the formula returns the blank cells as differences. The INDEX function then returns 0 for each blank cell.
Try this formula in cell G2:
=INDEX($G$2:$O$2, , SMALL(IF(COUNTIF($G$1:$S$1, $G$2:$O$2)=0, MATCH(COLUMN($G$2:$O$2), COLUMN($G$2:$O$2)), ""), COLUMN(A1)))
Hi.
I have excel file:
code bookname language bookcode id
1 book1 en 100
2 book2 fa 101
3 book1 ar 102
4 book3 en 103
5 book2 fa 104
6 book4 az 105
...
I want to filter by the bookname and language columns, and when both columns match, the value of the id column should equal the code value of the last matching row. For example:
book2 is true but book1 is not true. so id book1 = 104
thanks