I have 2 spreadsheets in Excel. Spreadsheet 1 contains many columns, including SSN and employee ID. Spreadsheet 2 contains different columns and has an SSN column but not the employee ID. I want to create a new column in Spreadsheet 2 called employee ID and fill it by matching each employee's SSN against Spreadsheet 1. I feel like I would need to use VLOOKUP, but I am not entirely sure. Any help? Thanks
=VLOOKUP(L2, [spreadhseet1.xlsx]spreadhseet1!$A:$P, 2, FALSE)
Above is the formula that I used. L2 is the cell in table 2 that contains the employee's SSN. I then took the range of all values in table 1 from column A to P. Column 2 in spreadsheet 1 contains the employee ID, which is why I entered 2. I'm not sure why it returns an error.
VLOOKUP is the correct function.
If you have a source similar to
         [A]   [B]
     1]  SSN | empid
---------------------
     2]  123 | ABC
     3]  456 | DEF
    ....
 99999]  987 | QQQ
                ^------- index = 2 for return value
          ^------------------ lookup is in first column
and want to populate destination with "empid":
      [a]    [b]     [c]
1]    ssn | empid | col2 | col 3 | ..
---------------------------------------------
2]    123 |       | B    | C     |
3]    456 |       | d    | e     |
              ^----- formula here
       ^------------- this value is used as lookup
Then you place in the empty "empid" column a formula
=VLOOKUP(A2, 'Sheetname'!$A$2:$B$99999, 2, FALSE)
where
A2 is the lookup value (the SSN in the same row as the formula).
'Sheetname'!$A$2:$B$99999 is your data source area, excluding the header. The $ signs keep the range fixed when the formula is filled down.
2 is the 1-indexed column to return.
FALSE requests an exact match.
Note that you may need to handle the case where the lookup value does not exist in the source data. VLOOKUP then returns #N/A, which can be handled simply, for example by wrapping the formula in IFERROR.
If you absolutely NEED to do this across files, it's possible, but you risk breaking something if the files are not there. It's explained here.
NOTICE If you are on a non-US version of Excel, function arguments may be separated with ";" instead of "," and the functions may have different names.
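For readers who think more easily in code, the exact-match lookup described above can be sketched in Python. This is only an illustration of the logic, not how Excel implements VLOOKUP; the sample values and the helper name lookup_empid are assumptions made up for the example.

```python
# Hypothetical sample data mirroring the source table above (SSN, empid).
source_rows = [("123", "ABC"), ("456", "DEF"), ("987", "QQQ")]

# Build the lookup once; this plays the role of the fixed lookup range.
empid_by_ssn = dict(source_rows)


def lookup_empid(ssn, default=""):
    """Exact-match lookup; returns `default` where VLOOKUP would give #N/A."""
    return empid_by_ssn.get(ssn, default)


# SSNs in the destination sheet; "000" does not exist in the source.
destination_ssns = ["123", "456", "000"]
filled = [(ssn, lookup_empid(ssn)) for ssn in destination_ssns]
print(filled)
```

The default argument corresponds to wrapping the formula in an error handler, so a missing SSN produces an empty value instead of an error.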
In Excel 365 I'm using an "IFS" statement to scan through a number of columns to find out if a cell's value is in any of the columns. I believe "IFS" will process all your conditions until it reaches the first one that is "TRUE" then output. However, I'd like to be able to find ALL instances where my condition is true and output or evaluate them all somehow. Is there a way to do this with IFS (or some other method)? I think I'd like to output the matching value for each true condition in a separate row, but anything that could help me see how many matched and/or which column each match is in would be helpful.
The code I have is a bit much to share as my columns are in other workbooks, so I'll just share a close example. This formula would be in a cell that outputs the match, column D below.
A | B | C | D | E
------------------------------------
ColA | Col1 | Col2 | Formula | Notes
------------------------------------
1 | 1 | 2 | 1 | Two matches in same column (Col1)
2 | 1 | 2 | 2 | Two matches in same column (Col2)
3 | 3 | 3 | 3 | Two matches in diff column (Col1 & Col2)
=IFS(
NOT(ISERROR(MATCH(INDIRECT("A"&(ROW())),INDIRECT("B:B"),0))),
INDEX(INDIRECT("B:B"),MATCH(INDIRECT("A"&(ROW())),INDIRECT("B:B"),0)),
NOT(ISERROR(MATCH(INDIRECT("A"&(ROW())),INDIRECT("C:C"),0))),
INDEX(INDIRECT("C:C"),MATCH(INDIRECT("A"&(ROW())),INDIRECT("C:C"),0))
)
Of course the expected output is to dump the matching value of the first condition that's true, but I'd like to output all instances the condition is true in separate rows if possible. Maybe something like this...
A | B | C | D | E
------------------------------------
ColA | Col1 | Col2 | Formula | Notes
------------------------------------
1 | 1 | 2 | 1 | Two matches in same column (Col1)
... | ... | ... | 1 | Two matches in same column (Col1)
2 | 1 | 2 | 2 | Two matches in same column (Col2)
... | ... | ... | 2 | Two matches in same column (Col2)
3 | 3 | 3 | 3 | Two matches in diff column (Col1 & Col2)
... | ... | ... | 3 | Two matches in diff column (Col1 & Col2)
In the above and in my actual case the '...' would display what's in the column of that particular row match, which may vary from one row to another row throughout the worksheets. Basically, column D in the example would be on a separate 'results' sheet with the same amount of columns and column value types as all the 'data' sheets being searched. Furthermore, each column of the 'results' sheet would be a formula scanning that one specific column in all sheets, but only outputting the given column value of the matched row. Something like below...
DATA SHEET
A | B | C
----------------------
FName | LName | Amount
----------------------
John | Doe | 10
Jane | Doe | 4
Jack | Black | 10
RESULTS SHEET
(all cells are formulas)
A | B | C
----------------------
FName | LName | Amount
----------------------
John | Doe | 10 < matching value in C
Jack | Black | 10 < but different A & C
I hope that last part answered any "why" questions. ;)
ADDITION (7/25/19):
Below is the complete formula I'm using on sheets like the ones above, but with more columns. It works well, except that I need to know where ALL matches occur and not just the first match in the IFS statement. Column "F" is the column I'm matching on; the formula outputs the corresponding value from the matched row on the data sheets (5 sheets) to the formulated 'results' sheet, as displayed above. The only thing that changes in the formula between cells is the "A:A" to "B:B" etc., including "F:F" (the column with the value to be "MATCHED" from "SOURCES!$B$2"). I made "SOURCES!$B$2" the last condition in the formula in case nothing is found in the other data sheets, so that the cell pastes its own data in lieu of something like 0, N/A, or FALSE.
=IFS(
NOT(ISERROR(MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$3)&"F:F"),0))),
INDEX(INDIRECT((SOURCES!$B$3)&"A:A"),MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$3)&"F:F"),0)),
NOT(ISERROR(MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$4)&"F:F"),0))),
INDEX(INDIRECT((SOURCES!$B$4)&"A:A"),MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$4)&"F:F"),0)),
NOT(ISERROR(MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$12)&"F:F"),0))),
INDEX(INDIRECT((SOURCES!$B$12)&"A:A"),MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$12)&"F:F"),0)),
NOT(ISERROR(MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$13)&"F:F"),0))),
INDEX(INDIRECT((SOURCES!$B$13)&"A:A"),MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$13)&"F:F"),0)),
NOT(ISERROR(MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$14)&"F:F"),0))),
INDEX(INDIRECT((SOURCES!$B$14)&"A:A"),MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$14)&"F:F"),0)),
NOT(ISERROR(MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$2)&"F:F"),0))),
INDEX(INDIRECT((SOURCES!$B$2)&"A:A"),MATCH(INDIRECT((SOURCES!$B$2)&"F"&(ROW())),INDIRECT((SOURCES!$B$2)&"F:F"),0))
)
My formulated "results" workbook also has a worksheet named "SOURCES" that I use to paste file names to connect all the data sheets corresponding columns.
Btw, I'm using this as a tool to 'un-merge' customer data between profiles in our LIVE site/database after obtaining all the tables and columns the customer key has been found (using SQL) to then compare it (using Excel) to our TEST site so I can pull apart the data that doesn't belong on the 'kept' record from the LIVE merge. In this case there were 3 records merged. Two records have a profile in the TEST site, while the kept record from the LIVE site actually does not have a TEST record, giving me 5 sheets of data to examine.
Suppose your data starts in the range A2:C2.
I think this formula can help you.
Array formula (enter it with Ctrl+Shift+Enter):
=INDEX($A2:$C2,MATCH("OK",IF(ISNUMBER($A2:$C2),"OK",""),0))
I would like to compare a few columns to find out what the top 5 most popular products were in the year 2015.
I have this kind of data flow to work with:
Client | Product | Date of buy
------------------------------
client1 | A | 15.06.2015
client3 | A | 04.12.2015
client5 | F | 15.06.2015
client9 | G | 15.01.2015
client2 | G | 15.01.2015
client1 | R | 05.07.2015
client3 | G | 15.06.2015
client1 | F | 05.07.2015
client3 | F | 15.06.2016
Desired results: which products clients bought the most together (on the same date), i.e. the top 5 product combinations. E.g.:
1. Product A + Product H 222 times
2. Product A + Product E 77 times
3. Product B + Product O 70 times
4. etc
5. ...
Greetz,
Making the following assumptions:
You can use helper columns.
Your columns above are A, B and C.
You have two header rows and data starts in row 3.
Your dates are stored in an Excel date format and not as string values.
In E3 I generated a list of unique product items using the following formula:
=INDEX($B$3:$B$11,MATCH(0,INDEX(COUNTIF($E$2:E2,$B$3:$B$11),0,0),0))
I copied it down to match the number of rows in the initial list. It starts spitting #N/A when all the unique items in the list have been listed. If you want to avoid this you could put the formula inside of:
=IFERROR(insert formula,"")
Now in column F I did a count based on your criteria of each item and within the year 2015. I used a multiple count if function called COUNTIFS:
=COUNTIFS($C$3:$C$11,"<"&DATE(2016,1,1),
$C$3:$C$11,">"&DATE(2014,12,31),
$B$3:$B$11,E3)
I just reformatted that for easier reading. You will have to edit that slightly if you want to copy and paste. If you don't like seeing 0 when there is no product in the adjacent column you could wrap the equation in:
=IF(E3="","", insert formula )
I then skipped a column and sorted the list of counted items from largest to smallest, returning the numbers in sequence. I only went down two rows, but you could technically do the whole list. The LARGE function does this, and the formula in H3 looks like:
=LARGE($F$3:$F$11,ROWS($1:1))
I then went back one column and put the product name that corresponds to the count, taking the next name in the list when products have an equal count. I put that in column G because when I read I normally want to see the product name first and then the quantity. If you want it the other way around, just swap the columns. The formula in G3 is:
=INDEX($E$3:$E$11,MATCH(H3,$F$3:$F$11,0)+COUNTIF($H$3:$H3,H3)-1)
Copy E3 and F3 down as far as you need. Copy G3 and H3 down one row and you will have the top two; down two rows and you have the top three, etc.
This is how it looks...The dates are displayed according to my computers date format.
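The count-then-rank approach above (COUNTIFS restricted to 2015, then LARGE to pick the biggest counts) can be sketched in Python for comparison. This is a minimal illustration using the sample rows from the question; the function name top_products is an assumption made up for the example.

```python
from collections import Counter
from datetime import datetime

# Sample rows from the question: (client, product, date of purchase).
purchases = [
    ("client1", "A", "15.06.2015"),
    ("client3", "A", "04.12.2015"),
    ("client5", "F", "15.06.2015"),
    ("client9", "G", "15.01.2015"),
    ("client2", "G", "15.01.2015"),
    ("client1", "R", "05.07.2015"),
    ("client3", "G", "15.06.2015"),
    ("client1", "F", "05.07.2015"),
    ("client3", "F", "15.06.2016"),
]


def top_products(rows, year, n):
    """Count purchases per product within `year` and return the n largest."""
    counts = Counter(
        product
        for _client, product, date_text in rows
        if datetime.strptime(date_text, "%d.%m.%Y").year == year
    )
    return counts.most_common(n)


top_2015 = top_products(purchases, 2015, 5)
print(top_2015)
```

Parsing the dates into real date values mirrors the assumption above that the dates are stored as Excel dates rather than strings.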
I have a longitudinal spreadsheet of adolescent growth.
ID | CollectionDate | DOB | MOTHER ID | Sex
1 | 1Aug03 | 3Apr90 | 12 | 1
1 | 4Sept04 | 3Apr90 | 12 | 1
1 | 1Sept05 | 3Apr90 | 12 | 1
2 | 1Aug03 | 21Dec91 | 12 | 0
2 | 4Sept04 | 21Dec91 | 12 | 0
2 | 1Sept05 | 21Dec91 | 12 | 0
3 | 1Aug03 | 30Jan89 | 23 | 0
3 | 4Sept04 | 30Jan89 | 23 | 0
This is a sample of how my data is formatted and some of the variables that I have. As you can see, since it is longitudinal, each individual has multiple measurements. In the actual database there are over 10 measurements per individual and over 250 individuals.
What I want to do is add a value signifying the number of older brothers and older sisters each individual has. That is why I have included the Mother ID (because it represents genetic relatedness) and sex. These new variable columns would just say how many older siblings of each sex each individual has. Is there a formula that I could use to do this quickly?
=COUNTIFS($B:$B,"<>"&$B2,$H:$H,$H2,$AI:$AI,$AI2,$J:$J,"<"&$J2)
Create a column named Distinct with this formula
=1/COUNTIF([ID],[@ID])
Then you can find all the older 0-sexed siblings like this
=SUMPRODUCT(([DOB]<[@DOB])*([MOTHERID]=[@MOTHERID])*([Sex]=0)*([Distinct]))
Note that I made the data a Table and used table notation. If you're not familiar with it, [COLUMNNAME] refers to the whole column and [@COLUMNNAME] refers to the value in that column on the current row. It's similar to saying $A:$A and A2 if you're dealing with column A.
The first formula gives you a value to count that will always add up to 1 for a particular ID. So ID=1 has three lines and Distinct will result in .33333 for each line. When you add up the three lines you get 1. This is similar to a SELECT DISTINCT in SQL parlance.
The SUMPRODUCT formula sums [Distinct] for every row where the DOB is earlier than the current DOB (that is, an older sibling), the Mother is the same as the current Mother, and the Sex is zero.
I have a possible solution. It involves adding two columns -- One for "# older siblings" and one for "unique?". So here are all the headings I have currently:
A -- ID
B -- CollectionDate
C -- DOB
D -- MOTHER ID
E -- Sex
F -- # older siblings
G -- unique?
In G2, I added the following formula:
=IF(A2=A1,0,1)
And dragged down. As long as the data is sorted by ID, this will only display "1" once for each unique person.
In F2, I added the following formula:
=COUNTIFS(G:G,"=1",D:D,"="&D2,C:C,"<"&C2)
And dragged down. It seemed to work correctly for the sample data you provided.
The stipulations are:
You would need the two columns.
The data would need to be sorted by ID
I hope this helps.
You need a formula like this (for example, for row 2):
=COUNTIFS($A:$A,"<>"&$A2,$E:$E,$E2,$D:$D,$D2,$C:$C,"<"&$C2)
Assuming E:E is column for sex, D:D is column for mother ID and C:C is column for DOB.
Write this formula in H2 cell for example and drag it down.
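The counting logic shared by the answers above (same mother, earlier DOB, each individual counted once) can be sketched in Python. This is an illustration under stated assumptions: the rows are the sample data from the question with the collection dates left out, the two-digit years are read as 19xx, and the function name older_siblings is made up for the example. The question does not say which sex code means brother, so the result is keyed by the raw codes 0 and 1.

```python
from datetime import date

# Sample rows from the question: (ID, DOB, mother ID, sex).
# Each individual appears once per measurement, as in the spreadsheet.
records = [
    (1, date(1990, 4, 3), 12, 1),
    (1, date(1990, 4, 3), 12, 1),
    (1, date(1990, 4, 3), 12, 1),
    (2, date(1991, 12, 21), 12, 0),
    (2, date(1991, 12, 21), 12, 0),
    (2, date(1991, 12, 21), 12, 0),
    (3, date(1989, 1, 30), 23, 0),
    (3, date(1989, 1, 30), 23, 0),
]


def older_siblings(rows):
    """Return {ID: {sex code: number of older siblings with that code}}."""
    # Collapse repeated measurements to one entry per individual.
    people = {pid: (dob, mother, sex) for pid, dob, mother, sex in rows}
    result = {}
    for pid, (dob, mother, _sex) in people.items():
        counts = {0: 0, 1: 0}
        for other_id, (other_dob, other_mother, other_sex) in people.items():
            # An older sibling shares the mother and was born earlier.
            if other_id != pid and other_mother == mother and other_dob < dob:
                counts[other_sex] += 1
        result[pid] = counts
    return result


sibling_counts = older_siblings(records)
print(sibling_counts)
```

Collapsing to one entry per ID plays the same role as the Distinct and "unique?" helper columns: without it, every repeated measurement of a sibling would be counted again.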
I have over 100k rows of data like below:
ALLA,ALLA,"Company1, Inc.","Company1, Inc.",PSA,PSA,1,1,FALSE,FALSE
BCCO,BCCO,"Company2, Inc.","Company2, Inc.",PSB,PSB,1,1,FALSE,FALSE
CTTP,CTTP,"Company3, Inc.","Company3, Inc.",PSC,PSC,1,1,FALSE,FALSE
CMMZ,CMMZ,"Company4, Inc.","Company4, Inc.",PSD,PSD,1,1,FALSE,FALSE
I want to know how to check whether the data in column 1 is the same as in column 2, column 3 the same as column 4, and so on. How could I do that in Excel?
Following Cory's formula, I found that I can compare whole columns using:
=if(A:A=B:B, "yay", "aww")
Problem is I have a header in the file:
c - symbol, symbol, c - companyname, companyname, c - tradingvenue, tradingvenue, c - tierrank, tierrank, c - iscaveatemptor, iscaveatemptor
Shouldn't this cause A:A=B:B to be false?
Given this:
| A | B |
---+-----+-----+
1 | X | X |
---+-----+-----+
2 | Y | Y |
---+-----+-----+
3 | Z | Z |
The formula =SUMPRODUCT(--(A1:A3=B1:B3)) will tell you how many times the A value matches the B value.
You should get 3 as a result here. If, for example, you change B3 to Q then it will give you 2.
To do this on two columns without specifying the end of the range, try:
=SUMPRODUCT(--(A:A=B:B),--(LEN(A:A)>0))
I've been using Excel since 1991, and unless you want to write a VB macro, I think the best way is to do the simple IF statement suggested in the comments. If you need to test several columns at once, which is what your question suggests, then I'd do
=IF(AND(A1=B1,C1=D1,E1=F1,G1=H1),0,1)
Fill that formula down the column and you'll be able to instantly count the number of rows that don't match. With a data filter, select all the rows which have a '1' so that you can examine the rows that don't match.
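The pairwise check above (A=B, C=D, and so on for every row) can also be sketched in Python, which may be convenient for a file of 100k rows. This is a minimal illustration: the first two rows come from the question, the third row is deliberately altered to show a mismatch, and the function name mismatched_rows is an assumption made up for the example.

```python
import csv
import io

# Rows 1 and 2 are from the question; row 3 is altered (CTTX) for illustration.
sample = '''ALLA,ALLA,"Company1, Inc.","Company1, Inc.",PSA,PSA,1,1,FALSE,FALSE
BCCO,BCCO,"Company2, Inc.","Company2, Inc.",PSB,PSB,1,1,FALSE,FALSE
CTTP,CTTX,"Company3, Inc.","Company3, Inc.",PSC,PSC,1,1,FALSE,FALSE
'''


def mismatched_rows(text):
    """Return 1-based row numbers where any adjacent column pair differs."""
    bad = []
    for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        # Pair column 1 with 2, column 3 with 4, and so on.
        pairs = zip(row[0::2], row[1::2])
        if any(left != right for left, right in pairs):
            bad.append(row_number)
    return bad


mismatches = mismatched_rows(sample)
print(mismatches)
```

A header row like the one in the question, where "c - symbol" differs from "symbol", would be reported as a mismatch too, so it would need to be skipped before comparing.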
I am working on simplifying a daily task and I wanted to pick your brain. I have three different data sources (two from my company and the third from the customer) with products, dates, category and volume. So far I have converted what I could, left out irrelevant data, and combined two of the sources into one pivot table (internal data) and the other one into a separate pivot table (external data download). My task is to compare the data and make sure that the internal data is exactly the same as the external data and, if there is a difference, investigate and change it.
I have been comparing the two pivot tables manually with a ruler. The data changes daily and sometimes it is a long task. From reading up on the forum I found ways to compare values with formulas, but I have values as well as text and dates and don't know how to incorporate it all. Any thoughts will be really appreciated.
I'm assuming that you have some identifier that tells you when a product in one source is the same as a product in the other, since you're using pivot tables.
One function you can use is VLOOKUP.
The syntax is =vlookup(lookup_value, range, column_index, false)
In the sheet where you have the internal data, insert two columns for each item you will compare. If you're comparing dates, category and volume, this makes 6 columns to add (three for the looked-up external values and three for the differences), and you can name the first three 'ext date', 'ext cat' and 'ext vol'.
Sheet to make the comparison (let's call it Comparison):
   A        B        C        D        E        F        G        H        I        J
 +--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
1|Product |Int date|Int Cat |Int Vol |Ext date|Ext Cat |Ext Vol |Dif date|Dif Cat |Dif Vol |
 +--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
2|Item1   |01/01/12|Cat1    |23      |vlookup1|vlookup2|        |        |        |        |
 |        |        |        |        |        |        |        |        |        |        |
Sheet where the external data is found (let's call it External):
   A        B        C        D
 +--------+--------+--------+--------+
1|Product |Ext date|Ext Cat |Ext Vol |
 +--------+--------+--------+--------+
2|Item1   |01/01/12|Cat1    |23      |
 |        |        |        |        |
In the cell E2 of the sheet Comparison, you'll put:
=vlookup(A2, External!A:B, 2, false)
And the result will be 01/01/2012
The formula looks for Item1 in column A of External and returns the value from the second column (that's the purpose of the 2 in the formula) of the row in which it found Item1. You can specify any range (A:B in this formula) as long as its first column contains the value you're looking for and the value you want to return is in one of the columns included in the range. For instance, you could have written =vlookup(A2, External!A:D, 2, false) and it would return the same value, since index 2 is within that range. It won't work with =vlookup(A2, External!A:D, 5, false), since D is only the 4th column, if that makes sense.
In the cell F2 of the sheet Comparison, you'll put:
=vlookup(A2, External!A:C, 3, false)
And in cell G2
=vlookup(A2, External!A:D, 4, false)
false in the formula means exact match. You can also use 0 instead; it's the same thing.
Then you can put =B2=E2 in the cell H2 to get the comparison between the dates, or any other formula you are already using which might be more suitable.
I hope it helps! :)
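As a rough illustration of the same look-up-then-compare idea outside Excel, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the sample data (Item2 and its values are invented, only Item1 appears above), the field names, and the function name differences.

```python
# Hypothetical data keyed by product: (date, category, volume).
internal = {
    "Item1": ("01/01/12", "Cat1", 23),
    "Item2": ("02/01/12", "Cat2", 10),
}
external = {
    "Item1": ("01/01/12", "Cat1", 23),
    "Item2": ("02/01/12", "Cat2", 12),
}

FIELDS = ("date", "category", "volume")


def differences(internal_rows, external_rows):
    """Return {product: list of differing field names} for non-matching rows."""
    report = {}
    for product, int_values in internal_rows.items():
        # Equivalent of the exact-match lookup on the product column.
        ext_values = external_rows.get(product)
        if ext_values is None:
            report[product] = ["missing in external data"]
            continue
        # Equivalent of the =B2=E2 style comparison, one field at a time.
        changed = [
            name
            for name, ours, theirs in zip(FIELDS, int_values, ext_values)
            if ours != theirs
        ]
        if changed:
            report[product] = changed
    return report


diff_report = differences(internal, external)
print(diff_report)
```

Because the comparison is plain equality, it treats dates, text and numbers the same way, just as =B2=E2 does in the sheet.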