Comparing data from several spreadsheets via pivot tables - excel

I am working on simplifying a daily task and I wanted to pick your brain. I have three different data sources (two from my company and the third from the customer) with products, dates, category and volume. So far I have converted what I could, left out irrelevant data, and combined two of the data sources into one pivot table (internal data) and the third into a separate pivot table (external data download). My task is to compare the data and make sure that the internal data is exactly the same as the external data and, if there is a difference, investigate and change it.
I have been comparing the two pivot tables manually with a ruler. The data changes daily and sometimes it is a long task. From reading up on the forum I found ways to compare values with formulas, but I have values as well as text and dates and don't know how to incorporate it all. Any thoughts would be really appreciated.

I'm assuming that you have some identifier to tell that this product and that are the same since you're using pivot table.
One thing you can use is VLOOKUP.
The syntax is =vlookup(lookup_value, range, column_index, false)
In the sheet where you have the internal data, insert columns for each item you will compare. If you're comparing date, category and volume, that makes 6 columns to add: three for the looked-up external values, which you can name 'Ext date', 'Ext Cat' and 'Ext Vol', and three for the differences ('Dif date', 'Dif Cat' and 'Dif Vol').
Sheet to make the comparison (let's call it Comparison):
     A        B        C        D        E        F        G        H        I        J
 +--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
1|Product |Int date|Int Cat |Int Vol |Ext date|Ext Cat |Ext Vol |Dif date|Dif Cat |Dif Vol |
 +--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
2|Item1   |01/01/12|Cat1    |23      |vlookup1|vlookup2|        |        |        |        |
 |        |        |        |        |        |        |        |        |        |        |
Sheet where the external data is found (let's call it External):
     A        B        C        D
 +--------+--------+--------+--------+
1|Product |Ext date|Ext Cat |Ext Vol |
 +--------+--------+--------+--------+
2|Item1   |01/01/12|Cat1    |23      |
 |        |        |        |        |
In the cell E2 of the sheet Comparison, you'll put:
=vlookup(A2, External!A:B, 2, false)
And the result will be 01/01/2012
The formula looks for Item1 in column A of External and returns the value from the second column (that's the purpose of the 2 in the formula) of the row where it found Item1. You can specify any range (A:B in this formula) as long as its first column contains the value you're looking for and the value you want to return is in one of the columns included in the range. For instance, you could have written =vlookup(A2, External!A:D, 2, false) and it would return the same value, since index 2 is within that range. It won't work with =vlookup(A2, External!A:D, 5, false), though: D is only the 4th column of the range, so index 5 falls outside it and the formula returns #REF!.
In the cell F2 of the sheet Comparison, you'll put:
=vlookup(A2, External!A:C, 3, false)
And in cell G2
=vlookup(A2, External!A:D, 4, false)
false in the formula means exact match. You can also use 0 instead; it's the same thing.
Then you can put =B2=E2 in the cell H2 to get the comparison between the dates, or any other formula you are already using which might be more suitable.
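For readers who find it easier to see the logic outside of Excel, here is a minimal Python sketch of what the three lookup columns plus the =B2=E2 style checks do. It is only an illustration: the function name compare_rows and the Item2 rows are made up, and only Item1 comes from the example above.

```python
# Hypothetical sample data: product -> (date, category, volume).
# Item1 mirrors the example sheets; Item2 is invented to show a mismatch.
internal = {
    "Item1": ("01/01/12", "Cat1", 23),
    "Item2": ("02/01/12", "Cat2", 10),
}
external = {
    "Item1": ("01/01/12", "Cat1", 23),
    "Item2": ("02/01/12", "Cat2", 12),
}


def compare_rows(internal, external):
    """For each internal product, fetch the external row by exact match
    (like VLOOKUP with false) and compare date, category and volume."""
    report = {}
    for product, int_row in internal.items():
        ext_row = external.get(product)  # None plays the role of #N/A
        if ext_row is None:
            report[product] = None
        else:
            # One True/False per compared field, like the Dif columns.
            report[product] = tuple(i == e for i, e in zip(int_row, ext_row))
    return report


report = compare_rows(internal, external)
print(report["Item1"])  # (True, True, True)
print(report["Item2"])  # (True, True, False)
```

A False in the result marks the field to investigate, which is the same job the Dif date, Dif Cat and Dif Vol columns do in the sheet.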
I hope it helps! :)

Related

adding another column in excel being compared to another

I have 2 spreadsheets in excel. Spreadsheet 1 contains many columns including ssn and employee ID. Spreadsheet 2 contains different columns and has an ssn column but not the employee id. I want to create a new column in Spreadsheet 2 called employee id and fill it with the employee ID that matches each employee's ssn in Spreadsheet 1. I feel like I would need to use VLOOKUP but I am not entirely sure. Any help? Thanks
=VLOOKUP(L2, [spreadhseet1.xlsx]spreadhseet1!$A:$P, 2, FALSE)
Above is the formula that I used. L2 is the cell in table2 that contains the employee's ssn. I then took the range of all values in table1 from column A-P. Column 2 in spreadsheet 1 contains the employee id, which is why I entered 2. Not sure why it is returning an error.
VLOOKUP is the correct function.
If you have a source similar to
        [A]   [B]
1]      SSN | empid
---------------------
2]      123 | ABC
3]      456 | DEF
....
99999]  987 | QQQ
         ^     ^------- index = 2 for return value
         ^------------------ lookup is in first column
and want to populate destination with "empid":
     [a]   [b]     [c]
1]   ssn | empid | col2 | col 3 | ..
---------------------------------------------
2]   123 |       | B    | C     |
3]   456 |       | d    | e     |
      ^      ^----- formula here
      ^------------- this value is used as lookup
Then you place in the empty "empid" column a formula
=VLOOKUP(A2, 'Sheetname'!$A$2:$B$99999 , 2 , false)
where
A2 is the lookup value: the ssn cell in the same row as the formula.
'Sheetname'!$A$2:$B$99999 is your data source area, excluding the header. The rows are locked with $ so the range does not shift when the formula is filled down.
2 is the 1-indexed column to return
false requires an exact match
Note that you may need to handle the case where the lookup value does not exist in the source data. This returns #N/A, which can be handled simply, for example by wrapping the VLOOKUP in IFERROR.
If you absolutely NEED to do this across files, it's possible, but you risk breaking something if the files are not there. It's explained here.
NOTICE: If you are on a non-US version of Excel, function arguments may be separated with ";" and the functions may have different (localized) names.
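As a rough illustration of the missing-value case mentioned above, here is a small Python sketch of an exact-match lookup that falls back to a default instead of failing. The function name lookup_empid and the SSN "000" are invented for the example; the other values come from the diagrams above. In Excel, wrapping the VLOOKUP in IFERROR is one common way to get the same effect.

```python
# Source table from the diagram above: SSN -> employee id.
# The SSN acts as the first (lookup) column, empid as column 2.
source = {"123": "ABC", "456": "DEF", "987": "QQQ"}


def lookup_empid(ssn, source, default=""):
    """Exact-match lookup that returns `default` when the SSN is missing,
    instead of raising an error (the Excel equivalent would show #N/A)."""
    return source.get(ssn, default)


# "000" is a made-up SSN that does not exist in the source.
destination_ssns = ["123", "456", "000"]
empids = [lookup_empid(ssn, source) for ssn in destination_ssns]
print(empids)  # ['ABC', 'DEF', '']
```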

Countif criteria with IF/OR logic across multiple sheets and columns

I've been struggling between the SUMPRODUCT and COUNTIFS formulas as there are a lot of specific dependencies in my data. Wondering if anyone can shed a bit more light on this issue.
Have tried SUMPRODUCT and COUNTIFS which give me calculations based on 1 set, but I need to include additional if/or statements.
I have the following:
| ID | Size | Dead/Alive | Duration | Days | Pass/Fail | Reason |
|----|---------|------------|-----------|------|----------|----------|
| 1 | Full | Dead | Permanent | 125 | Pass | Comments |
| 2 | Partial | Alive | Permanent | 500 | Pass | |
| 3 | Other | Dead | Temporary | 180 | Fail | Comments |
| 4 | No | Dead | Temporary | 225 | Fail | Comments |
| 5 | Yes | Alive | Permanent | 200 | Pass | |
with the following rules:
Only Count the ID/ROW if:
1) Values in column A = Full, Partial or Other
OR...
2) Values in column A = No AND values in column B = Dead
OR...
3) If values in column C = Permanent AND values in column D = >=100 or <=200
OR
4) If values in column C = Temporary AND values in column E = Pass, Fail AND column F=not blank
By my calculations, the total should be 5, but this is just a small sampling of my total data. I'm just not sure how to get that in Excel with either SUMPRODUCT or COUNTIFS; someone even suggested a LOOKUP function, although I've never used that one.
Given that you have so many different conditions, I have to break it down one by one and create a few helper columns to account for each condition.
In my solution I created 10 helper columns as shown below, and I have added some sample data (ID 6 to 29) to test the solution.
I also named 7 conditions in my solution:
Cond_1 Values in column A = Full, Partial or Other
Cond_2 Values in column A = No AND values in column B = Dead
Cond_3A Values in column C = Permanent
Cond_3B Values in column D >=100
Cond_3C Values in column D <=200
Cond_3A, Cond_3B and Cond_3C must be TRUE at the same time
Cond_4 Values in column C = Temporary AND values in column E = Pass
Cond_5A Values in column C = Temporary AND values in column E = Fail
Cond_5B Column F is not blank (I did not give a name to this condition)
Cond_5A and Cond_5B must be TRUE at the same time
Please note my Cond_4, Cond_5A and Cond_5B are all related to your original condition 4), which reads a bit odd, and I am not 100% sure if my interpretation of the condition is correct. If not please re-state your last condition and I can amend my answer accordingly.
As shown in my screen-shot, the formulas in I2 to Q2 are listed in Column U. I only used MAX, AND, SUM, =, &, and/or <> to interpret each condition. Please note that some of the formulas are array formulas, so you need to press Ctrl+Shift+Enter to make them work.
The To Count column simply asks whether the SUM of the previous 9 columns is at least 1, which means at least one of the conditions is met. If so, it returns 1; otherwise 0.
Then you just need to work out the total of the To Count column. In my example it is 22. I have highlighted the entries that did not meet any of the given conditions.
You can use only one helper column to capture all conditions in one formula, but I would not recommend it as it would be too long to be easily understood and modified in future.
{=--(SUM(MAX(--(A2=Cond_1)),MAX(--(A2&B2=Cond_2)),--(SUM(--(C2=Cond_3A),--(AND(D2>=Cond_3B,D2<=Cond_3C)))=2),MAX(--((C2&E2)=Cond_4)),--(SUM(MAX(--((C2&E2)=Cond_5)),--(F2<>""))=2))>0)}
Ps. I would also wonder if there is a formula-based solution without using any helper column...? :)
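To make the OR-of-conditions logic easier to check by hand, here is a minimal Python sketch that applies the same rules to the five sample rows from the question. It is not a translation of the Excel formulas: the function name to_count is made up, and the split of rule 4 into "Temporary and Pass" versus "Temporary and Fail with a non-blank reason" follows the interpretation in the answer above, which is itself flagged as uncertain.

```python
def to_count(size, status, duration, days, result, reason):
    """Return 1 if the row meets at least one condition, else 0."""
    cond_1 = size in ("Full", "Partial", "Other")
    cond_2 = size == "No" and status == "Dead"
    cond_3 = duration == "Permanent" and 100 <= days <= 200
    # Assumed reading of rule 4, split in two as in the answer above.
    cond_4 = duration == "Temporary" and result == "Pass"
    cond_5 = duration == "Temporary" and result == "Fail" and reason != ""
    return int(any((cond_1, cond_2, cond_3, cond_4, cond_5)))


# The five sample rows from the question (ID column omitted).
rows = [
    ("Full", "Dead", "Permanent", 125, "Pass", "Comments"),
    ("Partial", "Alive", "Permanent", 500, "Pass", ""),
    ("Other", "Dead", "Temporary", 180, "Fail", "Comments"),
    ("No", "Dead", "Temporary", 225, "Fail", "Comments"),
    ("Yes", "Alive", "Permanent", 200, "Pass", ""),
]

total = sum(to_count(*row) for row in rows)
print(total)  # 5
```

Each row is counted at most once no matter how many conditions it satisfies, which is why the helper-column approach tests the sum of flags rather than adding the flags up directly.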

Structured Reference in an Index/Match Calculated Column Formula for Header Cell Range in a Table

I have this formula in a calculated column that is working great:
=IFERROR(INDEX(Allocation_of_Funds[[#Headers],[End.Nursing]:[Unassigned14]],MATCH(TRUE,INDEX(Allocation_of_Funds[@[End.Nursing]:[Unassigned14]]>0,0),0)),"")
But this formula is giving me trouble and represents what I want in the next calculated column, (based on the value in the previous column above) but it returns a #REF! error:
=INDEX(INDIRECT("Allocation_of_Funds[[#Headers]"&"["&[@[SOURCE 1]]&"]"):[@Unassigned14],MATCH(TRUE,INDIRECT("Allocation_of_Funds[[#Headers]"&"["&[@[SOURCE 1]]&"]"):[@Unassigned14]>0,0),0)
The details of the table setup are as follows, in case it's helpful:
I have a table with a range of columns and each column represents a different type of account. For each row, any combination of these columns could contain values or blanks, so I've got another set of columns that I want to identify the table column headers for the non-blank columns for each record.
SOURCE1 | SOURCE2 | SOURCE3 | ACCT1 | ACCT2 | ACCT3 | ACCT4 | ACCT5
ACCT1   | ACCT2   | ACCT4   | 500   | 300   |       | 100   |
ACCT2   | ACCT3   |         |       | 200   | 100   |       |
ACCT3   |         |         |       |       | 500   |       |
        |         |         |       |       |       |       |
ACCT3   | ACCT4   | ACCT5   |       |       | 200   | 300   | 50
ACCT1   | ACCT3   | ACCT4   | 123   |       | 332   | 100   |
So I need the SOURCE2 column to use the value in the SOURCE1 column to identify the start of the range where I am looking for the next cell with a value, whereby the column header above that value will be returned for the SOURCE2 row value. The same formula will apply to the SOURCE3 column, using the value of the SOURCE2 column to identify the start of the next range.
Thanks in advance for letting me pick your brain!
-Lindsay
I used the following formula to pull the headers and place them under the source numbers:
=IFERROR(INDEX($D$1:$H$1,AGGREGATE(15,6,COLUMN($D2:$H2)/ISNUMBER($D2:$H2)-COLUMN($D$1)+1,RIGHT(A$1,1)*1)),"")
I assumed your table's top left corner was in A1, with row 1 being the header row, A to C being your source columns and D to H being your account columns. The above formula can be placed in cell A2 and copied to the right and down as need be.
You seem to have a grasp of the IFERROR and INDEX function so I will explain the AGGREGATE function:
=AGGREGATE(15,6,COLUMN($D2:$H2)/ISNUMBER($D2:$H2)-COLUMN($D$1)+1,RIGHT(A$1,1)*1)
The AGGREGATE function is a mixture of several different functions rolled into one, with the ability to ignore certain values such as errors. Another added feature is that some of the built-in functions perform array calculations without the formula having to be entered as an array formula.
In this particular case I chose aggregate function 15 which is the same as the SMALL function. I have also told aggregate to ignore calculations which generate errors by using the "6". For the array calculation I have asked it to divide the column number it is working with by the True or False result of that column being a number:
COLUMN($D2:$H2)/ISNUMBER($D2:$H2)-COLUMN($D$1)+1
True in excel math is the same as 1 and False is the same as 0. Anytime the cell is not a number it will try to divide by zero, generate an error, and be ignored by Aggregate function. This basically generates a list of column numbers that meet the criteria of having a number in their column. The subtraction of the D1 followed by a +1 is to convert the column number that is determined, to a relative column under your accounts headers.
The next part of the aggregate function is telling the SMALL operation which number in sorted order needs to be returned. I used the last character in your source header to determine which column number to return. For SOURCE1 the last character is 1 so I want the smallest column number returned. For SOURCE2, the second smallest number is returned. The *1 at the end converts the character to a number instead of 1 as text.
RIGHT(A$1,1)*1
Ergo, if you want to use up to 9 sources you can. You can do more sources as well but you would need to revise this formula or come up with a different way of providing which number of the small list you want returned. And you can expand the D2:H2 reference to be all your accounts, and adjust the D1:H1 reference to cover all your account headers.
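The idea behind the AGGREGATE step, picking the k-th column that holds a number and returning its header, can be sketched in Python as follows. This is only an illustration of the logic, not of how Excel evaluates the formula; the function name kth_header is made up, and the rows are taken from the question's table with blanks represented as None.

```python
def kth_header(headers, values, k):
    """Return the header above the k-th numeric cell (1-based), or ""
    when fewer than k cells hold numbers (the IFERROR fallback)."""
    # Relative positions (starting at 1) of the cells that hold numbers.
    # Non-numeric cells are skipped, like the divide-by-zero errors
    # that AGGREGATE is told to ignore.
    positions = [i + 1 for i, v in enumerate(values)
                 if isinstance(v, (int, float))]
    if k > len(positions):
        return ""
    # k-th smallest position, as SMALL would return it.
    return headers[sorted(positions)[k - 1] - 1]


headers = ["ACCT1", "ACCT2", "ACCT3", "ACCT4", "ACCT5"]
row = [500, 300, None, 100, None]  # first data row of the question

# k comes from the last character of the source header,
# like RIGHT(A$1,1)*1 in the formula.
for source in ("SOURCE1", "SOURCE2", "SOURCE3"):
    k = int(source[-1])
    print(source, kth_header(headers, row, k))
# SOURCE1 ACCT1
# SOURCE2 ACCT2
# SOURCE3 ACCT4
```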

How to get two+ rows to link together? Excel 2010 (Example)

I have a parts list with competitor pricing. One part number brings multiple brands up with the location of the company.
As you can see from the picture, I have part numbers for one item with three companies. I want to sort by part type. So for example I want to list only the brake pads. When I do this the blanks get sent to the bottom, but the blanks are not really blanks because they have additional info with them for that part number.
Column 1 | Column 2  | Column 3 | Column 4  | Column 5 | Column 6  | Column 7
Part No  | Company A | Price    | Company B | Price    | Company C | Price
4656546  | Brand A   | $5       | Brand A   | $5       | Brand A   | $5
(BLANK)  | Brand b   | $8       | Brand b   | $8       | Brand b   | $8
I have tried to use a helper column, but I have 1,000+ rows.
Does anyone know if you can link or have a relationship between two+ rows?
I hope that makes sense; if not, I can try to explain better.
I assume that a "blank" in PartNo means "take the PartNo from the cell above" ...
In order to normalize the PartNo (= get rid of the blanks), use another PartNo-Normalized column (e.g. [K:K]) and normalize as follows:
K1 ="PartNo-Normalized"
K2:Kxx =IF(A2<>"",A2,K1)
Next, convert all formulas in [K:K] into values (Copy / Paste Special - Values) before sorting, as a sort operation will destroy the calculated values.
After conversion to values it's safe to sort, and you may create a filter on that column.
Depending on how well organized your data is, it might be a good idea to add one more column and fill it with 1, 2, 3, 4, 5 ... before any sorting so you can restore the original sort order just in case something nasty happens.
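The fill-down rule used for the normalized column can be sketched in Python like this. It is a minimal illustration only: the function name fill_down and the second part number are made up, and a blank in the very first row stays blank here, whereas the Excel formula would pick up the header text from K1.

```python
def fill_down(part_numbers):
    """Replace each blank with the last non-blank value above it,
    mirroring =IF(A2<>"",A2,K1) filled down the column."""
    filled = []
    previous = ""
    for value in part_numbers:
        if value != "":
            previous = value
        filled.append(previous)
    return filled


# "7891234" is an invented part number for the example.
column_a = ["4656546", "", "", "7891234", ""]
print(fill_down(column_a))
# ['4656546', '4656546', '4656546', '7891234', '7891234']
```

Once every row carries its own part number, sorting or filtering no longer separates the brand rows from the part they belong to.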

Find value using multiple criteria

I've got a table with columns, each containing customer contact information. I've also got a formula that finds a phone number using multiple criteria: customer ID, type (mobile, home etc), and primary Y/N. The problem is this information can occur several times but with a different date, in which case the newest occurrence needs to be selected. The current CSE formula is:
=INDEX($C$6:$BZ$18;10;MATCH(<client_ID>;IF(($C$8:$BZ$8=<client_ID>)*($C$17:$BZ$17="home")*($C$18:$BZ$18="Y");$C$8:$BZ$8);0))
where
$C$6:$BZ$18 contains all data
$C$8:$BZ$8 contains all client IDs
$C$17:$BZ$17 contains the types of phone numbers
$C$18:$BZ$18 contains whether this number is the primary number of that type
$C$16:$BZ$16 contains the date a number was entered
The data looks like this:
     B             C            D
---------------------------------------------------------------------
8    CLIENTID     |Client1     |Client1     |
9    other        |            |            |
10   other        |            |            |
11   other        |            |            |
12   other        |            |            |
13   other        |            |            |
14   other        |            |            |
15   PHONE NUMBER |9876543210  |1234567890  |
16   DATE         |2015-04-15  |2015-04-16  |
17   TYPE         |Home        |Home        |
18   Primary      |Y           |Y           |
The above formula selects phone number 9876543210 but it needs to select 1234567890 because that is the latest entry.
Any ideas on how to proceed from here?
The underlying values of dates are numbers, so we can look for the furthest date to the right in a row by searching for an impossibly high number with the MATCH function, without looking for an exact match.
      
The array formula in F6 is,
=INDEX($B$8:$BZ$18, MATCH(F$5, $B$8:$B$18, 0), MATCH(1E+99, IF($B$8:$BZ$8=$C6, IF($B$17:$BZ$17=$D6, IF($B$18:$BZ$18=$E6, $B$16:$BZ$16)))))
Array formulas need to be finalized with Ctrl+Shift+Enter↵.
If your dates are not in ascending order (left-to-right), then an exact match will have to be sought. A three-criteria pseudo-MAXIF formula can supply the latest date to the original formula, modified to look for an exact match. If the maximum date is duplicated, the first one is returned.
=INDEX($C$8:$BZ$18, MATCH(F$5, $B$8:$B$18, 0), MATCH(MAX(INDEX($C$16:$BZ$16*($C$8:$BZ$8=$C6)*($C$17:$BZ$17=$D6)*($C$18:$BZ$18=$E6), , )), IF($C$8:$BZ$8=$C6, IF($C$17:$BZ$17=$D6, IF($C$18:$BZ$18=$E6, $C$16:$BZ$16))), 0))
In order to provide some maths without errors, I've shifted the calculation ranges to C:BZ. Array formulas still need to be finalized with Ctrl+Shift+Enter↵.
By appropriately locking either the row, column or both of the cell addresses, we can use the column header to identify a different category from column B as I have done with DATA LINE. The formula can be simply filled right.
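The selection rule, filtering on the three criteria and then taking the entry with the latest date, can be sketched in Python as follows. This is a hedged illustration of the logic rather than of the array formulas themselves; the function name latest_phone and the record layout are made up, and the two records are the columns C and D from the question.

```python
from datetime import date

# Columns C and D of the question's data, one record per column.
records = [
    {"client": "Client1", "phone": "9876543210",
     "date": date(2015, 4, 15), "type": "Home", "primary": "Y"},
    {"client": "Client1", "phone": "1234567890",
     "date": date(2015, 4, 16), "type": "Home", "primary": "Y"},
]


def latest_phone(records, client, phone_type, primary):
    """Return the phone number of the newest record matching all three
    criteria, or None when nothing matches."""
    # Note: Python compares text case-sensitively, unlike Excel's "=".
    matches = [r for r in records
               if r["client"] == client
               and r["type"] == phone_type
               and r["primary"] == primary]
    if not matches:
        return None
    # max() keeps the first record when the latest date is duplicated.
    return max(matches, key=lambda r: r["date"])["phone"]


print(latest_phone(records, "Client1", "Home", "Y"))  # 1234567890
```

Because the maximum is taken over the dates themselves, this version does not depend on the entries being stored in date order.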

Resources