Sum rows containing only the first appearance of value in separate column - excel

I have a table that looks something like this:
Column A | Column B
A        | 1
B        | 2
C        | 3
A        | 4
What I want to do is get the sum of the values in Column B, but only for the first occurrence of each value in Column A. Thus, the result I want to get is (1 + 2 + 3) = 6 (adding the first three rows, but omitting the fourth, because a row with 'A' in Column A has already been included in the sum).
I've tried looking at Frequency, but I haven't been able to figure out how to use it properly to get the result I want.

Use SUMPRODUCT and MATCH:
=SUMPRODUCT(--(ROW(A1:A4)=MATCH(A1:A4,A:A,0)),B1:B4)
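To see why this works, the same logic can be modelled outside Excel. The following is a minimal Python sketch, not Excel code, and the function name `sum_first_occurrences` is made up for illustration. It keeps a row only when its position equals the position of the first exact match of its key, which is what the ROW(...)=MATCH(...) comparison tests when the data starts in row 1.

```python
def sum_first_occurrences(keys, values):
    """Sum values[i] only where keys[i] appears for the first time."""
    total = 0
    for i, key in enumerate(keys):
        # keys.index(key) plays the role of MATCH(key, A:A, 0):
        # it returns the position of the first exact match.
        if keys.index(key) == i:
            total += values[i]
    return total


result = sum_first_occurrences(["A", "B", "C", "A"], [1, 2, 3, 4])
print(result)  # 6
```

If the data does not start in row 1, the row numbers and the MATCH positions no longer line up. One common adjustment is to match against the data range itself and compare with ROW(range)-ROW(first cell)+1.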

Related

How to recursively calculate mean of values based on values present in another cell of the same row in Excel

I have a spreadsheet with two columns of values, containing 1000 values each. The first column (column A) contains these values:
W
W
W
T
T
T
The second column (column B) contains these values:
1
2
3
4
5
6
Can you please tell me if there is a way to recursively calculate the mean of those values in column B that have the same value in column A? In my case, the output should be a new column that looks like this:
2
5
As you can see, "2" is the mean of the values in column B that have "W" in column A, while "5" is the mean of the values in column B that have "T" in column A.
This type of question can be handled using the Subtotal feature, with Average() as the function to use (instead of the default Sum() function).
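As an illustration of what a per-group average computes, here is a small Python sketch. It is an assumption-laden model rather than anything Excel runs, and `mean_by_group` is a hypothetical helper name. It groups the column B values by their column A label and averages each group.

```python
def mean_by_group(keys, values):
    """Return {key: mean of the values that share that key}."""
    groups = {}
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


means = mean_by_group(["W", "W", "W", "T", "T", "T"], [1, 2, 3, 4, 5, 6])
print(means)  # {'W': 2.0, 'T': 5.0}
```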

Excel: Merge two columns into one column with alternating values

How can I merge two columns of data into one, like the following:
Col1 Col2 Col3
========================
A    1    A
B    2    1
C    3    B
          2
          C
          3
You can use the following formula in column D as per my example. Keep in mind to increase the $A$1:$B$6 range according to your data.
=INDEX($A$1:$B$6,INT((ROWS(D$2:D2)-1)/2)+1,MOD(ROWS(D$2:D2)-1,2)+1)
Thank you to @Koby Douek for the answer. Just an addition: if you are using OpenOffice Calc, replace the commas with semicolons.
=INDEX($A$1:$B$6;INT((ROWS(D$2:D2)-1)/2)+1;MOD(ROWS(D$2:D2)-1;2)+1)
Expanding on @Koby Douek's answer to cover more columns and explain some of the terms.
Original formula for merging 2 columns into 1 with alternating values:
=INDEX($A$1:$B$6,INT((ROWS(D$2:D2)-1)/2)+1,MOD(ROWS(D$2:D2)-1,2)+1)
$A$1:$B$6 defines the columns and rows to source the final set of data from. The $ signs are only there to keep the formula from changing the selected columns and rows when it is copied and pasted or dragged.
To make the formula work on whatever values you put in the columns, without having to expand the range every time, change the range to $A:$B (or A:B, so you can easily copy the formula to other sets of columns and create new merges). Note that this also returns the 1st value in every column as one of the alternating values. If you have headers instead, use a bounded range that starts below them and ends at a large row number, such as $A$2:$B$99999 (or A$2:B$99999 if you want to paste and move the columns); which is better depends on the situation.
Let's assume you are fine including the values in the 1st row.
This changes the formula to
=INDEX($A:$B,INT((ROWS(D$2:D2)-1)/2)+1,MOD(ROWS(D$2:D2)-1,2)+1)
Now on to D$2:D2.
This range is used to calculate the difference between the row the formula is in (D2) and the reference row (D$2). The important thing is to set the reference row number to the 1st row you will be putting values in. If the 1st row of the combined column is a header, use the 2nd row as the reference; if the values in the combined column D begin on the 3rd row, the reference row is D$3.
Since I like the more general form where the 1st row isn't a header row, I'll use D$1:D1. You can still merge source columns without headers into a combined column with as many header rows as you like, just by changing the reference row number to the 1st row where your values should begin.
This changes the formula to
=INDEX($A:$B,INT((ROWS(D$1:D1)-1)/2)+1,MOD(ROWS(D$1:D1)-1,2)+1)
Now INT((ROWS(D$1:D1)-1)/2)+1 and MOD(ROWS(D$1:D1)-1,2)+1
INT returns an integer value, so any decimal places are dropped; it essentially rounds down to the nearest whole number.
MOD returns the remainder of a division. Its result is a whole number between 0 and n-1, where n is the number we are dividing by (e.g. MOD(0,3)=0; MOD(1,3)=1; MOD(2,3)=2; MOD(3,3)=0; MOD(4,3)=1; etc.).
So, on to -1)/2)+1 and -1,2)+1.
The first value is again the difference between the current row and the reference row. ROWS(D$1:D1) is the count of the rows, which is 1 in the first result row, so we have to correct for the count starting at 1 instead of 0, which would otherwise throw off our calculations. Both parts therefore use -1 to reduce the row count by 1.
The /2 and ,2 are both there because we are dividing by 2. In the first expression it's a normal division by 2 (/2); in the modulus expression it's an argument of the MOD function (,2).
Finally, we add 1 using +1 because INDEX needs row and column numbers that begin at 1.
INT((ROWS(D$1:D1)-1)/2)+1 finds the row number to select the value from.
MOD(ROWS(D$1:D1)-1,2)+1 finds the column number to select the value from.
Thus we can change /2 and ,2 to /3 and ,3 to do this with 3 columns, widening the source range to three columns ($A:$C) as well.
This yields:
=INDEX($A:$C,INT((ROWS(D$1:D1)-1)/3)+1,MOD(ROWS(D$1:D1)-1,3)+1)
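The index arithmetic can be checked outside the spreadsheet. Below is a rough Python sketch of the same idea; the names `source_cell` and `interleave` are invented for this illustration, and the table is assumed to be a list of equal-length rows. For the n-th result row it computes the 1-based source row and column in the same way as the INT and MOD parts of the formula.

```python
def source_cell(n, num_columns):
    """Return the 1-based (row, column) that result row n reads from."""
    row = (n - 1) // num_columns + 1  # mirrors INT((ROWS(...)-1)/k)+1
    col = (n - 1) % num_columns + 1   # mirrors MOD(ROWS(...)-1,k)+1
    return row, col


def interleave(table):
    """Flatten a table row by row, as filling the formula down would."""
    num_columns = len(table[0])
    merged = []
    for n in range(1, len(table) * num_columns + 1):
        row, col = source_cell(n, num_columns)
        merged.append(table[row - 1][col - 1])
    return merged


merged = interleave([["A", 1], ["B", 2], ["C", 3]])
print(merged)  # ['A', 1, 'B', 2, 'C', 3]
```

With three columns, `source_cell(4, 3)` returns `(2, 1)`: the fourth result row reads from the first column of the second source row.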
So maybe that's the confusing way to look at it but it's closer to how my mind works on it. Here is an alternative view:
=INDEX([RANGE],[ROW_#],[COLUMN_#]) returns the value from a range of rows and columns
Using the example:
=INDEX($A:$C,INT((ROWS(D$1:D1)-1)/3)+1,MOD(ROWS(D$1:D1)-1,3)+1)
[RANGE] = $A:$C is the range of source columns (three columns in this example).
[ROW_#] = INT((ROWS(D$1:D1)-1)/3)+1
INT([VALUE_A])+1 returns an integer value, so any decimal places are dropped, and then adds one to it. We add one because the result of the next steps will be 1 less than the value we need.
[VALUE_A] = (ROWS(D$1:D1)-1)/3
ROWS(D$1:D1) returns the number of rows from the first result row down to the current row in the results column. D$1 designates the row where the values in the results column begin, and D1 is the current row in the results column, so the range runs from the starting row to the current row and lets us count the rows. We subtract 1 from this count using -1 to get the difference between the starting row and the current row. This is then divided by 3 (/3) because we have three columns to look through in this example, so we only move to the next source row after every third result row. INT drops any decimal places, as mentioned, so the row number only increments once three more result rows have passed.
[COLUMN_#] = MOD(ROWS(D$1:D1)-1,3)+1
MOD([VALUE],[DIVISOR])+1 returns the remainder of the value when divided by the divisor, plus one.
Using the example:
MOD(ROWS(D$1:D1)-1,3)+1
In this case we still divide by 3, but it's an argument to the MOD function. We still need to count the number of rows and subtract 1 before dividing. This returns 0, 1, or 2 for the column, but as above we are shifted back by 1 because column numbers begin at 1, so as before we must add 1.
Here is a version that alternates between columns A and D.
There are two different formulas, depending on whether you add the formula to an odd row or an even row.
https://1drv.ms/x/s!AncAhUkdErOkguUaToQkVkl5Qw-l_g?e=5d9gVM
Odd Start row
=INDEX($A$2:$D$9;ROUND(ROW(A1)/2;0);IF(MOD(ROW()-ROW($A$2);2)=1;4;1))
Even Start row
=INDEX($A$2:$D$9;ROUND(ROW(A1)/2;0);IF(MOD(ROW()-ROW($A$1);2)=1;4;1))
What is A1 in the picture is the cell directly above your first data cell.
If you want to place it on a different sheet you just add the sheet name:
=INDEX(MySheet!$A$2:$D$9;ROUND(ROW(MySheet!A1)/2;0);IF(MOD(ROW()-ROW(MySheet!$A$2);2)=1;4;1))
=INDEX(MySheet!$A$2:$D$9;ROUND(ROW(MySheet!A1)/2;0);IF(MOD(ROW()-ROW(MySheet!$A$1);2)=1;4;1))

Look up for highest value from another column if values are equal excel

1 2 3
a 2 3
b 3 4
c 4 3
d 5 2
So I know that to get the highest value I do
=INDEX(column1, MATCH(MAX(column3), column3, 0))
... which would give me 'b'.
Now I want to get the second highest value based on column 3, but because there are two cells with 3 (which is the second highest value), I want to use the one that has the lowest value in column 2 among those two rows. Is this possible?
Use a 'helper' column (column D in the formula below) that holds column C + (column B ÷ 10), and use a modification of your original formula on that column.
The standard formula in F5 is,
=INDEX(A$2:A$5, MATCH(AGGREGATE(14, 6, D$2:D$5, ROW(1:1)), D$2:D$5, 0))
Fill down as necessary.
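To check how the helper column orders tied rows, here is a small Python sketch of the idea. It is only a model of the approach described above: `rank_by_helper` and its `tie_sign` parameter are hypothetical names, and it assumes the column B values are small enough that dividing by 10 never outweighs a difference of 1 in column C.

```python
def rank_by_helper(names, col_b, col_c, tie_sign=1):
    """Order names by a helper value, largest first (like LARGE with k=1,2,...)."""
    # helper mirrors the helper column: column C plus (or minus) column B / 10
    helper = [c + tie_sign * b / 10 for b, c in zip(col_b, col_c)]
    order = sorted(range(len(names)), key=lambda i: helper[i], reverse=True)
    return [names[i] for i in order]


names = ["a", "b", "c", "d"]
col_b = [2, 3, 4, 5]
col_c = [3, 4, 3, 2]

added = rank_by_helper(names, col_b, col_c)           # helper = C + B/10
subtracted = rank_by_helper(names, col_b, col_c, -1)  # helper = C - B/10
print(added)       # ['b', 'c', 'a', 'd']
print(subtracted)  # ['b', 'a', 'c', 'd']
```

Note the direction of the tie-break: adding B/10 ranks the tied row with the larger column B value first, while subtracting it ranks the row with the smaller column B value first, which may be closer to what the question asks for.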

Keep on summing corresponding cell and compare with first column for summing up first column based on comparison in excel

Consider the following data setup:
_A_ _B_ _C_
 1   1
 2   1   1
 3   3
Such that a formula would return the following results for columns B and C respectively:
_A_ _B_ _C_
     4   2
Now I want to sum column A where A-(B+C) is equal to 0.
For the example above, the sum in column B would be 1+3 = 4, since rows 1 and 3 satisfy the condition (1-1=0 in the first row, 3-3=0 in the third row), so the A values of the 1st and 3rd rows give 1+3=4. The 2nd row doesn't satisfy it (2-1=1, not 0), so it is ignored.
In column C, the second row gives 2-(1+1) = 0, so the sum in column C would be 2, ignoring the first and third rows since they have already been counted in column B.
The columns continue like D, E, ...
So the sum runs from B up to the current column: in column B it sums up to B; in C, B+C; in D, B+C+D; and so on. The result is then compared with column A.
Insufficient rep to comment, this is at least a partial answer and perhaps full.
I think you're looking for this to happen in some lower row in B:
=SUMPRODUCT(--(A1:A3=B1:B3),A1:A3)
And this in C:
=SUMPRODUCT(--(A1:A3-(B1:B3+C1:C3)=0),A1:A3)
Although, as EEM points out, all the sample rows satisfy this condition, so you get "6" instead of "2".
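One way to think about the missing piece is to skip rows that an earlier column has already counted. The following Python sketch illustrates that reading of the question; it is an assumption about the intended rule, not a tested Excel formula, and `column_sums` is a made-up name. Blank cells are treated as 0.

```python
def column_sums(col_a, other_cols):
    """For each column B, C, ...: sum A where A equals the running total
    of B..current column, skipping rows counted in an earlier column."""
    counted = [False] * len(col_a)
    running = [0] * len(col_a)
    sums = []
    for col in other_cols:
        total = 0
        for i, a in enumerate(col_a):
            running[i] += col[i]  # B, then B+C, then B+C+D, ...
            if not counted[i] and a - running[i] == 0:
                total += a
                counted[i] = True
        sums.append(total)
    return sums


# Columns B and C from the example, with blanks written as 0.
sums = column_sums([1, 2, 3], [[1, 1, 3], [0, 1, 0]])
print(sums)  # [4, 2]
```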

Compare two Excel columns, output cells in A that do not appear in B

I am trying to compare two columns in Excel, A and B. Column A contains a complete list of customer numbers. Column B contains an incomplete list of the same customer numbers. So if a customer number is in A, but not in B, then output that number to column C.
I'd use the MATCH function in combination with ISNA.
If I have the following table
A  B  C
1  4
2  3
3  1
4  7
5  2  5
6     6
7
I put the full customer list in column A, and column B holds a randomly ordered partial list. I then put the function in C1 (and the rest of column C):
=IF(ISNA(MATCH(A1,B:B,0)),A1, "")
Now I see the values '5' and '6' only in column C, because those are the only two numbers that don't appear in column B.
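The logic of that formula can be sketched in Python as follows. This is only an illustration with an invented function name, `missing_from_b`. Each value in column A is kept when it has no exact match anywhere in column B, and replaced by an empty string otherwise.

```python
def missing_from_b(col_a, col_b):
    """Return column C: the A value if it is absent from B, else ''."""
    present = set(col_b)  # stands in for the MATCH(A1, B:B, 0) lookup
    return [a if a not in present else "" for a in col_a]


col_c = missing_from_b([1, 2, 3, 4, 5, 6, 7], [4, 3, 1, 7, 2])
print(col_c)  # ['', '', '', '', 5, 6, '']
```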
In cell C1: =IF(ISERROR(VLOOKUP(A1,$B$1:$B$10,1,FALSE)),A1,"")
Adjust for row counts and fill down against column A.
I think you're looking for something like this:
=IF(ISERROR(MATCH(A1,B1,0)),A1,"")
Propagate that formula down your new column and it will reprint the value from column A wherever column B is not a match.
Reference URL: http://support.microsoft.com/kb/213367
(I believe I read the original question wrong, and am going on the assumption that column A and B are already sorted where the values will line up.)
