I have the following table in Excel.
A B C D E
1 i x y mu sigma
2 0 1 2 =average(b$1:b1,b3:b$12) =stdev.s(b$1:b1,b3:b$12)
3 1 3 4 =average(b$1:b2,b4:b$12) =stdev.s(b$1:b2,b4:b$12)
4 2 2 1 =average(b$1:b3,b5:b$12) =stdev.s(b$1:b3,b5:b$12)
5 3 1 2 ... ...
6 4 2 5
7 5 4 7
8 6 8 1
9 7 2 3
10 8 5 9
11 9 1 3
The ith mu calculates the average without the ith observation—the leave-one-out average. I can also calculate the leave-one-out standard deviations, but how can I do this for the leave-one-out correlations then? correl requires two arrays and I cannot feed two or more distant cells as one array using commas. Can I input non-consecutive cells as one array? For example, I tried =correl((b$1:b1,b3:b$12),(c$1:c1,c3:c$12)), but failed. Thanks for your reading.
Because CORREL takes arrays we can use an array formula:
=CORREL(INDEX(B:B,N(IF({1},MODE.MULT(IF(ROW($B$2:$B$11)<>ROW(),ROW($B$2:$B$11)*{1,1}))))),INDEX(C:C,N(IF({1},MODE.MULT(IF(ROW($C$2:$C$11)<>ROW(),ROW($C$2:$C$11)*{1,1}))))))
Use this easier array formula as per the OP(I overthought this):
=CORREL(IF(ROW($B$2:$B$11)=ROW(),"",$B$2:$B$11),IF(ROW($C$2:$C$11)=ROW(),"",$C$2:$C$11))
Being an array formula one needs to use Ctrl-shift-Enter instead of Enter when exiting edit mode.
This will pass two arrays of only the numbers not on the row where the formula is placed.
Now, eventually when microsoft releases the dynamic array formula into Office 365 this can be simplified using FILTER()
=CORREL(FILTER($B$2:$B$11,ROW($B$2:$B$11)<>ROW()),FILTER($C$2:$C$11,ROW($C$2:$C$11)<>ROW()))
Related
I have a list of part numbers that are used in 4 different top level assemblies. The parts can be used in 1 to 4 of the top level assemblies. I'm trying to write a formula that will count how many unique top level assemblies a part number occurs in. I had previously written a formula that worked, but it uses UNIQUE and FILTER, and my coworkers don't have Excel 365, so those formulas aren't supported for them. I've been trying to come up with a workaround and would really appreciate any help :)
I have an example (I can't provide any real data) section of our spreadsheet and an image of the formula I had that was working
Top Level Assy
Part Number
Qty
Number of times used
02554
01622
4
3
89975
01622
4
3
95665
01622
4
3
98886
01723
4
1
98886
01723
10
1
98886
01723
4
1
02554
01734
4
3
89975
01734
4
3
95665
01734
4
3
02554
01740
6
3
89975
01740
6
3
95665
01740
6
3
02554
01746
5
3
89975
01746
5
3
95665
01746
5
3
02554
01835
2
3
89975
01835
2
3
95665
01835
2
3
02554
51205
4
3
=SUM(--(LEN(UNIQUE(FILTER(A:A, C:C=C2, "")))>0))
Picture of the excel sheet
Picture of working formula
Use the following formula in row 2: =SUMPRODUCT(--(FREQUENCY(IF($B$2:$B$20=$B2,$A$2:$A$20),$A$2:$A$20)>0))
*I think it doesn't require ctrl+shift-enter in older Excel versions, since SUMPRODUCT is an array formula by default.
The formula checks the frequency of values in column A where column B matches the value in the current row. It returns the count per unique value meeting the condition. Wrapping it in -- & >0 returns 1 for each unique value. SUMPRODUCT sums them.
Edit:
I realized that the top level assembly values are actual text, not numeric values. In that case (since it's all numeric values stored as text) you can use this workaround:
=SUMPRODUCT(--(FREQUENCY(IF($B$2:$B$20=$B2,--($A$2:$A$20)),--($A$2:$A$20))>0))
It converts the text to numbers.
Sidenotes to this workaround:
If any value would contain a character other than numeric it will not get counted.
If you have both values like 02554 and 2554 they'll both get converted to 2554 and counted likewise.
Edit 2:
For text use the following:
=SUMPRODUCT(IF($B$2:$B$20=$B2, 1/(COUNTIFS($B$2:$B$20, $B2, $A$2:$A$20, $A$2:$A$20)), 0))
I am working at moving my data over from Excel to Matlab. I have some data that I want to average based on multiple criteria. I can accomplish this by looping, but want to do it with matrix operations only, if possible.
Thus far, I have managed to do so with a single criterion, using accumarray as follows:
data=[
1 3
1 3
1 3
2 3
2 6
2 9];
accumarray(data(:,1),data(:,2))./accumarray(data(:,1),1);
Which returns:
3
6
Corresponding to the averages of items 1 and 2, respectively. I have at least three other columns that I need to include in this averaging but don't know how I can add that in. Any help is much appreciated.
For your single column, you don't need to call accumarray twice, you can provide a function handle to mean as the fourth input
mu = accumarray(data(:,1), data(:,2), [], #mean);
For multiple columns, you can use the row indices as the second input to accumarray and then use those from within the anonymous function to access the rows of your data to operate on.
data = [1 3 5
1 3 10
1 3 8
2 3 7
2 6 9
2 9 12];
tmp = accumarray(data(:,1), 1:size(data, 1), [], #(rows){mean(data(rows,2:end), 1)});
means = cat(1, tmp{:});
% 3.0000 7.6667
% 6.0000 9.3333
I'm looking for a way to compare multiple rows with data to each other, trying to find the best possible match. Each number in every column must be an approximately match the other numbers in the same column.
Example:
Customer #1: 1 5 10 9 7 7 8 2 3
Customer #2: 10 5 9 3 5 7 4 3 2
Customer #3: 1 4 10 9 8 7 6 2 2
Customer #4: 9 5 6 7 2 1 10 5 6
In this example customer #1 and #3 is quite similar, and I need to find a way to highlight or sort the rows so I can easily find the best match.
I've tried using conditional formatting to highlight the numbers that are the similar, but that is quite confusing, because the amount of data is quite big.
Any ideas of how I could solve this?
Thanks!
The following formula entered in (say) L1 and pulled down gives the best match with the current row based on the sum of the absolute differences between corresponding cells:-
=MIN(IF(ROW($C$1:$K$4)<>ROW(),(MMULT(ABS($C1:$K1-$C$1:$K$4),TRANSPOSE(COLUMN($C$1:$K$4))^0))))
It is an array formula and must be entered with CtrlShiftEnter.
You can then sort on column L to bring the customers with lowest similarity scores to the top or use conditional formatting to highlight rows with a certain similarity value.
EDIT
If you wanted to penalise large differences in individual columns more heavily than small differences to try and avoid pairs of customers which are fairly similar except for having some columns very different, you could try something like the square of the differences:-
=MIN(IF(ROW($C$1:$K$4)<>ROW(),(MMULT(($C1:$K1-$C$1:$K$4)^2,TRANSPOSE(COLUMN($C$1:$K$4))^0))))
then the scores for your test data would come out as 7,127,7,127.
I'm assuming you want to compare customers 2-4 with customer 1 and that you are comparing only within each column. In this case, you could implement a 'scoring system' using multiple IFs. For example,:
A B C D E
1 Customer 1 1 1 2
2 Customer 2 1 2 2
3 Customer 3 0 1 0
you could use in E2
=if(B2=$B$1,1,0)+if(C2=$C$1,1,0)+if(D2=$D$1,1,0)
This will return a 'score' of 1 when you have a match and a 'score' of 0 when you don't. It then adds up the scores and your highest value will be your best match. Copying down would then give
A B C D E
1 Customer 1 1 1 2
2 Customer 2 1 2 2 2
3 Customer 3 0 1 0 1
so customer 2 is the best match.
I am trying to find the equation of a plane of best fit to a set of x,y,z data using the LINEST function. Some of the z data is missing, meaning that there are #N/As in the z column. For example:
A B C
(x) (y) (z)
1 1 1 5.1
2 2 1 5.4
3 3 1 5.7
4 1 2 #N/A
5 2 2 5.2
6 3 2 5.5
7 1 3 4.7
8 2 3 5
9 3 3 5.3
I would like to do =LINEST(C1:C9,A1:B9), but the #N/A causes this to return a value error.
I found a solution for a single independent variable (one column of known_x's, i.e. fitting a line to x,y data), but I have not been able to extend it for two independent variables (two known_x's columns, i.e. fitting a plane to x,y,z data). The solution I found is here: http://www.excelforum.com/excel-general/647448-linest-question.html, and the formula (slightly modified for my application) is:
=LINEST(
N(OFFSET(C1:C9,SMALL(IF(ISNUMBER(C1:C9),ROW(C1:C9)-ROW(C1)),
ROW(INDIRECT("1:"&COUNT(C1:C9)))),0,1)),
N(OFFSET(A1:A9,SMALL(IF(ISNUMBER(C1:C9),ROW(C1:C9)-ROW(C1)),
ROW(INDIRECT("1:"&COUNT(C1:C9)))),0,1)),
)
which is equivalent to =LINEST(C1:C9,A1:A9), ignoring the row containing the #N/A.
The formula from the posted link could probably be adapted but it is unwieldy. Least squares with missing data can be viewed as a regression with weight 1 for numeric values and weight 0 for non-numeric values. Based on this observation you could try this (with Ctrl+Shift+Enter in a 1x3 range):
=LINEST(IF(ISNUMBER(C1:C9),C1:C9,),IF(ISNUMBER(C1:C9),CHOOSE({1,2,3},1,A1:A9,B1:B9),),)
This gives the equation of the plane as z=-0.2x+0.3y+5 which can be checked against the results of using LINEST(C1:C8,A1:B8) with the error row removed.
I'm not sure how to ask this question without illustrating it, so here goes:
I have results from a test which has tested peoples attitudes (scores 1-5) to different voices on a 16 different scales. The data set looks like this (where P1,P2,P3 are participants, A, B, C are voices)
Aformal Apleasant Acool Bformal etc
P1 2 3 1 4
P2 5 4 2 4
P3 1 2 4 3
However, I want to rearrange my data to look like this:
formal pleasant cool
P1A 3 3 5
P1B 2 1 6
P1C etc
P1D
This would mean a lot more rows (multiple rows per participant), and a lot fewer columns. Is it doable without having to manually reenter all the scores in a new excel file?
Sure, no problem. I just hacked this solution:
L M N O P Q
person# voice# formal pleasant cool
1 1 P1A 2 3 1
1 2 P1B 4 5 2
1 3 P1C 9 9 9
2 1 P2A 5 4 2
2 2 P2B 4 4 1
2 3 P2C 9 9 9
3 1 P3A 1 2 4
3 2 P3B 3 3 2
3 3 P3C 9 9 9
Basically, in columns L and M, I made two columns with index numbers. Voice numbers go from 1 to 3 and repeat every 3 rows because there are nv=3 voices (increase this if you have voices F, G, H...). Person numbers are also repeated for 3 rows each, because there are nv=3 voices.
To make the row headers (P1A, etc.), I used this formula: ="P" & L2 & CHAR(64+M2) at P1A and copied down.
To make the new table of values, I used this formula: =OFFSET(B$2,$L2-1,($M2-1)*3) at P1A-formal, and copied down and across. Note that B$2 corresponds to the cell address for P1-Aformal in the original table of values (your example).
I've used this indexing trick I don't know how many times to quickly rearrange tables of data inherited from other people.
EDIT: Note that the index columns are also made (semi)automatically using a formula. The first nv=3 rows are made manually, and then subsequent rows refer to them. For example, the formula in L5 is =L2+1 and the formula in M5 is =M2. Both are copied down to the end of the table.