How to recreate Excel's "Index(,,Match())" function in SPSS?

I am trying to recreate Excel's "Index(,,Match())" function in SPSS. My data is organized as follows:
The "Position" variables indicate which column (T:V) the value in the corresponding "Value" variable should go in.
In the first row, the positions are in order 1-3, so the values in columns T:V are in the same order as the "Value" variables.
In the second row, the positions are 2, 3, 1, so the value in "Value1" should go in column U (the second column in that last block of variables), the value in "Value2" should go in column V, and the value in "Value3" should go in column T. And so on.
After looking into doing this in SPSS, it seems SPSS's Index and Match functions will not help.
Do any Excel/SPSS users know how to accomplish this in SPSS with syntax?

There are probably several ways to approach the problem, depending on how many columns you're dealing with and whether they're all numeric or include strings (there's probably a matrix-algebra answer; I just can't think of it).
If you only have 3 sets of 3 columns, the simplest approach is to write 9 (3*3) IF statements. You don't have variable names for columns T/U/V, so I'm just referencing their Excel column letters. Each PositionK says which output column ValueK goes into:
if (Position1 = 1) T = Value1.
if (Position1 = 2) U = Value1.
if (Position1 = 3) V = Value1.
if (Position2 = 1) T = Value2.
if (Position2 = 2) U = Value2.
...
This should work. If you have many more columns, you can also use vector loops to define the sets of variables.
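The placement rule from the question (ValueK goes to the output column named by PositionK) can be checked with a short Python sketch. `place_by_position` is a hypothetical helper, and columns T, U, V are modeled as list slots 0, 1, 2.

```python
def place_by_position(values, positions):
    """Write values[k] into output slot positions[k] (positions are 1-based, like T=1, U=2, V=3)."""
    out = [None] * len(values)
    for value, pos in zip(values, positions):
        out[pos - 1] = value
    return out


# Second row of the question: Position1..3 = 2, 3, 1
print(place_by_position(["Value1", "Value2", "Value3"], [2, 3, 1]))
```

For that row the result is `['Value3', 'Value1', 'Value2']`, i.e. T gets Value3, U gets Value1 and V gets Value2, as described in the question.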

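Here's a scalable approach:
* ANY returns 1 or 0, so match1 to match3 reproduce the values only when Value1 to Value3 are 0/1 flags.
vector match(3).
* Multiply each value by its position: a 1 keeps its position, a 0 becomes 0.
do repeat p = position1 to position3 / v= value1 to value3 / y = #y1 to #y3.
compute y = v*p.
end repeat.
* match(i) is 1 if any value of 1 was assigned to position i.
loop #i = 1 to 3.
compute match(#i) = any(#i, #y1 to #y3).
end loop.
exe.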
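A Python mirror of the two steps in that syntax (the function names are hypothetical) makes it easy to compare the ANY-based version with writing each value directly into the slot named by its position, for one 0/1 row and one non-binary row:

```python
def any_trick(values, positions):
    # Mirrors "compute y = v*p": each value times its position.
    products = [v * p for v, p in zip(values, positions)]
    # Mirrors "compute match(#i) = any(#i, #y1 to #y3)": 1 if i is among the products, else 0.
    return [int(i in products) for i in range(1, len(values) + 1)]


def direct_placement(values, positions):
    # Writes each value into the output slot named by its position (1-based).
    out = [0] * len(values)
    for v, p in zip(values, positions):
        out[p - 1] = v
    return out


positions = [2, 3, 1]
print(any_trick([1, 0, 1], positions), direct_placement([1, 0, 1], positions))
print(any_trick([5, 7, 9], positions), direct_placement([5, 7, 9], positions))
```

For the 0/1 row both give `[1, 1, 0]`; for values 5, 7, 9 the ANY version gives `[0, 0, 0]` while direct placement gives `[9, 5, 7]`.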

Related

Search data in variable table Excel

In my Excel file, I have data split up over different tables for different values of parameter X.
I have tables for parameter X for values 0.1, 0.5, 1, 5 and 10. Each table has a parameter Y at the far left that I want to be able to search for, with a few data cells to the right of it, like so:
X = 0.1
Y   Data_0     Data_1     Data_2
1   0.071251   0.681281   0.238509
2   0.283393   0.509497   0.397196
3   0.678296   0.789879   0.439004
4   0.788525   0.363215   0.248953
etc.
Now I want to find Data_0, Data_1 and Data_2 for a given X and Y value (in two separate cells).
My thought was to name the tables X0.1, X0.5, etc., and, when defining the lookup range in the lookup function, use some syntax that changes which table it searches. With three of these functions in adjacent cells, I would obtain the three desired values.
Is that possible, or is there some other method that would give me the result I want?
Thanks in advance
To answer the question of what my desired result from this data would be:
A1 would hold the value of X I'm searching for (0.1 in this case)
A2 would hold the value of Y (let's pick 3)
then I want C1:E1 to give the values 0.678..., 0.789..., 0.439...
Based on usmanhaq's suggestion, I think it should be something like:
=vlookup(A2,concatenate("X",A1),2)
=vlookup(A2,concatenate("X",A1),3)
=vlookup(A2,concatenate("X",A1),4)
for the three cells.
This exact formulation doesn't work and I can't find the formulation that does work.
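The desired behaviour is a two-level lookup: the X value in A1 selects the table, then the Y value in A2 selects the row. A minimal Python model of that logic, filled with only the X = 0.1 rows shown above (the `tables` dict and `lookup` function are hypothetical, not Excel features), looks like this:

```python
# Hypothetical model: one table per X value, each mapping Y to (Data_0, Data_1, Data_2).
# Only the X = 0.1 rows from the question are filled in.
tables = {
    0.1: {
        1: (0.071251, 0.681281, 0.238509),
        2: (0.283393, 0.509497, 0.397196),
        3: (0.678296, 0.789879, 0.439004),
        4: (0.788525, 0.363215, 0.248953),
    },
}


def lookup(x, y):
    """Pick the table for X first, then the row for Y, and return its three data cells."""
    return tables[x][y]


print(lookup(0.1, 3))
```

This returns `(0.678296, 0.789879, 0.439004)`, the values wanted in C1:E1. In Excel terms, the `tables[x]` step is the part the CONCATENATE attempt is missing: the text "X0.1" has to be turned into a reference to the named range (INDIRECT does this for defined names), because VLOOKUP does not accept a plain text string as its table argument.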

2D vector lookup [x;y]

I've got a matrix with two coordinates [i;j].
I'm trying to automate a lookup:
As an example, this would have the coordinates of [1;2]
Here's a table of all the coordinates:
So here, obviously [1;2] would equate to 143,33
To simplify the issue:
I'll go step by step over what I'm trying to do, to make the question a bit less confusing.
Think of what I'm trying to do as a function, lookup(i, j) => value
Now, refer to the second picture (table)
I find all rows containing index [i] (in column C) and then,
only among those rows, find the row containing index [j] (in column D, restricted to the rows from the previous step)
Return the [i;j] value
So if you invoked lookup(2, 4):
Find all rows matching i = 2
Row 5: i = 2 ; j = 3
Row 6: i = 2 ; j = 4
Row 7: i = 2 ; j = 5
Lookup column j for j=4 from found rows
Found row 6: i = 2 ; j = 4.
Return value (offset for yij column = 143,33)
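The two-step lookup(i, j) described above (filter rows on i, then on j) can be sketched in Python; the `rows` data is hypothetical, with the first field standing in for column C (i), the second for column D (j) and the third for the value column.

```python
# Hypothetical (i, j, value) rows standing in for columns C, D and the value column.
rows = [
    (1, 2, 10.0),
    (2, 3, 20.0),
    (2, 4, 30.0),
    (2, 5, 40.0),
]


def lookup(i, j):
    # Step 1: keep only the rows whose i column matches.
    candidates = [r for r in rows if r[0] == i]
    # Step 2: among those, find the row whose j column matches and return its value.
    for _, rj, value in candidates:
        if rj == j:
            return value
    return None  # no [i;j] pair in the table


print(lookup(2, 4))
```

With this data, `lookup(2, 4)` returns `30.0`, and a pair that is not in the table returns `None`.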
Now, this isn't a problem algorithmically speaking, but I have no idea how to go about doing this with Excel formulas.
PS: I know this is a relatively simple VBA problem, but I would prefer formulas.
PPS: I removed my attempts to make the question more readable.
You can use SUMPRODUCT, which returns 0 when no match is found:
=SUMPRODUCT(($C$4:$C$18=$I4)*($D$4:$D$18=J$3)*$E$4:$E$18)
or AGGREGATE, which returns an error that can be hidden by the IFERROR function:
=IFERROR(AGGREGATE(15,6,(1/(($C$4:$C$18=$I12)*($D$4:$D$18=J$3)))*$E$4:$E$18,1),"")
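The two formulas differ in what they return for missing and duplicate pairs. The Python sketch below mirrors their logic on hypothetical rows: SUMPRODUCT sums the value column over rows where both conditions are 1, while AGGREGATE(15,6,...,1) is SMALL with k=1 that ignores the #DIV/0! errors produced by 1/0 on non-matching rows.

```python
# Hypothetical (i, j, value) rows standing in for C4:C18, D4:D18 and E4:E18.
rows = [(1, 2, 10.0), (2, 3, 20.0), (2, 4, 30.0), (2, 5, 40.0)]


def sumproduct_lookup(i, j):
    # (C=i)*(D=j)*E summed: non-matching rows contribute 0, so "not found" gives 0.
    return sum((ri == i) * (rj == j) * v for ri, rj, v in rows)


def aggregate_lookup(i, j):
    # 1/((C=i)*(D=j)) is an error for non-matching rows; option 6 skips errors and
    # function 15 (SMALL) with k=1 takes the smallest remaining value.
    matches = [v for ri, rj, v in rows if ri == i and rj == j]
    # With no matches AGGREGATE returns an error, which IFERROR turns into "".
    return min(matches) if matches else ""


print(sumproduct_lookup(2, 4), aggregate_lookup(2, 4))
print(sumproduct_lookup(9, 9), aggregate_lookup(9, 9))
```

Both give 30.0 for a unique pair. If a pair appeared twice, the SUMPRODUCT version would add the values, while the AGGREGATE version would return the smaller one.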
You can use SUMIFS here, assuming you will not have exact duplicate combinations of [i, j]. If you did have a duplicate combination, the amounts would be summed and placed in the corresponding cell.
In cell B2, place this formula: =SUMIFS($Q$2:$Q$16,$P$2:$P$16,B$1,$O$2:$O$16,$A2) and drag it across and down as needed.
If you want to show the 0s as blanks, you can wrap the formula in TEXT like so (note that TEXT returns text, and the 0 placeholder rounds to whole numbers, so add decimal places to the format if you need them):
=TEXT([formula], "0;-0;;#")
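Dragging that SUMIFS formula over a grid amounts to pivoting the (i, j, value) list into a table, summing any duplicate pairs and showing empty cells for zero totals. A Python sketch of that behaviour, with a hypothetical `sumifs_grid` helper and made-up rows including one duplicate pair:

```python
def sumifs_grid(rows, i_values, j_values):
    """Build an i-by-j grid from (i, j, value) rows, summing duplicates like SUMIFS."""
    grid = {}
    for i in i_values:          # row labels, like $A2 down column A
        for j in j_values:      # column labels, like B$1 across row 1
            total = sum(v for ri, rj, v in rows if ri == i and rj == j)
            # Show an empty cell instead of 0, like the TEXT(..., "0;-0;;#") wrapper.
            grid[(i, j)] = total if total != 0 else ""
    return grid


rows = [(1, 2, 10.0), (2, 4, 30.0), (2, 4, 5.0)]  # hypothetical; (2, 4) appears twice
print(sumifs_grid(rows, [1, 2], [2, 4]))
```

The duplicate (2, 4) rows end up as a single cell holding 35.0, which is exactly the caveat about duplicate combinations.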

How to find common elements in string cells?

I want to find the common elements in multiple (>=2) cell arrays of strings.
A related question is here; its answer proposes the function intersect(), but that only works for two inputs.
In my case, I have more than two cells, and I want to obtain a single common subset. Here is an example of what I want to achieve:
c1 = {'a','b','c','d'}
c2 = {'b','c','d'}
c3 = {'c','d'}
c_common = my_fun({c1,c2,c3});
In the end, I want c_common={'c','d'}, since only these two strings occur in all the inputs.
How can I do this with MATLAB?
Thanks in advance,
P.S. I also need the indices from each input, but I can probably do that myself using the output c_common, so it's not necessary in the answer. But if anyone wants to tackle that too, my actual output will be like this:
[c_common, indices] = my_fun({c1,c2,c3});
where indices = {[3,4], [2,3], [1,2]} for this case.
Thanks,
Listed in this post is a vectorized approach that gives the common strings and their indices using unique and accumarray. It works even when the strings are not sorted within each cell array, returning indices that correspond to their positions within it, but the strings must be unique within each cell array. Please have a look at the sample input/output section* to see such a case run. Here's the implementation:
C = {c1,c2,c3}; % Add more cell arrays here
% Get unique strings and ID each of the strings based on their uniqueness
[unqC,~,unqID] = unique([C{:}]);
% Get count of each ID and the IDs that have counts equal to the number of
% cells arrays in C indicate that they are present in all cell arrays and
% thus are the ones to be finally selected
match_ID = find(accumarray(unqID(:),1)==numel(C));
common_str = unqC(match_ID)
% ------------ Additional work to get indices ----------------
N_str = numel(common_str);
% Store matches as a logical array to be used at later stages
matches = ismember(unqID,match_ID);
% Use ismember to find all those indices in unqID and subtract group
% lengths from them to give us the indices within each cell array
clens = [0 cumsum(cellfun('length',C(1:end-1)))];
match_index = reshape(find(matches),N_str,[]);
% Sort match_index along each column based on the respective unqID elements
[m,n] = size(match_index);
[~,sidx] = sort(reshape(unqID(matches),N_str,[]),1);
sorted_match_index = match_index(bsxfun(@plus,sidx,(0:n-1)*m));
% Subtract cumulative group lengths to give us indices corresponding to each cell array
common_idx = bsxfun(@minus,sorted_match_index,clens).'
Please note that at the step that calculates match_ID, accumarray(unqID(:),1) could be replaced by histc(unqID,1:max(unqID)). histcounts could be another alternative there.
*Sample input, output -
c1 =
'a' 'b' 'c' 'd'
c2 =
'b' 'c' 'a' 'd'
c3 =
'c' 'd' 'a'
common_str =
'a' 'c' 'd'
common_idx =
1 3 4
3 2 4
3 1 2
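The same counting idea (a string is common when its total count equals the number of lists, which requires strings to be unique within each list) can be sketched in Python. This swaps `collections.Counter` in for unique/accumarray and uses a hypothetical `common_with_indices` helper that returns 1-based positions to match the MATLAB output.

```python
from collections import Counter


def common_with_indices(cells):
    """Strings present in every list, plus their 1-based positions in each list.

    Counts occurrences across all lists and keeps strings whose count equals
    the number of lists, so strings must be unique within each list.
    """
    counts = Counter(s for cell in cells for s in cell)
    common = sorted(s for s, n in counts.items() if n == len(cells))
    indices = [[cell.index(s) + 1 for s in common] for cell in cells]
    return common, indices


c1 = ['a', 'b', 'c', 'd']
c2 = ['b', 'c', 'a', 'd']
c3 = ['c', 'd', 'a']
print(common_with_indices([c1, c2, c3]))
```

On the sample above this gives `['a', 'c', 'd']` with indices `[[1, 3, 4], [3, 2, 4], [3, 1, 2]]`, the same as common_str and common_idx.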
As noted in the comments to this question, there is a file on File Exchange called "MINTERSECT -- Multiple set intersection." at http://www.mathworks.com/matlabcentral/fileexchange/6144-mintersect-multiple-set-intersection that contains simple code to generalize intersect to multiple sets. In a nutshell, the code performs intersect on the first pair of cells and then performs intersect on that output with the next cell, continuing until all cells have been compared. Note that the author points out that the code is not particularly efficient, but it may be sufficient for your use case.
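The pairwise folding described for MINTERSECT can be sketched in Python with `functools.reduce` over sets. This is a hypothetical `multi_intersect` helper illustrating the idea, not the File Exchange code itself; it returns the result sorted, as MATLAB's intersect does.

```python
from functools import reduce


def multi_intersect(*cells):
    """Intersect the first list with the next one, then that result with the next, and so on."""
    return sorted(reduce(lambda acc, cell: acc & set(cell), cells[1:], set(cells[0])))


print(multi_intersect(['a', 'b', 'c', 'd'], ['b', 'c', 'd'], ['c', 'd']))
```

For the question's example this prints `['c', 'd']`.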

excel macro, need an algorithm to compare two lists

I need to make a macro to compare two columns looking for duplicate cells.
I'm currently using this simple double for loop algorithm
for i = 0 To ColumnASize
    Cell1 = Sheet.getCellByPosition(0,i)
    for j = 0 to ColumnBSi
        Cell2 = Sheet.getCellByPosition(1,j)
        ' Comparison happens here
    Next j
Next i
However, as I have 1000+ items in each column this algorithm is quite slow and inefficient. Does anyone here know/have an idea for a more efficient way to do this?
If you want to ensure that no string in col A is equal to any string in col B, then your existing algorithm is O(n^2). You may be able to improve on that as follows:
1) Sort col A or a copy of it (O(n log n))
2) Sort col B or a copy of it (O(n log n))
3) Look for duplicates by list traversal; see this previous answer (O(n)).
That should give you an O(n log n) solution, and I don't think you can do much better than that.
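Steps 1 to 3 can be sketched in Python: sort both columns, then walk them together once, advancing whichever side holds the smaller value. `find_duplicates` is a hypothetical helper and the column contents are made up.

```python
def find_duplicates(col_a, col_b):
    """Return values present in both columns, using sort + one merge-style pass."""
    a = sorted(col_a)  # step 1: O(n log n)
    b = sorted(col_b)  # step 2: O(n log n)
    i = j = 0
    common = []
    while i < len(a) and j < len(b):  # step 3: single linear traversal
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            if not common or common[-1] != a[i]:  # report each duplicate value once
                common.append(a[i])
            i += 1
            j += 1
    return common


print(find_duplicates(["pear", "apple", "fig"], ["fig", "kiwi", "apple", "apple"]))
```

This prints `['apple', 'fig']`; the sorts dominate, so the whole check is O(n log n) instead of the O(n^2) nested loop.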

Union of cell array of cells

I'm looking for a way to do the union of two cell arrays of cell arrays of strings. For example:
A = {{'one' 'two'};{'three' 'four'};{'five' 'six'}};
B = {{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
And I'd like to get something like:
C = {{'one' 'two'};{'three' 'four'};{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
But when I use C = union(A, B) MATLAB returns an error saying:
Input A of class cell and input B of class cell must be cell arrays of strings, unless one is a string.
Does anyone know how to do something like this in a hopefully simple way? I'd greatly appreciate it.
ALTERNATIVE: Any way to hold a cell array of separated strings other than a cell array of cell arrays of strings would also be useful, but as far as I know, it's not possible.
Thank you!
C=[A;B]
allWords=unique([A{:};B{:}])
F=cell2mat(cellfun(@(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))',C,'uni',false))
[~,uniqueindices,~]=unique(F,'rows')
C(sort(uniqueindices))
What my code does: it builds a list of all words, allWords, which is then used to build a matrix recording which words each row contains and in which position (1 = match for the first word, 2 = match for the second word). Finally, unique is applied to this numeric matrix to get the indices.
With my update, the two words per cell are now hardcoded. To remove this limitation, it would be necessary to replace the anonymous function (@(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))) with a more generic implementation, probably using cellfun again.
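One way to make the word-membership code generic is to give the word at position k the weight 2**k, which reproduces the 1 and 2 weights for two-word rows. The Python sketch below (`union_rows_by_codes` is a hypothetical helper, not the answer's code) builds that code vector per row and keeps the first row for each distinct vector, which plays the role of unique(F,'rows') plus sort(uniqueindices).

```python
def union_rows_by_codes(a, b):
    """Union of rows (lists of words), keeping first occurrences in their original order.

    Each row gets one code per vocabulary word: the sum of 2**k over the positions k
    where that word appears. Equal code vectors therefore mean equal word sequences.
    """
    rows = a + b
    all_words = sorted({w for row in rows for w in row})
    seen = set()
    out = []
    for row in rows:
        code = tuple(sum(2 ** k for k, w in enumerate(row) if w == word) for word in all_words)
        if code not in seen:
            seen.add(code)
            out.append(row)
    return out


A = [['one', 'two'], ['three', 'four'], ['five', 'six']]
B = [['five', 'six'], ['seven', 'eight'], ['nine', 'ten']]
print(union_rows_by_codes(A, B))
```

Because the code depends on positions, ['one', 'two'] and ['two', 'one'] stay distinct, and rows of any length are handled.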
union doesn't seem to be compatible with cell arrays of cells, so we need to look for a workaround.
One approach is to concatenate the data from A and B vertically. Then, within each column, assign each string a unique ID. Those IDs can be combined into a double array, which opens up the possibility of using unique with the 'rows' option to get the desired output. This is precisely what is done here.
%// Slightly complicated input for safest verification of results
A = {{'three' 'four'};
{'five' 'six'};
{'five' 'seven'};
{'one' 'two'}};
B = {{'seven' 'eight'};
{'five' 'six'};
{'nine' 'ten'};
{'three' 'six'};};
t1 = [A ; B] %// concatenate all cells from A and B vertically
t2 = vertcat(t1{:}) %// Get all the cells of strings from A and B
t22 = mat2cell(t2,size(t2,1),ones(1,size(t2,2)));
[~,~,row_ind] = cellfun(@(x) unique(x,'stable'),t22,'uni',0)
mat1 = horzcat(row_ind{:})
[~,ind] = unique(mat1,'rows','stable')
out1 = t2(ind,:) %// output as a cell array of strings, used for verification too
out = mat2cell(out1, ones(1,size(out1,1)),size(out1,2)) %//desired output
Output -
out1 =
'three' 'four'
'five' 'six'
'five' 'seven'
'one' 'two'
'seven' 'eight'
'nine' 'ten'
'three' 'six'
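The same steps (concatenate, give every string a per-column ID in order of first appearance, then keep the first occurrence of each ID row) can be sketched in Python. `union_rows_stable` is a hypothetical helper that assumes all rows have the same number of columns, as in the example.

```python
def union_rows_stable(a, b):
    """Union of equal-length rows from a and b, keeping first occurrences in order."""
    rows = a + b  # like t1 = [A ; B]
    ncols = len(rows[0])
    # Per column, ID each string by order of first appearance (like unique(x,'stable')).
    ids = []
    for c in range(ncols):
        mapping = {}
        ids.append([mapping.setdefault(row[c], len(mapping)) for row in rows])
    # Combine the column IDs into one key per row and keep the first occurrence
    # of each key (like unique(mat1,'rows','stable')).
    seen = set()
    out = []
    for r, row in enumerate(rows):
        key = tuple(ids[c][r] for c in range(ncols))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


A = [['three', 'four'], ['five', 'six'], ['five', 'seven'], ['one', 'two']]
B = [['seven', 'eight'], ['five', 'six'], ['nine', 'ten'], ['three', 'six']]
for row in union_rows_stable(A, B):
    print(row)
```

On the answer's input this keeps the same seven rows, in the same order, as out1 above.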
