How to copy a smaller INDArray to a bigger one - nd4j

I have an INDArray with shape {7,2,3}. I would like to increase one or more of the dimensions {8,3,4} or {7,3,3}, etc and insert the values into the resized array. I understand that the same array cannot be resized to increase length so I intend to create a bigger array with the same rank and inserting the values into it, but even different Nd4j.put methods expect a scalar only for insertion into the new array and for Nd4j.copy to work the shape of the two arrays need to be the same. How can I go about inserting a smaller array into a bigger array, where the indices for any given value would be the same for both and the newer one only allows me to bring in new indices for the array?

Easiest way is probably getting a subarray out of your big array, and calling subarray_of_big_array.assign(small_array)

Related

How to reshape column to matrix using dynamic array function in excel

I have an excel sheet with some column data that I would like to use for some matrix multiplications using MMULT-function. For that purpose I need to reshape the column data first. I would like to do the reshaping using a dynamic array function since that could then feed directly into the MMULT function without having to actually display the reshaped matrix in the sheet (i.e. keeping only the column with the input data visible for the user). I am aware of ideas such as the one outlined here http://www.cpearson.com/excel/VectorToMatrix.aspx however however as far as I can see that requires having the reshaped data displayed in the sheet which I do not want. An alternative could be to enter the arrays directly in the formula using curly brackets, however as far as I can see this notation does not allow cell-references, i.e. something like MMULT({A1,A2,A3;A4,A5,A6},{A7,A8;A9,A10;A11,A12}) is not allowed. Any ideas for solving this issue?
An example is shown below, basically I have the column-data in my sheet, but do not want to repeat the data (as reshaped data), however, I would still like to be able to do display the square of the reshaped matrix.
Reshaped data and matrix multiplication:
For reshaping a 9x1 array into a 3x3 array:
INDEX(B3:B11,SEQUENCE(ROWS(B3:B11)/3,3))

python - changing existing elements in matrix (without array)

I am given a matrix of ints and I need to create a function that does the following -
if a number in a cell = 0, the whole column and row needs to change to 0.
but I need to change the original matrix, not allowed to make a copy.
I can use only very basic stuff like list of lists, for, if, split I don't know array or complicated numbers.
I hope you can help me please :)

Comparing two arrays - Horizontal vs Vertical

What is the case
I'm trying to compare two arrays. For simplicity sake let's assume we want to know how often the values of one array exist in the other array.
My referenced/lookup array data sits in A1:A3
Apple
Lemon
Pear
My search array is NOT in the worksheet, but written {"Apple","Pear"}
Problem
So to know how often our search values exists in the lookuparray we can apply a formula like:
{=SUMPRODUCT(--(range1=range2))}
However, {=SUMPRODUCT(--({"Apple","Pear"}=A1:A3))} produces an error. In other words the lookup array wasn't working as expected.
What did work was using TRANSPOSE() function to create a horizontal array from my data first using {=SUMPRODUCT(--({"Apple","Pear"}=TRANSPOSE(A1:A3)))} resulting in the correct answer of 2!
It seems as though my typed array is automatically handled as an horizontal array, and my data obviously was originally vertical.
To test my hypotheses I tried another formula:
{=SUMPRODUCT(--({"Apple","Pear"}={"Apple","Lemon","Pear"}))}
Both are typed arrays, so with above logic it would both be horizontal arrays, perfectly able to work without using TRANSPOSE(), however this returns an error! #N/A
Again {=SUMPRODUCT(--({"Apple","Pear"}=TRANSPOSE({"Apple","Lemon","Pear"})))} gave a correct answer of 2.
Question
Can someone please explain to me:
The reasoning why horizontal can't be compared to vertical arrays.
Why a typed array would automatically be handled as horizontal
Why in my test of the hypotheses the second typed array was handled as vertical.
I'm really curious, and would also be happy to be linked to appropriate documentation as so far I have not been able to find any.
This might be an easy one to answer, though I can't seem to get my head around the logic.
Can someone please explain to me:
The reasoning why horizontal can't be compared to vertical arrays.
This is actually possible, and you can also compare horizontal arrays with other horizontal arrays.
The reason you have been getting the error is because of the mismatch in the length of the array. Consider the following arrays:
Doing =SUMPRODUCT(--(B3:D3=F3:G3)) is the same (on excel's english version, I'm not 100% sure on the delimiters on other versions) as =SUMPRODUCT(--({"Apple","Lemon","Pear"}={"Apple","Pear"})) and results in =SUMPRODUCT(--(Apple=Apple, Lemon=Pear, Pear=???)), that is the nth element of the first array is compared to the nth element of the second array, and if there is nothing to match --the 3rd element in the 1st array is Pear but there is no 3rd element for the 2nd array-- then you get N/A.
When you compare two arrays, one vertical and one horizontal, excel actually 'expands' the final array. Consider the following (1row x 3col and 2row x 1col):
Doing =SUMPRODUCT(--(B3:D3=F3:F4)) is the same as =SUMPRODUCT(--({"Apple","Lemon","Pear"}={"Apple";"Pear"})) and results in =SUMPRODUCT(--(Apple=Apple, Lemon=Apple, Pear=Apple; Apple=Pear, Lemon=Pear, Pear=Pear)). Basically it feels like Excel expanded the two arrays like this (3col x 2row):
This 'expansion' only happens when one array is 1 row high and the other is 1 column wide I believe, so if you take arrays that have something different, then excel will go back to trying to compare an element with 'nothing' to give N/A (you can use the Evaluate Formula feature under Formula tab to help):
So essentially excel is getting something a bit similar to this, where the first array is multiplied to the second array, giving the result array:
But since the last row and last column involve blanks, you get N/A there.
Why a typed array would automatically be handled as horizontal
In your question, it would seem that , delimit rows, so with =SUMPRODUCT(--({"Apple","Pear"}=A1:A3)) you are observing similar to the comparison of two rows in my first example, while with =SUMPRODUCT(--({"Apple","Pear"}=TRANSPOSE(A1:A3))), you are getting the 'expansion' occurring.
As stated in the comments, on the English version of excel, , delimits columns and ; delimits rows, as can be observed in this simple example where I supply an array with 2 rows and 3 columns, excel shows {0,0,0;0,0,0}:
Why in my test of the hypotheses the second typed array was handled as vertical.
TRANSPOSE simply switches an array from vertical to horizontal (and vice versa), but depending on what you are trying to do, you'll get different results as per the first part of my answer, so you'll either have N/A when excel cannot match an item of an array with another item of the other array, or 'expansion' of the two arrays that results in a bigger array.

Frequency() with arrays: adds an element to return arrays

I'm using the following formula as named formula (via name manager). It is then used in a larger sumproduct(). The goal is to ensure that with an array calculation, the calculation is only made once for certain groups of rows (e.g. you have the same data repeated accross many rows for category A. I only need to know how many people are in category A once).
=IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],
tdata[reportUUID],0),0),IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],
tdata[reportUUID],0),0))>0,TRUE)
Let's step through the results one by one with the evaluate formula in Excel. Sorry for the screenshot, but Excel doesn't allow to copy actual steps with real data....
In order of steps:
In the last image, there's now a 7th item in my array. I only have 6 row of data, hence why for the previous steps I only had 6 items in the array, as was expect.
This is messing up my calculations, because the return array from this function gets multiplied by others arrays which all have 6 items (or whatever is the number of data rows I have).
What is this 7th item, and how can I either get ride of it or prevent it from return errors?
I did try to wrap some formula into iferror() or ifna(), however it doesn't feel clean. I feel this might backfire and isn't a strong way to handle this. I rather take it at the source....
EDIT: For example of use with other arrays:
{=SUMPRODUCT(--IFERROR(((tdata[_isVisible]=1)*(f_uniqueUUIDfactor),0))}
Where f_uniqueUUIDfactor is the formula from the initial post. tdata[_isVisible]=1 is used as a way to filter data on the dashboard (e.g. through dropdown, the users can set ranges for dates, and with VBA I hide the rows in the raw data NOT within the range).
The point is that sumproduct() ends up multipliying each raw data row thogheter as 0 & 1 s, so that only those meeting all the criterias get returned. The IFERROR() above is the workaround for the extra array element introduced by frequency(). It works as is, but if a cleaner way exists I'd prefer that. I would also be keen on understanding why that elements get added.
This is a good example of why it is preferable to use multiple, recursive IF statements when evaluating arrays over multiple criteria, rather than form the product of those arrays.
Firstly, though, before coming to the reason for that statement, I should point out a few minor technical inaccuracies/flaws with your construction also.
1) By including a value_if_false clause in your constructions being passed as FREQUENCY's data_array and bins_array parameters, you are risking incorrect results, since zero is a valid numerical to be considered by FREQUENCY, whereas a Boolean FALSE (which would be the equivalent entry in the resulting array had you omitted the value_if_false clause altogether) is disregarded by this function.
2) MATCH with an exact (i.e. 0, or FALSE) match_type parameter is a relatively resource-heavy construction, particularly if the range to be considered is quite large. As such, and since it is not necessary to use this construction for FREQUENCY's bins_array parameter, it is preferable to use the more efficient:
ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1
Moreover, note that repetition of the IF(LEN construction is also not necessary within this second parameter.
In all, then:
IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],tdata[reportUUID],0)),ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1)>0,TRUE)
is considerably more rigorous and more efficient than the version you give.
To answer your main question, it is well-documented that FREQUENCY always returns an array having a number of entries one greater than that of the bins_array passed.
As mentioned in my comment to your post, the resolution to the problem you are facing largely depends on precisely what further manipulation you are intending for the resulting array.
However, let's assume for the sake of an explanation that you simply wish to multiply the array resulting from your FREQUENCY construction by some other column within your table, tdata[Column2] say, and then sum the result.
The difference between:
=SUM(IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],tdata[reportUUID],0)),ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1)>0,TRUE)*tdata[Column2])
i.e. using multiplication of the two arrays, and:
=SUM(IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],tdata[reportUUID],0)),ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1)>0,tdata[Column2]))
i.e. using a straightforward IF clause, is here crucial.
In fact, the former will always return an error, whereas the latter, in general, will not.
The reason is that the former will resolve to (assuming that your table has e.g. 10 rows' worth of data and assuming some random Boolean results to the FREQUENCY construction):
=SUM(IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE},TRUE)*tdata[Column2])
which is, since the value_if_true clause is superfluous here:
=SUM({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}*tdata[Column2])
whereas the second construction I give will resolve to:
=SUM(IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE},tdata[Column2]))
The two may look identical, but the fact that the former is using multiplication to resolve the array, whereas the latter is not, is the key difference.
Although in both cases the array resulting from the FREQUENCY construction, i.e.:
{TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}
comprises 11 entries (i.e. 1 more than the number of entries in the second array being considered), the difference is that, when you then attempt to multiply an 11-element array with a 10-element array (i.e. tdata[Column2]), Excel, rather than outright disallowing such an operation, artificially redimensions the smaller of the two arrays such that it matches the dimensions of the larger.
In doing so, however, any additional entries are automatically set as #N/A error values.
Effectively, then:
=SUM({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}*tdata[Column2])
is resolved as:
=SUM({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}*{38;67;49;3;10;11;97;20;3;57;#N/A})
i.e., as mentioned, the second, 10-element array is redimensioned to one of 11 elements in an attempt to form a legitimate operation. And, as also mentioned, that 11th element is #N/A, which means of course that the entire construction will also result in that value.
In the non-multiplication version, however, i.e.:
=SUM(IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE},tdata[Column2]))
although the same redimensiong also takes place, we are saved by our use of an IF clause in place of multiplication, since the above resolves to:
=SUM(IF({TRUE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;FALSE;TRUE;FALSE;FALSE},{38;67;49;3;10;11;97;20;3;57;#N/A}))
and the Boolean FALSE in the 11th position here 'overrides' the error value in the equivalent position from the second array, since the above resolves to:
=SUM({38;FALSE;49;FALSE;10;11;97;FALSE;3;FALSE;FALSE})
Regards

use offset to refer an array in excel

I was trying to calculate the R-squared value of two arrays using RSQ function. One array is fixed, another is located in different columns. I want to generate a code so that i will be able to generate R-squared value for all variables by drag down the cell.
i tried
=RSQ($H$4:$H$102,OFFSET($A$4:$A$102,0,ROW(Z3)-2))
where ROW(Z3)-2 = 1 and the offset part should refer to B4:B102.
The result of RSQ was #N/A. But when I tried SUM(OFFSET($A$4:$A$102,0,ROW(Z3)-2)) it do gives me a correct sum for B4:B102. Can anyone help me out with this problem?
thanks!!!
=RSQ($H$4:$H$102,OFFSET($A$4:$A$102,0,MAX(ROW(Z3)-2)))
The problem seems to be that ROW(n) returns a 1x1 array. I'm guessing Excel is complaining that the 1x1 array is not sized the same as other arrays you are using. Wrapping it in MAX seems to work around this by returning the value in that array, and calculation goes on.
I must say I have not noticed this behavior before. Good question.

Resources