Is there a way to update multiple indexes of a JAX array at once?

Since arrays are immutable in JAX, updating N indexes one at a time with
x = x.at[idx].set(y)
creates N new arrays.
With hundreds of updates per training cycle, this will ultimately create hundreds of arrays, if not millions.
This seems a little wasteful. Is there a way to update multiple indexes in one go?
Does anyone know whether the overhead is significant? Am I overlooking something?

You can perform multiple updates in a single operation using the syntax you mention. For example:
import jax.numpy as jnp
x = jnp.zeros(10)
idx = jnp.array([3, 5, 7, 9])
y = jnp.array([1, 2, 3, 4])
x = x.at[idx].set(y)
print(x)
# [0. 0. 0. 1. 0. 2. 0. 3. 0. 4.]
You're correct that outside JIT, each update operation will create an array copy. But within JIT-compiled functions, the compiler is able to perform such updates in-place when it is possible (for example, when the original array is not referenced again). You can read more at JAX Sharp Bits: Array Updates.
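For instance, here is a minimal sketch of batching several updates inside a jitted function (the function name and values are illustrative, not from the question):
import jax
import jax.numpy as jnp

@jax.jit
def many_updates(x, idx, y):
    # Both updates below act on intermediate values inside the traced
    # computation, so XLA is generally free to apply them in place rather
    # than materializing a fresh array for each one.
    x = x.at[idx].set(y)
    x = x.at[idx + 1].add(1.0)
    return x

x = many_updates(jnp.zeros(10), jnp.array([3, 5, 7]), jnp.array([1., 2., 3.]))
print(x)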

This sounds very much like a job for a scatter update. I'm not really familiar with JAX itself, but the major frameworks have it:
https://jax.readthedocs.io/en/latest/_autosummary/jax.lax.scatter.html
What it does in a nutshell:
1. Set up your output tensor (x).
2. Accumulate the required updates in another tensor (y in your case).
3. Accumulate the indices where the updates should be applied in a list/tensor.
4. Feed 1)-3) to the scatter update.
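For what it's worth, in JAX the x.at[idx].set(y) shown in the first answer already lowers to a scatter under the hood. A quick way to check this (a small sketch, not part of the original answer) is to print the jaxpr:
import jax
import jax.numpy as jnp

x = jnp.zeros(10)
idx = jnp.array([3, 5, 7, 9])
y = jnp.array([1., 2., 3., 4.])

# The printed jaxpr contains a scatter equation: one fused multi-index
# update rather than one update per index.
print(jax.make_jaxpr(lambda x, idx, y: x.at[idx].set(y))(x, idx, y))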

Related

Build two vectors when iterating once over an iterable

I have an iterator over an enum that has two variants. The iterator can be quite big, so I'd prefer to avoid iterating over it more than once. Is it possible to collect two vectors by iterating over it once?
Let's say I have a vector of numbers that are positive and negative. I'd like to kind-of sort them during iteration, which will result in two vectors: one with the positive and one with the negative numbers. Here's an example in pseudocode:
let input = vec![1, -2, 4, -5, 3];
let (positive, negative) = input
.iter()
.some_magic_that_will_make_two_iterators()
.collect::<(Vec<_>, Vec<_>)>();
assert_eq!(positive, vec![1, 4, 3]);
assert_eq!(negative, vec![-2, -5])
Is there any way to achieve that? I know that I can define positive and negative first, and just push items during iterating, but that won't be the optimal solution for big collections. In my real case, I expect that there may be around million enums in the initial iterable.
Iterator::partition maybe?
It's eager, so there is no collect step. It also cannot map the elements while partitioning, so it might not fit your needs: it would work fine for partitioning positive and negative values, but it would not work for partitioning and unwrapping two variants of an enum.
Is there any way to achieve that? I know that I can define positive and negative first, and just push items during iterating, but that won't be the optimal solution for big collections. In my real case, I expect that there may be around million enums in the initial iterable.
I don't see why not; it's pretty much what a partition function will do. And you might be able to size the target collections based on your understanding of the distribution, whereas a partition function would not be able to.

scipy.ndimage.correlate but updating of values while computing

I need an array of the sums over each cell's 3x3 neighborhood, with the neighbors weighted by a kernel (up to this point this is exactly scipy.ndimage.correlate). But when a value of the new array is calculated, it has to be applied immediately, instead of using the value from the original array for the next computation involving that cell. I have written this slow code to implement it myself, which works perfectly fine (although too slow for me) and delivers the expected result:
def laplaceNeighborDifference(x, y):
    global w, h, AArr
    return (-AArr[y, x]
            + AArr[(y+1) % h, x] * .2 + AArr[(y-1) % h, x] * .2
            + AArr[y, (x+1) % w] * .2 + AArr[y, (x-1) % w] * .2
            + AArr[(y+1) % h, (x+1) % w] * .05 + AArr[(y-1) % h, (x+1) % w] * .05
            + AArr[(y+1) % h, (x-1) % w] * .05 + AArr[(y-1) % h, (x-1) % w] * .05)

for x in range(width):
    for y in range(height):
        AArr[y, x] += laplaceNeighborDifference(x, y)
In my approach the kernel is coded directly. Written out as an array (to be used as a kernel), it would look like this:
[[.05,.2,.05],
[.2 ,-1,.2 ],
[.05,.2,.05]]
The SciPy implementation would work like this:
AArr += correlate(AArr, kernel, mode='wrap')
But obviously when I use scipy.ndimage.correlate, it calculates all values from the original array and doesn't update them as it goes. At least I think that is the difference between my implementation and the SciPy one; feel free to point out other differences if I've missed any. My question is whether there is a similar function with the desired behaviour, or an approach to coding it that is faster than mine?
Thank you for your time!
You can use Numba to do that efficiently:
import numba as nb

@nb.njit
def laplaceNeighborDifference(AArr, w, h, x, y):
    return (-AArr[y, x]
            + AArr[(y+1) % h, x] * .2 + AArr[(y-1) % h, x] * .2
            + AArr[y, (x+1) % w] * .2 + AArr[y, (x-1) % w] * .2
            + AArr[(y+1) % h, (x+1) % w] * .05 + AArr[(y-1) % h, (x+1) % w] * .05
            + AArr[(y+1) % h, (x-1) % w] * .05 + AArr[(y-1) % h, (x-1) % w] * .05)

@nb.njit('void(float64[:,::1],int64,int64)')
def compute(AArr, width, height):
    for x in range(width):
        for y in range(height):
            AArr[y, x] += laplaceNeighborDifference(AArr, width, height, x, y)
Note that modulo operations are generally very slow. It is better to remove them by computing the borders separately from the main loop. The resulting code should be much faster without any modulo.
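As a rough illustration of that last point, here is one way to skip the modulo for interior cells while keeping exactly the same update order (a sketch reusing laplaceNeighborDifference from above; a fully optimized version would hoist the border handling into separate loops):
import numba as nb

@nb.njit
def compute_split(AArr, width, height):
    for x in range(width):
        for y in range(height):
            if 0 < x < width - 1 and 0 < y < height - 1:
                # Interior cell: every neighbour exists, so no modulo is needed.
                AArr[y, x] += (-AArr[y, x]
                               + (AArr[y+1, x] + AArr[y-1, x] + AArr[y, x+1] + AArr[y, x-1]) * .2
                               + (AArr[y+1, x+1] + AArr[y-1, x+1] + AArr[y+1, x-1] + AArr[y-1, x-1]) * .05)
            else:
                # Border cell: keep the wrap-around behaviour via the modulo version.
                AArr[y, x] += laplaceNeighborDifference(AArr, width, height, x, y)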

View on portion of tensor

I have a multi-dimensional tensor; let's take this simple one as an example:
out = torch.Tensor(3, 4, 5)
I need to get a portion/subpart of this tensor, out[:,0,:], and then apply the method view(-1), but it's not possible:
out[:,0,:].view(-1)
RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at ../aten/src/TH/generic/THTensor.cpp:203
A solution is to clone the subpart:
out[:,0,:].clone().view(-1)
Is there a better/faster solution than cloning?
What you did will work fine. That said, a more portable approach would be to use reshape, which will return a view when possible, but will create a contiguous copy if necessary. That way it will do the fastest thing possible. In your case the data must be copied, but by always using reshape you still avoid the copy in the cases where one isn't needed.
So you could use
out[:,0,:].reshape(-1)
Gotcha
There's one important gotcha here. If you perform in-place operations on the output of reshape then that may or may not affect the original tensor, depending on whether or not a view or copy was returned.
For example, assuming out is already contiguous, then in this case
>>> x = out[:,0,:].reshape(-1) # returns a copy
>>> x[0] = 10
>>> print(out[0,0,0].item() == 10)
False
x is a copy so changes to it don't affect out. But in this case
>>> x = out[:,:,0].reshape(-1) # returns a view
>>> x[0] = 10
>>> print(out[0,0,0].item() == 10)
True
x is a view, so in-place changes to x will change out as well.
Alternatives
A couple of alternatives are
out[:,0,:].flatten() # .flatten is just a special case of .reshape
and
out[:,0,:].contiguous().view(-1)
Though if you want the fastest approach I recommend against the latter method using contiguous().view since, in general, it is more likely than reshape or flatten to return a copy. This is because contiguous will create a copy even if the underlying data has the same number of bytes between subsequent entries. Therefore, there's a difference between
out[:,:,0].contiguous().view(-1) # creates a copy
and
out[:,:,0].flatten() # creates a non-contiguous view (b/c underlying data has uniform spacing of out.shape[2] values between entries)
where the contiguous().view approach forces a copy since out[:,:,0] is not contiguous, but flatten/reshape would create a view since the underlying data is uniformly spaced.
Sometimes contiguous() won't create a copy, for example compare
out[0,:,:].contiguous().view(-1) # creates a view b/c out[0,:,:] already is contiguous
and
out[0,:,:].flatten() # creates a view
which both produce a view of the original data without copying since out[0,:,:] is already contiguous.
If you want to ensure that the out is decoupled completely from its flattened counterpart then the original approach using .clone() is the way to go.
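If you are ever unsure whether reshape gave you a view or a copy, one simple runtime check is to compare data pointers (a sketch; this check works here because both slices start at element 0 of out, so a view shares the same pointer):
import torch

out = torch.zeros(3, 4, 5)

# Same pointer as out => reshape returned a view; different pointer => a copy.
print(out[:, 0, :].reshape(-1).data_ptr() == out.data_ptr())  # False: copy
print(out[:, :, 0].reshape(-1).data_ptr() == out.data_ptr())  # True: view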

Random Index from a Tensor (Sampling with Replacement from a Tensor)

I'm trying to manipulate individual weights of different neural nets to see how their performance degrades. As part of these experiments, I'm required to sample randomly from their weight tensors, which I've come to understand as sampling with replacement (in the statistical sense). However, since it's high-dimensional, I've been stumped by how to do this in a fair manner. Here are the approaches and research I've put into considering this problem:
This was previously implemented by selecting a random layer and then selecting a random weight in that layer (ignore the implementation of picking a random weight). Since layers are different sizes, we discovered that weights were being sampled unevenly.
I considered what would happen if we sampled according to the numpy.shape of the tensor; however, I realize now that this encounters the same problem as above.
Consider what happens to a rank 2 tensor like this:
[[*, *, *],
[*, *, *, *]]
Selecting a row randomly and then a value from that row results in an unfair selection. This method could work if you're able to assert that this scenario never occurs, but it's far from a general solution.
Note that this possible duplicate actually implements it in this fashion.
I found people suggesting flattening the tensor and using numpy.random.choice to select randomly from a 1D array. That's a simple solution, except I have no idea how to invert the flattened tensor back into its original shape. Further, flattening millions of weights would be a somewhat slow implementation.
I found someone discussing tf.random.multinomial here, but I don't understand enough of it to know whether it's applicable or not.
I ran into this paper about reservoir sampling, but again, it went over my head.
I found another paper which specifically discusses tensors and sampling techniques, but it went even further over my head.
A teammate found this other paper which talks about random sampling from a tensor, but it's only for rank 3 tensors.
Any help understanding how to do this? I'm working in Python with Keras, but I'll take an algorithm in any form that it exists. Thank you in advance.
Before I forget to document the solution we arrived at, I'll talk about the two different paths I see for implementing this:
Use a total ordering on scalar elements of the tensor. This is effectively enumerating your elements, i.e. flattening them. However, you can do this while maintaining the original shape. Consider this pseudocode (in Python-like syntax):
def sample_tensor(tensor, chosen_index: int) -> Tuple[int]:
    """Maps a chosen random number to its index in the given tensor.

    Args:
        tensor: A ragged-array n-tensor.
        chosen_index: An integer in [0, num_scalar_elements_in_tensor).

    Returns:
        The index that accesses this element in the tensor.

    NOTE: Entirely untested, expect it to be fundamentally flawed.
    """
    remaining = chosen_index
    for (i, sub_list) in enumerate(tensor):
        if type(sub_list) is an iterable:
            if |sub_list| > remaining:
                remaining -= |sub_list|
            else:
                return i joined with sample_tensor(sub_list, remaining)
        else:
            if len(sub_list) <= remaining:
                return tuple(remaining)
First of all, I'm aware this isn't a sound algorithm. The idea is to count down until you reach your element, with bookkeeping for indices.
We need to make crucial assumptions here. 1) All lists will eventually contain only scalars. 2) By direct consequence, if a list contains lists, assume that it also doesn't contain scalars at the same level. (Stop and convince yourself for (2).)
We also need to make a critical note here too: We are unable to measure the number of scalars in any given list, unless the list is homogeneously consisting of scalars. In order to avoid measuring this magnitude at every point, my algorithm above should be refactored to descend first, and subtract later.
This algorithm has some consequences:
It's the fastest in its entire style of approaching the problem. If you want to write a function f: [0, total_elems) -> Tuple[int], you must know the number of preceding scalar elements along the total ordering of the tensor. This is effectively bound at Theta(l) where l is the number of lists in the tensor (since we can call len on a list of scalars).
It's slow. It's too slow compared to sampling nicer tensors that have a defined shape to them.
It begs the question: can we do better? See the next solution.
Use a probability distribution in conjunction with numpy.random.choice. The idea here is that if we know ahead of time what the distribution of scalars is already like, we can sample fairly at each level of descending the tensor. The hard problem here is building this distribution.
I won't write pseudocode for this, but lay out some objectives:
This can be called only once to build the data structure.
The algorithm needs to combine iterative and recursive techniques to a) build distributions for sibling lists and b) build distributions for descendants, respectively.
The algorithm will need to map indices to a probability distribution respective to sibling lists (note the assumptions discussed above). This does require knowing the number of elements in an arbitrary sub-tensor.
At lower levels where lists contain only scalars, we can simplify by just storing the number of elements in said list (as opposed to storing probabilities of selecting scalars randomly from a 1D array).
You will likely need 2-3 functions: one that utilizes the probability distribution to return an index, a function that builds the distribution object, and possibly a function that just counts elements to help build the distribution.
This is also faster at O(n) where n is the rank of the tensor. I'm convinced this is the fastest possible algorithm, but I lack the time to try to prove it.
You might choose to store the distribution as an ordered dictionary that maps a probability to either another dictionary or the number of elements in a 1D array. I think this might be the most sensible structure.
Note that (2) is truly the same as (1), but we pre-compute knowledge about the densities of the tensor.
I hope this helps.
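For the common case where each weight tensor is a regular (non-ragged) array, a much simpler route is to pick a layer proportionally to its size and then use numpy.unravel_index to turn a flat index back into a multi-dimensional one. A minimal sketch (the helper name is my own, not from the answer above):
import numpy as np

def sample_weight_indices(weight_tensors, n_samples, rng=None):
    # Sample n_samples weights uniformly, with replacement, across a list of arrays.
    rng = rng or np.random.default_rng()
    sizes = np.array([w.size for w in weight_tensors])
    # Pick which tensor each sample comes from, proportional to its element count,
    # so every scalar weight is equally likely regardless of layer size.
    tensor_ids = rng.choice(len(weight_tensors), size=n_samples, p=sizes / sizes.sum())
    samples = []
    for t in tensor_ids:
        flat = rng.integers(weight_tensors[t].size)
        samples.append((t, np.unravel_index(flat, weight_tensors[t].shape)))
    return samples

# Example: indices = sample_weight_indices([np.zeros((3, 4)), np.zeros((10,))], 5)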

Is It Possible to Take the Mode of a Tensor in Tensorflow?

I'm trying to construct a DAG in Tensorflow where I need to take the mode (most frequent value) of individual regions of my target. This is in order to construct a downsampled target.
Right now, I'm pre-processing the downsampled targets for every individual situation I might encounter, saving them, and then loading them. Obviously, this would all be much easier if it was integrated into my Tensorflow graph, so that I could downsample at runtime.
But I've looked everywhere, and I can find no evidence of a tf.reduce_mode, that would function the same as tf.reduce_mean. Is there any way to construct this functionality in a Tensorflow graph?
My idea is that we get the unique numbers and their counts. We then find the numbers that appear most frequently. Finally we fetch those numbers (there could be more than one) by using their indices in the number-count tensor.
samples = tf.constant([10, 32, 10, 5, 7, 9, 9, 9])
unique, _, count = tf.unique_with_counts(samples)
max_occurrences = tf.reduce_max(count)
max_cond = tf.equal(count, max_occurrences)
max_numbers = tf.squeeze(tf.gather(unique, tf.where(max_cond)))
with tf.Session() as sess:
    print('Most frequent Numbers\n', sess.run(max_numbers))
> Most frequent Numbers
9
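If a single mode is enough (ties broken by the first occurrence), the same idea can be written more compactly with tf.argmax; a minimal sketch in eager/TF 2.x style (not part of the original answer):
import tensorflow as tf

def reduce_mode(samples):
    # Most frequent value in a 1-D tensor; ties resolved by first occurrence.
    unique, _, count = tf.unique_with_counts(samples)
    return tf.gather(unique, tf.argmax(count))

print(reduce_mode(tf.constant([10, 32, 10, 5, 7, 9, 9, 9])).numpy())  # 9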
