I have a multi-dimensional tensor; let's take this simple one as an example:
out = torch.Tensor(3, 4, 5)
I need to take a portion/subpart of this tensor, out[:,0,:], and then apply the method view(-1), but it fails:
out[:,0,:].view(-1)
RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at ../aten/src/TH/generic/THTensor.cpp:203
A solution is to clone the subpart:
out[:,0,:].clone().view(-1)
Is there a better/faster solution than cloning?
What you did will work fine. That said, a more portable approach would be to use reshape which will return a view when possible, but will create a contiguous copy if necessary. That way it will do the fastest thing possible. In your case the data must be copied, but by always using reshape there are cases where a copy won't be produced.
So you could use
out[:,0,:].reshape(-1)
Gotcha
There's one important gotcha here. If you perform in-place operations on the output of reshape then that may or may not affect the original tensor, depending on whether or not a view or copy was returned.
For example, assuming out is already contiguous then in this case
>>> x = out[:,0,:].reshape(-1) # returns a copy
>>> x[0] = 10
>>> print(out[0,0,0].item() == 10)
False
x is a copy so changes to it don't affect out. But in this case
>>> x = out[:,:,0].reshape(-1) # returns a view
>>> x[0] = 10
>>> print(out[0,0,0].item() == 10)
True
x is a view, so in-place changes to x will change out as well.
Alternatives
A couple of alternatives are
out[:,0,:].flatten() # .flatten is just a special case of .reshape
and
out[:,0,:].contiguous().view(-1)
Though if you want the fastest approach, I recommend against the latter method using contiguous().view since, in general, it is more likely than reshape or flatten to return a copy. This is because contiguous will create a copy even if the underlying data has the same number of bytes between subsequent entries. Therefore, there's a difference between
out[:,:,0].contiguous().view(-1) # creates a copy
and
out[:,:,0].flatten() # creates a non-contiguous view (b/c underlying data has uniform spacing of out.shape[2] values between entries)
where the contiguous().view approach forces a copy since out[:,:,0] is not contiguous, but flatten/reshape would create a view since the underlying data is uniformly spaced.
Sometimes contiguous() won't create a copy, for example compare
out[0,:,:].contiguous().view(-1) # creates a view b/c out[0,:,:] already is contiguous
and
out[0,:,:].flatten() # creates a view
which both produce a view of the original data without copying since out[0,:,:] is already contiguous.
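If it matters which one you got, you can also check at runtime: a view shares its underlying storage with the base tensor, which is visible through data_ptr(). A minimal sketch of that check, using a freshly allocated (and therefore contiguous) out:

```python
import torch

out = torch.zeros(3, 4, 5)

x_view = out[:, :, 0].flatten()    # uniformly strided -> flatten/reshape returns a view
x_copy = out[:, 0, :].reshape(-1)  # not expressible with strides -> returns a copy

print(x_view.data_ptr() == out.data_ptr())  # True: same underlying storage
print(x_copy.data_ptr() == out.data_ptr())  # False: freshly allocated memory
```

This is just one way to perform the check; comparing storage pointers only works here because both slices start at element [0, 0, 0] of out.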
If you want to ensure that out is decoupled completely from its flattened counterpart, then the original approach using .clone() is the way to go.
Related
I need an array of the sums over 3x3 neighboring cells, weighted by a kernel, of another array of the same size (this is exactly scipy.ndimage.correlate up to this point). But when a value for the new array is calculated, it has to be used immediately for the next computation involving that cell, instead of using the value from the original array. I have written this slow code to implement it myself, which works perfectly fine (although too slow for me) and delivers the expected result:
def laplaceNeighborDifference(x, y):
    global w, h, AArr
    return (-AArr[y,x]
            + AArr[(y+1)%h,x]*.2 + AArr[(y-1)%h,x]*.2
            + AArr[y,(x+1)%w]*.2 + AArr[y,(x-1)%w]*.2
            + AArr[(y+1)%h,(x+1)%w]*.05 + AArr[(y-1)%h,(x+1)%w]*.05
            + AArr[(y+1)%h,(x-1)%w]*.05 + AArr[(y-1)%h,(x-1)%w]*.05)

for x in range(width):
    for y in range(height):
        AArr[y,x] += laplaceNeighborDifference(x, y)
In my approach the kernel is coded directly. Although as an array (to be used as a kernel) it would be written like this:
[[.05,.2,.05],
[.2 ,-1,.2 ],
[.05,.2,.05]]
The SciPy implementation would work like this:
AArr += correlate(AArr, kernel, mode='wrap')
But obviously when I use scipy.ndimage.correlate it calculates the values entirely based on the original array and doesn't update them as it computes them. At least I think that is the difference between my implementation and the SciPy implementation, feel free to point out other differences if I've missed one. My question is if there is a similar function to the aforementioned with desired results or if there is an approach to code it which is faster than mine?
Thank you for your time!
You can use Numba to do that efficiently:
import numba as nb

@nb.njit
def laplaceNeighborDifference(AArr, w, h, x, y):
    return (-AArr[y,x]
            + AArr[(y+1)%h,x]*.2 + AArr[(y-1)%h,x]*.2
            + AArr[y,(x+1)%w]*.2 + AArr[y,(x-1)%w]*.2
            + AArr[(y+1)%h,(x+1)%w]*.05 + AArr[(y-1)%h,(x+1)%w]*.05
            + AArr[(y+1)%h,(x-1)%w]*.05 + AArr[(y-1)%h,(x-1)%w]*.05)

@nb.njit('void(float64[:,::1], int64, int64)')
def compute(AArr, width, height):
    for x in range(width):
        for y in range(height):
            AArr[y,x] += laplaceNeighborDifference(AArr, width, height, x, y)
Note that modulus operations are generally very slow. It is better to remove them by computing the borders separately from the main loop. The resulting code should be much faster without any modulus operations.
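One way to drop the per-element modulus while keeping both the wrapped borders and the sequential in-place update is to precompute the wrapped neighbour indices once, outside the loop. A sketch of that idea in plain NumPy (the same loop can then be compiled with Numba's @nb.njit):

```python
import numpy as np

def compute(AArr):
    """Sequential in-place update with wrapped borders; the wrapped
    neighbour indices are precomputed so the inner loop has no modulus."""
    h, w = AArr.shape
    yp = np.arange(1, h + 1) % h   # y+1, wrapped
    ym = np.arange(-1, h - 1) % h  # y-1, wrapped
    xp = np.arange(1, w + 1) % w   # x+1, wrapped
    xm = np.arange(-1, w - 1) % w  # x-1, wrapped
    for x in range(w):
        for y in range(h):
            AArr[y, x] += (
                -AArr[y, x]
                + .2 * (AArr[yp[y], x] + AArr[ym[y], x]
                        + AArr[y, xp[x]] + AArr[y, xm[x]])
                + .05 * (AArr[yp[y], xp[x]] + AArr[ym[y], xp[x]]
                         + AArr[yp[y], xm[x]] + AArr[ym[y], xm[x]]))
```

This preserves the question's update order (x outer, y inner), so it produces the same result as the modulus-based version.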
I came across an h5py tutorial wherein a particular index of an hdf5 file is accessed as follows:
f = h5py.File('random.hdf5', 'r')
data = f['default'][()]
f.close()
print(data[10])
In this manner, even when the file is closed, the data is still accessible. It seems adding [()] no longer makes data a simple pointer, but rather the data object itself. What is the meaning of [()]?
() is an empty tuple. HDF5 datasets can have an arbitrary number of dimensions and support indexing, but some datasets are zero-dimensional (they store a single scalar value). For these, h5py uses indexing with an empty tuple, [()], to access that value; you can't use [0] or even [:] because those imply at least one dimension to slice along. More generally, dataset[()] reads the entire dataset into memory as a NumPy array (or a scalar for 0-d datasets), which is why the data remains accessible after the file is closed.
() is an empty tuple, and indexing with an empty tuple is documented in h5py's documentation:
An empty dataset has shape defined as None, which is the best way of determining whether a dataset is empty or not. An empty dataset can be "read" in a similar way to scalar datasets, i.e. if empty_dataset is an empty dataset:
>>> empty_dataset[()]
h5py.Empty(dtype="f")
The dtype of the dataset can be accessed via .dtype as per normal. As empty datasets cannot be sliced, some methods of datasets such as read_direct will raise an exception if used on an empty dataset.
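To see the practical difference, here is a hypothetical round-trip sketch (the file and dataset names mirror the question): f['default'] alone is a lazy Dataset handle tied to the open file, while f['default'][()] materializes the contents as a NumPy array that outlives f.close().

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'random.hdf5')
with h5py.File(path, 'w') as f:
    f.create_dataset('default', data=np.arange(100))

f = h5py.File(path, 'r')
data = f['default'][()]  # reads the whole dataset into a NumPy array
f.close()

print(data[10])  # still works: the data lives in memory, not in the file
```

Keeping only f['default'] (without [()]) and indexing it after f.close() would raise an error instead, since the handle needs the open file.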
I'm trying to manipulate individual weights of different neural nets to see how their performance degrades. As part of these experiments, I'm required to sample randomly from their weight tensors, which I've come to understand as sampling with replacement (in the statistical sense). However, since it's high-dimensional, I've been stumped by how to do this in a fair manner. Here are the approaches and research I've put into considering this problem:
This was previously implemented by selecting a random layer and then selecting a random weight in that layer (ignore the implementation of picking a random weight). Since layers are different sizes, we discovered that weights were being sampled unevenly.
I considered what would happen if we sampled according to the numpy.shape of the tensor; however, I realize now that this encounters the same problem as above.
Consider what happens to a rank 2 tensor like this:
[[*, *, *],
[*, *, *, *]]
Selecting a row randomly and then a value from that row results in an unfair selection. This method could work if you're able to assert that this scenario never occurs, but it's far from a general solution.
Note that this possible duplicate actually implements it in this fashion.
I found people suggesting flattening the tensor and using numpy.random.choice to select randomly from a 1D array. That's a simple solution, except I have no idea how to invert the flattened tensor back into its original shape. Further, flattening millions of weights would be a somewhat slow implementation.
I found someone discussing tf.random.multinomial here, but I don't understand enough of it to know whether it's applicable or not.
I ran into this paper about reservoir sampling, but again, it went over my head.
I found another paper which specifically discusses tensors and sampling techniques, but it went even further over my head.
A teammate found this other paper which talks about random sampling from a tensor, but it's only for rank 3 tensors.
Any help understanding how to do this? I'm working in Python with Keras, but I'll take an algorithm in any form that it exists. Thank you in advance.
Before I forget to document the solution we arrived at, I'll talk about the two different paths I see for implementing this:
Use a total ordering on scalar elements of the tensor. This is effectively enumerating your elements, i.e. flattening them. However, you can do this while maintaining the original shape. Consider this pseudocode (in Python-like syntax):
def sample_tensor(tensor, chosen_index: int) -> Tuple[int]:
    """Maps a chosen random number to its index in the given tensor.

    Args:
        tensor: A ragged-array n-tensor.
        chosen_index: An integer in [0, num_scalar_elements_in_tensor).

    Returns:
        The index that accesses this element in the tensor.

    NOTE: Entirely untested, expect it to be fundamentally flawed.
    """
    remaining = chosen_index
    for (i, sub_list) in enumerate(tensor):
        if type(sub_list) is an iterable:
            if |sub_list| > remaining:
                remaining -= |sub_list|
            else:
                return i joined with sample_tensor(sub_list, remaining)
        else:
            if len(sub_list) <= remaining:
                return tuple(remaining)
First of all, I'm aware this isn't a sound algorithm. The idea is to count down until you reach your element, with bookkeeping for indices.
We need to make crucial assumptions here. 1) All lists will eventually contain only scalars. 2) By direct consequence, if a list contains lists, assume that it also doesn't contain scalars at the same level. (Stop and convince yourself for (2).)
We also need to make a critical note here too: We are unable to measure the number of scalars in any given list, unless the list is homogeneously consisting of scalars. In order to avoid measuring this magnitude at every point, my algorithm above should be refactored to descend first, and subtract later.
This algorithm has some consequences:
It's the fastest in its entire style of approaching the problem. If you want to write a function f: [0, total_elems) -> Tuple[int], you must know the number of preceding scalar elements along the total ordering of the tensor. This is effectively bound at Theta(l) where l is the number of lists in the tensor (since we can call len on a list of scalars).
It's slow. It's too slow compared to sampling nicer tensors that have a defined shape to them.
It begs the question: can we do better? See the next solution.
Use a probability distribution in conjunction with numpy.random.choice. The idea here is that if we know ahead of time what the distribution of scalars is already like, we can sample fairly at each level of descending the tensor. The hard problem here is building this distribution.
I won't write pseudocode for this, but lay out some objectives:
This can be called only once to build the data structure.
The algorithm needs to combine iterative and recursive techniques to a) build distributions for sibling lists and b) build distributions for descendants, respectively.
The algorithm will need to map indices to a probability distribution respective to sibling lists (note the assumptions discussed above). This does require knowing the number of elements in an arbitrary sub-tensor.
At lower levels where lists contain only scalars, we can simplify by just storing the number of elements in said list (as opposed to storing probabilities of selecting scalars randomly from a 1D array).
You will likely need 2-3 functions: one that utilizes the probability distribution to return an index, a function that builds the distribution object, and possibly a function that just counts elements to help build the distribution.
This is also faster at O(n) where n is the rank of the tensor. I'm convinced this is the fastest possible algorithm, but I lack the time to try to prove it.
You might choose to store the distribution as an ordered dictionary that maps a probability to either another dictionary or the number of elements in a 1D array. I think this might be the most sensible structure.
Note that (2) is truly the same as (1), but we pre-compute knowledge about the densities of the tensor.
I hope this helps.
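As an aside: if the individual weight tensors are regular (every Keras layer weight is a rectangular numpy array, even though the collection of layers is ragged), the flattening step from the question can be avoided entirely by sampling a flat position and mapping it back with np.unravel_index. A sketch under that regular-shape assumption (sample_multi_index is a made-up helper name):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_multi_index(shape):
    """Uniformly sample one element position from a regularly-shaped tensor,
    without materializing a flattened copy of its data."""
    flat = rng.integers(int(np.prod(shape)))  # position in the total ordering
    return np.unravel_index(flat, shape)      # map back to an n-d index

weights = np.zeros((3, 4, 5))
idx = sample_multi_index(weights.shape)  # e.g. a tuple like (2, 1, 4)
```

Since every element corresponds to exactly one flat position, this samples fairly regardless of how the sizes are distributed across dimensions.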
img[::a, ::b] can reduce the resolution of an image in PIL, but why?
If the image's size is x by y, then img[::a, ::b] gives an image of size x/a by y/b.
Who knows why or how this works?
That's the syntax for a multi-dimensional slice. The result includes every ath pixel along the x-axis and every bth pixel along the y-axis.
Slice notation is not too difficult to understand. It looks like start:stop:step, where any of the values can be omitted to get a default. If you're omitting step at the end, the second colon is not required (you can just write start:stop). Slice notation is only allowed when indexing (e.g. foo[start:stop:step]). To make a slice outside that context, you can call the slice constructor (though you may need to pass None for omitted values rather than just skipping them).
The default step value is 1. The default start and stop depend on the sign of step. If step is positive, then start's default is 0 and stop's is the size of the object being sliced. If step is negative, start will default to one less than the size of the object (the largest valid index) and stop will default to an index "just before the first value" (this is not -1 as you might expect, because negative indexes wrap around to the end; the only way you could specify the index normally is -len(...)-1).
Not all Python objects allow multi-dimensional indexing (where the index is a tuple of indexes or slices). Normal lists don't support it (not even if they're nested). The PIL image does however, probably because it is replicating the behavior of a numpy multi-dimensional array.
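The same slicing behaves identically on a plain numpy array, which makes the shape arithmetic easy to see; a small sketch:

```python
import numpy as np

img = np.arange(36).reshape(6, 6)  # stand-in for 6x6 image pixel data
small = img[::2, ::3]              # every 2nd row, every 3rd column

print(small.shape)  # (3, 2): 6/2 rows, 6/3 columns
```

Each axis gets its own start:stop:step slice, so the downsampling factors a and b apply independently per dimension.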
Imagine you are doing a BFS over a grid (e.g. shortest distance between two cells). Two data structures can be used to hold the visited info:
1) List of lists, i.e. data = [[False for _ in range(cols)] for _ in range(rows)]. Later we can access the data in a certain cell by data[r][c].
2) Dict, i.e. data = dict(). Later we can access the data in a certain cell by data[(r, c)].
My question is: which is computationally more efficient in such BFS scenario?
Coding-wise, it seems the dict approach saves more characters/lines. Memory-wise, the dict approach can potentially save some space for untouched cells, but can also waste some space on the hashtable's extra overhead.
EDIT
@Peteris mentioned numpy arrays. The advantage over a list of lists is obvious: numpy arrays operate on contiguous blocks of memory, which allows faster addressing and more cache hits. However, I'm not sure how they compare to hashtables (i.e. dict). If the algorithm touches a relatively small number of elements, hashtables might provide more cache hits given their potentially smaller memory footprint.
Also, the truth is that numpy arrays are unavailable to me. So I really need to compare list of lists against dict.
A 2D array
The efficient answer for storing 2D data is a 2D array/matrix allocated in a contiguous area of memory (unlike a list of lists). This avoids the multiple memory lookups required otherwise, as well as the calculation of a hash value at every lookup that a dict needs.
The standard way to do this in Python is with the numpy library; here's a simple example:
import numpy as np
data = np.zeros( (100, 200) ) # create a 100x200 array and initialize it with zeroes
data[10,20] = 1 # set element at coordinates 10,20 to 1
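For completeness, since numpy is unavailable to the asker: a sketch of the BFS itself with a list-of-lists visited table. Swapping in a dict (or a set of (r, c) tuples) only requires changing the marking and membership lines; the function name and grid encoding (0 = free, 1 = wall) are my own choices, not from the question.

```python
from collections import deque

def shortest_distance(grid, start, goal):
    """BFS shortest distance between two cells; 0 = free, 1 = wall."""
    rows, cols = len(grid), len(grid[0])
    visited = [[False] * cols for _ in range(rows)]  # list-of-lists variant
    sr, sc = start
    visited[sr][sc] = True
    q = deque([(sr, sc, 0)])
    while q:
        r, c, d = q.popleft()
        if (r, c) == goal:
            return d
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not visited[nr][nc] and grid[nr][nc] == 0):
                visited[nr][nc] = True
                q.append((nr, nc, d + 1))
    return -1  # goal unreachable
```

Since BFS may touch every cell, the dense list-of-lists table does no wasted work here; the dict variant only wins when large regions of the grid are never reached.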