TensorFlow histogram with custom bins

I have two tensors - one with bin specification and the other one with observed values. I'd like to count how many values are in each bin.
I know how to do this in either NumPy or bare Python, but I need to do this in pure TensorFlow. Is there a more sophisticated version of tf.histogram_fixed_width with an argument for bin specification?
Example:
# Input - 3 bins and 2 observed values
bin_spec = [0, 0.5, 1, 2]
values = [0.1, 1.1]
# Histogram
[1, 0, 1]

This seems to work, although it is memory- and time-consuming: it builds a boolean matrix with one row per bin edge and one column per value.
import tensorflow as tf
bins = [-1000, 1, 3, 10000]
vals = [-3, 0, 2, 4, 5, 10, 12]
vals = tf.constant(vals, dtype=tf.float64, name="values")
bins = tf.constant(bins, dtype=tf.float64, name="bins")
resh_bins = tf.reshape(bins, shape=(-1, 1), name="bins-reshaped")
resh_vals = tf.reshape(vals, shape=(1, -1), name="values-reshaped")
left_bin = tf.less_equal(resh_bins, resh_vals, name="left-edge")
right_bin = tf.greater(resh_bins, resh_vals, name="right-edge")
resu = tf.logical_and(left_bin[:-1, :], right_bin[1:, :], name="bool-bins")
counts = tf.reduce_sum(tf.to_float(resu), axis=1, name="count-in-bins")
with tf.Session() as sess:
    print(sess.run(counts))
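A lighter-weight alternative, sketched here under the assumption that TensorFlow 2.x with eager execution is available (the code above targets the 1.x Session API): tf.searchsorted finds each value's bin directly and tf.math.bincount tallies the bin indices, so no (edges x values) matrix is built. The helper name histogram_custom_bins is made up for this illustration. Bins are treated as half-open [left, right), matching the comparisons above, and values outside the outer edges are dropped.

```python
import tensorflow as tf

def histogram_custom_bins(values, edges):
    """Count values per half-open bin [edges[i], edges[i+1]). Hypothetical helper."""
    values = tf.convert_to_tensor(values, dtype=tf.float64)
    edges = tf.convert_to_tensor(edges, dtype=tf.float64)
    n_bins = edges.shape[0] - 1  # assumes the number of edges is statically known
    # side="right" returns the number of edges <= value; subtracting 1 gives
    # the index of the bin whose left edge is the nearest one at or below it.
    idx = tf.searchsorted(edges, values, side="right") - 1
    # Drop values that fall outside [edges[0], edges[-1]).
    in_range = tf.logical_and(idx >= 0, idx < n_bins)
    idx = tf.boolean_mask(idx, in_range)
    return tf.math.bincount(idx, minlength=n_bins, maxlength=n_bins)

# Example from the question
print(histogram_custom_bins([0.1, 1.1], [0, 0.5, 1, 2]).numpy())  # [1 0 1]
# Example from the answer above
print(histogram_custom_bins([-3, 0, 2, 4, 5, 10, 12], [-1000, 1, 3, 10000]).numpy())  # [2 1 4]
```

The edges must be sorted in ascending order; tf.searchsorted does not check this.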

Related

How to reorder a tensor based on an index tensor of the same size

Say I have a tensor A and an index tensor: A = [1, 2, 3, 4], indexes = [1, 0, 3, 2].
I want to create a new tensor from these two with the following result: [2, 1, 4, 3].
Each element of the result is an element of A, and the order is defined by the index tensor.
Is there a way to do this with PyTorch tensor ops, without loops?
My goal is to do it for a 2D tensor. I don't think that is possible without loops, so I thought of flattening it to 1D, doing the work there, and reshaping it back to 2D.
You can use scatter:
A = torch.tensor([1, 2, 3, 4])
indices = torch.tensor([1, 0, 3, 2])
result = torch.tensor([0, 0, 0, 0])
print(result.scatter_(0, indices, A))
In 1D you can simply perform A[indexes].
In 2D it is still doable in this way:
A = torch.arange(5, 10).repeat(3, 1) # shape: (3, 5)
indexes = torch.stack([torch.randperm(5) for _ in range(3)]) # shape (3, 5)
A_sort = A[torch.arange(3).unsqueeze(1), indexes]
print(A_sort)
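One caveat worth checking when choosing between the two approaches above: scatter_ writes A[i] to position indices[i], while A[indexes] reads from position indexes[i]. The two agree for the question's [1, 0, 3, 2] only because that permutation is its own inverse. The sketch below uses a different, made-up permutation to show the difference, and also shows torch.gather as an equivalent of the 2D advanced-indexing form.

```python
import torch

A = torch.tensor([10, 20, 30, 40])
perm = torch.tensor([2, 0, 3, 1])  # illustrative permutation, not self-inverse

# Indexing reads: read[i] = A[perm[i]]
read = A[perm]
# scatter writes: written[perm[i]] = A[i], i.e. it applies the inverse permutation
written = torch.zeros_like(A).scatter_(0, perm, A)

print(read.tolist())     # [30, 10, 40, 20]
print(written.tolist())  # [20, 40, 10, 30]

# 2D case: per-row reordering, by advanced indexing and by gather
A2 = torch.arange(5, 10).repeat(3, 1)                       # shape (3, 5)
idx2 = torch.stack([torch.randperm(5) for _ in range(3)])   # shape (3, 5)
by_index = A2[torch.arange(3).unsqueeze(1), idx2]
by_gather = torch.gather(A2, 1, idx2)                       # out[r, c] = A2[r, idx2[r, c]]
print(torch.equal(by_index, by_gather))  # True
```

For the result described in the question (each output element is A at the given index), the reading form is the one that matches in general.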

Is there any Softmax implementation with sections along the dim (blocky Softmax) in PyTorch?

For example, given logits, dim, and boundary,
boundary = torch.tensor([[0, 3, 4, 8, 0],
                         [1, 3, 5, 7, 9]])
# representing sections that look like:
# [[00012222_]
#  [_00112233]]
# in shape: (2, 9)
# (sections cannot be sliced)
logits = torch.rand(2, 9, 100)
result = blocky_softmax(logits, dim = 1, boundary = boundary)
# result[:, :, 0] may look like:
# [[0.33, 0.33, 0.33, 1.00, 0.25, 0.25, 0.25, 0.25, 0.0 ]
# [0.0, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50]]
# the other 99 slices look similar, with each block summing to 1.
The softmax should be applied along dim = 1, independently within each section of that dim.
My current PyTorch implementation uses a for loop. It is slow and uses too much memory.
It looks like this:
import torch

def blocky_softmax(logits, splits, map_inf_to = None):
    _, batch_len, _ = logits.shape
    exp_logits = logits.exp() # [2, 9, 100]
    batch_seq_idx = torch.arange(batch_len, device = logits.device)[None, :]
    base = torch.zeros_like(logits)
    _, n_blocks = splits.shape
    for nid in range(1, n_blocks):
        start = splits[:, nid - 1, None]
        end = splits[:, nid, None]
        # Boolean mask of the positions inside the current section
        area = batch_seq_idx >= start
        area &= batch_seq_idx < end
        area.unsqueeze_(dim = 2)
        # Sum of exponentials within the section (assumed definition of blocky_z)
        blocky_z = (area * exp_logits).sum(dim = 1, keepdim = True)
        # Broadcast the section sum back to the positions of that section
        blocky_z = area * blocky_z
        base = base + blocky_z
    if map_inf_to is not None:
        good_base = base > 0
        ones = torch.ones_like(base)
        base = torch.where(good_base, base, ones)
        exp_logits = torch.where(good_base, exp_logits, ones * map_inf_to)
    return exp_logits / base
This implementation takes roughly n_blocks times the time and memory of a plain softmax, even though the sections could be processed in parallel.
If there is no off-the-shelf function, should I write a CUDA/C++ library? Any help would be appreciated.
As a further generalization, I would like to support non-contiguous sections:
sections = torch.tensor([[ 0, 0, 0, -1, 2, 3, 2, 3, 0, 3],
                         [-1, 0, 0, 1, 2, 1, 2, 1, -1, 1]])
# [[000_232303]
#  [_0012121_1]]
Thank you for reading:)
I realize that scatter_add and gather perfectly solve the problem.
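For readers who want to see what that could look like, here is a minimal sketch, not the poster's actual code. It assumes the generalized `sections` format from the question (one integer section id per position, -1 for excluded positions) instead of the `splits` boundaries, so the signature below is hypothetical. scatter_add accumulates the exponentials per section, and gather broadcasts each section's sum back to its positions.

```python
import torch

def blocky_softmax(logits, sections):
    """Softmax along dim 1, normalised separately within each section.

    logits:   float tensor of shape (batch, length, depth)
    sections: int64 tensor of shape (batch, length); -1 marks excluded positions
    (Hypothetical signature; it differs from the loop version above.)
    """
    batch, _, depth = logits.shape
    valid = sections >= 0
    n_sections = int(sections.max()) + 1
    # Send excluded positions to one extra bucket so every index is in range.
    ids = torch.where(valid, sections, torch.full_like(sections, n_sections))
    ids = ids.unsqueeze(-1).expand_as(logits)
    # Subtract the max over dim 1 for stability. All sections in a given
    # (batch, depth) column share the same shift, so the ratios are unchanged.
    # A section far below that max could still underflow to 0 / 0.
    exp = (logits - logits.max(dim=1, keepdim=True).values).exp()
    # scatter_add: per-section sums, shape (batch, n_sections + 1, depth)
    sums = exp.new_zeros(batch, n_sections + 1, depth).scatter_add(1, ids, exp)
    # gather: look up the sum of the section each position belongs to
    denom = sums.gather(1, ids)
    # Excluded positions are zeroed out
    return exp / denom * valid.unsqueeze(-1)

sections = torch.tensor([[ 0, 0, 0, -1, 2, 3, 2, 3, 0, 3],
                         [-1, 0, 0, 1, 2, 1, 2, 1, -1, 1]])
logits = torch.rand(2, 10, 100)
result = blocky_softmax(logits, sections)
# Section 0 of the first row should sum to approximately 1 in every slice
print(result[0, sections[0] == 0].sum(dim=0)[:3])
```

There is no Python loop over sections, and the only extra memory is the (batch, n_sections + 1, depth) buffer of sums.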

How to do the following entry gathering?

I want out[b,i,j,c]:=params[indices[b,i,j,c],b,i,j,c]. I am aware of tf.gather and tf.gather_nd but not sure how to achieve this.
You can do that like this:
import tensorflow as tf
# 5D or more tensor
params = tf.placeholder(tf.float32, [2, 3, 4, 5, 6])
# 4D tensor; its shape must match the trailing dimensions of params
indices = tf.placeholder(tf.int32, [3, 4, 5, 6])
# We assume the number of dimensions of indices is statically known
# Otherwise you would need to use tf.while_loop
ndims = indices.shape.ndims
# Get shape of indices
s = tf.shape(indices, out_type=indices.dtype)
# Make grid of additional indices
ranges = [tf.range(s[i]) for i in range(ndims)]
grid = tf.meshgrid(*ranges, indexing='ij')
# Put grid together with indices
indices_all = tf.stack([indices] + grid, axis=-1)
# Gather result
out = tf.gather_nd(params, indices_all)
print(out)
# Tensor("GatherNd:0", shape=(3, 4, 5, 6), dtype=float32)
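To check the grid-plus-gather_nd idea numerically, here is a small sketch that assumes TensorFlow 2.x in eager mode (the answer above uses 1.x placeholders) and compares the result with NumPy's np.take_along_axis. The helper name gather_leading and the random test data are made up for this illustration.

```python
import numpy as np
import tensorflow as tf

def gather_leading(params, indices):
    """out[b, i, j, c] = params[indices[b, i, j, c], b, i, j, c] (hypothetical helper)."""
    params = tf.convert_to_tensor(params)
    indices = tf.convert_to_tensor(indices, dtype=tf.int32)
    # One range per dimension of indices; static shape assumed
    ranges = [tf.range(n) for n in indices.shape.as_list()]
    grid = tf.meshgrid(*ranges, indexing="ij")
    # Last axis holds the full coordinate (indices value, b, i, j, c)
    return tf.gather_nd(params, tf.stack([indices] + grid, axis=-1))

rng = np.random.default_rng(0)
params = rng.normal(size=(2, 3, 4, 5, 6)).astype(np.float32)
indices = rng.integers(0, 2, size=(3, 4, 5, 6)).astype(np.int32)

out = gather_leading(params, indices)
# NumPy reference: pick along axis 0 using the same index array
expected = np.take_along_axis(params, indices[None], axis=0)[0]
print(out.shape, np.array_equal(out.numpy(), expected))
```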

How to use tf.gather in batch?

I have a tensor A of shape 10x1000 and an index tensor B of the same shape. B holds values in the range 0-999 and is used to gather values from A (B[0,:] gathers from A[0,:], B[1,:] from A[1,:], etc.).
However, if I use tf.gather(A, B) I get an array of shape (10, 1000, 1000) when I'm expecting a 10x1000 tensor back. Any ideas how I could fix this?
EDIT
Let's say A= [[1, 2, 3],[4,5,6]] and B = [[0, 1, 1],[2,1,0]] What I want is to be able to sample A using the corresponding B. This should result in C = [[1, 2, 2],[6,5,4]].
Dimensions of tensors are known in advance.
First we 'unstack' both the parameters and indices (A and B respectively) along the first dimension. Then we apply tf.gather() such that rows of A correspond to the rows of B. Finally, we stack together the result.
import tensorflow as tf
import numpy as np
def custom_gather(a, b):
    unstacked_a = tf.unstack(a, axis=0)
    unstacked_b = tf.unstack(b, axis=0)
    gathered = [tf.gather(x, y) for x, y in zip(unstacked_a, unstacked_b)]
    return tf.stack(gathered, axis=0)
a = tf.convert_to_tensor(np.array([[1, 2, 3], [4, 5, 6]]), tf.float32)
b = tf.convert_to_tensor(np.array([[0, 1, 1], [2, 1, 0]]), dtype=tf.int32)
gathered = custom_gather(a, b)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(gathered))
# [[1. 2. 2.]
# [6. 5. 4.]]
For your initial case with shapes 10x1000 we get:
a = tf.convert_to_tensor(np.random.normal(size=(10, 1000)), tf.float32)
# high is exclusive, so this draws indices in the range 0-999
b = tf.convert_to_tensor(np.random.randint(low=0, high=1000, size=(10, 1000)), dtype=tf.int32)
gathered = custom_gather(a, b)
print(gathered.get_shape().as_list()) # [10, 1000]
Update
The first dimension is unknown (i.e. None)
The previous solution works only if the first dimension is known in advance. If the dimension is unknown we solve it as follows:
We stack together two tensors such that the rows of both tensors are stacked together:
# A = [[1, 2, 3], [4, 5, 6]]          [[[1 2 3]
#                              --->     [0 1 1]]
#                                      [[4 5 6]
# B = [[0, 1, 1], [2, 1, 0]]            [2 1 0]]]
We iterate over the elements of this stacked tensor (which consists of stacked together rows of A and B) and using tf.map_fn() function we apply tf.gather().
We stack back the elements we get with tf.stack()
import tensorflow as tf
import numpy as np
def custom_gather_v2(a, b):
    def apply_gather(x):
        return tf.gather(x[0], tf.cast(x[1], tf.int32))
    a = tf.cast(a, dtype=tf.float32)
    b = tf.cast(b, dtype=tf.float32)
    stacked = tf.stack([a, b], axis=1)
    gathered = tf.map_fn(apply_gather, stacked)
    return tf.stack(gathered, axis=0)
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
b = np.array([[0, 1, 1], [2, 1, 0]], dtype=np.int32)
x = tf.placeholder(tf.float32, shape=(None, 3))
y = tf.placeholder(tf.int32, shape=(None, 3))
gathered = custom_gather_v2(x, y)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(gathered, feed_dict={x:a, y:b}))
# [[1. 2. 2.]
# [6. 5. 4.]]
Use tf.gather with batch_dims=-1:
import numpy as np
import tensorflow as tf
rois = np.array([[1, 2, 3],[3, 2, 1]])
ind = np.array([[0, 2, 1, 1, 2, 0, 0, 1, 1, 2],
[0, 1, 2, 0, 2, 0, 1, 2, 2, 2]])
tf.gather(rois, ind, batch_dims=-1)
# output:
# <tf.Tensor: shape=(2, 10), dtype=int64, numpy=
# array([[1, 3, 2, 2, 3, 1, 1, 2, 2, 3],
# [3, 2, 1, 3, 1, 3, 2, 1, 1, 1]])>
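Applied to the A and B from the question's EDIT, this looks as follows. The sketch assumes TensorFlow 2.x in eager mode. For a rank-2 index tensor, batch_dims=-1 is the same as batch_dims=1. An explicit gather_nd variant is included for comparison.

```python
import tensorflow as tf

A = tf.constant([[1, 2, 3], [4, 5, 6]])
B = tf.constant([[0, 1, 1], [2, 1, 0]])

# batch_dims=1 pairs row k of B with row k of A: C[k, j] = A[k, B[k, j]]
C = tf.gather(A, B, batch_dims=1)
print(C.numpy().tolist())     # [[1, 2, 2], [6, 5, 4]]

# The same result via gather_nd with explicit (row, column) pairs
rows = tf.broadcast_to(tf.range(tf.shape(B)[0])[:, None], tf.shape(B))
C_nd = tf.gather_nd(A, tf.stack([rows, B], axis=-1))
print(C_nd.numpy().tolist())  # [[1, 2, 2], [6, 5, 4]]
```

Both variants work when the first dimension is only known at run time, since neither unstacks the tensors in Python.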

Tensorflow transformations for frame sliding (numpy stride_tricks)

My input in a tensorflow graph comes as a vector which contains multiple overlapping windows. How can I create this array using only tensorflow operations?
input = [1,2,3,4,5,6,7,8]
shift = 2
window_width = 4
count = (len(input) - window_width) // shift + 1  # = 3
output = [[1,2,3,4],
          [3,4,5,6],
          [5,6,7,8]]
In numpy I would use stride_tricks, but something similar isn't available in tensorflow. How should I approach this?
TensorFlow doesn't have stride_tricks. The following would work for your particular use case.
>>> b = tf.concat([tf.reshape(input[i:i+window_width], [1, window_width]) for i in range(0, len(input) - window_width + 1, shift)], 0)
>>> with tf.Session(""): b.eval()
...
array([[1, 2, 3, 4],
[3, 4, 5, 6],
[5, 6, 7, 8]], dtype=int32)
If your input is large, you may also want to look at slice_input_producer.
In case somebody still needs this, there is a function tf.signal.frame().
For the example given in the question, the code would be:
output = tf.signal.frame(input, window_width, shift)
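A self-contained version of that call, assuming TensorFlow 2.x in eager mode and the values from the question (the variable name signal is used here instead of input to avoid shadowing the Python builtin):

```python
import tensorflow as tf

signal = tf.range(1, 9)   # [1, 2, ..., 8]
window_width = 4
shift = 2

# frame_length is the window size, frame_step is the hop between windows
frames = tf.signal.frame(signal, frame_length=window_width, frame_step=shift)
print(frames.numpy().tolist())  # [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]

# Number of full frames when pad_end is False (the default)
count = (int(signal.shape[0]) - window_width) // shift + 1
print(count)  # 3
```

With pad_end=True the last partial window is kept and padded with pad_value instead of being dropped.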
You can use tf.map_fn() to accomplish this:
input = [1,2,3,4,5,6,7,8]
shift = 2
window_width = 4
limit = len(input) - window_width + 1
input_tensor = tf.placeholder(tf.int32, shape=(8,))
output_tensor = tf.map_fn(lambda i: input_tensor[i:i+window_width], elems=tf.range(start=0, limit=limit, delta=shift))
with tf.Session() as sess:
    answer_test = sess.run(output_tensor, feed_dict = {input_tensor:input})
    print(answer_test)
# [[1 2 3 4]
#  [3 4 5 6]
#  [5 6 7 8]]
