I have a PyTree params (in my case a nested dictionary) containing the parameters of a neural network. My goal is to compute the diagonal entries of the Hessian of a loss function with respect to the parameters and store them in a PyTree with the same structure as the parameters.
When I call jax.hessian(loss_fn)(params, data), I get, as expected, an even more deeply nested dictionary containing the full Hessian.
How can I transform this dictionary to get the desired PyTree with diagonal entries?
To be more concrete: let's say I have only one layer in my network and params is given by
params:
  'linear':
    'w': DeviceArray() of shape [5 x 1]
    'b': DeviceArray() of shape [1]
The returned Hessian has the keys and shape given by
hessian:
  'linear':
    'b':
      'linear':
        'b': (1, 1),
        'w': (1, 5, 1),
    'w':
      'linear':
        'b': (5, 1, 1),
        'w': (5, 1, 5, 1)
As far as I understand it, I need the entries
jnp.diag(hessian['linear']['b']['linear']['b'])
as the diagonal hessian for the bias and
jnp.diag(jnp.squeeze(hessian['linear']['w']['linear']['w']))
as the diagonal hessian for the weights. (However, the squeeze may only work for 1 dim outputs...)
How can I automate this transformation in order to work for more complex models with multiple layers?
I know that this does not scale to huge networks; I only need it for testing optimizers.
I ran into the exact same problem. Unfortunately, working with Pytrees in Jax can be awkward. I was also looking at a way to construct the diagonal Hessian entry-for-entry, since that could yield a practical method.
I now have the following:
from typing import Any, Optional, Sequence

import jax
import jax.numpy as jnp


def ravelled_diagonal_indices(dims: Sequence[int]) -> jnp.ndarray:
    # Get the indices for the diagonal elements of a flattened square matrix.
    return (dims[0] + 1) * jnp.arange(dims[0])


# Alias to reduce clutter.
_diag_idx = ravelled_diagonal_indices


def tree_matrix_diagonal(tree: Any, reference: Optional[Any] = None) -> Any:
    """Utility function for extracting the diagonal of a Pytree of jax.numpy.array objects.

    The Pytree is assumed to be square in its children and in its array objects.

    Parameters
    ----------
    tree : Any
        Pytree of jax.numpy.array objects for which the number of Pytree leaves and
        the sizes of each constituent array is square.
    reference : Any, default = None
        The intended structure for the diagonal of `tree`. For example, this can be
        the Pytree with which `tree` could have been created through e.g., an outer-product
        or the Hessian of a function.

    Returns
    -------
    diag : Any
        Pytree containing the flattened diagonals of `tree` if no reference was provided.
        Otherwise, the diagonal elements are shaped according to the structure of `reference`.
    """
    flat = jax.tree_util.tree_leaves(tree)
    # The leaves form an h x h grid of blocks; keep only the diagonal blocks.
    h = jax.numpy.sqrt(len(flat)).astype(int)
    _idx = _diag_idx((h,))
    block_diag = [flat[i] for i in _idx]
    # Within each diagonal block, pick the diagonal of the flattened square matrix.
    flat_diagonal = lambda w: w.ravel()[_diag_idx((jax.numpy.sqrt(w.size).astype(int),))]
    diag = jax.tree_util.tree_map(flat_diagonal, block_diag)
    if reference is not None:
        # Reshape the diagonal Pytree to reference Pytree structure and shape
        diag_tree = jax.tree_util.tree_unflatten(jax.tree_util.tree_structure(reference), diag)
        # jax.tree_multimap was removed from recent JAX releases; tree_map accepts several trees.
        diag = jax.tree_util.tree_map(lambda a, b: a.reshape(jax.numpy.shape(b)), diag_tree, reference)
    return diag
When I try this out on the Hessian of a very simple MLP:
params
>> {'dense/~/affine': {'weights': DeviceArray([[ 1. , 1. ],
[ 0.546326 , -0.77997607]], dtype=float32)},
'dense_1/~/affine': {'weights': DeviceArray([[ 1. ],
[-0.5155028],
[ 0.9487318]], dtype=float32)}}
hessian
>> {'dense/~/affine': {'weights': {'dense/~/affine': {'weights': DeviceArray([[[[[-0.02324889, 0.04278728],
[ 0.00814307, -0.01498652]],
[[ 0.04278728, -0.07874574],
[-0.01498652, 0.0275812 ]]],
[[[ 0.00814307, -0.01498652],
[-0.00285216, 0.00524912]],
[[-0.01498652, 0.0275812 ],
[ 0.00524912, -0.00966049]]]]], dtype=float32)},
'dense_1/~/affine': {'weights': DeviceArray([[[[[ 0.04509945],
[ 0.15897979],
[ 0.05742025]],
[[-0.08300105],
[-0.06711845],
[ 0.01683405]]],
[[[-0.01579637],
[-0.05568369],
[-0.02011181]],
[[ 0.02907166],
[ 0.02350867],
[-0.00589623]]]]], dtype=float32)}}},
'dense_1/~/affine': {'weights': {'dense/~/affine': {'weights': DeviceArray([[[[[ 0.04509945, -0.08300105],
[-0.01579637, 0.02907165]]],
[[[ 0.15897979, -0.06711845],
[-0.0556837 , 0.02350867]]],
[[[ 0.05742024, 0.01683406],
[-0.02011181, -0.00589624]]]]], dtype=float32)},
'dense_1/~/affine': {'weights': DeviceArray([[[[[-0.08748633],
[-0.07074545],
[-0.11138687]]],
[[[-0.07074545],
[-0.05720801],
[-0.09007253]]],
[[[-0.11138687],
[-0.09007251],
[-0.14181684]]]]], dtype=float32)}}}}
Then, the function returns:
tree_matrix_diagonal(hessian, reference=params)
>> {'dense/~/affine': {'weights': DeviceArray([[-0.02324889, -0.07874574],
[-0.00285216, -0.00966049]], dtype=float32)},
'dense_1/~/affine': {'weights': DeviceArray([[-0.08748633],
[-0.05720801],
[-0.14181684]], dtype=float32)}}
Upon visual inspection, you can see that the returned elements are indeed the diagonal elements of hessian cast to the canonical structure of params.
Funnily enough, for the Gauss-Newton approximation to the Hessian the procedure is much simpler. Simply take the element-wise square of the Jacobians :).
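For comparison, here is a different route from the function above, as a minimal sketch only: flatten the parameters into one vector with jax.flatten_util.ravel_pytree, take the Hessian of the loss with respect to that vector, and unravel its diagonal back into the structure of params. The helper name hessian_diagonal and the toy loss_fn are made up for illustration. It still builds the full n x n Hessian, so it is only suitable for small test models.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree


def hessian_diagonal(loss_fn, params, *args):
    """Hypothetical helper: Hessian diagonal shaped like `params`."""
    flat_params, unravel = ravel_pytree(params)

    def flat_loss(flat):
        # Rebuild the PyTree so the original loss can be reused unchanged.
        return loss_fn(unravel(flat), *args)

    full_hessian = jax.hessian(flat_loss)(flat_params)  # shape (n, n)
    return unravel(jnp.diag(full_hessian))


# Toy single-layer example (illustrative only).
def loss_fn(params, x):
    pred = x @ params["linear"]["w"] + params["linear"]["b"]
    return 0.5 * jnp.sum(pred ** 2)


params = {"linear": {"w": jnp.ones((5, 1)), "b": jnp.zeros((1,))}}
x = jnp.arange(10.0).reshape(2, 5)
diag = hessian_diagonal(loss_fn, params, x)
print(jax.tree_util.tree_map(jnp.shape, diag))
```

Because everything goes through a single flat vector, no squeezing or per-leaf index arithmetic is needed, and the same code should work for models with several layers.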
I have two tensors a and b which are of different dimensions. a is of shape [100,100] and b is of the shape [100,3,10]. I want to concatenate these two tensors.
For example:
a = torch.randn(100,100)
tensor([[ 1.3236, 2.4250, 1.1547, ..., -0.7024, 1.0758, 0.2841],
[ 1.6699, -1.2751, -0.0120, ..., -0.2290, 0.9522, -0.4066],
[-0.3429, -0.5260, -0.7748, ..., -0.5235, -1.8952, 1.2944],
...,
[-1.3465, 1.2641, 1.6785, ..., 0.5144, 1.7024, -1.0046],
[-0.7652, -1.2940, -0.6964, ..., 0.4661, -0.3998, -1.2428],
[-0.4720, -1.0981, -2.3715, ..., 1.6423, 0.0560, 1.0676]])
The tensor b is as follows:
tensor([[[ 0.4747, -1.9529, -0.0448, ..., -0.9694, 0.8009, -0.0610],
[ 0.5160, 0.0810, 0.1037, ..., -1.7519, -0.3439, 1.2651],
[-0.5975, -0.2000, -1.6451, ..., 1.3082, -0.4023, -0.3105]],
...,
[[ 0.4747, -1.9529, -0.0448, ..., -0.9694, 0.8009, -0.0610],
[ 0.1939, 1.0365, -0.0927, ..., -2.4948, -0.2278, -0.2390],
[-0.5975, -0.2000, -1.6451, ..., 1.3082, -0.4023, -0.3105]]],
dtype=torch.float64, grad_fn=<CopyBackwards>)
I want to concatenate such that the first row in tensor a, of size [100], is concatenated with the first row in tensor b, which is of size [3,10]. This should apply to all rows in both tensors. In other words, for the first row of a and b I want a row of 130 values as follows, so that the full output has size [100,130]:
[ 1.3236, 2.4250, 1.1547, ..., -0.7024, 1.0758, 0.2841, 0.4747, -1.9529, -0.0448, ..., -0.9694, 0.8009, -0.0610, 0.5160, 0.0810, 0.1037, ..., -1.7519, -0.3439, 1.2651, -0.5975, -0.2000, -1.6451, ..., 1.3082, -0.4023, -0.3105]
In order to do this, I applied unsqueeze to tensor a so that the two tensors have the same number of dimensions:
a = a.unsqueeze(1)
When I perform torch.cat([a, b]), I still get an error. Can somebody help me solve this?
Thanks in advance.
Reshape b to two dimensions, [100, 30], and then concatenate it with the original (not unsqueezed) a using torch.cat along dimension 1:
torch.cat((a, b.reshape(100, -1)), dim=1)
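To see why the shapes work out, here is the same reshape-then-concatenate idea sketched with NumPy arrays instead of tensors (an illustrative stand-in; the array contents are made up). Each [3,10] block of b is flattened row by row into 30 values and appended to the matching row of a.

```python
import numpy as np

a = np.zeros((100, 100))
b = np.arange(100 * 3 * 10, dtype=float).reshape(100, 3, 10)

# Collapse the trailing (3, 10) dimensions into 30 columns, then join along axis 1.
b_flat = b.reshape(100, -1)
out = np.concatenate((a, b_flat), axis=1)

print(b_flat.shape, out.shape)
```

The concatenation only requires that all dimensions except the one being joined agree, which is why a stays at [100,100] and is not unsqueezed.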
For Python 3.8 and TensorFlow 2.5, I have a 3-D tensor of shape (3, 3, 3) where the goal is to compute the L2-norm for each of the three (3, 3) square matrices. The code that I came up with is:
a = tf.random.normal(shape = (3, 3, 3))
a.shape
# TensorShape([3, 3, 3])
a.numpy()
'''
array([[[-0.30071023, 0.9958398 , -0.77897555],
[-1.4251901 , 0.8463568 , -0.6138699 ],
[ 0.23176959, -2.1303613 , 0.01905925]],
[[-1.0487134 , -0.36724553, -1.0881581 ],
[-0.12025198, 0.20973174, -2.1444907 ],
[ 1.4264063 , -1.5857363 , 0.31582597]],
[[ 0.8316077 , -0.7645084 , 1.5271858 ],
[-0.95836663, -1.868056 , -0.04956183],
[-0.16384012, -0.18928945, 1.04647 ]]], dtype=float32)
'''
I am using axis = 2 since the 3rd axis should contain three 3x3 square matrices. The output I get is:
tf.math.reduce_euclidean_norm(input_tensor = a, axis = 2).numpy()
'''
array([[1.299587 , 1.7675754, 2.1430166],
[1.5552354, 2.158075 , 2.15614 ],
[1.8995634, 2.1001325, 1.0759989]], dtype=float32)
'''
How are these values computed? The formula for computing L2-norm is this. What am I missing?
Also, I was expecting three L2-norm values, one for each of the three (3, 3) matrices. The code I have to achieve this is:
tf.math.reduce_euclidean_norm(a[0]).numpy()
# 3.0668826
tf.math.reduce_euclidean_norm(a[1]).numpy()
# 3.4241767
tf.math.reduce_euclidean_norm(a[2]).numpy()
# 3.0293021
Is there a better way to get this without having to explicitly refer to each index of tensor 'a'?
Thanks!
The formula you linked for computing the L2 norm looks correct. What you have is basically this:
np.sqrt(np.sum((a[0]**2)))
# 3.0668826
np.sqrt(np.sum((a[1]**2)))
# 3.4241767
np.sqrt(np.sum((a[2]**2)))
# 3.0293021
This can be vectorized by the following:
np.sqrt(np.sum(a**2, axis=(1,2)))
Output:
array([3.0668826, 3.4241767, 3.0293021], dtype=float32)
Which is effectively the same as using np.linalg.norm (or tf.math.reduce_euclidean_norm if you want to use TensorFlow):
np.linalg.norm(a, ord=None, axis=(1,2))
Output:
array([3.0668826, 3.4241767, 3.0293021], dtype=float32)
With a 2-tuple axis, the default keyword ord=None computes the Frobenius norm, which is the L2 norm of the flattened matrix, per the documentation. The axis keyword specifies which dimensions we want to reduce over, which should be clear from the first code snippet.
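As for how the axis = 2 values in the question are computed: reducing over the last axis alone gives one norm per length-3 row, hence a (3, 3) result. For example, the first entry 1.299587 is sqrt(0.30071023² + 0.9958398² + 0.77897555²). A small NumPy sketch with random data (not the values from the question) shows the relationship between the two reductions:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 3, 3)).astype(np.float32)

# Reducing over the last axis only: one norm per row of each matrix, shape (3, 3).
row_norms = np.sqrt(np.sum(a ** 2, axis=2))

# Reducing over both matrix axes: one norm per (3, 3) matrix, shape (3,).
matrix_norms = np.sqrt(np.sum(a ** 2, axis=(1, 2)))

# The per-matrix norm is the norm of that matrix's row norms.
recombined = np.sqrt(np.sum(row_norms ** 2, axis=1))

print(row_norms.shape, matrix_norms.shape)
```

In TensorFlow, passing axis=[1, 2] to tf.math.reduce_euclidean_norm should give the same three per-matrix values.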
Given a list of arrays in this format:
[array([[63371.29484043],
[65000. ],
[51114.1118643 ],
[39000. ],
[61549.2893635 ],
[58204.43242583]]), array([[28750. ],
[19166.90102574],
[19667.19108884],
[17250. ]]), array([[32188.01786071],
[33625. ],
[23988.53674308],
[29354.92883394],
[31657.26571235],
[20175. ]])]
I would like to print it as a list without square brackets in it e.g.
a = [18758.98675732, 23418.72996313 ... 20134.77503711]
I can apply .tolist but not strip to get rid of the inner brackets.
How do I do this?
Thanks!
Try to understand the nature of the object before worrying too much about display details. Display follows from the list's structure. Pay special attention to len (for a list) and shape (for an array).
In [119]: alist=[np.array([[63371.29484043],
...: [65000. ],
...: [51114.1118643 ],
...: [39000. ],
...: [61549.2893635 ],
...: [58204.43242583]]), np.array([[28750. ],
...: [19166.90102574],
...: [19667.19108884],
...: [17250. ]]), np.array([[32188.01786071],
...: [33625. ],
...: [23988.53674308],
...: [29354.92883394],
...: [31657.26571235],
...: [20175. ]])]
You have a list of arrays that differ in shape:
In [120]: len(alist)
Out[120]: 3
In [121]: [x.shape for x in alist]
Out[121]: [(6, 1), (4, 1), (6, 1)]
You could flatten each array, producing arrays of shape (6,), (4,) and (6,):
In [122]: [x.ravel() for x in alist]
Out[122]:
[array([63371.29484043, 65000. , 51114.1118643 , 39000. ,
61549.2893635 , 58204.43242583]),
array([28750. , 19166.90102574, 19667.19108884, 17250. ]),
array([32188.01786071, 33625. , 23988.53674308, 29354.92883394,
31657.26571235, 20175. ])]
hstack can join them into one array. Use .tolist() if you want a list as the final result:
In [123]: np.hstack(_)
Out[123]:
array([63371.29484043, 65000. , 51114.1118643 , 39000. ,
61549.2893635 , 58204.43242583, 28750. , 19166.90102574,
19667.19108884, 17250. , 32188.01786071, 33625. ,
23988.53674308, 29354.92883394, 31657.26571235, 20175. ])
Since the arrays differ only in the first dimension, we could also use:
In [127]: np.vstack(alist).shape
Out[127]: (16, 1)
In [128]: np.vstack(alist).ravel()
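Putting the steps above together, a compact sketch might look like this. The values are short made-up stand-ins for the arrays in the question. Joining the numbers into a string is one way to print them without any brackets at all.

```python
import numpy as np

# Made-up column arrays of different lengths, shaped (n, 1) like the ones in the question.
alist = [np.array([[1.5], [2.0], [3.25]]), np.array([[4.0], [5.5]])]

# Stack along the first axis, flatten to 1-D, then convert to a plain Python list.
flat = np.concatenate(alist).ravel().tolist()

print(flat)
print(", ".join(str(v) for v in flat))
```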
You can use map to get the first number of each sublist and flatten the array, then turn it into a list with list:
list(map(lambda l : l[0], a))
Output:
[18758.98675732, 23418.72996313, 23625.0, 14175.0, 21015.48300191, 20134.77503711]
You can also use numpy's multidimensional array indexing to extract only the first number of each sublist:
list(a[:, 0])
NumPy has a built-in ravel function for flattening:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel.html
list(a[0].ravel())
I have a net with a length 4 input vector, length 2 output vector. I am trying to predict multiple inputs simultaneously. If I just want to predict one, I would do the following and it works:
inputs = numpy.array([[1, 2, 3, 4]])
self.model.predict(inputs)
# prediction = [ [1,2] ]
However, when I try to pass in multiple inputs I get ValueError: Error when checking input: expected dense_1_input to have shape (4,) but got array with shape (1,)
inputs = numpy.array([
    [1, 2, 3, 4],
    [1, 2, 3, 4],
])
# OR
inputs = numpy.array([
    [[1, 2, 3, 4]],
    [[1, 2, 3, 4]],
])
self.model.predict(inputs)
#ERR
What am I doing wrong?
Edit:
Code =
model = Sequential()
model.add(Dense(24, input_dim=4, activation='relu'))
model.add(Dense(24, activation='relu'))
model.add(Dense(4, activation='linear'))
model.compile(loss='mse',
optimizer=Adam(lr=self.learning_rate))
print(batch_arr[:,3][0])
predictions = self.model.predict(batch_arr[:,3][0])
print(predictions)
print(batch_arr[:,3])
predictions = model.predict(batch_arr[:,3])
Output =
[[-0.00441936 -0.20398824 -0.08134908 0.09739554]]
[[ 0.01860509 -0.01136071]]
[array([[-0.00441936, -0.20398824, -0.08134908, 0.09739554]])
array([[-0.00517939, 0.38975933, -0.11951023, -0.9718224 ]])
array([[0.00272119, 0.0025476 , 0.002645 , 0.03973542]])
array([[-0.00421809, -0.01006362, -0.07795483, -0.16971247]])
array([[-0.00904593, 0.19332681, -0.10655871, -0.64757587]])
array([[ 0.00654432, 0.00347247, -0.15332555, -0.47302148]])
array([[-0.01921821, -0.17354519, -0.20207744, -0.58569029]])
array([[ 0.00661377, 0.20038962, -0.16278598, -0.80983334]])
array([[-0.00348096, 0.18171964, -0.07072813, -0.38913168]])
array([[-0.01268919, -0.00548544, -0.08286095, -0.27108632]])
array([[ 0.01077598, -0.19254374, -0.004982 , 0.33175341]])
array([[-4.37101750e-04, -5.68196965e-01, -1.99532537e-01,
1.10581883e-01]])
array([[ 0.00657382, -0.19263146, -0.00402872, 0.33368607]])
array([[ 0.00677398, 0.19760551, -0.00076944, -0.25153403]])
array([[ 0.00261579, 0.19642629, -0.13894668, -0.71894379]])
array([[-0.0221003 , 0.37477368, -0.03765055, -0.63564477]])
array([[-0.0110009 , 0.37599703, -0.0574645 , -0.66318148]])
array([[ 0.00277214, 0.19763152, 0.00343971, -0.25211181]])
array([[-9.31810654e-05, -2.06245307e-01, -8.09019674e-02,
1.47356796e-01]])
array([[ 0.00709025, -0.37636771, -0.19725323, -0.11396513]])
array([[ 0.00015344, -0.01233088, -0.07851076, -0.11956039]])
array([[ 0.01077811, -0.18439307, -0.19043179, -0.34107231]])
array([[-0.01460483, 0.18019651, -0.05036345, -0.35505252]])
array([[-0.0127989 , 0.19071515, -0.08828268, -0.58871071]])
array([[ 0.01072609, 0.00249456, -0.00580012, 0.0409061 ]])
array([[ 0.01062156, 0.00782762, -0.17898265, -0.57245695]])
array([[-0.01180104, -0.37085843, -0.1973209 , -0.23782701]])
array([[-0.00849912, -0.00780031, -0.07940117, -0.21980343]])
array([[ 0.00672477, 0.00246062, -0.00160252, 0.04165408]])
array([[-0.02268911, -0.36534914, -0.21379125, -0.36284594]])
array([[-0.00865513, -0.20170279, -0.08379724, 0.0468145 ]])
array([[-0.0256848 , 0.17922475, -0.03098346, -0.33335449]])]
#ERR
Edit: When I print out the shape of batch_arr[:,3] I get (32,), not (32,4) as I expected. So I'm guessing the NumPy array does not know the shape of its inner arrays. Is there an easy way to fix that? It might be the root of the problem.
The issue was the way that I had created my numpy array. I created it with indices of variable size, and thus it didn't know it was shaped (32,4), only that it was (32,). Reformulating the logic to ensure that the array is always a set width from the beginning allowed the array to be a (32,4), which allowed the prediction to work.
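For anyone hitting the same symptom, here is a minimal sketch of what was going on, using made-up data: an object array of shape (32,) whose elements are each a (1, 4) array, and one possible way to turn it into a regular (32, 4) array with np.vstack. The names rows, batch_column and fixed are hypothetical.

```python
import numpy as np

# Made-up stand-ins for the 32 states, each stored as its own (1, 4) array.
rows = [np.array([[0.1 * i, 0.2, 0.3, 0.4]]) for i in range(32)]

# An object array only knows it holds 32 items, not their inner shape.
batch_column = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    batch_column[i] = row
print(batch_column.shape)

# Stacking the inner arrays yields a regular 2-D float array.
fixed = np.vstack(list(batch_column))
print(fixed.shape, fixed.dtype)
```

An array like fixed has the (batch, 4) shape that the input layer in the question expects.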