pytorch optim.SGD with momentum how to check "velocity"?

I have a noob question: in the SGD docs they provide the equation of SGD with momentum, which indicates that apart from the current gradient weight.grad, we also need to save the velocity from the previous step (something like weight.prev_v?). I know an nn.Parameter object has .data and .grad attributes, but does it also save a .prev_v? Do you know how PyTorch handles this? Thanks!
Edit: Basically I'd like to know where PyTorch saves the velocity from the previous step.

Those are stored inside the state attribute of the optimizer. In the case of torch.optim.SGD the momentum values are stored in a dictionary under the 'momentum_buffer' key, as you can see in the source code.
Here is a minimal example:
>>> import torch
>>> import torch.nn as nn
>>> m = nn.Linear(10, 10)
>>> optim = torch.optim.SGD(m.parameters(), lr=1.e-3, momentum=.9)
>>> m(torch.rand(1, 10)).mean().backward()
>>> optim.step()
>>> optim.state
defaultdict(dict, {0: {}, Parameter containing: ...})
>>> list(optim.state.values())[0]
{'momentum_buffer': tensor([...])}
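If you need the buffer for a specific parameter, you can index the state with the parameter itself; a minimal sketch, assuming at least one optim.step() with momentum > 0 has already run (as above):
>>> for p in m.parameters():
...     buf = optim.state[p].get('momentum_buffer')  # None before the first step
...     print(None if buf is None else buf.shape)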

Related

Saving a model that uses tensorflow.lookup.StaticVocabularyTable in .pb format in Tensorflow 2

I am building a model that accepts as input a 2d array of string tokens, then uses a lookup table to get the assigned indices of the input tokens in the vocabulary. The model then uses those indices to compute an embedded representation of the input tokens by fetching the associated token embeddings and adding them together. The combined embedding is then compared against another matrix using a nearest-neighbors lookup, and the indices of the top-k most similar entries are returned.
The model is saved in .pb format and is then used in a container running the TensorFlow Serving image for inference.
At the moment I have something that works just fine in TensorFlow 1.15; however, I am trying to migrate my code to TensorFlow 2.4 and can't find a way to make it work.
Here is a slightly modified version of the code I am working with at the moment in TensorFlow 1.15:
import tensorflow as tf
import numpy as np
graph = tf.get_default_graph()
session = tf.Session()
session.run(tf.global_variables_initializer())
vocabulary = ['one', 'two', 'three', 'four', 'five', 'six']
embedding_dimension = 512
n_tokens = len(vocabulary)
token_embeddings = np.random.random((n_tokens, embedding_dimension))
matrix = np.random.random((100, embedding_dimension))
lookup_table_initializer = tf.lookup.KeyValueTensorInitializer(vocabulary, np.arange(n_tokens))
lookup_table = tf.lookup.StaticVocabularyTable(lookup_table_initializer, num_oov_buckets=1)
token_embeddings_with_oov_token = np.vstack([token_embeddings, np.zeros(embedding_dimension)])
token_embeddings_tensor = tf.convert_to_tensor(token_embeddings_with_oov_token, dtype=tf.float32)
matrix = tf.convert_to_tensor(matrix, dtype=tf.float32)
model_input = tf.placeholder(tf.string, [None, None], name="input")
input_tokens_indices = lookup_table.lookup(model_input)
input_token_indices_one_hot = tf.one_hot(input_tokens_indices, tf.dtypes.cast(lookup_table.size(), dtype=tf.int32))
encoded_text = tf.math.reduce_sum(input_token_indices_one_hot, axis=1, keepdims=True)
embedded_text = tf.linalg.matmul(encoded_text, token_embeddings_tensor)
embedded_text_pooled = tf.math.reduce_sum(embedded_text, axis=1)
embedded_text_normed = tf.divide(embedded_text_pooled, tf.norm(embedded_text_pooled, ord=2))
neighbors = tf.linalg.matmul(embedded_text_normed, matrix, transpose_b=True, name="output")
tf.saved_model.simple_save(
    session,
    "model.pb",
    inputs={"input": model_input},
    outputs={"output": neighbors},
    legacy_init_op=tf.tables_initializer(),
)
The issue I am facing is in converting the above code to TensorFlow 2. First of all, tf.placeholder no longer exists, and I have read suggestions in other posts to replace it with tf.keras.layers.Input((), dtype=tf.dtypes.string). However, I then get an error when I try to carry out the lookup_table.lookup() step, as apparently I cannot pass a symbolic tensor to that function. As a result I am stuck and do not know how to proceed to make my model compatible with TF2; after hours of searching online for solutions I can't seem to find something that works.
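Not a full port of the model above, but for reference, a common TF2 replacement for the placeholder-plus-session pattern is a tf.Module whose call is wrapped in tf.function with an explicit input_signature, so that lookup_table.lookup() runs on a traced tensor rather than a symbolic Keras input. A minimal sketch of that pattern (the class and directory names here are made up):
import numpy as np
import tensorflow as tf

class TokenLookup(tf.Module):
    def __init__(self, vocabulary):
        super().__init__()
        initializer = tf.lookup.KeyValueTensorInitializer(
            vocabulary, np.arange(len(vocabulary), dtype=np.int64))
        self.table = tf.lookup.StaticVocabularyTable(initializer, num_oov_buckets=1)

    @tf.function(input_signature=[tf.TensorSpec([None, None], tf.string, name="input")])
    def __call__(self, tokens):
        # the lookup runs inside the traced function, so no placeholder is needed
        return self.table.lookup(tokens)

module = TokenLookup(['one', 'two', 'three', 'four', 'five', 'six'])
tf.saved_model.save(module, "saved_model_dir")  # the table is tracked and saved with the module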

What is the difference between decision function and score_samples in isolation_forest in SKLearn

I have read the documentation of decision_function and score_samples here, but could not figure out what the difference between these two methods is, or which one I should use for an outlier detection algorithm.
Any help would be appreciated.
See the documentation for the attribute offset_:
Offset used to define the decision function from the raw scores. We have the relation: decision_function = score_samples - offset_. offset_ is defined as follows. When the contamination parameter is set to “auto”, the offset is equal to -0.5 as the scores of inliers are close to 0 and the scores of outliers are close to -1. When a contamination parameter different than “auto” is provided, the offset is defined in such a way we obtain the expected number of outliers (samples with decision function < 0) in training.
The User Guide references the paper Isolation Forest by Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou.
I did not read the paper, but I think you can use either output to detect outliers. The documentation says score_samples is the opposite of decision_function, so I thought they would be inversely related, but both outputs seem to have the exact same relationship with the target. The only difference is that they are on different ranges. In fact, they even have the same variance.
To see this, I fit the model to the breast cancer dataset available in sklearn and visualized the average of the target variable grouped by the deciles of each output. As you can see, they both have the exact same relationship.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import IsolationForest
# Load data
X = load_breast_cancer()['data']
y = load_breast_cancer()['target']
# Fit model
clf = IsolationForest()
clf.fit(X, y)
# Split the outputs into deciles to see their relationship with target
t = pd.DataFrame({'target': y,
                  'decision_function': clf.decision_function(X),
                  'score_samples': clf.score_samples(X)})
t['bins_decision_function'] = pd.qcut(t['decision_function'], 10)
t['bins_score_samples'] = pd.qcut(t['score_samples'], 10)
# Visualize relationship
plt.plot(t.groupby('bins_decision_function')['target'].mean().values, lw=3, label='Decision Function')
plt.plot(t.groupby('bins_score_samples')['target'].mean().values, ls='--', label='Score Samples')
plt.legend()
plt.show()
Like I said, they even have the same variance:
t[['decision_function','score_samples']].var()
> decision_function 0.003039
> score_samples 0.003039
> dtype: float64
In conclusion, you can use them interchangeably as they both share the same relationship with the target.
As was previously stated in @Ben Reiniger's answer,
decision_function = score_samples - offset_. For further clarification...
If contamination = 'auto', then offset_ is fixed to -0.5.
If contamination is set to something other than 'auto', then offset_ is no longer fixed.
This can be seen under the fit function in the source code:
def fit(self, X, y=None, sample_weight=None):
    ...
    if self.contamination == "auto":
        # 0.5 plays a special role as described in the original paper.
        # we take the opposite as we consider the opposite of their score.
        self.offset_ = -0.5
        return self
    # else, define offset_ wrt contamination parameter
    self.offset_ = np.percentile(self.score_samples(X),
                                 100. * self.contamination)
Thus, it's important to take note of what contamination is set to, as well as which anomaly scores you are using. score_samples returns what can be thought of as the "raw" scores, as it is unaffected by offset_, whereas decision_function is dependent on offset_.
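As a quick sanity check, the relation from the docs can be verified directly; this is just a small sketch reusing the breast cancer dataset from the earlier answer:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import IsolationForest

X, _ = load_breast_cancer(return_X_y=True)
clf = IsolationForest(random_state=0).fit(X)
# decision_function and score_samples differ only by the constant offset_
assert np.allclose(clf.decision_function(X), clf.score_samples(X) - clf.offset_)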

Treat a tuple/list of Tensors as a single Tensor

I'm using PyTorch for some robotics reinforcement learning tasks. I'd like to use both images and information about the state as observations for this task. The implementation I'm using does not directly support this, so I'm making some amendments. Expected observations are either the state, as a 1-dimensional Tensor, or images, as a 3-dimensional Tensor (channels, width, height). In my task I would like the observation to be a tuple of Tensors.
In many places in my codebase, the observation is of course expected to be a single Tensor, not a tuple of Tensors. Is there an easy way to treat a tuple of Tensors as a single Tensor?
For example, I would like:
observation.to(device)
to work as normal when observation is a single Tensor, and call .to(device) on each Tensor when observation is a tuple of Tensors.
It should be simple enough to create a data type that can support this, but I'm wondering does such a data type already exist? I haven't found anything so far.
If your tensors are all of the same size, you can use torch.stack to concatenate them into one tensor with one more dimension.
Example:
>>> import torch
>>> a = torch.randn(2, 1)
>>> b = torch.randn(2, 1)
>>> c = torch.randn(2, 1)
>>> a
tensor([[ 0.7691],
        [-0.0297]])
>>> b
tensor([[ 0.4844],
        [-0.9142]])
>>> c
tensor([[ 0.0210],
        [-1.1543]])
>>> torch.stack((a, b, c))
tensor([[[ 0.7691],
         [-0.0297]],
        [[ 0.4844],
         [-0.9142]],
        [[ 0.0210],
         [-1.1543]]])
You can then use torch.unbind to go the other direction.
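If the tensors have different shapes (e.g. a 1D state vector and a 3D image), stacking isn't possible, and as far as I know there is no built-in container for this. One option is a small wrapper; the class below is a hypothetical sketch, not an existing PyTorch type:
import torch

class TensorTuple(tuple):
    """A tuple of tensors that forwards .to() to each element."""
    def to(self, *args, **kwargs):
        return TensorTuple(t.to(*args, **kwargs) for t in self)

obs = TensorTuple((torch.rand(4), torch.rand(3, 64, 64)))
obs = obs.to('cpu')  # moves every tensor; works the same with 'cuda' or a dtype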

How to assign a new value to a pytorch Variable without breaking backpropagation?

I have a pytorch variable that is used as a trainable input for a model. At some point I need to manually reassign all values in this variable.
How can I do that without breaking the connections with the loss function?
Suppose the current values are [1.2, 3.2, 43.2] and I simply want them to become [1,2,3].
Edit
At the time I asked this question, I hadn't realized that PyTorch doesn't have a static graph the way TensorFlow or Keras do.
In PyTorch the training loop is written manually and you need to call everything in each training step (there is no notion of a placeholder plus a static graph that data is fed into later).
Consequently, we can't "break the graph", since the new variable will be used to perform all further computations again. I was worried about a problem that happens in Keras, not in PyTorch.
You can use the data attribute of tensors to modify the values: operations on and changes to data are not tracked by autograd, so they have no influence on the graph, which stays intact.
Since you haven't given an example, this example is based on your comment: 'Suppose I want to change the weights of a layer.'
I used plain tensors here, but the same works for the weight.data and bias.data attributes of a layer.
Here is a short example:
import torch
import torch.nn.functional as F
# Test 1, random vector with CE
w1 = torch.rand(1, 3, requires_grad=True)
loss = F.cross_entropy(w1, torch.tensor([1]))
loss.backward()
print('w1.data', w1.data)
print('w1.grad', w1.grad)
print()
# Test 2, replacing values of w2 with w1, before CE,
# to make sure that everything is exactly like in Test 1 after replacing the values
w2 = torch.zeros(1, 3, requires_grad=True)
w2.data = w1.data
loss = F.cross_entropy(w2, torch.tensor([1]))
loss.backward()
print('w2.data', w2.data)
print('w2.grad', w2.grad)
print()
# Test 3, replace data after computation
w3 = torch.rand(1, 3, requires_grad=True)
loss = F.cross_entropy(w3, torch.tensor([1]))
# setting values
# the graph of the previous computation is still intact, as you can see in the print-outs below
w3.data = w1.data
loss.backward()
# data were replaced with values from w1
print('w3.data', w3.data)
# gradient still shows results from the computation with the original w3
print('w3.grad', w3.grad)
Output:
w1.data tensor([[ 0.9367, 0.6669, 0.3106]])
w1.grad tensor([[ 0.4351, -0.6678, 0.2326]])
w2.data tensor([[ 0.9367, 0.6669, 0.3106]])
w2.grad tensor([[ 0.4351, -0.6678, 0.2326]])
w3.data tensor([[ 0.9367, 0.6669, 0.3106]])
w3.grad tensor([[ 0.3179, -0.7114, 0.3935]])
The most interesting part here is w3. At the time backward is called, the values have already been replaced by the values of w1, but the gradients are calculated based on the CE function with the original values of w3. The replaced values have no effect on the graph.
So the graph connection is not broken, replacing had no influence on graph. I hope this is what you were looking for!
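For completeness, here is how the same idea looks when applied to a layer's weight.data and bias.data, as mentioned above; the layer sizes and replacement values are made up for illustration:
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
loss = layer(torch.rand(2, 3)).sum()               # graph built with the current weights
layer.weight.data = torch.tensor([[1., 2., 3.]])   # replace values; not tracked by autograd
layer.bias.data.fill_(0.)
loss.backward()                                    # gradients still follow the original computation
print(layer.weight.grad)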

Is it possible to use labelled data in SKLearn?

Currently my code looks like:
clf = RandomForestClassifier(n_estimators=10, criterion='entropy')
clf = clf.fit(X, Y)
However X is an array like:
X = [[0, 1], [1, 1]]
I would prefer to use X like:
X = [{'avg': 0, 'stddev': 1}, {'avg': 1, 'stddev': 1}]
Simply because plotting a tree (as described here: http://scikit-learn.org/stable/modules/tree.html#classification) makes much more sense when you read X[0]['avg'] rather than X[0][0]. Is this possible, using a dictionary or pandas?
You can use the DictVectorizer class to convert such a list of dicts to sparse matrices or dense numpy arrays.
scikit-learn will never use dict objects as the primary data structure to store records internally, as this is not memory-efficient at all compared to numpy arrays or scipy sparse matrices.
Here is a great example by 'larsmans' on how to build a feature dict and use DictVectorizer before fitting a model on the data. Note that the DictVectorizer class uses a scipy.sparse matrix by default (instead of a numpy.ndarray) to make the resulting data structure able to fit in memory. As not all sklearn models support sparse matrices, you might want to use the sparse=False option in the constructor to obtain a dense array:
from sklearn.feature_extraction import DictVectorizer
dv = DictVectorizer(sparse=False)
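A minimal sketch using the dicts from the question (the target values Y here are made up):
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

X_dicts = [{'avg': 0, 'stddev': 1}, {'avg': 1, 'stddev': 1}]
Y = [0, 1]
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(X_dicts)   # dense numpy array, columns ordered by feature name
clf = RandomForestClassifier(n_estimators=10, criterion='entropy')
clf.fit(X, Y)
print(dv.feature_names_)        # e.g. ['avg', 'stddev'], usable as feature_names in export_graphviz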
Alternatively, you can specify feature names when using export_graphviz. This will generate a tree with more meaningful labels at the test nodes.
See the feature_names parameter at http://scikit-learn.org/dev/modules/generated/sklearn.tree.export_graphviz.html#sklearn.tree.export_graphviz
