In PyTorch, we have torch.nn.functional and torch.nn, and the classes/functions within the former are typically referred to as "functions", while those within the latter are referred to as "modules".
There seems to be a lot of overlap between the two, so I'm wondering what each is used for and how they differ.
Here are the differences:
torch.nn.functional is the base functional interface (in terms of programming paradigm) to apply PyTorch operators on torch.Tensor.
torch.nn contains the nn.Module wrappers that provide an object-oriented interface to those operators.
So there is indeed a large overlap: modules are a different way of accessing the operators provided by those functions.
The common neural-network operators in PyTorch are available both as a function and as a wrapper class. For instance, F.conv2d, F.relu, F.dropout, F.batch_norm, etc. have the corresponding modules nn.Conv2d, nn.ReLU, nn.Dropout, and nn.BatchNorm1d/nn.BatchNorm2d/nn.BatchNorm3d.
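A minimal sketch of that relationship, assuming a recent PyTorch release (the tensor shape, channel counts and dropout rate are arbitrary choices for illustration). It shows that a stateless module like nn.ReLU computes the same thing as its function, that a module with parameters (nn.Conv2d) owns its weight and bias while F.conv2d needs them passed in, and that nn.Dropout follows train()/eval() mode while F.dropout needs the training flag explicitly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)  # (batch, channels, height, width)

# Stateless operator: the module is a thin wrapper around the function.
relu_module = nn.ReLU()
assert torch.equal(relu_module(x), F.relu(x))

# Operator with parameters: the module creates and stores weight and bias,
# the function expects them as arguments.
conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
out_module = conv(x)
out_functional = F.conv2d(x, conv.weight, conv.bias, padding=1)
print(torch.allclose(out_module, out_functional))
print([name for name, _ in conv.named_parameters()])

# Mode-dependent operator: the module reads its own training flag,
# the function must be told explicitly.
drop = nn.Dropout(p=0.5)
drop.eval()  # in eval mode dropout is the identity
assert torch.equal(drop(x), x)
assert torch.equal(F.dropout(x, p=0.5, training=False), x)
```

In practice this is why modules are convenient inside an nn.Module model: parameters are registered automatically (so model.parameters() finds them) and train/eval state is handled for you, while the functional form is handy for stateless operations or when you want to manage the weights yourself.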
I have read many tutorials on how to use PyTorch to perform regression on a dataset, using, for instance, a model composed of several linear layers and the MSE loss.
Now imagine that I know the function F depends on a variable x and some unknown parameters (p_j: j=0,..., P-1), with P relatively small, but the function is a composition of special functions. So my problem is the classical minimization, given the data {x_i, y_i}, i <= N:
$$\min_{\{p_j\}} \sum_{i} \left( F(x_i; \{p_j\}) - y_i \right)^2$$
I would like to know whether I can use the PyTorch optimizers for this and, if so, how.
Thanks.
In fact, PyTorch experts answered that the function to be minimized must be expressed in terms of torch.Tensor operations so that the optimizers can compute the gradients (via autograd). So it is not possible in my case.
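For the case where F can be written with torch operations, here is a minimal sketch of the least-squares problem above solved with a PyTorch optimizer and no nn at all. The model F(x; p) = p0 * erf(p1 * x) + p2, the synthetic data and the starting point are all hypothetical; torch.special.erf stands in for "a special function" and assumes PyTorch 1.9 or later. torch.optim.LBFGS is used because it suits problems with few parameters; it needs a closure that recomputes the loss.

```python
import torch

torch.manual_seed(0)

# Hypothetical model built from torch ops, so autograd can differentiate it.
def F(x, p):
    return p[0] * torch.special.erf(p[1] * x) + p[2]

# Synthetic data {x_i, y_i} generated from known parameters plus small noise.
true_p = torch.tensor([2.0, 1.5, -0.5])
x = torch.linspace(-3.0, 3.0, 200)
y = F(x, true_p) + 0.01 * torch.randn_like(x)

# The unknowns p_j are a plain tensor that requires gradients.
p = torch.tensor([1.0, 1.0, 0.0], requires_grad=True)
optimizer = torch.optim.LBFGS([p], lr=1.0, max_iter=100,
                              line_search_fn="strong_wolfe")

def closure():
    # LBFGS may evaluate the loss several times per step.
    optimizer.zero_grad()
    loss = ((F(x, p) - y) ** 2).sum()  # Sum_i (F(x_i; p) - y_i)^2
    loss.backward()
    return loss

for _ in range(5):
    optimizer.step(closure)

print(p.detach())
```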
I saw both transformer and estimator were mentioned in the sklearn documentation.
Is there any difference between these two words?
The basic difference is that a:
Transformer transforms the input data (X) in some ways.
Estimator predicts a new value (or values) (y) by using the input data (X).
Both the Transformer and Estimator should have a fit() method which can be used to train them (they learn some characteristics of the data). The signature is:
fit(X, y)
By convention, fit() returns the fitted object itself (self); what it learns is stored as attributes inside the object.
Here X represents the samples (feature vectors) and y is the target vector (which may have single or multiple values per corresponding sample in X). Note that y can be optional in some transformers where it's not needed, but it's mandatory for most estimators (supervised estimators). Look at StandardScaler, for example: it needs the initial data X to find the mean and std of the data (it learns the characteristics of X; y is not needed).
Each Transformer should have a transform(X) method which, like fit(), takes the input X and returns a new, transformed version of X (which generally has the same number of samples but may or may not have the same features).
On the other hand, Estimator should have a predict(X) method which should output the predicted value of y from the given X.
Some classes in scikit-learn implement both transform() and predict(), like KMeans; in that case, carefully reading the documentation should resolve your doubts.
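A minimal sketch of the distinction using standard scikit-learn classes; the random data and coefficients are made up for illustration. StandardScaler is a transformer (X to a new X), LinearRegression predicts y from X, and KMeans offers both predict (cluster labels) and transform (distances to the cluster centres).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0  # synthetic, noise-free target

# Transformer: fit() learns statistics of X only, transform() returns a new X.
scaler = StandardScaler()
assert scaler.fit(X) is scaler          # fit() returns the object itself
X_scaled = scaler.transform(X)
print(scaler.mean_, scaler.scale_)      # learned attributes

# Supervised estimator: fit() needs y, predict() outputs values of y.
reg = LinearRegression().fit(X_scaled, y)
y_pred = reg.predict(X_scaled)

# KMeans implements both predict() and transform().
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.predict(X)        # cluster index for each sample
distances = km.transform(X)   # distance from each sample to each centre
```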
A Transformer is a type of Estimator that implements a transform method.
Let me support that statement with examples I have come across in the sklearn implementation.
class sklearn.preprocessing.FunctionTransformer:
This inherits from two other classes, TransformerMixin and BaseEstimator.
class sklearn.preprocessing.PowerTransformer:
This also inherits from TransformerMixin and BaseEstimator.
From what I understand, Estimators just take data, do some processing, and store data based on the logic implemented in their fit method.
Note: Estimators aren't used to predict values directly; BaseEstimator doesn't even have a predict method.
Before explaining that statement further, let me tell you about mixin classes.
Mixin class: these are classes that implement the mixin design pattern. Wikipedia has a very good explanation of it. To summarise, these are classes you write which have methods that can be used in many different classes. So you write the methods once in one class and just inherit from it in many different classes (a form of composition).
In sklearn there are many mixin classes. To name a few: ClassifierMixin, RegressorMixin, TransformerMixin.
Here, TransformerMixin is the class that's inherited by every Transformer in sklearn. The key method it provides, reusable in every transformer, is fit_transform.
All transformers inherit from two classes: BaseEstimator (which provides get_params and set_params) and TransformerMixin (which provides fit_transform). Each transformer then implements its own fit and transform methods based on its functionality.
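A minimal sketch of how the mixin pattern plays out in practice. MeanCenterer is a hypothetical transformer written only for illustration: it defines just fit and transform, and gets fit_transform from TransformerMixin and get_params from BaseEstimator.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: subtracts the per-column mean learned in fit()."""

    def fit(self, X, y=None):
        # y is accepted but unused, as is conventional for unsupervised transformers.
        self.mean_ = np.asarray(X).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X) - self.mean_

X = np.array([[1.0, 2.0], [3.0, 6.0]])
centerer = MeanCenterer()
Xt = centerer.fit_transform(X)   # not defined above: inherited from TransformerMixin
params = centerer.get_params()   # inherited from BaseEstimator (no __init__ args here)
```

Mixins are listed before BaseEstimator in the base-class list, which is the order scikit-learn itself uses.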
I guess that answers your question. Now, let me come back to the statement I made about using an Estimator for prediction.
Every model class has its own predict method that does the prediction.
Consider LinearRegression, KNeighborsClassifier, or any other model class. They all provide a predict method (defined in the class itself or inherited from a model-specific base class), and that is what is used for prediction, not the Estimator base class.
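The class-hierarchy claims above can be checked directly; a minimal sketch using only standard scikit-learn classes and Python's issubclass/hasattr:

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import FunctionTransformer, PowerTransformer, StandardScaler

# Transformers inherit from both TransformerMixin and BaseEstimator.
for cls in (FunctionTransformer, PowerTransformer, StandardScaler):
    print(cls.__name__, issubclass(cls, TransformerMixin), issubclass(cls, BaseEstimator))

# predict() lives on model classes, not on BaseEstimator or on plain transformers.
print(hasattr(BaseEstimator, "predict"))
print(hasattr(LinearRegression, "predict"))
print(hasattr(KNeighborsClassifier, "predict"))
print(hasattr(StandardScaler, "predict"))
```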
The sklearn usage is perhaps a little unintuitive, but "estimator" doesn't mean anything very specific: basically everything is an estimator.
From the sklearn glossary:
estimator:
An object which manages the estimation and decoding of a model...
Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator.
transformer:
An estimator supporting transform and/or fit_transform...
As in @VivekKumar's answer, I think there's a tendency to use the word estimator for what sklearn instead calls a "predictor":
An estimator supporting predict and/or fit_predict. This encompasses classifier, regressor, outlier detector and clusterer...
In TensorFlow 1.4, I found two functions that do batch normalization, and they look the same:
tf.layers.batch_normalization
tf.contrib.layers.batch_norm
Which function should I use? Which one is more stable?
Just to add to the list, there are several more ways to do batch norm in TensorFlow:
tf.nn.batch_normalization is a low-level op. The caller is responsible for handling the mean and variance tensors themselves.
tf.nn.fused_batch_norm is another low-level op, similar to the previous one. The difference is that it's optimized for 4D input tensors, which is the usual case in convolutional neural networks. tf.nn.batch_normalization accepts tensors of any rank greater than 1.
tf.layers.batch_normalization is a high-level wrapper over the previous ops. The biggest difference is that it takes care of creating and managing the running mean and variance tensors, and calls a fast fused op when possible. Usually, this should be the default choice for you.
tf.contrib.layers.batch_norm is the early implementation of batch norm, from before it graduated to the core API (i.e., tf.layers). Using it is not recommended because it may be dropped in future releases.
tf.nn.batch_norm_with_global_normalization is another deprecated op. It currently delegates the call to tf.nn.batch_normalization, but is likely to be dropped in the future.
Finally, there's also the Keras layer keras.layers.BatchNormalization, which with the TensorFlow backend invokes tf.nn.batch_normalization.
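To make "the caller is responsible for the mean and variance" concrete, here is a minimal sketch of the low-level op. It assumes TensorFlow 2.x eager execution (the thread is about 1.4, but tf.nn.batch_normalization and tf.nn.moments take the same arguments there); the batch shape, epsilon and the identity scale/offset are arbitrary choices. The result is compared with the batch-norm formula written out by hand.

```python
import numpy as np
import tensorflow as tf

# A small NHWC batch, the usual layout in conv nets: (batch, height, width, channels).
x = tf.random.normal([8, 4, 4, 3], seed=0)

# With the low-level op we compute the per-channel statistics ourselves,
# reducing over the batch and spatial axes.
mean, variance = tf.nn.moments(x, axes=[0, 1, 2])
gamma = tf.ones([3])   # scale
beta = tf.zeros([3])   # offset
eps = 1e-3

y = tf.nn.batch_normalization(x, mean, variance, offset=beta, scale=gamma,
                              variance_epsilon=eps)

# Same formula by hand: gamma * (x - mean) / sqrt(variance + eps) + beta
manual = (gamma.numpy() * (x.numpy() - mean.numpy())
          / np.sqrt(variance.numpy() + eps) + beta.numpy())
print(np.allclose(y.numpy(), manual, atol=1e-4))
```

The higher-level wrappers (tf.layers.batch_normalization, keras.layers.BatchNormalization) do this bookkeeping for you and additionally keep running averages of mean and variance for use at inference time.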
As shown in the docs, tf.contrib is a contribution module containing volatile or experimental code. When a function is complete, it will be removed from this module. For now both exist, in order to stay compatible with older versions.
So, the former tf.layers.batch_normalization is recommended.
I would like to use PyTorch as a scientific computation package. It has much to recommend it in that respect - its Tensors are basically GPU-accelerated numpy arrays, and its autograd mechanism is potentially useful for a lot of things besides neural networks.
However, the available tutorials and documentation seem strongly geared towards quickly getting people up and running using it for machine learning. Although there is lots of good information available on the Tensor and Variable classes (and I understand that material reasonably well), the nn and optim packages always seem to be introduced by example rather than by explaining the API, which makes it hard to figure out exactly what's going on.
My main question at this point is whether I can use the optim package without also using the nn package, and if so how to do so. Of course I can always implement my simulations as subclasses of nn.Module even though they are not neural networks, but I would like to understand what happens under the hood when I do this, and what benefits/drawbacks it would give for my particular application.
More broadly, I would appreciate pointers to any resource that gives more of a logical overview of the API (for nn and optim specifically), rather than just presenting examples.
This is a partial self-answer to the specific question about using optim without using nn. The answer is, yes, you can do that. In fact, from looking at the source code, the optim package doesn't know anything about nn and only cares about Variables and tensors.
The documentation gives the following incomplete example:
optimizer = optim.Adam([var1, var2], lr = 0.0001)
and then later:
for input, target in dataset:
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()
The function model isn't defined anywhere and looks like it might be something to do with nn, but in fact it can just be a Python function that computes output from input using var1 and var2 as parameters, as long as all the intermediate steps are done using Variables so that it can be differentiated. The call to optimizer.step() will update the values of var1 and var2 automatically.
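A minimal sketch of that point with current PyTorch, where plain tensors with requires_grad=True replace Variables. Here model and loss_fn are ordinary Python functions, and var1/var2 are the only parameters; the linear model, the synthetic data, the mini-batch "dataset" and the learning rate (0.05 instead of the documentation's 0.0001, so it converges quickly) are all assumptions for illustration. The loop variable is named inp to avoid shadowing Python's built-in input.

```python
import torch
from torch import optim

torch.manual_seed(0)
x = torch.linspace(0.0, 1.0, 100)
y = 3.0 * x + 0.5  # synthetic targets

# Parameters are plain tensors; no nn.Module involved.
var1 = torch.zeros((), requires_grad=True)
var2 = torch.zeros((), requires_grad=True)

def model(inp):
    # Any differentiable torch computation using var1 and var2 works here.
    return var1 * inp + var2

def loss_fn(output, target):
    return ((output - target) ** 2).mean()

# A "dataset" of (input, target) mini-batches.
dataset = [(x[i:i + 20], y[i:i + 20]) for i in range(0, 100, 20)]

optimizer = optim.Adam([var1, var2], lr=0.05)
for epoch in range(500):
    for inp, target in dataset:
        optimizer.zero_grad()
        output = model(inp)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()  # updates var1 and var2 in place

print(var1.item(), var2.item())
```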
In terms of the structure of PyTorch overall, it seems that optim and nn are independent of one another, with nn being basically just a convenient way to chain differentiable functions together, along with a library of such functions that are useful in machine learning. I would still appreciate pointers to a good technical overview of the whole package, though.
Newbie in Keras:
I am trying to understand the syntax used in Keras.
The syntax I have difficulty understanding is used while building a network. I have seen it in a number of places, as in the following code.
Statements like: current_layer = SOME_CODE(current_layer)
What is the meaning of such a statement? Does it mean that the computation described in SOME_CODE is applied after the computation described in the current layer?
What is the use of such a syntax and when should one use it? Any advantages and alternatives?
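import keras
import tensorflow as tf

def random_image_mirror_left_right(input_layer):
    # Wrap TensorFlow ops in a Lambda layer so they can be applied to a Keras tensor;
    # tf.map_fn applies the random left-right flip to each image in the batch separately.
    return keras.layers.core.Lambda(function=lambda batch_imgs: tf.map_fn(
        lambda img: tf.image.random_flip_left_right(img), batch_imgs
    ))(input_layer)

input_layer = keras.layers.Input(
    (IMAGE_BORDER_LENGTH, IMAGE_BORDER_LENGTH, NB_CHANNELS))
current_layer = random_image_mirror_left_right(input_layer)
current_layer = keras.layers.convolutional.Conv2D(
    filters=16,  # "some values ": remaining arguments elided in the question
)(current_layer)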
If you are indeed a newbie in Keras, as you say, I would strongly suggest not bothering with such advanced stuff at this stage.
The repo you are referring to is a rather advanced and highly non-trivial case of using a specialized library (HyperOpt) to automatically meta-optimize a Keras model. It involves 'automatic' model building according to configuration parameters already stored in a Python dictionary...
Additionally, the function you quote goes beyond Keras to involve TensorFlow methods and lambda functions...
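The statement current_layer = SOME_CODE(current_layer) is a typical example of the Keras Functional API: SOME_CODE creates a layer object, that layer is called on the current_layer tensor, and the resulting output tensor is assigned back to current_layer. In my experience, it is less widely used than the more straightforward Sequential API, but it may come in handy in some more advanced cases, e.g.: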
The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers. [...] With the functional API, it is easy to re-use trained models: you can treat any model as if it were a layer, by calling it on a tensor. Note that by calling a model you aren't just re-using the architecture of the model, you are also re-using its weights.
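A minimal sketch of the pattern, assuming TensorFlow 2.x with its bundled Keras (layer sizes and input shapes are arbitrary). Each x = SomeLayer(...)(x) line first builds a layer object and then calls it on a tensor, producing a new tensor; keras.Model then ties the input and output tensors together. The second model shows the "shared layers" case from the quote: one Dense layer object called on two inputs, so both branches use the same weights.

```python
import numpy as np
from tensorflow import keras

# Chain of layers, functional style: each call returns a new tensor.
inputs = keras.Input(shape=(4,))
x = keras.layers.Dense(8, activation="relu")(inputs)  # build layer, then call it on a tensor
x = keras.layers.Dense(1)(x)
functional_model = keras.Model(inputs=inputs, outputs=x)

# The same architecture with the Sequential API.
sequential_model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])

# Something Sequential cannot express: one layer shared by two inputs.
shared = keras.layers.Dense(3)
a = keras.Input(shape=(4,))
b = keras.Input(shape=(4,))
merged = keras.layers.Concatenate()([shared(a), shared(b)])
siamese = keras.Model(inputs=[a, b], outputs=merged)

out = functional_model(np.zeros((2, 4), dtype="float32"))
```

Because the shared layer is a single object, siamese has only one kernel and one bias to train, even though the layer appears twice in the graph.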