Keras: Syntax clarification - keras

Newbie in keras:
I am trying to understand the syntax used in keras.
Syntax that I am having difficult in understanding is while building a network. I have seen in number of places as also described in following code.
Statements like: current_layer = SOME_CODE(current_layer)
What is the meaning of such a statement? Does it means first the computation described in SOME_CODE is to be followed to the computation described in the current layer?
What is the use of such a syntax and when should one use it? Any advantages and alternatives?
input_layer = keras.layers.Input(
(IMAGE_BORDER_LENGTH, IMAGE_BORDER_LENGTH, NB_CHANNELS))
current_layer = image_mirror_left_right(input_layer)
current_layer = keras.layers.convolutional.Conv2D(
filters=16, "some values " ])
)(current_layer)
def random_image_mirror_left_right(input_layer):
return keras.layers.core.Lambda(function=lambda batch_imgs: tf.map_fn(
lambda img: tf.image.random_flip_left_right(img), batch_imgs
)
)(input_layer)

If you are indeed newbie in Keras, as you say, I would strongly suggest not bothering with such advanced stuff at this stage yet.
The repo you are referring to is a rather advanced and highly non-trivial case of using a specialized library (HyperOpt) for automatic meta-optimizing a Keras model. It involves 'automatic' model building according to some configuration parameters already stored in a Python dictionary...
Additionally, the function you quote goes beyond Keras to involve TensorFlow methods and lambda functions...
The current_layer=SOME_CODE(current_layer) is a typical example of the Keras Functional API; according to my experience, it is less widely used than the more straightforward Sequential API, but it may come handy in some more advanced cases, e.g.:
The Keras functional API is the way to go for defining complex models,
such as multi-output models, directed acyclic graphs, or models with
shared layers. [...] With the functional API, it is easy to re-use
trained models: you can treat any model as if it were a layer, by
calling it on a tensor. Note that by calling a model you aren't just
re-using the architecture of the model, you are also re-using its
weights.

Related

Tensorflow and Bert What are they exactly and what's the difference between them?

I'm interested in NLP and I come up with Tensorflow and Bert, both seem to be from Google and both seem to be the best thing for Sentiment Analysis as of today but I don't understand what are they exactly and what is the difference between them... Can someone explain?
Tensorflow is an open-source library for machine learning that will let you build a deep learning model/architecture. But the BERT is one of the architectures itself. You can build many models using TensorFlow including RNN, LSTM, and even the BERT. The transformers like the BERT are a good choice if you just want to deploy a model on your data and you don't care about the deep learning field itself. For this purpose, I recommended the HuggingFace library that provides a straightforward way to employ a transformer model in just a few lines of code. But if you want to take a deeper look at these models, I will suggest you to learns about the well-known deep learning architectures for text data like RNN, LSTM, CNN, etc., and try to implement them using an ML library like Tensorflow or PyTorch.
Bert and Tensorflow is not different thing , There are not only 2, but many implementations of BERT. Most are basically equivalent.
The implementations that you mentioned are:
The original code by Google, in Tensorflow. https://github.com/google-research/bert
Implementation by Huggingface, in Pytorch and Tensorflow, that reproduces the same results as the original implementation and uses the same checkpoints as the original BERT article. https://github.com/huggingface/transformers
These are the differences regarding different aspects:
In terms of results, there is no difference in using one or the other, as they both use the same checkpoints (same weights) and their results have been checked to be equal.
In terms of reusability, HuggingFace library is probably more reusable, as it is designed specifically for that. Also, it gives you the freedom of choosing TensorFlow or Pytorch as deep learning framework.
In terms of performance, they should be the same.
In terms of community support (e.g. asking questions in github or stackoverflow about them), HuggingFace library is better suited, as there are a lot of people using it.
Apart from BERT, the transformers library by HuggingFace has implementations for lots of models: OpenAI GPT-2, RoBERTa, ELECTRA, ...

How to convert a tf.estimator to a keras model?

In package tf.estimator, there's a lot of defined estimators. I want to use them in Keras.
I checked TF docs, there's only one converting method that could convert keras. Model to tf. estimator, but no way to convert from estimator to Model.
For example, if we want to convert the following estimator:
tf.estimator.DNNLinearCombinedRegressor
How could it be converted into Keras Model?
You cannot because estimators can run arbitrary code in their model_fn functions and Keras models must be much more structured, whether sequential or functional they must consist of layers, basically.
A Keras model is a very specific type of object that can therefore be easily wrapped and plugged into other abstractions.
Estimators are based on arbitrary Python code with arbitrary control flow and so it's quite tricky to force any structure onto them.
Estimators support 3 modes - train, eval and predict. Each of these could in theory have completely independent flows, with different weights, architectures etc. This is almost unthinkable in Keras and would essentially amount to 3 separate models.
Keras, in contrast, supports 2 modes - train and test (which is necessary for things like Dropout and Regularisation).

Random Forest Regressor using a custom objective/ loss function (Python/ Sklearn)

I want to build a Random Forest Regressor to model count data (Poisson distribution). The default 'mse' loss function is not suited to this problem. Is there a way to define a custom loss function and pass it to the random forest regressor in Python (Sklearn, etc..)?
Is there any implementation to fit count data in Python in any packages?
In sklearn this is currently not supported. See discussion in the corresponding issue here, or this for another class, where they discuss reasons for that a bit more in detail (mainly the large computational overhead for calling a Python function).
So it could be done as discussed within the issues, by forking sklearn, implementing the cost function in Cython and then adding it to the list of available 'criterion'.
If the problem is that the counts c_i arise from different exposure times t_i, then indeed one cannot fit the counts, but one can still fit the rates r_i = c_i/t_i using MSE loss function, where one should, however, use weights proportional to the exposures, w_i = t_i.
For a true Random Forest Poisson regression, I've seen that in R there is the rpart library for building a single CART tree, which has a Poisson regression option. I wish this kind of algorithm would have been imported to scikit-learn.
In R, writing a custom objective function is fairly simple.
randomForestSRC package in R has provision for writing your own custom split rule. The custom split rule, however has to be written in pure C language.
All you have to do is, write your own custom split rule, register the split rule, compile and install the package.
The custom split rule has to be defined in the file called splitCustom.c in randomForestSRC source code.
You can find more info
here.
The file in which you define the split rule is
this.

Keras: better way to implement layer-wise training model?

I'm currently learning implementing layer-wise training model with Keras. My solution is complicated and time-costing, could someone give me some suggestions to do it in a easy way? Also could someone explain the topology of Keras especially the relations among nodes.outbound_layer, nodes.inbound_layer and how did they associated with tensors: input_tensors and output_tensors? From the topology source codes on github, I'm quite confused about:
input_tensors[i] == inbound_layers[i].inbound_nodes[node_indices[i]].output_tensors[tensor_indices[i]]
Why the inbound_nodes contain output_tensors, I'm not clear about the relations among them....If I wanna remove layers in certain positions of the API model, what should I firstly remove? Also, when adding layers to some certain places, what shall I do first?
Here is my solution to a layerwise training model. I can do it on Sequential model and now trying to implement in on the API model:
To do it, I'm simply add a new layer after finish previous training and re-compile (model.compile()) and re-fit (model.fit()).
Since Keras model requires output layer, I would always add an output layer. As a result, each time when I wanna add a new layer, I have to remove the output layer then add it back. This can be done using model.pop(), in this case model has to be a keras.Sequential() model.
The Sequential() model supports many useful functions including model.add(layer). But for customised model using model API: model=Model(input=...., output=....), those pop() or add() functions are not supported and implement them takes some time and maybe not convenient.

Custom operation implementation for RBM/DBN with tensorflow?

Since Google released out tensorflow, it becomes kind of trend in the current deep learning selections.
I'd like to do some experiments about RBM/DBN (Restricted Boltzmann Machine/Deep Belief Network), I've made some attempt by myself and kind of implement it well through the combination of available APIs from tensorflow. See code and previous answer.
So, if doesn't bother the code running performance, here's the gift for RBM/DBN implementation with tensorflow.
But, the running performance must be considered for the future. Because of the special progress of CD (Contrastive Divergence) algorithm, I think it just works against the framework (data flow graph) used by tensorflow. That's why my code seems weired.
So, the custom operation should be implemented for acceleration. I've followed the current documentation about adding custom ops.
REGISTER_OP("NaiveRbm")
.Input("visible: float32")
.Input("weights: float32")
.Input("h_bias: float32")
.Input("v_bias: float32")
.Output("hidden: float32")
.Doc(R"doc(
Naive Rbm for seperate training use. DO NOT mix up with other operations
)doc");
In my design, NaiveRbm should is an operation that takes visible,weights,h_bias,v_bias as input, but output by only first 3 Variables ( simply sigmoid(X*W+hb) ), its gradient should return at least gradients for last 3 Variables.
Imagine example psuedo code like this:
X = tf.placeholder()
W1, hb1, vb1 = tf.Variable()
W2, hb2, vb2 = tf.Variable()
rbm1 = NaiveRbm(X,W1,hb1,vb1)
train_op = tf.train.MomentumOptimizer(0.01, 0.5).minimize(rbm1)
rbm2 = NaiveRbm(tf.stop_gradient(rbm1), W2, hb2, vb2)
train_op2 = tf.train.MomentumOptimizer(0.01, 0.5).minimize(rbm2)
with tf.Session() as sess:
for batch in batches:
sess.run(train_op, feed_dict={X: batch})
for batch in batches:
sess.run(train_op2, feed_dict={X: batch})
But the tensorflow library is too complex for me. And after too much time seeking for how to implement these existing operations (sigmoid, matmul, ma_add, relu, random_uniform) in custom operation, no solution is found by myself.
So, I'd like to ask if someone could help me achieve the remain works.
PS: before getting some ideas, I'd like to dive into Theano since it implements RBM/DBN already. Just in my opinion, Caffe is kind of not suitable for RBM/DBN because of its framework.
Update: After scratch through the tutorials from Theano, I found the key reason for Theano implemented the RBM/DBN while the tensorflow haven't is the scan technology. So, there might wait tensorflow to implement scan technology to prepare for RBM/DBN implementation.

Resources