I want to implement the memory monger method described in the paper Training Deep Nets with Sublinear Memory Cost. Code that implements the method for the MXNet framework can be found here.
My question: the memory monger method suits symbolic graphs such as those in Theano and TensorFlow, but graphs in PyTorch are dynamic. Can the method be applied to PyTorch?
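For reference, the core idea of the paper (store only some activations during the forward pass and recompute the rest during the backward pass) can be sketched without any framework. The pure-Python example below is a minimal illustration under assumptions, not the paper's or MXNet's implementation: the scalar tanh layers, the function names, and the segment parameter are all hypothetical. PyTorch also ships a related utility, torch.utils.checkpoint, which this sketch does not use.

```python
import math


def layer_forward(w, x):
    # Toy scalar layer: y = tanh(w * x)
    return math.tanh(w * x)


def layer_backward(w, x, grad_out):
    # Returns (gradient w.r.t. the input x, gradient w.r.t. the weight w)
    y = math.tanh(w * x)
    local = 1.0 - y * y
    return grad_out * local * w, grad_out * local * x


def full_backprop(weights, x0):
    # Baseline: keeps every activation, so memory grows linearly with depth
    acts = [x0]
    for w in weights:
        acts.append(layer_forward(w, acts[-1]))
    grad = 1.0
    grad_w = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad, grad_w[i] = layer_backward(weights[i], acts[i], grad)
    return acts[-1], grad, grad_w


def checkpointed_backprop(weights, x0, segment):
    n = len(weights)
    # Forward pass: keep only the input of each segment
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer_forward(w, x)
    out = x

    grad = 1.0
    grad_w = [0.0] * n
    # Backward pass: walk the segments from last to first
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        # Recompute the inputs of the layers inside this segment only
        acts = [checkpoints[start]]
        for i in range(start, end - 1):
            acts.append(layer_forward(weights[i], acts[-1]))
        for i in reversed(range(start, end)):
            grad, grad_w[i] = layer_backward(weights[i], acts[i - start], grad)
    return out, grad, grad_w


weights = [0.5, -1.2, 0.8, 1.1, -0.3, 0.9, 0.7]
reference = full_backprop(weights, 0.4)
checkpointed = checkpointed_backprop(weights, 0.4, segment=3)
```

Both functions should return the same output and gradients. The checkpointed version holds one stored value per segment plus the activations of a single segment at a time, at the cost of one extra forward pass per segment.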
I am working with a very large dataset (1.5 million rows) and thought about using an SVR.
Since there is so much data, I thought about switching to a linear SVM and using the Nyström
method to build a kernel approximation from uniformly sampled data.
However, I would rather construct the kernel via Kernel K-Means, but I have not found an official
implementation yet.
This link provides an unofficial implementation, but this results in a very large model since it is serialized.
https://tslearn.readthedocs.io/en/stable/gen_modules/clustering/tslearn.clustering.KernelKMeans.html
Maybe someone has a clue where to look for this, or how to implement it in code for an arbitrary dataset?
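As a baseline for the uniform-sampling approach described above, a Nyström feature map followed by a linear SVR can be sketched with scikit-learn. This is only a sketch under assumptions: the synthetic data, the RBF kernel, and every hyperparameter value are placeholders, and it does not implement the Kernel K-Means variant asked about. scikit-learn's Nystroem picks its landmark points by uniform random sampling of the training data.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in for the real dataset (assumption, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)

model = make_pipeline(
    StandardScaler(),
    # Approximate an RBF kernel with 100 uniformly sampled landmark points
    Nystroem(kernel="rbf", gamma=0.1, n_components=100, random_state=0),
    # A linear SVR on the approximate kernel features
    LinearSVR(C=1.0, epsilon=0.1, max_iter=5000, random_state=0),
)

model.fit(X[:1500], y[:1500])
r2 = model.score(X[1500:], y[1500:])  # R^2 on the held-out rows
predictions = model.predict(X[1500:1503])
```

The fitted model stores only the landmark points and a linear weight vector, so its serialized size depends on n_components rather than on the number of training rows.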
I converted a model from tf.keras to Caffe. When I evaluate the model with Caffe on the test set, I find that the accuracy is higher with Caffe than with tf.keras. I can't think of a way to get a handle on the source of the problem (if there is a problem in the first place...).
Is this difference due to the lower-level libraries used for accelerating the computations (I am thinking of cuDNN and the Caffe engine)? Is there a well-known accuracy problem with the Keras module of TensorFlow?
By the way, there are other people that have a similar issue:
https://github.com/keras-team/keras/issues/4444
This can happen.
Once you convert your Keras .h5 model to .caffemodel, the weights are copied numerically. But the model is then loaded and run in Caffe, not Keras.
Because Caffe and Keras are two different libraries, their internal algorithms can vary slightly. Changing your pre-processing scheme can also change the result. Usually, pruning (to reduce model size) lowers performance; in unusual cases it can act as an extreme form of regularization and boost test performance.
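One way to narrow down where such a difference comes from is to run both models on identical, already pre-processed inputs and compare the raw outputs sample by sample. The helper below is a hypothetical, framework-free sketch: the function name, the tolerance, and the example numbers are made up for illustration, and the two lists stand in for whatever each framework's predict call returns.

```python
def compare_predictions(probs_a, probs_b, atol=1e-4):
    """Compare two lists of per-class score rows, one row per sample."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must be evaluated on the same samples")

    max_abs_diff = 0.0
    disagreements = []  # indices where the predicted class differs
    for idx, (row_a, row_b) in enumerate(zip(probs_a, probs_b)):
        diff = max(abs(a - b) for a, b in zip(row_a, row_b))
        max_abs_diff = max(max_abs_diff, diff)
        label_a = max(range(len(row_a)), key=row_a.__getitem__)
        label_b = max(range(len(row_b)), key=row_b.__getitem__)
        if label_a != label_b:
            disagreements.append(idx)

    return {
        "max_abs_diff": max_abs_diff,
        "within_tolerance": max_abs_diff <= atol,
        "disagreements": disagreements,
    }


# Made-up outputs for three samples and two classes
keras_out = [[0.70, 0.30], [0.45, 0.55], [0.20, 0.80]]
caffe_out = [[0.70, 0.30], [0.52, 0.48], [0.20, 0.80]]
report = compare_predictions(keras_out, caffe_out)
```

If the outputs agree to within a small tolerance on identical inputs, the accuracy gap more likely comes from pre-processing or from the evaluation code than from the converted weights.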
I have a bunch of tensor operations (matmul, transpose, etc.) I would like to run on a large dataset.
Since they are still matrix operations, and since I am using Keras generators to load the data batches, it would make sense to use GPUs to compute them.
Now, I've searched a while and I can't seem to find which is the correct way to use Keras to do parallel GPU operations, using generators, outside of the standard Model object interface.
Does anyone know how to do it? Thanks!
How can you program Keras or TensorFlow to partition training across multiple GPUs? Say you are on an Amazon EC2 instance that has 8 GPUs and you want to use all of them to train faster, but your code is written for a single CPU or GPU.
Yes, you can run Keras models on multiple GPUs. This is only possible with the TensorFlow backend for the time being, because the Theano feature is still rather new. We are looking at adding support for multi-GPU in Theano in the near future (it should be fairly straightforward).
With the TensorFlow backend, you can achieve this the same way as you would in pure TensorFlow: by using the with tf.device(d) scope when defining Keras layers.
Originally from here
To create RNN cells, there are classes like GRUCell and LSTMCell which can be used later to create RNN layers.
There are also two other classes, CudnnGRU and CudnnLSTM, which can be used directly to create RNN layers.
The documentation says that the latter classes have a cuDNN implementation. Why should I use, or not use, these cuDNN-backed classes over the classical RNN implementations when I'm creating an RNN model?
In short: CudnnGRU and CudnnLSTM can, and must, be used on a GPU; the normal RNN implementations are not restricted to the GPU. So if you have tensorflow-gpu, the cuDNN implementation of RNN cells will run faster.
CuDNNLSTM and CuDNNGRU are fast implementations backed by cuDNN. Both can only be run on the GPU, with the TensorFlow backend. cuDNN is a GPU-accelerated library of primitives for deep neural networks.
cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.
cuDNN highlights include:
Up to 3x faster training of ResNet-50 and GNMT on Tesla V100 vs. Tesla P100
Improved NHWC support for pooling and strided convolution
Get improved performance for common workloads such as ResNet50 and SSD as batchnorm now supports NHWC data layout with an added option to fuse batchnorm with Add and ReLU operations