Some compilers use mad() multiply-accumulate function for higher performance. Does PyTorch compiler allow using a vectorized version of the mad() function?
Related
I'm trying to solve a system of equations using genetic algorithm (using python 3.x). is there a way to perform multi-objective analysis using the PyGAD module or an approach that increases accuracy of the GA output?
PyGAD does not support multi-objective optimization at the current stage. Only single objective optimization is supported.
I am trying to use a customized loss function for my NN. I've implemented all operations in torch and I have complex numbers among my data.
I get the error while training a NN:
RuntimeError: _th_addr_out not supported on CPUType for ComplexFloat
Do you know any possible solution to deal with it?
Well it seems Complex Autograd in PyTorch is currently in a prototype state, and the backward functionality for some of function is not included.
For example: torch.sign, which is used in the backward computation of torch.abs, is not defined for complex tensors. same for torch.mv. So I debugged my code line by line to find the functions which are not included, and replaced them with a customized function :)
Hope for a lot more functions to be included in the next release of PyTorch.
I'm using PyTorch to implement an intense sequence of matrix operations, using methods such as torch.mm or torch.dot. I was wondering if PyTorch uses multithreading or other optimization mechanisms to speed up the process. I am not utilizing a GPU. I appreciate if you could inform me of how fast these methods are and whether I need to take any actions to help the process.
PyTorch uses an efficient BLAS implementation and multithreading (openMP, if I'm not wrong) to parallelize such operations with multiple cores. Some performance loss comes from the Python itself - since this is an interpreted language, no significant compiler-like optimization can be done. You can use the jit module to speed up the "wrapper" code around the matrix multiplies, but for anything more than very small matrices this cost is probably negligible.
One big improvement you may be able to get manually, but which PyTorch doesn't apply automatically, is to properly order the matrix multiplies. As you probably know, depending on matrix shapes, a multiplication ABCD may have different performance computed as A(B(CD)) than if computed as (AB)(CD), etc.
In tensorflow 1.4, I found two functions that do batch normalization and they look same:
tf.layers.batch_normalization (link)
tf.contrib.layers.batch_norm (link)
Which function should I use? Which one is more stable?
Just to add to the list, there're several more ways to do batch-norm in tensorflow:
tf.nn.batch_normalization is a low-level op. The caller is responsible to handle mean and variance tensors themselves.
tf.nn.fused_batch_norm is another low-level op, similar to the previous one. The difference is that it's optimized for 4D input tensors, which is the usual case in convolutional neural networks. tf.nn.batch_normalization accepts tensors of any rank greater than 1.
tf.layers.batch_normalization is a high-level wrapper over the previous ops. The biggest difference is that it takes care of creating and managing the running mean and variance tensors, and calls a fast fused op when possible. Usually, this should be the default choice for you.
tf.contrib.layers.batch_norm is the early implementation of batch norm, before it's graduated to the core API (i.e., tf.layers). The use of it is not recommended because it may be dropped in the future releases.
tf.nn.batch_norm_with_global_normalization is another deprecated op. Currently, delegates the call to tf.nn.batch_normalization, but likely to be dropped in the future.
Finally, there's also Keras layer keras.layers.BatchNormalization, which in case of tensorflow backend invokes tf.nn.batch_normalization.
As show in doc, tf.contrib is a contribution module containing volatile or experimental code. When function is complete, it will be removed from this module. Now there are two, in order to be compatible with the historical version.
So, the former tf.layers.batch_normalization is recommended.
What should be better path to convert a scikit model (e.g. the result of a RandomForestClassifier fit) in a piece of C++ to get the the fastest .so that can be called from some other ecosystem ?
For portability of trained scikit learn models to other languages, see the sklearn-porter project.
Though, whether this will be faster than the originalRandomForestClassifier.predict method (which is multithreaded and uses numpy operations, potentially with a fast BLAS library) remains to be seen.