Why is the start_queue_runners() function used with the shuffle_batch() function? - python-3.x

I am new to TensorFlow and trying to understand the shuffle_batch() function. When I use shuffle_batch() with the following code, it does not print anything.
import tensorflow as tf
sess = tf.Session()
random = tf.random_normal([5], mean=0.0, stddev=1.0)
# shuffle_batch(tensors, batch_size, capacity, min_after_dequeue)
shu = tf.train.shuffle_batch([random], 20, 100, 10)
print(sess.run(shu))  # blocks here: nothing ever fills the queue
But after adding start_queue_runners() it gives me the expected output. So what is the relationship between start_queue_runners() and shuffle_batch()?
import tensorflow as tf
sess = tf.Session()
random = tf.random_normal([5], mean=0.0, stddev=1.0)
shu = tf.train.shuffle_batch([random], 20, 100, 10)
threads = tf.train.start_queue_runners(sess=sess)  # starts the threads that fill the queue
print(sess.run(shu))

tf.train.shuffle_batch() creates a queue internally and registers a QueueRunner in the graph; it is the runner's threads that actually fill the queue. tf.train.start_queue_runners() starts those threads, so until you call it the queue stays empty and sess.run(shu) blocks without ever printing.
Note that the whole queue pipeline has been replaced by tf.data. You should have a look at the tf.data guide instead; it is much simpler to use the Dataset API.
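For example, a minimal tf.data sketch of the snippet above might look like this (TF 1.x; the buffer and batch sizes here are illustrative, not taken from the question):
import tensorflow as tf
dataset = tf.data.Dataset.from_tensor_slices(tf.random_normal([100, 5]))
dataset = dataset.shuffle(buffer_size=100).batch(20)
next_batch = dataset.make_one_shot_iterator().get_next()
with tf.Session() as sess:
    print(sess.run(next_batch))  # no queue runners needed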

Related

How to leave scikit-learn estimator result in dask distributed system?

You can find a minimal working example below (taken directly from the dask-ml page; the only change is to Client() to make it work on a distributed system).
import numpy as np
from dask.distributed import Client
import joblib
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Don't forget to start the dask scheduler and connect worker(s) to it.
client = Client('localhost:8786')

digits = load_digits()

param_space = {
    'C': np.logspace(-6, 6, 13),
    'gamma': np.logspace(-8, 8, 17),
    'tol': np.logspace(-4, -1, 4),
    'class_weight': [None, 'balanced'],
}

model = SVC(kernel='rbf')
search = RandomizedSearchCV(model, param_space, cv=3, n_iter=50, verbose=10)

with joblib.parallel_backend('dask'):
    search.fit(digits.data, digits.target)
But this returns the result to the local machine. This is not exactly my code: in my code I am using scikit-learn's TfidfVectorizer. After I call fit_transform(), it returns the fitted and transformed data (in sparse format) to my local machine. How can I leave the results inside the distributed system (the cluster of machines)?
PS: I just encountered this: from dask_ml.wrappers import ParallelPostFit. Maybe this is the solution?
The answer was in front of my eyes and I couldn't see it for three days of searching. ParallelPostFit is the answer. The only problem is that it doesn't support fit_transform(), but fit() and transform() work, and transform() returns a lazily evaluated dask array (which is what I was looking for). Be careful about this warning:
Warning
ParallelPostFit does not parallelize the training step. The underlying estimator's .fit method is called normally.
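A minimal sketch of that pattern (variable names and the chunking are illustrative assumptions; whether an object-dtype dask array of strings works here may depend on your dask-ml version):
import numpy as np
import dask.array as da
from dask_ml.wrappers import ParallelPostFit
from sklearn.feature_extraction.text import TfidfVectorizer

texts = np.array(["first doc", "second doc", "third doc", "fourth doc"], dtype=object)
wrapped = ParallelPostFit(estimator=TfidfVectorizer())
wrapped.fit(texts)  # .fit is not parallelized (see the warning above)

dask_texts = da.from_array(texts, chunks=2)
result = wrapped.transform(dask_texts)  # lazily evaluated; stays on the cluster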

Keras initializers outside Keras

I want to initialize a 4×11 matrix using glorot uniform in Keras, using the following code:
import keras
keras.initializers.glorot_uniform((4,11))
I get this output:
<keras.initializers.VarianceScaling at 0x7f9666fc48d0>
How can I visualize the output? I have tried indexing it with c[1] and got the error 'VarianceScaling' object does not support indexing.
glorot_uniform() creates a callable initializer, and later this callable is invoked with a shape. So you need:
# from keras.initializers import *  # (tf 1.x standalone Keras)
from tensorflow.keras.initializers import *
from tensorflow.keras import backend as K

unif = glorot_uniform()  # this returns a callable that takes a shape
mat_as_tensor = unif((4, 11))  # this returns a tensor - use this in keras models if needed
mat_as_numpy = K.eval(mat_as_tensor)  # this returns a numpy array (don't use in models)
print(mat_as_numpy)
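As a side note, in TF 2.x eager mode (an assumption; the snippet above targets graph mode), the tensor can be inspected directly without K.eval:
from tensorflow.keras.initializers import GlorotUniform
print(GlorotUniform()((4, 11)).numpy())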

Get random gamma distribution in tensorflow like numpy.random.gamma

Hi, I am new to TensorFlow and I am trying to generate a random gamma distribution in TensorFlow, just like numpy.random.gamma.
My numpy code is:
self._lambda = 1 * np.random.gamma(100., 1. / 100, (self.n_topic, self.n_voca))
where n_topic=240 and n_voca=198
My tensorflow code is:
self._tf_lambda = tf.random_gamma((self.n_topic, self.n_voca),1, dtype=tf.float32, seed=0, name='_tf_lambda')
Is this a correct implementation? I believe I have failed to understand the parameters of tf.random_gamma, because self._lambda and self._tf_lambda do not match.
You are setting different shape parameters in your distributions, so it is expected that they differ.
One thing to watch out for is that numpy has a "scale" parameter while TF has an "inverse scale" parameter (beta), so one has to be inverted to get the same distribution.
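Concretely, for the parameters in the question the matching call would be (a sketch; the shapes follow the question):
import tensorflow as tf

n_topic, n_voca = 240, 198
# numpy: shape=100., scale=1./100  <->  TF: alpha=100., beta=100. (beta is the inverse scale)
tf_lambda = tf.random_gamma((n_topic, n_voca), alpha=100., beta=100., dtype=tf.float32, seed=0)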
Jupyter notebook example with matching distributions:
%matplotlib inline
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

size = (50000,)
shape_parameter = 1.5
scale_parameter = 0.5
bins = np.linspace(-1, 5, 30)

np_res = np.random.gamma(shape=shape_parameter, scale=scale_parameter, size=size)

# Note the 1/scale_parameter here
tf_op = tf.random_gamma(shape=size, alpha=shape_parameter, beta=1/scale_parameter)
with tf.Session() as sess:
    tf_res = sess.run(tf_op)

plt.hist(tf_res, bins=bins, alpha=0.5);
plt.hist(np_res, bins=bins, alpha=0.5);
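As a quick numeric sanity check on top of the histograms: for a gamma distribution the mean is shape*scale and the variance is shape*scale**2, so both samples should have similar moments.
print(np_res.mean(), tf_res.mean())  # both should be near 1.5 * 0.5 = 0.75
print(np_res.var(), tf_res.var())    # both should be near 1.5 * 0.5**2 = 0.375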

Tensorflow Scikit Flow get GraphDef for Android (save *.pb file)

I want to use my TensorFlow algorithm in an Android app. The TensorFlow Android example starts by downloading a GraphDef that contains the model definition and weights (in a *.pb file). Now this should come from my Scikit Flow algorithm (part of TensorFlow).
At first glance it seems easy: you just have to say classifier.save('model/'), but the files saved to that folder are not *.ckpt, *.def and certainly not *.pb. Instead you have to deal with a *.pbtxt file and a checkpoint file (without an extension).
I've been stuck there for quite a while. Here is a code example to export something:
# imports
import tensorflow as tf
import tensorflow.contrib.learn as skflow
import tensorflow.contrib.learn.python.learn as learn
from sklearn import datasets, metrics

# skflow example
iris = datasets.load_iris()
feature_columns = learn.infer_real_valued_columns_from_input(iris.data)
classifier = learn.LinearClassifier(n_classes=3, feature_columns=feature_columns,
                                    model_dir="modeltest")
classifier.fit(iris.data, iris.target, steps=200, batch_size=32)
iris_predictions = list(classifier.predict(iris.data, as_iterable=True))
score = metrics.accuracy_score(iris.target, iris_predictions)
print("Accuracy: %f" % score)
The files you get are:
checkpoint
graph.pbtxt
model.ckpt-1.meta
model.ckpt-1-00000-of-00001
model.ckpt-200.meta
model.ckpt-200-00000-of-00001
Many possible workarounds I found would require having the GraphDef in a variable (I don't know how to get it with Scikit Flow), or a TensorFlow session, which doesn't seem to be required when using Scikit Flow.
To save as a pb file, you need to extract the graph_def from the constructed graph. You can do that as follows:
import tensorflow as tf
from tensorflow.python.framework import graph_util
from tensorflow.python.platform import gfile

sess = tf.Session()

# Replace 'results' with the name of the final op in your graph
# (the node name, without the ':0' output suffix)
final_tensor_name = 'results'

######### Build your graph and train ########
## Your tensorflow code to build the graph
#############################################

output_filename = 'output_graph.pb'
output_graph_def = sess.graph.as_graph_def()
with gfile.FastGFile(output_filename, 'wb') as f:
    f.write(output_graph_def.SerializeToString())
If you want to convert your trained variables to constants (to avoid using ckpt files to load the weights), you can use:
output_graph_def = graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), [final_tensor_name])
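To sanity-check the exported file, you can load it back into a fresh graph (a quick sketch; the filename matches the code above):
import tensorflow as tf
from tensorflow.python.platform import gfile

with tf.Graph().as_default():
    with gfile.FastGFile('output_graph.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')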
Hope this helps!

ExtraTreesClassifier with sparse training data?

I am trying to use an ExtraTreesClassifier with sparse data, as per the documentation; however, I get a runtime TypeError asking for dense data. This is on scikit-learn 0.17.1, and below I am quoting from the documentation:
Parameters:
X : array-like or sparse matrix of shape = [n_samples, n_features]
The code is quite simple:
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.ensemble import ExtraTreesClassifier

features = np.array([[1, 0], [0, 1], [3, 4]])
sparse_features = csr_matrix(features)
labels = np.array([0, 1, 0])

classifier = ExtraTreesClassifier()
classifier.fit(sparse_features, labels)
And here is the exception: TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array. This works fine when passing in the dense features.
Is the documentation out of date, or is there something wrong with the above code?
Any help will be greatly appreciated. Thank you.
Quoting the documentation:
Internally, it will be converted to dtype=np.float32 and if a sparse matrix is provided to a sparse csc_matrix.
So I expect that passing a csc_matrix should help.
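For the snippet above, that is a one-line change (a sketch):
classifier.fit(sparse_features.tocsc(), labels)  # convert CSR to CSC before fitting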
On my setup both versions work normally (csc and csr, sklearn 0.17.1), so I assume the problem could be with older versions of scipy.
