Get random gamma distribution in tensorflow like numpy.random.gamma - python-3.x

Hi, I am new to TensorFlow and I am trying to generate a random gamma distribution in TensorFlow, just like numpy.random.gamma.
My NumPy code is:
self._lambda = 1 * np.random.gamma(100., 1. / 100, (self.n_topic, self.n_voca))
where n_topic=240 and n_voca=198.
My TensorFlow code is:
self._tf_lambda = tf.random_gamma((self.n_topic, self.n_voca),1, dtype=tf.float32, seed=0, name='_tf_lambda')
Is this a correct implementation? I believe I have misunderstood the parameters of tf.random_gamma, because self._lambda != self._tf_lambda.

You are setting different shape parameters in the two distributions (100 in the NumPy call, 1 in the TensorFlow call), so it is expected that they differ.
One thing to watch out for is that NumPy takes a "scale" parameter while TensorFlow takes an "inverse scale" parameter (beta), so one of them has to be inverted to get the same distribution.
Jupyter notebook example with matching distributions:
%matplotlib inline
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
size = (50000,)
shape_parameter = 1.5
scale_parameter = 0.5
bins = np.linspace(-1, 5, 30)
np_res = np.random.gamma(shape=shape_parameter, scale=scale_parameter, size=size)
# Note the 1/scale_parameter here
tf_op = tf.random_gamma(shape=size, alpha=shape_parameter, beta=1/scale_parameter)
with tf.Session() as sess:
    tf_res = sess.run(tf_op)
plt.hist(tf_res, bins=bins, alpha=0.5);
plt.hist(np_res, bins=bins, alpha=0.5);
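Applied to the parameters from the question, the same conversion can be checked without TensorFlow. This is only a sketch: it uses NumPy's Generator API, the names alpha, scale and beta are chosen for illustration, and the TensorFlow call appears only as a comment (written with the TF 2 name tf.random.gamma, which is an assumption about the installed version).

```python
import numpy as np

n_topic, n_voca = 240, 198      # sizes from the question
alpha = 100.0                   # shape parameter (NumPy: shape, TF: alpha)
scale = 1.0 / 100.0             # NumPy's scale parameter
beta = 1.0 / scale              # TF's inverse scale (rate), i.e. 100

rng = np.random.default_rng(0)  # seed chosen only for repeatability
lam = rng.gamma(shape=alpha, scale=scale, size=(n_topic, n_voca))

# A gamma distribution has mean alpha * scale and variance alpha * scale**2,
# so these samples should have a mean near 1.0 and a std near 0.1.
print(lam.shape, lam.mean(), lam.std())

# The corresponding TensorFlow draw would pass the rate, not the scale:
# tf.random.gamma((n_topic, n_voca), alpha=alpha, beta=beta)
```

Comparing the sample mean and standard deviation is a cheaper sanity check than overlaying histograms, although it does not compare the full shape of the two distributions.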

Related

Fastai for time series regression

I have been using the fastai library for a couple of years now. Recently, I came upon tsai, an extension library dedicated to time series analysis.
I am trying to perform a simple regression task on the famous AirPassengers dataset.
I have no idea what I am doing wrong:
import numpy as np # linear algebra
import torch
import random
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
# fastai
from fastai import *
from fastai.text import *
from fastai.text.all import *
from tsai.all import *
flight_data = sns.load_dataset("flights")
flight_data.head(20)
scaler = MinMaxScaler(feature_range=(-1, 1))
# flight_data['passengers'] = scaler.fit_transform(flight_data['passengers'].values.reshape(-1, 1)).flatten()
plt.figure(figsize=(10, 4))
plt.plot(flight_data['passengers'])
def create_inout_sequences(input_data, tw):
    inout_seq = []
    label_seq = []
    L = len(input_data)
    for i in range(L-tw):
        train_seq = input_data[i:i+tw]
        train_label = input_data[i+tw:i+tw+1]
        inout_seq.append(train_seq)
        label_seq.append(train_label)
    return np.array(inout_seq), np.array(label_seq)
data = flight_data['passengers'].values
x, y = create_inout_sequences(data, 15)
src = itemify(x, y)
yy = y.reshape(-1)
xx = x.reshape(-1)
tfms = [None, [TSRegression()]]
batch_tfms = TSStandardize(by_sample=True, by_var=True)
dls = get_ts_dls(x, yy, tfms=tfms, bs=64)
dls.show_batch()
dls.one_batch()
dls.c
learn = ts_learner(dls, InceptionTime, metrics=[mae, rmse], cbs=ShowGraph())
learn.lr_find()

How to save Confusion Matrix plot so that I can call it for future reference?

I was using this latest function, sklearn.metrics.plot_confusion_matrix to plot my confusion matrix.
cm = plot_confusion_matrix(classifier,X , y_true,cmap=plt.cm.Greens)
When I executed that cell, the confusion matrix plot showed up as expected. My problem is that I want to use the plot in another cell later. When I call cm in another cell, it only shows the location of that object.
>>> cm
>>> <sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x1af790ac6a0>
Calling plt.show() doesn't work either.
To redraw the plot in a later cell, call cm.plot().
Proof
Let's try to do it in a reproducible fashion:
import numpy as np
import matplotlib.pyplot as plt
# Note: plot_confusion_matrix was removed in scikit-learn 1.2
from sklearn.metrics import plot_confusion_matrix
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
np.random.seed(42)
X, y = make_classification(1000, 10, n_classes=2)
clf = RandomForestClassifier()
clf.fit(X,y)
cm = plot_confusion_matrix(clf, X , y, cmap=plt.cm.Greens)
You can plot your cm object later as:
cm.plot(cmap=plt.cm.Greens);
For reference, you can list the attributes and methods available on the cm object with:
[method for method in dir(cm) if not method.startswith("__")]
['ax_',
'confusion_matrix',
'display_labels',
'figure_',
'im_',
'plot',
'text_']
To save the plot to a file, use the figure_ attribute:
cm.figure_.savefig('conf_mat.png',dpi=300)

How to extract rows and columns from a 3D array in Tensorflow

I want to perform the following indexing operations on a TensorFlow tensor.
What are the equivalent operations in TensorFlow to get b and c as output? Although the tf.gather_nd documentation has several examples, I could not construct an equivalent indices tensor to get these results.
import tensorflow as tf
import numpy as np
a=np.arange(18).reshape((2,3,3))
idx=[2,0,1] #it can be any valid re-ordering index list
#These are the two numpy operations that I want to do in Tensorflow
b=a[:,idx,:]
c=a[:,:,idx]
# TensorFlow operations
aT=tf.constant(a)
idxT=tf.constant(idx)
# what should be these two indices
idx1T=tf.reshape(idxT, (3,1))
idx2T=tf.reshape(idxT, (1,1,3))
bT=tf.gather_nd(aT, idx1T ) #does not work
cT=tf.gather_nd(aT, idx2T) #does not work
with tf.Session() as sess:
    b1,c1=sess.run([bT,cT])
print(np.allclose(b,b1))
print(np.allclose(c,c1))
I am not restricted to tf.gather_nd. Any other suggestion for achieving the same operations on a GPU would be helpful.
Edit: I have updated the question to fix a typo:
Old statement: c=a[:,idx]
New statement: c=a[:,:,idx]
What I wanted to achieve was re-ordering of the columns as well.
That can be done with tf.gather, using the axis parameter:
import tensorflow as tf
import numpy as np
a = np.arange(18).reshape((2,3,3))
idx = [2,0,1]
b = a[:, idx, :]
c = a[:, :, idx]
aT = tf.constant(a)
idxT = tf.constant(idx)
bT = tf.gather(aT, idxT, axis=1)
cT = tf.gather(aT, idxT, axis=2)
with tf.Session() as sess:
    b1, c1=sess.run([bT, cT])
print(np.allclose(b, b1))
print(np.allclose(c, c1))
Output:
True
True
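To see what the axis argument is doing, it can help to look at the NumPy counterpart, np.take, which selects whole slices along one axis in the same way. The sketch below is NumPy only; b_take and c_take are names made up for this example.

```python
import numpy as np

a = np.arange(18).reshape((2, 3, 3))
idx = [2, 0, 1]

# np.take with an axis argument picks whole slices along that axis,
# which is the operation tf.gather(aT, idxT, axis=...) performs on tensors.
b_take = np.take(a, idx, axis=1)   # same as a[:, idx, :] (re-orders rows)
c_take = np.take(a, idx, axis=2)   # same as a[:, :, idx] (re-orders columns)

print(np.array_equal(b_take, a[:, idx, :]))
print(np.array_equal(c_take, a[:, :, idx]))
print(b_take[0])
print(c_take[0])
```

tf.gather_nd, by contrast, expects each index entry to address a position starting from the leading dimensions, which is why reshaping idx alone did not reproduce these slices.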

Unstable behavior of OneClassSVM by changing 'nu'

In the example above, I'm using my dataset to identify outliers. After making slight changes to the nu parameter, there is a huge difference in the number of anomalies identified.
Could this be just a particularity of the dataset? Or a bug in scikit-learn?
P.S. Unfortunately I cannot share the dataset.
If you decrease the value of the tol parameter of OneClassSVM, the result is better, although still not exactly as expected for low values of nu.
import numpy as np
from sklearn.svm import OneClassSVM
import matplotlib.pyplot as plt
X = np.random.rand(100, 1)
nus = np.geomspace(0.0001, 0.5, num=100)
outlier_fraction = np.zeros(len(nus))
for i, nu in enumerate(nus):
    outlier_fraction[i] = (OneClassSVM(nu=nu, tol=1e-12).fit_predict(X) == -1).mean()
plt.plot(nus, outlier_fraction)
plt.xlabel('nu')
plt.ylabel('Outlier fraction')
plt.show()
With the default tol you obtain the following
NOTE: not an answer, just offering an MCVE.
I also came across this recently. I would like to understand the inflection point at low values of nu.
import numpy as np
import pandas as pd
from sklearn.svm import OneClassSVM
X = np.random.rand(100, 1)
nu = np.geomspace(0.0001, 1, num=100)
df = pd.DataFrame(data={'nu': nu})
for i in range(len(df)):  # one fit per nu value
    df.loc[i, 'anom_count'] = (OneClassSVM(nu=df.loc[i, 'nu']).fit_predict(X) == -1).sum()
df.set_index('nu').plot();
df.set_index('nu').plot(xlim=(0, 0.2));
df.anom_count.min() # 3
df.anom_count.idxmin() # 62
df.loc[df.anom_count.idxmin(), 'nu'] # 0.031
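The scikit-learn documentation describes nu as an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, which is why one would expect the outlier fraction to roughly track nu. A small side-by-side sketch of the two tolerance settings discussed above follows; the seed, the handful of nu values and the outlier_fraction helper are assumptions made for illustration, and no particular output is claimed.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)   # fixed seed, chosen only for repeatability
X = rng.random((100, 1))

def outlier_fraction(nu, **svm_kwargs):
    """Fraction of training points that OneClassSVM labels as outliers (-1)."""
    labels = OneClassSVM(nu=nu, **svm_kwargs).fit_predict(X)
    return float((labels == -1).mean())

for nu in (0.001, 0.01, 0.05, 0.2):
    default_tol = outlier_fraction(nu)            # library default tolerance
    tight_tol = outlier_fraction(nu, tol=1e-12)   # tolerance used in the answer above
    print(f"nu={nu:<6} default tol: {default_tol:.2f}   tol=1e-12: {tight_tol:.2f}")
```

Printing both columns for the same data makes it easier to see at which nu values the solver tolerance, rather than the data, drives the count.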

Sklearn kmeans equivalent of elbow method

Let's say I'm examining up to 10 clusters. With SciPy, I usually generate the 'elbow' plot as follows:
from scipy import cluster
cluster_array = [cluster.vq.kmeans(my_matrix, i) for i in range(1,10)]
pyplot.plot([var for (cent,var) in cluster_array])
pyplot.show()
I have since become motivated to use sklearn for clustering; however, I'm not sure how to create the array needed to plot as in the SciPy case. My best guess was:
from sklearn.cluster import KMeans
km = [KMeans(n_clusters=i) for i range(1,10)]
cluster_array = [km[i].fit(my_matrix)]
That unfortunately resulted in an invalid command error. What is the best way to go about this with sklearn?
Thank you.
You can use the inertia_ attribute of the KMeans class.
Assuming X is your dataset:
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
X = ...  # <your_data>
distorsions = []
for k in range(2, 20):
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(X)
    distorsions.append(kmeans.inertia_)
fig = plt.figure(figsize=(15, 5))
plt.plot(range(2, 20), distorsions)
plt.grid(True)
plt.title('Elbow curve')
You had some syntax problems in the code. They should be fixed now:
Ks = range(1, 10)
km = [KMeans(n_clusters=i) for i in Ks]
score = [km[i].fit(my_matrix).score(my_matrix) for i in range(len(km))]
The fit method just returns the estimator itself (self). In this line of the original code
cluster_array = [km[i].fit(my_matrix)]
the cluster_array would end up having the same contents as km.
You can use the score method to get an estimate of how well the clustering fits. To see the score for each number of clusters, simply run plot(Ks, score).
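The two approaches above are closely related: KMeans.score returns the negative of the k-means objective on the data it is given, so on the training data it is the negative of inertia_. A short sketch of that relationship, using random placeholder data for my_matrix and fixed n_init and random_state values (all assumptions made for this example):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
my_matrix = rng.normal(size=(300, 2))   # placeholder data, not from the question

Ks = range(1, 10)
inertias = []
scores = []
for k in Ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(my_matrix)
    inertias.append(km.inertia_)          # within-cluster sum of squared distances
    scores.append(km.score(my_matrix))    # negative of that objective

# The two lists should mirror each other up to floating-point rounding.
print(np.allclose(-np.array(scores), inertias, rtol=1e-3))
```

Either list can be passed to pyplot.plot(Ks, ...) for the elbow plot; with score the curve rises toward zero instead of falling.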
You can also use the average Euclidean distance between each data point and its nearest cluster center to evaluate how many clusters to choose. Here is a code example.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
iris = load_iris()
x = iris.data
res = list()
n_cluster = range(2,20)
for n in n_cluster:
    kmeans = KMeans(n_clusters=n)
    kmeans.fit(x)
    res.append(np.average(np.min(cdist(x, kmeans.cluster_centers_, 'euclidean'), axis=1)))
plt.plot(n_cluster, res)
plt.title('elbow curve')
plt.show()
