In Distributions.jl I can specify the component distributions of a mixture model, but I don't see how to specify the weights. For example, if I want to make a mixture like this:
w1 * pdf.(Normal(2, 3), x) .+ w2 * pdf.(Normal(5, 10), x)
I cannot really specify the weights w1 and w2, which of course are required to add up to 1.
So, is there a way to specify the weights in MixtureModel?
Something like:
MixtureModel(Normal[
    Normal(2, 3),
    Normal(5, 10)
], weights=[w1, w2])
Thanks
This is covered in the Distributions.jl documentation on mixture model constructors — you want the prior argument. See
https://juliastats.org/Distributions.jl/v0.14/mixture.html#Constructors-1
Here's a quick plot of their first example. The [0.2, 0.5, 0.3] are the weights:
julia> using Distributions, Plots
julia> d = MixtureModel(Normal[
           Normal(-2.0, 1.2),
           Normal(0.0, 1.0),
           Normal(3.0, 2.5)], [0.2, 0.5, 0.3])
MixtureModel{Normal}(K = 3)
components[1] (prior = 0.2000): Normal{Float64}(μ=-2.0, σ=1.2)
components[2] (prior = 0.5000): Normal{Float64}(μ=0.0, σ=1.0)
components[3] (prior = 0.3000): Normal{Float64}(μ=3.0, σ=2.5)
julia> x = -10:0.1:10
-10.0:0.1:10.0
julia> plot(x, pdf.(d, x), legend=nothing, xlabel="x", ylabel="pdf")
Which produces a plot of the mixture density over x.
Related
For Python 3.8 and TensorFlow 2.5, I have a 3-D tensor of shape (3, 3, 3) where the goal is to compute the L2-norm for each of the three (3, 3) square matrices. The code that I came up with is:
a = tf.random.normal(shape = (3, 3, 3))
a.shape
# TensorShape([3, 3, 3])
a.numpy()
'''
array([[[-0.30071023, 0.9958398 , -0.77897555],
[-1.4251901 , 0.8463568 , -0.6138699 ],
[ 0.23176959, -2.1303613 , 0.01905925]],
[[-1.0487134 , -0.36724553, -1.0881581 ],
[-0.12025198, 0.20973174, -2.1444907 ],
[ 1.4264063 , -1.5857363 , 0.31582597]],
[[ 0.8316077 , -0.7645084 , 1.5271858 ],
[-0.95836663, -1.868056 , -0.04956183],
[-0.16384012, -0.18928945, 1.04647 ]]], dtype=float32)
'''
I am using axis = 2 since the 3rd axis should contain three 3x3 square matrices. The output I get is:
tf.math.reduce_euclidean_norm(input_tensor = a, axis = 2).numpy()
'''
array([[1.299587 , 1.7675754, 2.1430166],
[1.5552354, 2.158075 , 2.15614 ],
[1.8995634, 2.1001325, 1.0759989]], dtype=float32)
'''
How are these values computed? The formula for computing the L2 norm is the square root of the sum of the squared entries. What am I missing?
Also, I was expecting three L2-norm values, one for each of the three (3, 3) matrices. The code I have to achieve this is:
tf.math.reduce_euclidean_norm(a[0]).numpy()
# 3.0668826
tf.math.reduce_euclidean_norm(a[1]).numpy()
# 3.4241767
tf.math.reduce_euclidean_norm(a[2]).numpy()
# 3.0293021
Is there any better way to get this without having to explicitly refer to each index of tensor 'a'?
Thanks!
The formula you linked for computing the L2 norm is correct. With axis = 2 you only reduce the last axis, so each entry of the (3, 3) output is the L2 norm of a single length-3 row; that is how those values are computed. Your per-matrix calls, on the other hand, are basically doing this:
np.sqrt(np.sum((a[0]**2)))
# 3.0668826
np.sqrt(np.sum((a[1]**2)))
# 3.4241767
np.sqrt(np.sum((a[2]**2)))
# 3.0293021
This can be vectorized by the following:
np.sqrt(np.sum(a**2, axis=(1,2)))
Output:
array([3.0668826, 3.4241767, 3.0293021], dtype=float32)
Which is effectively the same as using np.linalg.norm (or tf.math.reduce_euclidean_norm if you want to stay in TensorFlow):
np.linalg.norm(a, ord=None, axis=(1,2))
Output:
array([3.0668826, 3.4241767, 3.0293021], dtype=float32)
The default keyword ord=None computes the L2 norm, per the documentation. The axis keyword specifies which dimensions we want to reduce, which should be clear from the first code snippet.
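For completeness, a minimal sketch of both points; the row values are copied from the printout in the question, so treat them as illustrative only:
import numpy as np
import tensorflow as tf

# One entry of the (3, 3) result from axis=2 is just the norm of one length-3 row:
row = np.array([-0.30071023, 0.9958398, -0.77897555])  # a[0, 0] from the question
print(np.sqrt(np.sum(row**2)))  # ~1.299587, the first entry of that output

# One norm per (3, 3) matrix, no indexing a[0], a[1], a[2] by hand:
a = tf.random.normal(shape=(3, 3, 3))
print(tf.math.reduce_euclidean_norm(a, axis=[1, 2]).numpy())  # shape (3,)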
Hello, I'm doing a GridSearchCV and I'm printing the results with the .cv_results_ attribute from scikit-learn.
My problem is that when I evaluate the mean over all the test score splits by hand, I obtain a different number from what is written in 'mean_test_score'. Is it computed differently from the standard np.mean()?
I attach here the code with the result:
n_estimators = [100]
max_depth = [3]
learning_rate = [0.1]
param_grid = dict(max_depth=max_depth, n_estimators=n_estimators, learning_rate=learning_rate)
gkf = GroupKFold(n_splits=7)
grid_search = GridSearchCV(model, param_grid, scoring=score_auc, cv=gkf)
grid_result = grid_search.fit(X, Y, groups=patients)
grid_result.cv_results_
The result of this operation is:
{'mean_fit_time': array([ 8.92773601]),
'mean_score_time': array([ 0.04288721]),
'mean_test_score': array([ 0.83490629]),
'mean_train_score': array([ 0.95167036]),
'param_learning_rate': masked_array(data = [0.1],
mask = [False],
fill_value = ?),
'param_max_depth': masked_array(data = [3],
mask = [False],
fill_value = ?),
'param_n_estimators': masked_array(data = [100],
mask = [False],
fill_value = ?),
'params': ({'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100},),
'rank_test_score': array([1]),
'split0_test_score': array([ 0.74821666]),
'split0_train_score': array([ 0.97564995]),
'split1_test_score': array([ 0.80089016]),
'split1_train_score': array([ 0.95361201]),
'split2_test_score': array([ 0.92876979]),
'split2_train_score': array([ 0.93935856]),
'split3_test_score': array([ 0.95540287]),
'split3_train_score': array([ 0.94718634]),
'split4_test_score': array([ 0.89083901]),
'split4_train_score': array([ 0.94787374]),
'split5_test_score': array([ 0.90926355]),
'split5_train_score': array([ 0.94829775]),
'split6_test_score': array([ 0.82520379]),
'split6_train_score': array([ 0.94971417]),
'std_fit_time': array([ 1.79167576]),
'std_score_time': array([ 0.02970254]),
'std_test_score': array([ 0.0809713]),
'std_train_score': array([ 0.0105566])}
As you can see, taking np.mean of all the split test scores gives a value of approximately 0.8655122606479532, while 'mean_test_score' is 0.83490629.
Thanks for your help,
Leonardo.
I will post this as a new answer since it's so much code:
The test and train scores of the folds are: (taken from the results you posted in your question)
test_scores = [0.74821666,0.80089016,0.92876979,0.95540287,0.89083901,0.90926355,0.82520379]
train_scores = [0.97564995,0.95361201,0.93935856,0.94718634,0.94787374,0.94829775,0.94971417]
The numbers of training and test samples in those folds are: (taken from the output of print([(len(train), len(test)) for train, test in gkf.split(X, groups=patients)]))
train_len = [41835, 56229, 56581, 58759, 60893, 60919, 62056]
test_len = [24377, 9983, 9631, 7453, 5319, 5293, 4156]
The test and train means, weighted by the number of samples per fold, are then:
train_avg = np.average(train_scores, weights=train_len)
# 0.95064898361714389
test_avg = np.average(test_scores, weights=test_len)
# 0.83490628649308296
So this is exactly the value sklearn gives you, and it is also the correct mean score for your classification. The plain mean of the fold scores is incorrect in that it depends on the somewhat arbitrary splits/folds you chose.
So in conclusion, both explanations were indeed identical and correct.
If you look at the original code of GridSearchCV in the scikit-learn GitHub repository, you'll see that they don't use np.mean(); instead they use np.average() with weights. Hence the difference. Here's their code:
n_splits = 3
test_sample_counts = np.array(test_sample_counts[:n_splits],
                              dtype=np.int)
weights = test_sample_counts if self.iid else None
means = np.average(test_scores, axis=1, weights=weights)
stds = np.sqrt(np.average((test_scores - means[:, np.newaxis]) ** 2,
                          axis=1, weights=weights))

cv_results = dict()
for split_i in range(n_splits):
    cv_results["split%d_test_score" % split_i] = test_scores[:, split_i]
cv_results["mean_test_score"] = means
cv_results["std_test_score"] = stds
In case you want to know more about the difference between them, take a look at
Difference between np.mean() and np.average()
I suppose the reason for the different means is the different weighting factors in the mean calculation.
The mean_test_score that sklearn returns is the mean calculated on all samples where each sample has the same weight.
If you calculate the mean by taking the mean of the folds (splits), then you only get the same result if the folds are all of equal size. If they are not, then each sample from a larger fold automatically has a smaller impact on that mean than a sample from a smaller fold, and vice versa.
Small numeric example:
mean([2, 3, 5, 8, 9]) = 5.4   # mean over all samples ('mean_test_score')
mean([2, 3, 5]) = 3.33        # mean of fold 1
mean([8, 9]) = 8.5            # mean of fold 2
mean([3.33, 8.5]) = 5.92      # mean of the fold means
5.4 != 5.92
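A quick sketch with NumPy to reproduce this, and to show that weighting the fold means by fold size recovers the overall mean (the numbers are the toy example above, not your actual scores):
import numpy as np

samples = [2, 3, 5, 8, 9]
fold1, fold2 = [2, 3, 5], [8, 9]

print(np.mean(samples))                               # 5.4, mean over all samples
print(np.mean([np.mean(fold1), np.mean(fold2)]))      # ~5.92, unweighted mean of the fold means
print(np.average([np.mean(fold1), np.mean(fold2)],
                 weights=[len(fold1), len(fold2)]))   # 5.4, weighting by fold size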
I am trying to construct 2 numpy ndarrays from a networkx graph's data structures, which look like a list of tuples and a simple list. I would like to make a ROC curve where
the validation set is the above-mentioned list of tuples of the edges of a graph G, which I was trying to construct like this:
x = []
for i in G_orig.nodes():
    for j in G_orig.nodes():
        if j > i and (i, j) not in G.edges():
            if (i, j) in G_orig.edges():
                x.append((i, j, 1))
            else:
                x.append((i, j, 0))
y_validation = np.array(x)
It looks something like this: [(1, 344, 1), (2, 23, 0), (3, 5, 0), ...... (333, 334, 1)].
The first two numbers are node ids; the third one indicates whether there is an edge between them (1 means edge, 0 means no edge).
Then roc_curve expects something called y_score in the documentation. I have a list for that, made with a method called preferential attachment, so I named it pref_att_types. I made a numpy array of it in case roc_curve only accepts arrays.
positive_class_predicted_probabilities = np.array(pref_att_types)
Then I just did what we used in class:
FPRs, TPRs, thresholds = roc_curve(y_validation,
positive_class_predicted_probabilities,
pos_label=1)
It is literally just Ctrl+C / Ctrl+V, but it raises a ValueError saying 'multiclass-multioutput format is not supported'. Please note that I am not a programmer, just someone studying to be a mathematics analyst.
The first argument, y_true, needs to be just the true labels, in this case the 0/1 values without the pair of nodes. Just make sure that the indices of the arrays y_validation and pref_att_types match.
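A minimal sketch of what that could look like, assuming x and pref_att_types are ordered the same way (variable names taken from your snippets):
import numpy as np
from sklearn.metrics import roc_curve

y_validation = np.array(x)          # rows are (i, j, label)
y_true = y_validation[:, 2]         # keep only the 0/1 label column
y_score = np.array(pref_att_types)  # one score per (i, j) pair, in the same order

FPRs, TPRs, thresholds = roc_curve(y_true, y_score, pos_label=1)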
The code below draws the ROC curves for two RF models:
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# create arrays of predicted probabilities
y_test_predict1_probaRF = rf1.predict_proba(X_test)
y_test_predict2_probaRF = rf2.predict_proba(X_test)
RFfpr1, RFtpr1, thresholds = roc_curve(y_test, y_test_predict1_probaRF[:,1])
RFfpr2, RFtpr2, thresholds = roc_curve(y_test, y_test_predict2_probaRF[:,1])
def plot_roc_curve(fpr, tpr, label=None):
    plt.plot(fpr, tpr, linewidth=2, label=label)
    plt.plot([0, 1], [0, 1], "k--")
    plt.axis([0, 1, 0, 1])
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")

plot_roc_curve(RFfpr1, RFtpr1, "RF1")
plot_roc_curve(RFfpr2, RFtpr2, "RF2")
plt.legend()
plt.show()
I am working on the 'Keras_Tutorial_v2a' by Andrew Ng on Coursera and I am confused about the axis parameter in keras.layers.BatchNormalization().
The first few layers of the model are:
X_input = Input(input_shape)
X = Conv2D(32, (3, 3), strides = (1, 1), name = 'conv0')(X_input)
X = BatchNormalization(axis = 3, name = 'bn0')(X)
Where input_shape is the shape of the images of the dataset: (height, width, channels). So it seems that axis=3 refers to the channels, but shouldn't that be axis=2? I couldn't find documentation specifying this, but usually in Python indices and axes begin at 0.
So either axes begins at 1 in this function, or there is something I am missing. Can anyone clarify this for me please? I'm sure it's something simple!
In tutorials and in the Keras/TensorFlow codebase you will see axis = 3 or axis = -1. This is what should be chosen, because the layer sees tensors of shape (batch_size, height, width, channels), so the channel axis is 3 (or the last one, -1).
If you look at the original documentation, the default is -1, which here is the same as 3.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization
In Keras, dimensions are ordered as (batch_size, height, width, channels), so channels sit on axis=3. You want to choose the axis index that represents your channels, as you stated.
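A quick way to convince yourself, sketched with a hypothetical 64x64 RGB input (the shapes are my assumption, not from the tutorial):
import tensorflow as tf

# Keras prepends a batch dimension to input_shape, so the tensor flowing
# through the model has shape (batch_size, height, width, channels):
x = tf.keras.Input(shape=(64, 64, 3))
print(x.shape)  # (None, 64, 64, 3) -> channels are on axis 3, i.e. axis -1

y = tf.keras.layers.BatchNormalization(axis=-1)(x)  # equivalent to axis=3 here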
I want to add word dropout to my network so that I can have sufficient training examples for training the embedding of the "unk" token. As far as I'm aware, this is standard practice. Let's assume the index of the unk token is 0, and the index for padding is 1 (we can switch them if that's more convenient).
This is a simple CNN network which implements word dropout the way I would have expected it to work:
class Classifier(nn.Module):
    def __init__(self, params):
        super(Classifier, self).__init__()
        self.params = params
        self.word_dropout = nn.Dropout(params["word_dropout"])
        self.pad = torch.nn.ConstantPad1d(max(params["window_sizes"]) - 1, 1)
        self.embedding = nn.Embedding(params["vocab_size"], params["word_dim"], padding_idx=1)
        self.convs = nn.ModuleList([nn.Conv1d(1, params["feature_num"], params["word_dim"] * window_size, stride=params["word_dim"], bias=False) for window_size in params["window_sizes"]])
        self.dropout = nn.Dropout(params["dropout"])
        self.fc = nn.Linear(params["feature_num"] * len(params["window_sizes"]), params["num_classes"])

    def forward(self, x, l):
        x = self.word_dropout(x)
        x = self.pad(x)
        embedded_x = self.embedding(x)
        embedded_x = embedded_x.view(-1, 1, x.size()[1] * self.params["word_dim"])  # [batch_size, 1, seq_len * word_dim]
        features = [F.relu(conv(embedded_x)) for conv in self.convs]
        pooled = [F.max_pool1d(feat, feat.size()[2]).view(-1, self.params["feature_num"]) for feat in features]
        pooled = torch.cat(pooled, 1)
        pooled = self.dropout(pooled)
        logit = self.fc(pooled)
        return logit
Don't mind the padding - PyTorch doesn't have an easy way of using non-zero padding in CNNs, much less trainable non-zero padding, so I'm doing it manually. Dropout also doesn't let me fill with a non-zero value, and I want to keep the padding token separate from the unk token. I'm keeping it in my example because it's the reason for this question's existence.
This doesn't work because dropout wants FloatTensors so that it can scale them properly, while my input consists of LongTensors that don't need to be scaled.
Is there an easy way of doing this in PyTorch? I essentially want LongTensor-friendly dropout (bonus: even better if it lets me specify a dropout constant that isn't 0, so that I can use zero padding).
Actually I would do it outside of your model, before converting your input into a LongTensor.
This would look like this:
import random

def add_unk(input_token_id, p):
    # random.random() gives you a value between 0 and 1
    # to avoid switching your padding to 0 we add 'input_token_id > 1'
    if random.random() < p and input_token_id > 1:
        return 0
    else:
        return input_token_id

# then you have your input token id
# for this example I take just a random number, let's say 127
input_token_id = 127

# let p be your probability for UNK
p = 0.01

your_input_tensor = torch.LongTensor([add_unk(input_token_id, p)])
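Applied to a whole (hypothetical) list of token ids instead of a single one, this could look like:
token_ids = [127, 5, 89, 1, 1]  # hypothetical sentence, 1 = padding
p = 0.01
your_input_tensor = torch.LongTensor([add_unk(t, p) for t in token_ids])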
Edit:
So there are two options which come to my mind which are actually GPU-friendly. In general both solutions should be much more efficient.
Option one - doing the computation directly in forward():
If you're not using torch.utils and don't have plans to use it later, this is probably the way to go.
Instead of doing the computation beforehand, we just do it in the forward() method of the main PyTorch class. However, I see no (simple) way of doing this in torch 0.3.1, so you would need to upgrade to version 0.4.0:
So imagine x is your input vector:
>>> x = torch.tensor(range(10))
>>> x
tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
probs is a vector containing uniform random values so that we can later check them against our dropout probability:
>>> probs = torch.empty(10).uniform_(0, 1)
>>> probs
tensor([ 0.9793, 0.1742, 0.0904, 0.8735, 0.4774, 0.2329, 0.0074,
0.5398, 0.4681, 0.5314])
Now we apply the dropout probabilities probs on our input x:
>>> torch.where(probs > 0.2, x, torch.zeros(10, dtype=torch.int64))
tensor([ 0, 0, 0, 3, 4, 5, 0, 7, 8, 9])
Note: To see some effect I chose a dropout probability of 0.2 here. In reality you probably want it to be smaller.
You can pick any token id you like for this; here is an example with 42 as the unknown token id:
>>> unk_token = 42
>>> torch.where(probs > 0.2, x, torch.empty(10, dtype=torch.int64).fill_(unk_token))
tensor([ 0, 42, 42, 3, 4, 5, 42, 7, 8, 9])
torch.where comes with PyTorch 0.4.0:
https://pytorch.org/docs/master/torch.html#torch.where
I don't know about the shapes of your network, but your forward() should look something like this then (when using mini-batching you need to flatten the input before applying dropout):
def forward_train(self, x, l):
    # probabilities
    probs = torch.empty(x.size(0)).uniform_(0, 1)
    # applying word dropout
    x = torch.where(probs > 0.02, x, torch.zeros(x.size(0), dtype=torch.int64))

    # continue like before ...
    x = self.pad(x)
    embedded_x = self.embedding(x)
    embedded_x = embedded_x.view(-1, 1, x.size()[1] * self.params["word_dim"])  # [batch_size, 1, seq_len * word_dim]
    features = [F.relu(conv(embedded_x)) for conv in self.convs]
    pooled = [F.max_pool1d(feat, feat.size()[2]).view(-1, self.params["feature_num"]) for feat in features]
    pooled = torch.cat(pooled, 1)
    pooled = self.dropout(pooled)
    logit = self.fc(pooled)
    return logit
Note: I named the function forward_train(), so you should use a separate forward() without dropout for evaluation / prediction. But you could also use an if condition on the module's training flag (set by train() / eval()).
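On a more recent PyTorch version, that if-condition idea can also be packaged as a small module that respects train()/eval() and leaves the padding id alone; this is just a hedged sketch of the same technique, not part of the original suggestion:
import torch
import torch.nn as nn

class WordDropout(nn.Module):
    """Randomly replaces token ids with unk_id, but only in train() mode."""
    def __init__(self, p=0.02, unk_id=0, pad_id=1):
        super().__init__()
        self.p, self.unk_id, self.pad_id = p, unk_id, pad_id

    def forward(self, x):
        if not self.training:  # no word dropout during evaluation
            return x
        probs = torch.empty_like(x, dtype=torch.float).uniform_(0, 1)
        keep = (probs > self.p) | (x == self.pad_id)  # never replace padding
        return torch.where(keep, x, torch.full_like(x, self.unk_id))

# usage: drop = WordDropout(p=0.02); ids = drop(torch.tensor([[5, 7, 1, 1]]))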
Option two: using torch.utils.data.Dataset:
If you're using the Dataset class provided by torch.utils.data, it is very easy to do this kind of pre-processing efficiently. The DataLoader built on top of a Dataset can use multi-processing (via num_workers), so the code sample above just has to be executed in the __getitem__ method of your Dataset class.
This could look like this:
def __getitem__(self, index):
    'Generates one sample of data'
    # Select sample
    ID = self.input_tokens[index]

    # Load data and get label
    # using the add_unk function from the code above
    X = torch.LongTensor([add_unk(ID, p=0.01)])
    y = self.targets[index]

    return X, y
This is a bit out of context and doesn't look very elegant, but I think you get the idea. According to this blog post by Shervine Amidi at Stanford, it should be no problem to do more complex pre-processing steps in this function:
Since our code [the Dataset class] is designed to be multicore-friendly, note that you can do more complex operations instead (e.g. computations from source files) without worrying that data generation becomes a bottleneck in the training process.
The linked blog post - "A detailed example of how to generate your data in parallel with PyTorch" - also provides a good guide for implementing the data generation with Dataset and DataLoader.
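To tie it together, a hedged sketch of how such a Dataset would typically be consumed (the class name MyTokenDataset and the num_workers value are placeholders, not from the answer):
from torch.utils.data import DataLoader

# MyTokenDataset is assumed to implement the __getitem__ shown above
loader = DataLoader(MyTokenDataset(tokens, targets), batch_size=32,
                    shuffle=True, num_workers=4)  # workers run __getitem__ in parallel
for X_batch, y_batch in loader:
    ...  # training step goes here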
I guess you'll prefer option one - only two lines and it should be very efficient. :)
Good luck!