I am working on SqueezeNet pruning. I have some questions regarding the pruning code, which is based on the paper Pruning Convolutional Neural Networks for Resource Efficient Inference:
def compute_rank(self, grad):
    activation_index = len(self.activations) - self.grad_index - 1
    activation = self.activations[activation_index]
    values = \
        torch.sum((activation * grad), dim=0, keepdim=True).\
        sum(dim=2, keepdim=True).sum(dim=3, keepdim=True)[0, :, 0, 0].data
    values = \
        values / (activation.size(0) * activation.size(2) * activation.size(3))
    if activation_index not in self.filter_ranks:
        self.filter_ranks[activation_index] = \
            torch.FloatTensor(activation.size(1)).zero_().cuda()
    self.filter_ranks[activation_index] += values
    self.grad_index += 1
1) Why does 'values' use only in_height (dim 2) and in_width (dim 3) of the activation? What about in_channels (dim 1)?
2) Why does filter_ranks[activation_index] depend on in_channels (dim 1) only?
3) Why is the activation multiplied with the gradient, and why are they summed up?
A large activation indicates that this filter provides important features.
A large gradient shows that this filter is sensitive to different types of input.
Filters with both a large activation and a large gradient are important and are not removed.
The sum is used because only an entire filter can be removed.
This is an educated guess for question 3); please correct me if I'm wrong.
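To make the dimension bookkeeping concrete, here is a toy sketch of what I understand the code to compute (the NCHW layout and the sizes are my assumptions):

import torch

batch, channels, height, width = 4, 8, 16, 16  # assumed NCHW activation shape
activation = torch.randn(batch, channels, height, width)
grad = torch.randn(batch, channels, height, width)
# Summing over batch (0), height (2) and width (3) leaves one scalar per
# channel, i.e. one rank per filter, which would explain why
# filter_ranks[activation_index] has length activation.size(1)
values = (activation * grad).sum(dim=(0, 2, 3)) / (batch * height * width)
print(values.shape)  # torch.Size([8])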
I'm currently working on an optimization problem and I'm stuck at some point.
Here is a simple case:
I have a set of p parameters in c combinations and try to optimize them on i inputs. This leads to a numpy array of shape (p, c, i). For each combination and input, I calculate an error, leading to a second array of shape (c, i). Now I have to get rid of some combinations before running the next iteration.
For the case i = 1, I use the following code, which works fine (the arrays are just (p, c) and (c,)):
ensemble = np.vstack((ensemble, error))
# Remove some combinations that do not fit
ensemble = ensemble[:, ensemble[p, :] < thres]
# Delete the error row
ensemble = np.delete(ensemble, p, axis=0)
Now I try to generalize this for i > 1:
error[error >= 0.015] = 0
error = error[np.newaxis, :, :]
# Maybe keep them separate
ensemble = np.concatenate((ensemble, error))
And that's where I'm stuck. What I'm currently thinking of is sorting my ensemble by the error and removing all entries where the error is 0 for all i, so that the ensemble gets smaller. However, as far as I know, np.sort does not work here. Maybe I could use structured arrays, but I'm not sure whether that would break code in other places.
Does anyone have an idea for my issue?
Edit
For the case i = 1, a running example with p = 3 and c = 100 could be:
error = np.random.rand(100)
ensemble = np.ones([3, 100])
ensemble = np.vstack((ensemble, error))
ensemble = ensemble[:, ensemble[3, :] < 0.5]
ensemble = np.delete(ensemble, 3, axis=0)
The result in this case is a subset of my ensemble where the error is smaller than 0.5.
For the case i = 2 (multiple ensembles), an example with p = 3 and c = 100 could be:
error = np.random.rand(100, 2)
ensemble = np.ones([3, 100, 2])
error[error >= 0.015] = 0
error = error[np.newaxis, :, :]
ensemble = np.concatenate((ensemble, error))
Again, I want a subset of my ensembles containing all sets of parameters with an error < 0.5. This will require some padding, since each ensemble i will have a different size afterwards.
However, the ultimate goal is to iterate and adjust the parameters until only one set remains, ending up with an array of size (3, 1, 2). (The examples above do not show the iteration.)
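A minimal sketch of the removal step I have in mind, reusing the i = 2 example above (keep is a mask name I made up; the padding problem for per-input removal remains open):

import numpy as np

p, c, i = 3, 100, 2
error = np.random.rand(c, i)
ensemble = np.ones([p, c, i])
error[error >= 0.015] = 0           # mark bad fits with 0, as above
keep = ~(error == 0).all(axis=1)    # drop combinations that are bad for every input i
ensemble = ensemble[:, keep, :]
error = error[keep, :]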
Best, Julz
Suppose I need to build a network that takes two inputs:
A patient's information, represented as an array of features
Selected treatment, represented as one-hot encoded array
Now how do I build a network that outputs a 2D probability matrix A, where A[i,j] represents the probability that the patient will end up in state j under treatment i? Let's say there are n possible states, and under any treatment, the total probability of all n states sums to 1.
I wanted to do this because I was motivated by a similar network, where the inputs are the same as above, but the output is a 1D array representing the expected lifetime after treatment i is delivered. Such a network is built as follows:
import numpy as np
import tensorflow as tf
from tensorflow import keras

def default_dense(feature_shape, n_treatments):
    feature_input = keras.layers.Input(feature_shape)
    treatment_input = keras.layers.Input((n_treatments,))
    hidden_1 = keras.layers.Dense(16, activation='relu')(feature_input)
    hidden_2 = keras.layers.Dense(16, activation='relu')(hidden_1)
    output = keras.layers.Dense(n_treatments)(hidden_2)
    output_on_action = keras.layers.multiply([output, treatment_input])
    model = keras.models.Model([feature_input, treatment_input], output_on_action)
    model.compile(optimizer=tf.optimizers.Adam(0.001), loss='mse')
    return model
And the training is simply
model.fit(x=[features, encoded_treatments], y=encoded_treatments * lifetime[:, np.newaxis], verbose=0)
This is super handy because when predicting, I can use np.ones() as the encoded_treatments, and the network gives the expected lifetimes under all treatments, so choosing the best one takes a single step. Certainly I could create multiple networks, one per treatment, but that would be much less efficient.
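For reference, the one-step prediction I mean looks roughly like this (features, n_treatments and model refer to the arrays and network defined above; best_treatment is a name I made up):

# Score every treatment at once by masking nothing out
all_treatments = np.ones((len(features), n_treatments))
lifetimes = model.predict([features, all_treatments])
best_treatment = lifetimes.argmax(axis=1)  # treatment with the longest expected lifetime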
Now the question is: can I do the same for a probability output?
I have figured it out myself. The trick is to use RepeatVector() and Permute() layers to generate a matrix mask for the treatments.
The output is an element-wise Multiply() of the mask and a Softmax() of the same size.
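A minimal sketch of that idea, assuming n_states possible outcome states and the same 16-unit hidden layers as before (the names probability_dense, logits, probs and mask are mine):

import tensorflow as tf
from tensorflow import keras

def probability_dense(feature_shape, n_treatments, n_states):
    feature_input = keras.layers.Input(feature_shape)
    treatment_input = keras.layers.Input((n_treatments,))
    hidden_1 = keras.layers.Dense(16, activation='relu')(feature_input)
    hidden_2 = keras.layers.Dense(16, activation='relu')(hidden_1)
    # One probability row per treatment: softmax over the states axis,
    # so each row of the (n_treatments, n_states) output sums to 1
    logits = keras.layers.Dense(n_treatments * n_states)(hidden_2)
    logits = keras.layers.Reshape((n_treatments, n_states))(logits)
    probs = keras.layers.Softmax(axis=-1)(logits)
    # Build the treatment mask: repeat the one-hot vector n_states times
    # -> (n_states, n_treatments), then permute -> (n_treatments, n_states)
    mask = keras.layers.RepeatVector(n_states)(treatment_input)
    mask = keras.layers.Permute((2, 1))(mask)
    output = keras.layers.Multiply()([probs, mask])
    model = keras.models.Model([feature_input, treatment_input], output)
    model.compile(optimizer=tf.optimizers.Adam(0.001), loss='mse')
    return model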
I want to use Python 3 to build a zero-inflated Poisson model. I found the class statsmodels.discrete.count_model.ZeroInflatedPoisson in the statsmodels library.
I just wonder how to use it. It seems I should do:
ZeroInflatedPoisson(y_train, x_train).fit()
But when I tried to do prediction using x_test, it told me that the length of x_test doesn't match x_train.
Or is there another package that fits this model?
Here is the code I used:
import random
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from statsmodels.discrete.count_model import ZeroInflatedPoisson

x1 = [random.randint(0, 1) for i in range(200)]
x2 = [random.randint(1, 2) for i in range(200)]
y = np.random.poisson(lam=2, size=100).tolist()
for i in range(100):
    y.append(0)
df = pd.DataFrame({'x1': x1, 'x2': x2, 'y': y})
df_x = df.iloc[:, :-1]
x_train, x_test, y_train, y_test = train_test_split(df_x, df['y'], test_size=0.3)
clf = ZeroInflatedPoisson(endog=y_train, exog=x_train).fit()
clf.predict(x_test)
ValueError: operands could not be broadcast together with shapes (140,) (60,)
I also tried:
clf.predict(x_test, exog=np.ones(len(x_test)))
ValueError: shapes (60,) and (1,) not aligned: 60 (dim 0) != 1 (dim 0)
This looks like a bug to me.
As far as I can see:
If there are no explanatory variables, exog_infl, specified for the inflation model, then an array of ones is used to model a constant inflation probability.
However, if exog_infl in predict is None, then it uses model.exog_infl, which is an array of ones with length equal to the training sample.
As a workaround, specifying a 1-D array of ones of the correct length in predict should work.
Try:
clf.predict(x_test, exog_infl=np.ones(len(x_test)))
I guess the same problem will occur if exposure was used in the model but is not explicitly specified in predict.
I ran into the same problem, landing me on this thread. As noted by Josef, it seems like you need to provide exog_infl with an array of ones of the correct length for predict to work.
However, the code Josef provided misses the explicit array shape, so the full line required to generate the required array is actually
clf.predict(x_test, exog_infl=np.ones((len(x_test), 1)))
The K-means method cannot deal with anisotropic points. According to scikit-learn, DBSCAN and Gaussian mixture models should be able to handle this. I have tried both approaches, but they are not working for my dataset.
DBSCAN
I used the following code:
db = DBSCAN(eps=0.1, min_samples=5).fit(X_train, Y_train)
labels_train=db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels_train)) - (1 if -1 in labels_train else 0)
print('Estimated number of clusters: %d' % n_clusters_)
and only 1 cluster (Estimated number of clusters: 1) was detected as shown here.
Gaussian Mixture model
The code was as follows:
gmm = mixture.GaussianMixture(n_components=2, covariance_type='full')
gmm.fit(X_train, Y_train)
labels_train=gmm.predict(X_train)
print(gmm.bic(X_train))
The two clusters could not be distinguished as shown here.
How can I detect the two clusters?
Read the documentation.
fit(X, y=None, sample_weight=None)
X : array or sparse (CSR) matrix of shape (n_samples, n_features) [...]
...
y : Ignored
So your invocation ignores the y coordinate.
Don't we all love python/sklearn, because it doesn't even warn you of this, but silently ignores y?
X should be the entire data, not just the x coordinates.
The notion of "train" and "predict" does not make sense for clustering. Don't use it. Only use fit_predict.
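A minimal sketch of the corrected invocation (the toy coordinates stand in for the original data; eps and min_samples are just the values from the question):

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(0)
x_coords = rng.randn(200)  # stand-in for X_train
y_coords = rng.randn(200)  # stand-in for Y_train
# Stack BOTH coordinates into one (n_samples, 2) feature matrix
X = np.column_stack((x_coords, y_coords))
labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)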
I am following the RNN tutorial of TensorFlow.
I am having trouble understanding the function ptb_producer in reader.py, in the following script:
with tf.control_dependencies([assertion]):
    epoch_size = tf.identity(epoch_size, name="epoch_size")

i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
x = tf.strided_slice(data, [0, i * num_steps], [batch_size, (i + 1) * num_steps])
x.set_shape([batch_size, num_steps])
y = tf.strided_slice(data, [0, i * num_steps + 1], [batch_size, (i + 1) * num_steps + 1])
y.set_shape([batch_size, num_steps])
return x, y
Can anyone explain what tf.train.range_input_producer is doing?
I have been trying to understand the same tutorial for weeks now. In my opinion, what makes it so difficult is the fact that all the functions one calls from TensorFlow are not executed immediately, but rather add their corresponding operation nodes to the graph.
According to the official documentation, a range input producer 'generates integers from 0 to limit - 1 in a queue'. So, the way I see it, the line in question, i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue(), creates a node which acts as a counter, producing the next number in the sequence 0 to epoch_size - 1 once executed.
This is used to get the next batch from the input data. The raw data is split into batch_size rows, so that in every run batch_size batches are given to the training function. In every batch (row), a sliding window of size num_steps moves forward. The counter i allows the window to move forward by num_steps in every call.
Both x and y are of shape [batch_size, num_steps], since they contain batch_size batches of num_steps steps each. Variable x is the input and y is the expected output for the given input; it is produced by shifting the window one item forward, so that if x = data[i:(i + num_steps)] then y = data[(i + 1):(i + num_steps + 1)].
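To illustrate the slicing in plain NumPy (the shapes are my assumption, following the [batch_size, num_steps] convention above):

import numpy as np

batch_size, num_steps = 2, 3
data = np.arange(batch_size * 13).reshape(batch_size, 13)  # stand-in for the reshaped raw data
epoch_size = (data.shape[1] - 1) // num_steps  # how many windows fit
for i in range(epoch_size):  # i is what the queue's counter produces
    x = data[:, i * num_steps:(i + 1) * num_steps]
    y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]
    print(x.shape, y.shape)  # both (batch_size, num_steps)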
It has been a nightmare for me, but I hope this post helps people in the future.