I'm pretty certain this is trivial, but I haven't yet managed to quite get my head around scan. I want to iteratively build a matrix of values, m, where
m[i,j] = f(m[k,l]) for k < i, l < j
so you could think of it as a dynamic programming problem. However, I can't even generate the list [1..100] by iterating over the list [1..100] and updating the shared value as I go.
import numpy as np
import theano as T
import theano.tensor as TT
def test():
    arr = T.shared(np.zeros(100))
    def grid(idx, arr):
        return {arr: TT.set_subtensor(arr[idx], idx)}
    T.scan(
        grid,
        sequences=TT.arange(100),
        non_sequences=[arr])
    return arr
run = T.function([], outputs=test())
run()
which returns
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0.])
There are a few things here that point towards some misunderstandings. scan really can be a hard bit of Theano to wrap your head around!
Here's some updated code that does what I think you're trying to do, but I wouldn't recommend using this code at all. The basic issue is that you seem to be using a shared variable inappropriately.
import numpy as np
import theano as T
import theano.tensor as TT
def test():
    arr = T.shared(np.zeros(100))
    def grid(idx, arr):
        return {arr: TT.set_subtensor(arr[idx], idx)}
    # keep the updates dictionary that scan returns instead of discarding it
    _, updates = T.scan(
        grid,
        sequences=TT.arange(100),
        non_sequences=[arr])
    return arr, updates
outputs, updates = test()
run = T.function([], outputs=outputs, updates=updates)
print(run())
print(outputs.get_value())
This code is changed from the original in two ways:
The updates returned by scan have to be captured (originally they were discarded) and passed to theano.function's updates parameter. Without this the shared variable won't be updated at all.
The contents of the shared variable need to be examined after the function is executed (see below).
This code prints two sets of values. The first is the output of the Theano function from when it's executed. The second is the contents of the shared variable after the Theano function has executed. The Theano function returns the shared variable so you might think that these two sets of values should be the same, but you'd be wrong! No shared variables are updated until after all of the function's output values have been computed. So it's only after the function has been executed and we look at the contents of the shared variable that we see the values we expected to see originally.
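The ordering described above (outputs first, updates afterwards) can be mimicked in plain Python. This is only a toy model and does not use Theano; ToyShared and toy_function are hypothetical names made up for the illustration.

```python
class ToyShared:
    """Hypothetical stand-in for a shared variable: just holds a value."""
    def __init__(self, value):
        self.value = value


def toy_function(outputs, updates):
    """Build a callable that computes outputs first and applies updates last."""
    def run():
        # 1. Outputs are computed from the current (old) shared values.
        results = [compute() for compute in outputs]
        # 2. New values are also computed from the old state ...
        pending = [(shared, compute()) for shared, compute in updates]
        # 3. ... and only then written back to the shared variables.
        for shared, new_value in pending:
            shared.value = new_value
        return results
    return run


arr = ToyShared([0.0] * 5)
run = toy_function(
    outputs=[lambda: list(arr.value)],
    updates=[(arr, lambda: [float(i) for i in range(5)])],
)

returned = run()[0]
print(returned)    # [0.0, 0.0, 0.0, 0.0, 0.0]  (value before the update)
print(arr.value)   # [0.0, 1.0, 2.0, 3.0, 4.0]  (value after the update)
```

Calling run() a second time would return the updated values, because by then the first call's update has been committed.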
Here's an example of implementing a dynamic programming algorithm in Theano. The algorithm is a simplified version of dynamic time warping which has a lot of similarities to edit distance.
import numpy
import theano
import theano.tensor as tt
def inner_step(j, c_ijm1, i, c_im1, x, y):
    # c_ijm1 is the cost of the cell to the left, c_im1 is the previous row
    insert_cost = tt.switch(tt.eq(j, 0), numpy.inf, c_ijm1)
    delete_cost = tt.switch(tt.eq(i, 0), numpy.inf, c_im1[j])
    match_cost = tt.switch(tt.eq(i, 0), numpy.inf, c_im1[j - 1])
    in_top_left = tt.and_(tt.eq(i, 0), tt.eq(j, 0))
    min_c = tt.min(tt.stack([insert_cost, delete_cost, match_cost]))
    c_ij = tt.abs_(x[i] - y[j]) + tt.switch(in_top_left, 0., min_c)
    return c_ij
def outer_step(i, c_im1, x, y):
    # the inner scan fills one row of the cost matrix
    outputs, _ = theano.scan(inner_step, sequences=[tt.arange(y.shape[0])],
                             outputs_info=[tt.constant(0, dtype=theano.config.floatX)],
                             non_sequences=[i, c_im1, x, y], strict=True)
    return outputs
def main():
    x = tt.vector()
    y = tt.vector()
    # the outer scan iterates over rows, carrying the previous row along
    outputs, _ = theano.scan(outer_step, sequences=[tt.arange(x.shape[0])],
                             outputs_info=[tt.zeros_like(y)],
                             non_sequences=[x, y], strict=True)
    f = theano.function([x, y], outputs=outputs)
    a = numpy.array([1, 2, 4, 8], dtype=theano.config.floatX)
    b = numpy.array([2, 3, 4, 7, 8, 9], dtype=theano.config.floatX)
    print(a)
    print(b)
    print(f(a, b))
main()
This is highly simplified and I wouldn't recommend using it for real. In general Theano is very bad at doing dynamic programming because theano.scan is so slow in comparison to native looping. If you need to propagate gradients through a dynamic program then you may not have any choice but if you don't need gradients you should probably avoid using Theano for dynamic programming.
If you want a much more thorough implementation of DTW which gets over some of the performance hits Theano imposes by computing many comparisons in parallel (i.e. batching) then take a look here: https://github.com/danielrenshaw/TheanoBatchDTW.
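If gradients are not needed, the same simplified recurrence can be written with native loops over a NumPy array. The following is a minimal sketch rather than a reference implementation; simple_dtw is a hypothetical name, and one assumption differs from the Theano code above: the diagonal move is treated as unavailable in the first column, whereas the Theano version reads c_im1[j - 1] there.

```python
import numpy as np


def simple_dtw(x, y):
    """Fill the cost matrix of the simplified DTW recurrence with plain loops."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cost = np.empty((len(x), len(y)))
    for i in range(len(x)):
        for j in range(len(y)):
            insert_cost = cost[i, j - 1] if j > 0 else np.inf
            delete_cost = cost[i - 1, j] if i > 0 else np.inf
            # Assumption: no diagonal move out of the first row or column.
            match_cost = cost[i - 1, j - 1] if (i > 0 and j > 0) else np.inf
            if i == 0 and j == 0:
                best = 0.0  # top-left cell has no predecessor
            else:
                best = min(insert_cost, delete_cost, match_cost)
            cost[i, j] = abs(x[i] - y[j]) + best
    return cost


cost = simple_dtw([1, 2, 4, 8], [2, 3, 4, 7, 8, 9])
print(cost)
print(cost[-1, -1])  # 4.0, the accumulated cost of the best alignment
```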
I'm new to PyTorch and I'm trying to write some code with it.
I have a function called OH which takes a number and returns a vector like this:
def OH(x,end=10,l=12):
    x = T.LongTensor([[x]])
    end = T.LongTensor([[end]])
    one_hot_x = T.FloatTensor(1,l)
    one_hot_end = T.FloatTensor(1,l)
    first=one_hot_x.zero_().scatter_(1,x,1)
    second=one_hot_end.zero_().scatter_(1,end,1)
    vector=T.cat((one_hot_x,one_hot_end),dim=1)
    return vector
OH(0)
output:
tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 1., 0.]])
Now I have a NN that takes this output and returns a number, but this warning always appears when I run it:
online.act(OH(obs))
output:
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:17: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
4
I tried to use online.act(OH(obs).clone().detach()) but it gives me the same warning.
The code works fine and gives good results, but I need to understand this warning.
Edit
the following is my NN that has the act function
class Network(nn.Module):
    def __init__(self,lr,n_action,input_dim):
        super(Network,self).__init__()
        self.f1=nn.Linear(input_dim,128)
        self.f2=nn.Linear(128,64)
        self.f3=nn.Linear(64,32)
        self.f4=nn.Linear(32,n_action)
        #self.optimizer=optim.Adam(self.parameters(),lr=lr)
        #self.loss=nn.MSELoss()
        self.device=T.device('cuda' if T.cuda.is_available() else 'cpu')
        self.to(self.device)
    def forward(self,x):
        x=F.relu(self.f1(x))
        x=F.relu(self.f2(x))
        x=F.relu(self.f3(x))
        x=self.f4(x)
        return x
    def act(self,obs):
        state=T.tensor(obs).to(device)
        actions=self.forward(state)
        action=T.argmax(actions).item()
        return action
The problem is that the act method of Network receives a tensor and then passes it to T.tensor again, which copy-constructs a new tensor from an existing one. That is what the warning is about.
Just remove the T.tensor call in act, like this:
def act(self,obs):
    #state=T.tensor(obs).to(device)
    state=obs.to(device)  # obs is already a tensor, so just move it
    actions=self.forward(state)
    action=T.argmax(actions).item()
    return action
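If act should accept both tensors and plain Python lists, one option is torch.as_tensor, which reuses an existing tensor where it can instead of copy-constructing a new one. The sketch below is a standalone illustration; to_state is a hypothetical helper name, not part of the code in the question, and the final line prints how many warnings were recorded during the conversion.

```python
import warnings

import torch


def to_state(obs, device="cpu"):
    """Convert obs to a float tensor on device without torch.tensor(tensor)."""
    return torch.as_tensor(obs, dtype=torch.float32, device=device)


obs = torch.zeros(1, 24)  # same shape as the output of OH(0) in the question

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    state = to_state(obs)               # tensor input
    from_list = to_state([[0.0, 1.0]])  # list input also works

print(state.shape, from_list.shape)
print(len(caught))  # number of warnings recorded during the conversions
```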
I'm trying to determine the p and q values for an ARMA model. The time series is already stationary and I was looking at ACF and PACF plots, but I need to get those p and q values "on the go" (as when running a simulation).
I noticed that statsmodels has two functions, acf and pacf, but I don't understand how to use them properly.
This is what the code looks like:
from statsmodels.tsa.stattools import acf, pacf
>>>acf(data,qstat=True)
(array([1. , 0.98707179, 0.9809318 , 0.9774078 , 0.97436479,
0.97102392, 0.96852746, 0.96620799, 0.9642253 , 0.96288455,
0.96128443, 0.96026672, 0.95912503, 0.95806287, 0.95739194,
0.95622575, 0.9545498 , 0.95381055, 0.95318588, 0.95203675,
0.95096276, 0.94996035, 0.94892427, 0.94740811, 0.94582933,
0.94420572, 0.9420396 , 0.9408416 , 0.93969163, 0.93789606,
0.93608273, 0.93413445, 0.93343312, 0.93233588, 0.93093149,
0.93033546, 0.92983324, 0.92910616, 0.92830326, 0.92799811,
0.92642784]),
array([ 2916.11296684, 5797.02377904, 8658.22999328, 11502.6002944 ,
14328.44503612, 17140.72034976, 19940.48013538, 22729.69637912,
25512.09429552, 28286.18290207, 31055.33003897, 33818.82409725,
36577.1270353 , 39332.49361223, 42082.0755955 , 44822.94911057,
47560.49941212, 50295.38504714, 53024.59880222, 55748.57526173,
58467.72758802, 61181.8659989 , 63888.25003765, 66586.53110019,
69276.46332225, 71954.97102175, 74627.57217707, 77294.54406888,
79952.23080669, 82600.54514273, 85238.73829645, 87873.86209917,
90503.68343426, 93126.47509834, 95746.79574474, 98365.17422285,
100980.34471949, 103591.88164688, 106202.58634768, 108805.3453693 ]),
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.]))
>>>pacf(data)
array([ 1. , 0.98740203, 0.26463067, 0.18709112, 0.11351714,
0.0540612 , 0.06996315, 0.05159168, 0.05358487, 0.06867607,
0.03915513, 0.06099868, 0.04020074, 0.0390229 , 0.05198753,
0.01873783, -0.00169158, 0.04387457, 0.03770717, 0.01360295,
0.01740693, 0.01566421, 0.01409722, -0.00988412, -0.00860644,
-0.00905181, -0.0344616 , 0.0199406 , 0.01123293, -0.02002155,
-0.01415968, -0.0266674 , 0.03583483, 0.0065682 , -0.00483241,
0.0342638 , 0.02353691, 0.01704061, 0.01292073, 0.03163407,
-0.02838961])
How can I get p and q with these functions? The acf function returns only one array if qstat is set to False.
Selecting the order of an ARMA(p,q) model using estimated ACFs/PACFs is usually not the best approach. This is simply because, in the case of an ARMA process, both the ACF and the PACF decay slowly (in absolute terms) with increasing lag, so you cannot really infer the lag order from them. Instead they are mostly used for pure AR/MA models, in which you observe a clear cutoff in one of the two series (but even then it is more of a graphical approach).
If you want to determine p and q "on the fly" for an ARMA model it seems more reasonable to use information criteria (e.g. AIC, BIC, etc.). statsmodels provides the function arma_order_select_ic() for this very purpose. So what you want is something like this:
from statsmodels.tsa.stattools import arma_order_select_ic
arma_order_select_ic(data, max_ar=4, max_ma=4, ic='bic')
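To illustrate the idea behind order selection by information criterion without depending on statsmodels, here is a deliberately simplified NumPy sketch. It only handles pure AR(p) models fitted by ordinary least squares, which is not what arma_order_select_ic does internally; ar_bic and select_ar_order are hypothetical names, and the simulated AR(2) series is made up for the example.

```python
import numpy as np


def ar_bic(series, p, max_p):
    """BIC of an AR(p) model with intercept, fitted by least squares."""
    y = series[max_p:]  # use the same sample for every candidate order
    n = len(y)
    lags = [series[max_p - k:len(series) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n)] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Gaussian log-likelihood term plus a penalty per estimated coefficient
    return n * np.log(resid @ resid / n) + (p + 1) * np.log(n)


def select_ar_order(series, max_p=4):
    """Return the order with the smallest BIC and the list of all BICs."""
    bics = [ar_bic(series, p, max_p) for p in range(max_p + 1)]
    return int(np.argmin(bics)), bics


# Simulated AR(2) data (assumed coefficients 0.6 and -0.3)
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
sim = np.zeros(2000)
for t in range(2, 2000):
    sim[t] = 0.6 * sim[t - 1] - 0.3 * sim[t - 2] + noise[t]

best_p, bics = select_ar_order(sim)
print(best_p)
print(np.round(bics, 1))
```

For a real ARMA search, with moving-average terms and proper likelihood-based fitting, arma_order_select_ic remains the more appropriate tool.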
My Sequential CNN model is trained on 39 classes as a multi-class classifier. As for predictions, it returns a one-hot encoded array like [0,0,...1,0,...] whereas I want something like [0.012,0.022,0.067,...,0.997,0.0004,...]
Is there a way to get this? If not, what exactly should I change to get it?
The reason I want it this way is to check how close the other classes are: if one class scores 0.98 and others score 0.96, then I am doing something wrong, the data isn't sufficient, etc.
Thank you :)
My model is basically a keras.model resnet50 with following configs :
model = keras.applications.resnet.ResNet50(include_top=False, weights=None, input_tensor=None, input_shape=(64,64,1), pooling='avg', classes=39)
x = model.output
x = Dropout(0.7)(x)
num_classes = 39
predictions = Dense(num_classes, activation= 'softmax')(x)
model = Model(inputs = model.input, outputs = predictions)
optimizer = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer, loss='categorical_crossentropy', metrics=['categorical_accuracy'], loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)
Sample input :
import cv2
img = cv2.imread(IMAGE_PATH, 0)
img = cv2.resize(img, (64,64))
img = np.reshape(img, (1,64,64,1))
predicted_class_indices = np.argmax(model.predict(img, verbose = 1))
Sample output:
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
Desired output (numbers are hypothetical):
array([[0.022, 0.353, 0.0535, 0.52, 0.212, 0.822, 0.532, 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
One way to do so is to remove the last activation layer.
You can do so by using model.layers[-1].activation=None. The model then returns raw scores (logits) rather than probabilities.
However, a softmax layer should already output a probability distribution rather than a one-hot vector, so you might want to check how your training is going.
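One possible explanation, which is an assumption and not something the question confirms, is that the softmax is saturated: when one logit is far larger than the others, the probabilities print as exactly 0 and 1. A small NumPy sketch with made-up logits shows the effect.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow
    exp = np.exp(shifted)
    return exp / exp.sum()


logits = np.array([2.0, 1.0, 0.1])  # made-up scores for three classes

moderate = softmax(logits)        # a spread-out distribution
saturated = softmax(50 * logits)  # same ranking, much larger magnitudes

print(np.round(moderate, 3))
print(np.round(saturated, 3))     # [1. 0. 0.]
```

Both vectors sum to 1, so the second one is still a valid softmax output; it only looks one-hot after rounding.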
With a generator I create the random batch like:
import torch
n = 10
batch_size = 2
x = torch.zeros((batch_size, n), dtype=torch.float)
in_flags = torch.randint(n, (batch_size,), dtype=torch.long)
for idx, row in enumerate(x):
    row[in_flags[idx]] = 1.0
The disadvantage is that the loop runs in Python.
This is the original meaning of embedding (not to be confused with PyTorch's nn.Embedding). Is it possible to do this with a single PyTorch operation so that it runs natively or on the GPU?
You can do it like this:
import torch
n = 10
batch_size = 2
in_flags = torch.randint(n, (batch_size,), dtype=torch.long)
x = torch.zeros((batch_size, n), dtype=torch.float)
# this is how you can do this
x[torch.arange(batch_size), in_flags] = 1.0
print(in_flags)
print(x)
Output:
tensor([8, 0])
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
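Two other loop-free ways to build the same batch are torch.nn.functional.one_hot and an in-place scatter_. This is a short sketch of both, using the same n, batch_size and in_flags as above; x_onehot and x_scatter are names chosen for the example.

```python
import torch
import torch.nn.functional as F

n = 10
batch_size = 2
in_flags = torch.randint(n, (batch_size,), dtype=torch.long)

# one_hot returns an integer tensor, so cast it to float to match x above
x_onehot = F.one_hot(in_flags, num_classes=n).float()

# scatter_ writes 1.0 into column in_flags[i] of row i, in place
x_scatter = torch.zeros(batch_size, n).scatter_(1, in_flags.unsqueeze(1), 1.0)

print(in_flags)
print(x_onehot)
print(torch.equal(x_onehot, x_scatter))
```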
I am trying to do word embeddings in Keras. I am using 'glove.6B.50d.txt' for the purpose. I am able to get correct output till the preparation of embedding index from the "glove.6B.50d.txt" file.
But I'm always getting embedding matrix full of zeros whenever I map the word from the input provided by me to that in the embedding index.
Here is the code:
#here is the example sentence given as input
line="The quick brown fox jumped over the lazy dog"
line=line.split(" ")
#this is my embedding file
EMBEDDING_FILE='glove.6B.50d.txt'
embed_size = 10 # how big is each word vector
max_features = 10000 # how many unique words to use (i.e num rows in embedding vector)
maxlen = 10 # max number of words in a comment to use
tokenizer = Tokenizer(num_words=max_features,split=" ",char_level=False)
tokenizer.fit_on_texts(list(line))
list_tokenized_train = tokenizer.texts_to_sequences(line)
sequences = tokenizer.texts_to_sequences(line)
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
X_t = pad_sequences(list_tokenized_train, maxlen=maxlen)
print(sequences)
print(word_index)
print('Shape of data tensor:', X_t.shape)
#got correct output here as
# Found 8 unique tokens.
#[[1], [2], [3], [4], [5], [6], [1], [7], [8]]
#{'the': 1, 'quick': 2, 'brown': 3, 'fox': 4, 'jumped': 5, 'over': 6, 'lazy': 7, 'dog': 8}
# Shape of data tensor: (9, 10)
#loading the embedding file to prepare embedding index matrix
embeddings_index = {}
for i in open(EMBEDDING_FILE, "rb"):
    values = i.split()
    word = values[0]
    #print(word)
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
print('Found %s word vectors.' % len(embeddings_index))
#Found 400000 word vectors.
#making the embedding matrix
embedding_matrix = np.zeros((len(word_index) + 1, embed_size))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in embedding index will be all-zeros.
        embedding_matrix[i] = embedding_vector
When I print the embedding matrix here, I get all zeros (i.e. not a single word in the input is recognized).
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Also, if I print embeddings_index.get(word) on each iteration, it fails to fetch the word and returns None.
Where am I going wrong in the code?
The embed size should be 50, not 10: it is the dimensionality of each word vector, and glove.6B.50d.txt contains 50-dimensional vectors.
The number of features should be much greater than 50 (make it close to 10,000). Restricting it to 50 means a whole lot of the vectors will be missing.
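One way to avoid a mismatch between embed_size and the file is to read the dimensionality off the loaded vectors. The sketch below uses a tiny made-up embeddings_index and word_index in place of the real GloVe data, so the 3-dimensional vectors are purely illustrative.

```python
import numpy as np

# Hypothetical stand-in for the index loaded from the GloVe file
embeddings_index = {
    "the": np.array([0.1, 0.2, 0.3], dtype="float32"),
    "fox": np.array([0.4, 0.5, 0.6], dtype="float32"),
}
# Hypothetical tokenizer output; "zzz" has no pretrained vector
word_index = {"the": 1, "fox": 2, "zzz": 3}

# Take the vector size from the data instead of hard-coding it
embed_size = len(next(iter(embeddings_index.values())))

embedding_matrix = np.zeros((len(word_index) + 1, embed_size))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector  # unknown words keep an all-zero row

print(embedding_matrix.shape)
print(embedding_matrix)
```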
Got the problem solved today.
It seems embeddings_index.get(word) was unable to find the word because of an encoding issue: opening the file in binary mode makes every key a bytes object, while the tokenizer's words are str.
I changed the line for i in open(EMBEDDING_FILE, "rb"): in the code that builds the embedding index to for i in open(EMBEDDING_FILE, 'r', encoding='utf-8'):
and this solved the problem.
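The effect can be reproduced without the GloVe file. In this sketch a two-line in-memory sample stands in for the file, and load_index is a hypothetical helper; the only difference between the two calls is whether the stream yields bytes or str.

```python
import io

# Two made-up lines in the same "word v1 v2 ..." layout as the GloVe file
sample = "the 0.1 0.2\nfox 0.3 0.4\n".encode("utf-8")


def load_index(stream):
    """Build a word -> vector dictionary from an iterable of lines."""
    index = {}
    for line in stream:
        values = line.split()
        index[values[0]] = [float(v) for v in values[1:]]
    return index


# Like open(path, "rb"): lines are bytes, so the keys are bytes
bytes_index = load_index(io.BytesIO(sample))
# Like open(path, "r", encoding="utf-8"): lines are str, so the keys are str
text_index = load_index(io.TextIOWrapper(io.BytesIO(sample), encoding="utf-8"))

print(bytes_index.get("the"))  # None, because the key is b"the"
print(text_index.get("the"))   # [0.1, 0.2]
```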