Is it possible to modify a cp_sat model after construction? - constraint-programming

I have a model for finding a particular class of integer numbers (the "Keith numbers"), which works well but is quite slow, as it requires constructing a new model many times. Is there a way to update a model, in particular to change the coefficients in a constraint? In other words, can I change the model to match a different mat without reconstructing the whole thing?
def _construct_model(self, mat):
    model = cp_model.CpModel()
    digit = [model.NewIntVar(0, 9, f'digit[{i}]') for i in range(self.k)]
    # Creates the constraint.
    model.Add(sum(mat[i] * digit[i] for i in range(self.k)) == 0)
    model.Add(digit[0] != 0)
    return model, digit

Yes, but you are on your own.
You can access the underlying cp_model_proto protobuf from the model and modify it directly.
We currently have no plans to add a modification API on top of the cp_model API.
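For example, a minimal sketch of what that could look like (this assumes the linear constraint is the first one added, so it sits at constraints[0] in the proto; the Proto() accessor spelling may vary across OR-Tools versions):
from ortools.sat.python import cp_model

k = 4
model = cp_model.CpModel()
digit = [model.NewIntVar(0, 9, f'digit[{i}]') for i in range(k)]
mat = [1, -1, -1, -1]  # placeholder coefficients, not a real Keith-number row
model.Add(sum(mat[i] * digit[i] for i in range(k)) == 0)
model.Add(digit[0] != 0)

# Overwrite the coefficients of the first constraint in place for a new mat,
# instead of reconstructing the whole model.
new_mat = [10, -2, -1, -1]
linear = model.Proto().constraints[0].linear
del linear.coeffs[:]
linear.coeffs.extend(new_mat)

solver = cp_model.CpSolver()
solver.Solve(model)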

Related

Dictionary of Constraints Python SciPy.Optimize

I'm working on creating a dictionary of constraints for a large SCED power problem for minimization. However, I'm getting a ValueError saying an unknown type was passed, despite only using optimize.LinearConstraint at present. When I change to NonlinearConstraint (shown below), I get an AttributeError indicating that 'NonlinearConstraint' object has no attribute 'A'.
I have a feeling it's due to recursive elements, as even using a single constraint as I've defined them returns the same error.
Any idea how I can create the recursive linear constraints?
EDIT: I've been told to copy the code and provide a bit more context. gen_supply_seg is a three-dimensional array that, depending on different points in time, has different constraints:
def con2a():
    for t in range(len(LOAD)):
        for g in range(len(GEN)):
            nlc2a = optimize.NonlinearConstraint(gen_supply_seg[t, g, 1], lb=0, ub=P2Max[g])
    return nlc2a

def con2b():
    for t in range(len(LOAD)):
        for g in range(len(GEN)):
            nlc2b = optimize.NonlinearConstraint(gen_supply_seg[t, g, 2], lb=0, ub=P3Max[g])
    return nlc2b

def con2c():
    for t in range(len(LOAD)):
        for g in range(len(GEN)):
            nlc2c = optimize.NonlinearConstraint(gen_supply_seg[t, g, 3], lb=0, ub=P4Max[g])
    return nlc2c

con2a = con2a()
con2b = con2b()
con2c = con2c()
These constraints are then collected into a tuple as shown:
cons = (con2a,
        con2b,
        con2c)
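For reference, scipy's NonlinearConstraint expects a callable as its first argument, not an array element. A hedged sketch of one way the constraints might be built instead (make_con and index_of are hypothetical helpers, not from the original code; LOAD, GEN, and P2Max are the question's own placeholders):
from scipy import optimize

def make_con(t, g, seg, ub):
    # index_of is a hypothetical helper mapping (t, g, seg) to a position
    # in the flat decision vector x that the optimizer works with.
    return optimize.NonlinearConstraint(lambda x: x[index_of(t, g, seg)], lb=0, ub=ub)

# One constraint per (t, g) pair, collected in a list rather than
# overwritten on each loop iteration.
cons = [make_con(t, g, 1, P2Max[g])
        for t in range(len(LOAD))
        for g in range(len(GEN))]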

What initializer does 'uniform' use?

Keras offers a variety of initializers for weights and biases. Which one does 'uniform' use?
I would think it would be RandomUniform, but this is not confirmed in the documentation, and I reached a dead end in the source code: the key 'uniform' is used as a global variable within the module, and I cannot find where the variable uniform is set.
One other way to confirm this is to look at the initializers source code:
# Compatibility aliases
zero = zeros = Zeros
one = ones = Ones
constant = Constant
uniform = random_uniform = RandomUniform
normal = random_normal = RandomNormal
truncated_normal = TruncatedNormal
identity = Identity
orthogonal = Orthogonal
I think today's answer is better, though.
Simpler solution:
From the interactive prompt,
import keras
keras.initializers.normal
# Out[3]: keras.initializers.RandomNormal
keras.initializers.uniform
# Out[4]: keras.initializers.RandomUniform
Original post:
Running the debugger to the deserialize method in initializers.py
and examining
globals()['uniform']
shows that the value is indeed
<class 'keras.initializers.RandomUniform'>
Similarly, 'normal' is shown in the debugger to be <class 'keras.initializers.RandomNormal'>.
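Another quick check, assuming a Keras version that still registers these string aliases, is to resolve the name through initializers.get, which uses the same lookup as deserialize:
from keras import initializers

# Resolve the string alias through Keras's own lookup machinery.
init = initializers.get('uniform')
print(type(init))  # expected: <class 'keras.initializers.RandomUniform'>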
Note that uniform often works better than normal, though the theoretical advantages of one over the other are not clear.

Seq2seq for non-sentence, float data; stuck configuring the decoder

I am trying to apply sequence-to-sequence modelling to EEG data. The encoding works just fine, but getting the decoding to work is proving problematic. The input data has the shape None-by-3000-by-31, where the second dimension is the sequence length.
The encoder looks like this:
initial_state = lstm_sequence_encoder.zero_state(batchsize, dtype=self.model_precision)

encoder_output, state = dynamic_rnn(
    cell=LSTMCell(32),
    inputs=lstm_input,            # shape=(None, 3000, 32)
    initial_state=initial_state,  # zeroes
    dtype=lstm_input.dtype        # tf.float32
)
I use the final state of the RNN as the initial state of the decoder. For training, I use the TrainingHelper:
training_helper = TrainingHelper(target_input, [self.sequence_length])
training_decoder = BasicDecoder(
    cell=lstm_sequence_decoder,
    helper=training_helper,
    initial_state=thought_vector
)
output, _, _ = dynamic_decode(
    decoder=training_decoder,
    maximum_iterations=3000
)
My troubles start when I try to implement inference. Since I am using non-sentence data, I do not need to tokenize or embed, because the data is essentially embedded already. The InferenceHelper class seemed the best way to achieve my goal, so that is what I use. I'll give my code, then explain my problem.
def _sample_fn(decoder_outputs):
    return decoder_outputs

def _end_fn(_):
    return tf.tile([False], [self.lstm_layersize])  # batch-size is sequence-length because of time major

inference_helper = InferenceHelper(
    sample_fn=_sample_fn,
    sample_shape=[32],
    sample_dtype=target_input.dtype,
    start_inputs=tf.zeros(batchsize_placeholder, 32),  # the batchsize varies
    end_fn=_end_fn
)
inference_decoder = BasicDecoder(
    cell=lstm_sequence_decoder,
    helper=inference_helper,
    initial_state=thought_vector
)
output, _, _ = dynamic_decode(
    decoder=inference_decoder,
    maximum_iterations=3000
)
The Problem
I don't know what the shape of the inputs should be. I know the start inputs should be zero because it is the first time-step, but this throws errors; it expects the input to be (1, 32).
I also thought I should pass the output of each time-step unchanged to the next. However, this raises a problem at run-time: the batch-size varies, so the shape is partial. The library throws an exception at this as it tries to convert start_inputs to a tensor:
...
self._start_inputs = ops.convert_to_tensor(
    start_inputs, name='start_inputs')
Any ideas?
This is a lesson in poor documentation.
I fixed my problem, but failed to address the variable batch-size problem.
The _end_fn was causing problems I was unaware of. I also managed to work out what the appropriate fields for the InferenceHelper are. I've given the fields names in case anyone needs guidance in the future:
def _end_fn(_):
    return tf.tile([False], [batchsize])

inference_helper = InferenceHelper(
    sample_fn=_sample_fn,
    sample_shape=[lstm_number_of_units],  # in my case, 32
    sample_dtype=tf.float32,              # depends on the data
    start_inputs=tf.zeros((batchsize, lstm_number_of_units)),
    end_fn=_end_fn
)
As for the batch-size problem, there are two things I'm considering:
Changing the internal state of my model object. My TensorFlow computation graph is built inside a class. A class-field records the batch-size. Changing this during training may work. Or:
Pad the batches so that they are 200 sequences long. This will waste time.
Preferably I'd like a way to dynamically manage the batch-sizes.
EDIT: I found a way. It involves simply substituting square-brackets for parentheses:
inference_helper = InferenceHelper(
    sample_fn=_sample_fn,
    sample_shape=[self.lstm_layersize],
    sample_dtype=target_input.dtype,
    start_inputs=tf.zeros([batchsize, self.lstm_layersize]),
    end_fn=_end_fn
)
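For the variable batch-size itself, one option worth noting (my own sketch, not from the original post) is to read the batch dimension off the input tensor at run-time with tf.shape, which can be fed straight into tf.zeros:
import tensorflow as tf

# Hedged sketch (TF1-style, matching the post): derive the batch size
# dynamically, so no fixed padding or class-field bookkeeping is needed.
lstm_input = tf.placeholder(tf.float32, shape=[None, 3000, 32])
batchsize = tf.shape(lstm_input)[0]        # a scalar tensor, resolved at run-time
start_inputs = tf.zeros([batchsize, 32])   # matches sample_shape=[32]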

Doc2Vec.infer_vector keeps giving different results every time on a particular trained model

I am trying to follow the official Doc2Vec Gensim tutorial mentioned here - https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb
I modified the code in line 10 to determine the best matching document for a given query, and every time I run it, I get a completely different result set. My new code in line 10 of the notebook is:
inferred_vector = model.infer_vector(['only', 'you', 'can', 'prevent', 'forest', 'fires'])
sims = model.docvecs.most_similar([inferred_vector], topn=len(model.docvecs))
rank = [docid for docid, sim in sims]
print(rank)
Every time I run this piece of code, I get a different set of documents matching the query "only you can prevent forest fires". The differences are stark, and the results just do not seem to match.
Is Doc2Vec not a suitable match for querying and information extraction? Or are there bugs?
Looking into the code: in infer_vector you are using parts of the algorithm that are non-deterministic. Initialization of the word vectors is deterministic (see the code of seeded_vector), but when we look further, random sampling of words and negative sampling (updating only a sample of word vectors per iteration) can cause non-deterministic output (thanks @gojomo).
def seeded_vector(self, seed_string):
    """Create one 'random' vector (but deterministic by seed_string)"""
    # Note: built-in hash() may vary by Python version or even (in Py3.x) per launch
    once = random.RandomState(self.hashfxn(seed_string) & 0xffffffff)
    return (once.rand(self.vector_size) - 0.5) / self.vector_size
Set negative=0 to avoid randomization:
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

documents = [list('asdf'), list('asfasf')]
documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(documents)]
model = Doc2Vec(documents, vector_size=20, window=5, min_count=1, negative=0, workers=6, epochs=10)

a = list('test sample')
b = list('testtesttest')
for s in (a, b):
    v1 = model.infer_vector(s)
    for i in range(100):
        v2 = model.infer_vector(s)
        assert np.all(v1 == v2), "Failed on %s" % ''.join(s)
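If you want to keep negative sampling, a softer option (my own suggestion, not part of the answer above) is to accept some variance but stabilize inference with more passes:
# More inference passes reduce (but do not eliminate) run-to-run variance.
# Older gensim versions call this parameter `steps` instead of `epochs`.
v = model.infer_vector(['only', 'you', 'can', 'prevent', 'forest', 'fires'], epochs=100)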

Adding documents to gensim model

I have a class wrapping the various objects required for calculating LSI similarity:
class SimilarityFiles:
    def __init__(self, file_name, tokenized_corpus, stoplist=None):
        if stoplist is None:
            self.filtered_corpus = tokenized_corpus
        else:
            self.filtered_corpus = []
            for convo in tokenized_corpus:
                self.filtered_corpus.append([token for token in convo if token not in stoplist])
        self.dictionary = corpora.Dictionary(self.filtered_corpus)
        self.corpus = [self.dictionary.doc2bow(text) for text in self.filtered_corpus]
        self.lsi = models.LsiModel(self.corpus, id2word=self.dictionary, num_topics=100)
        self.index = similarities.MatrixSimilarity(self.lsi[self.corpus])
I now want to add a function to the class to allow adding documents to the corpus and updating the model accordingly.
I've found dictionary.add_documents, and model.add_documents, but there are two things that aren't clear to me:
When you originally create the LSI model, one of the parameters the function receives is id2word=dictionary. When updating the model, how do you tell it to use the updated dictionary? Is it actually unnecessary, or will it make a difference?
How do I update the index? It looks from the documentation that if I use the Similarity class, and not the MatrixSimilarity class, I can add documents to the index, but I don't see such functionality for MatrixSimilarity. If I understood correctly, MatrixSimilarity is better if my input corpus contains dense vectors (which it does, because I'm using the LSI model). Do I have to change it to Similarity just so that I can update the index? Or, conversely, what's the complexity of creating this index? If it's insignificant, should I just create a new index with my updated corpus, as follows:
Code:
self.dictionary.add_documents(new_docs)  # new_docs is already after filtering stop words
new_corpus = [self.dictionary.doc2bow(text) for text in new_docs]
self.lsi.add_documents(new_corpus)
self.index = similarities.MatrixSimilarity(self.lsi[self.corpus])
Thanks. :)
Well, it seems that it doesn't update the dictionary; it just adds new documents, not new features, so you should take a different approach.
I had the same problem, and I found this issue on the gensim GitHub helpful.
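A hedged sketch of what that different approach might look like (my own reading, not from the linked issue): the Similarity class, unlike MatrixSimilarity, stores its index in shards on disk and supports add_documents, so it can grow incrementally:
from gensim import similarities

# num_features must match the LSI dimensionality (num_topics=100 above).
self.index = similarities.Similarity('/tmp/lsi_index', self.lsi[self.corpus], num_features=100)

# Later, after dictionary.add_documents and lsi.add_documents:
self.corpus.extend(new_corpus)                   # keep the BoW corpus in sync
self.index.add_documents(self.lsi[new_corpus])   # append instead of rebuilding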
