I am using the recurrent Gaussian Process library. I believe the code is developed by older versions of python and pytorch. I ran one of the experiments of the model after cloning the repository
python ./testing/rnn_rgp_test.py
I got this error message from this line of the rnn_encoder.py script:
./RGP/autoreg/rnn_encoder.py", line 274, in backward_computation
torch.autograd.backward( variables=self.forward_means_list + self.forward_vars_list,
TypeError: backward() got an unexpected keyword argument 'variables'
I will be grateful if someone can point out how I can fix this error?
Version 0.3.1 of PyTorch seems to be the last version with the variables parameter. Ideally, the RGP library should have documented which version of their dependencies they use but they didn't. Given that their Git repo seems to be inactive, you have several choices:
Use old versions of whatever libraries they require. You will have to go from one error to the next, hoping that things work as intended.
Fork RGP and re-implement the logic with current libraries. This will likely involve significant coding and may not even be possible at all.
Try to find a different library that implements RGPs.
I'm using a self-written model in Rcpp which works fine when using Rcpp 1.0.6 or 1.0.5. But after updating to Rcpp 1.0.7 the model run crashes right after executing the R function to start the model. (However, compilation with sourceCpp() works without any error or warning.)
The code in Rcpp is organized as follows: there are several functions written in different c++-files and these functions are loaded with header files into my runModel.cpp file that defines my function that is exported to R to run the code.
This function is used like this runModel(DateVector SimPeriod, List ModelInput, NumericVector Settings). Maybe it is worth noting that functions in different c++-files are using the same variables and changing them sometimes, so I had to write also something like initModel.cpp and a correspondent header file which is imported in almost every c++ files.
I was already looking in the https://cran.r-project.org/web/packages/Rcpp/news.html to relate the made changes in 1.0.7 to my issue. But unfortunately, I have no idea what might be the reason for the crash. I'm appreciating every comment on this.
I'm sorry that I cannot give a reproducible example but the model code is too complex to create one (especially because I do not know where the error is hidden.)
I have trained the model on a super computer(ubuntu). After the training I used the model with Windows 10 and got this error:
SourceChangeWarning: source code of class 'torch.nn.modules.linear.Linear' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
I can't load the model I trained.
pytorch version in ubuntu: 1.1.0a0+9a7bcac
pytorch version in windows: 0.4.1
What is going wrong and how can I fix it?
EDIT after comment discussion:
It seems like yor Windows version is outdated and there is thus a version conflict occurring. I would strongly suggest to update the version on Windows to post 1.0 release, which should fix the problem.
According to this link, you can likely ignore the warning (not an error), as long as your model seems to still work as intended. The usual culprit for such changes is that you have inconsistent versions of PyTorch on your two systems, and therefore might encounter this warning.
Generally, the versions are supposed to be fully backward compatible, but of course there is no guarantee for this. It has nothing to do with the fact that you are on Linux and/or Windows, unless the source code detects changes in the line break character (which is the main difference from what I remember), although I think this is very unlikely to be the case.
Thank dennlinger very much!
Is the problem of version.
After I update the pytorch version, it work!
Background
I use an Anaconda environment in Windows 10, made following this post by Mike Müller:
conda create -n keras python=3.6
conda activate keras
conda install keras
This environment has Python 3.6.8, Keras 2.2.4, TensorFlow 1.12.0, and NumPy 1.16.1.
I was working on optimizing code for a team I just joined when I found I can't even run their code. I reduced it to a test case with an MCVE (at least, for me; apologies for not being able to give a testable example):
class TestEvaluation(unittest.TestCase):
def setUp(self):
# In-house function loads inputs and labels properly.
self.inputs, self.labels = load_data()
# Using a pretrained model, known to work.
self.model = keras.models.load_model('model_name.h5')
# Passes, and is loaded successfully.
self.assertIsNotNone(self.model)
def test_model_evaluation(self):
# Fails on my machine, reporting high loss and 0% accuracy.
scores = self.model.evaluate(self.inputs, self.labels)
accuracy = scores[1] * 100
self.assertAlmostEqual(accuracy, 93, delta=5)
Research
This exact scenario runs perfectly fine from someone else's computer, so we've deduced the following: we have the same code, model, and data. Therefore, it should be the environment, right?
I built more Anaconda environments to reproduce the version numbers that work on their machine. However, this didn't fix it. Moreover, this seems to be an issue that not many other people have had, as far I've found by searching online.
I went through the following other environments:
Python 3.6.4, Keras 2.2.4, TensorFlow 1.12.0, NumPy 1.16.2
(The one that worked for someone else, though admittedly without Anaconda)
Python 3.5.2, Keras 2.2.2, TensorFlow 1.10.0, NumPy 1.15.2
Question
The model is pretrained, the validation set is correctly loaded, but Keras fails to report the ~93% accuracy I'm expecting.
How can I fix this issue of getting 0% accuracy?
Update
I've learned a lot more about the situation. I found that installing a Python 3.6 environment on Ubuntu 18.04 got me to random guessing (~25% accuracy). So, it's no longer 0%! Further, I tried to replicate a machine that's been used for testing a lot, which had Ubuntu 16.04.5. This got me to ~46% accuracy. I wasn't able to perfectly replicate it since Ubuntu forced me to update to 16.04.6 when I installed some packages, and I also don't know how they run things on the machine they test with (I tried myself, and it didn't work).
I also learned that the guy who compiled and saved the model was using MacOS High Sierra, but he also gets it to work in the lab environment. I'll need to follow up on that.
Further, I kept searching online and found others with the same issue:
Keras issue #7676 - An open issue for nearly 2 years. The OP reported his saved model works differently on different machines, which sounds a lot like my problem.
Keras issue #4875 - An open issue for over 2 years. This particular comment seems to be the common solution. I'm not sure if this will solve the problem or not, and I don't actually have the code that compiled this model. However, it seems that many people found issues in how their model was built and saved, so I might need to investigate this further...
I apologize for claiming a solution before, I was ecstactic to see that assertNotEqual(accuracy, 0) passed.
Be Aware
I previously wrote an incorrect answer, and this may very well be another poorly-formed solution. Please be aware I haven't fully tested this hypothesis. Also be aware that this is still an open issue in the Keras community and many people have messed things up in a number of ways to arrive at this problem.
Developing Our Solution
Let Person A be the guy who can run the model okay on our lab computers, as well as his MacBook. Let Person B be the one who can't (i.e. me and everyone else).
I got my team to take this problem more seriously. We got to the point where A has a terminal open at a desktop next to B. A runs the test script and gets 92% accuracy. B runs the script and gets 2%. At this point, we were on the same machine using the exact same Python environment and Keras settings (~/.keras). We were also sure that we had the same script, model, and data. Or, so we thought.
I chose to doubt everything at that point. I scp'd the script, model, and data from A's account to B's account. It worked. Here's what that could mean as a solution:
A Guess at the Problem
The files B had were bad. B got them from team storage on Google Drive, as well as Slack. Further, some were delivered by A through his MacBook. The scripts were genuinely the same. The model and data B had actually differed in binary, but had the same size in bytes, looked "similar" in binary, and could've possibly been an encoding issue.
It wasn't Google Drive. I uploaded and re-downloaded the correct files, and nothing went wrong. However, the wrong file was there to begin with.
Possibly Slack? Perhaps Slack was corrupting the encoding when B downloaded A's files.
Possibly it coming from a MacBook? MacOS generates a lot of .DS_Store-like files, and I don't know much about it. MacOS might've played a role in the model and data being OS-dependent. I wouldn't rule it out simply because I'm ignorant of how that OS operates. I heavily suspect this though because I happen to have a spare MacBook, and I got it to work in that environment before we started testing on the same machine.
Worst Case Scenario
We're accepting that we can get the model to work on a single machine that everyone has access to. Does this mean that the model might still not work on other machines? Unfortunately, yes.
We're not taking the time to test other machines after wasting nearly 2 months on this problem. I hope this research and debugging helps someone else out. I didn't want to leave it at "never mind, fixed it."
This question might be too detailed for this forum, but I could not find a mailing list for duktape. Maybe this question will be useful for others trying to get duktape running on more obscure hardware.
I am trying to get duktape to work on an old ColdFire CPU, using an OLD gcc compiler (2.95.3). The board has limited resources (flash/RAM) but I seem to have enough of both. I must live with the old compiler.
I believe the duk_config.h is calculating the right options regarding endianness, etc. I am using a number of the duktape options to reduce code and data size. I have successfully used the same configuration on 64 and 32 bit Ubuntu and it works fine.
The "properties string" that is formed and set in duk_hthread_create_builtin_objects() is:
"bb u pnRHSBOL p2 a8 generic linux gcc" which seems correct (not sure of the effect of the "generic" tag for architecture).
I am getting a failure when calling duk_create_heap(). I have isolated the problem to a what I believe is a JS compile error related to duk_initjs. If I undef DUK_USE_BUILTIN_INITJS, initialization works. The error is a syntax error (not sure where yet). By running "strings" on my executable, I can see that the javascript program source string is there. As a side issue, when this error occurs, the longjmp doesn't work (setjmp never called?) so my fatal handler gets called, but I don't care about for now.
I thought it might be my small C stack (as it appears the js compiler uses recursion) but making the stack much larger didn't help.
I am starting to dig into the JS compiler, but this must be an issue with the architecture or my environment. Any suggestions appreciated!
EDIT: I just now noticed a post of a similar issue, and there was a request to repeat with "-DDUK_OPT_DEBUG -DDUK_OPT_DPRINT -DDUK_OPT_ASSERTIONS -DDUK_OPT_SELF_TESTS" I will try to use these options (if possible, I am very close to a relocation limit on my executable).
There was a bug in 1.4.0 release (https://github.com/svaarala/duktape/pull/550) which caused duk_config.h to incorrectly end up with an unpacked value representation even when the architecture supported packed representation. This might be an issue in your case - try adding and explicit -DDUK_OPT_PACKED_TVAL (which forces Duktape to use packed representation) to see if it helps.