mlflow on Windows10 and desktop.ini - pytorch

This is my first time asking a question here.
I'm currently using PyTorch for my research and trying to organize my results with MLflow.
I know there are many problems when using MLflow on Windows 10, but since I have no other option, I'm trying to get used to it.
The error I keep encountering is "Metrics 'desktop.ini' is malformed ...".
This error appears when:
Using mlflow ui to view results from past experiments (screenshot: mlflow ui error)
Calling mlflow.pytorch.log_model(model, ...) (screenshot: pytorch.log_model error)
These two are my main concerns. My questions are:
Are there other tools for organizing results that I can use, apart from TensorBoard?
Is there any way to save a model.pth from PyTorch to MLflow? If that's impossible, are there other formats we could use to save the configuration (such as YAML, or other hierarchical languages like XML)?
Thank you
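A possible workaround, offered as a sketch rather than a confirmed fix: the error message suggests that MLflow's local file store is tripping over a desktop.ini file that Windows placed inside the tracking directory. Assuming the runs are stored in a local folder named mlruns (an assumption; adjust the path to your setup), the stray files could be removed before starting mlflow ui. The helper name remove_desktop_ini is hypothetical.

```python
from pathlib import Path


def remove_desktop_ini(root):
    """Delete every desktop.ini file under root and return the deleted paths."""
    removed = []
    # Collect the matches first so the tree is not modified while it is walked.
    for path in list(Path(root).rglob("desktop.ini")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    # "mlruns" is an assumed location for the local tracking directory.
    for path in remove_desktop_ini("mlruns"):
        print(f"removed {path}")
```

Windows may recreate these files, so the cleanup might need to be repeated, or the tracking directory moved to a folder that Explorer does not customize.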

Related

How can I include fast.ai functionality when primarily using PyTorch?

I am using PyTorch to carry out vision tasks, but would like to use some of what fast.ai provides since it has a lot of useful functionality. I'd prefer to work mostly in PyTorch since it's easier for me to understand what's going on, it's easier for me to find information on it online, and I want to maintain flexibility.
In https://docs.fast.ai/migrating_pytorch it's written that after I use the following imports: from fastai.vision.all import * and from migrating_pytorch import *, I should be able to start "Incrementally adding fastai goodness to your PyTorch models", which sounds great.
But when I run the second import I get ModuleNotFoundError: No module named 'migrating_pytorch'. Searching in https://github.com/fastai/fastai I also don't find any code mention of migrating_pytorch.py, nor did I manage to find something online.
(I'm using fast.ai version 2.3.1)
I'd like to know if this is indeed the way to go, and if so, how to get it working. Or, if there's a better way, how should I use that approach instead?
As an example, it would be nice if I could use the EarlyStoppingCallback, SaveModelCallback, and add some metrics from fast.ai instead of writing them myself, while still having everything in mostly "native" PyTorch.
Preferably the solution isn't specific to vision only, but that's my current need.
migrating_pytorch is an example script. It's in the fast.ai repo at: https://github.com/fastai/fastai/blob/master/nbs/examples/migrating_pytorch.py
The notebook that shows how to use it is at: https://github.com/fastai/fastai/blob/827e7cc0fad2db06c40df393c9569309377efac0/nbs/examples/migrating_pytorch.ipynb
For the callback example, your training code would end up looking something like:
cbs = [EarlyStoppingCallback(), SaveModelCallback()]
learner = Learner(dls, simple_cnn(), loss_func=F.cross_entropy, cbs=cbs)
learner.fit(1)
Those two callbacks probably need some arguments, e.g. save path, etc.
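To make the role of such arguments concrete, here is a framework-free sketch of the idea behind early stopping combined with best-model tracking. It is not fastai's implementation; the class name EarlyStopper, the function run, and the parameters patience and min_delta are illustrative names chosen for this example.

```python
class EarlyStopper:
    """Track the best validation loss and signal when to stop training."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest change that counts as improvement
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, valid_loss):
        """Record one epoch; return True when training should stop."""
        if valid_loss < self.best - self.min_delta:
            self.best = valid_loss
            self.best_epoch = epoch   # a real loop would save the model here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(losses, patience=2):
    """Feed a sequence of validation losses through the stopper."""
    stopper = EarlyStopper(patience=patience)
    last_epoch = None
    for last_epoch, loss in enumerate(losses):
        if stopper.step(last_epoch, loss):
            break
    return stopper.best_epoch, last_epoch
```

With losses of 1.0, 0.8, 0.9, 0.85, 0.7 and a patience of 2, the loop stops after the fourth epoch (index 3) and the best epoch is index 1, so the final 0.7 is never reached. That trade-off is what a patience argument controls.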

Using Concert Technology application or Callable Library application with CVXPY

I am looking at a two-step approach for an optimization problem. The first step uses a MILP formulation of the problem, and the second step uses the solution from the first step as an initial solution, but now with a MIQP formulation. I have been able to apply this concept in MATLAB using CPLEX. However, I am now trying to do the same using CVXPY with CPLEX as the solver. I know about the warm_start option, but it does not work with the CPLEX solver. I am able to set CPLEX parameters, but I am not sure how to initialize my solution. I am thinking of setting CPLEX's ADVANCE START SWITCH parameter to 1, but then I still need to set the initial solution. According to this page: http://www-eio.upc.es/lceio/manuals/cplex-11/html/usrcplex/solveMIP17.html, I need to use the setVectors method in a Concert Technology application, or CPXcopymipstart in a Callable Library application, to set the initial solution. I am unsure how to use either of these along with CVXPY.
The functionality you are looking for does not currently exist in CVXPY. CVXPY is a generic modeling layer that wraps around several solvers, and it does not expose the CPLEX-specific CPXreadcopymipstarts or CPXaddmipstarts functionality.
The fact that setting the value property of variables and using the warm_start option, as suggested in this answer, doesn't work is a CVXPY issue. It looks like there is an open GitHub issue for this here. In the future, this will likely be the intended solution to your general question.
For now, you'll have to use one of the CPLEX APIs directly. As you mentioned in the comments of this related Stack Overflow question, you do not like the idea of using the lower-level CPLEX Python API. That leaves you with docplex as a viable option.

What is the easiest way to operationalize Python code?

I am new to writing Python code. I have currently written a few modules for data analysis projects. The data is queried from AWS Redshift tables and summarized in CSVs and Excel spreadsheets.
At this point I do not want to pass it on to other users in the org, as I do not want to expose the code.
Is there an easy way to operationalize the code without exposing it?
PS: I am in the process of learning front-end development (Flask, HTML, CSS) so users can input data and get results back.
Python programs are almost always shipped as bare source. There are ways of compiling Python code into binaries, but this is not a common thing to do and usually I would not recommend it, as it's not as easy as one might expect (which is too bad, really).
That said, you can check out cx_Freeze and Cython.

Keras `evaluate` Function Returns Wrong Accuracy on Different Machines

Background
I use an Anaconda environment in Windows 10, made following this post by Mike Müller:
conda create -n keras python=3.6
conda activate keras
conda install keras
This environment has Python 3.6.8, Keras 2.2.4, TensorFlow 1.12.0, and NumPy 1.16.1.
I was working on optimizing code for a team I just joined when I found I can't even run their code. I reduced it to a test case with an MCVE (at least, for me; apologies for not being able to give a testable example):
import unittest

import keras


class TestEvaluation(unittest.TestCase):
    def setUp(self):
        # In-house function loads inputs and labels properly.
        self.inputs, self.labels = load_data()
        # Using a pretrained model, known to work.
        self.model = keras.models.load_model('model_name.h5')
        # Passes, and is loaded successfully.
        self.assertIsNotNone(self.model)

    def test_model_evaluation(self):
        # Fails on my machine, reporting high loss and 0% accuracy.
        scores = self.model.evaluate(self.inputs, self.labels)
        # scores[0] is the loss; scores[1] is the accuracy metric.
        accuracy = scores[1] * 100
        self.assertAlmostEqual(accuracy, 93, delta=5)
Research
This exact scenario runs perfectly fine from someone else's computer, so we've deduced the following: we have the same code, model, and data. Therefore, it should be the environment, right?
I built more Anaconda environments to reproduce the version numbers that work on their machine. However, this didn't fix it. Moreover, this seems to be an issue that not many other people have had, as far as I've found by searching online.
I went through the following other environments:
Python 3.6.4, Keras 2.2.4, TensorFlow 1.12.0, NumPy 1.16.2
(The one that worked for someone else, though admittedly without Anaconda)
Python 3.5.2, Keras 2.2.2, TensorFlow 1.10.0, NumPy 1.15.2
Question
The model is pretrained, the validation set is correctly loaded, but Keras fails to report the ~93% accuracy I'm expecting.
How can I fix this issue of getting 0% accuracy?
Update
I've learned a lot more about the situation. I found that installing a Python 3.6 environment on Ubuntu 18.04 got me to random guessing (~25% accuracy). So, it's no longer 0%! Further, I tried to replicate a machine that's been used for testing a lot, which had Ubuntu 16.04.5. This got me to ~46% accuracy. I wasn't able to perfectly replicate it since Ubuntu forced me to update to 16.04.6 when I installed some packages, and I also don't know how they run things on the machine they test with (I tried myself, and it didn't work).
I also learned that the guy who compiled and saved the model was using MacOS High Sierra, but he also gets it to work in the lab environment. I'll need to follow up on that.
Further, I kept searching online and found others with the same issue:
Keras issue #7676 - An open issue for nearly 2 years. The OP reported his saved model works differently on different machines, which sounds a lot like my problem.
Keras issue #4875 - An open issue for over 2 years. This particular comment seems to be the common solution. I'm not sure if this will solve the problem or not, and I don't actually have the code that compiled this model. However, it seems that many people found issues in how their model was built and saved, so I might need to investigate this further...
I apologize for claiming a solution before; I was ecstatic to see that assertNotEqual(accuracy, 0) passed.
Be Aware
I previously wrote an incorrect answer, and this may very well be another poorly-formed solution. Please be aware I haven't fully tested this hypothesis. Also be aware that this is still an open issue in the Keras community and many people have messed things up in a number of ways to arrive at this problem.
Developing Our Solution
Let Person A be the guy who can run the model okay on our lab computers, as well as his MacBook. Let Person B be the one who can't (i.e. me and everyone else).
I got my team to take this problem more seriously. We got to the point where A has a terminal open at a desktop next to B. A runs the test script and gets 92% accuracy. B runs the script and gets 2%. At this point, we were on the same machine using the exact same Python environment and Keras settings (~/.keras). We were also sure that we had the same script, model, and data. Or, so we thought.
I chose to doubt everything at that point. I scp'd the script, model, and data from A's account to B's account. It worked. Here's what that could mean as a solution:
A Guess at the Problem
The files B had were bad. B got them from team storage on Google Drive, as well as Slack. Further, some were delivered by A through his MacBook. The scripts were genuinely the same. The model and data B had actually differed in binary, but had the same size in bytes, looked "similar" in binary, and could've possibly been an encoding issue.
It wasn't Google Drive. I uploaded and re-downloaded the correct files, and nothing went wrong. However, the wrong file was there to begin with.
Possibly Slack? Perhaps Slack was corrupting the encoding when B downloaded A's files.
Possibly the MacBook? MacOS generates a lot of .DS_Store-like files, and I don't know much about it. MacOS might've played a role in the model and data being OS-dependent. I wouldn't rule it out simply because I'm ignorant of how that OS operates. I heavily suspect this, though, because I happen to have a spare MacBook, and I got it to work in that environment before we started testing on the same machine.
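Since the bad files had the same size in bytes as the good ones, a size check would not have caught the difference. A minimal sketch of a content check that would have: compare SHA-256 digests of the two copies. This is not something we ran at the time, and the function names sha256_of and same_content are illustrative.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large model files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True only if both files contain exactly the same bytes."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Running this on the model and data files right after each transfer (Drive, Slack, scp) would show at which hop the contents change.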
Worst Case Scenario
We're accepting that we can get the model to work on a single machine that everyone has access to. Does this mean that the model might still not work on other machines? Unfortunately, yes.
We're not taking the time to test other machines after wasting nearly 2 months on this problem. I hope this research and debugging helps someone else out. I didn't want to leave it at "never mind, fixed it."

scikit-learn task management library

Update:
After some more searching, I think I am overusing scikit-learn. If I want production ML tools, I should use something like Mahout, which is built on Hadoop. scikit-learn is more like a toy tool for experimenting with ideas.
I am new to scikit-learn. I am using it to train a model, and I want to experiment with different feature combinations and data pre-processing techniques. Each experiment takes a few hours (to minimize error, I run every experiment 10 times with different train-test splits), so I wrote a Python script to run the experiments one by one automatically; when an experiment is done, it sends me an email.
It works well. Today I found another server that is available to run my experiments, so it seems reasonable to write a script that can run experiments in a distributed fashion. There are big data platforms like Hadoop, but as far as I can tell they are not meant for Python and scikit-learn (please point it out if my understanding of Hadoop is wrong).
Because scikit-learn is an "old" library, I think there should be existing libraries that have the capabilities I want. Or am I heading in the wrong direction with scikit-learn?
I tried to google "scikit-learn task managment", but nothing I wanted turned up. Other keywords to search for are also very welcome.
See "Experimentation frameworks" at http://scikit-learn.org/dev/related_projects.html
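Independently of which framework is chosen, the bookkeeping described in the question (every feature combination and pre-processing choice, repeated 10 times with different splits, spread over the available servers) can be sketched in plain Python. The function names build_tasks and assign_round_robin, the dictionary keys, and the server names are all hypothetical; the seed is meant to be passed as random_state to the train-test split.

```python
from itertools import product


def build_tasks(feature_sets, preprocessors, n_repeats=10):
    """Create one task per (feature set, preprocessor, random seed)."""
    return [
        {"features": features, "preprocess": prep, "seed": seed}
        for features, prep in product(feature_sets, preprocessors)
        for seed in range(n_repeats)  # a different seed gives a different split
    ]


def assign_round_robin(tasks, servers):
    """Spread tasks over servers in round-robin order."""
    plan = {server: [] for server in servers}
    for index, task in enumerate(tasks):
        plan[servers[index % len(servers)]].append(task)
    return plan


if __name__ == "__main__":
    tasks = build_tasks(["f1", "f1+f2"], ["none", "scale"])
    plan = assign_round_robin(tasks, ["server-a", "server-b"])
    for server, assigned in plan.items():
        print(server, len(assigned))
```

Round-robin assumes the experiments take roughly the same time; if they do not, a shared queue from which each server pulls its next task balances the load better.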
