scikit-learn task management library - scikit-learn

Update:
After some extra searching, I think I have been overusing scikit-learn. If I want production ML tools, I should use something like Mahout, which is built on Hadoop; scikit-learn is more of a tool for experimenting with ideas.
I am new to scikit-learn. I am trying to use scikit-learn to train a model, and I want to experiment with different feature combinations and data pre-processing techniques. Each experiment takes a few hours (to minimize error, I run every experiment 10 times with different train-test splits), so I wrote some Python scripts to run the experiments one by one automatically; when an experiment is done, the script sends me an email.
This works well. Today I found another server that is available to run my experiments, so it seems reasonable to write a script that can run experiments in a distributed fashion. There are big data platforms like Hadoop, but my understanding is that it is not meant for Python and scikit-learn (please correct me if my understanding of Hadoop is wrong).
Because scikit-learn is a mature library, I think there should already be libraries with the capabilities I want. Or am I heading in the wrong direction with scikit-learn?
I tried to google "scikit-learn task management", but nothing useful turned up. Other keywords to search for are also very welcome.

See "Experimentation frameworks" at http://scikit-learn.org/dev/related_projects.html

Related

Running Stable Diffusion on Graphcore IPUs

I have been looking for a version of Stable Diffusion that can run on IPUs. So far I can find CUDA-based ones only, presumably due to the high availability of CUDA hardware.
Now I wonder if there is a way to run CUDA-based scripts/trainers/training code on an IPU, for example via a translation layer in between.
I doubt there is, and since I cannot find an IPU version, I bet I'll have to modify the scripts :(.
There is the HuggingFace optimum library, which acts as an interoperability layer for running transformers on IPUs. You can find Stable Diffusion there.
For other models that are not supported in the library, there's a guide on how to modify your script to make it IPU-compatible here
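As a rough sketch of what using it looks like (the class and argument names here are assumptions from memory of the optimum-graphcore docs and may differ between releases, so check the documentation):

# Hypothetical sketch; verify the class name and arguments against the
# optimum-graphcore documentation for your release.
from optimum.graphcore.diffusers import IPUStableDiffusionPipeline

pipe = IPUStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # a standard Stable Diffusion checkpoint
)
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")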

How can I include fast.ai functionality when primarily using PyTorch?

I am using PyTorch to carry out vision tasks, but would like to use some of what fast.ai provides since it has a lot of useful functionality. I'd prefer to work mostly in PyTorch since it's easier for me to understand what's going on, it's easier for me to find information on it online, and I want to maintain flexibility.
In https://docs.fast.ai/migrating_pytorch it's written that after I use the following imports: from fastai.vision.all import * and from migrating_pytorch import *, I should be able to start "Incrementally adding fastai goodness to your PyTorch models", which sounds great.
But when I run the second import I get ModuleNotFoundError: No module named 'migrating_pytorch'. Searching in https://github.com/fastai/fastai I also don't find any code mention of migrating_pytorch.py, nor did I manage to find something online.
(I'm using fast.ai version 2.3.1)
I'd like to know if this is indeed the way to go, and if so how to get it working. Or, if there's a better way, how I should use that approach instead.
As an example, it would be nice if I could use the EarlyStoppingCallback, SaveModelCallback, and add some metrics from fast.ai instead of writing them myself, while still having everything in mostly "native" PyTorch.
Preferably the solution isn't specific to vision only, but that's my current need.
migrating_pytorch is an example script. It's in the fast.ai repo at: https://github.com/fastai/fastai/blob/master/nbs/examples/migrating_pytorch.py
The notebook that shows how to use it is at: https://github.com/fastai/fastai/blob/827e7cc0fad2db06c40df393c9569309377efac0/nbs/examples/migrating_pytorch.ipynb
For the callback example, your training code would end up looking something like:
from fastai.vision.all import *  # brings in Learner, simple_cnn, and the callbacks

cbs = [EarlyStoppingCallback(), SaveModelCallback()]
learner = Learner(dls, simple_cnn((3, 16, 16, 2)), loss_func=F.cross_entropy, cbs=cbs)  # dls is your DataLoaders
learner.fit(1)
Those two callbacks probably need some arguments, e.g. save path, etc.
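For example (the parameter values here are just illustrative):

# Both callbacks monitor the validation loss by default.
cbs = [
    EarlyStoppingCallback(monitor='valid_loss', patience=3),      # stop after 3 epochs without improvement
    SaveModelCallback(monitor='valid_loss', fname='best_model'),  # writes models/best_model.pth
]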

Mix pytorch lightning with vanilla pytorch

I am doing meta-learning research and am using the MAML optimization provided by learn2learn. However, as one of the baselines, I would like to test a non-meta-learning approach, i.e. traditional training + testing.
Due to Lightning's internal handling of the optimizer, it seems difficult to make learn2learn's MAML work in Lightning, so I couldn't use Lightning in my meta-learning setup. For my baseline, however, I would really like to use Lightning, since it provides many handy features like DeepSpeed or DDP out of the box.
So here is my question: other than setting up two separate folders/repos, how can I mix vanilla PyTorch (learn2learn) with PyTorch Lightning (baseline)? What is the best practice?
Thanks!
I decided to answer my own question. I ended up using Lightning's manual optimization so that I can customize the optimization step. This way both approaches use the same framework, which I think is better than maintaining 2 separate repos.
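A minimal sketch of what I mean, assuming a plain supervised baseline (the module here is illustrative, not my actual research code):

import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class BaselineModule(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.automatic_optimization = False  # take over the optimization step

    def training_step(self, batch, batch_idx):
        x, y = batch
        opt = self.optimizers()
        loss = F.cross_entropy(self.model(x), y)
        opt.zero_grad()
        self.manual_backward(loss)  # instead of loss.backward(); plays well with DDP/DeepSpeed
        opt.step()
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=1e-3)

In the meta-learning version, the body of training_step can run learn2learn's inner-loop adaptation instead, while the Trainer, logging, and DDP/DeepSpeed setup stay shared.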

MLflow on Windows 10 and desktop.ini

It's the first time for me to leave a question here.
I'm currently using PyTorch in my research and trying to organize results with MLflow.
I know there are many problems when using MLflow on Windows 10, but since there are no other options for me... I'm trying to get used to it.
The error I encounter is "Metric 'desktop.ini' is malformed ...". This error is nagging me when:
- using mlflow ui to see experiment results from the past
- trying to use mlflow.pytorch.log_model(model, ...)
These two are my main concerns. My questions are:
1. Are there other result-organizing tools I can use besides TensorBoard?
2. Is there any way to save a model.pth from PyTorch to MLflow? And if that's impossible, are there other formats we can use to save the configuration (such as YAML, or other hierarchical formats like XML)?
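For reference, my logging code looks roughly like this (simplified; model is an ordinary torch.nn.Module):

import mlflow
import mlflow.pytorch

with mlflow.start_run():
    mlflow.log_param("lr", 1e-3)
    mlflow.log_metric("val_loss", 0.42)
    mlflow.pytorch.log_model(model, "model")  # stores the weights under the run's artifacts/model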
Thank you

Library to use for event extraction from text?

My project is to extract events automatically from a given text. The events can be written specifically, or just mentioned in a sentence, or not be there at all.
What is the library or technique that I should use for this? I tried the Stanford NER demo, but it gave bad results. I have enough time to explore and learn a library, so complexity is not a problem. Accuracy is a priority.
The task itself is a complex one and is still not solved as of now (2018). But recently a very useful Python library for NLP has emerged: spaCy. This library performs better than most of its competitors. With it you can do things like tokenization, POS tagging, NER, and sentence similarity. But you still need to build on these features and extract events based on your own rules.
Deep learning may also be worth trying, but even though it works well on other NLP tasks, it is still immature for event extraction.
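A minimal sketch of those spaCy features (it assumes you have downloaded the small English model with python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple acquired the startup in London last Tuesday.")

for token in doc:
    print(token.text, token.pos_)   # tokenization + POS tagging
for ent in doc.ents:
    print(ent.text, ent.label_)     # NER: ORG, GPE, DATE, ...

Event extraction would then be rules you write over these annotations, e.g. matching verb patterns together with DATE/ORG entities.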
