I have a Stanza NLP pipeline created using
nlp = stanza.Pipeline(lang='en', processors='tokenize', logging_level='ERROR')
I want to delete this instance at runtime and free up the GPU memory it holds. Can someone please explain how to achieve this?
I tried setting nlp = None followed by gc.collect(), and also del nlp followed by torch.cuda.empty_cache(), but none of these approaches seems to work.
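One thing worth checking first (a stdlib-only sketch, no Stanza required): whether the pipeline object is actually being freed at all. A surviving reference elsewhere, e.g. another variable or a notebook's output cache, would explain why gc.collect() and torch.cuda.empty_cache() appear to do nothing. FakePipeline below is just a hypothetical stand-in for the real pipeline object:

```python
import gc
import weakref

class FakePipeline:
    """Hypothetical stand-in for the Stanza pipeline object."""
    pass

nlp = FakePipeline()
alias = nlp                  # a second reference, e.g. stored in another variable

probe = weakref.ref(nlp)     # lets us check whether the object was really freed
del nlp
gc.collect()
print(probe() is None)       # False: `alias` still keeps the object alive

del alias
gc.collect()
print(probe() is None)       # True: the object is gone
```

Only once the weak reference reports the object as gone does torch.cuda.empty_cache() have cached blocks to return to the driver.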
So I'm trying to use a pretrained Doc2Vec model for my semantic search project. I tried this one, https://github.com/jhlau/doc2vec (English Wikipedia DBOW), with the forked version of Gensim (0.12.4) and Python 2.7.
It works fine when I use most_similar, but when I try to use infer_vector I get this error:
AttributeError: 'Doc2Vec' object has no attribute 'neg_labels'
What can I do to make this work?
For reasons given in this other answer, I'd recommend against using a many-years-old custom fork of Gensim, and I also find those particular pre-trained models a little fishy: their file sizes seem too small to actually contain all the purported per-article vectors.
But also: that error resembles a very-old bug which only showed up if Gensim was not fully installed to have the necessary Cython-optimized routines for fast training/inference operations. (That caused some older, seldom-run code to be run that had a dependency on the missing neg_labels. Newer versions of Gensim have eliminated that slow code-path entirely.)
My comment on an old Gensim issue has more details, and a workaround that might help - but really, the much better thing to do for quality results & speedy code is to use a current Gensim, & train your own model.
This is my first time asking a question here.
I'm currently using PyTorch in my research and trying to organize results with MLflow.
I know there are many problems when using MLflow on Windows 10, but since there are no alternatives for this... I'm trying to get used to it.
The error that keeps nagging me is "Metrics 'desktop.ini' is malformed ...". It shows up when:
1. using mlflow ui to see experiment results from the past (mlflow ui error)
2. trying to use mlflow.pytorch.log_model(model, ...) (pytorch.log_model error)
These two are my main concerns. My questions are:
1. Are there other result-organizing tools I can use, apart from TensorBoard?
2. Is there any way to save a PyTorch model.pth to MLflow? If that's impossible, are there other formats we can use to save the configuration (such as YAML, or other hierarchical formats like XML)?
Thank you
Update:
After some extra searching, I think I am overusing scikit-learn. If I want production ML tools, I should use something like Mahout, which is built on Hadoop. scikit-learn is more like a toy tool for experimenting with ideas.
I am new to scikit-learn. I am trying to use it to train a model, and I want to experiment with different feature combinations and data pre-processing techniques. Each experiment takes a few hours (to minimize error, I run every experiment 10 times with different train-test splits), so I wrote some Python scripts to run the experiments one by one automatically; when an experiment is done, it sends me an email.
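Concretely, the repeated-split protocol looks roughly like this (the dataset and model here are placeholders for my own):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_iris(return_X_y=True)          # placeholder dataset
model = LogisticRegression(max_iter=1000)  # placeholder model

# 10 different random train/test splits, as described above.
splitter = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
scores = cross_val_score(model, X, y, cv=splitter)

print(scores.mean(), scores.std())   # average out split-to-split variance
```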
This works well. Today I found another server that is available to run my experiments, so it seems reasonable to write a script that can run experiments in a distributed fashion. There are big-data platforms like Hadoop, but I find that they are not for Python and scikit-learn (please correct me if my understanding of Hadoop is wrong).
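For what it's worth, fanning independent experiments out over local cores is already easy with joblib, which ships alongside scikit-learn; the same pattern can be pointed at other backends (e.g. dask) for multiple machines. run_experiment here is a hypothetical placeholder for one train/evaluate run:

```python
from joblib import Parallel, delayed

def run_experiment(params):
    # Placeholder for one full train/evaluate run.
    return sum(params.values())

# A small hypothetical parameter grid of experiments to run.
grid = [{"alpha": a, "depth": d} for a in (1, 2) for d in (3, 4)]

# Run experiments concurrently; results come back in grid order.
results = Parallel(n_jobs=2)(delayed(run_experiment)(p) for p in grid)
print(results)   # [4, 5, 5, 6]
```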
Because scikit-learn is an established library, I think there should be existing libraries with the capabilities I want. Or am I heading in the wrong direction with scikit-learn?
I tried googling "scikit-learn task management", but nothing I want turned up. Other keywords to search for are also very welcome.
See "Experimentation frameworks" at http://scikit-learn.org/dev/related_projects.html
For the spaCy package, the model files for deps, ner, and pos throw an "invalid load key" or EOF error when I try to load them using pickle.
I have executed the code on Windows and Linux systems. I don't think it is a binary-mode transfer issue; I have checked that in detail. I am not able to figure out the problem. Most likely the file is corrupt, but I am not sure. Is there a way it can be fixed using a hex editor?
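For context, both errors are easy to reproduce in plain Python by deliberately corrupting a pickle, which is what makes me suspect the files themselves:

```python
import pickle

data = {"deps": [1, 2, 3]}
blob = pickle.dumps(data)

# A healthy round-trip works fine.
restored = pickle.loads(blob)
assert restored == data

# A stray leading byte reproduces the "invalid load key" error:
try:
    pickle.loads(b"x" + blob)
    raised = None
except pickle.UnpicklingError as exc:
    raised = str(exc)            # e.g. "invalid load key, 'x'."

# A truncated stream reproduces the EOF-style failure:
try:
    pickle.loads(blob[:-4])
    truncated_ok = True
except (pickle.UnpicklingError, EOFError):
    truncated_ok = False
```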
Any help is highly appreciated. It would also be great if someone could explain pickling in a bit more detail.
Appreciate your help.
The English() object in spaCy is not picklable. See issue #125.
I am playing around with the Stanford CoreNLP parser and I am having a small issue that I assume is just something stupid I'm missing due to my lack of experience. I am currently using the node.js stanford-corenlp wrapper module with the latest full Java version of Stanford CoreNLP.
My current results return something similar to the "Collapsed Dependencies with CC processed" data here: http://nlp.stanford.edu/software/example.xml
I am trying to figure out how I can get the dependencies titled "Universal dependencies, enhanced" as shown here: http://nlp.stanford.edu:8080/parser/index.jsp
If anyone can shed some light on even just what direction I need to research, it would be extremely helpful. So far Google has not helped much with the specific "Enhanced" results, and I am just trying to find out what I need to pass, call, or include in my annotators to get the results shown at the link above. Thanks for your time!
Extra (enhanced) dependencies can be enabled in the depparse annotator by using its 'depparse.extradependencies' option.
According to http://nlp.stanford.edu/software/corenlp.shtml it is set to NONE by default, and can be set to SUBJ_ONLY or MAXIMAL.
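For instance, in a CoreNLP properties file (or the equivalent options object passed through the node.js wrapper), the setting would look something like this sketch:

```
annotators = tokenize, ssplit, pos, depparse
depparse.extradependencies = MAXIMAL
```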