I have found a typo in the official PyTorch Documentation. Where can I raise the flag so that it is rectified?
From the PyTorch Contribution Guide, in the section on Documentation:
Improving Documentation & Tutorials
We aim to produce high quality documentation and tutorials. On rare
occasions that content includes typos or bugs. If you find something
you can fix, send us a pull request for consideration.
In order to use Featuretools properly, it is important to understand the algorithmic/mathematical basis for Deep Feature Synthesis and Featuretools. Are there papers, patents, or comparisons with other tools?
You can find the peer-reviewed paper on Deep Feature Synthesis (the algorithm used in Featuretools) here: https://dai.lids.mit.edu/wp-content/uploads/2017/10/DSAA_DSM_2015.pdf.
The implementation has changed since publication, but the core ideas have not. Refer to the documentation or source on GitHub for the latest details.
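To make the core idea a little more concrete, here is a toy, standard-library-only Python sketch of the stacking behind DFS. Aggregation primitives are applied across a parent-child relationship, and the resulting features are then aggregated again one level up, which is where the "deep" comes from. The tables, the helper `aggregate`, and the primitive set are hypothetical illustrations. This is not Featuretools' implementation or API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tables: customers -> orders -> items (parent -> child).
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 1},
    {"id": 12, "customer_id": 2},
]
items = [
    {"order_id": 10, "price": 5.0},
    {"order_id": 10, "price": 15.0},
    {"order_id": 11, "price": 40.0},
    {"order_id": 12, "price": 7.0},
]

# Illustrative aggregation primitives (not Featuretools' primitive registry).
AGG_PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max}


def aggregate(parents, children, fk, child_name):
    """For each parent row, count its children and apply every primitive to
    every float-valued child column, naming features like SUM(items.price)."""
    groups = defaultdict(list)
    for row in children:
        groups[row[fk]].append(row)
    out = []
    for parent in parents:
        rows = groups.get(parent["id"], [])
        feats = dict(parent)
        feats[f"COUNT({child_name})"] = len(rows)
        # Only float columns are treated as numeric features here (ids are ints).
        numeric_cols = [k for k, v in rows[0].items() if isinstance(v, float)] if rows else []
        for col in numeric_cols:
            values = [r[col] for r in rows]
            for name, fn in AGG_PRIMITIVES.items():
                feats[f"{name}({child_name}.{col})"] = fn(values)
        out.append(feats)
    return out


# Depth 1: aggregate items onto orders.
orders_feats = aggregate(orders, items, "order_id", "items")
# Depth 2: stack by aggregating the depth-1 features onto customers.
customer_feats = aggregate(customers, orders_feats, "customer_id", "orders")
print(customer_feats[0]["MEAN(orders.SUM(items.price))"])  # 30.0
```

Each additional level of aggregation adds one unit of feature depth. The real library also has transform primitives, direct features pulled from parent tables, and many more primitives, so treat this only as an intuition aid alongside the paper.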
I'm using spaCy in a project and noticed that my Docker images are pretty big. A bit of research showed that the spaCy installation alone (in /usr/local/lib/python3.6/site-packages/spacy) accounts for 267 MB, so I was wondering whether anything can be done to reduce that footprint.
Out of interest, spaCy 2.2 was released yesterday (October 2, 2019).
One of the headline features of the 2.2 release is "Smaller disk foot-print, better language resource handling". So upgrading to spaCy 2.2 may be one way to reduce the size of a spaCy installation.
(Although this post doesn't solve your specific problem, I believe it does answer this specific question.)
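To check whether an upgrade actually shrinks the installation, a small standard-library sketch like the following can measure an installed package's directory on disk before and after. The helper name `package_size_mb` is hypothetical. It only counts files inside the package's own directory, not its dependencies or separately installed language models (for example `en_core_web_sm`), which also add to the image size.

```python
import importlib.util
from pathlib import Path


def package_size_mb(name):
    """Return the total size, in MB, of the files under an installed package's directory."""
    spec = importlib.util.find_spec(name)
    # Only packages (directories) have submodule_search_locations; plain
    # modules, built-ins and missing packages are rejected.
    locations = spec.submodule_search_locations if spec is not None else None
    if not locations:
        raise ValueError(f"{name!r} is not an installed package directory")
    total = 0
    for location in locations:
        for path in Path(location).rglob("*"):
            if path.is_file():
                total += path.stat().st_size
    return total / (1024 * 1024)


if __name__ == "__main__":
    try:
        print(f"spacy: {package_size_mb('spacy'):.1f} MB")
    except ValueError as err:
        print(err)
```

Running it inside the container, once per spaCy version, gives a like-for-like comparison of the package directory only.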
I am playing around with the Stanford CoreNLP parser and I am having a small issue that I assume is just something stupid I'm missing due to my lack of experience. I am currently using the node.js stanford-corenlp wrapper module with the latest full Java version of Stanford CoreNLP.
My current results are returning something similar to the "Collapsed Dependencies with CC processed" data here: http://nlp.stanford.edu/software/example.xml
I am trying to figure out how I can get the dependencies titled "Universal dependencies, enhanced" as shown here: http://nlp.stanford.edu:8080/parser/index.jsp
If anyone can shed some light on even just what direction I need to research, it would be extremely helpful. So far Google has not helped much with the specific "Enhanced" results, and I am just trying to find out what I need to pass, call, or include in my annotators to get the results shown at the link above. Thanks for your time!
Extra (enhanced) dependencies can be enabled in the depparse annotator by using its 'depparse.extradependencies' option.
According to http://nlp.stanford.edu/software/corenlp.shtml it is set to NONE by default, and can be set to SUBJ_ONLY or MAXIMAL.
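The node.js wrapper's own API is not covered here. As an illustration, assuming you run the CoreNLP server that ships with the Java distribution, the same property can be passed per request in the JSON `properties` URL parameter. This Python sketch uses only the standard library; the helper names `build_request_url` and `parse`, and the local URL and port, are assumptions for the example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a CoreNLP server is running locally on port 9000, e.g. started
# from the CoreNLP directory with:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
CORENLP_URL = "http://localhost:9000/"
VALID_EXTRA = {"NONE", "SUBJ_ONLY", "MAXIMAL"}


def build_request_url(extra_dependencies="MAXIMAL", base_url=CORENLP_URL):
    """Build a server URL whose properties enable extra (enhanced) dependencies."""
    if extra_dependencies not in VALID_EXTRA:
        raise ValueError(f"expected one of {sorted(VALID_EXTRA)}")
    props = {
        "annotators": "tokenize,ssplit,pos,depparse",
        # NONE is the default; SUBJ_ONLY or MAXIMAL add extra dependencies.
        "depparse.extradependencies": extra_dependencies,
        "outputFormat": "json",
    }
    return base_url + "?" + urllib.parse.urlencode({"properties": json.dumps(props)})


def parse(text, url=None):
    """POST raw text to the server and return the decoded JSON response."""
    req = urllib.request.Request(
        url or build_request_url(), data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_request_url())
```

With the server running, `parse("Jim's dog was very hairy.")` returns the annotated document as JSON. Which dependency keys appear in that output depends on the CoreNLP version, so it is worth inspecting the response rather than assuming a fixed key name.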
What approaches are there to generating questions from a sentence? Let's say I have the sentence "Jim's dog was very hairy and smelled like wet newspaper". Which toolkit is capable of generating a question like "What did Jim's dog smell like?" or "How hairy was Jim's dog?"
Thanks!
Unfortunately there isn't one, exactly. There is some code written as part of Michael Heilman's PhD dissertation at CMU; perhaps you'll find it and its corresponding papers interesting?
If it helps, the topic you want information on is called "question generation". This is pretty much the opposite of what IBM's Watson does: even though "here is an answer, generate the corresponding question" is exactly how Jeopardy! is played, Watson is actually a "question answering" system.
In addition to the link to Michael Heilman's PhD provided by dmn, I recommend checking out the following papers:
Automatic Question Generation and Answer Judging: A Q&A Game for Language Learning (Yushi Xu, Anna Goldie, Stephanie Seneff)
Automatic Question Generation from Sentences (Husam Ali, Yllias Chali, Sadid A. Hasan)
As of 2022, Haystack provides a comprehensive suite of tools for question generation and question answering using state-of-the-art Transformer models and transfer learning.
From their website:
Haystack is an open-source framework for building search systems that work intelligently over large document collections. Recent advances in NLP have enabled the application of question answering, retrieval and summarization to real world settings and Haystack is designed to be the bridge between research and industry.
NLP for Search: Pick components that perform retrieval, question answering, reranking and much more
Latest models: Utilize all transformer based models (BERT, RoBERTa, MiniLM, DPR) and smoothly switch when new ones get published
Flexible databases: Load data into and query from a range of databases such as Elasticsearch, Milvus, FAISS, SQL and more
Scalability: Scale your system to handle millions of documents and deploy them via REST API
Domain adaptation: All tooling you need to annotate examples, collect user-feedback, evaluate components and finetune models.
Based on my personal experience during my internship, I have been 95% successful in generating questions and answers for training purposes. I have a sample web user interface and the code to demonstrate it: My Web App and Code.
Huge shoutout to the developers on the Slack channel for helping AI noobs like me! Implementing and deploying an NLP model would never have been this easy without Haystack. I believe it is the only tool out there that makes both development and deployment this easy.
Disclaimer: I do not work for deepset.ai or Haystack; I am just a fan of Haystack.
As of 2019, question generation from text has become possible, and there are several research papers on this task.
The current state-of-the-art question generation model uses language modeling with different pretraining objectives. The research paper, code implementation, and pre-trained model are available to download on the Papers with Code website (link).
This model can be used to fine-tune on your own dataset (instructions for finetuning are given here).
I would suggest checking out this link for more solutions. I hope it helps.