Algorithmic details behind Deep Feature Synthesis and Featuretools? - featuretools

In order to use featuretools properly, it is important to understand the algorithmic and mathematical basis of Deep Feature Synthesis. Are there papers, patents, or comparisons with other tools?

You can find the peer-reviewed paper on Deep Feature Synthesis (the algorithm used in Featuretools) here: https://dai.lids.mit.edu/wp-content/uploads/2017/10/DSAA_DSM_2015.pdf.
The implementation has changed since publication, but the core ideas have not. Refer to the documentation or source on GitHub for the latest details.
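For intuition, the core idea in the paper, applying aggregation primitives across a one-to-many entity relationship to synthesize new features, can be sketched in a few lines of plain Python. This toy is not the Featuretools implementation; all names below are illustrative.

```python
# Toy sketch of one Deep Feature Synthesis step (not Featuretools code):
# for each parent entity, apply every aggregation primitive to a child
# column reached through a one-to-many relationship.

def synthesize(parent_ids, child_rows, key, column, primitives):
    """For each parent id, aggregate a child column with every primitive."""
    features = {}
    for pid in parent_ids:
        values = [row[column] for row in child_rows if row[key] == pid]
        features[pid] = {
            f"{name}({column})": fn(values) for name, fn in primitives.items()
        }
    return features

primitives = {
    "SUM": sum,
    "MEAN": lambda v: sum(v) / len(v),
    "COUNT": len,
}

customers = [1, 2]
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

feature_matrix = synthesize(customers, transactions, "customer_id", "amount", primitives)
# feature_matrix[1] == {"SUM(amount)": 40.0, "MEAN(amount)": 20.0, "COUNT(amount)": 2}
```

The "deep" part of the algorithm comes from stacking such steps across chains of relationships. In Featuretools itself the entry point is the `ft.dfs` call on an entity set; check the current documentation, since that API has changed across versions.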

Related

Question about natural language processing

I am working on a graduation project related to aspect extraction (AE).
I'm pretty confused about POS tagging, syntax trees, grammar rules, and other low-level NLP topics. I need a reference that teaches these things in detail, so if any of you know of one, I'd appreciate a pointer.
I know my question is not directly about programming and may not be a good fit for the site, but I really need the help.
This is not an easy one! I assume that you are trying to understand the underlying 'why' and 'what', so if I were you I would start with the one and only "Speech and Language Processing" by Daniel Jurafsky and James H. Martin. They have a whole section (Section 17 in my second edition) on the representation of meaning and state representation, with a whole subsection on aspect.
In addition, the book will help you understand the various existing approaches to POS tagging and the other topics you mentioned above, and it is available online for free! There is even a draft of the 3rd edition out there.
After reading those chapters, you can check out how other people do aspect extraction here

Agile, Scrum and documentation [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
One of the Four Core Agile values says "Working Software over comprehensive documentation" and this is explained as a good thing. Furthermore it is explained that rather than written communication (e-mails included), face-to-face meetings are preferred and "more productive".
I would like for someone to explain to me why or how is this a good thing?
In an organization I used to work for, there were heaps of working software that I had to maintain. The documentation was minimal and it was a nightmare. It didn't help that the programs were not modularized, were very hard to understand, and were full of esoteric twists and very disorganized. That comprehensive documentation is very important was one thing I took from that experience. It doesn't matter if the software works now if it is not going to work in the near future, right?
And on face-to-face meetings, I had the same doubt. I very much prefer e-mails (written communication). You can say the most outrageous things when talking, but once it is written down, it is a deal. Plus, if you are in a multinational organization with several languages, writing helps a lot.
I would like to hear the voice of people with Agile experience. How is the above a good thing? Thanks
Working software over comprehensive documentation
Comprehensive documentation is sometimes seen as a way to demonstrate progress: "If we have a detailed specification and a weighty design document, then we are making good progress towards a product delivery."
What "working software over comprehensive documentation" means is that we view working software as a better demonstration of progress than documentation, because comprehensive documentation can give a false level of confidence.
So nothing says to avoid documentation altogether. It just says that we should only produce the documentation that is needed, rather than producing documentation merely because it is part of a process.
In your example, where the software is difficult to work with, more documentation may well be needed. Just don't write documents that never get used and offer little value.
Individuals and interaction over process and tools
Face-to-face communication has many advantages over other forms of communication. For example:
People use body language to give context to conversations
People use audible and visual cues as to when to start and stop talking; this helps conversations flow
Regular face-to-face discussions often help teams to bond together
Notice though that the Agile manifesto does not mention face-to-face communication. All it says is individuals and interaction. If you and your team have ways of communicating that are as effective as face-to-face communication then that fits just as well within the Agile approach. The important part is that we value interaction and having members of the team work closely with each other.
When all Agile recommendations are taken into account, the issues you mention disappear. Working software should also have good coding standards and design.
Regarding your particular issue with a lack of documentation, unit tests (TDD/BDD) can be very useful. Good test coverage can explain how code should work even better than detailed documentation can. Agile also values simplicity, so your entire architecture might simply be over-complicated.
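To make the tests-as-documentation point concrete, here is a small hypothetical example (the function and its rules are invented for illustration): the test states the contract more precisely than most prose would.

```python
# A unit test doubling as documentation: each assertion records one
# rule of the function's contract, and the suite is executable proof
# that the description is still accurate.

def apply_discount(price, percent):
    """Return price reduced by percent; reject out-of-range percentages."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)

def test_apply_discount_documents_behavior():
    assert apply_discount(100.0, 25) == 75.0   # ordinary case
    assert apply_discount(100.0, 100) == 0.0   # full discount is allowed
    try:
        apply_discount(100.0, 150)             # out-of-range is rejected
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")

test_apply_discount_documents_behavior()
```

Unlike a design document, this "documentation" fails loudly the moment the behavior changes, which is exactly why TDD/BDD suites age better than prose.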
Regarding face-to-face communication: just imagine a situation where you detect an issue in your product (say, broken web-site markup). Instead of writing a long email with steps to reproduce and attaching screenshots, you walk over to the front-end developer sitting in your room, or make a Skype call, and start explaining the problem. The developer quickly realizes they forgot to include some script. You get an answer in minutes, while your email might not be answered until the next day.
I think it is necessary to clarify why you need Agile before you try to apply it.
Agile is the recommended working framework for a highly unpredictable domain (you can also check the Cynefin model to identify your working context). In such a domain, you do require working software and good communication to review and revise your development in short iterations. As a result, you can change and improve your software based on feedback about the software itself. This has proven to be an effective and efficient way to build software in a highly competitive business world.
However, in your organization you were maintaining legacy software with limited documentation. That context is totally different from what Agile is designed for. You need optimization in your world, not experimentation or growth-seeking. In short, process, tools, and documentation are more important there.
Regarding email communication: there is no doubt that email seals the deal, but you could never make a deal using email alone. The same goes for applying Agile: you should use both face-to-face conversation and email, depending on the situation.
I would regard Agile as a framework more than a methodology. The idea is to let you build your own process based on your own working environment.
Documentation is an expression of a shared vocabulary, so it should be consistent from the epic all the way down to the comments in the code:
Documentation should be comprehensive and understandable. Using examples is recommended.
Language shared between feature stories, technical stories, pseudocode, and assertions should follow consistent naming conventions
A feature that people do not know about is a useless feature.
Lack of documentation can be a symptom of the lack of a marketing plan
A feature that isn't documented is a useless feature. A patch for a new feature must include the documentation.
Lack of documentation can be a symptom of the lack of usability, accessibility, and information architecture
Adjust the documentation. Doing this first gives you an impression of how your changes affect the user.
Lack of documentation can be a symptom of a lack of focus on the user and the maintainer:
Software is not useful by itself. The executable software is only part of the picture. It is of no use without user manuals, business processes, design documentation, well-commented source code, and test cases. These should be produced as an intrinsic part of the development, not added at the end. In particular, recognize that design documentation serves two distinct purposes:
To allow the developers to get from a set of requirements to an implementation. Much of this type of documentation outlives its usefulness after implementation.
To allow the maintainers to understand how the implementation satisfies the requirements. A document aimed at maintainers is much shorter, cheaper to produce and more useful than a traditional design document.
And understanding the purpose of any project requires building a relationship between the project timeline and the source code history:
Write the change log entries for your changes. This is both to save us the extra work of writing them, and to help explain your changes so we can understand them.
The purpose of the change log is to show people where to find what was changed. So you need to be specific about what functions you changed; in large functions, it’s often helpful to indicate where within the function the change was.
On the other hand, once you have shown people where to find the change, you need not explain its purpose in the change log. Thus, if you add a new function, all you need to say about it is that it is new. If you feel that the purpose needs explaining, it probably does — but put the explanation in comments in the code. It will be more useful there.
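As a hypothetical illustration of the conventions quoted above, a GNU-style change log entry might look like this (the file, function, and author names are made up):

```
2024-05-01  Jane Doe  <jane@example.org>

	* parser.c (parse_header): Handle an empty Content-Length value.
	(parse_body): New function.
	* parser.h: Declare parse_body.
```

Note how the entry names the changed files and functions so a reader can find the change, while the why of the change would live in comments next to the code itself.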
References
Vim documentation: develop
SCRUM-PSP: Embracing Process Agility and Discipline (pdf)
Secure Software Development Life Cycle Processes | US-CERT
An Uneasy Marriage? Merging Scrum and TSP (pdf)
TSP/PSP and Agile-SCRUM: Similarities & Differences
GNU Emacs Manual: Sending Patches

Meta construction capabilities?

I am currently considering Orange as the base for a meta-learning assistant prototype I intend to develop, but before committing myself to a thorough exploration of the documentation and learning about python development (which would both be quite time consuming), I would appreciate some insight regarding the feasibility of such prototype within Orange framework.
The main aim of the prototype is to allow efficient use of data mining and machine learning algorithms by non-experts. Concretely, as a first step I wish to be able to give the user a workflow answering his modelling need, elicited from his dataset and from his expression of that need. To perform this elicitation, I intend to run a process that involves designing and executing learning workflows on his data.
Is it possible from within the Orange framework (or from an overarching "supervising" framework) to automatically define and execute learning workflows?
Yes, it is.
We have actually experimented with a "recommendation system" that would suggest parts of the workflow to the user. It wasn't useful. Also, there have been various meta-learning projects in the past and I think that the general consensus is --- it doesn't work. ;)
But if you intend to try it, Orange is a suitable platform for this.
#hoijui: Orange no longer has any other mailing list or forum, just this one. Developers follow Stack Overflow and answer questions there.
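For the original question, the key idea is that a workflow can be treated as data that a supervising program generates and executes. The sketch below is independent of Orange (every name in it is mine, not Orange's API) and shows a tiny "meta-learning" loop that enumerates candidate workflows, runs each one, and keeps the best-scoring result.

```python
# Workflows-as-data sketch: a workflow is a list of (name, function)
# steps, so a supervising program can construct and run workflows
# automatically instead of a human wiring them up in a GUI.

def run_workflow(steps, data):
    """Execute a workflow by threading the data through each step in order."""
    for _name, step in steps:
        data = step(data)
    return data

# Two automatically generated candidate preprocessing workflows.
candidates = [
    [("drop_missing", lambda xs: [x for x in xs if x is not None])],
    [("drop_missing", lambda xs: [x for x in xs if x is not None]),
     ("clip_at_10", lambda xs: [min(x, 10) for x in xs])],
]

data = [1, None, 5, 42]

def score(xs):
    """Stand-in for a real evaluation metric; here, just the mean."""
    return sum(xs) / len(xs)

results = [(score(run_workflow(w, data)), w) for w in candidates]
best_score, best = max(results, key=lambda pair: pair[0])
```

In a real prototype the steps would be learners and preprocessors, and the score would come from cross-validation, but the supervising loop has the same shape.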

Is there a free or opensource SERM Modeling tool for linux?

I'm looking for a good SERM modeling tool for Linux. Is there one? Which is best?
Open ModelSphere may be the lone mature tool that fits the requirements:
open source (under GPL)
runs on Linux: cross-platform, actually, since it is Java-based
standard fare for typical modeling tools, including forward and backward engineering and validation
No explicit support for SERM, but the ability to introduce new notations. Several of the notations readily included appear to be relatively close to [what I understand] SERM to be.
This last point might be the show-stopper... hopefully this suggestion can be a starting point.
Disclosure:
I'm no modeling wizard, merely an occasional user of the modeling tools typically included in software development IDEs. Also, I'm not versed in SERM in particular, and I'm unsure of its subtle (or not so subtle) differences from other modeling metamodels.
I would typically remain an interested spectator of this type of question, but in view of the little attention it has received so far (and in view of the +50 bounty, right!) I'm kindly posting the above with the intent of maybe stirring things up a bit. I'll be glad to delete this answer, amend it as suggested, or otherwise try to help generate traffic in this direction.
If nothing else, this answer may prompt Anton Bessonov to elaborate a bit on which specific uses and capabilities would be relevant to his quest.

Generating questions from text (NLP)

What approaches are there to generating questions from a sentence? Let's say I have the sentence "Jim's dog was very hairy and smelled like wet newspaper". Which toolkit is capable of generating questions like "What did Jim's dog smell like?" or "How hairy was Jim's dog?"
Thanks!
Unfortunately there isn't one, exactly. There is some code written as part of Michael Heilman's PhD dissertation at CMU; perhaps you'll find it and its corresponding papers interesting?
If it helps, the topic you want information on is called "question generation". This is pretty much the opposite of what Watson does, even though "here is an answer, generate the corresponding question" is exactly how Jeopardy is played. But actually, Watson is a "question answering" system.
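For a feel of what Heilman-style question generation does, here is a drastically simplified rule-based sketch in plain Python. Real systems syntactically parse the sentence and apply many transformation rules; this toy handles a single hand-written pattern, and every name in it is illustrative.

```python
# Toy rule-based question generation: match one "<subject> ... smelled
# like <X>" pattern and rewrite the declarative sentence as a question.
# A real system would parse the sentence and cover many such rules.

import re

def generate_question(sentence):
    """Return a question for one known pattern, or None if no rule matches."""
    m = re.search(r"^(.*?)\s+(?:was|is)\b.*\bsmelled like\s+(.+)$", sentence)
    if m:
        subject = m.group(1)
        return f"What did {subject} smell like?"
    return None

q = generate_question("Jim's dog was very hairy and smelled like wet newspaper")
# q == "What did Jim's dog smell like?"
```

The hard part, which the dissertation addresses, is doing this robustly: choosing the right subject span, handling tense and agreement, and ranking the many candidate questions a sentence can yield.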
In addition to the link to Michael Heilman's PhD provided by dmn, I recommend checking out the following papers:
Automatic Question Generation and Answer Judging: A Q&A Game for Language Learning (Yushi Xu, Anna Goldie, Stephanie Seneff)
Automatic Question Generation from Sentences (Husam Ali, Yllias Chali, Sadid A. Hasan)
As of 2022, Haystack provides a comprehensive suite of tools for question generation and answering using recent Transformer models and transfer learning.
From their website,
Haystack is an open-source framework for building search systems that work intelligently over large document collections. Recent advances in NLP have enabled the application of question answering, retrieval and summarization to real world settings and Haystack is designed to be the bridge between research and industry.
NLP for Search: Pick components that perform retrieval, question answering, reranking and much more
Latest models: Utilize all transformer based models (BERT, RoBERTa, MiniLM, DPR) and smoothly switch when new ones get published
Flexible databases: Load data into and query from a range of databases such as Elasticsearch, Milvus, FAISS, SQL and more
Scalability: Scale your system to handle millions of documents and deploy them via REST API
Domain adaptation: All the tooling you need to annotate examples, collect user feedback, evaluate components, and finetune models
Based on my personal experience, I was about 95% successful in generating questions and answers during my internship, for training purposes. I have a sample web user interface and code to demonstrate: My Web App and Code.
Huge shoutout to the developers on the Slack channel for helping AI noobs like me! Implementing and deploying an NLP model has never been easier, thanks to Haystack. I believe this is the only tool out there where one can so easily develop and deploy.
Disclaimer: I do not work for deepset.ai or Haystack, am just a fan of haystack.
As of 2019, question generation from text has become practical, and there are several research papers on the task.
The current state-of-the-art question generation model uses language modeling with different pretraining objectives. The research paper, code implementation, and pre-trained model are available to download via the Papers with Code website link.
This model can be used to fine-tune on your own dataset (instructions for finetuning are given here).
I would suggest checking out this link for more solutions. I hope it helps.
