How to develop a Decision Support System - decision-tree

I would like to develop a decision support system for diagnosing diseases. I am new to programming. Can anyone suggest which programming language is most suitable?

That really depends on what you want to do exactly (do you want to use prebuilt libraries?).
I see you added Weka as a tag. Java is a good option: it's versatile, powerful, fast and fairly easy to use, it can be deployed as a web service, and you can use the Weka library to build trees quickly.
But really, any other programming language (C++, Python, MATLAB) will have the power to build a tree.

You have to design a mathematical decision model. To do that, you need a background in statistics, data mining or even fuzzy logic.
Then you code the decision-making algorithm to get the final results. I suggest adding a validation phase to ensure your program is useful and correct.
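A minimal Python sketch of those last two steps, assuming a hand-written tree over hypothetical yes/no symptoms (`fever`, `cough`) and made-up diagnoses; it is an illustration, not medical logic. In practice you would more likely learn the tree from data (for example with Weka) than write it by hand, but the validation idea is the same: measure how often the tree agrees with labeled cases.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical symptom names and diagnoses, for illustration only (not medical advice).

@dataclass
class Leaf:
    diagnosis: str

@dataclass
class Node:
    symptom: str      # the yes/no question asked at this node
    if_yes: "Tree"
    if_no: "Tree"

Tree = Union[Leaf, Node]

def diagnose(tree: Tree, patient: Dict[str, bool]) -> str:
    """Walk from the root to a leaf, following the patient's answers (missing = no)."""
    while isinstance(tree, Node):
        tree = tree.if_yes if patient.get(tree.symptom, False) else tree.if_no
    return tree.diagnosis

def accuracy(tree: Tree, cases: List[Tuple[Dict[str, bool], str]]) -> float:
    """Validation phase: fraction of labeled cases the tree diagnoses correctly."""
    correct = sum(1 for patient, label in cases if diagnose(tree, patient) == label)
    return correct / len(cases)

model = Node("fever",
             if_yes=Node("cough", Leaf("flu"), Leaf("infection")),
             if_no=Leaf("healthy"))

validation_cases = [
    ({"fever": True, "cough": True}, "flu"),
    ({"fever": True, "cough": False}, "infection"),
    ({"fever": False}, "healthy"),
    ({"fever": False, "cough": True}, "cold"),  # the tree gets this one wrong
]

print(diagnose(model, {"fever": True, "cough": True}))  # flu
print(accuracy(model, validation_cases))                # 0.75
```

A low score on cases like the last one is exactly what the validation phase is meant to catch before the system is trusted.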

Related

What is the use of UML in day-to-day life?

I don't have a programming job, but I do code (using functional programming and OOP) from time to time to simplify repetitive tasks that I perform in the software I use: extracting data from a simulation and dumping it into an Excel file, reading data from an Excel file and using it to manipulate my simulation, etc.
I can manage this pretty well without UML. But what I want to know is: for someone who doesn't code for a living, yet uses code to make life simpler and save time, how important is UML (I do understand that I cannot use it for functional programming)? What are the practical benefits? How can I write better code by using UML? I know it's not a substitute for a programming language, i.e. I cannot program in it. What I was also hoping to understand is how I can use it before I actually begin to write OOP code.
Contrary to what other answers suggest, I believe UML is beneficial not only when you work with a team of developers but also when you work on a project on your own.
In my opinion the true benefit of UML is that you are forced to think before you act. Of course you can always directly start programming when starting a new project, but (most certainly for larger projects) it is better to think about your design.
By creating complete UML models, you will notice that you need to think about your software (what does the user need to be able to do with my software? How will the software react? etc.). Because of this whole process, I believe that by the time you start coding you will already have such a good understanding of the structure of your program that you will be able to code your project better and faster.
In conclusion, I think UML is all about doing it right the first time.
In addition, you will always have proper documentation of your code, which makes it easier to maintain.
I'm not sure what you mean by day-to-day life, but UML helps with:
Communicating the desired structure and behaviour of a system between analysts, architects, developers, stakeholders and users.
Visualising and controlling system architecture.
Promoting a deeper understanding of the system, exposing opportunities for simplification and re-use.
Managing risk.
When working in a team, you first create a UML model. That way you know which classes have to be made and you can divide the work, knowing everyone will use the names from the UML with the right connections between classes (inheritance, composition, ...).
It's an abstract version of a program and it's very important. Of course, if you're on your own you don't 'need' to use it, but it might make life easier. If the task is long, create the UML beforehand so you don't lose track of what you're doing; plus, it helps you see design patterns. :)
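As a rough illustration of going from a class diagram to OOP code, here is a Python skeleton for a hypothetical diagram. The class names `Simulation`, `Result`, `Exporter` and `CsvExporter` are invented, loosely based on the simulation-to-spreadsheet tasks in the question, with CSV standing in for Excel to avoid extra dependencies. Each relationship in the diagram maps to a specific construct: inheritance becomes subclassing, composition becomes an owned attribute.

```python
from abc import ABC, abstractmethod
from typing import List

# Hypothetical class diagram, translated by hand:
#   Exporter <|-- CsvExporter   (inheritance: CsvExporter "is an" Exporter)
#   Simulation *-- Result       (composition: a Simulation owns its Results)

class Result:
    def __init__(self, name: str, value: float) -> None:
        self.name = name
        self.value = value

class Simulation:
    def __init__(self) -> None:
        # Composition: Results are created and owned by this Simulation.
        self.results: List[Result] = []

    def record(self, name: str, value: float) -> None:
        self.results.append(Result(name, value))

class Exporter(ABC):
    @abstractmethod
    def export(self, simulation: Simulation) -> str: ...

class CsvExporter(Exporter):
    # Inheritance: fulfils the Exporter contract from the diagram.
    def export(self, simulation: Simulation) -> str:
        rows = ["name,value"] + [f"{r.name},{r.value}" for r in simulation.results]
        return "\n".join(rows)

sim = Simulation()
sim.record("pressure", 1.5)
print(CsvExporter().export(sim))
```

If the diagram is agreed first, each team member can implement one of these classes independently against the same names.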
In addition to the other mentioned use cases of UML models, the most efficient use case for a UML model is when it is used to generate other models or text (code).
For instance, you can generate Java classes from a UML class diagram.
Search for MDA (Model Driven Architecture) or MDSD (Model Driven Software Design).
If you are looking for a tool to support generating code/text from UML models then take a look at the Acceleo Project.
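To show what model-to-text generation means without any particular tool, here is a minimal Python sketch that turns a hypothetical, dictionary-based stand-in for a UML class into Java source. Real MDA tooling such as Acceleo works from actual UML/EMF models and templates; this only illustrates the idea.

```python
from typing import Dict, List, Tuple

# Hypothetical, minimal stand-in for a UML class: a name plus typed attributes.
ClassModel = Dict[str, object]

def generate_java(model: ClassModel) -> str:
    """Model-to-text: emit a Java class with private fields and getters."""
    name = model["name"]
    attributes: List[Tuple[str, str]] = model["attributes"]
    lines = [f"public class {name} {{"]
    for java_type, attr in attributes:
        lines.append(f"    private {java_type} {attr};")
    for java_type, attr in attributes:
        getter = "get" + attr[0].upper() + attr[1:]
        lines.append(f"    public {java_type} {getter}() {{ return {attr}; }}")
    lines.append("}")
    return "\n".join(lines)

patient = {"name": "Patient", "attributes": [("String", "name"), ("int", "age")]}
print(generate_java(patient))
```

Changing the model and regenerating keeps the code in sync with the diagram, which is the main selling point of this approach.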

Pros/cons of different language workbench tools such as Xtext and MPS?

Does anyone have experience working with language workbench tools such as Xtext, Spoofax, and JetBrains' MPS? I'm looking to try one out and am having a hard time finding a good comparison of the different tools. What are the pros and cons of each?
I'm looking to build DSLs that generate Python code, so I'm especially interested to hear from people who've used one of these tools with Python (all three seem pretty Java-focused... why is that?). The DSLs are primarily for my own use, so I care less about building a really pretty IDE than about it being simple (KISS) to define the syntax and write the code generator. The ability to type-check / do static analysis of the DSLs would be pretty cool too.
I'm a little afraid of going far down a path, hitting a wall, and realizing that all my code is in a format that can't be ported to anything else. Is that a risk with these tools? MPS in particular seems a little scary since, as I understand it, you don't really generate text-based syntaxes but rather build specialized editors for ASTs.
Markus Voelter does a pretty good job comparing those three in the SE-Radio and Software ArchitekTOUR podcasts.
The basic idea is that Xtext is the most widely used, and therefore the most stable and best documented, and it is based on the popular Eclipse platform and the modeling ecosystem (EMF) that surrounds it. On the other hand, it is parser-based and uses ANTLR internally, which means the kinds of grammars you can define are limited and languages cannot be combined easily.
Spoofax is an academic product with the least adoption of the three. It is also parser-based, but uses its own parser generator internally, which allows languages to be combined.
JetBrains MPS is projection-based, which gives the language designer a lot of freedom and allows languages to be combined. It also has solid support. The drawback might be the learning curve.
None of these tools is strictly Java-focused as the target language for code generators. Xtext uses Xpand templates, which are plain text. I don't really know how code generation in Spoofax works. MPS has its base language, which is said to be a subset of Java, but there are alternatives.
I personally use Xtext because of its simplicity and maturity, but the strong limitations imposed by its design make it not a very future-proof choice.
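A minimal sketch of why template-based generation is not tied to Java: a plain-text template (here Python's `string.Template`, not Xpand) can emit Python just as easily. The `entity` model and its shape are hypothetical, standing in for what a DSL parser would produce.

```python
from string import Template

# Hypothetical DSL model, e.g. parsed from a line like "entity Point { x y }".
entity = {"name": "Point", "fields": ["x", "y"]}

# Plain-text template: the generator does not care which language it emits.
CLASS_TEMPLATE = Template(
    "class $name:\n"
    "    def __init__(self, $params):\n"
    "$assignments"
)

def generate_python(model: dict) -> str:
    fields = model["fields"]
    assignments = "".join(f"        self.{f} = {f}\n" for f in fields)
    return CLASS_TEMPLATE.substitute(
        name=model["name"], params=", ".join(fields), assignments=assignments
    )

source = generate_python(entity)
print(source)

# Sanity check: the generated text is valid Python that defines the class.
namespace: dict = {}
exec(source, namespace)
point = namespace["Point"](1, 2)
print(point.x, point.y)  # 1 2
```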
I chose Xtext in the same situation two weeks ago, but I don't know anything about Spoofax.
My first impression: Xtext is very simple and productive.
I made my first real-life (but very simple) project in 30 minutes; I generated a Graphviz dot graph and an HTML report.
I don't like MPS because I prefer plain-text source and destination files.
There are other systems for doing this kind of thing. If your goal is building tools, you don't necessarily have to look for an IDE with an integrated tool; sometimes you can find better tools that focus on utility rather than IDE integration.
Consider any of the pure program transformation tools:
TXL (practical, single paradigm)
Stratego (Spoofax before it was transplanted into Eclipse)
Rascal (research, very nicely designed in many ways)
DMS Software Reengineering Toolkit (happens to be mine; commercial; used for heavy-duty DSL/conventional language analysis and transformation, including on C++)
These all provide good mechanisms for defining DSLs and transforming them.
What really matters is the support machinery for carrying out "life after parsing".
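As an illustration of "life after parsing", here is a minimal Python sketch of one such post-parse step: a static check over a hypothetical AST for a toy assignment language that reports variables read before they are assigned. None of the tools listed above is involved; it only shows the kind of analysis that sits on top of a parser.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical AST for a toy DSL of assignments such as "a = 1; b = a + c".
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, Add]

@dataclass
class Assign:
    target: str
    expr: Expr

def undefined_names(program: List[Assign]) -> List[str]:
    """Static check after parsing: report variables read before assignment."""
    defined: set = set()
    errors: List[str] = []

    def visit(e: Expr) -> None:
        if isinstance(e, Var) and e.name not in defined:
            errors.append(e.name)
        elif isinstance(e, Add):
            visit(e.left)
            visit(e.right)

    for stmt in program:
        visit(stmt.expr)          # check the right-hand side first
        defined.add(stmt.target)  # then the target becomes available
    return errors

program = [Assign("a", Num(1)), Assign("b", Add(Var("a"), Var("c")))]
print(undefined_names(program))  # ['c']
```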
I've experimented with Xtext for a couple of days, and while the tool looks promising, I was eventually put off by its tight integration with the Eclipse ecosystem and the pain one has to go through to get something that should work hassle-free out of the box: a headless run of the code generator you implemented. See here for some of the minutiae involved (it's not even properly documented on the Xtext web site but rather on a blog, meaning it's an ad-hoc patch that could very well break in the next release).
I will take another look in six months to see if there has been any improvement on this front.
Take a look at Markus Völter's book. It offers a very comprehensive comparison of these three technologies.
http://dslbook.org
Xtext is very well maintained, but that doesn't mean it's problem-free. Getting the type system, scoping and generation running isn't as easy as advertised.
Spoofax is scannerless (which simplifies grammar composition). It is not that well documented, but seems complete.
MPS is projectional: a pro for language composition and a con for editing. It supports multiple editors for an AST and will soon even support a nice diagram editor. The base language documentation isn't that good. The type system, scoping and checking are very well handled. Model-to-model transformations are done by the solver. My colleagues using it complain about the model-to-text languages. (In my opinion, M2M wasn't that intuitive either.)
Years ago, Microsoft had the Oslo project. MGrammar and especially Quadrant were very promising: you could represent your model in a table, form, text or diagram view. But then they suddenly cancelled the project (and perhaps shot the people working on it).
Perhaps the best place today to compare different language workbenches is http://www.languageworkbenches.net/; its past-editions page, http://www.languageworkbenches.net/past-editions/, shows how a set of language workbenches implement a similar task: a DSL for a particular domain.
Update 2022: as these links are now broken, see the archived copy of the site referred to above:
https://web.archive.org/web/20160324201529/http://www.languageworkbenches.net/
Newer articles reviewing language workbenches include: 1) State of the art: https://link.springer.com/chapter/10.1007/978-3-319-02654-1_11 and 2) Empirical evaluation: https://hal.archives-ouvertes.fr/file/index/docid/706841/filename/Evaluation_of_Modeling_Tools_Adaptation.pdf

Languages for implementing decision trees

What would be a good choice of programming language in which to implement a decision tree? The results of the implementation will be for personal use only, so no need to consider ability to publish etc.
I have heard that Octave is a good option; can anyone explain why a matrix-based language is recommended for implementing decision trees?
I have used Standard ML both to implement decision trees and to write a compiler from a domain-specific language into decision trees. I've also compiled similar decision trees into C code.
It really depends on what you want to do with decision trees. If you are trying to do something sophisticated or you are trying to make the decision trees especially easy to read and write, I would suggest either creating a domain-specific language or embedding domain-specific operators into Haskell or Standard ML. If you just want to get going, you could start with ML (easier than Haskell for a beginner), which preserves some options for later.
In general, ML and Haskell are both very good at representing and manipulating trees of all kinds.
I can't explain why someone would recommend a matrix-based language for decision trees.
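Whatever language you choose, learning a tree (as opposed to just representing one) mostly comes down to choosing splits. A minimal Python sketch of ID3-style split selection by information gain, on hypothetical toy data; it is one common criterion, not the only one (CART, for example, typically uses Gini impurity).

```python
import math
from collections import Counter
from typing import Dict, List

def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows: List[Dict[str, str]], labels: List[str], attr: str) -> float:
    """Entropy reduction obtained by splitting the rows on attribute `attr`."""
    groups: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows: List[Dict[str, str]], labels: List[str], attrs: List[str]) -> str:
    """Pick the attribute with the highest information gain (the ID3 choice)."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
print(best_attribute(rows, labels, ["outlook", "windy"]))  # outlook
```

A full learner applies this recursively to each subset until the labels are pure or no attributes remain.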
I am pretty sure that the first decision tree was written in LISP.
Many such algorithms are still written in LISP.
You can find a lot of documentation if you decide to choose LISP.
Scheme is also a good language for that purpose, and it is simpler/smaller than LISP.
Also, both languages are quick to learn.
IMHO

Programmatic parsing and understanding of language (English)

I am looking for some resources pertaining to the parsing and understanding of English (or just human language in general). While this is obviously a fairly complicated and wide field of study, I was wondering if anyone had any book or internet recommendations for study of the subject. I am aware of the basics, such as searching for copulas to draw word relationships, but anything you guys recommend I will be sure to thoroughly read.
Thanks.
Check out WordNet.
You probably want a book like "Representation and Inference for Natural Language - A First Course in Computational Semantics"
http://homepages.inf.ed.ac.uk/jbos/comsem/book1.html
Another approach is to look at existing tools, built on research papers, that already do the job: http://nlp.stanford.edu/index.shtml
I've used this tool once, and it's very nice. There's even an online version that parses English and draws dependency trees and so on.
You can start by taking a look at their papers or the code itself.
Keep in mind, though, that in any field, what you get from such generic tools is almost always not exactly what you want, in the sense that the semantics they assign is not what you would expect. In most cases, given a specific, constrained domain, it's preferable to roll your own parser and do your best to avoid ambiguities beforehand.
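A minimal sketch of such a hand-rolled parser for a constrained domain, assuming a hypothetical controlled grammar of just two sentence shapes (a copula "X is a Y" and "X has N Z"). Anything outside the grammar is rejected rather than guessed at, which is how ambiguity is avoided up front.

```python
import re
from typing import Optional, Tuple

# Hypothetical controlled grammar, accepting only two sentence shapes:
#   "<name> is a/an <category>."  -> ("is_a", name, category)
#   "<name> has <n> <thing>."     -> ("has", name, n, thing)
IS_A = re.compile(r"^(\w+) is an? (\w+)\.$", re.IGNORECASE)
HAS = re.compile(r"^(\w+) has (\d+) (\w+)\.$", re.IGNORECASE)

def parse(sentence: str) -> Optional[Tuple]:
    """Return a relation tuple, or None if the sentence is outside the grammar."""
    s = sentence.strip()
    m = IS_A.match(s)
    if m:
        return ("is_a", m.group(1), m.group(2))
    m = HAS.match(s)
    if m:
        return ("has", m.group(1), int(m.group(2)), m.group(3))
    return None

print(parse("Rex is a dog."))                    # ('is_a', 'Rex', 'dog')
print(parse("Rex has 4 legs."))                  # ('has', 'Rex', 4, 'legs')
print(parse("The dog that Rex chased barked."))  # None
```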
The process that you describe is called natural language understanding. There are various algorithms and software tools that have been developed for this purpose.

Is Antlr a DSL generator and an alternative to Intentional Programming?

I am struck by the ambition and creativity of Charles Simonyi's efforts to establish the field of Intentional Programming, first at Microsoft and then with his own company.
What exactly is Intentional Programming?
http://en.wikipedia.org/wiki/Intentional_programming
In this approach to software, a programmer first builds a toolbox specific to a given problem domain (such as life insurance). Domain experts, aided by the programmer, then describe the program's intended behavior in a What You See Is What You Get (WYSIWYG)-like manner. An automated system uses the program description and the toolbox to generate the final program. Successive changes are only done at the WYSIWYG level.
It seems to be such a useful and practical approach to programming, potentially circumventing many of the problems with current approaches to software development.
Essentially it seems to facilitate the creation of domain-specific languages by non-programmers (business/systems analysts) but at a stage much closer to real-life implementation than UML could provide. He says it will be completed eventually but that it is not there yet (almost 15 years later).
DSLs run the gamut from simple 5-line rule engines to complex applications like Ruby on Rails. So I imagine the delay in releasing his product has to do with the fact that he is simplifying a much higher level of abstraction, because he essentially has to allow for the encapsulation of all domain languages at once.
So, my question is
(a) whether Antlr could be an alternative to Intentional Programming - although perhaps a less user-friendly alternative which requires the intervention of programmers rather than permitting business analysts to generate the DSL? Could you use Antlr to generate a DSL like Ruby on Rails (assuming it supported Ruby as an output - which I think it does not)? What can it not do? Also, I don't understand why it's called a "language parser" rather than a "language generator" - since the latter describes what it is used for while the former describes how it achieves its end result.
and
(b) if Antlr is different from Intentional Programming, is there anything similar to Intentional Programming?
In answer to part b), three systems that work in a similar space are:
JetBrains MPS
Eclipse Xtext
MetaCase MetaEdit+
Each of these products has different strengths and weaknesses, but all of them fall into the category of Language Workbenches. Intentional Software's Intentional Workbench is possibly the most ambitious product in this category to date, but is also not generally available.
MPS and Xtext are free, open-source products. MetaCase is the most mature, and is a commercial product. All of them have a steep learning curve.
I am not an expert on this, so treat with a large pinch of salt. However...
ANTLR itself is not a DSL generator, though it can be used to create code that interprets DSLs. It is a parser generator: a DSL generator would have to create the grammar from which ANTLR then generates a parser.
ANTLR is just a parser generator. In any non-trivial DSL, writing the parser is less than 50% of the effort expended in implementing the DSL. The evaluator/rule engine/code generator/schedule or whatever else your DSL does, probably requires more work and can't be generated like a parser.
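To illustrate that split, here is a minimal Python sketch of the part a parser generator does not give you: a forward-chaining evaluator for a hypothetical rule DSL of the form `IF a AND b THEN c`. The rules are written directly as the tuples a parser might produce; no ANTLR API is used.

```python
from typing import List, Set, Tuple

# Hypothetical parser output for rules written as "IF a AND b THEN c".
# The parser only produces this structure; running it is the evaluator's job.
Rule = Tuple[Tuple[str, ...], str]  # (conditions, conclusion)

def forward_chain(rules: List[Rule], facts: Set[str]) -> Set[str]:
    """Apply rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                changed = True
    return known

rules = [
    (("fever", "cough"), "suspect_flu"),
    (("suspect_flu",), "order_test"),
]
print(sorted(forward_chain(rules, {"fever", "cough"})))
# ['cough', 'fever', 'order_test', 'suspect_flu']
```

Even this toy evaluator has design decisions (rule ordering, termination, conflict handling) that no grammar can encode, which is the point made above.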
