Building a decision tree system to help spec out a product - expert-system

I'm looking into a way to build a system that will help our sales team spec a product.
The ideal system would ask some base questions which are common amongst all products. Further questions would then branch from these answers. From what I can figure, this is essentially a big decision tree/expert system I've been asked to build.
I wondered if anyone has had experience in building such a system and can provide some reference or can recommend a framework/software to support such a system?

You could use the answers to questions as values of attributes and then build a decision tree based on these attributes to classify the products. Is this the sort of situation you're in?
If so, what are you looking for? Algorithms? Take a look at things like ID3; it is a decision tree algorithm that could work for a situation like yours. I'm not familiar with any specific frameworks/software for decision trees. Let me know if there are any more details you can give about the sort of help you need.
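To make that concrete, here is a minimal ID3-style sketch in Python. The questions, answers and product names are invented for illustration; a real system would build the table from your sales history. The small next_step helper shows how the same tree can drive the question flow described above: ask the question at the current node, follow the branch that matches the answer, and recommend a product at a leaf.

    # Minimal ID3-style decision tree over question/answer attributes.
    # The rows below are hypothetical; substitute your own product data.
    import math
    from collections import Counter

    ROWS = [
        {"environment": "outdoor", "budget": "high", "volume": "low",  "product": "ProSeries"},
        {"environment": "outdoor", "budget": "low",  "volume": "low",  "product": "Basic"},
        {"environment": "indoor",  "budget": "high", "volume": "high", "product": "Enterprise"},
        {"environment": "indoor",  "budget": "low",  "volume": "high", "product": "Standard"},
        {"environment": "indoor",  "budget": "low",  "volume": "low",  "product": "Basic"},
    ]

    def entropy(rows):
        counts = Counter(r["product"] for r in rows)
        return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

    def information_gain(rows, question):
        # Entropy before the split minus the weighted entropy of each answer group.
        groups = {}
        for r in rows:
            groups.setdefault(r[question], []).append(r)
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy(rows) - remainder

    def build_tree(rows, questions):
        if len({r["product"] for r in rows}) == 1 or not questions:
            return Counter(r["product"] for r in rows).most_common(1)[0][0]   # leaf
        best = max(questions, key=lambda q: information_gain(rows, q))
        branches = {}
        for answer in {r[best] for r in rows}:
            subset = [r for r in rows if r[best] == answer]
            branches[answer] = build_tree(subset, [q for q in questions if q != best])
        return {"question": best, "branches": branches}

    def next_step(tree, answers):
        # Given the answers collected so far, return the next question to ask,
        # or the product to recommend once a leaf is reached.
        while isinstance(tree, dict):
            q = tree["question"]
            if q not in answers:
                return ("ask", q)
            tree = tree["branches"][answers[q]]
        return ("recommend", tree)

    tree = build_tree(ROWS, ["environment", "budget", "volume"])
    print(next_step(tree, {}))                                          # first question to ask
    print(next_step(tree, {"environment": "outdoor", "budget": "high"}))  # recommendation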

Related

Meta construction capabilities?

I am currently considering Orange as the base for a meta-learning assistant prototype I intend to develop, but before committing myself to a thorough exploration of the documentation and learning about Python development (which would both be quite time consuming), I would appreciate some insight regarding the feasibility of such a prototype within the Orange framework.
The main aim of the prototype I intend to develop is to allow efficient use of data mining and machine learning algorithms by non-experts. Concretely, as a first step I wish to be able to give the user a workflow answering his modelling need, elicited from his dataset and his expression of that need. In order to perform this elicitation, I intend to run a process that involves designing and executing learning workflows on his data.
Is it possible from within the Orange framework (or else from a "supervising" framework above it) to automatically define and execute learning workflows?
Yes, it is.
We have actually experimented with a "recommendation system" that would suggest parts of the workflow to the user. It wasn't useful. Also, there have been various meta-learning projects in the past and I think that the general consensus is --- it doesn't work. ;)
But if you intend to try it, Orange is a suitable platform for this.
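For what it's worth, Orange workflows don't have to be built in the canvas; they can be defined and executed from a plain Python script, which is the natural entry point for a supervising meta-learning layer. A rough sketch, assuming Orange 3's scripting API (check the class names against the version you install; training accuracy is used here only to keep the example short, normally you would cross-validate):

    # Sketch of building and running learning "workflows" from a script with
    # Orange 3's Python API.
    import Orange

    data = Orange.data.Table("iris")                     # any tabular dataset

    # Candidate pipelines a meta-learning assistant might generate automatically.
    learners = [
        Orange.classification.TreeLearner(),
        Orange.classification.NaiveBayesLearner(),
    ]

    # Fit each learner and compare training accuracy. This is only to keep the
    # sketch short; a real assistant would use cross-validation from
    # Orange.evaluation instead of scoring on the training data.
    def training_accuracy(learner):
        model = learner(data)
        return (model(data) == data.Y).mean()

    best = max(learners, key=training_accuracy)
    print("suggested learner:", type(best).__name__)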
#hoijui: Orange no longer has any other mailing list or forum, just this one. Developers follow Stack Overflow and answer questions there.

Understanding decision tree model in SAS Enterprise Miner

I'm looking to find out a little more about the automated generation of decision trees using SAS Enterprise Miner.
Specifically, I want to know what types of decisions were made and, basically, the meaning of the example decision tree and each of its components.
If anyone has some helpful info on this, it would be great to hear it.

Simple toolkits for emotion (sentiment) analysis (not using machine learning)

I am looking for a tool that can analyze the emotion of short texts. I searched for a week and I couldn't find a good one that is publicly available. The ideal tool is one that takes a short text as input and guesses the emotion. It is preferably a standalone application or library.
I don't need tools that are trained on texts. And although similar questions have been asked before, no satisfactory answers were given.
I searched the Internet and read some papers but I couldn't find the tool I want. Currently I have found SentiStrength, but its accuracy is not good. I am using emotional dictionaries right now. I feel that some syntax parsing may be necessary, but it's too complex for me to build one. Furthermore, it has already been researched by others and I don't want to reinvent the wheel. Does anyone know of such publicly/research-available software? I need a tool that doesn't need training before use.
Thanks in advance.
I think that you will not find a more accurate program than SentiStrength (or SoCal) for this task - other than machine learning methods in a specific narrow domain. If you have a lot (>1000) of hand-coded data for a specific domain then you might like to try a generic machine learning approach based on your data. If not, then I would stop looking for anything better ;)
Identifying entities and extracting precise information from short texts, let alone sentiment, is a very challenging problem, especially with short texts, because of the lack of context. However, there are a few unsupervised approaches to extracting sentiment from texts, mainly proposed by Turney (2000). Look at that and maybe you can adapt the method of extracting sentiment based on adjectives in the short text for your use case. It is, however, important to note that this might require you to POS-tag your short text efficiently.
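Since the question mentions already working with emotional dictionaries, here is a minimal, training-free sketch of that approach in Python. The lexicon and negation list are tiny and invented for illustration; in practice you would plug in a real resource (e.g. the SentiStrength or ANEW word lists) and, as suggested above, restrict matches to POS-tagged adjectives.

    # Training-free, dictionary-based emotion scoring with naive negation handling.
    import re

    LEXICON = {"love": 3, "great": 3, "good": 2, "nice": 1,
               "bad": -2, "sad": -2, "awful": -3, "hate": -3}
    NEGATIONS = {"not", "never", "no", "n't", "don't", "isn't"}

    def emotion_score(text):
        score, negate = 0, False
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in NEGATIONS:
                negate = True                       # flip the next sentiment word
            elif token in LEXICON:
                score += -LEXICON[token] if negate else LEXICON[token]
                negate = False
        return score                                # >0 positive, <0 negative

    print(emotion_score("What a great, nice day"))            #  4
    print(emotion_score("I do not love this awful weather"))  # -6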
Maybe EmoLib could be of help.

Effective strategies for studying frameworks/libraries partially [closed]

I remember the old, effective approach to studying a new framework: the best way was always to read a good book on the subject, say MFC. When I tried to skip a lot of material to speed up coding, it turned out later that it would have been quicker to read the whole book first. There were no good ways to study a framework in small parts, or at least I did not see them then.
In recent years a lot of new things have happened: improved search results from Google, programming blogs, many more people involved in Internet discussions, a lot of open-source frameworks.
Right now when we write software we very often depend on third-party (usually open-source) frameworks/libraries. And a lot of the time we need to know only a small part of their functionality to use them. It's about finding the simplest way of using a small subset of the library without unnecessary pessimizations.
What do you do to study as little as possible of the framework and still use it effectively?
For example, suppose you need to index a set of documents with Lucene. And you need to highlight search snippets. You don't care about stemmers, storing the index in one file vs. multiple files, fuzzy queries and a lot of other stuff that is going to occupy your brain if you study Lucene in depth.
So what are your strategies, approaches, tricks to save your time?
I will enumerate what I would do, though I feel that my process can be improved.
Search "lucene tutorial", "lucene highlight example" and so on. Try to estimate trust score of unofficial articles ( blog posts ) based on publishing date, the number and the tone of the comments. If there is no a definite answer - collect new search keywords and links on the target.
Search for really quick tutorials/ newbie guides on official site
Estimate how valuable are javadocs for a newbie. (Read Lucene highlight package summary)
Search for simple examples that come with a library, related to what you need. ( Study "src/demo/org/apache/lucene/demo")
Ask about "simple Lucene search highlighting example" in Lucene mail list. You can get no answer or even get a bad reputation if you ask a silly question. And often you don't know whether you question is silly because you have not studied the framework in depth.
Ask it on Stackoverflow or other QA service "could you give me a working example of search keywords highlighting in Lucene". However this question is very specific and can gain no answers or a bad score.
Estimate how easy to get the answer from the framework code if it's open sourced.
What are your study/ search routes? Write them in priority order if possible.
I use a three-phase technique for evaluating APIs.
1) Discovery - In this phase I search StackOverflow, CodeProject, Google and newsgroups with as many different combinations of search phrases as possible and add everything that might fit my needs to a huge list.
2) Filter/Sort - For each item I found in my gathering phase I try to find out if it suits my needs. To do this I jump right into the API documentation and make sure it has all of the features I need. The results of this go into a weighted list with the best solutions at the top and all of the cruft filtered out.
3) Prototype - I take the top few contenders and try to do a small implementation hitting all of the important features. Whatever fits the project best here wins. If for some reason an issue comes up with the best choice during implementation, it's possible to fall back on other implementations.
Of course, a huge number of factors go into choosing the best API for the project. Some important ones:
How much will this increase the size of my distribution?
How well does the API fit with the style of my existing code?
Does it have high quality/any documentation?
Is it used by a lot of people?
How active is the community?
How active is the development team?
How responsive is the development team to bug patch requests?
Will the development team accept my patches?
Can I extend it to fit my needs?
How expensive will it be to implement overall?
... And of course many more. It's all very project dependent.
As to saving time, I would say trying to save too much here will just come back to bite you later. The time put into selecting a good library is at least as important as the time spent implementing it. Also, think down the road, in six months would you rather be happily coding or would you rather be arguing with a xenophobic dev team :). Spending a couple of extra days now doing a thorough evaluation of your choices can save a lot of pain later.
The answer to your question depends on where you fall on the continuum of generality/specificity. Do you want to solve an immediate problem? Are you looking to develop a deep understanding of the library? Chances are you’re somewhere between those extremes. Jeff Atwood has a post about how programmers move between these levels, based on their need.
When first getting started, read something on the high-level design of the framework or library (or language, or whatever technology it is), preferably by one of the designers. Try to determine what problems they are trying to address, what the organizing principles behind the design are, and what the central features are. This will form the conceptual framework from which future understanding will hang.
Now jump into it. Create something. Do not copy and paste somebody's code. Instead, when things don't work, read the error messages in detail, and the help on those error messages, and figure out why the error occurred. It can be frustrating when things don't work, but it forces you to think, and that's when you learn.
1) Search Google for my task
2) Look at examples with a few different libraries; no need to tie myself down to Lucene, for example, if I don't know what other options I have.
3) Look at the date of the last update on the main page; if it hasn't been updated in 6 months, leave (with some exceptions)
4) Search for sample task with library (don't read tutorials yet)
5) Can I understand what's going on without a tutorial? If yes continue if no start back at 1
6) Try to implement the task
7) Watch myself fail
8) Read a tutorial
9) Try to implement the task
10) Watch myself fail and ask on StackOverflow, or mail the authors, post on user group (if friendly looking)
11) If I could get the task done, I'll consider the framework worthy of study and read up on the main tutorial for 2 hours (if it doesn't fit in 2 hours I just ignore what's left until I need it)
I have no recipe, in the sense of a set of steps I always follow; that's largely because everything I learn is different. Some things are radically new to me (Dojo, for example: I have no fluency in JavaScript, so that's a big task), some are just enhancements of previous knowledge (I know EJB 2 well, so learning EJB 3, while on the surface new with all its annotations, is building on concepts I already have).
My general strategy, though, is what I'd describe as "Spiral and Park". I try to circle the landscape first, understand the general shape, and Park concepts that I don't get just yet, not letting them worry me. Then I go a little deeper into some areas, but again try not to get obsessed with one, spiralling down into the subject. Hopefully I start to unpark and understand, but I also need to park more things.
Initially I want answers to questions such as:
What's it for?
Why would I use this rather than that other thing I already know?
What's possible? Any interesting sweet spots? (E.g. "ooh, look at that nice AJAX-driven update")
I do a great deal of skim reading.
Then I want to do more exploring on the hows. I start to look for gotchas and good advice. (E.g. in Java: why is "wibble".equals(var) a useful construct?)
Specific techniques and information sources:
Most important: doing! As early as possible I want to work a tutorial or two. I probably have to get the first circuit of the spiral done, but then I want to touch and experiment.
Overview documents
Product documents
Forums and discussion groups, learning by answering questions is my favourite technique.
If at all possible I try to find gurus. I'm fortunate in having, among my immediate colleagues, a wealth of knowledge and experience.
Quick-start guides.
A quick look at the API documentation if available.
Reading sample codes.
Messing around. YOU HAVE TO MESS AROUND (sorry for the caps).
If it's a small library/API with a small or no community you can always contact the developer himself and ask for help 'cause he'll probably be more than happy to help you; he's happy that one more person is using his API.
Mailing lists are a great resource as long as you do your homework first before asking questions.
Mailing list archives are invaluable for most of the questions I've had on CoreAudio related stuff.
I would never read Javadoc, as there often is none. And when there is, most likely it isn't up to date, so one gets confused at best.
Start with the simplest possible tutorial you find within some minutes.
Often the tutorial will lead you to further sources at the end, so then most of the time one is on a path that goes on and on, deeper and deeper.
It really depends on what the topic is and how much info there is on it. Learning by example is a good way to start a topic brand new to you, especially if you're knowledgeable in other similar libraries or languages. You can take a topic you're familiar with and say "I understand how to implement this using X, let's see how it's done using Y".
So what are your strategies, approaches, tricks to save your time?
Well, I search. I generally never ask questions, preferring to research myself. If worst comes to worst I'll read the documentation. In some cases (say, when I was doing some work with SharpSVN) I had to look at the source, specifically the test cases, to get some information about how the API worked.
Generally, I have to be honest, most of my 'study' and 'learning' is by accident.
For example, just a few seconds ago, I discovered how to get the "Recent" folder in C#. I had no idea how to do that before seeing the question, considering it interesting, and then searching.
So for me the real 'trick' is that I hang around on forums, answer questions, and accidentally pick up knowledge. Then when it comes time for me to research something, chances are I know a bit about it, searching is easier, and I can focus on the implementation [typically implementing a test program first] and progressing from there.

About "AUTOMATIC TEXT SUMMARIZER (lingustic based)" [closed]

I am having "AUTOMATIC TEXT SUMMARIZER (linguistic approach)" as my final year project. I have collected enough research papers and gone through them. Still I am not very clear about the 'how-to-go-for-it' thing. Basically I found "AUTOMATIC TEXT SUMMARIZER (statistical based)" and found that it is much easier compared to my project. My project guide told me not to opt this (statistical based) and to go for linguistic based.
Anyone who has ever worked upon or even heard of this sort of project would be knowing that summarizing any document means nothing but SCORING each sentence (by some approach involving some specific algos) and then selecting sentences having score more than threshold score. Now the most difficult part of this project is choosing the appropriate algorithm for scoring and later implementing it.
I have moderate programming skills and would like to code in JAVA (because there I'll get lots of APIs resulting in lesser overheads). Now I want to know that for my project, what should be my approach and algos used. Also how to implement them.
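To illustrate the score-then-threshold skeleton described above, here is a short sketch (in Python for brevity; the structure ports directly to Java). The scorer is a trivial word-frequency one, used only to show the shape of the program; a linguistic approach would replace it with features such as lexical chains or coherence relations.

    # Score every sentence, then keep those scoring above a threshold.
    # The frequency-based scorer is a placeholder for a linguistic one.
    import re
    from collections import Counter

    def summarize(text, threshold_ratio=1.0):
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        freq = Counter(re.findall(r"\w+", text.lower()))       # document word counts

        def score(sentence):
            tokens = re.findall(r"\w+", sentence.lower())
            return sum(freq[t] for t in tokens) / (len(tokens) or 1)

        scores = {s: score(s) for s in sentences}
        threshold = threshold_ratio * sum(scores.values()) / len(scores)
        # keep above-threshold sentences in their original order
        return " ".join(s for s in sentences if scores[s] > threshold)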
Using Lexical Chains for Text Summarization (Microsoft Research)
An analysis of different algorithms: DasMartins.2007
Most important part of the document:
• Nenkova (2005) finds that no system could beat the baseline with statistical significance
• Striking result!
Note there are 2 different nuances to the linguistic approach:
Linguistic rating system (all clear here)
Linguistic generation (rewrites sentences to build the summary)
Automatic summarization is a pretty complex area. Try to get your Java skills in order first, as well as your understanding of statistical NLP, which uses machine learning. You can then work towards building something of substance. Evaluate your solution and make sure you have concretely defined your measurement variables and how you went about your evaluation; otherwise, your project is doomed to failure. This is generally considered a high-risk project for final-year undergraduate students, as they often fail to get the principles right, then implement them in a way that is not right either, and then their evaluation measures are ill defined and don't reflect clearly on their own work. My advice would be to focus on one area of summarization rather than many, as you can have single- and multi-document summaries. The more varied you make your project, the less likely you are to receive a good mark. Keep it focused and in depth. Evaluate other people's work, then the process you decided to take and the outcomes of it.
Readings:
- The Jurafsky book on NLP has a section towards the back on summarization and QA.
- Advances in Text Summarization by Inderjeet Mani is really good.
Understand things like term weighting, centroid-based summarization, log-likelihood ratio, coherence relations, sentence simplification, maximum marginal relevance, redundancy, and what a focused summary actually is (a small MMR sketch follows at the end of this answer).
You can attempt it using a supervised or an unsupervised approach as well as a hybrid.
The linguistic approach is a safer option; that is why you have been advised to take it.
Try attempting it linguistically, then build statistical methods on top to hybridize your solution.
Use it as an exercise to learn the theory and practical implications of the algorithms, as well as to build on your knowledge, as you will no doubt have to explain and defend your project to the judging panel.
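As a pointer on one of the concepts listed above, here is a rough sketch of maximum marginal relevance (MMR) selection: repeatedly pick the sentence most relevant to the query or document while penalising redundancy with sentences already chosen. The word-overlap similarity is a crude stand-in; any sentence similarity measure can be substituted.

    # Maximum marginal relevance: balance relevance to the query against
    # redundancy with already-selected sentences.
    def similarity(a, b):
        a, b = set(a.lower().split()), set(b.lower().split())
        return len(a & b) / (len(a | b) or 1)

    def mmr_select(sentences, query, k=3, lam=0.7):
        selected, candidates = [], list(sentences)
        while candidates and len(selected) < k:
            def mmr(s):
                redundancy = max((similarity(s, t) for t in selected), default=0.0)
                return lam * similarity(s, query) - (1 - lam) * redundancy
            best = max(candidates, key=mmr)
            selected.append(best)
            candidates.remove(best)
        return selected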
If you really have read those research papers and books, you probably know what is known. Now it is up to you to implement the knowledge from those papers and books in a Java application. Or you could expand human knowledge by doing some innovation/invention; if you do expand human knowledge, you have become a true scientist.
Please make your question more specific, in these two main areas:
Project definition: What is the goal of your project?
Is the input unit a single document? A list of documents?
Do you intend your program to use machine learning?
What is the output?
How will you measure success?
Your background knowledge: You intend to use linguistic rather than statistical methods.
Do you have background in parsing natural language? In semantic representation?
I think some of these questions are tough. I am asking them because I spent too much time trying to answer similar questions in the course of my studies. Once you get these sorted out, I may be able to give you some pointers. Mani's "Automatic Summarization" looks like a good start, at least the introductory chapters.
The University of Sheffield did some work on automatic email summarising as part of the EU FASiL project a few years back.
