As I am not sure about predicted columns which are used in Spotfire 6.0.Could you please help me on this.
Than k you in Advance.
The free jumpstart video available from TIBCO covers and demonstrate on Lesson 3 how these computational tools work by the machines that make our civilization run, from the people whose ingenuity created them.
Related
I am currently working on a project where a multi criteria decision making algorithm is needed in order to evaluate several alternatives for a given goal. After long research, I decided to use the AHP method for my case study. The problem is that the alternatives taken into account for the given goal contain incomplete data.
For example, I am interested in buying a house and I have three alternatives to consider. One criterion for comparing them is the size of the house. Let’s assume that I know the sizes of some of the rooms of these houses, but I do not have information about the actual sizes of the entire houses.
My questions are:
Can we apply AHP (or any MCDM method) when we are dealing with
incomplete data?
What are the consequences?
And, how can we minimize the presence of missing data in MCDM?
I would really appreciate some advice or help! Thanks!
If you still looking for answers, please let me answer your questions.
Before the detail explain, I coludn't answer with a technical approach in programming language.
First, we can use uncertinal data for MCDM, AHP with statical method.
As reducing lost of data, you can use deep learning concepts like entropy.
The result of it will be get reliability by accuracy of probabilistic approach.
The example that you talked, you could find the data of entire extent in other houses has same extent of criteria. Accuracy will depend on number of criteria, reliability of inference.
To get the perfect answer in your problem, you might need to know optimization, linear algebra, calculus, statistics above intermediate level
I'm student in management major, and I would help you as I can. I hope you get what you want
i need some help in implementing Learning To Rank (LTR). It is related to my semester project and I'm totally new to this. The details are as follows:
I gathered around 90 documents and populated 10 user queries. Now i have to rank these documents based on each query using three algorithms specifically LambdaMart, AdaRank, and Coordinate Ascent. Previously i applied clustering techniques on Vector Space Model but that was easy. However in this case, I don't know how to change the data according to these algorithms. As i have this textual data( document and queries) in txt format in separate files. I have searched for solutions online and I'm unable to find a proper solution so can anyone here please guide me in the right direction i.e. Steps. I would really appreciate.
As you said you have applied the clustering in vector space model. the input of these algorithms are also vectors.
Why don't you have a look at the standard data set introduced for learning to rank issue (Letor benchmark) in which documents are shown in vectors of features?
There is also implementation of these algorithm provided in java (RankLib), which may give you the idea to solve the problem. I hope, this help you!
I am seeking a method to allow me to analyse/search for patterns in asset price movements using 5 variables that move and change with price (from historical data).
I'd like to be able to assign a probability to a forecasted price move when for example, var1 and var2 do this and var3..5 do this, then price should do this with x amount of certainty.
Q1: Could someone point me in the right direction as to what framework / technique can help me achieve this?
Q2: Would this be a multivariate continuous random series analysis?
Q3: A Hidden Markov modelling?
Q4: Or perhaps is it a data-mining problem?
I'm looking for what rather then how.
One may opt to use Machine-Learning tools to build a learner to either
both classify of what kind the said "asset price movement" will beand serve also statistical probability measures for such a Classifier prediction
both regress a real target value, to which the asset price will moveandserve also statistical probability measures for such a Regressor prediction
A1: ( while StackOverflow strongly discourages users to ask about an opinion about a tool or a particular framework ) there would be not much damages or extra time to be spent, if one performs academia papers research and there would be quite a remarkable list of repeatedly used tools, used for ML in the context of academic R&D. For a reason, there would not be a surprise to meet scikit-learn ML-classes a lot, some other papers may work with R-based quantitative finance / statistical libraries. The tools, however, with all due respect, are not the core to answer all the doubts and inital confusion present in a mix of your questions. The subject confusion is.
A2: No, it would not. Well, unless you beat all the advanced quantitative research and happen to prove that the Market exhibits a random behaviour ( which it is not and for which it would be waste of time to re-cite remarkable research published about why it is not indeed a random process ).
A3: Do not try to jump on any wagon just because of it's attractive Tag or "contemporary popularity" in marketing minded texts. With all due respect, understanding HMM is outside of your sight while you now appear to move just to the nearest horizons to first understand what to look for.
A4: This is a nice proof of a missed target. Your question shows in this particular point better than in others, how small amount of own research efforts were put into covering the problem-domain and acquiring at least some elementary knowledge before typing the last two questions.
StackOverflow encourages users to ask high quality questions, so do not hesitate to re-edit your post to add some polishing efforts to this subject.
If in a need for an inspiration, try to review a nice and a powerful approach for a fast Machine Learning process, where both Classification and Regression tasks obtain also probability estimates for each predicted target value.
To have some idea about highly performant ML-predictors, these typically operate on much more than a set of 5 variables ( called in the ML-domain "features" ) . ( Think rather about some large hundreds to small thousands features, typically heavily non-linear transformations from the original TimeSeries' data ).
There you go, if indeed willing to master ML for algorithmic trading.
May like to read about a state-of-art research in this direction:
[1] Mondrian Forests: Efficient Online Random Forests
>>> arXiv:1406.2673v2 [stat.ML] 16 Feb 2015
[2] Mondrian Forests for Large-Scale Regression when Uncertainty Matters
>>> arXiv:1506.03805v4 [stat.ML] 27 May 2016 >>>
May also enjoy other posts on subject: >>> StackOverflow Algorithmic-Trading >>>
I am new to azure machine learning. We are trying to implement questions similarity algorithm using azure machine learning. We have large set of questions and answers. Our objective is to identify whether newly added questions are duplicates or not? Just like Stackoverflow suggests existing questions when we ask new questions?Can we use azure machine learning services to solve this? Can someone guide us in the right direction?
Yes you can use Azure Machine Learning studio and could use the method Jennifer proposed.
However, I would assume it is much better to run a R script against a database containing all current questions in your experiment and return a similarity metric for each comparison.
Have a look at the following paper for some examples (from simple/basic to more advanced) how you could do this:
https://www.researchgate.net/publication/4314910_Question_Similarity_Calculation_for_FAQ_Answering
A simple way to start would just be to implement a simple "bags of words" comparison. This will yield a distance matrix that you could use for clustering or use to give back similar questions. The following R code would so such a thing, in essence you build a large string with as first sentence the new question and then follow it with all known questions. This method will, obviously, not really take into consideration the meaning of the questions and would just trigger on equal word usage.
library(tm)
library(Matrix)
x <- TermDocumentMatrix( Corpus( VectorSource( strings.with.all.questions ) ) )
y <- sparseMatrix( i=x$i, j=x$j, x=x$v, dimnames = dimnames(x) )
plot( hclust(dist(t(y))) )
Yes, you can definitely do this with Azure Machine Learning. It sounds like you have a clustering problem (you are trying to group together similar questions).
There is a "Clustering: Find similar companies" sample that does a similar thing at https://gallery.cortanaanalytics.com/Experiment/60cf8e46935c4fafbf86f669121a24f0. You can read the description on that page and click the "Open in Studio" button in the right-hand sidebar to actually open the workspace in Azure Machine Learning Studio. In that sample, they are finding similar companies based on the text from the company's Wikipedia article (for example: Microsoft and Apple are similar companies because the word "computer" appears a lot in both articles). Your problem is very similar except you would use the text in your questions to find similar questions and cluster them into groups accordingly.
In k-means clustering, "k" is the number of clusters that you want to form, so this number will probably be pretty big for your specific problem. If you have 500 questions, perhaps start with 250 centroids? But mess around with this number and see what works. For performance reasons, you might want to start with a small dataset for testing and then run all of your data through the model after it seems to be grouping well.
Also, the documentation for K-means clustering is here.
I need to conduct analysis on one factor - which is number of days per project. I have around 30000 of projects with the number of days for each.
The projects are grouped in: categories(there are 10 categories), scales(A/B/C), regions(EU or Asia), months (12-in 1 year) and also one 0-1 factor.
I need to run analysis on this whole database, to find out which factors are important for the number of days and how they are influencing it.
I think that linear regression is one of the way to do it but I don't know how to use it (I'm going to work in excel).
I'm not sure if MANOVA is the right method and also how to conduct the analysis using it.
Are the methods correct and is there some guide how to run them in excel? Are there any more useful methods to do it?
Since it is 1-factor I think it would be ANOVA rather than MANOVA. Excel's Analysis ToolPak has a nice ANOVA utility. Here is a nice tutorial on using it.
Over the years there has been repeated criticisms of the numerical stability of Excel's statistical computation. The worst of the problems seem to have been largely fixed by Excel 2010, but many statisticians still regard Excel with suspicion. With 30,000 data points split between so many categories, I really don't think Excel would be a good tool. It might be good for a preliminary analysis, but with data on that scale you should consider using R.