I have made a bi lstm and I would like to be able to predict new values with it, i.e. I would like to have the corresponding labels in output. However, I notice that I always get the same value. Do you know why? Did I make a mistake in my settings? You can see in the first picture my bi-lstm and in the second picture the predictions. Tanhk you!!!
I want to know what is the best approach to handle a regression analysis on all text data type. I have the following data set.
my feature columns are: Strength, area of development, leadership, satisfactory
values of these columns are predefined set of texts eg. "Continuous Improvement,Self-Development,Coaching and Mentoring,Creativity,Adaptability"
based on the value in these columns I want to predict the label (overall Performance) - Outstanding or Exceeding Expectation or Meeting Expectation.
what should be the best approach to deal with this dataset ?
I used fitrgp from gaussian process matlab toolbox and calculated the predicted values for a given observation. I calculated in three different cases and got three predicted values arrays say ypred1,ypred2 and ypred3. Now I want to test the goodness of fit for these outputs in order to judge which algorithm values gives more accurate result. The details of fitrgp is given below link,
https://uk.mathworks.com/help/stats/gaussian-process-regression-models.html
It would be grateful if anyone help me in this regard. Thank you in advace
I am using Weka IBk for text classificaiton. Each document basically is a short sentence. The training dataset contains 15,000 documents. While testing, I can see that k=1 gives the best accuracy? How can this be explained?
If you are querying your learner with the same dataset you have trained on with k=1, the output values should be perfect barring you have data with the same parameters that have different outcome values. Do some reading on overfitting as it applies to KNN learners.
In the case where you are querying with the same dataset as you trained with, the query will come in for each learner with some given parameter values. Because that point exists in the learner from the dataset you trained with, the learner will match that training point as closest to the parameter values and therefore output whatever Y value existed for that training point, which in this case is the same as the point you queried with.
The possibilities are:
The data training with data tests are the same data
Data tests have high similarity with the training data
The boundaries between classes are very clear
The optimal value for K is depends on the data. In general, the value of k may reduce the effect of noise on the classification, but makes the boundaries between each classification becomes more blurred.
If your result variable contains values of 0 or 1 - then make sure you are using as.factor, otherwise it might be interpreting the data as continuous.
Accuracy is generally calculated for the points that are not in training dataset that is unseen data points because if you calculate the accuracy for unseen values (values not in training dataset), you can claim that my model's accuracy is the accuracy that is been calculated for the unseen values.
If you calculate accuracy for training dataset, KNN with k=1, you get 100% as the values are already seen by the model and a rough decision boundary is formed for k=1. When you calculate the accuracy for the unseen data it performs really bad that is the training error would be very low but the actual error would be very high. So it would be better if you choose an optimal k. To choose an optimal k you should be plotting a graph between error and k value for the unseen data that is the test data, now you should choose the value of the where the error is lowest.
To answer your question now,
1) you might have taken the entire dataset as train data set and would have chosen a subpart of the dataset as the test dataset.
(or)
2) you might have taken accuracy for the training dataset.
If these two are not the cases than please check the accuracy values for higher k, you will get even better accuracy for k>1 for the unseen data or the test data.
So I have set up a linear model in excel of passenger numbers and population. There are two decay parameters which change the forecasts of passenger numbers - for the different types of transport.
Manually I can change the decay factors 0.1-1.0 for every combination, to see how the fit of the model changes. I would like to find the combination of parameters that creates the best model fit at a 0.01 accuracy. Any ideas how?
Essentially the passenger forecasts change when setting parameters which in turn changes the model fit. I need an easy way to see how model fit changes with changing the parameters! Thanks.