So I have set up a linear model in Excel of passenger numbers against population. There are two decay parameters, one for each type of transport, which change the passenger-number forecasts.
I can manually set each decay factor anywhere from 0.1 to 1.0 and try every combination to see how the model fit changes. I would like to find the combination of parameters that gives the best model fit, to a precision of 0.01. Any ideas how?
Essentially, changing the parameters changes the passenger forecasts, which in turn changes the model fit. I need an easy way to see how the model fit changes as the parameters change! Thanks.
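One way to picture the search being asked for is a brute-force grid over both decay factors. The sketch below is only an illustration in Python, not the spreadsheet itself: `toy_fit_error` is a hypothetical stand-in for "set the two parameters, recompute the forecasts, return the fit error", and the 0.37 / 0.82 optimum inside it is made up so the result can be checked.

```python
from itertools import product


def grid_search(fit_error, lo=0.10, hi=1.00, step=0.01):
    """Evaluate fit_error(d1, d2) on every grid point; return (error, d1, d2)."""
    n_steps = round((hi - lo) / step)
    # Rounding keeps the grid values at clean two-decimal numbers.
    grid = [round(lo + i * step, 2) for i in range(n_steps + 1)]
    return min((fit_error(d1, d2), d1, d2) for d1, d2 in product(grid, grid))


def toy_fit_error(d1, d2):
    """Hypothetical error surface standing in for the real model fit."""
    return (d1 - 0.37) ** 2 + (d2 - 0.82) ** 2


best_error, best_d1, best_d2 = grid_search(toy_fit_error)
print(best_d1, best_d2, best_error)
```

With a step of 0.01 between 0.10 and 1.00 there are 91 values per parameter, so 8,281 combinations in total, which is small enough to evaluate exhaustively.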
Related
I am working on a sales projection model based on a sigmoid curve. As part of the model, I hope to fit the curve to available historical data, using the sum of squared differences (SUM) between the projection and the historical figures. The only variable for the curve is its steepness, and I am using Solver to find the steepness that minimizes SUM.
I hope to make this model more dynamic by allowing users to update the historical figures (highlighted in green) once they become available. But instead of needing to run Solver every time the historical figures are updated, is it possible to automate this by using a Worksheet Change event in VBA? I am quite new to VBA, so I'm not too sure how to call something like Solver from the event.
Attached are screenshots of the model and the parameters for Solver:
https://i.imgur.com/mBS76Wt.jpg
https://i.imgur.com/f8CBcHc.jpg
Thanks in advance for the help!
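For reference, the one-parameter fit described above (minimizing the sum of squared differences over the steepness) can be sketched outside Excel. This does not cover the Worksheet Change part of the question. It is a minimal Python illustration that assumes a plain logistic curve with a fixed midpoint and ceiling, and it uses a golden-section search as a stand-in for Solver. All names and the data are made up.

```python
import math


def sigmoid(t, steepness, midpoint=0.0, ceiling=1.0):
    """Logistic curve; midpoint and ceiling are assumed fixed here."""
    return ceiling / (1.0 + math.exp(-steepness * (t - midpoint)))


def sum_sq_diff(steepness, times, historical):
    """Sum of squared differences between projection and historical data."""
    return sum((sigmoid(t, steepness) - h) ** 2 for t, h in zip(times, historical))


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal one-argument function on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


# Made-up "historical" data generated with a known steepness of 0.8.
times = [-3, -2, -1, 0, 1, 2, 3]
historical = [sigmoid(t, 0.8) for t in times]

best_steepness = golden_section_min(
    lambda k: sum_sq_diff(k, times, historical), 0.01, 5.0
)
print(round(best_steepness, 4))
```

Re-running the last call whenever `historical` changes is the same idea as re-running the fit on each worksheet update.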
I want to know the best approach to regression analysis when all of the data is text. I have the following data set.
My feature columns are: Strength, area of development, leadership, satisfactory
The values in these columns come from a predefined set of texts, e.g. "Continuous Improvement,Self-Development,Coaching and Mentoring,Creativity,Adaptability"
Based on the values in these columns I want to predict the label (overall Performance): Outstanding, Exceeding Expectation, or Meeting Expectation.
What would be the best approach to deal with this dataset?
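Since every feature is a comma-separated pick from a fixed set of phrases, one common starting point is to turn each column into 0/1 indicator columns (a multi-hot encoding). Because the label is one of three categories, this is a classification problem, not a regression in the usual sense. The sketch below is an illustration only: the three rows are made up, and `multi_hot` is a hypothetical helper, not part of any library.

```python
def multi_hot(rows, column):
    """Encode a comma-separated text column as 0/1 indicator columns."""
    def split(cell):
        return {part.strip() for part in cell.split(",") if part.strip()}

    vocabulary = sorted(set().union(*(split(row[column]) for row in rows)))
    encoded = [
        [1 if phrase in split(row[column]) else 0 for phrase in vocabulary]
        for row in rows
    ]
    return vocabulary, encoded


# Made-up example rows; the real data set has four such feature columns.
rows = [
    {"Strength": "Creativity,Adaptability", "label": "Outstanding"},
    {"Strength": "Coaching and Mentoring", "label": "Meeting Expectation"},
    {"Strength": "Adaptability,Self-Development", "label": "Exceeding Expectation"},
]

vocabulary, features = multi_hot(rows, "Strength")

# Assumed ordering of the labels, in case an ordinal model is wanted.
label_rank = {"Meeting Expectation": 0, "Exceeding Expectation": 1, "Outstanding": 2}
targets = [label_rank[row["label"]] for row in rows]

print(vocabulary)
print(features)
print(targets)
```

The same function can be applied to each of the four feature columns, and the resulting indicator columns placed side by side before being passed to a classifier.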
I am attempting to create a prediction model using multiple linear regression.
One of the predictor variables I want to use is a percentage, so it ranges from 0 to 100. I hypothesize that when it’s <50% there will be a negative effect on the target variable and when it’s >50% a positive effect.
The mean of the predictor variable isn’t exactly 50 in my data set, so I am unsure whether I should centre or standardize this variable, or just subtract 50 from it to create the split I am looking for.
I am very new to statistics and am teaching myself at the moment, so any help is greatly appreciated.
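A small check may help with the centring question. In a linear regression, shifting a predictor by a constant (50, the mean, or anything else) leaves its slope unchanged and only moves the intercept. The sketch below shows this for a single predictor, with made-up numbers and a hypothetical `ols` helper.

```python
def ols(x, y):
    """Simple least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return mean_y - slope * mean_x, slope


# Made-up data: a percentage predictor and a target.
pct = [10, 30, 45, 60, 80, 95]
target = [-4.1, -1.9, -0.6, 1.2, 2.8, 4.6]
mean_pct = sum(pct) / len(pct)

intercept_raw, slope_raw = ols(pct, target)
intercept_50, slope_50 = ols([p - 50 for p in pct], target)
intercept_mean, slope_mean = ols([p - mean_pct for p in pct], target)

print(slope_raw, slope_50, slope_mean)   # same slope in all three fits
print(intercept_50)                      # predicted target at exactly 50%
```

With 50 subtracted, the intercept is the model's predicted target at exactly 50%, which is often the most convenient reading for a hypothesis stated around that point. The choice does not change the fitted slope or the predictions.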
I have constructed a GMM-UBM model for speaker recognition. The models adapted for each speaker output scores calculated as log-likelihood ratios. Now I want to convert these likelihood scores to an equivalent number between 0 and 100. Can anybody guide me, please?
There is no straightforward formula. You can do simple things like
prob = exp(logratio_score)
but those might not reflect the true distribution of your data. The computed probability percentage of your samples will not be uniformly distributed.
Ideally you need to take a large dataset and collect statistics on the acceptance/rejection rate you get at each score. Once you have built a histogram, you can normalize the score difference by that histogram, so that, for example, 30% of your subjects are accepted when you see a certain score difference. That normalization lets you produce uniformly distributed probability percentages. See for example "How to calculate the confidence intervals for likelihood ratios from a 2x2 table in the presence of cells with zeroes".
This problem is rarely solved in speaker identification systems because confidence intervals are not what you actually want to display. You need a simple accept/reject decision, and for that you need to know the false reject and false accept rates. So it is enough to find a threshold rather than build the whole distribution.
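To make the two ideas concrete, here is a minimal Python sketch. It maps a score to a 0-100 value by its rank among previously collected scores (an empirical distribution), and it computes the false reject and false accept rates at a candidate threshold. The score lists are made up, and the function names are illustrative only.

```python
from bisect import bisect_right


def score_to_percent(score, sorted_reference_scores):
    """Percentage of reference scores that are <= score (empirical CDF)."""
    rank = bisect_right(sorted_reference_scores, score)
    return 100.0 * rank / len(sorted_reference_scores)


def error_rates(threshold, genuine_scores, impostor_scores):
    """False reject and false accept rates when accepting score >= threshold."""
    false_reject = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    false_accept = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return false_reject, false_accept


# Made-up log-likelihood-ratio scores from labelled trials.
genuine = [1.2, 2.5, 0.8, 3.1, 1.9]
impostor = [-2.0, -0.5, 0.9, -1.3, -3.2]
reference = sorted(genuine + impostor)

percent = score_to_percent(1.0, reference)
frr, far = error_rates(1.0, genuine, impostor)
print(percent, frr, far)
```

Sweeping the threshold over the observed scores and picking the point with acceptable false reject and false accept rates gives the accept/reject rule without modelling the full distribution.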
I've trained a model, and the results on the test set are okay.
Now I have saved the model as 'Trained model' and created a new experiment with a new dataset, to make predictions where I don't have the actual values.
Normally, the trained model gives me a scored label result per instance.
But now, the scored label results are empty. Also, when I convert the score results to CSV, the scored labels column is empty.
Even stranger, when I take a look at the Statistics on the score Visualize tab, I DO see statistics for the scored values, but no actual scored values...
Is this a bug? Or am I forgetting something important? What's going on ;) ?
If your test dataset is missing the dependent values, your predictive experiment may fail for some models. The solution is to pad your CSV file with zero values instead of blank values.
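A minimal sketch of that padding step, using only Python's standard `csv` module. The column names and rows are made up. It fills every blank cell with "0", which matches the suggestion above for the dependent column; whether zero is a sensible fill for blank feature columns is a separate modelling decision.

```python
import csv
import io


def pad_blanks(csv_text, fill="0"):
    """Return csv_text with every empty (or whitespace-only) cell replaced by fill."""
    reader = csv.reader(io.StringIO(csv_text))
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for row in reader:
        writer.writerow([cell if cell.strip() else fill for cell in row])
    return output.getvalue()


# Made-up scoring file where the label column (and one feature) is blank.
raw = "age,income,label\n34,52000,\n41,,\n"
padded = pad_blanks(raw)
print(padded)
```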
I had this same issue and it was frustrating, but I think I finally understand why this is happening.
When I was training my experiment, part of the cleaning process was populating missing values or trimming existing data with R.
The problem arises if one of those features is optional. For instance, if a column is not filled in, the scoring model will fail in the web service.
To see if this problem affects you, go to your Predictive Experiment and visualize the Score Model results. If you see empty Predicted Label & Predicted Score values, you can easily see which data points are missing.