I'm trying to find out the exact formula used in H2O for the Mean Residual Deviance loss function for a Tweedie distribution.
Or, more generally, what is the mean residual deviance for a Tweedie-distributed dependent variable?
So far, I've found this page (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html#tweedie-models), where the deviance formula for a Tweedie distribution is given.
However, in the H2O source code on GitHub, at line 103 of Distribution.java (https://github.com/h2oai/h2o-3/blob/master/h2o-core/src/main/java/hex/Distribution.java#L103), the formula is specified differently (ignoring omega, which is just the weight, and the lack of summation):
2 * w * (Math.pow(y, 2 - tweediePower) / ((1 - tweediePower) * (2 - tweediePower)) - y * exp(f * (1 - tweediePower)) / (1 - tweediePower) + exp(f * (2 - tweediePower)) / (2 - tweediePower))
which in equation form (writing p for tweediePower and f for the link, so the prediction is mu = exp(f)) is:
2w * ( y^(2-p) / ((1-p)(2-p)) - y * exp(f * (1-p)) / (1-p) + exp(f * (2-p)) / (2-p) )
So, is the documentation wrong or the implementation? I would appreciate any help!
Thank you!
Thank you for pointing this out. While the backend equation located here is correct (so the implementation is correct), the equation in the documentation appears to be incorrect. I have created this Jira ticket to update the equation in the documentation; the ticket contains the correct equation along with helpful information on how to derive it.
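For reference, here is a minimal Python sketch (not H2O's actual code) of the per-row deviance exactly as the implementation above computes it, assuming a Tweedie power 1 < p < 2 and mu = exp(f), so that exp(f * k) equals mu ** k; the function name and the data are illustrative.

import numpy as np

def tweedie_deviance(y, mu, p, w=1.0):
    # Per-row Tweedie deviance, transcribed from the Distribution.java line above,
    # with mu = exp(f) substituted for the exp(f * ...) terms.
    return 2.0 * w * (
        y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )

y = np.array([0.0, 1.5, 3.0])
mu = np.array([0.8, 1.2, 2.5])
# The mean residual deviance is then just the (weighted) mean of the per-row values.
print(tweedie_deviance(y, mu, p=1.5).mean())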
Is there a way to increase the value of objective_convergence_tolerance for Solver=2 (BPOPT)?
It seems to work only for Solver = 1 and 3.
Thanks.
There are 4 quantities that are checked for convergence:
max(abs(g + a^T * lam - zL + zU))/s_d
max(abs(c))
max(abs(diag[x-xL s-sL]*zL - mu))/s_c
max(abs(diag[xU-x sU-s]*zU - mu))/s_c
The maximum of these 4 quantities must be less than the convergence tolerance. Right now there is no separate objective-function convergence criterion. Gekko has 3 solvers included with the publicly available version:
APOPT
BPOPT
IPOPT
The BPOPT solver typically isn't one of the best solvers; you may want to try m.options.SOLVER=1 (APOPT) or m.options.SOLVER=3 (IPOPT) instead.
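Here is a small Gekko sketch of switching solvers and tightening the global tolerances (the toy model is made up; OTOL and RTOL are the objective and equation-residual tolerance options):

from gekko import GEKKO

m = GEKKO(remote=False)
x = m.Var(value=1, lb=0)
m.Obj((x - 2) ** 2)      # toy objective, minimized at x = 2

m.options.SOLVER = 3     # 1 = APOPT, 2 = BPOPT, 3 = IPOPT
m.options.OTOL = 1e-8    # objective tolerance
m.options.RTOL = 1e-8    # equation residual tolerance
m.solve(disp=False)
print(x.value[0])        # approximately 2.0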
I'm looking into GradientBoostingClassifier in sklearn, and I found there are 3 kinds of criterion: friedman_mse, mse, and mae.
The description provided by sklearn is:
The function to measure the quality of a split. Supported criteria are “friedman_mse” for the mean squared error with improvement score by Friedman, “mse” for mean squared error, and “mae” for the mean absolute error. The default value of “friedman_mse” is generally the best as it can provide a better approximation in some cases.
I can't understand what the difference is. Can anyone explain? Thanks!
I've provided a full answer in this link due to the convenience of writing TeX. In short, it boils down to the fact that this splitting criterion lets us decide not only based on how close we are to the desired outcome (which is what MSE does), but also based on the probabilities of the desired k-class that we are going to find in region l or region r (by considering a global weight w1*w2 / (w1 + w2)). I strongly recommend checking the above link for a full explanation.
According to the scikit-learn source code, the main difference between these two criteria is the impurity-improvement method: the MSE / FriedmanMSE criterion calculates the impurity of the current node and tries to reduce (improve) it; the smaller the impurity, the better.
Mean squared error impurity criterion.
MSE = sum_left^2 / w_l + sum_right^2 / w_r
(the squared sum of y in each child, divided by that child's total weight)
source
On the other side, the FriedmanMSE criterion uses the following to measure improvement:
diff = w_r * total_left_sum - w_l * total_right_sum
improvement = diff**2 / (w_r * w_l)
Note: w_r (the right weight) multiplies the total left sum, and vice versa.
You can simplify these equations with the better notation provided in Friedman's paper itself (eq. 35),
which says
improvement = (w_l * w_r) / (w_l + w_r) * (mean_left - mean_right) ^ 2
where w_l and w_r are the corresponding sums of sample weights for the left and right parts, respectively. (Note that diff^2 / (w_r * w_l) equals (w_l + w_r) times the eq. 35 expression; since w_l + w_r is constant for a given node, the two rank candidate splits of that node identically.)
source
To assign meaning to the left and right keywords, imagine the whole system as an array (e.g. samples[start:end]); left, for example, means the elements to the left of the current node's split point.
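To make the two forms of the improvement concrete, here is a small NumPy sketch (illustrative only, with unit sample weights; not scikit-learn's Cython internals):

import numpy as np

def friedman_proxy(y_left, y_right):
    # Proxy used while scanning split points: diff^2 / (w_r * w_l).
    w_l, w_r = len(y_left), len(y_right)
    diff = w_r * y_left.sum() - w_l * y_right.sum()
    return diff ** 2 / (w_r * w_l)

def friedman_improvement_eq35(y_left, y_right):
    # Friedman's eq. 35: (w_l * w_r) / (w_l + w_r) * (mean_l - mean_r)^2.
    w_l, w_r = len(y_left), len(y_right)
    d = y_left.mean() - y_right.mean()
    return (w_l * w_r) / (w_l + w_r) * d ** 2

y_left = np.array([1.0, 1.2, 0.9])
y_right = np.array([3.1, 2.8])
# The proxy equals (w_l + w_r) times eq. 35, a constant for a fixed node,
# so both rank the candidate splits of that node identically.
print(friedman_proxy(y_left, y_right), friedman_improvement_eq35(y_left, y_right))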
I am trying to minimize the objective function in B19. Why can't the solver find any feasible solution? I can't understand it. Basically, the model is:
Variable:
- xi
Constraints:
- xi is boolean
- sum(li * xi') < F
- li * xi > F', for each i
Objective function:
- sum(xi * di)
Yes, the set of constraints that you currently have is, when combined, infeasible.
If you relax them a little, Excel will find the optimal solution.
First, look at pi * xi > 0.6 for all i.
This does two things. Since every pi is greater than 0, all the xi variables are forced to be 1 (0 is not possible).
Also, look at the last column, xi. Since that pi is 0.2, even for xi = 1, xi * pi cannot be > 0.6. (To find a feasible solution, you would have to lower your P' to 0.2 or less.)
Now look at your other constraint:
sum of pi * xi < 5
All your pi's add up to 5.4, and due to the previous set of constraints, all the xi's are forced to take on the value of 1. So you have to make B10 at least 5.4.
All that said, I suspect that you don't want your xi to be binary. Perhaps you want them to satisfy 0 <= xi <= 1, that is, to be able to take fractional values as well.
After relaxing the values of B10 and B16, Excel was able to find a solution. See image below.
Update Based on OP's Clarification
Your Xi variables are fine: 0 means the taxi is running, 1 means it will go for repair.
Objective function: Minimize cost of repair (Sumproduct of Ci Xi) is also fine.
You have to change the individual-taxi constraint. As the problem defines it, pi = 0 is good and pi = 1 is bad. The company doesn't want ANY taxi to have its pi above a threshold P'.
Constraint: pi * (1-Xi) <= 0.6 (say)
Think about this: As soon as any taxi's pi crosses the threshold, it will be forced to go for repair since Xi will have to become 1.
The global availability: for this, you can simply sum up the 'available taxis' and make sure that there are at least the required minimum (say 5):
sum(1-xi) >=5
If you set up your Excel model this way, you will get a feasible solution.
See Image below:
Update 2: Including a Global Degradation Index cap
If, instead of constraint 4 above, you wanted a constraint that kept the sum of all pi's under some limit, you would do the following:
(1-xi) is the indicator for a taxi in service.
So this constraint becomes:
Sum of pi * (1-xi) <= P'
For convenience, you can create a row in Excel which is (1-xi) and then use Sumproduct of that row and the pi row.
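If it helps to sanity-check the model outside Excel, here is a small brute-force Python sketch of the same formulation (the degradation indices, repair costs, threshold, and minimum availability are all made up for illustration):

from itertools import product

p = [0.9, 0.5, 0.4, 0.3, 0.25, 0.2]   # hypothetical degradation index per taxi
c = [10, 8, 6, 5, 4, 3]               # hypothetical repair cost per taxi
P_CAP = 0.6                           # individual threshold P'
MIN_AVAIL = 5                         # minimum taxis that must stay in service

best = None
for x in product([0, 1], repeat=len(p)):            # x_i = 1 -> taxi i goes for repair
    ok_individual = all(p[i] * (1 - x[i]) <= P_CAP for i in range(len(p)))
    ok_available = sum(1 - xi for xi in x) >= MIN_AVAIL
    # For Update 2, also require: sum(p[i] * (1 - x[i])) <= some global cap.
    if ok_individual and ok_available:
        cost = sum(c[i] * x[i] for i in range(len(p)))
        if best is None or cost < best[0]:
            best = (cost, x)

print(best)   # lowest repair cost and which taxis go for repair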
Hope that helps.
In particular, the glmnet docs imply it creates a "Generalised Linear Model" of the Gaussian family for regression, while the scikit-learn docs imply no such thing (i.e., it seems to be a pure linear regression, not a generalised one). But I'm not sure about this.
In the documentation you link to, there is an optimization problem which shows exactly what is optimized in GLMnet:
1/(2N) * sum_i (y_i - beta_0 - x_i^T beta)^2 + lambda * [(1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1]
Now take a look here, where you will find the same formula written as the minimization of a squared Euclidean norm. Note that the docs omit the intercept w_0, equivalent to beta_0, but the code does estimate it.
Please also note that lambda becomes alpha and alpha becomes rho...
The "Gaussian family" aspect probably refers to the fact that an L2-loss is used, which corresponds to assuming that the noise is additive Gaussian.
Hey, I am trying to run a cosinor analysis in Statistica but am at a loss as to how to do so. I need to calculate the MESOR, AMPLITUDE, and ACROPHASE of circadian rhythm data.
http://www.wepapers.com/Papers/73565/Cosinor_analysis_of_accident_risk_using__SPSS%27s_regression_procedures.ppt
That link shows how to do it, the formulas and such, but it has not given me much help. Does anyone know the code for it, either in Statistica or SPSS?
I really need to get this done because it is for an important paper.
I don't have SPSS or Statistica, so I can't tell you the exact "push-this-button" kind of steps, but perhaps this will help.
Cosinor analysis is fitting a cosine (or sine) curve with a known period. The main idea is that the non-linear problem of fitting a cosine function can be reduced to a problem that is linear in its parameters if the period is known. I will assume that your period T=24 hours.
You should already have two variables: Time at which the measurement is taken, and Value of the measurement (these, of course, might be called something else).
Now create two new variables: SinTime = sin(2 x pi x Time / 24) and CosTime = cos(2 x pi x Time / 24) - this is described on p. 11 of the presentation you linked (x is multiplication). Use pi = 3.1415 if the exact value is not built-in.
Run multiple linear regression with Value as outcome and SinTime and CosTime as two predictors. You should get estimates of their coefficients, which we will call A and B.
The intercept term of the regression model is the MESOR.
The AMPLITUDE is sqrt(A^2 + B^2) [square root of A squared plus B squared]
The ACROPHASE is arctan(- B / A), where arctan is the inverse function of tan. The last two formulas are from p.14 of the presentation.
The regression model should also give you an R-squared value to see how well the 24 hour circadian pattern fits the data, and an overall p-value that tests for the presence of a circadian component with period 24 hrs.
One can get standard errors on amplitude and phase using standard error-propagation formulas, but that is not included in the presentation.
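If it's useful as a cross-check outside Statistica/SPSS, the whole procedure is a few lines of NumPy (the measurements below are made up; a known 24-hour period is assumed):

import numpy as np

time = np.array([0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0])   # hours
value = np.array([5.1, 6.8, 8.0, 7.2, 5.0, 3.4, 2.9, 3.8])      # measurements

omega = 2 * np.pi / 24.0                      # known 24 h period
X = np.column_stack([np.ones_like(time),      # intercept -> MESOR
                     np.sin(omega * time),    # SinTime -> coefficient A
                     np.cos(omega * time)])   # CosTime -> coefficient B
mesor, A, B = np.linalg.lstsq(X, value, rcond=None)[0]

amplitude = np.hypot(A, B)                    # sqrt(A^2 + B^2)
acrophase = np.arctan2(-B, A)                 # quadrant-aware arctan(-B / A)
print(mesor, amplitude, acrophase)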