How to find nonlinear systems of equations information - OpenModelica

Dymola generates a summary of the linear and nonlinear systems of equations. Here is an example of the Dymola output:
Sizes of nonlinear systems of equations: {6, 11, 44}
Sizes after manipulation of the nonlinear systems: {1, 9, 11}
Is the same information available when using OpenModelica? If so, what is the process for generating a nonlinear systems of equations summary?
Thanks,
Michael

You can use the compiler flag -d=backenddaeinfo, which prints things like:
Non-linear torn systems: 6 {2 1,1 1,1 6,2 9,2 3,2 1}
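If you drive the compiler from a script, a minimal sketch using the OMPython package could enable the flag like this (my addition; the answer itself only names the flag, and MyModel.mo is a placeholder for your own model):
from OMPython import OMCSessionZMQ

omc = OMCSessionZMQ()
# Enable the backend statistics flag before translating the model
omc.sendExpression('setCommandLineOptions("-d=backenddaeinfo")')
omc.sendExpression('loadFile("MyModel.mo")')  # hypothetical model file
result = omc.sendExpression("simulate(MyModel)")
print(result["messages"])  # the equation-system statistics appear in the compilation output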
Alternatively, in OMEdit you can use Simulate with Transformational Debugger and inspect the equations and blocks in detail. The GUI would then display something like:
non-linear (torn), unknowns: 3, iteration variables: 2
(torn) der(pumps.heatTransfer.states[1].h) := Modelica.Media.Water.IF97_Utilities.h_pT_der(pumps.medium.p, pumps.medium.T, $cse5, der(pumps.medium.p), der(pumps.medium.T))
(residual) Modelica.Media.Water.IF97_Utilities.rho_pT_der(pumps.medium.p, pumps.medium.T, $cse5, $DER.pumps.medium.p, $DER.pumps.medium.T) - $DER.pumps.rho = 0
(residual) pumps.medium.p * $DER.pumps.rho + ($DER.pumps.heatTransfer.states[1].h - $DER.pumps.medium.u) * pumps.rho ^ 2.0 - $DER.pumps.medium.p * pumps.rho = 0

Related

Finding probability for Discrete Binomial distribution problems

Problem Description:
In each of 4 different competitions, Jin has 60% chance of winning. Assuming that the competitions are independent of each other, what is the probability that: Jin will win at least 1 race?
Given Binomial distribution Parameters:
n=4
p=0.60
Hint:
P(x>=1)=1-P(x=0)
Use the binom.pmf() function of the scipy.stats package to calculate the probability.
Below is the Python code I have tried, but it is being evaluated as wrong.
from scipy import stats
n = 4
p = 0.6
p1 = 1 - p
p2 = stats.binom.pmf(1,4,p1)
print(p1)
Using the hint, all you need to do is to evaluate the PMF of the binomial distribution at x=0 and subtract the result from 1 to obtain the probability of Jin winning at least one competition:
from scipy import stats
x = 0    # number of wins we evaluate the PMF at
n = 4    # number of independent competitions
p = 0.6  # probability of winning each one
p0 = stats.binom.pmf(x, n, p)  # P(X = 0) = 0.4^4 = 0.0256
print(1 - p0)                  # P(X >= 1) = 0.9744
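As a cross-check (my addition, not part of the original answer), SciPy's survival function gives the same tail probability directly, since P(X >= 1) = P(X > 0):
from scipy import stats
# sf(0, n, p) = 1 - cdf(0, n, p) = P(X > 0) = P(X >= 1)
print(stats.binom.sf(0, 4, 0.6))  # 0.9744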

How to fit the best probability distribution model to my data in Python?

I have about 20,000 rows of data like this:
Id | value
1  | 30
2  | 3
3  | 22
.. | ..
n  | 27
I computed some statistics on my data: mean 33.85, median 30.99, min 2.8, max 206, 95% confidence interval 0.21. So most values are around 33, and there are a few outliers, which suggests a distribution with a long tail.
I am new to both distributions and Python. I tried the fitter class (https://pypi.org/project/fitter/) to test many distributions from the SciPy package, and the loglaplace distribution showed the lowest error (although I don't quite understand it).
I read almost all the questions in this thread and concluded there are two approaches: (1) fit a distribution model and then draw random values from it in my simulation; (2) compute the frequency of different groups of values, although this solution will never produce a value greater than 206, for example.
Given my data, which is numeric values, what is the best approach to fit a distribution to it in Python, since in my simulation I need to draw numbers? The random numbers must follow the same pattern as my data. I also need to validate that the model represents my data well by plotting my data together with the model curve.
One way is to select the best model according to the Bayesian information criterion (called BIC).
OpenTURNS implements an automatic method of selection (see its documentation).
Suppose you have an array x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; here is a quick example:
import openturns as ot
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Define x as a Sample object: a sample of size 11 and dimension 1
sample = ot.Sample([[xi] for xi in x])
# define distributions you want to test on the sample
tested_distributions = [ot.WeibullMaxFactory(), ot.NormalFactory(), ot.UniformFactory()]
# find the best distribution according to BIC and print its parameters
best_model, best_bic = ot.FittingTest.BestModelBIC(sample, tested_distributions)
print(best_model)
>>> Uniform(a = -0.769231, b = 10.7692)
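For the validation step (plotting your data against the model curve), here is a minimal sketch with SciPy and Matplotlib; it assumes the loglaplace candidate mentioned in the question and uses placeholder data:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

data = np.array([30.0, 3.0, 22.0, 27.0])  # placeholder; use your ~20,000 values
params = stats.loglaplace.fit(data)  # fit the candidate distribution
grid = np.linspace(data.min(), data.max(), 200)

plt.hist(data, bins=50, density=True, alpha=0.5, label="data")
plt.plot(grid, stats.loglaplace.pdf(grid, *params), label="fitted loglaplace")
plt.legend()
plt.show()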

Given these static points, what is the Bezier curve?

I have this question asking to determine the Bezier curve that goes through these points: (1,1), (2,-1), (3,1). How can we find the curve? I don't get how to use the equation. And how can we find the curve's degree?
There are infinitely many Bezier curves through these points. You need to be more specific.
For example, one may require that the curve is quadratic, that the first point is the starting one (t=0), the third is the ending one (t=1), and the second is reached exactly in the middle (t=0.5).
Then build the equations (substituting the t values and point coordinates) and solve them for the coefficients:
p[0].x * (1-t)^2 + p[1].x * 2 * t * (1-t) + p[2].x * t^2 = X(t)
For example, for the first point (t=0, X=1):
p[0].x * (1-0)^2 + p[1].x * 2 * 0 * (1-0) + p[2].x * 0^2 = 1
so p[0].x = 1
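To make this concrete, here is a small sketch (my addition, using NumPy) that assembles and solves the three equations of the quadratic case for both coordinates at once:
import numpy as np

# The three points the curve must pass through, at t = 0, 0.5, 1
points = np.array([[1, 1], [2, -1], [3, 1]], dtype=float)
ts = np.array([0.0, 0.5, 1.0])

# Bernstein basis matrix for degree 2: row j is [(1-t)^2, 2t(1-t), t^2]
B = np.column_stack([(1 - ts) ** 2, 2 * ts * (1 - ts), ts ** 2])

# Solve B @ C = P for the control points C
control = np.linalg.solve(B, points)
print(control)  # [[1, 1], [2, -3], [3, 1]]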
A Bezier curve of degree N is a weighted sum of N+1 control points in 2D
Sum(i=0,N) Wi(t).Ci
If you have N+1 points Pj for which the values of t are known, you get a linear system of 2(N+1) equations in 2(N+1) unknowns
Sum(i=0,N) Wi(tj).Ci = Pj
The tj can be chosen uniformly in [0,1]. Another choice is to use the cumulative chord lengths between the given points (also normalized to [0,1]).
It is a very different matter if you don't want to provide the tj yourself. Then you can reduce the degree of the curve and trade control points for the values of t. In general, the numbers of equations and unknowns will not match, and the system will be over- or underdetermined.
Another difficulty arises if you impose the order of the points: the unknown t values are then constrained to be increasing, leading to a difficult system of equations and inequalities.
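To illustrate the chord-length choice of the tj mentioned above (my addition), here is how it works out for the three points of the question:
import numpy as np

P = np.array([[1, 1], [2, -1], [3, 1]], dtype=float)
d = np.linalg.norm(np.diff(P, axis=0), axis=1)  # chord lengths between consecutive points
t = np.concatenate([[0.0], np.cumsum(d)]) / d.sum()
print(t)  # both chords have length sqrt(5), so t = [0, 0.5, 1]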

Scikit Learn - Random Forest: How are continuous features handled?

Random Forest accepts numerical data. Usually, features with text data are converted to numerical categories, and continuous numerical data is fed in as-is without discretization. How does the RF treat continuous data when creating nodes? Will it bin the continuous numerical data internally, or treat each value as a discrete level?
For example:
I want to feed a data set (after categorizing the text features, of course) to an RF. How is the continuous data handled by the RF?
Is it advisable to discretize the continuous data (longitudes and latitudes, in this case) before feeding it in, or is information lost by doing so?
As far as I understand, you are asking how the threshold is chosen for continuous features. Candidate thresholds occur at values where the class changes. For example, consider the following 1D dataset with x as the feature and y as the class variable:
x = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [ 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]
Two candidate cuts will be considered: (i) between 2 and 3 (which will practically look like x < 2.5) and (ii) between 7 and 8 (x < 7.5).
Among these two candidates, the second one will be chosen, since it provides a better separation. Then the algorithm goes on to the next split.
Therefore it is not advisable to discretize the data yourself. Think about it with the data above: if, for example, you discretize it into 5 bins [1, 2 | 3, 4 | 5, 6 | 7, 8 | 9, 10], you miss the best split (since 7 and 8 end up in the same bin).
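A quick way to check this claim (my addition, using scikit-learn on the toy data above):
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
y = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1]

# A depth-1 tree exposes the single best threshold the splitter finds
tree = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
print(export_text(tree))  # the root split is "feature_0 <= 7.50"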
You are asking about DecisionTrees. Because RandomForest is an ensemble model, it doesn't by itself know anything about the data; it fully relies on the decisions of its base estimators (in this case DecisionTrees) and aggregates them.
So, how does a DecisionTree treat continuous features? Look at the official documentation page: there, a DecisionTreeClassifier is fitted on a continuous dataset (Fisher's irises), and if you look at the picture of the tree, each node holds a threshold value over the feature chosen at that node.

Mean absolute error of each tree in Random Forest

I am using the Evaluation class of Weka for the mean absolute error of each generated tree in a random forest. The explanation says that it "Refers to the error of the predicted values for numeric classes, and the error of the predicted probability distribution for nominal classes."
Can someone explain it in easy words, or perhaps with an example?
The mean absolute error is an indication of how close your predictions are, on average, to the actual values of the test data.
For numerical classes this is easy to think about.
Example:
True values: {0, 1, 4}
Predicted values: {1, 3, 1}
Differences: {-1, -2, 3} (subtract predicted from true)
Absolute differences: {1, 2, 3}
Mean Absolute Difference: (1+2+3)/3 = 2
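The numeric case can be reproduced in one line (my addition, assuming scikit-learn is available):
from sklearn.metrics import mean_absolute_error

# |0-1| = 1, |1-3| = 2, |4-1| = 3; mean = 2.0
print(mean_absolute_error([0, 1, 4], [1, 3, 1]))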
For nominal classes, a prediction is no longer a single value, but rather a probability distribution over the possible classes the instance may belong to. The example below uses two classes.
Example:
Notation: [0.5, 0.5] indicates an instance with 50% chance of belonging to class Y, 50% chance of belonging to class X.
True distributions: { [0,1] , [1,0] }
Predicted distributions: { [0.25, 0.75], [1, 0] }
Differences: { [-0.25, 0.25], [0, 0] }
Absolute differences: { (0.25 + 0.25)/2, (0 + 0)/2 } = {0.25, 0}
Mean absolute difference: (0.25 + 0)/2 = 0.125
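The nominal case can be sketched the same way (my addition, in NumPy):
import numpy as np

true = np.array([[0.0, 1.0], [1.0, 0.0]])
pred = np.array([[0.25, 0.75], [1.0, 0.0]])
per_instance = np.abs(true - pred).mean(axis=1)  # [0.25, 0.0]
print(per_instance.mean())  # 0.125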
You can double-check my explanation by visiting the source code of Weka's Evaluation class.
Also, as a side note, I believe the mean absolute error reported by Weka for a random forest is for the forest as a whole, not for the individual trees.
