octave mann-whitney/u_test p-value confusion - statistics

I find the result of the Mann-Whitney test confusing, and the GNU documentation didn't help me: https://www.gnu.org/software/octave/doc/interpreter/Tests.html
Here is the simple example I tried:
octave:1> x=[1,1,1,1,1]
x =
1 1 1 1 1
octave:2> y=[2,2,2,2,2,2]
y =
2 2 2 2 2 2
octave:3> [p,z]=u_test(x, y, "<>")
p = 0.0061699
z = -2.7386
octave:4> [p,z]=u_test(x, y, ">")
p = 0.0030849
z = -2.7386
The first u_test makes sense since at that p value, the null hypothesis is rejected, and the alternative, which is that P(x>y) != 1/2 would be accepted.
However, the second u_test would suggest that the null hypothesis is again rejected so the alternative P(x>y) > 1/2 is accepted, which doesn't make any sense to me.
Where did I go wrong?

I have had a look at u_test.m and the function appears to have several bugs. Thank you for discovering and reporting them.
You are right, the author has mixed up the order of the tests. It should be the other way round according to his definition (i.e. the result you're getting should be for '<').
Furthermore, it seems that the calculation for z contains a bug too; this seems to be confirmed by the different p-value that online calculators give.
Feel free to report this as a bug at the octave bug tracker (and please link the resulting bug page back here in the comments; otherwise I'll report the bug if you'd like).
In the meantime I'll work on a patch. Thanks again.
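
As a cross-check on the direction of the one-sided tests, here is a minimal Python sketch of the normal approximation to the Mann-Whitney U test. It is not the code from u_test.m, and the names `normal_cdf` and `mann_whitney_z` are made up for this illustration. With `tie_correction=False` it reproduces the z = -2.7386 printed in the question, and the lower-tail probability 0.0030849 is then the p-value for the alternative that x tends to be smaller than y, i.e. '<'. Whether the missing tie correction is the z bug mentioned above is only a guess.

```python
import math
from collections import Counter


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mann_whitney_z(x, y, tie_correction=True):
    """Return (U, z), where U counts pairs with x_i > y_j (ties count 1/2)."""
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
            for xi in x for yj in y)
    mean = n1 * n2 / 2.0
    n = n1 + n2
    tie_term = 0.0
    if tie_correction:
        # Sum of (t^3 - t) over groups of tied values in the pooled sample.
        counts = Counter(list(x) + list(y)).values()
        tie_term = sum(t**3 - t for t in counts) / (n * (n - 1))
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    return u, (u - mean) / math.sqrt(var)


x = [1, 1, 1, 1, 1]
y = [2, 2, 2, 2, 2, 2]

u, z = mann_whitney_z(x, y, tie_correction=False)
p_less = normal_cdf(z)           # alternative: x tends to be smaller than y
p_greater = 1.0 - normal_cdf(z)  # alternative: x tends to be larger than y
p_two_sided = 2.0 * min(p_less, p_greater)
print(u, round(z, 4), p_less, p_greater, p_two_sided)

# Same statistic with the usual correction for ties in the variance.
_, z_tied = mann_whitney_z(x, y, tie_correction=True)
print(round(z_tied, 4))
```

For samples this small an exact test may be more appropriate than the normal approximation; the sketch only illustrates which tail belongs to which alternative.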


Bachelier Normal Implied Vol Python Calculation (Help) Jäckel

I am writing a Python script to calculate implied normal volatility, in line with the Jäckel article (industry standard).
https://jaeckel.000webhostapp.com/ImpliedNormalVolatility.pdf
They say they are using a Generalized Incomplete Gamma Function Inverse.
For a call:
F(x)=v/(K - F) -> find x that makes this true
Where F is Inverse Incomplete Gamma Function
And x = (K - F)/(IV*sqrt(T)); v is the value of a call
for that x, IV = (K - F)/(x*sqrt(T))
Example I am working with:
F=40
X=38
T=100/365
v=5.25
Vol= 20%
Using the equations I should be able to backout Vol of 20%
Scipy has upper and lower Incomplete Gamma Function Inverse in their special functions.
Lower: scipy.special.gammaincinv(a, y) : {a must be positive param}
Upper: scipy.special.gammainccinv(a, y) : {a must be positive param}
Implementation:
import sympy
from scipy import optimize, special

SIG = sympy.symbols('SIG')
F = 40
T = 100/365
K = 38

def Objective(sig):
    SIG = sig
    return (special.gammaincinv(.5, ((F-K)**2)/(2*T*SIG**2)) + special.gammainccinv(.5, ((F-K)**2)/(2*T*SIG**2)) + 5.25/(K-F))

x = optimize.brentq(Objective, -20.00, 20.00, args=(), xtol=1.48e-8, rtol=1.48e-8, maxiter=1000, full_output=True)
IV = (K-F)/x*T**.5
print(IV)
I know I am wrong, but where am I going wrong, and how do I fix it using what I read in the article?
Did you also post this on the Quantitative Finance Stack Exchange? You may get a better response there.
This is not my field, but it looks like your main problem is that brentq requires the passed Objective function to return values with opposite signs when passed the -20 and 20 arguments. However, this will not end up happening because according to the scipy docs, gammaincinv and gammainccinv always return a value between 0 and infinity.
I'm not sure how to fix this, unfortunately. Did you try implementing the analytic solution (rather than iterative root finding) in the second part of the paper?
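
To illustrate the bracketing requirement, here is a minimal sketch that inverts the standard undiscounted Bachelier call formula by plain bisection, which needs a sign change over the interval just as brentq does. This is direct root finding on the price formula, not the analytic method from the Jäckel paper, and the function names and the bracket [1e-8, 1000] are assumptions made for this example. With the inputs from the question the result comes out close to 20, which suggests the quoted "Vol = 20%" is a normal volatility in price units; that reading is also an assumption.

```python
import math


def normal_pdf(d):
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)


def normal_cdf(d):
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


def bachelier_call(forward, strike, expiry, sigma):
    """Undiscounted Bachelier (normal model) call price."""
    s = sigma * math.sqrt(expiry)
    d = (forward - strike) / s
    return (forward - strike) * normal_cdf(d) + s * normal_pdf(d)


def implied_normal_vol(price, forward, strike, expiry,
                       lo=1e-8, hi=1000.0, tol=1e-10):
    """Bisection on sigma; the objective must change sign on [lo, hi]."""
    def objective(sigma):
        return bachelier_call(forward, strike, expiry, sigma) - price

    f_lo, f_hi = objective(lo), objective(hi)
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed: no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = objective(mid)
        if f_lo * f_mid <= 0:
            hi = mid               # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return 0.5 * (lo + hi)


iv = implied_normal_vol(price=5.25, forward=40.0, strike=38.0,
                        expiry=100.0 / 365.0)
print(iv)
```

The price is increasing in sigma, so any bracket whose lower end prices below the target and whose upper end prices above it will work.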

String Formatting with Decimal Number Python

I am working on building a report. One figure on the report needs to be expressed in millions. I wrote some basic formatters to handle various types of formatting that need to be consistent throughout the report. To do this, I use lambda functions and string formatting. The two functions are below. One is to round, the other to format.
formatter_round = lambda x: 0 if (x is None or x is bool) else round(x/1000000,1)
formatter_dollar = lambda x: '${:,.1}'.format(0) if (x is None or x == 0) else ('${:,.1}'.format(x) if x >= 0 else '$({:,.1})'.format(abs(x)))
Now comes the problem. See my example below.
I am dealing with two numbers, a = 350000 and b = 850000.
For a, everything works as I'd expect. The float isn't necessarily correct when rounded (not "what I'd expect", but understandable behavior), but the decimal is correct.
a = 350000
formatter_dollar(formatter_round(a))
Out[89]: '$0.3'
a = Decimal(a)
formatter_dollar(formatter_round(a))
Out[91]: '$0.4'
When I run the same example with b, however, this breaks down.
b = 850000
formatter_dollar(formatter_round(b))
Out[93]: '$0.8'
b = Decimal(b)
formatter_dollar(formatter_round(b))
Out[95]: '$0.8'
My question is, how can I properly round and display numbers?
I thought my issue was floating point numbers, and a seemed to confirm that. Then when I ran the same with b, I realized that isn't the case.
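
One way to get the rounding the question seems to expect is to do the arithmetic in Decimal, round with an explicit ROUND_HALF_UP, and use a fixed-point format spec ('.1f' rather than '.1', which asks for one significant digit). The sketch below assumes that ties should round away from zero; `to_millions` and `format_dollar` are names invented here. For background: round() on a Decimal rounds ties to even, which is why 0.35 became 0.4 while 0.85 stayed 0.8, and the floats 0.35 and 0.85 are both stored slightly below their decimal values, so they round down.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_millions(x):
    """Scale to millions and round to one decimal place, ties away from zero."""
    if x is None or isinstance(x, bool):
        return Decimal("0.0")
    # Going through str() avoids inheriting binary floating-point error.
    scaled = Decimal(str(x)) / Decimal(1000000)
    return scaled.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def format_dollar(x):
    """Format with thousands separators; negatives go in parentheses."""
    if x < 0:
        return "$({:,.1f})".format(abs(x))
    return "${:,.1f}".format(x)


print(format_dollar(to_millions(350000)))
print(format_dollar(to_millions(850000)))
print(format_dollar(to_millions(-1250000)))
```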

Nested if in Gnu Mathprog for an energy model

I have a code in Gnu Mathprog for an energy model:
s.t.EBa1_RateOfFuelProduction1{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR: OutputActivityRatio[r,t,f,m,y] <> 0}:
RateOfActivity[r,l,t,m,y]*OutputActivityRatio[r,t,f,m,y] = RateOfProductionByTechnologyByMode[r,l,t,m,f,y];
s.t.EBa4_RateOfFuelUse1{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR: InputActivityRatio[r,t,f,m,y]<>0}:
RateOfActivity[r,l,t,m,y]*InputActivityRatio[r,t,f,m,y] = RateOfUseByTechnologyByMode[r,l,t,m,f,y];
I want to combine these two constraints into one, and I am thinking of inserting two conditional expressions (if). The first if would apply to the technology (t) and fuel (f) where OutputActivityRatio <> 0; the second, for the same technology (t), would go through the fuels (f) again to check whether InputActivityRatio <> 0.
Like that:
s.t.RateOfProduction{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR: OutputActivityRatio[r,t,f,m,y] <>0}:
RateOfActivity[r,l,t,m,y]*OutputActivityRatio[r,t,f,m,y] = RateOfProductionByTechnologyByMode[r,l,t,m,f,y]
If InputActivityRatio[r,t,ff,m,y]<>0 then
RateOfActivity[r,l,t,m,y]*InputActivityRatio[r,t,f,m,y] = RateOfUseByTechnologyByMode[r,l,t,m,f,y]
else 0
else 0 ;
My question is: is it possible to have two ifs in series (nested if) with an equation between them? How can I write something like that?
Thank you very much!
As described in your other Question (regarding nested if-then-else in mathprog) there are no If-Then-Else statements in mathprog. The workaround with conditional for-loops is also no solution for your problem, since you can only use them in pre- or post processing of your data (you can't use this in your constraints!).
But there are still possibilities to merge your constraints. I think something like the following would work, if your condition is that either Input or Output is 0.
s.t.RateOfProduction{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR}:
(RateOfActivity[r,l,t,m,y]*OutputActivityRatio[r,t,f,m,y])
+ (RateOfActivity[r,l,t,m,y]*InputActivityRatio[r,t,f,m,y])
= RateOfProductionByTechnologyByMode[r,l,t,m,f,y];
Here, in the left-hand-side summation, one of the two products would become zero.
Since I don't know which parts are variables and which are parameters, this solution could also fail (for example, it could be problematic if there is input and output at the same time and the rest of the model doesn't contain the right bounds for that).

Solving maximizing problems in Alloy (or other optimization problems)

I've bought and read the Software Abstractions book (great book actually) a couple of months if not 1.5 years ago. I've read online tutorials and slides on Alloy, etc. Of course, I've also done exercises and a few models of my own. I've even preached for Alloy in some confs. Congrats for Alloy btw!
Now, I am wondering if one can model and solve maximizing problems over integers in Alloy. I don't see how it could be done but I thought asking real experts could give me a more definitive answer.
For instance, say you have a model similar to this:
open util/ordering[State] as states

sig State {
  i, j, k: Int
}{
  i >= 0
  j >= 0
  k >= 0
}

pred subi (s, s': State) {
  s'.i = minus[s.i, 2]
  s'.j = s.j
  s'.k = s.k
}

pred subj (s, s': State) {
  s'.i = s.i
  s'.j = minus[s.j, 1]
  s'.k = s.k
}

pred subk (s, s': State) {
  s'.i = s.i
  s'.j = s.j
  s'.k = minus[s.k, 3]
}

pred init (s: State) {
  // one example
  s.i = 10
  s.j = 8
  s.k = 17
}

fact traces {
  init[states/first]
  all s: State - states/last | let s' = states/next[s] |
    subi[s, s'] or subj[s, s'] or subk[s, s']
  let s = states/last | (s.i > 0 => (s.j = 0 and s.k = 0)) and
    (s.j > 0 => (s.i = 0 and s.k = 0)) and
    (s.k > 0 => (s.i = 0 and s.j = 0))
}

run {} for 14 State, 6 Int
I could have used Naturals but let's forget it. What if I want the trace which leads to the maximal i, j or k in the last state? Can I constrain it?
Some intuition is telling me I could do it by trial and error, i.e., find one solution and then manually add a constraint in the model for the variable to be strictly greater than the one value I just found, until it is unsatisfiable. But can it be done more elegantly and efficiently?
Thanks!
Fred
EDIT: I realize that for this particular problem, the maximum is easy to find, of course. Keep the maximal value in the initial state as-is and only decrease the other two and you're good. But my point was to illustrate one simple problem to optimize so that it can be applied to harder problems.
Your intuition is right: trial and error is certainly a possible approach, and I use it regularly in similar situations (e.g. to find minimal sets of axioms that entail the properties I want).
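
The trial-and-error loop can be sketched outside Alloy as follows. This is only an illustration in Python: `final_states` is a brute-force stand-in for the model in the question (14 states, hence 13 transitions), and `find_solution` plays the role of one run of the Analyzer with an added "strictly greater than the last value" constraint. Both names, and the use of exhaustive enumeration in place of a solver, are assumptions made for the sketch.

```python
def final_states(start=(10, 8, 17), steps=13):
    """Enumerate admissible last states of the trace model by brute force."""
    moves = ((2, 0, 0), (0, 1, 0), (0, 0, 3))  # subi, subj, subk
    frontier = {start}
    for _ in range(steps):
        nxt = set()
        for (i, j, k) in frontier:
            for (di, dj, dk) in moves:
                s = (i - di, j - dj, k - dk)
                if min(s) >= 0:  # signature fact: i, j, k stay non-negative
                    nxt.add(s)
        frontier = nxt
    # Last-state constraint: at most one of i, j, k may be non-zero.
    return {s for s in frontier if sum(1 for v in s if v > 0) <= 1}


def find_solution(candidates, strictly_above):
    """Stand-in for one solver run with the extra bound added."""
    for s in candidates:
        if max(s) > strictly_above:
            return s
    return None  # "unsatisfiable"


def maximize(candidates):
    """Tighten the bound until no solution remains; return the last one found."""
    best, bound = None, -1
    while True:
        s = find_solution(candidates, bound)
        if s is None:
            return best
        best, bound = s, max(s)


best = maximize(final_states())
print(best)
```

Each pass through the loop corresponds to one manual edit-and-rerun cycle in the Analyzer; the number of passes is bounded by the number of distinct values the objective can take within the chosen Int scope.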
Whether it can be done more directly and elegantly depends, I think, on whether a solution to the problem can be represented by an atom or must be a set or other non-atomic object. Given a problem whose solutions will all be atoms of type T, a predicate Solution which is true of atomic solutions to a problem, and a comparison relation gt which holds over atoms of the appropriate type(s), then you can certainly write
pred Maximum[ a : T ] {
  Solution[a]
  and
  all s : T | Solution[s] implies
    (gt[a,s] or a = s)
}
run Maximum for 5
Since Alloy is resolutely first-order, you cannot write the equivalent predicate for solutions which involve sets, relations, functions, paths through a graph, etc. (Or rather, you can write them, but the Analyzer cannot analyze them.)
But of course one can also introduce signatures called MySet, MyRelation, etc., so that one has one atom for each set, relation, etc., that one needs in a problem. This sometimes works, but it does run into the difficulty that such problems sometimes need all possible sets, relations, functions, etc., to exist (as in set theory they do), while Alloy will not, in general, create an atom of type MySet for every possible set of objects in univ. Jackson discusses this technique in sections 3.2.3 (see "Is there a loss of expressive power in the restriction to flat relations?"), 5.2.2 "Skolemization", and 5.3 "Unbounded universal quantifiers" of his book, and the discussion has thus far repaid repeated rereadings. (I have penciled in an additional index entry in my copy of the book pointing to these sections, under the heading 'Second-order logic, faking it', so I can find them again when I need them.)
All of that said, however: in section 4.8 of his book, Jackson writes "Integers are not actually very useful. If you think you need them, think again; ... Of course, if you have a heavily numerical problem, you're likely to need integers (and more), but then Alloy is probably not suitable anyway."

Explain this DSP notation

I'm trying to implement this extension of the Karplus-Strong plucked string algorithm, but I don't understand the notation used there. Maybe it will take years of study, but maybe it won't - maybe you can tell me.
I think the equations below are in the frequency domain or something. Just starting with the first equation, H_p(z), the pick direction lowpass filter. For one direction you use p = 0, for the other, perhaps 0.9. This boils down to 1 in the first case, or 0.1 / (1 - 0.9 z^-1) in the second.
[image: http://www.dsprelated.com/josimages/pasp/img902.png]
Now, I feel like this might mean, in coding terms, something towards:
float H_p(const float* input, int time) {
    if (downpick) {
        return input[time];
    } else {
        return some_function_of(input[time], input[time-1]);
    }
}
Can someone give me a hint? Or is this futile and I really need all the DSP background to implement this? I was a mathematician once...but this ain't my domain.
So the z^-1 just means a one-unit delay.
Let's take H_p = (1-p)/(1 - p z^-1).
If we follow the convention of "x" for input and "y" for output, the transfer function H = y/x (=output/input)
so we get y/x = (1-p)/(1 - p z^-1)
or (1-p)x = (1 - p z^-1)y
(1-p)x[n] = y[n] - py[n-1]
or: y[n] = py[n-1] + (1-p)x[n]
In C code this can be implemented
y += (1-p)*(x-y);
without any additional state beyond using the output "y" as a state variable itself. Or you can go for the more literal approach:
y_delayed_1 = y;
y = p*y_delayed_1 + (1-p)*x;
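
For comparison, here is the same recurrence y[n] = p*y[n-1] + (1-p)*x[n] as a short Python sketch that processes a whole buffer. The function name `one_pole_lowpass` and the zero initial state are assumptions made for the example.

```python
def one_pole_lowpass(x, p):
    """Apply y[n] = p*y[n-1] + (1-p)*x[n], starting from y[-1] = 0."""
    y_prev = 0.0
    out = []
    for sample in x:
        y_prev = p * y_prev + (1.0 - p) * sample
        out.append(y_prev)
    return out


step = [1.0] * 5
print(one_pole_lowpass(step, 0.0))  # p = 0: the input passes through unchanged
print(one_pole_lowpass(step, 0.9))  # p = 0.9: the output rises gradually
```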
As far as the other equations go, they're all typical equations except for that second equation which looks like maybe it's a way of selecting either H_β = 1 - z^-1 OR 1 - z^-2. (what's N?)
The filters are kind of vague and they'll be tougher for you to deal with unless you can find some prepackaged filters. In general they're of the form
H = H0 * (1 + a z^-1 + b z^-2 + c z^-3 ...) / (1 + r z^-1 + s z^-2 + t z^-3 ...)
and all you do is write down H = y/x, cross multiply to get
H0 * (1 + a z^-1 + b z^-2 + c z^-3 ...) * x = (1 + r z^-1 + s z^-2 + t z^-3 ...) * y
and then isolate "y" by itself, making the output "y" a linear function of various delays of itself and of the input.
But designing filters (picking the a,b,c,etc.) is tougher than implementing them, for the most part.
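
The "isolate y" step for the general form can be sketched in Python as below. The function name `iir_filter` and its argument layout are assumptions for this example: `num` holds the numerator coefficients [1, a, b, ...], `den` holds the denominator coefficients [1, r, s, ...] with a leading 1, and samples before the start of the buffer are taken to be zero.

```python
def iir_filter(x, num, den, gain=1.0):
    """Evaluate the difference equation for H = gain * N(z) / D(z).

    y[n] = gain * sum_k num[k] * x[n-k]  -  sum_{k>=1} den[k] * y[n-k]
    Assumes den[0] == 1 and zero initial conditions.
    """
    y = []
    for n in range(len(x)):
        # Feed-forward part: weighted sum of current and delayed inputs.
        acc = gain * sum(c * x[n - k] for k, c in enumerate(num) if n - k >= 0)
        # Feedback part: weighted sum of delayed outputs.
        acc -= sum(c * y[n - k] for k, c in enumerate(den)
                   if k >= 1 and n - k >= 0)
        y.append(acc)
    return y


# The pick-direction filter (1-p)/(1 - p z^-1) with p = 0.9:
print(iir_filter([1.0, 1.0, 1.0], num=[1.0], den=[1.0, -0.9], gain=0.1))
# A pure feed-forward filter 1 - z^-1:
print(iir_filter([1.0, 0.0, 0.0], num=[1.0, -1.0], den=[1.0]))
```

This direct evaluation favours readability over speed; it recomputes the sums for every sample instead of keeping delay lines.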
