PSPP: How can I perform a Wilcoxon test against a single reference value - median

I am using PSPP and want to compare a sample's median against a given median value, but all the options I find compare two variables against each other. I have tried a workaround by defining a variable that's filled with my reference value:
NPAR TEST
/Wilcoxon [Variable of my actual data set] WITH [Variable filled with my reference value].
NPAR TEST
/SIGN [Variable of my actual data set] WITH [Variable filled with my reference value].
But this is a) a dodgy workaround, and b) the results are nowhere near the Wilcoxon test result I get with SPSS.
What would be the correct syntax, or a better workaround?

As far as I can see, PSPP does not have a direct implementation for this. However, your workaround of comparing against a variable holding a constant value equal to the reference median appears to be correct rather than "a dodgy workaround".
This is confirmed in this IBM answer to the same question regarding SPSS:
The one-sample Wilcoxon test can also be handled as a special case of the Wilcoxon matched pairs test, with the second variable being a constant value equal to the null hypothesized value against which you want to test. Simply compute a constant variable, then use that along with your variable of interest in the paired samples test. For a discussion of why this is legitimate, refer to a nonparametric statistics text such as Section 5.1 of W. J. Conover's (1971) Practical Nonparametric Statistics (Wiley).
I don't have access to that textbook to check their answer, but I would consider IBM a reliable source anyway. I do not know why you would have found a different result using this method; it seems to behave correctly when I try it. Perhaps double-check that you entered the data in the same way under SPSS and PSPP?
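As a rough illustration of why the constant-variable trick is legitimate, here is a minimal Python sketch (not PSPP or SPSS code) that computes the Wilcoxon signed-rank sums from paired differences. The function names are made up for this example. It assumes zero differences are dropped and tied absolute differences get average ranks; statistical packages may treat zeros and ties differently, so the numbers need not match any particular program exactly.

```python
def signed_rank_sums(diffs):
    """Return (W+, W-) for a list of paired differences.

    Assumption of this sketch: zero differences are discarded and tied
    absolute differences share the average of the ranks they occupy.
    """
    nonzero = sorted((d for d in diffs if d != 0), key=abs)
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(nonzero):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(nonzero) and abs(nonzero[j + 1]) == abs(nonzero[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return w_plus, w_minus


def one_sample(sample, reference):
    """One-sample form: differences against a single reference value."""
    return signed_rank_sums([x - reference for x in sample])


def paired(first, second):
    """Matched-pairs form: differences between two variables."""
    return signed_rank_sums([a - b for a, b in zip(first, second)])


sample = [1.5, 2.5, 4.0, 5.5, 7.0]
constant = [3.0] * len(sample)  # the "variable filled with my reference value"
print(one_sample(sample, 3.0))
print(paired(sample, constant))
```

Both calls work on the same list of differences, which is why pairing the data with a constant column gives the one-sample statistic.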


Recursive methods on CUDD

This is a follow-up to a suggestion by #DCTLib in the post below.
Cudd_PrintMinterm, accessing the individual minterms in the sum of products
I've been pursuing part (b) of the suggestion and will share some pseudo-code in a separate post.
Meanwhile, in his part (b) suggestion, #DCTLib posted a link to https://github.com/VerifiableRobotics/slugs/blob/master/src/BFAbstractionLibrary/BFCudd.cpp. I've been trying to read this program. There is a recursive function in the classic Somenzi paper, Binary Decision Diagrams, which describes an algorithm to compute the number of satisfying assignments (below, Fig. 7). I've been trying to compare the two, slugs and Fig. 7, but I'm having a hard time seeing any similarities. Then again, C is mostly inscrutable to me. Do you know if slugs BFCudd is based on Somenzi's Fig. 7, #DCTLib?
Thanks,
Gui
It's not exactly the same algorithm.
There are two main differences:
First, the "SatHowMany" function does not take a cube of variables to consider for counting. Rather, that function considers all variables. The fact that "recurse_getNofSatisfyingAssignments" supports cubes manifests in the function potentially returning NaN (not a number) if a variable is found in the BDD that does not appear in the cube. The rest of the differences seem to stem from this support.
Second, SatHowMany returns the number of satisfying assignments to all n variables for a node. This leads, for instance, to the division by 2 in line -4. "recurse_getNofSatisfyingAssignments" only returns the number of assignments for the remaining variables to be considered.
Both algorithms cache information - in "SatHowMany", it's called a table, in "recurse_getNofSatisfyingAssignments" it's called a buffer. Note that in line 24 of "recurse_getNofSatisfyingAssignments", there is a constant string thrown. This means that either the function does not work, or the code is never reached. Most likely it's the latter.
Function "SatHowMany" seems to assume that it gets a BDD node - it cannot be a pointer to a complemented BDD node. Function "recurse_getNofSatisfyingAssignments" works correctly with complemented nodes, as a DdNode* may store a pointer to a complemented node.
Due to the support for cubes, "recurse_getNofSatisfyingAssignments" supports flexible variable ordering (hence the lookup of "cuddI" which denotes for a variable where it is in the current BDD variable ordering). For function SatHowMany, the variable ordering does not make a difference.
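To make the "count over all n variables, then divide by 2" idea concrete, here is a minimal Python sketch in the spirit of the SatHowMany description above. It is not the code from the Somenzi paper or from slugs: the tuple-based node representation and the function name are assumptions made for this example, there are no complemented edges, and no cube of variables is supported.

```python
# Assumed node representation for this sketch:
#   True / False            -> terminal nodes
#   (var_index, low, high)  -> internal node; 'low' is the var=0 child,
#                              'high' is the var=1 child

def sat_count(node, n_vars, table=None):
    """Count satisfying assignments over all n_vars variables."""
    if table is None:
        table = {}  # cache of already computed nodes (the "table")
    if node is True:
        return 2 ** n_vars  # every assignment satisfies the constant 1
    if node is False:
        return 0
    if node in table:
        return table[node]
    _var, low, high = node
    # Each child's count is already expressed over all n_vars variables,
    # and each child covers half of the assignments, hence the division by 2.
    count = (sat_count(low, n_vars, table) + sat_count(high, n_vars, table)) // 2
    table[node] = count
    return count


x1 = (1, False, True)      # the function x1
x0_and_x1 = (0, False, x1)  # x0 AND x1
x0_or_x1 = (0, x1, True)    # x0 OR x1
print(sat_count(x0_and_x1, 2), sat_count(x0_or_x1, 2))
```

Because every count refers to all n variables, skipped levels need no special handling, which fits the remark that variable ordering makes no difference for this style of counting.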

SUM not working 'Invalid or missing field format'

I have an input file in this format: (length 20, 10 chars and 10 numerics)
jname1 0000500006
bname1 0000100002
wname1 0000400007
yname1 0000000006
jname1 0000100001
mname1 0000500012
mname2 0000700013
In my jcl I have defined my sysin data as such:
SYSIN DATA *
SORT FIELDS=(1,1,CH,A)
SUM FIELDS=(11,10,FD)
DATAEND
*
It works fine as long as I don't add the SUM FIELDS statement, so I'm wondering if I'm using the wrong format for my numerics. I know they start at position 11 and have a length of 10, so the format is the only thing that could be wrong.
As you might have already realised the point of this JCL is to just list the values but grouped by the first letter of the name (so for the example data and JCL I have given it would group the numeric for mname1 and mname2 together but leave the other records untouched).
I'm kind of new at this, so I was wondering what format I need if my numerics look like that in the input file.
If new to DFSORT, get hold of the DFSORT Getting Started guide for your version of DFSORT (http://www-01.ibm.com/support/docview.wss?uid=isg3T7000080).
This takes you through all the basic operations with many examples.
The DFSORT Application Programming Guide describes everything you need to know, in detail, again with examples. Appendix C of that document lists all the available data types (note that FD, which you tried to use, is not a valid data type, so it is probably a typo). Tables throughout the document list which data types are available where, when there is a particular limit.
For advanced techniques, consult the DFSORT Smart Tricks publication here: http://www-01.ibm.com/support/docview.wss?uid=isg3T7000094
You need to understand a bit more the way data is stored on a Mainframe as well.
Decimals (which can be "packed-decimal" or "zoned-decimal") do not contain a decimal-point. The decimal-point is implied. In high-level languages you tell the compiler where the decimal-point is (in a fixed position) and the compiler does the alignments for you. In Assembler, you do everything yourself.
Decimals are 100% accurate, as there are machine-instructions which act directly on packed-decimal data giving packed-decimal results.
A field which actually contains a decimal-point, cannot be directly used in arithmetic.
An unsigned field is treated as positive when used in any arithmetic.
The SUM statement supports a limited number of numeric definitions, and you have chosen the correct one. It does not matter that your data is unsigned.
If the format of the output from SUM is not what you want, look at OPTION ZDPRINT (or NOZDPRINT).
If you want further formatting, you can use OUTREC or OUTFIL.
As an option to using SUM, you can use OUTFIL reporting functions (especially, although not limited to, if you want a report). You can use SECTIONS and TRAILER3 with TOT/TOTAL.
Something to watch for with SUM (which is not a problem with the reporting features) is if any given one (or more) of your SUMmed fields exceed the field size. To continue to use SUM if that happens, you need to extend the field in INREC and then get SUM to use the new, sufficient, size.
After some trial and error I finally found it. Apparently the format I needed to use was the ZD format (zoned decimal, signed), so my sysin becomes this:
SYSIN DATA *
SORT FIELDS=(1,1,CH,A)
SUM FIELDS=(11,10,ZD)
DATAEND
*
This works even though my records don't contain any decimals and are unsigned. I don't really get why, so if someone knows why it's like that, please go ahead and explain it to me.
For now the way I'm going to remember it is this: Z = symbol for real (meaning integers so no decimals)
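As an aside on why ZD fits: zoned decimal stores one digit per byte, so a field of plain digit characters is already valid unsigned zoned-decimal data, and the decimal point is only implied. The following is a rough Python simulation of what SORT FIELDS=(1,1,CH,A) combined with SUM FIELDS=(11,10,ZD) does to the sample data. It is only a sketch under several assumptions: the names are padded with blanks to 10 characters, the first record of each key group is the one kept, sign handling and ZDPRINT are ignored, and an overflow raises an error here instead of mirroring DFSORT's actual overflow behaviour. The function name is invented for this example.

```python
def sort_and_sum(records, key_pos=1, key_len=1, sum_pos=11, sum_len=10):
    """Sort by a character key, then merge equal-key records by summing a digit field.

    Positions are 1-based, as in the control statements.
    """
    k0, s0 = key_pos - 1, sum_pos - 1
    ordered = sorted(records, key=lambda r: r[k0:k0 + key_len])  # stable sort
    result = []
    for rec in ordered:
        key = rec[k0:k0 + key_len]
        if result and result[-1][k0:k0 + key_len] == key:
            kept = result[-1]
            total = int(kept[s0:s0 + sum_len]) + int(rec[s0:s0 + sum_len])
            if len(str(total)) > sum_len:
                # Simplification: real DFSORT does not raise on overflow.
                raise OverflowError("sum does not fit in the field")
            result[-1] = kept[:s0] + str(total).zfill(sum_len) + kept[s0 + sum_len:]
        else:
            result.append(rec)
    return result


data = [
    ("jname1", "0000500006"), ("bname1", "0000100002"),
    ("wname1", "0000400007"), ("yname1", "0000000006"),
    ("jname1", "0000100001"), ("mname1", "0000500012"),
    ("mname2", "0000700013"),
]
records = [name.ljust(10) + digits for name, digits in data]
for line in sort_and_sum(records):
    print(line)
```

With the key limited to the first character, the two j records and the two m records each collapse into one record carrying the summed value, while b, w and y pass through unchanged.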

Excel Solver Curve Fitting Failing - MatLab recast

I am having some strange problems with excel's solver. Basically what I am trying to do is curve fit my data. I have two different lines, one is my calibration line and the other is the derived line that I am attempting to match up to the calibration line. My line depends on 19 different variable parameters (Perhaps this is too many? I have tried fewer without result) and I am using solver to adjust these parameters to make the two lines as close as possible.
For Example:
The QP column contains the variables I would like changed; changing these will draw me closer to or further from the calibration curve. Each subsequent value of QP must be greater than or equal to the previous one.
Col B    Col C
Power    QP
1        57000
2        65000
3        70000
4        80000
5        80000
Therefore my excel solver parameters look like this: C1:C19>=0,C1:C19<=100000 and C2>=C1, C3>=C2,C4>=C3... I have also tried making another column of the differences between each value and then saying that these must be diff>=0.
To compare this with my calibration curve I have taken the calibration curve data and subtracted my data derived from QP and then squared that to create my sum of the squares error. For example:
(Calibration-DerivedQP)^2=SS(x) <- where x represents the row number
Sum(SS(x))=SSE
SSE is what I have set solver to minimize. And upon changing QP everything automatically updates. There are no if statements being used and no pivot tables are used.
If I remove the constraints similar to C2>=C1, everything works perfectly, except that the derived values are not feasible. But when the solver is run with these constraints, nothing gets changed, and no matter which guesses I use as starting values (so that I can ensure I haven't guessed a local minimum), the solver cannot improve upon my solution. This has led me to believe that something in my constraints is broken, since I can very easily improve on my solution by guess and check. The rest of Solver's settings are at the defaults, and the evolutionary method is used since my curve isn't smooth (I don't think). I had this working in the past and now something seems to be broken. Any ideas are appreciated! Thank you so much! Sorry if I am missing any critical information. I am also familiar with MATLAB and R if there are better methods in those languages.
I found the solution to my problem. I don't know if this will be helpful to anyone else, since my problem was vague and pretty specific to me. That being said, my problem was in the constraints. I changed some data on my Excel sheet to allow for fewer constraints. An example might look like this:
Guess    Squared    Added                 Q
-12      (-12)^2    0
-16      (-16)^2    =(-16)^2+0            256
+7       (7)^2      =(7)^2+(-16)^2+0      305
Now I allow solver to guess any number subject to minimal constraints.
Essentially, what is happening now is that the Excel sheet allows any guess that Solver makes to work. Squaring the numbers gives me positive values, and the Added column ensures that each successive value is equal to or greater than the previous one. This means there are very few constraints. I also changed the solver option from evolutionary to GRG Nonlinear.
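The same reparameterization can be sketched outside Excel. The short Python example below is only an illustration of the idea (the function names and the `start` parameter are assumptions for this sketch, not part of the spreadsheet): unconstrained guesses are squared and accumulated, so the resulting values are non-decreasing by construction and no ordering constraints are needed.

```python
from itertools import accumulate


def monotone_values(guesses, start=0):
    """Map unconstrained guesses to a non-decreasing sequence.

    Each guess is squared (so every increment is >= 0) and the squares
    are accumulated on top of 'start'. Requires Python 3.8+ for 'initial'.
    """
    return list(accumulate((g * g for g in guesses), initial=start))


def sum_squared_error(calibration, derived):
    """SSE between the calibration curve and the derived curve."""
    return sum((c - d) ** 2 for c, d in zip(calibration, derived))


# Reproduces the Q column of the example table: 0, 256, 305.
print(monotone_values([-16, 7]))
```

An optimizer can then vary the guesses freely and minimize the SSE, since any combination of guesses maps to a feasible, ordered set of values.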
Tips for getting solver to work:
Try and use the spreadsheet to set constraints (other than bounds, bounds seem to be good) wherever possible, the more constraints that I set in solver, the less likely my solution was to work.
Hope that helps, sorry if I have provided any incorrect information.

Manipulating/Clearing Variables via Lists: Mathematica

My problem (in Mathematica) is referring to variables given in a particular array and manipulating them in the following manner (as an example):
Inputs: vars={x,y,z}, system=some ODE like x^2+3*x*y+...etc
(note that I haven't actually created variables x y and z)
Aim:
To assign values to the variables in the list "vars" with the intention of inputting these values into the system of ODEs. Then, once I am done, clear the values of the variables in the array vars so that it is in its original form {x,y,z} (and not something like {x,1,3} where y=1 and z=3). I want to do this by referring to the positional elements of vars (I aim not to know that x, y and z are the actual variables).
The reason why: I am trying to write a program that can have any number of variables and ODEs as defined by the user. Since the number of variables and the actual letters used for them are unknown, it is necessary to perform manipulations with the array itself.
Attempt:
A fixed number of variables is easy. For the arbitrary case, I have tried modules and blocks, but with no success. Consider the following code:
Clear[x,y,z,vars,svars]
vars={x,y,z}
svars=Map[ToString,vars]
Module[{vars=vars,svars=svars},
Symbol[svars[[1]]]//Evaluate=1
]
then vars={1,y,z} and not {x,y,z} after running this. I have done functional programming with lists, atoms etc., so it makes sense to me that vars is changed afterwards, because I have changed x and not vars. However, I cannot get "x" in the list of variables to remain local. Of course I could put in "x" itself, but that is particular to this specific case. I would prefer to put something like:
Clear[x,y,z,vars,svars]
vars={x,y,z}
svars=Map[ToString,vars]
Module[{vars=vars,svars=svars, vars[[1]]},
Symbol[svars[[1]]]//Evaluate=1
]
which of course doesn't work because vars[[1]] is not a symbol or an assignment to a symbol.
Other possibilities:
I found a function
assignToName[name_String, value_] :=
ToExpression[name, InputForm, Function[var, var = value, HoldAll]]
which looked promising. Basically name_String is the name of the variable and value is its new value. I attempted to do:
vars={x,y,z}
svars=Map[ToString,vars]
vars[[1]]=//Evaluate=1
assignToName[svars[[1]],svars[[1]]]
but then something like D[x^2, vars[[1]]] doesn't work (x is not a valid variable).
If I am missing something, or if perhaps I am going down the wrong path, I'm open to trying other things.
Thanks.
I can't say that I followed your train(s) of thought very well, so these are fragments which might help you to answer your own questions rather than a coherent and fully-formed answer. But to answer your final 'question', I think you may be going down some wrong path(s).
In passing, note that evaluating the expression
vars = {x,y,z}
does in fact define those three variables though it doesn't define any rewrite rules (such as values) for them.
Given a polynomial poly you can extract the variables in it with the function Variables[poly] so something like
Variables[x^2+3*x*y]
should return
{x,y}
Note that I write 'should' rather than does because I don't have Mathematica on this machine so my syntax may be a bit wonky. Note also that your example ODE is nothing of the sort but it strikes me that you can probably write a wrapper to manipulate an ODE into a form from which Variables can extract the variables. Mathematica offers a lot of other functions for picking expressions apart and re-assembling them, follow the trails from Variables. It often allows the use of functions defined on Lists on expressions with other heads too so it's always worth experimenting a bit.
There are a couple of widely applicable ways to avoid setting values of variables in Mathematica. For instance, you could write
x^2+3*x*y/.{x->2,y->3}
which will evaluate to
22
but not set values for x and y. This is a very simple example of using (sets of) replacement rules for temporary assignment of values to variables.
The other way to avoid setting values for variables is to define functions using Modules or Blocks both of which define their own contexts. The documentation will tell you all about these two and the differences between them.
I can't help thinking that all your clever tricks using Symbol, ToExpression and ToString are a bit beside the point. Spend some time familiarising yourself with Mathematica's in-built functionality before going further down that route, you may well find you don't need to.
Finally, writing, in any language, expressions such as
vars=vars,svars=svars
will lead to madness. It may be syntactically correct, you may even be able to decrypt the semantics when you first write code like that, but in a week's time you will curse your younger self for writing it.

Check if values of two string-type items are equal in a Zabbix trigger

I am monitoring an application using Zabbix and have defined a custom item which returns a string value. Since my item's values are actually checksums, they will only contain the characters [0-9a-f]. Two mirror copies of my application are running on two servers for the sake of redundancy. I would like to create a trigger which would take the item values from both machines and fire if they are not the same.
For now, let's set aside the window during which the values change (the update is not an atomic operation, so the system may briefly see an inconsistent state, which is not a real error), since I could work around it by looking at several previous values.
The crux is: how to write a Zabbix trigger expression which could compare for equality the string values of two items (the same item on two mirror hosts, actually)?
Both according to the fine manual and as I confirmed in practice, the standard operators = and # only work on numeric values, so I can't just write the natural {host1:myitem[param].last(0)} # {host2:myitem[param].last(0)}. Functions such as change() or diff() can only compare values of the same item at different points in time. Functions such as regexp() can only compare the item's value with a constant string/regular expression, not with another item's value. This is very limiting.
I could move the comparison logic into the script which my custom item executes, but it's a bit messy and not elegant, so if at all possible, I would prefer to have this logic inside my Zabbix trigger.
Perhaps despite the limitations listed above, someone can come up with a workaround?
Workaround:
{host1:myitem[param].change(0)} # {host2:myitem[param].change(0)}
When only one of the servers sees a modification since the previously received value, an event is triggered.
From the Zabbix Manual,
change (float, int, str, text, log)
Returns difference between last and previous values.
For strings:
0 - values are equal
1 - values differ
I believe (and I am struggling with this EXACT situation myself) that the correct way to do this is via calculated items.
You want to create a new ITEM, not trigger (yet!), that performs a calculated comparison on multiple item values (Strings Difference, Numbers within range, etc).
Once you have that item, have the calculation give you a value you can trigger off of. You can use ANY trigger functions in your calculation along with arithmetic operations.
Now to the issue (which I've submitted a feature request for because this is extremely limiting), most trigger expressions evaluate to a number or a 0/1 bool.
I think I have a solution for my problem, which is that I am tracking a version number from a webpage, e.g. v2.0.1. I believe I can use string concatenation and regex in calculated items in order to convert my string values into multiple number values, as these would be a breeze to compare.
But again, this is convoluted and painful.
If you want my advice, have yourself or a dev look at the code for trigger expressions and see if you can submit a patch to add a trigger function for simple string comparison (difference, length, possible conversion to numerical values using binary and/or hex combinations, etc.).
I'm trying to work on a patch myself, but I don't have time as I have so much monitoring to implement and while zabbix is powerful, it's got several huge flaws. I still believe it's the best monitoring system out there.
Simple answer: Create a UserParameter until someone writes a patch.
You could change your items to return numbers instead of strings. Because your items are checksums that use only [0-9a-f] characters, they are numbers written in hexadecimal, so you would need to convert the checksum to a decimal number.
Because the checksum is a big number, you would need to limit the hexadecimal number to 8 characters for the Numeric (unsigned) type before conversion. Or, if you want higher precision, you could use float (but that would be more work):
Numeric (unsigned) - 64bit unsigned integer
Numeric (float) - floating point number
Negative values can be stored.
Allowed range (for MySQL): -999999999999.9999 to 999999999999.9999 (double(16,4)).
I wish Zabbix would have .hashedUnsigned() function that would compute hash of a string and return it as a number. Such a function should be easy to write.
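Until something like that exists, the conversion could live in the script behind the custom item. The following Python sketch shows one way to do it, assuming the script receives the checksum as a hex string; the function name and the `hex_digits` parameter are made up for this example and are not part of Zabbix. It keeps the first 8 hex digits, as suggested above, and returns them as a non-negative integer.

```python
HEX_CHARS = set("0123456789abcdef")


def checksum_to_unsigned(checksum, hex_digits=8):
    """Convert the leading hex digits of a checksum to an unsigned integer.

    Truncating to 'hex_digits' characters loses information, so two
    different checksums can map to the same number (rarely, for a good hash).
    """
    prefix = checksum.strip().lower()[:hex_digits]
    if not prefix or not set(prefix) <= HEX_CHARS:
        raise ValueError("checksum must consist of characters [0-9a-f]")
    return int(prefix, 16)


print(checksum_to_unsigned("deadbeefcafebabe0123456789abcdef"))
```

The resulting numeric items on the two hosts can then be compared with the ordinary numeric operators in a trigger. Note that equal numbers only show that the truncated prefixes match, not that the full checksums are identical.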
