I'm running the following to check whether there's any difference between the median of the variable day1TotalStake across a series of day1AverageReturnQuartiles with the following statement:
proc npar1way data = hsb2a wilcoxon;
class day1AverageReturnQuartile;
var day1TotalStake;
run;
SAS will only output a Kruskal-Wallace statistic, however. It does report rank sums for each Quartile, but no overall statistic.
Should I be calling some other function?
Thanks.
I ran your code and it gave me the "Wilcoxon Two-Sample test" statistic (sum of ranks). If Wikipedia and my undergrad non-parametrics course are serving me, the "U" statistic can be easily derived from this value by subtracting a constant. If you're looking for assistance with that, you may want to ask at https://stats.stackexchange.com/.
Related
I am using PSPP and want to compare a sample's median against a given median value but all the options I find compare two variables against each other. I have tried a workaround by definining a variable that's filled with my reference value:
NPAR TEST
/Wilcoxon [Variable of my actual data set] WITH [Variable filled with my reference value].
NPAR TEST
/SIGN [Variable of my actual data set] WITH [Variable filled with my reference value].
But this is a) a dodgy workaround, and b) the results are nowhere near the Wilcoxon test result I get with SPSS.
What would be the correct syntax, or a better workaround?
As far as I can see PSPP does not have direct implementation for this, however your workaround of comparing to a sample with a constant value set to the median appears to be correct rather than "a dodgy workaround".
This is confirmed in this IBM answer to the same question regarding SPSS:
The one-sample Wilcoxon test can also be handled as a special case of the Wilcoxon matched pairs test, with the second variable being a constant value equal to the null hypothesized value against which you want to test. Simply compute a constant variable, then use that along with your variable of interest in the paired samples test. For a discussion of why this is legitimate, refer to a nonparametric statistics text such as Section 5.1 of W. J. Conover's (1971) Practical Nonparametric Statistics (Wiley).
I don't have access to that textbook to check their answer, but I would consider IBM a reliable source anyway. I do not know why you would have found A different result using this method, it seems to behave correctly when I try it: perhaps double check you entered the data in the same way under SPSS and PSPP?
In the case study "A Set Partitioning Problem" in the pulp documentation "sum" is used to express the constraints, for example:
#A guest must seated at one and only one table
for guest in guests:
seating_model += sum([x[table] for table in possible_tables
if guest in table]) == 1, "Must_seat_%s"%guest
Whereas "lpSum" seems to be used otherwise. Consider for example the following constraint in the case study A Transportation Problem
for b in Bars:
prob += lpSum([vars[w][b] for w in Warehouses])>=demand[b], "Sum_of_Products_into_Bar%s"%b
Why is "sum" used in the first example? And when to use "sum" v.s. "lpSum"?
You should be using lpSum every time you have at least one pulp variable inside the sum expression. Using sum is not wrong, just more inefficient. We should probably change the docs so they are consistent. Feel free to open an issue or, even better, do a PR so we can correct the Set Partitioning Problem example.
I have an optimization problem where I want to minimize for the total cost of a system, so I write an objective function that is the sum of my different costs. The problem includes using one of three machines each one with different cost at a different threshold of usage. I define each machine (model.Machine#) as a binary variable and declare the parameters of each machine cost model.Cost#). I am trying to get the cost to be able to minimize it but when I write the constraint:
model.Cost1*model.Machine1 + model.Cost2*model.Machine2 + model.Cost3*model.Machine3 == model.MachineCost
Where I also write:
model.Machine1 + model.Machine2 + model.Machine3 == 1
Gurobi is telling me that it can't handle an quadratic function referring to the first constraint mentioned above. However it is parameters multiplied by binary variables there isn't anything quadratic.
I know the question is vague and part of a larger problem but I hope you can understand what I am referring to and help me!
Thank you so much for your assistance!
What is model.MachineCost? Is it an Expression component with some kind of quadratic expression stored inside of it?
If not, can you start commenting out things in your model until you get down to a minimal working example (that causes this error) and post that? Otherwise, we can't be sure that there are not other quadratic pieces of the model that you are not showing.
I have an input file in this format: (length 20, 10 chars and 10 numerics)
jname1 0000500006
bname1 0000100002
wname1 0000400007
yname1 0000000006
jname1 0000100001
mname1 0000500012
mname2 0000700013
In my jcl I have defined my sysin data as such:
SYSIN DATA *
SORT FIELDS=(1,1,CH,A)
SUM FIELDS=(11,10,FD)
DATAEND
*
It works fine as long as I don't add the sum fields so I'm wondering if I'm using the wrong format for my numerics seeing as I know they start at field 11 and have a length of 10 the format is the only thing that could be wrong.
As you might have already realised the point of this JCL is to just list the values but grouped by the first letter of the name (so for the example data and JCL I have given it would group the numeric for mname1 and mname2 together but leave the other records untouched).
I'm kind of new at this so I was wonder what I need for the format if my numerics are like that in the input file.
If new to DFSORT, get hold of the DFSORT Getting Started guide for your version of DFSORT (http://www-01.ibm.com/support/docview.wss?uid=isg3T7000080).
This takes your through all the basic operations with many examples.
The DFSORT Application Programming Guide describes everything you need to know, in detail. Again with examples. Appendix C of that document contains all the data-types available (note, when you tried to use FD, FD is not valid data-type, so probably a typo). There are Tables throughout the document listing what data-types are available where, if there is a particular limit.
For advanced techniques, consult the DFSORT Smart Tricks publication here: http://www-01.ibm.com/support/docview.wss?uid=isg3T7000094
You need to understand a bit more the way data is stored on a Mainframe as well.
Decimals (which can be "packed-decimal" or "zoned-decimal") do not contain a decimal-point. The decimal-point is implied. In high-level languages you tell the compiler where the decimal-point is (in a fixed position) and the compiler does the alignments for you. In Assembler, you do everything yourself.
Decimals are 100% accurate, as there are machine-instructions which act directly on packed-decimal data giving packed-decimal results.
A field which actually contains a decimal-point, cannot be directly used in arithmetic.
An unsigned field is treated as positive when used in any arithmetic.
The SUM statement supports a limited number of numeric definitions, and you have chosen the correct one. It does not matter that your data is unsigned.
If the format of the output from SUM is not what you want, look at OPTION ZDPRINT (or NOZDPRINT).
If you want further formatting, you can use OUTREC or OUTFIL.
As an option to using SUM, you can use OUTFIL reporting functions (especially, although not limited to, if you want a report). You can use SECTIONS and TRAILER3 with TOT/TOTAL.
Something to watch for with SUM (which is not a problem with the reporting features) is if any given one (or more) of your SUMmed fields exceed the field size. To continue to use SUM if that happens, you need to extend the field in INREC and then get SUM to use the new, sufficient, size.
After some trial and error I finally found it, appearantly the format I needed to use was the ZD format (zoned decimal, signed), so my sysin becomes this:
SYSIN DATA *
SORT FIELDS=(1,1,CH,A)
SUM FIELDS=(11,10,ZD)
DATAEND
*
even though my records don't contain any decimals and they are unsigned, I don't really get it so if someone knows why it's like that please go ahead and explain it to me.
For now the way I'm going to remember it is this: Z = symbol for real (meaning integers so no decimals)
I am having some strange problems with excel's solver. Basically what I am trying to do is curve fit my data. I have two different lines, one is my calibration line and the other is the derived line that I am attempting to match up to the calibration line. My line depends on 19 different variable parameters (Perhaps this is too many? I have tried fewer without result) and I am using solver to adjust these parameters to make the two lines as close as possible.
For Example:
The QP column contains the variables I would like changed, changing these will draw me closer or further from the calibration curve. Each subsequent value of QP must be greater than the first.
Col=B Col=C
Power .QP_'
1 ..... 57000
2 ..... 65000
3 ..... 70000
4 ..... 80000
5 ..... 80000
Therefore my excel solver parameters look like this: C1:C19>=0,C1:C19<=100000 and C2>=C1, C3>=C2,C4>=C3... I have also tried making another column of the differences between each value and then saying that these must be diff>=0.
To compare this with my calibration curve I have taken the calibration curve data and subtracted my data derived from QP and then squared that to create my sum of the squares error. For example:
(Calibration-DerivedQP)^2=SS(x) <- where x represents the row number
Sum(SS(x))=SSE
SSE is what I have set solver to minimize. And upon changing QP everything automatically updates. There are no if statements being used and no pivot tables are used.
If I remove the parameters similar to C2>=C1 everything works perfectly, except the derived values are not feasible. But when the solver is run with these parameters, nothing gets changed and no matter which guesses I used as starting values ( so that I can ensure I haven't guessed a local minimum), the solver cannot improve upon my solution. This has led me to believe that something in my parameters is being broken, since I can very easily improve on my solution by guess and check. The rest of solvers settings are at the defaults, and the evolutionary method is used since my curve isn't smooth (I don't think) I had this working in the past and now something seems to be broken. Any ideas are appreciated! Thank you so much! Sorry if I am missing any critical information. I am also familiar with matlab and R if there are better methods in those languages.
I found the solution to my problem. I don't know if this will be helpful to anyone else since my problem vague and pretty specific to me. That being said, my problem was in the constraints. I changed some data on my excel sheet to allow for fewer constraints. An example might look like this:
Guess..........Squared......Added..................Q
-12..............(-12)^2....... 0
-16..............(-16)^2.......=(-16)^2+0.............256
+7.................(7)^2..........=(7)^2+(-16)^2+0....305
Now I allow solver to guess any number subject to minimal constraints.
Essentially, what is happening now, is the excel sheet allows for any guess that solver makes to work. By squaring the numbers it give me positive values, and the added column ensures that each successive value is equal to or greater than the first. This means there are very few constraints. I also changed the solver option from evolutionary to GRG Nonlinear.
Tips for getting solver to work:
Try and use the spreadsheet to set constraints (other than bounds, bounds seem to be good) wherever possible, the more constraints that I set in solver, the less likely my solution was to work.
Hope that helps, sorry if I have provided any incorrect information.