The Alloy Analyzer offers an "Output CNF to File" option, which means that I can send the constraints generated by Alloy to my favourite SAT solver. But how can I transfer the SAT result back to Alloy, so that I can visualise the solution?
Unfortunately, there isn't a way to do that. I agree it would be useful to have that option.
Minisat is a Boolean satisfiability (SAT) solver; there is a version of Minisat which works in the browser: http://www.msoos.org/2013/09/minisat-in-your-browser/
How can I express a scheduling problem with Minisat? Is there a higher-level language that compiles to Minisat's input format and would let me express it more naturally?
I mean for solving problems like exam timetabling. http://docs.jboss.org/drools/release/6.1.0.Final/optaplanner-docs/html_single/#examination
Another high-level modeling language is Picat (http://picat-lang.org/), which has an option to solve/2 that converts the model to CNF when using the sat module, e.g. "solve([dump], Vars)". The syntax when using the sat module, as well as the cp and mip modules, is similar to standard CLP syntax.
For some Picat examples, see my Picat page: http://hakank.org/picat/ .
SAT solvers like Minisat or Cryptominisat typically read a clause set of logical OR expressions in Conjunctive Normal Form (CNF). It takes an encoding step to translate your problem into this CNF format.
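For illustration, here is a minimal Python sketch of that encoding step's end product, the DIMACS CNF file format that Minisat and Cryptominisat read; the tiny clause set is invented:

```python
# A tiny invented CNF instance: (x1 OR NOT x2) AND (x2 OR x3).
# In the DIMACS format, each variable is a positive integer, a
# negated literal is its negative, and every clause ends with 0.
clauses = [[1, -2], [2, 3]]
num_vars = 3

with open("problem.cnf", "w") as f:
    f.write(f"p cnf {num_vars} {len(clauses)}\n")   # header line
    for clause in clauses:
        f.write(" ".join(map(str, clause)) + " 0\n")

# The resulting file contents:
#   p cnf 3 2
#   1 -2 0
#   2 3 0
# It can then be handed to a solver, e.g.: minisat problem.cnf result.txt
```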
Circuit SAT solvers process a nested Boolean expression rather than a CNF. But it appears that this type of solver is nowadays outperformed by the CNF SAT solvers.
Constraint programming systems like Minizinc use a high-level language which is easier to write and to comprehend. Depending on the features being used, Minizinc can translate its input language into a CNF/DIMACS format suitable for a SAT solver.
Peter Stuckey's paper "There are no CNF Problems" explains the idea. His slides also contain some insights on scheduling.
Have a look at Minizinc examples for scheduling written by Hakan Kjellerstrand.
Emmanuel Hebrard's Scheduling and SAT is an extensive treatment of the topic.
I worked on this kind of project a few months ago.
It was really interesting to do.
To use MiniSAT (or any other SAT solver), you will have to reduce the Scheduling Problem to a SAT problem.
I can recommend this question that I asked in 3 parts:
Class Scheduling to Boolean satisfiability [Polynomial-time reduction]
Class Scheduling to Boolean satisfiability [Polynomial-time reduction] part 2
Class Scheduling to Boolean satisfiability [Polynomial-time reduction] Final Part
And you will basically see, step by step, how to transform the Scheduling Problem to a SAT problem that MiniSAT can read and solve :).
Thanks again to #amit who was a very big help in this project.
With this answer, you will be able to solve N rooms with T teachers, who are teaching S subjects to G different groups of students :) which is, I think, enough for 99% of scheduling problems.
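To give a flavour of what such a reduction looks like, here is a minimal, hypothetical Python sketch. It encodes only one of the constraints (each class gets exactly one timeslot) as CNF clauses; the sizes, variable numbering, and file names are invented for illustration:

```python
from itertools import combinations

NUM_CLASSES = 3
NUM_SLOTS = 4

# Boolean variable var(c, t) = "class c is taught in timeslot t".
# DIMACS variables are positive integers, so map each (class, slot) pair to one.
def var(c, t):
    return c * NUM_SLOTS + t + 1

clauses = []
for c in range(NUM_CLASSES):
    # At least one timeslot per class.
    clauses.append([var(c, t) for t in range(NUM_SLOTS)])
    # At most one timeslot per class: no pair of slots can both be true.
    for t1, t2 in combinations(range(NUM_SLOTS), 2):
        clauses.append([-var(c, t1), -var(c, t2)])

# Write the instance in DIMACS format for MiniSAT.
with open("schedule.cnf", "w") as f:
    f.write(f"p cnf {NUM_CLASSES * NUM_SLOTS} {len(clauses)}\n")
    for clause in clauses:
        f.write(" ".join(map(str, clause)) + " 0\n")

# After running `minisat schedule.cnf schedule.out`, the result file
# (assuming MiniSAT's usual output: a SAT/UNSAT line followed by the
# assigned literals) can be mapped back to a timetable:
#
#   with open("schedule.out") as f:
#       if f.readline().strip() == "SAT":
#           true_vars = {int(x) for x in f.readline().split() if int(x) > 0}
#           for c in range(NUM_CLASSES):
#               for t in range(NUM_SLOTS):
#                   if var(c, t) in true_vars:
#                       print(f"class {c} -> timeslot {t}")
```

Room, teacher, and group conflicts are encoded the same way, with one pairwise "not both" clause per conflicting pair of variables.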
Today I wanted to look into options for SAT solving in Haskell. First I thought about writing my own interface to the picosat solver.
Then I found out there is the SBV library.
It interfaces to Z3, Yices, CVC4 and Boolector.
Also, I did a Google search on GitHub and it turns out there is even a Picosat binding available.
Are there any other Haskell bindings to SAT solvers that are worth looking at, given the constraint of fast/high performance? Clarification: bindings that are suitable for high-performance SAT solving (e.g., problems that run for days, as well as problems that need to finish as fast as possible because I check 2^20 or more SAT problems). For example, what I am particularly missing on Hackage is a binding to a fast parallel SAT solver like Plingeling. (Also, I found out about the current updated picosat binding on GitHub more by accident, and I may very well be missing other options.)
The default option of the SBV library is the Z3 SMT solver. Am I right in my educated guess that picosat is faster for plain SAT-solving than Z3?
Disclosure, I'm the author of the Haskell picosat bindings you mentioned.
SBV is a really robust library that's been around for a while; it's good if you want an interface to external SMT or SAT solvers like Yices or Z3. Picosat is a much simpler library that I wrote because I wanted something that could be installed simply, without external dependencies.
Am I right in my educated guess that picosat is faster for plain SAT-solving than Z3?
I don't know what your performance constraints are, but as far as underlying solver libraries go, you're not going to notice a significant difference between Z3 and Picosat until you hit really enormous problems (billions of variables). Both are very heavily optimized libraries, and the bottleneck (at least from the Haskell side) is likely going to be marshalling data between the library and Haskell's runtime.
SBV is thread-safe.
Comparing Z3 and Lingeling for SAT performance is not an easy task. I'd hazard a guess that they would be more or less the same unless you take your time to figure out the exact parameters to fine tune their internal heuristics.
The good thing is that SBV provides a common interface, so you can change the solver by merely importing a different bridge:
import Data.SBV.Bridge.Z3
vs
import Data.SBV.Bridge.Boolector
and if you compile boolector to use lingeling, then you can test performance easily by merely changing one line of Haskell.
One simple question (but I haven't quite found an obvious answer in the NLP stuff I've been reading, which I'm very new to):
I want to classify emails with a probability along certain dimensions of mood. Is there an NLP package out there specifically dealing with this? Is there an obvious starting point in the literature where I should start reading?
For example, if I got a short email something like "Hi, I'm not very impressed with your last email - you said the order amount would only be $15.95! Regards, Tom" then it might get 8/10 for Frustration and 0/10 for Happiness.
The actual list of moods isn't so important, but a short list of generally positive vs generally negative moods would be useful.
Thanks in advance!
--Trindaz on Fedang #NLP
You can do this with a number of different NLP tools, but nothing to my knowledge comes with it ready out of the box. Perhaps the easiest place to start would be with LingPipe (java), and you can use their very good sentiment analysis tutorial. You could also use NLTK if python is more your bent. There are some good blog posts over at Streamhacker that describe how you would use Naive Bayes to implement that.
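To give a feel for the Naive Bayes approach those tutorials describe, here is a minimal NLTK sketch; the tiny hand-labelled training set is invented, and a real classifier would need far more data:

```python
import nltk

def features(text):
    # Simple bag-of-words features: one boolean per word in the message.
    return {f"contains({word.lower()})": True for word in text.split()}

# Tiny invented training set; a real classifier needs many labelled emails.
train = [
    ("I am not very impressed with your last email", "negative"),
    ("you said the order amount would only be $15.95", "negative"),
    ("thanks so much, great service", "positive"),
    ("really happy with the quick and helpful reply", "positive"),
]
classifier = nltk.NaiveBayesClassifier.train(
    [(features(text), label) for text, label in train]
)

msg = "not happy with the order amount"
dist = classifier.prob_classify(features(msg))
print(classifier.classify(features(msg)))   # most likely label
print(dist.prob("negative"))                # probability of "negative"
```

The per-label probability it reports can then be rescaled to whatever 0-10 mood score you want per dimension.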
Check out AlchemyAPI for sentiment analysis tools and scikit-learn or any other open machine learning library for the classifier.
If you have not decided to code the implementation yourself, you can also have the data classified by some other tool; the Google Prediction API may be an alternative.
Either way, you will need some labeled data and will have to do the preprocessing. But using such a tool may help you get better accuracy more easily.
Can anyone tell me what feature generators are with respect to natural language processors?
If I'm reading this correctly, I believe "feature generation" in this quote refers to the process of extracting features from your text. Without going into too much detail, this is basically identifying the dimensions of your data that you think would be useful for your prediction/classification task and putting them into a vector representation.
For example, suppose we were trying to create a classifier to determine if an e-mail was spam. We might extract features such as CONTAINS_WORD_NIGERIA or IS_FROM_PERSON_IN_CONTACT_LIST. Or, if we were to follow the quote above, we might make specialized features using the html tags, such as PERCENT_OF_WORDS_IN_HREF_TAG. As you might imagine, you can go overboard when feature engineering, and the real challenge lies in optimizing your feature set to give you good results on unseen data.
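A minimal sketch of what that kind of feature generation could look like in Python (the feature names, the contact list, and the e-mail are all made up for illustration):

```python
import re

# Hypothetical contact list for the IS_FROM_PERSON_IN_CONTACT_LIST feature.
CONTACTS = {"alice@example.com", "bob@example.com"}

def generate_features(sender, body):
    words = re.findall(r"[a-z$0-9.']+", body.lower())
    return {
        "CONTAINS_WORD_NIGERIA": "nigeria" in words,
        "IS_FROM_PERSON_IN_CONTACT_LIST": sender.lower() in CONTACTS,
        # Crude stand-in for the href-based feature from the quote above:
        # how many times an href attribute appears in the body.
        "NUM_HREF_TAGS": body.lower().count("href="),
        "NUM_WORDS": len(words),
    }

print(generate_features(
    "stranger@example.org",
    "Greetings from Nigeria, please click <a href='http://...'>here</a>",
))
```

Each such dictionary can then be turned into a numeric vector (for example with scikit-learn's DictVectorizer) before training a classifier.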
I'm writing an app to help facilitate some research, and part of this involves doing some statistical calculations. Right now, the researchers are using a program called SPSS. Part of the output that they care about looks like this:
They're really only concerned about the F and Sig. values. My problem is that I have no background in statistics, and I can't figure out what the tests are called, or how to calculate them.
I thought the F value might be the result of the F-test, but after following the steps given on Wikipedia, I got a result that was different from what SPSS gives.
This website might help you out a bit more. Also this one.
I'm working from a fairly rusty memory of a statistics course, but here goes nothing:
When you're doing analysis of variance (ANOVA), you actually calculate the F statistic as the ratio of the mean-square variance "between the groups" to the mean-square variance "within the groups". The second link above seems pretty good for this calculation.
This makes the F statistic measure exactly how powerful your model is, because the "between the groups" variance is explanatory power, and "within the groups" variance is random error. High F implies a highly significant model.
As in many statistical operations, you back-determine Sig. using the F statistic. Here's where your Wikipedia information comes in slightly handy. What you want to do is - using the degrees of freedom given to you by SPSS - find the proper P value at which an F table will give you the F statistic you calculated. The P value where this happens [F(table) = F(calculated)] is the significance.
Conceptually, a lower significance value shows a very strong ability to reject the null hypothesis (which for these purposes means to determine your model has explanatory power).
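As a concrete (if simplified) illustration of that calculation, here is a minimal Python/SciPy sketch for a plain one-way ANOVA; note, as later answers point out, that the SPSS output in question is probably a multivariate analysis rather than this simple case, and the sample data below is invented:

```python
from scipy import stats

# Three invented groups of observations.
g1 = [23.1, 24.5, 22.8, 25.0]
g2 = [26.2, 27.1, 25.9, 26.5]
g3 = [22.0, 21.5, 23.3, 22.7]

# F is the ratio of between-group to within-group mean-square variance;
# the reported p-value corresponds to SPSS's "Sig." column.
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f_stat, p_value)

# Equivalently, given F and the two degrees of freedom, you can read the
# significance off the F distribution's survival function (the "F table"
# lookup described above):
df_between = 3 - 1    # number of groups - 1
df_within = 12 - 3    # total observations - number of groups
print(stats.f.sf(f_stat, df_between, df_within))
```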
Sorry to any math folks if any of this is wrong. I'll be checking back to make edits!!!
Good luck to you. Stats is fun, just maybe not this part. =)
I assume from your question that your research colleagues want to automate the process by which certain statistical analyses are performed (i.e., they want to batch process data sets). You have two options:
1) SPSS is now scriptable through Python (as of version 15) - go to spss.com and search for Python. You can write Python scripts to automate data analyses and extract key values from pivot tables, and then process the answers any way you like. This has the virtue of allowing an exact comparison between the results from your Python script and the hand-calculated efforts in SPSS of your collaborators. Thus you won't have to really know any statistics to do this work (which is a key advantage). A rough sketch of what such a script can look like follows after option 2.
2) You could do this in R, a free statistics environment, which could probably be scripted. This has the disadvantage that you will have to learn statistics to ensure that you are doing it correctly.
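For option 1, the general shape of such an SPSS Python script is sketched below. This is a rough outline only: the data file, variable names, and GLM syntax are placeholders, and extracting the F and Sig. values from the resulting pivot tables depends on your SPSS version (the OMS facility mentioned in a later answer is one way to do it).

```python
# Runs inside SPSS's Python integration (the "spss" module ships with SPSS).
import spss

# Submit ordinary SPSS syntax from Python. The file name, dependent
# variables (dv1, dv2) and factor (treatment_group) are hypothetical.
spss.Submit("""
GET FILE='study_data.sav'.
GLM dv1 dv2 BY treatment_group.
""")
```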
Statistics is hard :-). After a year of reading and re-reading books and papers, I can only say with confidence that I understand the very basics of it.
You might wish to investigate ready-made libraries for whichever programming language you are using, because there are many gotchas in math in general and statistics in particular (rounding errors being an obvious example).
As an example, you could take a look at the R project, which is both an interactive environment and a library that you can use from your C++ code, distributed under the GPL (i.e. if you are using it only internally and publishing only the results, you don't need to open your code).
In short: don't do this by hand, link/use existing software. And sain_grocen's answer is incorrect. :(
These are all tests for significance of parameter estimates that are typically used in multivariate-response multiple regressions. These would not be simple things to do outside of a statistical programming environment. I would suggest either getting the output from a pre-existing statistical program, or using one that you can link to and call from your own code.
I'm afraid that the first answer (sain_grocen's) will lead you down the wrong path. His explanation is likely of a special case of what you are actually dealing with. The ANOVA explained in his links is for a single-variate response in a balanced design. These aren't the F statistics you are seeing. The names in your output (Pillai's Trace, Hotelling's Trace, ...) are some of the available multivariate versions. They have F distributions under certain assumptions. I can't explain a textbook's worth of material here; I would advise you to start by looking at
"Applied Multivariate Statistical Analysis" by Johnson and Wichern
Can you explain more why SPSS itself isn't a fine solution to the problem? Is it that it generates pivot tables as output that are hard to manipulate? Is it the cost of the program?
F-statistics can arise from any number of particular tests. The F is just a distribution (loosely: a description of the "frequencies" of groups of values), like a Normal (Gaussian), or Uniform. In general they arise from ratios of variances. Opinion: many statisticians (myself included), find F-based tests to be unstable (jargon: non-robust).
The particular output statistics (Pillai's trace, etc.) suggest that the original analysis is a MANOVA example, which as other posters describe is a complicated, and hard to get right procedure.
I'm guessing also, based on the MANOVA and the use of SPSS, that this is a psychology or sociology project... if not, please enlighten us. It might be that other, simpler models would actually be easier to understand and more repeatable. Consult your local university statistical consulting group, if you have one.
Good luck!
Here's an explanation of MANOVA output, from a very good site on statistics and on SPSS:
Output with explanation:
http://faculty.chass.ncsu.edu/garson/PA765/manospss.htm
How and why to do MANOVA or multivariate GLM:
(same path as above, but terminating in '/manova.htm')
Writing software from scratch to calculate these outputs would be both lengthy and difficult;
there are lots of numerical problems and matrix inversions to do.
As Henry said, use Python scripts, or R. I'd suggest working with somebody who knows SPSS if scripting.
In addition, SPSS itself is capable of exporting the output tables to files using something called OMS.
A script within SPSS can do this.
Find out who in your research group knows SPSS and work with them.