Have a Strategy that does not uniformly choose between different strategies - python-hypothesis

I'd like to create a strategy C that, 90% of the time chooses strategy A, and 10% of the time chooses strategy B.
The random python library does not work even if I seed it since each time the strategy produces values, it generates the same value from random.
I looked at the implementation for OneOfStrategy and they use
i = cu.integer_range(data, 0, n - 1)
to randomly generate a number
cu is from the internals
import hypothesis.internal.conjecture.utils as cu
Would it be fine for my strategy to use cu.integer_range or is there another implementation?

Hypothesis does not allow users to control the probability of various choices within a strategy. You should not use undocumented interfaces either - hypothesis.internal is for internal use only and could break at any time!
I strongly recommend using C = st.one_of(A, B) and trusting Hypothesis with the details.

Related

How to extract at runtime the AST or the bytecode or anything that uniquelly, deterministically and universally identifies a function* in Haskell?

I have managed to do it in Python, since the interpreter provide the bytecodes, as shown below.
From the bytecodes it was easy to apply a hashing function.
import dis
from inspect import signature
# Add signature.
fargs = list(signature(f).parameters.keys())
if not fargs:
raise Exception(f"Missing function input parameters.")
clean = [fargs]
# Clean line numbers.
groups = [l for l in dis.Bytecode(f).dis().split("\n\n") if l]
for group in groups:
lines = [segment for segment in group.split(" ") if segment][1:]
clean.append(lines)
How could I extract a similar kind of deterministic universally unique ยน function identifier in Haskell?
Are there built-in libraries or GHC extensions for that?
1 - unique, probabilistically speaking
* - A function not in the math sense, since the possibilities to check programatically the equivalence of programs are very limited. So I am only interested in what is possible and costless attainable, i.e., generate a unique identity for functions with the exact same code (or bytecode, to avoid formatting noise). In my specific use case, even the signatures and parameter names should match, but that is not relevant for the point here, which is about how to inspect Haskell code at runtime.
Within a single run of the program, you might consider StableName. You can make a StableName for any value (including a function/closure), get a hash value for a StableName, and compare two StableNames for equality. If two things have the same StableName, then they're definitely the same. If they have the same StableName hash, then they're probably the same. Note that if you just need to be able to check equality and don't need the hashes, one option is reallyUnsafePtrEquality#, which just compares any two values by address.

How to define a strategy in hypothesis to generate a pair of similar recursive objects

I am new to hypothesis and I am looking for a way to generate a pair of similar recursive objects.
My strategy for a single object is similar to this example in the hypothesis documentation.
I want to test a function which takes a pair of recursive objects A and B and the side effect of this function should be that A==B.
My first approach would be to write a test which gets two independent objects, like:
#given(my_objects(), my_objects())
def test_is_equal(a, b):
my_function(a, b)
assert a == b
But the downside is that hypothesis does not know that there is a dependency between this two objects and so they can be completely different. That is a valid test and I want to test that too.
But I also want to test complex recursive objects which are only slightly different.
And maybe that hypothesis is able to shrink a pair of very different objects where the test fails to a pair of only slightly different objects where the test fails in the same way.
This one is tricky - to be honest I'd start by writing the same test you already have above, and just turn up the max_examples setting a long way. Then I'd probably write some traditional unit tests, because getting specific data distributions out of Hypothesis is explicitly unsupported (i.e. we try to break everything that assumes a particular distribution, using some combination of heuristics and a bit of feedback).
How would I actually generate similar recursive structures though? I'd use a #composite strategy to build them at the same time, and for each element or subtree I'd draw a boolean and if True draw a different element or subtree to use in the second object. Note that this will give you a strategy for a tuple of two objects and you'll need to unpack it inside the test; that's unavoidable if you want them to be related.
Seriously do try just cracking up max_examples on the naive approach first though, running Hypothesis for ~an hour is amazingly effective and I would even expect it to shrink the output fairly well.

CLPFD for real numbers

CLP(FD) allows user to set the domain for every wannabe-integer variable, so it's able to solve equations.
So far so good.
However you can't do the same in CLP(R) or similar languages (where you can do only simple inferences). And it's not hard to understand why: the fractional part of a number may have an almost infinite region, putted down by an implementation limit. This mean the search space will be too large to make any practical use for a solver which deals with floats like with integers. So it's the user task to write generator in CLP(R) and set constraint guards where needed to get variables instantiated with numbers (if simple inference is not possible).
So my question here: is there any CLP(FD)-like language over reals? I think it could be implemented by use of number rounding, searching and following incremental approximation.
There are at least some major CLP(FD) solvers that support real (decision) variables:
Gecode
JaCoP
ECLiPSe CLP (ic library)
Choco (using Ibex)
(The first three also support var float in MiniZinc.)
The answser to your question is yes. There is Constraint-based Solvers dedicated for floating numbers. I do not have a list of solvers but I know that that ibex http://www.ibex-lib.org is a library allowing the use of floats. You should also have a look at SMT-Solvers implementing the Real-Theory (http://smtlib.cs.uiowa.edu/solvers.shtml).

Can I repeatedly create & destroy random generator/distributor objects and destroy them (with a 'dice' class)?

I'm trying out the random number generation from the new library in C++11 for a simple dice class. I'm not really grasping what actually happens but the reference shows an easy example:
std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(1,6);
int dice_roll = distribution(generator);
I read somewhere that with the "old" way you should only seed once (e.g. in the main function) in your application ideally. However I'd like an easily reusable dice class. So would it be okay to use this code block in a dice::roll() method although multiple dice objects are instantiated and destroyed multiple times in an application?
Currently I made the generator as a class member and the last two lines are in the dice:roll() methods. It looks okay but before I compute statistics I thought I'd ask here...
Think of instantiating a pseudo-random number generator (PRNG) as digging a well - it's the overhead you have to go through to be able to get access to water. Generating instances of a pseudo-random number is like dipping into the well. Most people wouldn't dig a new well every time they want a drink of water, why invoke the unnecessary overhead of multiple instantiations to get additional pseudo-random numbers?
Beyond the unnecessary overhead, there's a statistical risk. The underlying implementations of PRNGs are deterministic functions that update some internally maintained state to generate the next value. The functions are very carefully crafted to give a sequence of uncorrelated (but not independent!) values. However, if the state of two or more PRNGs is initialized identically via seeding, they will produce the exact same sequences. If the seeding is based on the clock (a common default), PRNGs initialized within the same tick of the clock will produce identical results. If your statistical results have independence as a requirement then you're hosed.
Unless you really know what you're doing and are trying to use correlation induction strategies for variance reduction, best practice is to use a single instantiation of a PRNG and keep going back to it for additional values.

What's the most idiomatic approach to multi-index collections in Haskell?

In C++ and other languages, add-on libraries implement a multi-index container, e.g. Boost.Multiindex. That is, a collection that stores one type of value but maintains multiple different indices over those values. These indices provide for different access methods and sorting behaviors, e.g. map, multimap, set, multiset, array, etc. Run-time complexity of the multi-index container is generally the sum of the individual indices' complexities.
Is there an equivalent for Haskell or do people compose their own? Specifically, what is the most idiomatic way to implement a collection of type T with both a set-type of index (T is an instance of Ord) as well as a map-type of index (assume that a key value of type K could be provided for each T, either explicitly or via a function T -> K)?
I just uploaded IxSet to hackage this morning,
http://hackage.haskell.org/package/ixset
ixset provides sets which have multiple indexes.
ixset has been around for a long time as happstack-ixset. This version removes the dependencies on anything happstack specific, and is the new official version of IxSet.
Another option would be kdtree:
darcs get http://darcs.monoid.at/kdtree
kdtree aims to improve on IxSet by offering greater type-safety and better time and space usage. The current version seems to do well on all three of those aspects -- but it is not yet ready for prime time. Additional contributors would be highly welcomed.
In the trivial case where every element has a unique key that's always available, you can just use a Map and extract the key to look up an element. In the slightly less trivial case where each value merely has a key available, a simple solution it would be something like Map K (Set T). Looking up an element directly would then involve first extracting the key, indexing the Map to find the set of elements that share that key, then looking up the one you want.
For the most part, if something can be done straightforwardly in the above fashion (simple transformation and nesting), it probably makes sense to do it that way. However, none of this generalizes well to, e.g., multiple independent keys or keys that may not be available, for obvious reasons.
Beyond that, I'm not aware of any widely-used standard implementations. Some examples do exist, for example IxSet from happstack seems to roughly fit the bill. I suspect one-size-kinda-fits-most solutions here are liable to have a poor benefit/complexity ratio, so people tend to just roll their own to suit specific needs.
Intuitively, this seems like a problem that might work better not as a single implementation, but rather a collection of primitives that could be composed more flexibly than Data.Map allows, to create ad-hoc specialized structures. But that's not really helpful for short-term needs.
For this specific question, you can use a Bimap. In general, though, I'm not aware of any common class for multimaps or multiply-indexed containers.
I believe that the simplest way to do this is simply with Data.Map. Although it is designed to use single indices, when you insert the same element multiple times, most compilers (certainly GHC) will make the values place to the same place. A separate implementation of a multimap wouldn't be that efficient, as you want to find elements based on their index, so you cannot naively associate each element with multiple indices - say [([key], value)] - as this would be very inefficient.
However, I have not looked at the Boost implementations of Multimaps to see, definitively, if there is an optimized way of doing so.
Have I got the problem straight? Both T and K have an order. There is a function key :: T -> K but it is not order-preserving. It is desired to manage a collection of Ts, indexed (for rapid access) both by the T order and the K order. More generally, one might want a collection of T elements indexed by a bunch of orders key1 :: T -> K1, .. keyn :: T -> Kn, and it so happens that here key1 = id. Is that the picture?
I think I agree with gereeter's suggestion that the basis for a solution is just to maintiain in sync a bunch of (Map K1 T, .. Map Kn T). Inserting a key-value pair in a map duplicates neither the key nor the value, allocating only the extra heap required to make a new entry in the right place in the index. Inserting the same value, suitably keyed, in multiple indices should not break sharing (even if one of the keys is the value). It is worth wrapping the structure in an API which ensures that any subsequent modifications to the value are computed once and shared, rather than recomputed for each entry in an index.
Bottom line: it should be possible to maintain multiple maps, ensuring that the values are shared, even though the key-orders are separate.

Resources