How to extract at runtime the AST or the bytecode or anything that uniquelly, deterministically and universally identifies a function* in Haskell?

How to extract at runtime the AST or the bytecode or anything that uniquelly, deterministically and universally identifies a function* in Haskell? - haskell

I have managed to do it in Python, since the interpreter provide the bytecodes, as shown below.
From the bytecodes it was easy to apply a hashing function.
import dis
from inspect import signature
# Add signature.
fargs = list(signature(f).parameters.keys())
if not fargs:
raise Exception(f"Missing function input parameters.")
clean = [fargs]
# Clean line numbers.
groups = [l for l in dis.Bytecode(f).dis().split("\n\n") if l]
for group in groups:
lines = [segment for segment in group.split(" ") if segment][1:]
clean.append(lines)
How could I extract a similar kind of deterministic universally unique ¹ function identifier in Haskell?
Are there built-in libraries or GHC extensions for that?
1 - unique, probabilistically speaking
* - A function not in the math sense, since the possibilities to check programatically the equivalence of programs are very limited. So I am only interested in what is possible and costless attainable, i.e., generate a unique identity for functions with the exact same code (or bytecode, to avoid formatting noise). In my specific use case, even the signatures and parameter names should match, but that is not relevant for the point here, which is about how to inspect Haskell code at runtime.

Within a single run of the program, you might consider StableName. You can make a StableName for any value (including a function/closure), get a hash value for a StableName, and compare two StableNames for equality. If two things have the same StableName, then they're definitely the same. If they have the same StableName hash, then they're probably the same. Note that if you just need to be able to check equality and don't need the hashes, one option is reallyUnsafePtrEquality#, which just compares any two values by address.

Related

How to make a TypedDict with integer keys?

Is it possible to use an integer key with TypedDict (similar to dict?).
Trying a simple example:
from typing import TypedDict
class Moves(TypedDict):
0: int=1
1: int=2
Throws: SyntaxError: illegal target for annotation
It seems as though only Mapping[str, int] is supported but I wanted to confirm. It wasn't specifically stated in the Pep docs.

The intent of TypedDict is explicit in the PEP's abstract (emphasis added):
This PEP proposes a type constructor typing.TypedDict to support the use case where a dictionary object has a specific set of string keys, each with a value of a specific type.
and given the intended use cases are all annotatable in class syntax, implicitly applies only to dicts keyed by strings that constitute valid identifiers (things you could use as attribute or keyword argument names), not even strings in general. So as intended, int keys aren't a thing, this is just for enabling a class that uses dict-like syntax to access the "attributes" rather than attribute access syntax.
While the alternative, backwards compatible syntax, allowed for compatibility with pre-3.6 Python, allows this (as well as allowing strings that aren't valid Python identifiers), e.g.:
Moves = TypedDict('Moves', {0: int, 1: int})
you could only construct it with dict literals (e.g. Moves({0: 123, 1: 456})) because the cleaner keyword syntax like Moves(0=123, 1=456) doesn't work. And even though that technically works at runtime (it's all just dicts under the hood after all), the actual type-checkers that validate your type correctness may not support it (because the intent and documented use exclusively handles strings that constitute valid identifiers).
Point is, don't do this. For the simple case you're describing here (consecutive integer integer "keys" starting from zero, where each position has independent meaning, where they may or may not differ by type), you really just want a tuple anyway:
Moves = typing.Tuple[int, int] # Could be [int, str] if index 1 should be a string
would be used for annotations the same way, and your actual point of use in the code would just be normal tuple syntax (return 1, 2).
If you really want to be able to use the name Moves when creating instances, on 3.9+ you could use PEP 585 to do (no import required):
Moves = tuple[int, int]
allowing you to write:
return Moves((1, 2))
when you want to make an "instance" of it. No runtime checking is involved (it's roughly equivalent to running tuple((1, 2)) at runtime), but static type-checkers should understand the intent.

Is there a way to use the namelist I/O feature to read in a derived type with allocatable components?

Is there a way to use the namelist I/O feature to read in a derived type with allocatable components?
The only thing I've been able to find about it is https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/269585 which ended on an fairly unhelpful note.
Edit:
I have user-defined derived types that need to get filled with information from an input file. So, I'm trying to find a convenient way of doing that. Namelist seems like a good route because it is so succinct (basically two lines). One to create the namelist and then a namelist read. Namelist also seems like a good choice because in the text file it forces you to very clearly show where each value goes which I find highly preferable to just having a list of values that the compiler knows the exact order of. This makes it much more work if I or anyone else needs to know which value corresponds to which variable, and much more work to keep clean when inevitably a new value is needed.
I'm trying to do something of the basic form:
!where myType_T is a type that has at least one allocatable array in it
type(myType_T) :: thing
namelist /nmlThing/ thing
open(1, file"input.txt")
read(1, nml=nmlThing)
I may be misunderstanding user-defined I/O procedures, but they don't seem to be a very generic solution. It seems like I would need to write a new one any time I need to do this action, and they don't seem to natively support the
&nmlThing
thing%name = "thing1"
thing%siblings(1) = "thing2"
thing%siblings(2) = "thing3"
thing%siblings(3) = "thing4"
!siblings is an allocatable array
/
syntax that I find desirable.
There are a few solutions I've found to this problem, but none seem to be very succinct or elegant. Currently, I have a dummy user-defined type that has arrays that are way large instead of allocatable and then I write a function to copy the information from the dummy namelist friendly type to the allocatable field containing type. It works just fine, but it is ugly and I'm up to about 4 places were I need to do this same type of operation in the code.
Hence trying to find a good solution.

If you want to use allocatable components, then you need to have an accessible generic interface for a user defined derived type input/output procedure (typically by the type having a generic binding for such a procedure). You link to a thread with an example with such a procedure.
Once invoked, that user defined derived type input/output procedure is then responsible for reading and writing the data. That can include invoking namelist input/output on the components of the derived type.
Fortran 2003 also offers derived types with length parameters. These may offer a solution without the need for a user defined derived type input/output procedure. However, use of derived types with length parameters, in combination with namelist, will put you firmly in the "highly experimental" category with respect to the current compiler implementation.

Manipulating/Clearing Variables via Lists: Mathematica

My problem (in Mathematica) is referring to variables given in a particular array and manipulating them in the following manner (as an example):
Inputs: vars={x,y,z}, system=some ODE like x^2+3*x*y+...etc
(note that I haven't actually created variables x y and z)
Aim:
To assign values to the variables in the list "var" with the intention of inputting these values into the system of ODEs. Then, once I am done, clear the values of the variables in the array vars so that it is in its original form {x,y,z} (and not something like {x,1,3} where y=1 and z=3). I want to do this by referring to the positional elements of vars (I aim not to know that x, y and z are the actual variables).
The reason why: I am trying to write a program that can have any number of variables and ODEs as defined by the user. Since the number of variables and the actual letters used for them are unknown, it is necessary to perform manipulations with the array itself.
Attempt:
A fixed number of variables is easy. For the arbitrary case, I have tried modules and blocks, but with no success. Consider the following code:
Clear[x,y,z,vars,svars]
vars={x,y,z}
svars=Map[ToString,vars]
Module[{vars=vars,svars=svars},
Symbol[svars[[1]]]//Evaluate=1
]
then vars={1,y,z} and not {x,y,z} after running this. I have done functional programming with lists, atoms etc. Thus is makes sense to me that vars is changed afterwards, because I have changed x and not vars. However, I cannot get "x" in the list of variables to remain local. Of course I could put in "x" itself, but that is particular to this specific case. I would prefer to put something like:
Clear[x,y,z,vars,svars]
vars={x,y,z}
svars=Map[ToString,vars]
Module[{vars=vars,svars=svars, vars[[1]]},
Symbol[svars[[1]]]//Evaluate=1
]
which of course doesn't work because vars[[1]] is not a symbol or an assignment to a symbol.
Other possibilities:
I found a function
assignToName[name_String, value_] :=
ToExpression[name, InputForm, Function[var, var = value, HoldAll]]
which looked promising. Basically name_String is the name of the variable and value is its new value. I attempted to do:
vars={x,y,z}
svars=Map[ToString,vars]
vars[[1]]=//Evaluate=1
assignToName[svars[[1]],svars[[1]]]
but then something likeD[x^2, vars[[1]]] doesn't work (x is not a valid variable).
If I am missing something, or if perhaps I am going down the wrong path, I'm open to trying other things.
Thanks.

I can't say that I followed your train(s) of thought very well, so these are fragments which might help you to answer your own questions than a coherent and fully-formed answer. But to answer your final 'question', I think you may be going down some wrong path(s).
In passing, note that evaluating the expression
vars = {x,y,z}
does in fact define those three variables though it doesn't define any rewrite rules (such as values) for them.
Given a polynomial poly you can extract the variables in it with the function Variables[poly] so something like
Variables[x^2+3*x*y]
should return
{x,y}
Note that I write 'should' rather than does because I don't have Mathematica on this machine so my syntax may be a bit wonky. Note also that your example ODE is nothing of the sort but it strikes me that you can probably write a wrapper to manipulate an ODE into a form from which Variables can extract the variables. Mathematica offers a lot of other functions for picking expressions apart and re-assembling them, follow the trails from Variables. It often allows the use of functions defined on Lists on expressions with other heads too so it's always worth experimenting a bit.
There are a couple of widely applicable ways to avoid setting values of variables in Mathematica. For instance, you could write
x^2+3*x*y/.{x->2,y->3}
which will evaluate to
22
but not set values for x and y. This is a very simple example of using (sets of) replacement rules for temporary assignment of values to variables
The other way to avoid setting values for variables is to define functions using Modules or Blocks both of which define their own contexts. The documentation will tell you all about these two and the differences between them.
I can't help thinking that all your clever tricks using Symbol, ToExpression and ToString are a bit beside the point. Spend some time familiarising yourself with Mathematica's in-built functionality before going further down that route, you may well find you don't need to.
Finally, writing, in any language, expressions such as
vars=vars,svars=svars
will lead to madness. It may be syntactically correct, you may even be able to decrypt the semantics when you first write code like that, but in a week's time you will curse your younger self for writing it.

How to find the optimal processing order?

I have an interesting question, but I'm not sure exactly how to phrase it...
Consider the lambda calculus. For a given lambda expression, there are several possible reduction orders. But some of these don't terminate, while others do.
In the lambda calculus, it turns out that there is one particular reduction order which is guaranteed to always terminate with an irreducible solution if one actually exists. It's called Normal Order.
I've written a simple logic solver. But the trouble is, the order in which it processes the constraints seems to have a huge effect on whether it finds any solutions or not. Basically, I'm wondering whether something like a normal order exists for my logic programming language. (Or wether it's impossible for a mere machine to deterministically solve this problem.)
So that's what I'm after. Presumably the answer critically depends on exactly what the "simple logic solver" is. So I will attempt to briefly describe it.
My program is closely based on the system of combinators in chapter 9 of The Fun of Programming (Jeremy Gibbons & Oege de Moor). The language has the following structure:
The input to the solver is a single predicate. Predicates may involve variables. The output from the solver is zero or more solutions. A solution is a set of variable assignments which make the predicate become true.
Variables hold expressions. An expression is an integer, a variable name, or a tuple of subexpressions.
There is an equality predicate, which compares expressions (not predicates) for equality. It is satisfied if substituting every (bound) variable with its value makes the two expressions identical. (In particular, every variable equals itself, bound or not.) This predicate is solved using unification.
There are also operators for AND and OR, which work in the obvious way. There is no NOT operator.
There is an "exists" operator, which essentially creates local variables.
The facility to define named predicates enables recursive looping.
One of the "interesting things" about logic programming is that once you write a named predicate, it typically works fowards and backwards (and sometimes even sideways). Canonical example: A predicate to concatinate two lists can also be used to split a list into all possible pairs.
But sometimes running a predicate backwards results in an infinite search, unless you rearrange the order of the terms. (E.g., swap the LHS and RHS of an AND or an OR somehwere.) I'm wondering whether there's some automated way to detect the best order to run the predicates in, to ensure prompt termination in all cases where the solution set is exactually finite.
Any suggestions?

Relevant paper, I think: http://www.cs.technion.ac.il/~shaulm/papers/abstracts/Ledeniov-1998-DCS.html
Also take a look at this: http://en.wikipedia.org/wiki/Constraint_logic_programming#Bottom-up_evaluation

What's the most idiomatic approach to multi-index collections in Haskell?

In C++ and other languages, add-on libraries implement a multi-index container, e.g. Boost.Multiindex. That is, a collection that stores one type of value but maintains multiple different indices over those values. These indices provide for different access methods and sorting behaviors, e.g. map, multimap, set, multiset, array, etc. Run-time complexity of the multi-index container is generally the sum of the individual indices' complexities.
Is there an equivalent for Haskell or do people compose their own? Specifically, what is the most idiomatic way to implement a collection of type T with both a set-type of index (T is an instance of Ord) as well as a map-type of index (assume that a key value of type K could be provided for each T, either explicitly or via a function T -> K)?

I just uploaded IxSet to hackage this morning,
http://hackage.haskell.org/package/ixset
ixset provides sets which have multiple indexes.
ixset has been around for a long time as happstack-ixset. This version removes the dependencies on anything happstack specific, and is the new official version of IxSet.
Another option would be kdtree:
darcs get http://darcs.monoid.at/kdtree
kdtree aims to improve on IxSet by offering greater type-safety and better time and space usage. The current version seems to do well on all three of those aspects -- but it is not yet ready for prime time. Additional contributors would be highly welcomed.

In the trivial case where every element has a unique key that's always available, you can just use a Map and extract the key to look up an element. In the slightly less trivial case where each value merely has a key available, a simple solution it would be something like Map K (Set T). Looking up an element directly would then involve first extracting the key, indexing the Map to find the set of elements that share that key, then looking up the one you want.
For the most part, if something can be done straightforwardly in the above fashion (simple transformation and nesting), it probably makes sense to do it that way. However, none of this generalizes well to, e.g., multiple independent keys or keys that may not be available, for obvious reasons.
Beyond that, I'm not aware of any widely-used standard implementations. Some examples do exist, for example IxSet from happstack seems to roughly fit the bill. I suspect one-size-kinda-fits-most solutions here are liable to have a poor benefit/complexity ratio, so people tend to just roll their own to suit specific needs.
Intuitively, this seems like a problem that might work better not as a single implementation, but rather a collection of primitives that could be composed more flexibly than Data.Map allows, to create ad-hoc specialized structures. But that's not really helpful for short-term needs.

For this specific question, you can use a Bimap. In general, though, I'm not aware of any common class for multimaps or multiply-indexed containers.

I believe that the simplest way to do this is simply with Data.Map. Although it is designed to use single indices, when you insert the same element multiple times, most compilers (certainly GHC) will make the values place to the same place. A separate implementation of a multimap wouldn't be that efficient, as you want to find elements based on their index, so you cannot naively associate each element with multiple indices - say [([key], value)] - as this would be very inefficient.
However, I have not looked at the Boost implementations of Multimaps to see, definitively, if there is an optimized way of doing so.

Have I got the problem straight? Both T and K have an order. There is a function key :: T -> K but it is not order-preserving. It is desired to manage a collection of Ts, indexed (for rapid access) both by the T order and the K order. More generally, one might want a collection of T elements indexed by a bunch of orders key1 :: T -> K1, .. keyn :: T -> Kn, and it so happens that here key1 = id. Is that the picture?
I think I agree with gereeter's suggestion that the basis for a solution is just to maintiain in sync a bunch of (Map K1 T, .. Map Kn T). Inserting a key-value pair in a map duplicates neither the key nor the value, allocating only the extra heap required to make a new entry in the right place in the index. Inserting the same value, suitably keyed, in multiple indices should not break sharing (even if one of the keys is the value). It is worth wrapping the structure in an API which ensures that any subsequent modifications to the value are computed once and shared, rather than recomputed for each entry in an index.
Bottom line: it should be possible to maintain multiple maps, ensuring that the values are shared, even though the key-orders are separate.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string