Source code of Groovy ASTNode - groovy

I implemented some AST transformations that are applied at compile time and for logging purposes I would like to emit "source" code that reflects injected AST nodes. The toString()/getText() methods return quite unreadable structure that quickly becomes complicated for long expressions.

The closest thing I can think of is the AstNodeToScriptAdapter which can be found here

Related

Groovy static analysis for finding variable modifications

I have a simple task which requires finding the modification of variables in a given code. This will be a static analysis. For instance, given a variable (e.g., age), I would like to create a list or tree (a data structure) that gives me what modifies this variable and preferably the function name that makes the modification (as a return) or any other auxiliary information. I start writing my script, yet I see that it's very error-prone as I need to consider many cases such as nested loops, etc.
Would you suggest me where to start?
If the code to be analyzed happens to be Groovy code then you could write an AST transformation (probably a global one) that walks the code and obtains the information you seek.
The Groovy documentation site has a section on AST Transformations, have a look at http://groovy-lang.org/metaprogramming.html#_compile_time_metaprogramming
This page describes existing AST xforms and how you can develop your own. I'd recommend browsing the code that implements the standard AST xforms such as #Immutable, #Cannonical, and others.
CodeNarc (http://codenarc.sourceforge.net/) is a static code analyzer for Groovy code, inspired by PMD. It also relies on AST xforms.
GContracts (https://github.com/andresteingress/gcontracts) is another tool implemented using AST xforms. These two can serve as a basis for understanding more about AST transformations.
OTOH if the analyzed code happens to be Java then AST transformations will not help you.

Is there a standardized way to transform functional code to imperative code?

I'm writing a small tool for generating php checks from javascript code, and I would like to know if anyone knows of a standard way of transforming functional code into imperative code?
I found this paper: Defunctionalization at Work it explains defunctionalization pretty well.
Lambdalifting and defunctionalization somewhat answered the question, but what about datastructures, we are still parsing lists as if they are all linkedlists. Would there be a way of transforming the linkedlists of functional languages into other high-level datastructures like c++ vectors or java arraylists?
Here are a few additions to the list of #Artyom:
you can convert tail recursion into loops and assignments
linear types can be used to introduce assignments, e.g. y = f x can be replaced with x := f x if x is linear and has the same type as y
at least two kinds of defunctionalization are possible: Reynolds-type defunctionalization when you replace a high-order application with a switch full of first-order applications, and inlining (however, recursive functions is not always possible to inline)
Perhaps you are interested in removing some language elements (such as higher-order functions), right?
For eliminating HOFs from a program, there are techniques such as defunctionalization. For removing closures, you can use lambda-lifting (aka closure conversion). Is this something you are interested in?
I think you need to provide a concrete example of code you have, and the target code you intend to produce, so that others may propose solutions.
Added:
Would there be a way of transforming the linkedlists of functional languages into other high-level datastructures like c++ vectors or java arraylists?
Yes. Linked lists are represented with pointers in C++ (a structure "node" with two fields: one for the "payload", another for the "next" pointer; empty list is then represented as a NULL pointer, but sometimes people prefer to use special "sentinel values"). Note that, if the code in the source language does not rely on the representation of singly linked lists (in the source language implementation), you can also implement the "cons"/"nil" operations using a vector in the target language (not sure if this suits your needs, though). The idea here is to give an alternative implementations for the familiar operations.
No, there is not.
The reason is that there is no such concrete and well defined thing like functional code or imperative code.
Such transformations exist only for concrete instances of your abstraction: for example, there are transformations from Haskell code to LLVM bytecode, F# code to CLI bytecode or Frege code to Java code.
(I doubt if there is one from Javascript to PHP.)
Depends on what you need. The usual answer is "there is no such tool", because the result will not be usable. However look at this from this standpoint:
The set of Assembler instructions in a computer defines an imperative machine. Hence the compiler needs to do such a translation. However I assume you do not want to have assembler code but something more readable.
Usually these kinds of heavy program transformations are done manually, if one is interested in the result, or automatically if the result will never be looked at by a human.

Is there an object-identity-based, thread-safe memoization library somewhere?

I know that memoization seems to be a perennial topic here on the haskell tag on stack overflow, but I think this question has not been asked before.
I'm aware of several different 'off the shelf' memoization libraries for Haskell:
The memo-combinators and memotrie packages, which make use of a beautiful trick involving lazy infinite data structures to achieve memoization in a purely functional way. (As I understand it, the former is slightly more flexible, while the latter is easier to use in simple cases: see this SO answer for discussion.)
The uglymemo package, which uses unsafePerformIO internally but still presents a referentially transparent interface. The use of unsafePerformIO internally results in better performance than the previous two packages. (Off the shelf, its implementation uses comparison-based search data structures, rather than perhaps-slightly-more-efficient hash functions; but I think that if you find and replace Cmp for Hashable and Data.Map for Data.HashMap and add the appropraite imports, you get a hash based version.)
However, I'm not aware of any library that looks answers up based on object identity rather than object value. This can be important, because sometimes the kinds of object which are being used as keys to your memo table (that is, as input to the function being memoized) can be large---so large that fully examining the object to determine whether you've seen it before is itself a slow operation. Slow, and also unnecessary, if you will be applying the memoized function again and again to an object which is stored at a given 'location in memory' 1. (This might happen, for example, if we're memoizing a function which is being called recursively over some large data structure with a lot of structural sharing.) If we've already computed our memoized function on that exact object before, we can already know the answer, even without looking at the object itself!
Implementing such a memoization library involves several subtle issues and doing it properly requires several special pieces of support from the language. Luckily, GHC provides all the special features that we need, and there is a paper by Peyton-Jones, Marlow and Elliott which basically worries about most of these issues for you, explaining how to build a solid implementation. They don't provide all details, but they get close.
The one detail which I can see which one probably ought to worry about, but which they don't worry about, is thread safety---their code is apparently not threadsafe at all.
My question is: does anyone know of a packaged library which does the kind of memoization discussed in the Peyton-Jones, Marlow and Elliott paper, filling in all the details (and preferably filling in proper thread-safety as well)?
Failing that, I guess I will have to code it up myself: does anyone have any ideas of other subtleties (beyond thread safety and the ones discussed in the paper) which the implementer of such a library would do well to bear in mind?
UPDATE
Following #luqui's suggestion below, here's a little more data on the exact problem I face. Let's suppose there's a type:
data Node = Node [Node] [Annotation]
This type can be used to represent a simple kind of rooted DAG in memory, where Nodes are DAG Nodes, the root is just a distinguished Node, and each node is annotated with some Annotations whose internal structure, I think, need not concern us (but if it matters, just ask and I'll be more specific.) If used in this way, note that there may well be significant structural sharing between Nodes in memory---there may be exponentially more paths which lead from the root to a node than there are nodes themselves. I am given a data structure of this form, from an external library with which I must interface; I cannot change the data type.
I have a function
myTransform : Node -> Node
the details of which need not concern us (or at least I think so; but again I can be more specific if needed). It maps nodes to nodes, examining the annotations of the node it is given, and the annotations its immediate children, to come up with a new Node with the same children but possibly different annotations. I wish to write a function
recursiveTransform : Node -> Node
whose output 'looks the same' as the data structure as you would get by doing:
recursiveTransform Node originalChildren annotations =
myTransform Node recursivelyTransformedChildren annotations
where
recursivelyTransformedChildren = map recursiveTransform originalChildren
except that it uses structural sharing in the obvious way so that it doesn't return an exponential data structure, but rather one on the order of the same size as its input.
I appreciate that this would all be easier if say, the Nodes were numbered before I got them, or I could otherwise change the definition of a Node. I can't (easily) do either of these things.
I am also interested in the general question of the existence of a library implementing the functionality I mention quite independently of the particular concrete problem I face right now: I feel like I've had to work around this kind of issue on a few occasions, and it would be nice to slay the dragon once and for all. The fact that SPJ et al felt that it was worth adding not one but three features to GHC to support the existence of libraries of this form suggests that the feature is genuinely useful and can't be worked around in all cases. (BUT I'd still also be very interested in hearing about workarounds which will help in this particular case too: the long term problem is not as urgent as the problem I face right now :-) )
1 Technically, I don't quite mean location in memory, since the garbage collector sometimes moves objects around a bit---what I really mean is 'object identity'. But we can think of this as being roughly the same as our intuitive idea of location in memory.
If you only want to memoize based on object identity, and not equality, you can just use the existing laziness mechanisms built into the language.
For example, if you have a data structure like this
data Foo = Foo { ... }
expensive :: Foo -> Bar
then you can just add the value to be memoized as an extra field and let the laziness take care of the rest for you.
data Foo = Foo { ..., memo :: Bar }
To make it easier to use, add a smart constructor to tie the knot.
makeFoo ... = let foo = Foo { ..., memo = expensive foo } in foo
Though this is somewhat less elegant than using a library, and requires modification of the data type to really be useful, it's a very simple technique and all thread-safety issues are already taken care of for you.
It seems that stable-memo would be just what you needed (although I'm not sure if it can handle multiple threads):
Whereas most memo combinators memoize based on equality, stable-memo does it based on whether the exact same argument has been passed to the function before (that is, is the same argument in memory).
stable-memo only evaluates keys to WHNF.
This can be more suitable for recursive functions over graphs with cycles.
stable-memo doesn't retain the keys it has seen so far, which allows them to be garbage collected if they will no longer be used. Finalizers are put in place to remove the corresponding entries from the memo table if this happens.
Data.StableMemo.Weak provides an alternative set of combinators that also avoid retaining the results of the function, only reusing results if they have not yet been garbage collected.
There is no type class constraint on the function's argument.
stable-memo will not work for arguments which happen to have the same value but are not the same heap object. This rules out many candidates for memoization, such as the most common example, the naive Fibonacci implementation whose domain is machine Ints; it can still be made to work for some domains, though, such as the lazy naturals.
Ekmett just uploaded a library that handles this and more (produced at HacPhi): http://hackage.haskell.org/package/intern. He assures me that it is thread safe.
Edit: Actually, strictly speaking I realize this does something rather different. But I think you can use it for your purposes. It's really more of a stringtable-atom type interning library that works over arbitrary data structures (including recursive ones). It uses WeakPtrs internally to maintain the table. However, it uses Ints to index the values to avoid structural equality checks, which means packing them into the data type, when what you want are apparently actually StableNames. So I realize this answers a related question, but requires modifying your data type, which you want to avoid...

What programming languages will let me manipulate the sequence of instructions in a method?

I have an upcoming project in which a core requirement will be to mutate the way a method works at runtime. Note that I'm not talking about a higher level OO concept like "shadow one method with another", although the practical effect would be similar.
The key properties I'm after are:
I must be able to modify the method in such a way that I can add new expressions, remove existing expressions, or modify any of the expressions that take place in it.
After modifying the method, subsequent calls to that method would invoke the new sequence of operations. (Or, if the language binds methods rather than evaluating every single time, provide me a way to unbind/rebind the new method.)
Ideally, I would like to manipulate the atomic units of the language (e.g., "invoke method foo on object bar") and not the assembly directly (e.g. "pop these three parameters onto the stack"). In other words, I'd like to be able to have high confidence that the operations I construct are semantically meaningful in the language. But I'll take what I can get.
If you're not sure if a candidate language meets these criteria, here's a simple litmus test:
Can you write another method called clean which:
accepts a method m as input
returns another method m2 that performs the same operations as m
such that m2 is identical to m, but doesn't contain any calls to the print-to-standard-out method in your language (puts, System.Console.WriteLn, println, etc.)?
I'd like to do some preliminary research now and figure out what the strongest candidates are. Having a large, active community is as important to me as the practicality of implementing what I want to do. I am aware that there may be some unforged territory here, since manipulating bytecode directly is not typically an operation that needs to be exposed.
What are the choices available to me? If possible, can you provide a toy example in one or more of the languages that you recommend, or point me to a recent example?
Update: The reason I'm after this is that I'd like to write a program which is capable of modifying itself at runtime in response to new information. This modification goes beyond mere parameters or configurable data, but full-fledged, evolved changes in behavior. (No, I'm not writing a virus. ;) )
Well, you could always use .NET and the Expression libraries to build up expressions. That I think is really your best bet as you can build up representations of commands in memory and there is good library support for manipulating, traversing, etc.
Well, those languages with really strong macro support (in particular Lisps) could qualify.
But are you sure you actually need to go this deeply? I don't know what you're trying to do, but I suppose you could emulate it without actually getting too deeply into metaprogramming. Say, instead of using a method and manipulating it, use a collection of functions (with some way of sharing state, e.g. an object holding state passed to each).
I would say Groovy can do this.
For example
class Foo {
void bar() {
println "foobar"
}
}
Foo.metaClass.bar = {->
prinltn "barfoo"
}
Or a specific instance of foo without effecting other instances
fooInstance.metaClass.bar = {->
println "instance barfoo"
}
Using this approach I can modify, remove or add expression from the method and Subsequent calls will use the new method. You can do quite a lot with the Groovy metaClass.
In java, many professional framework do so using the open source ASM framework.
Here is a list of all famous java apps and libs including ASM.
A few years ago BCEL was also very much used.
There are languages/environments that allows a real runtime modification - for example, Common Lisp, Smalltalk, Forth. Use one of them if you really know what you're doing. Otherwise you can simply employ an interpreter pattern for an evolving part of your code, it is possible (and trivial) with any OO or functional language.

How do I do automatic data serialization of data objects?

One of the huge benefits in languages that have some sort of reflection/introspecition is that objects can be automatically constructed from a variety of sources.
For example, in Java I can use the same objects for persisting to a db (with Hibernate), serializing to XML (with JAXB), and serializing to JSON (json-lib). You can do the same in Ruby and Python also usually following some simple rules for properties or annotations for Java.
Thus I don't need lots "Domain Transfer Objects". I can concentrate on the domain I am working in.
It seems in very strict FP like Haskell and Ocaml this is not possible.
Particularly Haskell. The only thing I have seen is doing some sort of preprocessing or meta-programming (ocaml). Is it just accepted that you have to do all the transformations from the bottom upwards?
In other words you have to do lots of boring work to turn a data type in haskell into a JSON/XML/DB Row object and back again into a data object.
I can't speak to OCaml, but I'd say that the main difficulty in Haskell is that deserialization requires knowing the type in advance--there's no universal way to mechanically deserialize from a format, figure out what the resulting value is, and go from there, as is possible in languages with unsound or dynamic type systems.
Setting aside the type issue, there are various approaches to serializing data in Haskell:
The built-in type classes Read/Show (de)serialize algebraic data types and most built-in types as strings. Well-behaved instances should generally be such that read . show is equivalent to id, and that the result of show can be parsed as Haskell source code constructing the serialized value.
Various serialization packages can be found on Hackage; typically these require that the type to be serialized be an instance of some type class, with the package providing instances for most built-in types. Sometimes they merely require an automatically derivable instance of the type-reifying, reflective metaprogramming Data class (the charming fully qualified name for which is Data.Data.Data), or provide Template Haskell code to auto-generate instances.
For truly unusual serialization formats--or to create your own package like the previously mentioned ones--one can reach for the biggest hammer available, sort of a "big brother" to Read and Show: parsing and pretty-printing. Numerous packages are available for both, and while it may sound intimidating at first, parsing and pretty-printing are in fact amazingly painless in Haskell.
A glance at Hackage indicates that serialization packages already exist for various formats, including binary data, JSON, YAML, and XML, though I've not used any of them so I can't personally attest to how well they work. Here's a non-exhaustive list to get you started:
binary: Performance-oriented serialization to lazy ByteStrings
cereal: Similar to binary, but a slightly different interface and uses strict ByteStrings
genericserialize: Serialization via built-in metaprogramming, output format is extensible, includes R5RS sexp output.
json: Lightweight serialization of JSON data
RJson: Serialization to JSON via built-in metaprogramming
hexpat-pickle: Combinators for serialization to XML, using the "hexpat" package
regular-xmlpickler: Serialization to XML of recursive data structures using the "regular" package
The only other problem is that, inevitably, not all types will be serializable--if nothing else, I suspect you're going to have a hard time serializing polymorphic types, existential types, and functions.
For what it's worth, I think the pre-processor solution found in OCaml (as exemplified by sexplib, binprot and json-wheel among others) is pretty great (and I think people do very similar things with Template Haskell). It's far more efficient than reflection, and can also be tuned to individual types in a natural way. If you don't like the auto-generated serializer for a given type foo, you can always just write your own, and it fits beautifully into the auto-generated serializers for types that include foo as a component.
The only downside is that you need to learn camlp4 to write one of these for yourself. But using them is quite easy, once you get your build-system set up to use the preprocessor. It's as simple as adding with sexp to the end of a type definition:
type t = { foo: int; bar: float }
with sexp
and now you have your serializer.
You wanted
to do lot of boring work to turn a data type in haskell into JSON/XML/DB Row object and back again into a data object.
There are many ways to serialize and unserialize data types in Haskell. You can use for example,
Data.Binary
Text.JSON
as well as other common formants (protocol buffers, thrift, xml)
Each package often/usually comes with a macro or deriving mechanism to allow you to e.g. derive JSON. For Data.Binary for example, see this previous answer: Erlang's term_to_binary in Haskell?
The general answer is: we have many great packages for serialization in Haskell, and we tend to use the existing class 'deriving' infrastructure (with either generics or template Haskell macros to do the actual deriving).
My understanding is that the simplest way to serialize and deserialize in Haskell is to derive from Read and Show. This is simple and isn't fullfilling your requirements.
However there are HXT and Text.JSON which seem to provide what you need.
The usual approach is to employ Data.Binary. This provides the basic serialisation capability. Binary instances for data types are easy to write and can easily be built out of smaller units.
If you want to generate the instances automatically then you can use Template Haskell. I don't know of any package to do this, but I wouldn't be surprised if one already exists.

Resources