Creating libraries from machine-readable specifications in Haskell

I have a specification and I wish to transform it into a library. I can write a program that writes out Haskell source. However, is there a cleaner way that would allow me to compile the specification directly (perhaps using templates)?
References to manuals and tutorials would be greatly appreciated.

Yes, you can use Template Haskell. There are a couple of approaches to using it.
One approach is to use quasiquotation to embed (parts of) the text of the specification in a quasiquote inside a source file. To implement it, you write a parser for the machine-readable specification that outputs Haskell AST. This might be useful if the specification is relatively static, if it makes sense to work with subsets of the specification, or if you want to manually map parts of the specification to different modules. It can also be useful, perhaps alongside another approach, as a way to give users of the library tools to express things in terms of the specification.
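As a sketch of the quasiquotation route, here is a toy quasiquoter that assumes a made-up line-oriented specification format of "name value" pairs; a real specification parser would replace parseLine:

    module SpecQQ (spec) where

    import Language.Haskell.TH
    import Language.Haskell.TH.Quote

    -- Toy quasiquoter: each "name value" line of the embedded
    -- specification becomes a top-level Integer binding.
    spec :: QuasiQuoter
    spec = QuasiQuoter
      { quoteExp  = const (fail "spec: declaration context only")
      , quotePat  = const (fail "spec: declaration context only")
      , quoteType = const (fail "spec: declaration context only")
      , quoteDec  = fmap concat . mapM parseLine . lines
      }

    parseLine :: String -> Q [Dec]
    parseLine l = case words l of
      []            -> pure []
      [name, value] -> (: []) <$>
        valD (varP (mkName name))
             (normalB (litE (integerL (read value))))
             []
      _             -> fail ("spec: cannot parse line: " ++ l)

A client module with QuasiQuotes enabled could then write [spec| maxRetries 5 |] at the top level and get a maxRetries binding generated at compile time.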
Another approach is to run IO in an ordinary Template Haskell splice. This lets you read the specification from a file (see addDependentFile in that case), from the network (don't do this), or run an arbitrary program to produce the Haskell AST you need. This might be more useful if the specification changes often, or if you want a strict separation between the specification and the code.
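A sketch of the splice route, under the same made-up "name value" format (specFromFile is an illustrative name, not a fixed API):

    module SpecSplice (specFromFile) where

    import Language.Haskell.TH
    import Language.Haskell.TH.Syntax (addDependentFile)

    -- Read the specification at compile time and emit one Integer
    -- binding per "name value" line.
    specFromFile :: FilePath -> Q [Dec]
    specFromFile path = do
      addDependentFile path          -- recompile when the spec file changes
      contents <- runIO (readFile path)
      concat <$> mapM gen (lines contents)
      where
        gen l = case words l of
          []            -> pure []
          [name, value] -> (: []) <$>
            valD (varP (mkName name))
                 (normalB (litE (integerL (read value))))
                 []
          _             -> fail ("unparseable spec line: " ++ l)

A client module with TemplateHaskell enabled would then say $(specFromFile "wire-format.spec") (a hypothetical file name).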
If it's much easier to produce Haskell source than Haskell AST, you can use a library like haskell-src-meta, which will parse a string of Haskell source into Template Haskell AST.
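For example, a minimal sketch (parseDecs is from Language.Haskell.Meta.Parse in the haskell-src-meta package):

    import Language.Haskell.TH (Dec, Q)
    import Language.Haskell.Meta.Parse (parseDecs)

    -- Splice generated Haskell source into a module by parsing it
    -- into Template Haskell declarations.
    decsFromSource :: String -> Q [Dec]
    decsFromSource src = case parseDecs src of
      Left err   -> fail ("could not parse generated source: " ++ err)
      Right decs -> pure decs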

Related

What do I need to learn to build an interpreter?

For my AQA A2-level Computing project, I've decided to create a basic interpreted programming language, outputting to the console. I don't know how to build an interpreter, but I have a copy of the purple dragon book, which is all about compiler design, and user166390 said in an answer to this question that the initial steps to building a compiler are the same as those for building an interpreter. My question is: is this true?
Can I use the techniques described in the dragon book to write an interpreter? And if so, which steps do I need to use and learn how to use?
Do I need to write a lexical analyser, a syntax analyser, a semantic analyser and an intermediate code generator, for example?
Could I get away with writing a basic parser that reads each line of the source code, parses it, and executes the instruction straight away, or is that a notoriously bad idea?
Yes, you can use the techniques described in the dragon book to write an interpreter.
You need a lexical analyzer and a parser regardless.
As others have pointed out, you do need to write the code to do actual execution -- but for a simple interpreter, this can be essentially the same as the syntax-directed translation described in the dragon book.
Everything else is optional.
If you want to skip straight from the parser to execution, you can. That will leave you with a very simple language, which can be both good and bad -- look at Tcl for an example of such a language.
If you want to interpret each line as you parse it, you can do that, too; this is what most command-line interpreters (Unix shells, Microsoft's cmd.exe and PowerShell) do, as well as interactive REPLs (read-eval-print loops) for languages like Python and Ruby.
"Semantic analyzer" seems vague to me, but sounds like it should include most kinds of load-time consistency checks. This is also optional, but there are advantages in an interpreter that won't take any old garbage and try to execute it as a program...
"Intermediate code" is also kind of vague, but it is arguably optional. If you aren't executing directly from the program string (as in Tcl), you need some kind of internal representation to store your code once you've read it in. One popular option is to execute from an internal tree structure, based more or less closely on your parse tree, which is arguably distinct from producing "intermediate code". On the other hand, if your "intermediate code" could be written out more or less directly from your internal tree structure, you might as well count the internal structure as your "intermediate code".
There are important issues that you haven't addressed; one that stands out is: how do you want to handle names? Presumably you will want the programmer to be able to define and use his own names (e.g., for variables, functions, and so forth), so you will need to implement some kind of mechanism for that.
Exactly how names are handled is a big design decision, with major implications for the usability and implementability of your language. The simplest option for implementation is to use a single, global hash map to implement a single, global namespace -- but note that this choice has well-known usability problems...
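A sketch of that simplest option (a single global map, in Haskell, with values restricted to Int purely for brevity):

    import qualified Data.Map as Map

    -- One global namespace: every name, whatever its role, lives in a
    -- single map. Easy to implement, with the usual collision and
    -- scoping problems.
    type GlobalEnv = Map.Map String Int

    define :: String -> Int -> GlobalEnv -> GlobalEnv
    define = Map.insert

    lookupName :: String -> GlobalEnv -> Either String Int
    lookupName name env =
      maybe (Left ("unbound name: " ++ name)) Right (Map.lookup name env)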
Could I get away with writing a basic parser that reads source code and executes the steps straight away?
You could but you'd be doing it the hard way.
Do I need to write a lexical analyser, a syntax analyser, a semantic analyser and an intermediate code generator, for example?
You can skip intermediate code generation unless you want to write a VM-based interpreter. Perl, for example, used to execute its parse graph directly; this is in contrast with Java or Python, which produce intermediate byte code.
The interpreter part of a VM-based language is generally simpler than an interpreter that has to walk a parse graph (so each component in the system is simpler), but the interpreter stack as a whole is generally simpler when you don't need to define an intermediate bytecode language. So pick your poison.
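To make the trade-off concrete, here is a toy sketch of the VM-based half (in Haskell, with an invented three-instruction stack bytecode; compare it with walking the tree directly):

    -- The same kind of toy AST a tree-walker would use.
    data Expr = Lit Int | Add Expr Expr | Mul Expr Expr

    -- An invented stack bytecode.
    data Instr = Push Int | IAdd | IMul

    compile :: Expr -> [Instr]
    compile (Lit n)   = [Push n]
    compile (Add a b) = compile a ++ compile b ++ [IAdd]
    compile (Mul a b) = compile a ++ compile b ++ [IMul]

    -- The VM loop itself is trivial; the cost is the extra compile
    -- stage and the bytecode design.
    run :: [Instr] -> [Int] -> Int
    run []            (result : _) = result
    run (Push n : is) stack        = run is (n : stack)
    run (IAdd   : is) (y : x : st) = run is (x + y : st)
    run (IMul   : is) (y : x : st) = run is (x * y : st)
    run _             _            = error "ill-formed bytecode"

    -- run (compile (Add (Lit 2) (Mul (Lit 3) (Lit 4)))) []  ==>  14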

Functional programming languages introspection

I'm sketching a design of something (machine learning of functions) that will preferably want a functional programming language, and also introspection: specifically, the ability to examine the program's own code in some nicely tractable format, and preferably also the ability to get machine-generated code compiled at runtime. I'm wondering what the best language to write it in would be. Lisp of course has strong introspection capabilities, but the statically typed languages also have advantages; the ones I'm considering are:
F# - the .NET platform has a good story here: you can read byte code at run time and also emit byte code and get it compiled; I assume there's no problem accessing these facilities from F#.
Haskell, OCaml - do these have similar facilities, either via byte code or parse tree?
Are there other languages I should also be looking at?
Haskell's introspection mechanism is Template Haskell, which supports compile-time metaprogramming and, when combined with e.g. llvm, provides runtime metaprogramming facilities.
OCaml has:
Camlp4 to manipulate OCaml concrete syntax trees in OCaml. The maintained implementation of Camlp4 is Camlp5.
MetaOCaml for full-scale multi-stage programming.
Ocamljit to generate native code at run time, but I don't think it's been maintained recently.
OCaml-Java to compile OCaml code for the Java virtual machine. I don't know whether it has nice reflection capabilities.
Not really an answer, but note also the F# Quotations feature and library, for more homoiconicity stuff.
You might check out the typed variant of Racket (previously known as PLT Scheme). It retains most of the syntactic simplicity of Scheme, but provides a static type system. Since Racket is a Scheme, metaprogramming is par for the course, and the runtime can emit native code by way of a JIT.
The Haskell approach would be more along the lines of parsing the source. The Haskell Platform includes a complete source parser, or you can use the GHC API to get access that way.
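For example, a minimal sketch using the haskell-src-exts package (one such source parser; the GHC API route is more involved):

    import Language.Haskell.Exts (ParseResult (..), parseModule)

    -- Parse a Haskell source file into a syntax tree and dump it;
    -- real introspection would walk the tree instead of printing it.
    inspect :: FilePath -> IO ()
    inspect path = do
      src <- readFile path
      case parseModule src of
        ParseOk ast         -> print ast
        ParseFailed loc err ->
          putStrLn ("parse error at " ++ show loc ++ ": " ++ err)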
I'd also look at Scala or Clojure, which bring with them all the libraries that have been developed for Java, so you'll rarely need to worry about a library not existing. But more to the point of your question, these languages give you the same reflection (or more powerful types) that you will find in Java.
I'm sketching a design of something (machine learning of functions) that will preferably want a functional programming language, and also introspection: specifically, the ability to examine the program's own code in some nicely tractable format, and preferably also the ability to get machine-generated code compiled at runtime. I'm wondering what the best language to write it in would be. Lisp of course has strong introspection capabilities, but the statically typed languages also have advantages; the ones I'm considering are:
Can you not just parse the source code like an ordinary interpreter or compiler? Why do you need introspection?
F# - the .NET platform has a good story here: you can read byte code at run time and also emit byte code and get it compiled; I assume there's no problem accessing these facilities from F#.
F# has a rudimentary quotation mechanism, but you can only quote some expressions and not other kinds of code, most notably type definitions. Also, its evaluation mechanism is orders of magnitude slower than genuine compilation, so it is basically completely useless. You can use reflection to analyze type definitions but, again, it is quite rudimentary.
You can read byte code, but it has been compiled, so a lot of information and structure has been lost.
F# also has lexing and parsing technology (most notably fslex, fsyacc and FParsec) but it is not as mature as OCaml's.
Haskell, OCaml - do these have similar facilities, either via byte code or parse tree?
Haskell has Template Haskell but I've never heard of anyone using it (abandonware?).
OCaml has its Camlp4 macro system, and a few people do use it, but it is poorly documented.
As for lexing and parsing, Haskell has a few libraries (most notably Parsec) and OCaml has many libraries.
Are there other languages I should also be looking at?
Term rewrite languages like Mathematica would be an obvious choice because they make it trivial to manipulate code. The Pure language might be of interest.
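In Haskell terms, the facility that term rewrite languages make primitive looks something like this toy sketch (terms as ordinary data, one simplification pass as pattern matching; all names invented):

    -- Code as data: a toy term language and one rewrite pass that
    -- simplifies x + 0 and x * 1 away.
    data Term = Var String | Lit Int | Op String Term Term
      deriving (Eq, Show)

    simplify :: Term -> Term
    simplify (Op "+" t (Lit 0)) = simplify t
    simplify (Op "*" t (Lit 1)) = simplify t
    simplify (Op op a b)        = Op op (simplify a) (simplify b)
    simplify t                  = t

    -- simplify (Op "+" (Var "x") (Lit 0))  ==>  Var "x"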
You might also consider MetaOCaml for its run-time compilation capabilities.

Is it possible to mark up all programming languages under the object-oriented paradigm using a common markup schema?

I have planned to develop a tool that converts a program written in one programming language (e.g. Java) to a common markup language (e.g. XML), and then converts that markup code to another language (e.g. C#).
In simple words, it is a programming language converter that converts a program written in one language to another language.
I think it is possible, but I don't know where to start. I want to know whether this is feasible, and about some existing systems.
What you are trying to do is extremely hard, but if you want to know what you are up for, I've listed the steps you need to follow below:
First the hard bit:
Step 1: Obtain or derive an operational semantics for your source and target languages.
Step 2: Enhance the semantics to capture your source and target memory models.
Step 3: Unify the two enhanced semantics within a common operational model.
Step 4: Define a mapping from your source language onto the common operational model.
Step 5: Define a mapping from your operational model onto your target language.
Step 4, as you pointed out in your question, is trivial.
Step 1 is difficult, as most languages do not have sufficiently formal semantics specified; but I recommend checking out http://lucacardelli.name/TheoryOfObjects.html as this is the best starting point for building a traditional OO semantics.
Step 2 is almost certainly impossible in general, but may be merely obscenely difficult if you are willing to sacrifice some efficiency.
Step 3 will depend on how clean the result of step 1 turned out, but is going to be anything from delicate and tricky to impossible.
Step 5 is not going to be trivial, it is effectively writing a compiler.
Ultimately, what you propose to do is impossible in general, due to the difficulties inherent in steps 1 and 2. However, it should be difficult but doable if you are willing to: severely restrict the source language constructs supported; pretty much forget about handling threads correctly; and pick two languages with sufficiently similar semantics (i.e. Java and C# are OK, but C++ and anything-else is not).
It depends on what languages you want to support, but in general this is a huge & difficult task unless you plan to only support a very small subset of each language.
The real problem is that each programming language has different features (with some areas that overlap and others that don't) and different ways of solving the same problems -- and it's pretty tricky to detect the problem the programmer is trying to solve and convert it to a new idiom. :) And think about the differences between GUIs created in different languages...
See http://xmlvm.org/ for an example (a project aimed at converting between the source code of many different languages, with an XML middle-point) -- the site covers in some depth the challenges they are tackling and the compromises they make; if you still have any interest in this kind of project, read it and then ask more specific follow-up questions.
Notice specifically what the output source code looks like -- it's not at all readable, maintainable, or efficient.
It is "technically easy" to produce XML for any single langauge: build a parser, construct and abstract syntax tree, and dump out that tree as XML. (I build tools that do this off-the-shelf for many languages). By technically easy, I mean that the community knows how to do this (see any compiler textbook, e.g., Aho&Ullman Dragon book). I do not mean this is a trivial exercise in terms of effort, because real languages are complicated and messy; there have been many attempts to build C++ parsers and few successes. (I have one of the successes, and it was expensive to get right).
What is really hard (and I don't try to do) is produce XML according to a single schema in which the language semantics are exposed. And without that, it will be essentially impossible to write a translator from a generic XML to an arbitrary target language. This is known as the UNCOL problem and people have been looking since 1958 for the answer. I note that the Wikipedia article seems to indicate the problem is solved, but you can't find many references to UNCOL in the literature since 1961.
The closest attempt I've seen to this is the OMG's "ASTM" model (http://www.omg.org/spec/ASTM/1.0/Beta1/); it exports XMI which is XML. But the ASTM model has lots of escapes built into it to allow langauges that it doesn't model perfectly (AFAIK, that means every language) to extend the XMI in arbitrary ways so that the language-specific information can be encoded. Consequently each language parser produces a custom version of the XMI, and thus each reader has to pretty much know about the extensions and full generality vanishes.

How to develop a domain-specific language on top of another language?

Say I found a good open source software library written in Python. I want to wrap some of the functions or methods that I have created in an easy-to-understand language of my own.
So porter_stemm(DOC) (the DSL) would be equivalent to a function or series of methods written in Python.
I want to create a DSL that is easy to learn, but I need this DSL translated into the original open source software's language (Python).
I'm not sure if I am being clear here, but my intention is to:
create an easy-to-learn language that users can use to solve a problem in a certain niche;
have this simple language translated, compiled, or interpreted via some middleware into the original open source software's language (Python).
Write your DSL's syntax and a parser for it, e.g., in Python, with pyparsing (simpler than the traditional lex-yacc approach) -- the parser can produce a tree of semantically meaningful nodes, and then you can (more simply) walk that tree and interpret it, or (a bit less simply) walk that tree and generate equivalent Python code. That's the approach I'd suggest for most host languages. Some host languages (Lisp and Scheme being the main ones) have powerful macros, so building a DSL out of macros is more common in those languages.
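The question is about Python, but the shape of the approach is the same in any host language: parse the DSL into a small tree of meaningful nodes, then walk the tree and dispatch each node to the underlying library. A toy sketch in Haskell, with an invented name(arg, ...) command syntax and a stand-in porterStem wrapper:

    import Data.Char (isSpace)

    -- One DSL command: a name applied to arguments, e.g. porter_stemm(DOC).
    data Cmd = Call String [String]
      deriving Show

    -- A deliberately naive parser for "name(arg1, arg2, ...)" lines.
    parseCmd :: String -> Maybe Cmd
    parseCmd line = case break (== '(') line of
      (name, '(' : rest)
        | not (null rest), last rest == ')' ->
            Just (Call (strip name) (splitArgs (init rest)))
      _ -> Nothing
      where
        strip = dropWhile isSpace . reverse . dropWhile isSpace . reverse
        splitArgs s = case break (== ',') s of
          (a, ',' : more) -> strip a : splitArgs more
          (a, _)          -> [strip a | not (null (strip a))]

    -- Interpreting a command means dispatching to the wrapped library;
    -- porterStem stands in for the real function.
    runCmd :: Cmd -> Either String String
    runCmd (Call "porter_stemm" [doc]) = Right (porterStem doc)
    runCmd (Call name _)               = Left ("unknown DSL command: " ++ name)

    porterStem :: String -> String
    porterStem = id   -- stand-in for the real stemmer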
Embedding your DSL in the host language basically means you're not really doing a DSL but a more traditional framework, so that's really a different approach (possibly more powerful, but may be not quite as easy to learn for non-programmers;-).
Typically a DSL would still have the same basic syntax as its host language, and simply extend the language's capabilities. So everything for you is still written in Python, it's just Python with some domain-specific functions, variables, etc.
My experience with DSLs is mostly with the Lisp family of languages. There, a typical DSL might be implemented with macros, which generate code patterns specific to the DSL, plus functions relating to the DSL. The point is, although the macros and functions may extend the capabilities of the language, they are still implemented in that language.
I suppose you could go a step farther and write an interpreter in Python - but there you still get Python runnability for free, as your interpreter translates your custom language back into Python.

Most dynamic dynamic programming language [closed]

It seems I've got to agree with this post when it states that
[...] code in dynamically typed languages follows static-typing conventions
Much dynamic language code I encounter does indeed seem to be quite static (thinking of PHP), whereas dynamic approaches look somewhat clumsy or unnecessary.
Most of the time, it's just about omitting type signatures, which, in the context of type-inference/structural typing, doesn't even have to imply dynamic typing at all.
So my question (and it's not meant to be too subjective) is: in which dynamic languages or fields of application are all these more advanced dynamic language features (that couldn't be replicated as easily in static/compiled languages) actually and idiomatically used?
Examples:
Reflection
First-class continuations
Runtime object alteration/generation
Metaprogramming
Run-time code evaluation
Non-existent member behaviour
What are useful applications for such techniques?
Some examples of widespread application of the above techniques are:
Continuations make their appearance in web frameworks like Rails or Seaside. They can be used to allow an API to fake a local context. In Seaside or Rails this makes the API behave much more like a local GUI form handler than an HTTP request handler, which serves to simplify the task of coding the application's user interface elements. However, although many dynamic languages have strong support for continuations they are certainly not unique to this type of language.
Reflection is quite widely used for O/R mappers and serialisation, but many statically typed languages support reflection as well. In duck-typed languages it can be used to find out at runtime whether a facility is implemented, by looking at the object's metadata. Some O/R mappers (and similar tools) work by intercepting accesses to instance variables and redirecting the updates to a cached record in the data access layer. This helps to make the persistence relatively transparent to the developer, as the field accesses look much like local variables.
Runtime object alteration is slightly useful (think monkey-patching) but mostly a gimmick. There aren't many really killer uses for it that come to mind immediately, but people certainly do use it. One possible use for it is fixing slightly broken behaviour when subclassing is not an option for some reason.
Metaprogramming is quite a fuzzy term, but arguably generics and C++ templates are an example of metaprogramming taking place in statically typed languages. In languages with metaclass support, custom metaclasses can be used to implement particular behaviours such as singletons or object registries. Another metaprogramming example is Smalltalk's #doesNotUnderstand: method, which is called on attempts to invoke nonexistent methods. The method name and parameters are supplied to the implementor of #doesNotUnderstand:, and can subsequently be used to construct a method invocation reflectively. Trapping this can be used (for example) to implement generic proxy mechanisms.
LISP programmers would argue that LISP is the most dynamic language of all due to its first class support for diddling directly with the parse trees of the code (known as 'macros'). This facility makes implementing DSLs trivial in LISP - and integrating them transparently into your code base.
All the features you enumerate are also available in statically typed languages, some with constraints.
Reflection: present in Java and C# (not type safe).
First-class continuations: restricted support in Scala (and maybe others).
Runtime object alteration: changing the type of an object is supported in a restricted form in C# with extension methods (will be in Java 7) and implicit type conversions in Scala. Although open classes are not supported, most of the use cases are covered by type conversions.
Metaprogramming: I would say metaprogramming is the heading for a lot of related features like reflection, type changes at runtime, AOP, etc.
So there is not a lot left that is supported only by dynamic languages. Reflection, for example, circumvents the type system, but it is useful in certain situations where that kind of flexibility is needed; the same is true in dynamic languages.
The open class feature supported by Ruby is something that compiled languages will never support. It is the most flexible form of metaprogramming possible (with all the implications: security, performance, maintainability). You can change the classes of the platform itself. Ruby on Rails uses it to create methods of domain objects from metadata on the fly. In a statically typed language you at least have to create (or generate the code of) the interface of your domain object.
If you're looking for the "most dynamic languages", homoiconic languages like LISP and Prolog are all good candidates. Interestingly, C# is somewhat homoiconic with the expression trees in LINQ.
You should visit Douglas Crockford's Wrrrld Wide Web and see his wizardry over Javascript. Javascript is usually written in a pretty straightforward and simple manner, like slightly simplified C. But that's only the surface. The immutable keywords are a small part of the language's power. Most of it lies in the objects and methods exported by the system, and these are fully mutable. You can replace or extend methods on the fly, you can replace pretty deeply rooted system methods, nest eval(), load generated <SCRIPT> on the fly, and so on. This is usable for writing all kinds of language extensions, frameworks, toolboxes and such. Instead of 200 lines of your program in straightforward Javascript, you write 50 lines that modify how Javascript works, and another 50 that use the new syntax to get the work done. You can generate whole pages on the fly, including the JS embedded in them. You turn webpage structure into data storage. You replace frequently used methods of popular objects, and of your own, to change their behavior on the fly, changing not only the looks but also the function of a webpage in one click.
It really feels like Javascript becomes a metalanguage for modifying the Javascript engine, making Javascript function like a different language; then you modify it further using the already modified version, until your actual, final app takes a dozen extremely intuitive lines that get the language to do exactly what it needs. Oh, and it patches the countless bugs and shortcomings of the Javascript implementation in MSIE along the way.
I won't claim Lisp is the "most dynamic" (I'm not even sure what that means), but Lisp programmers frequently do things that are difficult-to-impossible in other languages:
create new control structures
create new syntax for existing constructs (I think every metaclass I've ever seen has its own defwhatever form)
extend the runtime (every .emacs is a runtime extension, e.g., what would it take to write calendar-mode for another editor?)
Yegge talks about this some here, w.r.t. Emacs, e.g., parsing XML by converting it to s-expressions, writing functions for the tags you want to process, and actually running it.
Ultimately it's not languages that write dynamic code, it's programmers; and there's going to be a learning curve to adjust your patterns to styles you're not used to. So what types of work can make the best use of dynamic capabilities? The first that comes to my mind is middleware: interfaces among heterogeneous systems, especially those with imperfectly documented APIs or APIs that change a lot, and where data serialization is dynamic.
I'd say anywhere you see REST and JSON being applied, you're more likely to find dynamic code; for instance, JavaScript, PHP, Perl, Ruby, ... are popular at least partially because they are capable of dynamic adaptation.
Also, there's a lot of JavaScript browser code that deals with browser version and brand incompatibilities using dynamic techniques.
Yes, I feel JavaScript is a good one.
JavaScript is so flexible that people working in different languages have their own variants of it. Microsoft has the Ajax library, which has a typical .NET/C#-style syntax, and there are some JavaScript libraries that use $, which looks similar to PHP syntax. It's all there because JavaScript is a beauty. How many other languages can one name that facilitate something like this?
And one should know about JavaScript's closure feature, which is state of the art and helps create amazing algorithms with great results.

Resources