This question is about design and is fairly open-ended.
I'd like to use OpenCV, a large C++ library, from Haskell.
The closest solution at the moment is probably Arjun Comar's attempt to adapt the Python / Java binding generator.
See here, here, and here.
His approach generates a C interface, which is then wrapped using hsc2hs.
Due to OpenCV's lack of referential transparency in its API, as well as its frequent use of call parameters for output, Arjun's approach can only fully succeed if he defines a new API for OpenCV and implements it in terms of the existing one.
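To make that concrete, here is a minimal sketch of the kind of wrapper involved; the C symbols (cv_mat_create, cv_blur) are hypothetical stand-ins for a flat C shim over the C++ API:

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign.Ptr (Ptr)

    -- opaque handle to a cv::Mat living on the C++ side
    data CMat

    -- hypothetical C shim functions over the C++ API
    foreign import ccall unsafe "cv_mat_create"
      c_matCreate :: IO (Ptr CMat)

    -- note the output parameter: the result is written into dst
    foreign import ccall unsafe "cv_blur"
      c_blur :: Ptr CMat -> Ptr CMat -> IO ()

    -- a more functional wrapper: allocate the destination internally
    -- and return it, hiding the output parameter from callers
    blur :: Ptr CMat -> IO (Ptr CMat)
    blur src = do
      dst <- c_matCreate
      c_blur src dst
      return dst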
So, it seems it might not be too much extra work to go whole-hog and define an API using an interface description language (IDL), such as SWIG, protobuf-with-RPC, or Apache Thrift.
This would provide interfaces to a number of languages besides Haskell.
My questions:
Is there anything better than SWIG for a server-free solution?
(I just want to call into C++; I'd rather not go through a local server.)
If there's no good server-free solution, should I use protobuf-with-RPC or Thrift?
Related: How good is Thrift's Haskell support?
From the code, it looks like it needs updating (I see references to GHC 6).
Related: What's a good protobuf-with-RPC solution?
With Apache Thrift, you get Haskell support. You are correct that the generated code is not always the latest, but that rarely matters: you can do the complex things at other abstraction levels and keep the messaging level as simple as possible.
Google Protobuf has no Haskell support, and neither does SWIG. With Protobuf you get C++, Java, JavaScript and Python, to my knowledge the main languages at Google. Have a look at this presentation. Without contest, Thrift and Protobuf are the best of the in-house options.
It seems in your case you have to go with Thrift, as it supports Haskell.
It sounds like the foreign function interface for C++ is what you want:
Hackage,
GitHub
Disclaimer: I haven't used it, only heard good things about it.
In Common Lisp, programs are often produced as binaries with a translator bundled inside. StumpWM is a good example.
How would one do the same with Haskell and OCaml?
It is not necessary to provide a debugger as well, as Common Lisp does; the aim is to allow extensions without depending on the whole compiler package (unlike xmonad, which requires GHC).
P.S. I know about ocamlmktop, and it works great, except I don't really understand why it requires pervasives.cmi rather than bundling it into the binary. So the best I can do is mycustomtoplevel -I /path/to/dir/with/pervasives.cmi/. Is there any way to override it?
This isn't really possible for (GHC) Haskell: you would either need to ship the application binary plus GHC so you can extend via the GHC API, or embed an extension language. I don't think there are any "off-the-shelf" extension languages to embed in Haskell at the moment, though HsLua might be close; it is a bridge to the standard (C source) Lua. There was a thread on Haskell-cafe last month about extension languages written in Haskell, and I think the answer was "there aren't any".
http://www.haskell.org/pipermail/haskell-cafe/2010-November/085830.html
With GHC, there is the GHC API, which allows you to embed ghci-like interpreters in your program. It is a quite low-level and frequently changing library, since it simply provides access to GHC internals.
Then there is Hint, a library that aims to encapsulate the GHC API behind a well-designed and more stable interface.
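For a sense of scale, evaluating an expression through Hint takes only a few lines (a minimal sketch):

    import Language.Haskell.Interpreter

    main :: IO ()
    main = do
      -- interpret a Haskell expression at runtime, ghci-style
      result <- runInterpreter $ do
        setImports ["Prelude"]
        interpret "map (*2) [1, 2, 3]" (as :: [Int])
      print result  -- Right [2,4,6] on success, Left <error> otherwise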
Nevertheless, I've recently switched from using either of these packages to using an external ghci. The external ghci process is controlled via standard input/output pipes. This change made it easy to stay compatible with GHC 6.12.x and 7.0.x, while our ghc-api code broke with GHC 7.x and hint didn't work out of the box either. I don't know whether there is a new version of hint available, which works with GHC 7.
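The external-ghci approach needs nothing beyond System.Process; roughly (a sketch with no error handling, and output parsing omitted):

    import System.IO
    import System.Process

    main :: IO ()
    main = do
      -- start ghci with stdin/stdout connected to pipes
      (Just hin, Just hout, _, ph) <- createProcess
        (proc "ghci" []) { std_in = CreatePipe, std_out = CreatePipe }
      hSetBuffering hin LineBuffering
      -- feed it input as if a user were typing at the prompt
      hPutStrLn hin "1 + 2"
      hPutStrLn hin ":quit"
      hGetContents hout >>= putStr
      _ <- waitForProcess ph
      return ()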
For Ocaml, have you tried using findlib? See the section Custom Toploops.
I have a large program written with my own patched version of the GNU Eiffel (SmallEiffel) compiler. While I love the language, I'm running into the problem that the compiler is O(n^2) or worse in the size of the compiled system, so I have to move soon.
ISE Eiffel, the only Eiffel compiler still alive, is not an option for various reasons, mostly because the compiled code runs far too slowly.
I'm looking for a language which is:
imperative and OO
has generics/templates
compiles to native code and does not require .NET/Java
statically typed (which means fast)
garbage collected
cross platform
not as ugly and braindead as C++
I couldn't come up with anything other than D, but that looks a bit too low-level and unstable. Is there really no language that satisfies these seven points?
OCaml, perhaps?
You could write in Java and compile to native-ish code with GCJ (it will be native code, but you'll need to link against a fair portion of code that makes up all the things Java needs at run-time. Your users will not need to install a JRE.)
Googling 'object oriented native code compiler' brings up Objective Caml before Eiffel.
If you're willing to take your chances on a research compiler, check out the Diesel language and the native-code Vortex compiler (written for Diesel in Diesel). It is a research project, but it is stable, and Craig Chambers is one of the best people in the business.
What about Python?
It is an OO scripting language; it runs fast and has generic templates.
Let's assume for the moment that C++ is not a functional programming language. If you want to write a compiler using LLVM for the back-end, and you want to use a functional programming language and its bindings to LLVM to do your work, you have two choices as far as I know: Objective Caml and Haskell. If there are others, then I'd like to know about those too.
I'm not asking for subjective opinions, so please don't give this the subjective tag. I want to make up my own mind about this, but I'm not sure I know what are all the trade-offs. So, StackOverflow to the rescue. What are the trade-offs?
Either OCaml or Haskell would be a good choice. Why not check out the LLVM tutorials for each language? The LLVM tutorial for OCaml is here: http://llvm.org/docs/tutorial/OCamlLangImpl1.html
Haskell has more momentum these days, but there are plenty of good parsing libraries for OCaml as well, including the PEG parser generator Aurochs, Menhir, and the GLR parser generator Dypgen. Also check out this presentation on pcl, a monadic parser combinator library for OCaml (like Parsec for Haskell); there's some good info in there comparing Haskell's and OCaml's approaches: http://osp.janestreet.com/files/pcl.pdf
Some will say that laziness gives Haskell the edge in parsing, but you can get laziness in OCaml as well.
Haskell has higher level bindings to LLVM than OCaml (the Haskell ones provide some interesting type safety guarantees) and Haskell has by far more libraries to use (1700 packages on http://hackage.haskell.org) making it easier to glue together components.
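As a taste of that type safety, with the llvm package's typed EDSL (the 0.x-era API; details may have shifted between versions) the LLVM signature of a generated function is tracked in its Haskell type:

    import Data.Int (Int32)
    import LLVM.Core

    -- the Haskell type mirrors the LLVM signature being generated;
    -- ill-typed operand combinations are rejected by GHC at compile time
    mAddMul :: CodeGenModule (Function (Int32 -> Int32 -> Int32 -> IO Int32))
    mAddMul = createFunction ExternalLinkage $ \x y z -> do
      t <- add x y
      r <- mul t z
      ret r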
Availability of native bindings need not constrain your choice of language. There is a third option, apart from using bindings or generating IR text directly:
You can use a language-neutral serialization format, such as Google's Protocol Buffers, to serve as the bridge from your front-end to your back-end. Protocol buffers are, after all, just ASTs in disguise.
Your front end, implemented in a functional language, then does what it is best at -- parsing, type checking, desugaring, core-to-core transformations, etc -- and the C++ backend takes the IR from your frontend and uses LLVM's feature-complete-by-definition native C++ API to do lowering from your-language-IR to LLVM IR. This makes it much easier to handle "advanced" features of LLVM such as debug metadata.
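A toy sketch of that pipeline shape, with Data.Binary standing in for the protobuf-generated message types purely to keep the example self-contained (a real bridge would serialize a message type generated from a .proto, which the C++ backend decodes with the matching generated classes):

    {-# LANGUAGE DeriveGeneric #-}
    import Data.Binary (Binary, encode)
    import qualified Data.ByteString.Lazy as BL
    import GHC.Generics (Generic)

    -- the frontend's core IR, after parsing, type checking and desugaring
    data Expr = Lit Integer | Add Expr Expr
      deriving (Generic, Show)

    instance Binary Expr

    -- serialize the IR and hand it to the backend over a file or pipe
    main :: IO ()
    main = BL.writeFile "ir.bin" (encode (Add (Lit 1) (Lit 2)))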
I'm using this strategy with hprotoc and associated Haskell bindings for protocol buffers, and am very happy with the results. There is much to be said for using the right tool for the job!
OCaml is the only functional language with bindings in the LLVM distro itself and documentation on llvm.org such as the Kaleidoscope tutorial. If you have OCaml installed when you build and install LLVM then it will automatically build and install the LLVM bindings for OCaml as well. Moreover, these OCaml bindings have been in use for years so they are mature and reliable.
I have been developing HLVM in OCaml using the standard LLVM bindings and found OCaml+LLVM to be an extremely powerful combination. HLVM provides tuples, arrays, unions, TCO of all tail calls, generic printing, FFI to C, JIT compilation and parallel garbage collection with a VM weighing in at under 2kLOC of OCaml code that took only a few man-weeks to develop from scratch. HLVM's numerical performance already far exceeds that of today's fastest open source FPLs including OCaml itself. I have published articles in the OCaml Journal describing how LLVM can be used from OCaml for everything from basic expression evaluation to advanced topics such as parallelism and garbage collection. You may also like this mini example.
I need to get around to writing that programming language I've been meaning to write. How do you kids do it these days? I've been out of the loop for over a decade; are you doing it any differently now than we did back in the pre-internet, pre-windows days? You know, back when "real" coders coded in C, used the command line, and quibbled over which shell was superior?
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily) but how do you build the compiler and standard libraries and so forth? What tools do you kids use these days?
One consideration that's new since the punched card era is the existence of virtual machines already bountifully provided with "standard libraries." Targeting the JVM or the .NET CLR instead of ye olde "language walled garden" saves you a lot of bootstrapping. If you're creating a compiled language, you may also find Java byte code or MSIL an easier compile target than machine code (of course, if you're in this for the fun of creating a tight optimising compiler then you'll see this as a bug rather than a feature).
On the negative side, the idioms of the JVM or CLR may not be what you want for your language. So you may still end up building "standard libraries" just to provide idiomatic interfaces over the platform facilities. (An example is that every language and its dog seems to provide its own method for writing to the console, rather than leaving users to manually call System.out.println or Console.WriteLine.) Nevertheless, it enables incremental development of the idiomatic libraries, and means that the more obscure libraries for which you never get round to building idiomatic interfaces are still accessible, even if in an ugly way.
If you're considering an interpreted language, .NET also has support for efficient interpretation via the Dynamic Language Runtime (DLR). (I don't know if there's an equivalent for the JVM.) This should help free you up to focus on the language design without having to worry so much about the optimisation of the interpreter.
I've written two compilers now in Haskell for small domain-specific languages, and have found it to be an incredibly productive experience. The parsec library makes playing with syntax easy, and interpreters are very simple to write over a Haskell data structure. There is a description of writing a Lisp interpreter in Haskell that I found helpful.
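As an illustration of how compact this gets, here is a toy Parsec parser for sums of integers (a minimal sketch):

    import Text.Parsec
    import Text.Parsec.String (Parser)

    data Expr = Num Integer | Add Expr Expr
      deriving Show

    -- integers separated by '+', folded into a left-associative tree
    expr :: Parser Expr
    expr = term `chainl1` (Add <$ char '+')
      where term = Num . read <$> many1 digit

    -- parses "1+2+3" as Add (Add (Num 1) (Num 2)) (Num 3)
    main :: IO ()
    main = print (parse expr "<demo>" "1+2+3")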
If you are interested in a high-performance backend, I recommend LLVM. It has a concise and elegant byte-code and the best x86/amd64 generating backend you can find. There is an optional garbage collector, and some experimental backends that target the JVM and CLR.
You can write a compiler in any language that produces LLVM bytecode. If you are adventurous enough to learn Haskell but want LLVM, there are a set of Haskell-LLVM bindings.
What has changed considerably but hasn't been mentioned yet is IDE support and interoperability:
Nowadays we pretty much expect Intellisense, step-by-step execution and state inspection "right in the editor window", new types that tell the debugger how to treat them and rather helpful diagnostic messages. The old "compile .x -> .y" executable is not enough to create a language anymore. The environment is nothing to focus on first, but affects willingness to adopt.
Also, libraries have become much more powerful; no one wants to implement all of that in yet another language. Try to borrow: make it easy to call existing code, and make it easy to be called by other code.
Targeting a VM - as itowlson suggested - is probably a good way to get started. If that turns out a problem, it can still be replaced by native compilers.
I'm pretty sure you do what's always been done.
Write some code, and show your results to the world.
As compared to the olden times, there are some tools to make your job easier though. Might I suggest ANTLR for parsing your language grammar?
Speaking as someone who just built a very simple assembly-like language and interpreter, I'd start out with the .NET framework or similar. Nothing can beat the powerful syntax of C# plus the backing of the entire .NET community when attempting to write most things. From here I designed a simple bytecode format and assembly syntax and proceeded to write my interpreter and assembler.
Like I said, it was a very simple language.
You should not accept wimpy solutions like using the latest tools. You should bootstrap the language by writing a minimal compiler in Visual Basic for Applications or a similar language, then write all the compilation tools in your new language and then self-compile it using only the language itself.
Also, what is the proposed name of the language?
I think recently there have not been languages with ALL CAPITAL LETTER names like COBOL and FORTRAN, so I hope you will call it something like MIKELANG with all capital letters.
Not so much an implementation as a design decision that affects implementation: if you make every statement of your language have a unique parse tree without context, you'll get something for which it's easy to hand-code a parser, and which doesn't require a large amount of work to provide syntax highlighting for. Similarly, simple things like using different symbols for module namespaces and object namespaces (unlike Java, which uses . for both package and class namespaces) mean you can parse the code without loading every module it refers to.
Standard libraries - include the equivalent of everything in C99 standard libraries other than setjmp. Add whatever else you need for your domain. Work out an easy way to do this, either something like SWIG or an in-line FFI such as Ruby's [can't remember module name] and Python's ctypes.
Building as much of the language in the language itself is an option, but projects that start out that way either give up (Rubinius moved to using C++ for parts of its standard library) or remain research projects (Mozilla Narcissus).
I am actually a kid, haha. I've never written an actual compiler before or designed a language, but I have finished The Red Dragon Book, so I suppose I have somewhat of an idea (I hope).
It would depend firstly on the grammar. If it's LR or LALR I suppose tools like Bison/Flex would work well. If it's more LL, I'd use Spirit, which is a component of Boost. It allows you to write the language's grammar in C++ in an EBNF-like syntax, so no muddling around with code generators; the C++ compiler compiles the grammar for you. If any of these fail, I'd write an EBNF grammar on paper, and then proceed to do some heavy recursive descent parsing, which seems to work; if C++ can be parsed pretty well using RDP (as GCC does it), then I suppose with enough unit tests and patience you could write entire compilers using RDP.
Once I have a parser running and some sort of intermediate representation, it then depends on how it runs. If it's some bytecode or native code compiler, I'll use LLVM or libJIT to process it. LLVM is more suited for general compilation, but I like the libJIT API and documentation better. Alternatively, if I'm really lazy, I'll generate C code and let GCC do the actual compilation. Another alternative, is to target an existing VM, like Parrot or the JVM or the CLR. Parrot is the VM being designed for Perl. If it's just an interpreter, I'll walk the syntax tree.
A radical alternative is to use Prolog, which has syntax features which remarkably simulate EBNF. I have no experience with it though, and if I am not wrong (which I am almost certainly going to be), Prolog would be quite slow if used to parse heavy duty programming languages with a lot of syntactical constructs and quirks (read: C++ and Perl).
All this I'll do in C++, if only because I am more used to writing in it than C. I'd stay away from Java/Python or anything of that sort for the actual production code (writing compilers in C/C++ help to make it portable), but I could see myself using them as a prototyping language, especially Python, which I am partial towards. Of course, I've never actually done any of this before, so I'm not one to say.
On lambda-the-ultimate there's a link to Create Your Own Programming Language by Marc-André Cournoyer, which appears to describe how to leverage some modern tools for creating little languages.
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily)
Just a hint: look at some quite different languages first, before designing a new language (i.e. languages with a very different evaluation strategy). Haskell and Oz come to mind, though you should also know Prolog and Scheme. A year ago I too was like "hey, let's design a language that behaves exactly as I want", but fortunately I looked at those other languages first (or you could also say unfortunately, because now I don't know how I want a language to behave anymore...).
Before you start creating a language you should read this:
Hanspeter Moessenboeck, The Art of Niklaus Wirth
ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf
There's a big shortcut to implementing a language that I don't see in the other answers here. If you use one of Łukasiewicz's "unparenthesized" forms (i.e. forward Polish or reverse Polish) you don't need a parser at all! With reverse Polish, the dependencies go right-to-left, so you simply execute each token as it's scanned. With forward Polish, it's the reverse of that, so you actually execute the program "backwards", simplifying subexpressions until reaching the starting token.
To understand why this works, you should investigate the three primary tree-traversal algorithms: pre-order, in-order, post-order. These three traversals are the inverse of the parsing task that a language reader (i.e. parser) has to perform. Only the in-order notation "requires" a recursive descent to reconstruct the expression tree; with the other two, you can get away with just a stack.
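Here is a minimal sketch of the reverse-Polish case in Haskell, where the "parser" is nothing but a stack:

    -- evaluate a reverse-Polish expression such as "3 4 + 2 *"
    -- with a single left-to-right scan and a stack: no parse tree needed
    evalRPN :: String -> Maybe Double
    evalRPN = go [] . words
      where
        go [result] [] = Just result
        go (y:x:st) (t:ts)               -- operators pop two, push one
          | t == "+" = go (x + y : st) ts
          | t == "-" = go (x - y : st) ts
          | t == "*" = go (x * y : st) ts
          | t == "/" = go (x / y : st) ts
        go st (t:ts) = case reads t of   -- anything else must be a number
          [(n, "")] -> go (n : st) ts
          _         -> Nothing
        go _ _ = Nothing                 -- leftover or missing operands

    main :: IO ()
    main = print (evalRPN "3 4 + 2 *")   -- Just 14.0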
This may require more "thinking" and less "implementing".
BTW, if you've already found an answer (this question is a year old), you can post that and accept it.
Real coders still code in C. Just that it's a little sharper.
Hmmm... language design? or writing a compiler?
If you want to write a compiler, you'd use Flex + Bison. (google)
Not an easy answer, but..
You essentially want to define a set of rules written in text (tokens) and then some parser that checks these rules and assembles them into fragments.
http://www.mactech.com/articles/mactech/Vol.16/16.07/UsingFlexandBison/
People can spend years on this. The above article talks about using two tools, Flex and Bison, that can be used to turn text into code you can feed to a compiler.
First I spent a year or so actually thinking about how the language should look. At the same time I helped develop Ioke (www.ioke.org) to learn language internals.
I chose Objective-C as the implementation platform as it's fast (enough), simple, and a rich language. It also provides a test framework, so an agile approach is a go, and it has a rich standard library I can build upon.
Since my language is simple on syntactic level (no keywords, only literals, operators and messages) I could go with Ragel (http://www.complang.org/ragel/) for building scanner. It's fast as hell and simple to use.
Now I have a working object model, scanner, and simple operator shuffling, plus standard library bootstrap code. I can even run simple programs, as long as they fit in one file that is :)
Of course older techniques are still common (e.g. using Flex and Bison), but many newer language implementations combine the lexing and parsing phases by using a parser based on a parsing expression grammar (PEG). This works for recursive descent parsers created using combinators, or for memoizing packrat parsers. Many compilers are also built using the ANTLR framework.
Use Bison/Flex, the GNU versions of yacc/lex. This book is extremely helpful.
The reason to use Bison is that it catches any conflicts in the language. I used it and it made my life many years easier (OK, so I'm in my second year, but the first six months were a few years ago, writing it in C++, and the parsing/conflicts/results were terrible!).
If you want to write a compiler obviously you need to read the Dragon Book ;)
Here is another good book that I have just read, Language Implementation Patterns; it is practical and easier to understand than the Dragon Book:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0
Mike --
If you're interested in an efficient native-code-generating compiler for Windows so you can get your bearings -- without wading through all the unnecessary widgets, gadgets, and other nonsense that clutter today's machines -- I recommend the Osmosian Order's Plain English development system. It includes a unique interface, a simplified file manager, a friendly text editor, a handy hexadecimal dumper, the compiler/linker (of course), and a WYSIWYG page-layout application for documentation. Written entirely in Plain English, it is a quick download (less than a megabyte), small enough to understand in short order (about 25,000 lines of Plain English code, with just 4,000 in the compiler/linker), yet powerful enough to reproduce itself on a bottom-of-the-line Dell in less than three seconds. Really: three seconds. And it's free to all who write and ask for a copy, including the source code and a rather humorous tongue-in-cheek 100-page manual. See www.osmosian.com for details on how to get a copy, or write to me directly with questions or comments: Gerry.Rzeppa#pobox.com