Implementing "Generator" support in a custom language

Implementing "Generator" support in a custom language - dsl

I've got a bit of fettish for language design and I'm currently playing around with my own hobby language. (http://rogeralsing.com/2010/04/14/playing-with-plastic/)
One thing that really makes my mind bleed is "generators" and the "yield" keyword.
I know C# uses AST transformation to transform enumerator methods into statemachines.
But how does it work in other languages?
Is there any way to get generator support in a language w/o AST transformation?
e.g. Does languages like Python or Ruby resort to AST transformations to solve this to?
(The question is how generators are implemented under the hood in different languages, not how to write a generator in one of them)

Generators are basically semi-coroutines with some annoying limitations. So, obviously, you can implement them using semi-coroutines (and full coroutines, of course).
If you don't have coroutines, you can use any of the other universal control flow constructs. There are a lot of control flow constructs that are "universal" in the sense that every control flow construct (including all the other universal control flow constructs), including coroutines and thus generators can be (more or less) trivially transformed into only that universal construct.
The most well-known of those is probably GOTO. With just GOTO, you can build any other control flow construct: IF-THEN-ELSE, WHILE, FOR, REPEAT-UNTIL, FOREACH, exceptions, threads, subroutine calls, method calls, function calls and so on, and of course also coroutines and generators.
Almost all CPUs support GOTO (although in a CPU, they usually call it jmp). In fact, in many CPUs, GOTO is the only control flow construct, although today native support for at least subroutine calls (call) and maybe some primitive form of exception handling and/or concurrency primitive (compare-and-swap) are usually also built in.
Another well-known control flow primitive are continuations. Continuations are basically a more structured, better manageable and less evil variant of GOTO, especially popular in functional languages. But there also some low-level languages that base their control flow on continuations, for example the Parrot Virtual Machine uses continuations for control flow and I believe there are even some continuation-based CPUs in some research lab somewhere.
C has a sort-of "crappy" form of continuations (setjmp and longjmp), that are much less powerful and less easy to use than "real" continuations, but they are plenty powerful enough to implement generators (and in fact, can be used to implement full continuations).
On a Unix platform, setcontext can be used as a more powerful and higher level alternative to setjmp/longjmp.
Another control flow construct that is well-known, but doesn't probably spring to mind as a low-level substrate build other control flow constructs on top of, are exceptions. There is a paper that shows that exceptions can be more powerful than continuations, thus making exceptions essentially equivalent to GOTO and thus universally powerful. And, in fact, exceptions are sometimes used as universal control flow constructs: the Microsoft Volta project, which compiled .NET bytecode to JavaScript, used JavaScript exceptions to implement .NET threads and generators.
Not universal, but probably powerful enough to implement generators is just plain tail call optimization. (I might be wrong, though. I unfortunately don't have a proof.) I think you can transform a generator into a set of mutually tail-recursive functions. I know that state machines can be implemented using tail calls, so I'm pretty sure generators can, too, since, after all, C# implements generators as state machines. (I think this works especially well together with lazy evaluation.)
Last but not least, in a language with a reified call stack (like most Smalltalks for example), you can build pretty much any kind of control flow constructs you want. (In fact, a reified call stack is basically the procedural low-level equivalent to the functional high-level continuation.)
So, what do other implementations of generators look like?
Lua doesn't have generators per se, but it has full asymmetric coroutines. The main C implementation uses setjmp/longjmp to implement them.
Ruby also doesn't have generators per se, but it has Enumerators, that can be used as generators. Enumerators are not part of the language, they are a library feature. MRI implements Enumerators using continuations, which in turn are implemented using setjmp/longjmp. YARV implements Enumerators using Fibers (which is how Ruby spells "coroutines"), and those are implemented using setjmp/longjmp. I believe JRuby currently implements Enumerators using threads, but they want to switch to something better as soon as the JVM gains some better control flow constructs.
Python has generators that are actually more or less full-blown coroutines. CPython implements them using setjmp/longjmp.

Related

Elixir and Haskell interoperability

As a platform for handling concurrent problems, Elixir/OTP seems to be the best suited solution.
When writing an application with a web interface, consider the case in which I want to reason about, and decouple, the application logic using another functional language - namely haskell (due to benefits like its advanced detection of errors at compile time, static typing, etc). I would then handle concurrency using GenServers, and attach a web interface using Phoenix.Channels.
Is this setup even possible using NIFs? Also, would true concurrency be maintained? I'm not sure that I'm following the correct line of reasoning here, but would a new haskell process be able to be spawned in line with GenServer demands, and would the two be able communicate efficiently?

This setup is certainly possible using NIFs and GHC's FFI with a small amount of boilerplate written in C. But NIFs are best used for short synchronous computations with no side effects and I get the feeling that that isn't what these operations are.
You'd probably be better off with C Nodes for the Haskell parts of the application. Most of the documentation you'll find for that will be for Erlang and not Elixir, but given Elixir's easy interop with Erlang, it should be pretty straight forward (someone's even written an example). Most of the hard work will be to do with writing a Haskell "C Node", for which a cursory glance at hackage and github turns up nothing.

What's the status of current Functional Reactive Programming implementations?

I'm trying to visualize some simple automatic physical systems (such things as pendulum, robot arms,etc.) in Haskell.
Often those systems can be described by equations like
df/dt = c*f(t) + u(t)
where u(t) represents some kind of 'intelligent control'. Those systems look to fit very nicely in the Functional Reactive Programming paradigm.
So I grabbed the book "The Haskell School of Expression" by Paul Hudak,
and found that the domain specific language "FAL" (for Functional Animation Language) presented there actually works quite pleasently for my simple toy systems (although some functions, notably integrate, seemed to be a bit too lazy for an efficient use, but easily fixable).
My question is, what's the more mature, up-to-date, well-maintained, performance-tuned alternative for more advanced, or even practical applications today?
This wiki page lists several options for Haskell, but I'm not clear about the following respects:
The status of "reactive", the project from Conal Eliott who is (as I understand it) one of the inventers of this programming paradigm, looks a bit stale. I love his code, but maybe I should try other more up-to-date alternatives? What's the primary difference between them, in terms of syntax/performance/runtime-stability?
To quote from a survey in 2011, Section 6, "... FRP implementations are still not efficient enough or predictable enough in performance to be used effectively in domains which require latency guarantees ...". Alghough the survey suggests some interesting possible optimizations, given the fact that FRP is there for more than 15 years, I get the impression that this performance problem might be something very or even inherently difficult to solve at least within a few years. Is this true?
The same author of the survey talks about "time leaks" in his blog. Is the problem unique to FRP, or something we are generally having when programming in a pure, non-strict language? Have you ever found it just too difficult to stabilize an FRP-based system over time, if not performant enough?
Is this still a research level project? Are the people like plant engineers, robotics engineers, financial engineers, etc. actually using them (in whaterver language that suits their needs)?
Although I personally prefer a Haskell implementation, I'm open to other suggestions. For example, it would be particularly fun to have an Erlang implementation --- it would then be very easy to have an intelligent, adaptive, self-learning server process!

Right now there are mainly two practical Haskell libraries out there for functional reactive programming. Both are maintained by single persons, but are receiving code contributions from other Haskell programmers as well:
Netwire focusses on efficiency, flexibility and predictability. It has its own event paradigm and can be used in areas where traditional FRP does not work, including network services and complex simulations. Style: applicative and/or arrowized. Initial author and maintainer: Ertugrul Söylemez (this is me).
reactive-banana builds on the traditional FRP paradigm. While it is practical to use it also serves as ground for classic FRP research. Its main focus is on user interfaces and there is a ready-made interface to wx. Style: applicative. Initial author and maintainer: Heinrich Apfelmus.
You should try both of them, but depending on your application you will likely find one or the other to be a better fit.
For games, networking, robot control and simulations you will find Netwire to be useful. It comes with ready-made wires for those applications, including various useful differentials, integrals and lots of functionality for transparent event handling. For a tutorial visit the documentation of the Control.Wire module on the page I linked.
For graphical user interfaces currently your best choice is reactive-banana. It already has a wx interface (as a separate library reactive-banana-wx) and Heinrich blogs a lot about FRP in this context including code samples.
To answer your other questions: FRP isn't suitable in scenarios where you need real-time predictability. This is largely due to Haskell, but unfortunately FRP is difficult to realize in lower level languages. As soon as Haskell itself becomes real-time-ready, FRP will get there, too. Conceptually Netwire is ready for real-time applications.
Time leaks aren't really a problem anymore, because they are largely related to the monadic framework. Practical FRP implementations simply don't offer a monadic interface. Yampa has started this and Netwire and reactive-banana both build on that.
I know of no commercial or otherwise large scale projects using FRP right now. The libraries are ready, but I think the people aren't – yet.

Although there are some good answers already, I'm going to attempt to answer your specific questions.
reactive is not usable for serious projects, due to time leak problems. (see #3). The current library with the most similar design is reactive-banana, which was developed with reactive as an inspiration, and in discussion with Conal Elliott.
Although Haskell itself is inappropriate for hard real-time applications, it is possible to use Haskell for soft realtime applications in some cases. I'm not familiar with current research, but I don't believe this is an insurmountable problem. I suspect that either systems like Yampa, or code generation systems like Atom, are possibly the best approach to solving this.
A "time leak" is a problem specific to switchable FRP. The leak occurs when a system is unable to free old objects because it may need them if a switch were to occur at some point in the future. In addition to a memory leak (which can be quite severe), another consequence is that, when the switch occurs, the system must pause while the chain of old objects is traversed to generate current state.
Non-switchable frp libraries such as Yampa and older versions of reactive-banana don't suffer from time leaks. Switchable frp libraries generally employ one of two schemes: either they have a special "creation monad" in which FRP values are created, or they use an "aging" type parameter to limit the contexts in which switches can occur. elerea (and possibly netwire?) use the former, whereas recent reactive-banana and grapefruit use the latter.
By "switchable frp", I mean one which implements Conal's function switcher :: Behavior a -> Event (Behavior a) -> Behavior a, or identical semantics. This means that the shape of the network can dynamically switch as it's run.
This doesn't really contradict #ertes's statement about monadic interfaces: it turns out that providing a Monad instance for an Event makes time leaks possible, and with either of the above approaches it's no longer possible to define the equivalent Monad instances.
Finally, although there's still a lot of work remaining to be done with FRP, I think some of the newer platforms (reactive-banana, elerea, netwire) are stable and mature enough that you can build reliable code from them. But you may need to spend a lot of time learning the ins and outs in order to understand how to get good performance.

I'm going to list a couple of items in the Mono and .Net space and one from the Haskell space that I found not too long ago. I'll start with Haskell.
Elm - link
Its description as per its site:
Elm aims to make front-end web development more pleasant. It
introduces a new approach to GUI programming that corrects the
systemic problems of HTML, CSS, and JavaScript. Elm allows you to
quickly and easily work with visual layout, use the canvas, manage
complicated user input, and escape from callback hell.
It has its own variant of FRP. From playing with its examples it seems pretty mature.
Reactive Extensions - link
Description from its front page:
The Reactive Extensions (Rx) is a library for composing asynchronous
and event-based programs using observable sequences and LINQ-style
query operators. Using Rx, developers represent asynchronous data
streams with Observables, query asynchronous data streams using LINQ
operators, and parameterize the concurrency in the asynchronous data
streams using Schedulers. Simply put, Rx = Observables + LINQ +
Schedulers.
Reactive Extensions comes from MSFT and implements many excellent operators that simplify handling events. It was open sourced just a couple of days ago. It's very mature and used in production; in my opinion it would have been a nicer API for the Windows 8 APIs than the TPL-library provides; because observables can be both hot and cold and retried/merged etc, while tasks always represent hot or done computations that are either running, faulted or completed.
I've written server-side code using Rx for asynchronocity, but I must admit that writing functionally in C# can be a bit annoying. F# has a couple of wrappers, but it's been hard to track the API development, because the group is relatively closed and isn't promoted by MSFT like other projects are.
Its open sourcing came with the open sourcing of its IL-to-JS compiler, so it could probably work well with JavaScript or Elm.
You could probably bind F#/C#/JS/Haskell together very nicely using a message broker, like RabbitMQ and SocksJS.
Bling UI Toolkit - link
Description from its front page:
Bling is a C#-based library for easily programming images, animations,
interactions, and visualizations on Microsoft's WPF/.NET. Bling is
oriented towards design technologists, i.e., designers who sometimes
program, to aid in the rapid prototyping of rich UI design ideas.
Students, artists, researchers, and hobbyists will also find Bling
useful as a tool for quickly expressing ideas or visualizations.
Bling's APIs and constructs are optimized for the fast programming of
throw away code as opposed to the careful programming of production
code.
Complimentary LtU-article.
I've tested this, but not worked with it for a client project. It looks awesome, has nice C# operator overloading that form the bindings between values. It uses dependency properties in WPF/SL/(WinRT) as event sources. Its 3D animations work well on reasonable hardware. I would use this if I end up on a project in need for visualizations; probably porting it to Windows 8.
ReactiveUI - link
Paul Betts, previously at MSFT, now at Github, wrote that framework. I've worked with it pretty extensively and like the model. It's more decoupled than Blink (by its nature from using Rx and its abstractions) - making it easier to unit test code using it. The github git client for Windows is written in this.
Comments
The reactive model is performant enough for most performance-demanding applications. If you are thinking of hard real-time, I'd wager that most GC-languages have problems. Rx, ReactiveUI create some amount of small object that need to be GCed, because that's how subscriptions are created/disposed and intermediate values are progressed in the reactive "monad" of callbacks. In general on .Net I prefer reactive programming over task-based programming because callbacks are static (known at compile time, no allocation) while tasks are dynamically allocated (not known, all calls need an instance, garbage created) - and lambdas compile into compiler-generated classes.
Obviously C# and F# are strictly evaluated, so time-leak isn't a problem here. Same for JS. It can be a problem with replayable or cached observables though.

Design patterns for building loosely-coupled systems in dynamic/scripting languages

I have lots of experience building enterprise apps using Java/C# and have become accustomed to all the trappings that come with object-oriented, statically typed languages. Specifically, I've become quite adept at dealing with system complexity by using the standard tools of the trade:
interfaces/abstract types
object composition
dependency inversion
I'm being asked to engineer a fairly complex back-end message processing system using a dynamic, functional language (Lua). Functional languages are all the rage these days (NodeJs, JavaScript etc), so I'm happy to use this as an opportunity to jump on said bandwagon.
Can anyone suggest a sample application or architecture I can use to learn about using things such as first class functions, closures, currying to build a complex, loosely coupled system?
Many thanks!

I will suggest looking on the libs/frameworks below, they are really well designed,
keep in mind that javascript and lua are very similar, just replace objects with
tables add coroutines and "nice" syntax and you have Lua.
Lua
Luvit node.js in Lua.
node.js
Express micro web framework.
Mocha Unit testing framework.

I've done a quite a bit of research on "design patterns" that can be applied in dynamic languages with first class function support, and here are my findings.
Currying == Dependency Injection. Currying allows you to take a function and repackage it as a new function with one or more of its parameter values already assigned. This is very similar to an IoC container instantiating a class "bootstrapped" with all it's dependencies and ready for consumption by clients.
First Class Functions == Command pattern. Since first class functions can be passed around like values, you basically get the Command pattern for free and without the overhead.
References:
First Class Functions == Command pattern
Functional Dependency Injection via Currying

Haskell, FFI, and Grand Central Dispatch?

I'm considering a functional language that will play well with my environment of C/Objective-C under FreeBSD, OSX, iOS. It looks like my best bet is to create functional-language libraries for specific functions, written in Haskell, compile with GHC, and use FFI to call this functional code as a standard C call.
My question is, how do I handle concurrency in this situation? One motivation for using a functional language is that for my problems where I want to operate on immutable datasets, I want to get a lot of parallelization going. However, using the approach I detail here, will I get ANY parallelization? It appears I can compile and dictate to use 2 threads, but is there any way to use GCD instead of threading (for all the reasons GCD is better than threads, such as the amount of parallelization automatically scaling per-platform)? Or, going with FFI as I describe, do I completely lose the ability to multithread?
This language seems like the best match for what I'm trying to do, but I want to learn if it's the right fit before I devote a significant amount of time to truly learn it

GHC's runtime replaces the need for GCD, doesn't it? And it already provides cross-platform parallelism.

Why are functional languages considered a boon for multi threaded environments?

I hear a lot about functional languages, and how they scale well because there is no state around a function; and therefore that function can be massively parallelized.
However, this makes little sense to me because almost all real-world practical programs need/have state to take care of. I also find it interesting that most major scaling libraries, i.e. MapReduce, are typically written in imperative languages like C or C++.
I'd like to hear from the functional camp where this hype I'm hearing is coming from..

It's important to add one word: "there's no shared state".
Any meaningful program (in any language) changes the state of the world. But (some) functional languages make it impossible to access the same resource from multiple threads simultaneously. The absence of shared state makes multithreading safe.

Functional languages such as Haskell, Scheme and others have what are called "pure functions". A pure function is a function with no side effects. It doesn't modify any other state in the program. This is by definition threadsafe.
Of course you can write pure functions in imperative languages. You also find multi-paradigm languages like Python, Ruby and even C# where you can do imperative programming, functional programming or both.
But the point of Haskell (etc) is that you can't write a non-pure function. Well that's not strictly true but it's mostly true.
Similarly, many imperative languages have immutable objects for much the same reason. An immutable object is one whose state doesn't change once created. Again by definition an immutable object is threadsafe.

You're talking about two different things and don't realize it.
Yes, most real-world programs have state somewhere, but if you want to do multithreading, that state should not be everywhere, and in fact, the fewer places it's in, the better. In functional programs, the default is not to have state, and you can introduce state exactly where you need it and nowhere else. Those parts that are dealing with state will not be as easily multithreaded, but since all the rest of your program is free of side-effects and thus it doesn't matter what order those parts are executed in, it removes a huge barrier to parallelization.

However, this makes little sense to me because almost all real-world
practical programs need/have state to take care of.
You'd be surprised! Yes, all programs need some state (I/O in particular) but often you don't need much more. Just because most programs have heaps of state doesn't mean they need it.
Programming in a functional language encourages you to use less state, and thus your programs become easier to parallelise.
Many functional languages are "impure" which means they allow some state. Haskell doesn't, but Haskell has monads which basically let you get something from nothing: you get state using stateless constructs. Monads are a bit fiddly to work with which is why Haskell gives you a strong incentive to restrict state to as small a part of your program as possible.
I also find it interesting that most major scaling libraries, i.e.
MapReduce, are typically written in imperative languages like C or C++.
Programming concurrent applications is "hard" in C/C++. That's why it's best to do all the dangerous stuff in a library which is heavily tested and inspected. But you still get the flexibility and performance of C/C++.

Higher order functions. Consider a simple reduction operation, summing the elements of an array. In an imperative language, programmers typically write themselves a loop and perform reductions one element at a time.
But that code isn't easy to make multi-threaded. When you write a loop you're assuming an order of operations and you have to spell out how to get from one element to the next. You'd really like to just say "sum the array" and have the compiler, or runtime, or whatever, make the decision about how to work through the array, dividing up the task as necessary between multiple cores, and combining those results together. So instead of writing a loop, with some addition code embedded inside it, an alternative is to pass something representing "addition" into a function that can do the divvying. As soon as you do that, you're writing functionally. You're passing a function (addition) into another function (the reducer). If you write this way then it not only makes more readable code, but when you change architecture, or want to write for heterogeneous architecture, you don't have to change the summer, just the reducer. In practice you might have many different algorithms that all share one reducer so this is a big payoff.
This is just a simple example. You may want to build on this. Functions to apply other functions on 2D arrays, functions to apply functions to tree structures, functions to combine functions to apply functions (eg. if you have a hierarchical structure with trees above and arrays below) and so on.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string