Expression trees vs IL.Emit for runtime code specialization - c#-4.0

I recently learned that it is possible to generate code at runtime in C# and I would like to put this feature to use. I have code that does some very basic geometric calculations, like computing line-plane intersections, and many of the calculations are performed for the same plane or the same line over and over again. By generating specialized code for the methods that compute the intersections, I think I should be able to gain some performance.
The problem is that I'm not sure where to begin. From reading a few blog posts and browsing MSDN documentation I've come across two possible strategies for generating code at runtime: expression trees and IL.Emit. Using expression trees seems much easier because there is no need to learn anything about OpCodes and various other MSIL-related intricacies, but I'm not sure whether expression trees are as fast as manually generated MSIL. So are there any suggestions on which method I should go with?
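To make this concrete, here is roughly what I have in mind for the expression-tree route. This is only a minimal sketch under my own assumptions: the plane is stored as a normal (nx, ny, nz) and offset d with plane equation n·p = d, the ray is origin + t * direction, and the class and method names are made up for illustration.

    using System;
    using System.Linq.Expressions;

    static class PlaneSpecializer
    {
        // Builds a delegate computing the ray parameter t at which the ray
        // origin + t * direction hits the fixed plane n·p = d. The plane
        // components become constants in the tree, so the compiled delegate
        // has them baked in instead of reading them from fields on every call.
        public static Func<double, double, double, double, double, double, double>
            CompileIntersection(double nx, double ny, double nz, double d)
        {
            var ox = Expression.Parameter(typeof(double), "ox");
            var oy = Expression.Parameter(typeof(double), "oy");
            var oz = Expression.Parameter(typeof(double), "oz");
            var dx = Expression.Parameter(typeof(double), "dx");
            var dy = Expression.Parameter(typeof(double), "dy");
            var dz = Expression.Parameter(typeof(double), "dz");

            // n·v for a vector (x, y, z), with the normal folded in as constants
            Func<Expression, Expression, Expression, Expression> dot = (x, y, z) =>
                Expression.Add(
                    Expression.Add(
                        Expression.Multiply(Expression.Constant(nx), x),
                        Expression.Multiply(Expression.Constant(ny), y)),
                    Expression.Multiply(Expression.Constant(nz), z));

            // t = (d - n·origin) / (n·direction)
            var body = Expression.Divide(
                Expression.Subtract(Expression.Constant(d), dot(ox, oy, oz)),
                dot(dx, dy, dz));

            return Expression.Lambda<Func<double, double, double, double, double, double, double>>(
                body, ox, oy, oz, dx, dy, dz).Compile();
        }
    }

The payoff, if any, would come from compiling once per plane and reusing the delegate for many rays, e.g. var hit = PlaneSpecializer.CompileIntersection(0, 1, 0, 2); then hit(0, 0, 0, 0, 1, 0) returns 2.0.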

The performance of both is generally the same, as expression trees are internally traversed and emitted as IL using the same underlying system functions you would be calling yourself. It is theoretically possible to emit more efficient IL using the low-level functions, but I doubt there would be any practically important performance gain. That would depend on the task, but I have not come across any practically important optimization of hand-emitted IL compared to the IL emitted from expression trees.
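For comparison, the hand-emitted route goes through DynamicMethod and ILGenerator. A deliberately tiny, hedged sketch (the method shape is my own choice; it just bakes a multiplier into the IL) looks like this:

    using System;
    using System.Reflection.Emit;

    static class EmitSpecializer
    {
        // Emits the equivalent of x => x * factor, with the factor
        // hard-coded into the IL stream as a literal.
        public static Func<double, double> CompileScale(double factor)
        {
            var method = new DynamicMethod("Scale", typeof(double), new[] { typeof(double) });
            var il = method.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0);        // push the argument
            il.Emit(OpCodes.Ldc_R8, factor); // push the baked-in constant
            il.Emit(OpCodes.Mul);            // multiply
            il.Emit(OpCodes.Ret);            // return the product
            return (Func<double, double>)method.CreateDelegate(typeof(Func<double, double>));
        }
    }

Either way you end up with a delegate produced by the same dynamic-method machinery, which is why the measured difference tends to be negligible.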
I highly suggest getting the tool called ILSpy, which decompiles CLR assemblies. With it you can look at the code that actually traverses the expression trees and emits the IL.
Finally, a caveat. I have used expression trees in a language parser, where function calls are bound to grammar rules that are compiled from a file at runtime. Compiled at runtime is the key here. For many problems I came across, when what you want to achieve is already known at compile time, you will not gain much performance from runtime code generation. Some CLR JIT optimizations may also be unavailable to dynamic code. This is only an opinion based on my own practice, and your domain may be different, but if performance is critical, I would rather look at native, highly optimized libraries. Some of the work I have done would be snail-slow without LAPACK/MKL. But that is a piece of advice you did not ask for, so take it with a grain of salt.

If I were in your situation, I would try the alternatives from high level to low level, in order of increasing time and effort and decreasing reusability, and I would stop as soon as the performance is good enough for the time being, i.e.:
first, I'd check whether Math.NET, LAPACK or some similar numeric library already has the functionality, or whether I can adapt/extend its code to my needs;
second, I'd try Expression Trees;
third, I'd check out the Roslyn Project (even though it is still in a prerelease version);
fourth, I'd think about writing common routines with unsafe C code;
[fifth, I'd think about quitting and starting a new career in a different profession :) ],
and only if none of these worked out would I be desperate enough to try emitting IL at run time.
But perhaps I'm biased against low level approaches; your expertise, experience and point of view might be different.

Related

GHC Partial Evaluation and Separate Compilation

Whole-program compilers like MLton create optimized binaries in part due to their ability to use the total source of the binary to perform partial evaluation: aggressively inlining constants and evaluating them until stuck—all during compilation!
This has been explored publicly a bit in the Haskell space by Gabriel Gonzalez's Morte.
Now my understanding is that Haskell does not do very much of this—if any at all. The reason cited, as I understand it, is that it is antithetical to separate compilation. That makes sense as a reason to prohibit partial evaluation across source-file boundaries, but it seems like in-file partial evaluation would still be an option.
As far as I know, in-file partial evaluation is still not performed, though.
My question is: is this true? If so, what are the tradeoffs for performing in-file partial evaluation? If not, what is an example file where one can improve compiled performance by putting more functionality into the same file?
(Edit: To clarify the above, I know there are a lot of questions as to what the best set of reductions to perform are—many are undecidable! I'd like to know the tradeoffs made in an "industrial strength" compiler with separate compilation that live at a level above choosing the right equational theory if there are any interesting things to talk about there. Things like compilation speed or file bloat are more toward the scope I'm interested in. Another question in the same space might be: "Why can't MLton get separate compilation just by compiling each module separately, leaving the API exposed, and then linking them all together?")
This is definitely an optimization that a small set of people are interested in and are pursuing. The Google search term to find information on it is "supercompilation". I believe there are at least two approaches floating about at the moment.
It seems one of the big tradeoffs is compile-time resources (both time and memory), and at the moment the performance wins from paying those costs appear to be somewhat unpredictable. There is still quite a bit of work left. A few links:
A page on the GHC wiki
Neil Mitchell's Supero
Max Bolingbroke's Supercompilation by evaluation

Haskell for mission-critical systems [duplicate]

I've been curious to understand whether it is possible to apply the power of Haskell to the embedded realtime world, and in googling I found the Atom package. I'd assume that in a complex case the code might have all the classical C bugs - crashes, memory corruption, etc. - which would then need to be traced back to the original Haskell code that caused them. So, this is the first part of the question: "If you have had experience with Atom, how did you deal with the task of debugging low-level bugs in the compiled C code and fixing them in the original Haskell code?"
I searched for some more examples of Atom; this blog post mentions the resulting C code being 22 KLOC (and obviously no code is included), and the included example is a toy. This and this reference have a bit more practical code, but that is where it ends. And the reason I put "sizable" in the subject is that I'm most interested in whether you might share your experiences of working with generated C code in the range of 300 KLOC+.
As I am a Haskell newbie, there may obviously be other ways that I did not find due to my unknown unknowns, so any other pointers for self-education in this area would be greatly appreciated - and this is the second part of the question: "What would be some other practical methods (if any) of doing real-time development in Haskell?" If multicore is also in the picture, that's an extra plus :-)
(About using Haskell itself for this purpose: from what I read in this blog post, the garbage collection and laziness in Haskell make it rather nondeterministic scheduling-wise, but maybe in two years something has changed. The Real World Haskell programming question on SO was the closest that I could find to this topic.)
Note: "real-time" above is would be closer to "hard realtime" - I'm curious if it is possible to ensure that the pause time when the main task is not executing is under 0.5ms.
At Galois we use Haskell for two things:
Soft real-time (OS device layers, networking), where 1-5 ms response times are plausible. GHC generates fast code and has plenty of support for tuning the garbage collector and scheduler to get the timings right.
For true real-time systems, EDSLs are used to generate code in other languages that provide stronger timing guarantees, e.g. Cryptol, Atom and Copilot.
So be careful to distinguish the EDSL (Copilot or Atom) from the host language (Haskell).
Some examples of critical systems, and in some cases real-time systems, either written in or generated from Haskell, produced by Galois:
EDSLs
Copilot: A Hard Real-Time Runtime Monitor -- a DSL for real-time avionics monitoring
Equivalence and Safety Checking in Cryptol -- a DSL for cryptographic components of critical systems
Systems
HaLVM -- a lightweight microkernel for embedded and mobile applications
TSE -- a cross-domain (security level) network appliance
It will be a long time before there is a Haskell system that fits in small memory and can guarantee sub-millisecond pause times. The community of Haskell implementors just doesn't seem to be interested in this kind of target.
There is healthy interest in using Haskell or something Haskell-like to compile down to something very efficient; for example, Bluespec compiles to hardware.
I don't think it will meet your needs, but if you're interested in functional programming and embedded systems you should learn about Erlang.
Andrew,
Yes, it can be tricky to debug problems through the generated code back to the original source. One thing Atom provides is a means to probe internal expressions, and it then leaves it up to the user how to handle these probes. For vehicle testing, we build a transmitter (in Atom) and stream the probes out over a CAN bus. We can then capture this data, format it, and view it with tools like GTKWave, either in post-processing or in real time. For software simulation, probes are handled differently. Instead of getting probe data from a CAN protocol, hooks are made into the C code to lift the probe values directly. The probe values are then used in the unit-testing framework (distributed with Atom) to determine whether a test passes or fails and to calculate simulation coverage.
I don't think Haskell, or other garbage-collected languages, are very well suited to hard-realtime systems, as GCs tend to amortize their runtimes into short pauses.
Writing in Atom is not exactly programming in Haskell, as Haskell here can be seen as purely a preprocessor for the actual program you are writing.
I think Haskell is an awesome preprocessor, and using DSELs like Atom is probably a great way to create sizable hard-realtime systems, but I don't know whether Atom fits the bill or not. If it doesn't, I'm pretty sure it is possible (and I encourage anyone who does!) to implement a DSEL that does.
Having a very strong pre-processor like Haskell for a low-level language opens up a huge window of opportunity to implement abstractions through code-generation that are much more clumsy when implemented as C code text generators.
I've been fooling around with Atom. It is pretty cool, but I think it is best for small systems. Yes, it runs in trucks and buses and implements real-world, critical applications, but that doesn't mean those applications are necessarily large or complex. It really is for hard-real-time apps and goes to great lengths to make every operation take the exact same amount of time. For example, instead of an if/else statement that conditionally executes one of two code branches that might differ in running time, it has a "mux" statement that always executes both branches before conditionally selecting one of the two computed values (so the total execution time is the same whichever value is selected).
It doesn't have any significant type system other than built-in types (comparable to C's) that are enforced through GADT values passed through the Atom monad. The author is working on a static verification tool that analyzes the output C code, which is pretty cool (it uses an SMT solver), but I think Atom would benefit from more source-level features and checks.
Even in my toy-sized app (an LED flashlight controller), I've made a number of newbie errors that someone more experienced with the package might avoid, but they resulted in buggy output code that I'd rather have had caught by the compiler instead of through testing. On the other hand, it's still at version 0.1.something, so improvements are undoubtedly coming.

Using Polymorphic Code for Legitimate Purposes?

I recently came across the term Polymorphic Code, and was wondering if anyone could suggest a legitimate (i.e. in legal and business appropriate software) reason to use it in a computer program? Links to real world examples would be appreciated!
Before someone answers, telling us all about the benefits of polymorphism in object oriented programming, please read the following definition for polymorphic code (taken from Wikipedia):
"Polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact. That is, the code changes itself each time it runs, but the function of the code in whole will not change at all."
Thanks, MagicAndi.
Update
Summary of answers so far:
Runtime optimization of the original code
Assigning a "DNA fingerprint" to each individual copy of an application
Obfuscate a program to prevent reverse-engineering
I was also introduced to the term 'metamorphic code'.
Runtime optimization of the original code, based on actual performance statistics gathered when running the application in its real environment and real inputs.
Digitally watermarking music is something often done to determine who was responsible for leaking a track, for example. It makes each copy of the music unique so that copies can be traced back to the original owner, but doesn't affect the audible qualities of the track.
Something similar could be done for compiled software by running each individual copy through a polymorphic engine before distributing it. Then if a cracked version of this software is released onto the Internet, the developer might be able to tell who cracked it by looking for the specific variations produced by the polymorphic engine (a sort of DNA test). As far as I know, this technique has never been used in practice.
It's not exactly what you were looking for I guess, since the polymorphic engine is not distributed with the code, but I think it's the closest to a legitimate business use you will find for this kind of technique.
Polymorphic code is a nice thing, but metamorphic code is even nicer. As for legitimate uses: well, I can't think of anything other than anti-cracking and copy protection. Look at vx.org.ua if you want real-world uses (not that legitimate, though).
As Sami notes, on-the-fly optimisation is an excellent application of polymorphic code. A great example of this is the Fastest Fourier Transform in the West. It has a number of solvers at its disposal, which it combines with self-profiling to adjust the code path and solver parameters on subsequent executions. The result is the program optimises itself for your computing environment, getting faster with subsequent runs!
A related idea that may possibly be of interest is computational steering. This is the practice of altering the execution path of large simulations as the run proceeds, to focus on areas of interest to the researcher. The overall purpose of the simulation is not changed, but the feedback cycle acts to optimise the calculation. In this case the executable code is not being explicitly rewritten, but the effect from a user perspective is similar.
Polymorphic code can be used to obfuscate weak or proprietary algorithms, e.g. ones that use encryption. There are many "legitimate" uses for that. The term "legitimate" is kind of narrow-minded these days when it comes to IT; security is one of IT's core paradigms. Whether you use polymorphic shellcode in exploits or detect such code with an AV scanner, you have to know about it.
Obfuscate a program, i.e. prevent reverse engineering, the goal being to protect IP (intellectual property).

Trivial mathematical problems as language benchmarks

Why do people insist on using trivial mathematical problems like finding numbers in the Fibonacci sequence for language benchmarks? Don't these usually get optimized to relativistic speeds? Isn't the brunt of the bottlenecks usually in I/O, system API calls, operations on strings and structures, processing large quantities of data, abstract object-oriented stuff, etc?
It is a throwback to the old days, when compiler technology for what we would now call basic math was still evolving rapidly.
Now, compiler evolution is more focused on exploiting new instructions for niche operations, 64-bit math, and so on.
Micro-benchmarks such as the ones you mention were useful, though, when evaluating the efficiency of the hotspot compiler when Java was first launched, and in evaluating the efficiency of .NET versus C/C++.
Your suggestion that I/O and system calls are the likely bottlenecks is correct, at least for some space of problems. But I notice you suggested string operations. One person's irrelevant micro-benchmark is another person's critical performance metric.
EDIT: ps, I also remember using linpack and other micro-benchmarks to compare versions of the JVM, and to compare vendors of the JVM. From v4 to v5 there was a big jump in perf, I guess the JIT compiler got more effective. Also, IBM's JVM was ahead of Sun's at that time, on Windows-x86.
Because if you want to benchmark the language/compiler, these "math problems" are good indicators of the "bare speed" of the generated code. Either they use the iterative solution, which is a tight loop and indicates how well the compiler can push instructions to the processor, or they use the recursive solution, which indicates how it handles recursive calls of short functions (inlining, tail recursion, etc.), although the Ackermann function is usually used for that too.
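As a concrete illustration, the two classic variants might look like this (a minimal C# sketch; any language's version has much the same shape):

    // Iterative: a tight loop, mostly measuring raw instruction throughput.
    static long FibIterative(int n)
    {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++)
        {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    // Recursive: lots of short calls, mostly measuring call overhead and
    // how the compiler handles recursion (inlining, tail calls and so on).
    static long FibRecursive(int n)
    {
        return n < 2 ? n : FibRecursive(n - 1) + FibRecursive(n - 2);
    }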
Usually, the benchmark suite for a language contains tests benchmarking other parts as well - e.g. gzip compression, text searching, object creation, virtual function calls, exception throw/catch benchmarks.
The other things you've noticed, syscalls and I/O, are usually not included because:
syscalls are in fact not that slow - applications don't spend a significant portion of their time in the kernel, except in tests specifically targeted at them or when something is seriously wrong with the program
syscall and I/O performance does not depend on the language, but rather on the OS and hardware
I'd think a simple, well-established algorithm would remove the possibility that the benchmark is biased (whether through ignorance or malice) to favor one language. It is very difficult to write a complex program in two different languages in exactly the same way. Testing something like the efficiency of a multithreaded application in C# vs. Java, for example, would require developers skilled in multithreaded development in both languages, and there would still be questions as to whether the benchmark app properly represents the general case, or if it misrepresents a special case that only one language handles well.
Back when the sieve of Eratosthenes was a popular benchmark for C compilers, I thought it would be funny if one of the compiler authors recognized the sieve code and replaced it with a pre-computed lookup.
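For reference, the benchmark in question is tiny, which is part of why it was so popular; a minimal C# rendering of it:

    // Classic sieve of Eratosthenes: counts the primes below limit.
    static int CountPrimes(int limit)
    {
        var composite = new bool[limit];
        int count = 0;
        for (int i = 2; i < limit; i++)
        {
            if (composite[i]) continue;
            count++;
            for (long j = (long)i * i; j < limit; j += i)
                composite[j] = true;
        }
        return count;
    }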

Switch to parallel coding

We are all writing code for a single processor.
I wonder when we will all be able to write code for multiple processors?
What do we need (software tools, logic, algorithms) to make this switch?
Edit: In my view, just as we do many tasks in parallel in real life, we need a way to express those real-life solutions (algorithms) in a computer language, just as OOP did for procedural coding. OOP is a more real-life coding style than the procedural one, so I am hoping for that kind of solution.
I think the most important requirement is a good language that has native constructs that support parallelism or one that can automatically generate parallel code. There are quite a few languages that fit that description, but none of them is popular enough to really be considered for mainstream use. That, in turn is caused by several things:
By their very nature, these languages are very different from today's imperative languages, and are therefore harder to learn (or at least seem that way).
They often lack good tools and libraries, making them unusable for any "real" project.
Of course, if it were more popular more people would be willing to learn it and there would be more support, so it's a kind of cycle that's pretty hard to break out of. I guess all we can do is hope. :)
An example of a language designed with heavy parallelization in mind is Erlang - and it's actually used in commercial projects.
What we need are natural abstractions for highly-concurrent algorithms. Actors (think: Erlang) go a long way in this direction, but they aren't a one-size-fits-all solution. Some more specific abstractions like fork/join or map/reduce can be even easier to apply to common problems.
The trick with all of these concurrency abstractions is they require functional-style programming. Concurrency doesn't mesh well with shared mutable state. As they say, "Locks considered harmful". Since most developers come from a strictly imperative background, switching to a shared-nothing continuation passing approach is often extremely challenging.
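As a small illustration of the map/reduce style, here is a hedged C# sketch (PLINQ stands in for whatever data-parallel library a platform offers; the shape is what matters, not the library): a data-parallel sum of squares with no shared mutable state.

    using System;
    using System.Linq;

    class MapReduceExample
    {
        static void Main()
        {
            // "Map" each input independently, then "reduce" the results.
            // No shared mutable state, so the runtime is free to run the
            // map step on as many cores as it likes.
            long sumOfSquares = Enumerable.Range(1, 1000000)
                .AsParallel()
                .Select(x => (long)x * x)   // map
                .Sum();                     // reduce
            Console.WriteLine(sumOfSquares);
        }
    }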
Incidentally, with respect to concurrency abstractions, Clojure has some very interesting features in this direction. Not only does it have sort-of actors, but it also defines a transactional memory model (think: databases) along with a global, atomic references mechanism. These two features allow concurrent operations to share "mutable" state without ever having to worry about locking or race conditions.
In the end, it comes down to education. Much of the needed theoretical work into concurrency abstractions has already been done, we just need to accept it. Unfortunately, as Erlang and Haskell prove, sometimes the best ideas remain relegated to an extremely fringe demographic. Hopefully efforts like Scala and Clojure will succeed in bringing the more advanced abstractions into the mainstream by sneaking them onto an existing, well-supported platform (the JVM).
Unfortunately for massive concurrent programming - unless there is a breakthrough in compilers to help, we will be throwing out a lot of what we know about algorithms (I think Don Knuth even said that). Read about Erlang for a glimpse of this possible future.
There are several tools/languages that are popular or are gaining popularity. If you use FORTRAN, C, or C++, you can use OpenMP (not too hard to implement) or the Message Passing Interface (MPI) libraries (powerful and greatest speedup potential, but also complex and difficult). OpenMP uses preprocessor directives to mark areas that can be parallelized, especially loops. MPI uses messages that pass data back and forth between processes, and the greatest difficulty is keeping everything synchronized without hitting bottlenecks and keeping processes waiting. I would say MPI is definitely on the way out, however. It's become clear in the scientific/high-performance computing communities that the speedup is rarely worth the additional development time.
As for up-and-coming languages, check out Fortress. It's still being designed, but the goal is to create a language even easier for scientific computing than FORTRAN. Programs will be specified in a very high-level mathematical syntax. Additionally, parallelism will be implicit; the programmer will have to work to do things in serial. Plus, it's being championed by Sun and is based on Java, so it will be portable.
There is no simple answer, and in many ways even the complex answers are currently inadequate or incomplete. You'll get a better answer if you are more specific about the replies you want: pointers to dev libraries and tools, instructional materials, pointers to current research projects and issues in this area, or something else?
The most important requirement is to be able to split your problem into smaller problems that can be solved independently of each other. Once you've worked out how you're going to do that, everything else is easier to think about and further questions of implementation (e.g. "parts of my calculation depend on other parts - how do I wait for them to have finished?") become concrete, specific things you can research or ask here about.
For Java, you can now look at the Parallel Java Library or DPJ (Deterministic Parallel Java!).
It will offer you great help in extracting parallelism from your code.

Resources