For my PHP LLVM backend I'd like to try out the OCaml GC. Is it possible to use it with LLVM? Especially:
Is the OCaml GC decoupled enough to be used outside of the compiler?
Is the LLVM GC interface mature enough to be used with the OCaml GC?
This shouldn't be too much work, as LLVM already handles the OCaml GC in some way. This means that stack frame descriptors are correctly emitted for function calls (not the smallest ones, but this should improve with current LLVM GC handling developments). An old version of LLVM's documentation says that the OCaml GC doesn't use write barriers, which is wrong. So you should be careful to ensure that the generated code is correct for assignments.
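To make the point about assignments concrete, here is a minimal Python model of a generational write barrier. It is only a sketch under assumptions: the `Obj`, `Heap`, `write_field` and `minor_roots` names are invented for illustration and this is not the OCaml runtime's code (in the real runtime, stores into heap blocks go through `caml_modify`). The idea is that a store which makes an old object point to a young one has to be recorded, otherwise a minor collection would miss the young object.

```python
class Obj:
    def __init__(self, name, nfields, young=True):
        self.name = name
        self.fields = [None] * nfields
        self.young = young  # True while the object lives in the minor heap


class Heap:
    def __init__(self):
        self.remembered = set()  # old objects that may point into the minor heap

    def write_field(self, obj, index, value):
        """Store with a write barrier (hypothetical helper)."""
        if not obj.young and isinstance(value, Obj) and value.young:
            self.remembered.add(obj)
        obj.fields[index] = value

    def minor_roots(self, stack_roots):
        """Roots for a minor collection: the stack plus remembered old objects."""
        roots = [r for r in stack_roots if r.young]
        for old in self.remembered:
            roots.extend(f for f in old.fields if isinstance(f, Obj) and f.young)
        return roots


heap = Heap()
old = Obj("old", 1, young=False)
fresh = Obj("fresh", 0)

heap.write_field(old, 0, fresh)      # barrier records `old`
survivors = heap.minor_roots([])     # nothing on the stack points to `fresh`
print([o.name for o in survivors])   # `fresh` is still found via `old`
```

If generated code stored into `old.fields` directly instead of going through the barrier, `fresh` would not appear among the roots and a minor collection would free a live object.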
As for the LLVM GC interface, the current one is quite restricted and does not allow generating very efficient code, but it should be sufficient for prototyping while waiting for the next version, which should contain some important changes on that side.
While it appears as though it would be relatively easy to tear out the OCaml GC and Frankenstein it into a different project, I'm not sure this is really something you would want to do in practice.
The OCaml garbage collector was designed with a functional programming style in mind, and this GC architecture might be a liability for a language such as PHP, which is usually not used in a functional style.
If you are set on doing this then I would suggest either waiting a few months for multicore support to be accepted into the OCaml compiler/runtime or using one of the various projects trying to bring multicore support to OCaml at the moment (the most serious of these would probably be this project by the people over at OCamllabs). Right now the OCaml GC lacks true multicore support, and while this isn't really much of a problem in practice, some people can't seem to live without it.
I read these sentences in wikipedia about ROP:
"Return-oriented programming is an advanced version of a stack smashing attack. Generally, these types of attacks arise when an adversary manipulates the call stack by taking advantage of a bug in the program, often a buffer overrun."
That means that if no buffer overrun occurs, ROP cannot occur. But some compilers (in my case LLVM) support detection of buffer overflows, yet defense against ROP is still considered an open problem for them.
I'm confused. Is there something that I didn't consider?
According to this Wikipedia article
Clang supports three buffer overflow detectors, namely
AddressSanitizer (-fsanitize=address), -fsanitize=bounds, and
SafeCode. These systems have different tradeoffs in terms of
performance penalty, memory overhead, and classes of detected bugs.
So it can only detect certain classes of bugs (not all of them), which means that it has false negatives.
The problem mainly lies with the fact that any static analysis of programs cannot, in general, be both sound and complete. That is, any static analysis trying to detect buffer overflows will have false positives, false negatives, or both. This is a corollary of Rice's theorem, which intuitively states that any nontrivial property of programs is generally undecidable. The word "generally" here is important and means for all programs.
A false positive is when a static analysis flags a program statement as a buffer overflow while it is not.
A false negative is when a static analysis flags a program statement as a safe buffer access while it is not.
The most widely adopted approach in many fields, not just buffer overflow detection (e.g., signature-based intrusion detection), is to tolerate false negatives rather than false positives, because false positives would otherwise be too numerous and would inundate programmers and obscure the real problems. That approach is also applied if the detection problem is decidable but too complex (e.g., NP-hard) to solve exactly. Bottom line, approximations permeate computer science.
Several techniques can be used to detect the possibility of a buffer overflow at run-time.
One, very cheap and enabled by default in code generated by modern compilers, only detects buffer overflows on the stack, only probabilistically (the buffer overflow has, say, (2^32-1)/2^32 chances of being detected), and only checks for them just before returning from protected functions (which is a very good time to check but means the detection is not instantaneous). It works by inserting a “canary” value that the attacker wouldn't be able to predict at the top of the stack frame when the function begins its execution, and by checking the value of the canary at the end of the function's execution.
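The canary mechanism can be modelled in a few lines of Python. This is a simulation only, under stated assumptions: the frame layout (an 8-byte buffer, then a 4-byte canary, then a 4-byte return address) and the names `call_with_canary` and `StackSmashingDetected` are invented for illustration, and a real compiler does this in the function prologue and epilogue, not with a byte array.

```python
import secrets

BUF_SIZE = 8
CANARY_SIZE = 4


class StackSmashingDetected(Exception):
    pass


def call_with_canary(user_input: bytes, return_address: bytes = b"RETN") -> bytes:
    """Simulate one call of a protected function; return the address it jumps back to."""
    canary = secrets.token_bytes(CANARY_SIZE)          # unpredictable, fresh per call
    frame = bytearray(BUF_SIZE) + canary + return_address
    # Unchecked copy, standing in for a buggy strcpy/memcpy into the local buffer.
    n = min(len(user_input), len(frame))
    frame[:n] = user_input[:n]
    # Epilogue: verify the canary before the return address is used.
    if bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) != canary:
        raise StackSmashingDetected("canary overwritten")
    return bytes(frame[BUF_SIZE + CANARY_SIZE:])


print(call_with_canary(b"hello"))          # fits in the buffer, returns normally
try:
    call_with_canary(b"A" * 16)            # runs over the canary and return address
except StackSmashingDetected as exc:
    print("aborted:", exc)
```

The check runs once, at the end of the call, which is why the overflow is caught before the corrupted return address is used but not at the moment the bad write happens.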
The above technique is very interesting because it is cheap; in one single check, it protects well against all stack-based buffer overflows that could have happened during the function. However:
It does not protect against heap-based buffer overflows.
The attacker can limit themselves to a small stack-based buffer overflow in order to change the contents of some local variables and function arguments, without overwriting the canary. This may be enough to gain control of the target.
The attacker can try their luck at guessing the value of the canary. With one chance in 2^32 at each try, if they are allowed to cause as many buffer overflows as they want (say, in a server that is not monitored and is automatically restarted after each crash), they may eventually get lucky.
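To put rough numbers on "eventually", the chance of at least one correct guess in n tries is 1 - (1 - 2^-32)^n. The sketch below computes it, assuming a 32-bit canary, independent uniform guesses, and a canary that is re-randomized on every restart; `p_guessed` is a name made up for this example.

```python
import math


def p_guessed(tries: int, canary_bits: int = 32) -> float:
    """Probability that at least one of `tries` independent guesses hits the canary."""
    p_single = 2.0 ** -canary_bits
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(tries * math.log1p(-p_single))


for tries in (1, 2**20, 2**32, 2**34):
    print(f"{tries:>12} tries: {p_guessed(tries):.6f}")
```

Under these assumptions the attacker needs on the order of billions of crashes, which is why this only matters for a service that restarts silently and is never monitored.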
According to this article, GCC developers felt that even protecting every function with a single check was still too much (the option I have been describing is called -fstack-protector in GCC), and they used heuristics to omit the check from various functions, including some where it could have been useful.
At the other end of the spectrum, there exist techniques that detect all possibilities of buffer overflow at run-time, at a greater cost.
There is nothing impossible about systematically preventing all buffer overflows at run-time. It is only more expensive than the above cheap check. Some techniques, while more efficient, are cheap enough to belong in compiler features, so you may find them as disabled-by-default options in Clang or GCC. The techniques that detect all buffer overflows, on heap as well as on stack, incur an overhead that can go up to, say, 900% (they make execution 10 times slower). Most people in charge of deploying software do not find this compromise acceptable, and thus these techniques are found only in specialized academic C compilers, in source-to-source transformation tools sold separately from the compiler, or in sound static analyzers used as C interpreters.
To reiterate, there is nothing impossible about detecting all buffer overflows at run-time. Rice's theorem does not apply. The instrumentation techniques to do so are only too expensive (in run-time speed) to be used in practice.
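One way such instrumentation can work is to make every pointer carry the bounds of the object it was derived from and to check each access against them. The following Python model is a sketch of that idea only; the `CheckedPtr` and `BoundsError` names are hypothetical and it is not the scheme used by any particular tool.

```python
class BoundsError(Exception):
    pass


class CheckedPtr:
    """A 'fat pointer': an offset plus the bounds of the object it points into."""

    def __init__(self, memory: bytearray, base: int, size: int, offset: int = 0):
        self.memory, self.base, self.size, self.offset = memory, base, size, offset

    def __add__(self, n: int) -> "CheckedPtr":
        # Pointer arithmetic keeps the original object's bounds.
        return CheckedPtr(self.memory, self.base, self.size, self.offset + n)

    def _check(self) -> None:
        if not 0 <= self.offset < self.size:
            raise BoundsError(f"offset {self.offset} outside [0, {self.size})")

    def load(self) -> int:
        self._check()
        return self.memory[self.base + self.offset]

    def store(self, value: int) -> None:
        self._check()
        self.memory[self.base + self.offset] = value


memory = bytearray(32)                       # one flat address space
buf = CheckedPtr(memory, base=8, size=4)     # a 4-byte buffer at address 8
for i in range(4):
    (buf + i).store(0x41)                    # in bounds
try:
    (buf + 4).store(0x41)                    # one past the end
except BoundsError as exc:
    print("caught:", exc)
```

The cost is visible even in this toy: every pointer is several words wide and every load and store pays for a comparison, which is where the large slowdowns come from.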
Another possibility, which is differently expensive, is to statically check that the program does not have any possibility of buffer overflow, for any possible input that might be sent to it. This way, the program can be compiled with an ordinary compiler and run at full speed without risk of remote code execution. This is where Rice's theorem begins to apply: it says that it is impossible to make an automatic static analyzer that guarantees that all safe programs are safe. This is no issue in practice, because Rice's theorem does not say that it is impossible to guarantee that one particular safe program is safe.
The important thing is to build a static analyzer that never says “safe” for a program that it cannot guarantee to be safe. The static analyzer is always allowed to say “maybe”, and the only difficulty in practice is if it says “maybe” too often, because it is never sure that any program is safe.
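A toy version of this "safe or maybe" behaviour can be written with interval arithmetic. The sketch below is an assumption-laden illustration (the `Interval` class and `check_access` function are invented here, and real sound analyzers use far richer abstractions): it answers "safe" only when every value the index could take is provably in bounds, and falls back to "maybe" otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """All integers between lo and hi, inclusive."""
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(self.lo - other.hi, self.hi - other.lo)


def check_access(index: Interval, buffer_len: int) -> str:
    """Say 'safe' only if every possible index is in bounds; never guess."""
    if 0 <= index.lo and index.hi < buffer_len:
        return "safe"
    return "maybe"


i = Interval(0, 9)                             # loop counter known to stay in 0..9
print(check_access(i, 10))                     # buf[i]: provably in bounds
print(check_access(i + Interval(1, 1), 10))    # buf[i + 1]: may reach index 10
print(check_access(i - i, 10))                 # buf[i - i]: really always 0
```

The last query shows the price of soundness: `i - i` is always 0, but the interval domain forgets that both operands are the same variable and can only answer "maybe". It never answers "safe" for an access it cannot prove.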
The above kind of static analyzer is called a sound static analyzer. They are a little difficult to use and are mostly discussed in academia only, but for instance, I work at a company that sells a sound static analyzer and the expertise to apply it to security-critical C software components. The first C library we verified is PolarSSL, used in a specific configuration. Because we have checked that no buffer overflow can occur for any message sent from the network to a PolarSSL server in the configuration we chose, it can be compiled with an ordinary compiler and is safe from all consequences of buffer overflows (and generally of C's undefined behaviors), including ROP attacks.
I am looking for a programming language that is fast like C and C++ and has a garbage collector and is not prone to buffer overflows. I am looking for something between Java/C# and C/C++. Is there such a language?
Checking for buffer overflows and collecting garbage has a cost: if you need these features, then you will not get the speed of C/C++. Tradeoff.
Java and C# are very, very close to C++ speed in most types of applications, so unless you need something very specific, I suggest you go with one of those 2 languages.
If you just want a garbage collector for C++, you can get one here.
You could take a look at D. It's a compiled language with most of the features from C++ in addition to garbage collection and some others.
Language "speed" is highly application dependent. The JVM is darn fast for certain kinds of code--HotSpot can actually be faster than native code. On the other hand, functional style and a good optimizer can let you get good performance with less code--often Haskell apps are as fast in practice as ones in C.
For a real cross of Java/C# and C++ the best place to look is the D language. It has garbage collection, and optional access to malloc and free and even inline assembly for C level performance. It has enough safety to be less prone to buffer overflows, but you can still have them.
You can always garbage collect C/C++, but it will cost you. Java, Haskell, ML, even Python can use garbage collectors that know what values might be pointers, so are faster than using a collector for C, C++, or D.
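The difference between a collector that knows which values are pointers and one that has to guess can be shown with a toy mark phase. This is a sketch only: the three-object heap and the `("ptr", ...)` / `("int", ...)` tags are made up for the example and do not reflect any real collector's data layout.

```python
# Toy heap: address -> list of fields. Each field is a (tag, value) pair.
HEAP = {
    0x1000: [("ptr", 0x2000), ("int", 7)],
    0x2000: [("int", 0x3000)],   # an integer that happens to equal an address
    0x3000: [("int", 42)],       # garbage: nothing really points here
}


def mark(roots, precise):
    """Return the set of addresses considered reachable from `roots`."""
    live, todo = set(), list(roots)
    while todo:
        addr = todo.pop()
        if addr in live or addr not in HEAP:
            continue
        live.add(addr)
        for tag, value in HEAP[addr]:
            # A precise collector trusts the tag; a conservative one must
            # treat any word that looks like a valid address as a pointer.
            if tag == "ptr" or (not precise and value in HEAP):
                todo.append(value)
    return live


print(sorted(hex(a) for a in mark([0x1000], precise=True)))
print(sorted(hex(a) for a in mark([0x1000], precise=False)))
```

The conservative pass keeps the object at `0x3000` alive because of a look-alike integer, and it also cannot move objects, since it does not know which words it would be allowed to update.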
What are the ideas for preventing buffer overflow attacks? I have heard about StackGuard, but is this problem completely solved by now by applying StackGuard, or a combination of it with other techniques?
after warm up, as an experienced programmer
Why do you think that it is so
difficult to provide adequate
defenses for buffer overflow attacks?
Edit: thanks for all answers and keeping security tag active:)
There's a bunch of things you can do. In no particular order...
First, if your language choices are equally split (or close to equally split) between one that allows direct memory access and one that doesn't, choose the one that doesn't. That is, use Perl, Python, Lisp, Java, etc over C/C++. This isn't always an option, but it does help prevent you from shooting yourself in the foot.
Second, in languages where you have direct memory access, if classes are available that handle the memory for you, like std::string, use them. Prefer well exercised classes to classes that have fewer users. More use means that simpler problems are more likely to have been discovered in regular usage.
Third, use compiler options like ASLR and DEP. Use any security related compiler options that your application offers. This won't prevent buffer overflows, but will help mitigate the impact of any overflows.
Fourth, use static code analysis tools like Fortify, Qualys, or Veracode's service to discover overflows that you didn't mean to code. Then fix the stuff that's discovered.
Fifth, learn how overflows work, and how to spot them in code. All your coworkers should learn this, too. Create an organization-wide policy that requires people be trained in how overruns (and other vulns) work.
Sixth, do secure code reviews separately from regular code reviews. Regular code reviews make sure code works, that it passes functional tests, and that it meets coding policy (indentation, naming conventions, etc). Secure code reviews are specifically, explicitly, and only intended to look for security issues. Do secure code reviews on all code that you can. If you have to prioritize, start with mission critical stuff, stuff where problems are likely (where trust boundaries are crossed (learn about data flow diagrams and threat models and create them), where interpreters are used, and especially where user input is passed/stored/retrieved, including data retrieved from your database).
Seventh, if you have the money, hire a good consultant like Neohapsis, VSR, Matasano, etc. to review your product. They'll find far more than overruns, and your product will be all the better for it.
Eighth, make sure your QA team knows how overruns work and how to test for them. QA should have test cases specifically designed to find overruns in all inputs.
Ninth, do fuzzing. Fuzzing finds an amazingly large number of overflows in many products.
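As a rough illustration of the fuzzing step, here is a minimal random fuzzer in Python. Everything in it is hypothetical: `parse_record` is a made-up parser with a deliberate off-by-one in its length check, and the `IndexError` raised by the fixed-size `bytearray` stands in for the memory corruption a C program would suffer. Real fuzzers are coverage-guided and far smarter than this.

```python
import random

BUF_SIZE = 8


def parse_record(data: bytes) -> bytes:
    """Hypothetical parser: first byte is a length, followed by the payload."""
    if not data:
        return b""
    length = data[0]
    buf = bytearray(BUF_SIZE)
    if length > BUF_SIZE + 1:          # BUG: off by one, should be `> BUF_SIZE`
        raise ValueError("record too long")
    payload = data[1:1 + length]
    for i, byte in enumerate(payload):
        buf[i] = byte                  # IndexError here stands in for a memory error
    return bytes(buf[:len(payload)])


def fuzz(target, runs=20000, seed=0):
    """Feed random inputs to `target`; collect the inputs that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(runs):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(16)))
        try:
            target(data)
        except ValueError:
            pass                       # a clean rejection of bad input is fine
        except Exception:
            crashes.append(data)
    return crashes


crashes = fuzz(parse_record)
print(len(crashes), "crashing inputs, e.g.", crashes[:1])
```

Every crashing input starts with the length byte 9, which points straight at the faulty comparison.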
Edited to add: I misread the question. The title says, "what are the techniques" but the text says "why is it hard".
It's hard because it's so easy to make a mistake. Little mistakes, like off-by-one errors or numeric conversions, can lead to overflows. Programs are complex beasts, with complex interactions. Where there's complexity there's problems.
Or, to turn the question back on you: why is it so hard to write bug-free code?
Buffer overflow exploits can be prevented. If programmers were perfect, there would be no
unchecked buffers, and consequently, no buffer overflow exploits. However, programmers are not
perfect, and unchecked buffers continue to abound.
Only one technique is necessary: Don't trust data from external sources.
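In practice, not trusting external data mostly means checking every size or offset the sender supplies against what was actually received before using it. A minimal sketch, assuming an invented packet format (a 2-byte big-endian length followed by the payload) and an arbitrary `MAX_PAYLOAD` limit:

```python
import struct

MAX_PAYLOAD = 1024  # assumed application limit


def read_message(packet: bytes) -> bytes:
    """Parse a packet made of a 2-byte big-endian length followed by the payload."""
    if len(packet) < 2:
        raise ValueError("truncated header")
    (declared,) = struct.unpack_from(">H", packet, 0)
    if declared > MAX_PAYLOAD:
        raise ValueError("declared length exceeds limit")
    if declared != len(packet) - 2:
        raise ValueError("declared length does not match actual payload")
    return packet[2:]


print(read_message(b"\x00\x03abc"))
try:
    read_message(b"\x00\x05abc")       # claims 5 bytes, carries 3
except ValueError as exc:
    print("rejected:", exc)
```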
There's no magic bullet for security: you have to design carefully, code carefully, hold code reviews, test, and arrange to fix vulnerabilities as they arise.
Fortunately, the specific case of buffer overflows has been a solved problem for a long time. Most programming languages have array bounds checking and do not allow programs to make up pointers. Just don't use the few that permit buffer overflows, such as C and C++.
Of course, this applies to the whole software stack, from embedded firmware¹ up to your application.
¹ For those of you not familiar with the technologies involved, this exploit can allow an attacker on the network to wake up and take control of a powered off computer. (Typical firewall configurations block the offending packets.)
You can run analyzers to help you find problems before the code goes into production. Our Memory Safety Checker will find buffer overruns, bad pointer faults, array access errors, and memory management mistakes in C code, by instrumenting your code to watch for mistakes at the moment they are made. If you want the C program to be impervious to such errors, you can simply use the results of the Memory Safety analyzer as the production version of your code.
In modern exploitation the big three are:
ASLR
Canaries
NX Bit
Modern builds of GCC apply canaries by default. Not all ASLR is created equal: Windows 7, Linux and *BSD have some of the best ASLR. OSX has by far the worst ASLR implementation; it's trivial to bypass. Some of the most advanced buffer overflow attacks use exotic methods to bypass ASLR. The NX Bit is by far the easiest to bypass; return-to-libc style attacks make it a non-issue for exploit developers.
If you've bought into the functional programming paradigm, the chances are that you like both Erlang and Haskell. Both have purely functional cores and other goodness such as lightweight threads that make them a good fit for a multicore world. But there are some differences too.
Erlang is a commercially proven fault-tolerant language with a mature distribution model. It has a seemingly unique feature in its ability to upgrade its version at runtime via hot code loading. (Way cool!)
Haskell, on the other hand, has the most sophisticated type system of any mainstream language. (Where I define 'mainstream' to be any language that has a published O'Reilly book so Haskell counts.) Its straight-line single-threaded performance looks superior to Erlang's and its lightweight threads look even lighter too.
I am trying to put together a development platform for the rest of my coding life and was wondering whether it was possible to mix Erlang and Haskell to achieve a best of breed platform. This question has two parts:
1. I'd like to use Erlang as a kind of fault tolerant MPI to glue GHC runtime instances together. There would be one Erlang process per GHC runtime. If "the impossible happened" and the GHC runtime died, then the Erlang process would detect that somehow and die too. Erlang's hot code loading and distribution features would just continue to work. The GHC runtime could be configured to use just one core, or all cores on the local machine, or any combination in between. Once the Erlang library was written, the rest of the Erlang level code should be purely boilerplate and automatically generated on a per application basis. (Perhaps by a Haskell DSL for example.) How does one achieve at least some of these things?
2. I'd like Erlang and Haskell to be able to share the same garbage collector. (This is a much further out idea than 1.) Languages that run on the JVM and the CLR achieve greater mass by sharing a runtime. I understand there are technical limitations to running Erlang (hot code loading) and Haskell (higher kinded polymorphism) on either the JVM or the CLR. But what about unbundling just the garbage collector? (Sort of the start of a runtime for functional languages.) Allocation would obviously still have to be really fast, so maybe that bit needs to be statically linked in. And there should be some mechanism to distinguish the mutable heap from the immutable heap (including lazy write once memory) as GHC needs this. Would it be feasible to modify both HiPE and GHC so that the garbage collectors could share a heap?
Please answer with any experiences (positive or negative), ideas or suggestions. In fact, any feedback (short of straight abuse!) is welcome.
Thanks for all 4 replies to date - each taught me at least one useful thing that I did not know.
Regarding the rest of coding life thing - I included it slightly tongue in cheek to spark debate, but it is actually true. There is a project that I have in mind that I intend to work on until I die, and it needs a stable platform.
In the platform I have proposed above, I would only write Haskell, as the boilerplate Erlang would be automatically generated. So how long will Haskell last? Well Lisp is still with us and doesn't look like it is going away anytime soon. Haskell is BSD3 open source and has achieved critical mass. If programming itself is still around in 50 years time, I would expect Haskell, or some continuous evolution of Haskell, will still be here.
Update 2 in response to rvirding's post
Agreed - implementing a complete "Erskell/Haslang" universal virtual machine might not be absolutely impossible, but it would certainly be very difficult indeed. Sharing just the garbage collector level as something like a VM, while still difficult, sounds an order of magnitude less difficult to me though. At the garbage collection level, functional languages must have a lot in common - the ubiquity of immutable data (including thunks) and the requirement for very fast allocation. So the fact that this commonality is bundled tightly with monolithic VMs seems kind of odd.
VMs do help achieve critical mass. Just look at how 'lite' functional languages like F# and Scala have taken off. Scala may not have the absolute fault tolerance of Erlang, but it offers an escape route for the very many folks who are tied to the JVM.
While having a single heap makes
message passing very fast it
introduces a number of other problems,
mainly that doing GC becomes more
difficult as it has to be interactive
and globally non-interruptive so you
can't use the same simpler algorithms
as the per-process heap model.
Absolutely, that makes perfect sense to me. The very smart people on the GHC development team appear to be trying to solve part of the problem with a parallel "stop the world" GC.
(Obviously "stop the world" would not fly for general Erlang given its main use case.) But even in the use cases where "stop the world" is OK, their speedups do not appear to be universal. So I agree with you, it is unlikely that there is a universally best GC, which is the reason I specified in part 1. of my question that
The GHC runtime could be configured to
use just one core, or all cores on the
local machine, or any combination in
between.
In that way, for a given use case, I could, after benchmarking, choose to go the Erlang way, and run one GHC runtime (with a singlethreaded GC) plus one Erlang process per core and let Erlang copy memory between cores for good locality.
Alternatively, on a dual processor machine with 4 cores per processor with good memory bandwidth on the processor, benchmarking might suggest that I run one GHC runtime (with a parallel GC) plus one Erlang process per processor.
In both cases, if Erlang and GHC could share a heap, the sharing would probably be bound to a single OS thread running on a single core somehow. (I am getting out of my depth here, which is why I asked the question.)
I also have another agenda - benchmarking functional languages independently of GC. Often I read results of benchmarks of OCaml v GHC v Erlang v ... and wonder how much the results are confounded by the different GCs. What if choice of GC could be orthogonal to choice of functional language? How expensive is GC anyway? See this devil's advocate blog post
by my Lisp friend John Fremlin, which he has charmingly titled "Automated garbage collection is rubbish". When John claims that GC is slow and hasn't really sped up that much, I would like to be able to counter with some numbers.
A lot of Haskell and Erlang people are interested in the model where Erlang supervises distribution, while Haskell runs the shared memory nodes in parallel doing all the number crunching/logic.
A start towards this is the haskell-erlang library:
And we have similar efforts in Ruby land, via Hubris:
The question now is to find someone to actually push through the Erlang / Haskell interop to find out the tricky issues.
You're going to have an interesting time mixing GC between Haskell and Erlang. Erlang uses a per-process heap and copies data between processes -- as Haskell doesn't even have a concept of processes, I'm not sure how you would map this "universal" GC between the two. Furthermore, for best performance, Erlang uses a variety of allocators, each with slightly tweaked behaviours that I'm sure would affect the GC sub-system.
As with all things in software, abstraction comes at a cost. In this case, I rather suspect you'd have to introduce so many layers to get both languages over their impedance mismatch that you'd wind up with a not very performant (or useful) common VM.
Bottom line -- embrace the difference! There are huge advantages to NOT running everything in the same process, particularly from a reliability standpoint. Also, I think it's a little naive to expect one language/VM to last you for the rest of your life (unless you plan on a.) living a short time or b.) becoming some sort of code monk that ONLY works on a single project). Software development is all about mental agility and being willing to use the best available tools to build fast, reliable code.
Although this is a pretty old thread, if readers are still interested then it's worth taking a look at Cloud Haskell, which brings Erlang style concurrency and distribution to the GHC stable.
The forthcoming distributed-process-platform library adds support for OTP-esque constructs like gen_servers, supervision trees and various other "haskell flavoured" abstractions borrowed from and inspired by Erlang/OTP.
You could use an OTP gen_supervisor process to monitor Haskell instances that you spawn with open_port(). Depending on how the "port" exited, you would then be able to restart it or decide that it stopped on purpose and let the corresponding Erlang process die, too.
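The restart-or-let-it-die decision is simple enough to sketch. The version below is a Python stand-in for the Erlang side, not Erlang code, and all names in it (`supervise`, `spawn_worker`, `max_restarts`) are invented for illustration; the child process merely plays the role of the Haskell program behind the port.

```python
import subprocess
import sys


def supervise(spawn, max_restarts=3):
    """Run `spawn()` until it exits cleanly or the restart budget is used up.

    `spawn` starts the worker and returns its exit status once it terminates.
    Returns the list of exit statuses that were observed.
    """
    statuses = []
    while True:
        status = spawn()
        statuses.append(status)
        if status == 0:
            break                      # stopped on purpose: do not restart
        if len(statuses) > max_restarts:
            break                      # too many crashes: give up and die too
    return statuses


def spawn_worker():
    """Hypothetical worker: a child process standing in for the Haskell program."""
    child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])
    return child.returncode
```

Calling `supervise(spawn_worker)` would start the always-failing child four times (one start plus three restarts) and then give up, which corresponds to the supervisor itself exiting so that its own parent can react.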
Fugheddaboudit. Even these language-independent VMs you speak of have trouble with data passed between languages sometimes. You should just serialize data between the two somehow: database, XML-RPC, something like that.
By the way, the idea of a single platform for the rest of your life is probably impractical, too. Computing technology and fashion change too often to expect that you can keep using just one language forever. Your very question points this out: no one language does everything we might wish, even today.
As dizzyd mentioned in his comment not all data in messages is copied, large binaries exist outside of the process heaps and are not copied.
Using a different memory structure to avoid having separate per-process heaps is certainly possible and has been done in a number of earlier implementations. While having a single heap makes message passing very fast it introduces a number of other problems, mainly that doing GC becomes more difficult as it has to be interactive and globally non-interruptive so you can't use the same simpler algorithms as the per-process heap model.
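The per-process heap model with copying sends can be pictured with a small Python model. It is a sketch under assumptions: `Process`, `RefcBinary` and `_copy_term` are names made up here, and the real Erlang VM is of course not implemented this way. Ordinary terms are deep-copied into the receiver, while large binaries live outside the process heaps and are passed by reference.

```python
import copy


class RefcBinary:
    """A large binary stored outside any process heap and shared by reference."""

    def __init__(self, data: bytes):
        self.data = data


def _copy_term(term):
    """Deep-copy a term, except for shared binaries, which are passed by reference."""
    if isinstance(term, RefcBinary):
        return term
    if isinstance(term, (list, tuple)):
        return type(term)(_copy_term(t) for t in term)
    return copy.copy(term)


class Process:
    def __init__(self, name):
        self.name = name
        self.mailbox = []   # received messages live in this process's own heap

    def send(self, msg):
        self.mailbox.append(_copy_term(msg))


blob = RefcBinary(b"x" * 1000)
msg = ["hello", [1, 2], blob]
receiver = Process("receiver")
receiver.send(msg)

received = receiver.mailbox[0]
print(received[1] is msg[1])   # False: the nested list was copied
print(received[2] is blob)     # True: the large binary is shared
```

Because no process ever holds a pointer into another process's heap, each heap can be collected on its own with a simple algorithm, which is the trade-off against the faster but harder-to-collect single shared heap.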
As long as we have immutable data structures there is no problem with robustness and safety. Deciding on which memory and GC models to use is a big trade-off, and unfortunately there is no universally best model.
While Haskell and Erlang are both functional languages they are in many respects very different languages and have very different implementations. It would be difficult to come up with an "Erskell" (or Haslang) machine which could handle both languages efficiently. I personally think it is much better to keep them separate and to make sure you have a really good interface between them.
The CLR supports tail call optimization with an explicit tail opcode (as used by F#), for which the JVM doesn't (yet) have an equivalent, and this limits the implementation of such a style of language. The use of separate AppDomains does allow the CLR to hot-swap code (see e.g. this blog post showing how it can be done).
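When the underlying VM has no tail-call instruction, one common workaround is a trampoline: the function returns a thunk describing the next call instead of making it, and a small driver loop runs the thunks. The sketch below uses Python (which also does not eliminate tail calls) as a stand-in for such a VM; `trampoline` and `count_down` are illustrative names, not part of any library.

```python
def trampoline(fn, *args):
    """Run `fn` and keep calling the thunks it returns until a value comes back."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result


def count_down(n, acc=0):
    # Instead of making the tail call directly, return a thunk describing it.
    if n == 0:
        return acc
    return lambda: count_down(n - 1, acc + 1)


print(trampoline(count_down, 100_000))
```

The stack stays flat no matter how deep the logical recursion goes, but each step now allocates a closure and goes through the driver loop, which is the kind of overhead a native tail opcode avoids.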
With Simon Peyton Jones working just down the corridor from Don Syme and the F# team at Microsoft Research, it would be a great disappointment if we didn't eventually see an IronHaskell with some sort of official status. An IronErlang would be an interesting project -- the biggest piece of work would probably be porting the green-threading scheduler without getting as heavyweight as the Windows Workflow engine, or having to run a BEAM VM on top of the CLR.