What are the ideas for preventing buffer overflow attacks? I have heard about StackGuard, but is this problem completely solved by applying StackGuard, or by combining it with other techniques?
After that warm-up: as an experienced programmer, why do you think it is so difficult to provide adequate defenses against buffer overflow attacks?
Edit: thanks for all answers and keeping security tag active:)
There's a bunch of things you can do. In no particular order...
First, if your language choices are equally split (or close to equally split) between one that allows direct memory access and one that doesn't, choose the one that doesn't. That is, use Perl, Python, Lisp, Java, etc. over C/C++. This isn't always an option, but it does help prevent you from shooting yourself in the foot.
Second, in languages where you have direct memory access, if classes are available that handle the memory for you, like std::string, use them. Prefer well exercised classes to classes that have fewer users. More use means that simpler problems are more likely to have been discovered in regular usage.
Third, enable mitigations like ASLR and DEP. These are operating system features that your binary typically opts into through compiler and linker options. Use any security-related options that your toolchain offers. This won't prevent buffer overflows, but it will help mitigate the impact of any overflows.
Fourth, use static code analysis tools like Fortify, Qualys, or Veracode's service to discover overflows that you didn't mean to code. Then fix the stuff that's discovered.
Fifth, learn how overflows work, and how to spot them in code. All your coworkers should learn this, too. Create an organization-wide policy that requires people be trained in how overruns (and other vulns) work.
Sixth, do secure code reviews separately from regular code reviews. Regular code reviews make sure code works, that it passes functional tests, and that it meets coding policy (indentation, naming conventions, etc). Secure code reviews are specifically, explicitly, and only intended to look for security issues. Do secure code reviews on all code that you can. If you have to prioritize, start with mission critical stuff, stuff where problems are likely (where trust boundaries are crossed (learn about data flow diagrams and threat models and create them), where interpreters are used, and especially where user input is passed/stored/retrieved, including data retrieved from your database).
Seventh, if you have the money, hire a good consultant like Neohapsis, VSR, Matasano, etc. to review your product. They'll find far more than overruns, and your product will be all the better for it.
Eighth, make sure your QA team knows how overruns work and how to test for them. QA should have test cases specifically designed to find overruns in all inputs.
Ninth, do fuzzing. Fuzzing finds an amazingly large number of overflows in many products.
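To make the fuzzing point concrete, here is a minimal sketch of a "dumb" random fuzzer. Everything in it is made up for illustration: parse_record is a hypothetical parser with a deliberately planted bug, and a real project would more likely use a coverage-guided fuzzer than this loop.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical parser under test: 1-byte length prefix, then payload.

    Returns the last payload byte. It contains a deliberate bug: it trusts
    the length prefix without checking how much payload actually arrived.
    """
    if not data:
        raise ValueError("empty input")  # documented rejection
    length = data[0]
    payload = data[1:1 + length]
    if length == 0:
        return 0
    return payload[length - 1]  # IndexError when the payload is short


def fuzz(target, iterations=1000, seed=0):
    """Feed random byte strings to target and collect unexpected exceptions."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(16)))
        try:
            target(data)
        except ValueError:
            pass  # expected, documented rejection of bad input
        except Exception as exc:  # anything else counts as a finding
            crashes.append((data, type(exc).__name__))
    return crashes


crashes = fuzz(parse_record)
print(len(crashes), "crashing inputs found")
```

In a memory-unsafe language the same class of bug would show up as a crash or a sanitizer report rather than a tidy exception, which is why fuzzing is usually paired with tools that make memory errors loud.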
Edited to add: I misread the question. The title says, "what are the techniques" but the text says "why is it hard".
It's hard because it's so easy to make a mistake. Little mistakes, like off-by-one errors or numeric conversions, can lead to overflows. Programs are complex beasts, with complex interactions. Where there's complexity, there are problems.
Or, to turn the question back on you: why is it so hard to write bug-free code?
Buffer overflow exploits can be prevented. If programmers were perfect, there would be no
unchecked buffers, and consequently, no buffer overflow exploits. However, programmers are not
perfect, and unchecked buffers continue to abound.
Only one technique is necessary: Don't trust data from external sources.
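As a rough illustration of what "don't trust external data" can mean in code, here is a sketch that validates a length field before using it. The wire format (2-byte big-endian length followed by a payload), the MAX_PAYLOAD limit, and the function name are all assumptions made up for this example.

```python
import struct

MAX_PAYLOAD = 1024  # assumed application-level limit, not from any standard


def read_message(packet: bytes) -> bytes:
    """Parse a hypothetical message: 2-byte big-endian length, then payload."""
    if len(packet) < 2:
        raise ValueError("packet too short for header")
    (declared,) = struct.unpack(">H", packet[:2])
    available = len(packet) - 2
    # Never act on the sender's claimed length without checking it first.
    if declared > MAX_PAYLOAD:
        raise ValueError("declared length exceeds limit")
    if declared > available:
        raise ValueError("declared length exceeds data actually received")
    return packet[2:2 + declared]


print(read_message(b"\x00\x03abcXYZ"))
```

The second check is the important one: the declared length comes from the other side of a trust boundary, so it is compared against what was really received before anything is copied or returned.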
There's no magic bullet for security: you have to design carefully, code carefully, hold code reviews, test, and arrange to fix vulnerabilities as they arise.
Fortunately, the specific case of buffer overflows has been a solved problem for a long time. Most programming languages have array bounds checking and do not allow programs to make up pointers. Just don't use the few that permit buffer overflows, such as C and C++.
Of course, this applies to the whole software stack, from embedded firmware¹ up to your application.
¹ For those of you not familiar with the technologies involved, this exploit can allow an attacker on the network to wake up and take control of a powered off computer. (Typical firewall configurations block the offending packets.)
You can run analyzers to help you find problems before the code goes into production. Our Memory Safety Checker will find buffer overruns, bad pointer faults, array access errors, and memory management mistakes in C code, by instrumenting your code to watch for mistakes at the moment they are made. If you want the C program to be impervious to such errors, you can simply use the results of the Memory Safety analyzer as the production version of your code.
In modern exploitation the big three are:
ASLR
Canary
NX Bit
Modern builds of GCC apply canaries by default. Not all ASLR is created equal: Windows 7, Linux and *BSD have some of the best ASLR. OSX has by far the worst ASLR implementation; it's trivial to bypass. Some of the most advanced buffer overflow attacks use exotic methods to bypass ASLR. The NX bit is by far the easiest of the three to bypass: return-to-libc style attacks make it a non-issue for exploit developers.
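For readers who haven't seen how a canary works, here is a toy model in Python. It is only a simulation under made-up assumptions (an 8-byte buffer, a 4-byte canary, and a frame laid out as buffer, canary, return address); real canaries are inserted by the compiler and checked in the function epilogue.

```python
import os

BUF_SIZE = 8      # assumed size of the local buffer
CANARY_SIZE = 4   # assumed canary width


def make_frame(return_address: bytes, canary: bytes) -> bytearray:
    """Toy stack frame laid out as [buffer | canary | return address]."""
    return bytearray(BUF_SIZE) + bytearray(canary) + bytearray(return_address)


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Models an unbounded copy into the buffer (strcpy-like): no length check."""
    frame[0:len(data)] = data


def canary_intact(frame: bytearray, canary: bytes) -> bool:
    """The check done before returning: has the canary value been altered?"""
    return bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) == canary


canary = os.urandom(CANARY_SIZE)  # random per run, so an attacker can't guess it
frame = make_frame(b"\x10\x20\x30\x40", canary)

unchecked_copy(frame, b"A" * 6)   # fits inside the buffer
print("after short write:", canary_intact(frame, canary))

unchecked_copy(frame, b"A" * 16)  # runs over the canary and the return address
print("after long write:", canary_intact(frame, canary))
```

The point of the layout is that a linear overflow cannot reach the return address without first trampling the canary, so the overwrite is detected before the corrupted return address is used.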
I am developing a Windows Forms application in C#. I have heard that one should not use built-in methods and functions in code, since hackers have a deep understanding of such built-in methods and know how to make them fail. Instead, one should always use one's own functions and methods or, failing that, call built-in functions intelligently from those newly written functions. How true is that?
A supporting example in favour of my argument is that I have seen developers write their own implementations of encryption algorithms like AES, DES, and RC4, and of hash functions, because they believe that built-in encryption algorithms often have backdoors in them.
What?! No, no, no! Whoever told you this is just wrong.
There is a common fallacy that published source code is more vulnerable to "h4ckerz" because it is available for anyone to spot the flaws in. However, I'm glad you mentioned crypto, because this is an area where this line of reasoning really stands out as the fallacy it is.
One of the most popular questions of all time on https://security.stackexchange.com/ is about a developer (in the OP he was given the pseudonym "Dave") who shared this fear of published code. Dave, like the developer you saw, was trying to homebrew his own encryption algorithm. Here's one of the most popular comments in that thread:
Dave has a fundamentally false premise, that the security of an algorithm relies on (even partially) its obscurity - that's not the case. The security of a hashing algorithm relies on the limits of our understanding of mathematics, and, to a lesser extent, the hardware ability to brute-force it. Once Dave accepts this reality (and it really is reality, read the Wikipedia article on hashing), it's a question of who is smarter - Dave by himself, or a large group of specialists devoted to this very particular problem. (emphasis added)
As a matter of fact, as it stands now, the top two memes on Security.SE are "Don't roll your own" and "Don't be a Dave".
While this has all been about crypto, this applies in general to most open-source software. The chance that a backdoor will get found and fixed goes up with each new set of eyes laid on the code. This should be a simple and uncontroversial premise: the more people are looking for something, the higher the chance it will be found. Yes, this applies to malicious users looking for exploits. However, it also applies to power users, white hat hackers, security researchers, cryptographers, professional developers, and others working for "good", which generally (hopefully) outnumber those working for "evil". This also implicitly relies on the false premise that hackers need to see the source code to do bad things. This should be obviously false based on the sheer number of proprietary systems whose source code has never been published (various Microsoft and Adobe programs come to mind) which have been inundated with vulnerabilities for years. Maybe having source code to read makes the hacker's job easier, but maybe not -- is it easier to pore over source code looking for an attack vector or to just use scanning tools and scripts against a compiled binary?
tl;dr Don't be a Dave. Rolling your own means you have to be the best at everything to succeed, instead of taking a sampling of the best the community has to offer.
Heartbleed
In your comment, you rebut:
Then why was the Heartbleed bug in openSSL not found and corrected [earlier]?
Because no one was looking at it. That's the sad truth. Here's the difference -- what happened once someone did find it? Now tens of thousands of security researchers, crypto experts, and others are looking at it. Suppose the same kind of vulnerability existed in one of the proprietary products I mentioned earlier, which it very well could. Once it's caught (if it's caught), ask yourself:
Could the team of programmers at the company responsible benefit from the help of the entire worldwide community of security experts, cryptographers, and other analysts right now?
If a bug this critical were discovered (and that's a big if!) in your software, would you be prepared to deal with the fallout caused by your custom implementation?
Unless you know of specific failure modes or weaknesses of the built-in methods your application would use and know how to minimize or eliminate them, it is probably better to use the methods provided by the language or library designers, which will often be both more efficient and more secure than what an average programmer would come up with on the fly for a particular project.
Your example absolutely does not support your view: developing your own encryption algorithm without some serious background in the domain and review by cryptanalysts, and then employing it in security-critical code, is a recipe for disaster. Even developing your own custom implementation of an industry standard encryption algorithm can present problems, and almost certainly will if you are inexperienced at it.
What are the methods for protecting an exe file from reverse engineering? Many packers are available to pack an exe file. Such an approach is mentioned in http://c-madeeasy.blogspot.com/2011/07/protecting-your-c-programexe-files-from.html
Is this method effective?
The only good way to prevent a program from being reverse-engineered ("understood") is to revise its structure to essentially force the opponent into understanding Turing Machines. Essentially what you do is:
take some problem which is generally proven to be computationally difficult
synthesize a version of that whose outcome you know; this is generally pretty easy compared to solving a version
make the correct program execution dependent on the correct answer
make the program compute nonsense if the answer is not correct
Now an opponent staring at your code has to figure out what the "correct" computation is, by solving algorithmically hard problems. There are tons of NP-hard problems in the literature that nobody has solved efficiently in 40 years; it's a pretty good bet that, if your program depends on one of these, J. Random Reverse-Engineer won't suddenly be able to solve them.
One generally does this by transforming the original program to obscure its control flow, and/or its dataflow. Some techniques scramble the control flow by converting some control flow into essentially data flow ("jump indirect through this pointer array"), and then implementing data flow algorithms that require precise points-to analysis, which is both provably hard and has proven difficult in practice.
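Here is a small sketch of the "control flow becomes data flow" idea, using a dispatch table in place of the loop structure. The function names are invented for the example, and this toy is nowhere near a real obfuscator: it only shows the shape of the transformation, not the aliasing tricks that make the points-to analysis hard.

```python
def gcd_plain(a: int, b: int) -> int:
    """Ordinary structured version: the loop is visible in the code."""
    while b:
        a, b = b, a % b
    return a


def gcd_flattened(a: int, b: int) -> int:
    """Same computation, but every jump goes indirectly through a table."""
    state = {"a": a, "b": b}

    def check(s):
        # Returns the index of the next block instead of branching directly.
        return 1 if s["b"] else 2

    def step(s):
        s["a"], s["b"] = s["b"], s["a"] % s["b"]
        return 0

    blocks = [check, step]  # index 2 is used as the exit marker
    pc = 0
    while pc != 2:
        pc = blocks[pc](state)  # "jump indirect through this pointer array"
    return state["a"]


print(gcd_plain(48, 18), gcd_flattened(48, 18))
```

To recover the original loop, an analyst (or a tool) now has to work out which values pc can take at each point, which is a data flow question rather than something readable off the control flow graph.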
Here's a paper that describes a variety of techniques rather shallowly, but it's an easy read:
http://www.cs.sjsu.edu/faculty/stamp/students/kundu_deepti.pdf
Here's another that focuses on how to ensure that the obfuscating transformations lead to results that are guaranteed to be computationally hard:
http://www.springerlink.com/content/41135jkqxv9l3xme/
Here's one that surveys a wide variety of control flow transformation methods, including those that provide levels of guarantees about security:
http://www.springerlink.com/content/g157gxr14m149l13/
This paper obfuscates control flows in binary programs with low overhead:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.3773&rank=2
Now, one could go through a lot of trouble to prevent a program from being decompiled. But if the decompiled one was impossible to understand, you simply might not bother; that's the approach I'd take.
If you insist on preventing decompilation, you can attack that by considering what decompilation is intended to accomplish. Decompilation essentially proposes that you can convert each byte of the target program into some piece of code. One way to make that fail is to ensure that the application can apparently use each byte as both computer instructions and as data, even if it does not actually do so, and that the decision to do so is obfuscated by the above kinds of methods. One variation on this is to have lots of conditional branches in the code that are in fact unconditional (using control flow obfuscation methods); the other side of the branch falls into nonsense code that looks valid but branches to crazy places in the existing code. Another variant on this idea is to implement your program as an obfuscated interpreter, and implement the actual functionality as a set of interpreted data.
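A conditional branch that is in fact unconditional is usually called an opaque predicate. The sketch below uses a deliberately simple one, and the helper names (real_work, decoy) are hypothetical. A predicate this easy would be folded away by any decent analysis; the techniques above aim at predicates whose value is hard to determine statically.

```python
def opaque_true(x: int) -> bool:
    # x * (x + 1) is a product of two consecutive integers, so it is
    # always even: this test is true for every integer x.
    return (x * (x + 1)) % 2 == 0


def real_work(v: int) -> int:
    """Stands in for the program's actual functionality."""
    return v + 1


def decoy(v: int) -> int:
    """Plausible-looking code that is never executed."""
    return (v * 31337) ^ 0xDEAD


def protected(v: int, x: int) -> int:
    # Looks like a data-dependent branch, but the decoy side is dead code.
    if opaque_true(x):
        return real_work(v)
    return decoy(v)


print(protected(5, 12345))
```

A decompiler that cannot prove the predicate constant has to treat both sides as live, so the decoy code pollutes its output.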
A fun way to make this fail is to generate code at run time and execute it on the fly; most conventional languages such as C have pretty much no way to represent this.
A program built like this would be difficult to decompile, let alone understand after the fact.
Tools that are claimed to do a good job at protecting binary code are listed at:
https://security.stackexchange.com/questions/1069/any-comprehensive-solutions-for-binary-code-protection-and-anti-reverse-engineeri
Packing, compressing, and any other methods of binary protection will only ever serve to hinder or slow reversal of your code; they have never been and never will be 100% secure solutions (though the marketing of some would have you believe that). You basically need to evaluate what level of hacker you are up against. If they are script kiddies, then any packer that requires real effort and skill to defeat (i.e. those that lack unpacking scripts/programs/tutorials) will deter them. If you're facing people with skills and resources, then you can forget about keeping your code safe (as many of the comments say: if the OS can read it to execute it, so can you; it'll just take a while longer). If your concern is not so much your IP but rather the security of something your program does, then you might be better served by redesigning it so that it cannot be attacked even with the original source (Chrome takes this approach).
Decompilation is always possible. The statement
This threat can be eliminated to extend by packing/compressing the
executable(.exe).
on your linked site is a plain lie.
Currently, many techniques can be used to protect your application from being decompiled, such as compression, obfuscation, code snippets, etc.
You can look for a company to help you achieve this.
One is Nalpeiron; the website is: https://www.nalpeiron.com/
It covers many platforms: Windows, Linux, ARM Linux, Android.
Virbox can also be taken into consideration:
The website is: https://lm-global.virbox.com/index.html
I recommend it because they have more options for protecting your source code, such as import table protection and memory checks.
I've heard a bit about using automated theorem provers in attempts to show that security vulnerabilities don't exist in a software system. In general this is fiendishly hard to do.
My question is has anyone done work on using similar tools to find vulnerabilities in existing or proposed systems?
Edit: I'm NOT asking about proving that a software system is secure. I'm asking about finding (ideally previously unknown) vulnerabilities (or even classes of them). I'm thinking like (but am not) a black hat here: describe the formal semantics of the system, describe what I want to attack and then let the computer figure out what chain of actions I need to use to take over your system.
Yes, a lot of work has been done in this area. Satisfiability (SAT and SMT) solvers are regularly used to find security vulnerabilities.
For example, at Microsoft, a tool called SAGE is used to find buffer overrun bugs in Windows.
SAGE uses the Z3 theorem prover as its satisfiability checker.
If you search the internet using keywords such as “smart fuzzing” or “white-box fuzzing”, you will find several other projects using satisfiability checkers for finding security vulnerabilities.
The high-level idea is the following: collect execution paths in your program that you have not yet managed to exercise (that is, paths for which you have not found an input that makes the program execute them), convert these paths into mathematical formulas, and feed these formulas to a satisfiability solver.
The idea is to create a formula that is satisfiable/feasible only if there is an input that will make the program execute the given path.
If the produced formula is satisfiable (i.e., feasible), then the satisfiability solver will produce an assignment and the desired input values. White-box fuzzers use different strategies for selecting execution paths.
The main goal is to find an input that will make the program execute a path that leads to a crash.
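The workflow above can be sketched in a few lines. To keep it self-contained, a brute-force search over a small input domain stands in for the SAT/SMT solver; a real white-box fuzzer would hand the path condition to a solver such as Z3 instead. The target program and its constraints are made up for the example.

```python
from itertools import product


def target(x: int, y: int) -> str:
    """Hypothetical program under test with a hard-to-hit crashing path."""
    if x * 3 == y + 7:
        if x > 40 and y % 5 == 0:
            raise RuntimeError("crash")
        return "rare"
    return "common"


# Path condition for the crashing path: one predicate per branch taken.
crash_path = [
    lambda x, y: x * 3 == y + 7,
    lambda x, y: x > 40,
    lambda x, y: y % 5 == 0,
]


def solve(constraints, domain=range(256)):
    """Stand-in for a satisfiability solver: exhaustive search over the domain."""
    for x, y in product(domain, repeat=2):
        if all(c(x, y) for c in constraints):
            return x, y  # a satisfying assignment, i.e. a concrete input
    return None  # unsatisfiable within this domain


model = solve(crash_path)
print("input reaching the crash path:", model)
```

Random testing would rarely stumble on an input satisfying x * 3 == y + 7, whereas solving the path condition yields one directly. That is the advantage the answer describes.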
So, at least in some meaningful sense, the opposite of proving something is secure is finding code paths for which it isn't.
Try Byron Cook's TERMINATOR project.
And at least two videos on Channel9. Here's one of them
His research is likely to be a good starting point for you to learn about this extremely interesting area of research.
Projects such as Spec# and Typed-Assembly-Language are related too. In their quest to move the possibility of safety checks from runtime back to compile-time, they allow the compiler to detect many bad code paths as compilation errors. Strictly, they don't help your stated intent, but the theory they exploit might be useful to you.
I'm currently writing a PDF parser in Coq together with someone else. While the goal in this case is to produce a secure piece of code, doing something like this can definitely help with finding fatal logic bugs.
Once you've familiarized yourself with the tool, most proofs become easy. The harder proofs yield interesting test cases that can sometimes trigger bugs in real, existing programs. (And for finding bugs, you can simply assume theorems as axioms once you're sure that there's no bug to find, no serious proving necessary.)
About a month ago, we hit a problem parsing PDFs with multiple / older XREF tables. We could not prove that the parsing terminates. Thinking about this, I constructed a PDF with looping /Prev pointers in the trailer (who'd think of that? :-P), which naturally made some viewers loop forever. (Most notably, pretty much any poppler-based viewer on Ubuntu. Made me laugh and curse Gnome/evince-thumbnailer for eating all my CPU. I think they fixed it now, tho.)
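To show the kind of guard that makes the termination argument go through, here is a sketch in Python (not our Coq code). It assumes the trailers have already been parsed into a hypothetical map from a section's offset to the offset named by its /Prev entry.

```python
def walk_prev_chain(trailers: dict, start: int) -> list:
    """Follow /Prev offsets from the newest XREF section, oldest last.

    `trailers` is a hypothetical pre-parsed map: offset of an XREF section
    -> offset in its /Prev entry, or None when there is no /Prev.
    """
    seen = set()
    order = []
    offset = start
    while offset is not None:
        if offset in seen:
            # Each offset is visited at most once, so the loop must terminate.
            raise ValueError(f"/Prev chain loops back to offset {offset}")
        if offset not in trailers:
            raise ValueError(f"/Prev points to unknown offset {offset}")
        seen.add(offset)
        order.append(offset)
        offset = trailers[offset]
    return order


print(walk_prev_chain({900: 400, 400: 100, 100: None}, 900))
```

With the visited set, the number of iterations is bounded by the number of distinct offsets, which is exactly the decreasing measure a termination proof needs.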
Using Coq to find lower-level bugs will be difficult. In order to prove anything, you need a model of the program's behavior. For stack / heap problems, you'll probably have to model the CPU-level or at least C-level execution. While technically possible, I'd say this is not worth the effort.
Using SPLint for C or writing a custom checker in your language of choice should be more efficient.
STACK and KINT used constraint solvers to find vulnerabilities in many OSS projects, like the Linux kernel and ffmpeg. The project pages point to papers and code.
It's not really related to theorem-proving, but fuzz testing is a common technique for finding vulnerabilities in an automated way.
There is the L4 verified kernel which is trying to do just that. However, if you look at the history of exploitation, completely new attack patterns are found, and then a lot of software written up to that point turns out to be very vulnerable to them. For instance, format string vulnerabilities weren't discovered until 1999. About a month ago H.D. Moore released DLL Hijacking, and literally everything under Windows is vulnerable.
I don't think it's possible to prove that a piece of software is secure against an unknown attack. At least not until a theorem prover is able to discover such an attack, and as far as I know this hasn't happened.
Disclaimer: I have little to no experience with automated theorem provers
A few observations
Things like cryptography are rarely ever "proven", just believed to be secure. If your program uses anything like that, it will only be as strong as the crypto.
Theorem provers can't analyze everything (or they would be able to solve the halting problem)
You would have to define very clearly what insecure means for the prover. This in itself is a huge challenge
Yes. Many theorem proving projects show the quality of their software by demonstrating holes or defects in software. To make it security related, just imagine finding a hole in a security protocol. Carlos Olarte's Ph.D. thesis under Ugo Montanari has one such example.
It is in the application. Not really the theorem prover itself that has anything to do with security or special knowledge thereof.
I'm interested to know the techniques that were used to discover vulnerabilities. I know the theory about buffer overflows, format string exploits, etc., and I have also written some of them. But I still don't understand how to find a vulnerability in an efficient way.
I'm not looking for a magic wand, only for the most common techniques. Reading the whole source is an epic amount of work for some projects, assuming you even have access to the source. Fuzzing the input manually isn't very comfortable either. So I'm wondering about tools that help.
E.g. I don't understand how the dev team can find vulnerabilities to jailbreak iPhones so fast. They don't have source code, they can't execute programs, and since there is a small number of default programs, I don't expect a large number of security holes. So how do they find this kind of vulnerability so quickly?
Thank you in advance.
On the lower layers, manually examining memory can be very revealing. You can certainly view memory with a tool like Visual Studio, and I would imagine that someone has even written a tool to crudely reconstruct an application based on the instructions it executes and the data structures it places into memory.
On the web, I have found many sequence-related exploits by simply reversing the order in which an operation occurs (for example, an online transaction). Because the server is stateful but the client is stateless, you can rapidly exploit a poorly-designed process by emulating a different sequence.
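For contrast, here is a rough sketch of what a process looks like when it does resist reordering: the server tracks which step each session has reached and rejects anything out of sequence. The step names and the CheckoutSession class are hypothetical.

```python
class CheckoutSession:
    """Server-side record of how far one client has got in the process."""

    STEPS = ("cart", "address", "payment", "confirm")  # assumed workflow

    def __init__(self):
        self.next_index = 0

    def submit(self, step: str) -> None:
        """Accept a step only if it is the next one in the sequence."""
        if self.next_index >= len(self.STEPS):
            raise PermissionError("session already completed")
        expected = self.STEPS[self.next_index]
        if step != expected:
            # The client asked to skip ahead, go back, or replay a step.
            raise PermissionError(f"expected step {expected!r}, got {step!r}")
        self.next_index += 1

    @property
    def completed(self) -> bool:
        return self.next_index == len(self.STEPS)


session = CheckoutSession()
for step in CheckoutSession.STEPS:
    session.submit(step)
print("completed:", session.completed)
```

The exploitable designs are the ones where each request handler assumes the previous steps happened because the normal client always performs them in order.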
As to the speed of discovery: I think quantity often trumps brilliance...put a piece of software, even a good one, in the hands of a million bored/curious/motivated people, and vulnerabilities are bound to be discovered. There is a tremendous rush to get products out the door.
There is no efficient way to do this, as firms spend a good deal of money to produce and maintain secure software. Ideally, their work in securing software does not start with looking for vulnerabilities in the finished product, so many vulns have already been eradicated by the time the software is out.
Back to your question: it will depend on what you have (working binaries, complete/partial source code, etc.). On the other hand, the goal is not to find ANY vulnerability but those that count (e.g., those that matter to the client of the audit or the software owner). Right?
This will help you understand the inputs and functions you need to worry about. Once you have located these, you may already have a feeling for the software's quality: if it isn't very good, then fuzzing will probably find you some bugs. Otherwise, you need to start understanding these functions and how the input is used within the code to determine whether the code can be subverted in any way.
Some experience will help you weigh how much effort to put into each task and when to push further. For example, if you see some bad practices being used, then delve deeper. If you see crypto being implemented from scratch, delve deeper. Etc.
Aside from buffer overflow and format string exploits, you may want to read a bit on code injection. (a lot of what you'll come across will be web/DB related, but dig deeper) AFAIK this was a huge force in jailbreaking the iThingies. Saurik's mobile substrate allow(s) (-ed?) you to load 3rd party .dylibs, and call any code contained in those.
I've read and finished both Reversing: Secrets of Reverse Engineering and Hacking: The Art of Exploitation. They both were illuminating in their own way but I still feel like a lot of the techniques and information presented within them is outdated to some degree.
When the infamous Phrack article, Smashing the Stack for Fun and Profit, was written in 1996, it was just before what I sort of consider the computer security "golden age".
Writing exploits in the years that followed was relatively easy. Some basic knowledge of C and assembly was all that was required to perform buffer overflows and execute some arbitrary shellcode on a victim's machine.
To put it lightly, things have gotten a lot more complicated. Now security engineers have to contend with things like Address Space Layout Randomization (ASLR), Data Execution Prevention (DEP), Stack Cookies, Heap Cookies, and much more. The complexity of writing exploits went up at least an order of magnitude.
You can't even run most of the buffer overrun exploits in the tutorials you'll find today without compiling with a bunch of flags to turn off modern protections.
Now if you want to write an exploit you have to devise ways to turn off DEP, spray the heap with your shell-code hundreds of times and attempt to guess a random memory location near your shellcode. Not to mention the pervasiveness of managed languages in use today that are much more secure when it comes to these vulnerabilities.
I'm looking to extend my security knowledge beyond writing toy-exploits for a decade old system. I'm having trouble locating resources that help address the issues of writing exploits in the face of all the protections I outlined above.
What are the more advanced and prevalent papers, books or other resources devoted to contending with the challenges of writing exploits for modern systems?
You mentioned 'Smashing the stack'. Research-wise this article was out-dated before it was even published. The late 80s Morris worm used it (to exploit fingerd IIRC). At the time it caused a huge stir because back then every server was written in optimistic C.
It took a few (10 or so) years, but gradually everyone became more conscious of security concerns related to public-facing servers.
The servers written in C were subjected to lots of security analysis and at the same time server-side processing branched out into other languages and runtimes.
Today things look a bit different. Servers are not considered a big target. These days it's clients that are the big fish. Hijack a client and the server will allow you to operate under that client's credentials.
The landscape has changed.
Personally I'm a sporadic fan of playing assembly games. I have no practical use for them, but if you want to get in on this I'd recommend checking out the Metasploit source and reading their mailing lists. They do a lot of crazy stuff and it's all out there in the open.
I'm impressed, you are a leet hacker like me. You need to move to web applications. The majority of CVE numbers issued in the past few years have been in web applications.
Read these two papers:
http://www.securereality.com.au/studyinscarlet.txt
http://www.ngssoftware.com/papers/HackproofingMySQL.pdf
Get a LAMP stack and install these three applications:
http://sourceforge.net/projects/dvwa/ (php)
http://sourceforge.net/projects/gsblogger/ (php)
http://www.owasp.org/index.php/Category:OWASP_WebGoat_Project (j2ee)
You should download w3af and master it. Write plugins for it. w3af is an awesome attack platform, but it is buggy and has problems with DVWA, it will rip up greyscale. Acunetix is a good commercial scanner, but it is expensive.
I highly recommend "The Shellcoder's Handbook". It's easily the best reference I've ever read when it comes to writing exploits.
If you're interested in writing exploits, you're likely going to have to learn how to reverse engineer. For 99% of the world, this means IDA Pro. In my experience, there's no better IDA Pro book than Chris Eagle's "The IDA Pro Book". He details pretty much everything you'll ever need to do in IDA Pro.
There's a pretty great reverse engineering community at OpenRCE.org. Tons of papers and various helpful apps are available there. I learned about this website at an excellent bi-annual reverse engineering conference called RECon. The next event will be in 2010.
Most research these days will be "low-hanging fruit". The majority of talks at recent security conferences I've been to have been about vulnerabilities on mobile platforms (iPhone, Android, etc) where there are few to none of the protections available on modern OSes.
In general, there won't be a single reference out there that will explain how to write a modern exploit, because there's a whole host of protections built into OSes. For example, say you've found a heap vulnerability, but that pesky new Safe Unlinking feature in Windows is keeping you from gaining execution. You'd have to know that two geniuses researched this feature and found a flaw.
Good luck in your studies. Exploit writing is extremely frustrating, and EXTREMELY rewarding!
Bah! The spam thingy is keeping me from posting all of my links. Sorry!
DEP (Data Execution Prevention), NX (No-Execute), and other security enhancements that specifically disallow execution are easily bypassed by using other exploit techniques such as Ret2Lib or Ret2Esp. When an application is compiled, it is usually linked against other libraries (Linux) or DLLs (Windows). These Ret2* techniques simply call an existing function that resides in memory.
For example, in a normal exploit you may overflow the stack and then overwrite the return address (which is loaded into EIP) with the address of a NOP sled, your shellcode, or an environment variable that contains your shellcode. When attempting this exploit on a system that does not allow the stack to be executable, your code will not run. Instead, when you overwrite the return address, you can point it to an existing function within memory such as system() or execv(). You pre-populate the parameters this function expects (on the stack under the usual 32-bit x86 calling conventions, in registers on platforms that pass arguments that way) and now you can call /bin/sh without having to execute anything from the stack.
For more information look here:
http://web.textfiles.com/hacking/smackthestack.txt