How should dangerous code snippets be published? - security

When discussing (asking/answering questions about, writing blog posts about, etc.) some programming matters, it may be desirable to give source code examples of what you're talking about; but in some cases these snippets may be dangerous, not because they are directly harmful but because they seem to work at first but only set up for problems later. Two examples would be when discussing concurrency issues, where the code works most of the time but rarely and non-deterministically fails, and when discussing security issues, where the code seems to work but can in fact be exploited; and there could be other examples.
It is necessary to be able to discuss such issues, to foster awareness of them at least. However, I am always worried that someone will come from a search engine, barely read the post, copy and paste the snippet and use it for something; more subtly, someone may read the post, try out the code in a test project and confirm it can indeed be exploited (as he is encouraged to do), then some time later reuse the dangerous code, as he has forgotten the code is dangerous and there is no longer a blog post explaining why the code is dangerous around the snippet.
So I am wondering how to mark such code so that no part of it somehow makes it to production (or if it ever does, then the responsible party could not plausibly deny awareness).
One way I came up with is to put:
an #error (or similar) directive inside each of the functions, as well as
repeated comments warning of the dangerousness of the code (since someone who will try out the code in a test project to confirm the issue will have removed the #error directive).
But since these comments would only clutter up the snippet when reading on the web, I make them the same color as the background (or at least I am trying to; see how I put it in action here, I incidentally have a question on doctype.com asking how to best do this).
If that seems completely overkill, remember that concurrency (and security) issues are very dangerous so I want to do all I can (within reason) to prevent my snippets from causing issues in real software; I am sometimes comparing this to fissile material handling.
(I honestly don't know whether it would be best suited for programmers.stackexchange.com or here, so I'm asking here first; feel free to move to programmers.stackexchange.com if it turns out it would be better there.)

You make a very good point and I think that you handle it pretty well right now.
However, the #error lines show up in the blog post for me, they are not white.
I think that you shouldn't worry so much about it being picked up by a feed or something like that. If the code is pulled away from the warning message on your blog, it's more important to have the #error lines visible.
But overall, I like your system. I might be good idea to set some standard for this, though, as programmers.
I would however add a link to the original post explaining why it is bad, too. That is way more important than just saying it is.
So to summarize: good idea, we should think of a standard. Make sure to include a link to a why.

Personally, yes, I think it's overkill.
I don't think you need to concern yourself with someone who extracts and uses the code without reading the context in which it's given. Such a programmer will likely be making so many other mistakes as to render using your code largely irrelevant.
In short they will have and be creating bigger problems.

Related

How to implement code in a manner that lessens the possibility of complete re-works [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I had a piece of work thrown out due to a single minor spec change that turned out not to have been spec'ed correctly. If it had been done right at the start of the project then most of that work would have never have been needed in the first place.
What are some good tips/design principles that keep these things from happening?
Or to lessen the amount of re-working to code that is needed in order to implement feature requests or design changes mid implementation?
Modularize. Make small blocks of code that do their job well. However, thats only the beginning. Its usually a large combination of factors that contribute to code so bad it needs a complete rework. Everything from highly unstable requirements, poor design, lack of code ownership, the list goes on and on.
Adding on to what others have brought up: COMMUNICATION.
Communication between you and the customer, you and management, you and the other developers, you and your QA department, communication between everyone is key. Make sure management understands reasonable timeframes and make sure both you and the customer understand exactly what it is that your building.
Take the time to keep communication open with the customer that your building the product for. Make milestones and setup a time to display the project to the customer at each milestone. Even if the customer is completely disappointed with a milestone when you show it, you can scratch what you have and start over from the last milestone. This also requires that your work be built in blocks that work independent of one another as Csunwold stated.
Points...
Keep open communication
Be open and honest with progress of product
Be willing to change daily as to the needs of the customers business and specifications for the product change.
Software requirements change, and there's not much one can do about that except for more frequent interaction with clients.
One can, however, build code that is more robust in face of change. It won't save you from throwing out code that meets a requirement that nobody needs anymore, but it can reduce the impact of such changes.
For example, whenever this applies, use interfaces rather than classes (or the equivalent in your language), and avoid adding operations to the interface unless you are absolutely sure you need them. By building your programs that way you are less likely to rely on knowledge of a specific implementation, and you're less likely to implement things that you would not need.
Another advantage of this approach is that you can easily swap one implementation for another. For example, it sometimes pays off to write the dumbest (in efficiency) but the fastest to write and test implementation for your prototype, and only replace it with something smarter in the end when the prototype is the basis of the product and the performance actually matters. I find that this is a very effective way to avoid premature optimizations, and thus throwing away stuff.
modularity is the answer, as has been said. but it can be a hard answer to use in practice.
i suggest focussing on:
small libraries which do predefined things well
minimal dependencies between modules
writing interfaces first is a good way to achieve both of these (with interfaces used for the dependencies). writing tests next, against the interfaces, before the code is written, often highlights design choices which are un-modular.
i don't know whether your app is UI-intensive; that can make it more difficult to be modular. it's still usually worth the effort, but if not then assume that it will be thrown away before long and follow the iceberg principle, that 90% of the work is not tied to the UI and so easier to keep modular.
finally, i recommend "the pragmatic programmer" by andrew hunt and dave thomas as full of tips. my personal favourite is DRY -- "don't repeat yourself" -- any code which says the same thing twice smells.
iterate small
iterate often
test between iterations
get a simple working product out asap so the client can give input.
Basically assume stuff WILL get thrown out, so code appropriately, and don't get far enough into something that having it be thrown out costs a lot of time.
G'day,
Looking through the other answers here I notice that everyone is mentioning what to do for your next project.
One thing that seems to be missing though is having a washup to find out why the spec. was out of sync. with the actual requirements needed by the customer.
I'm just worried that if you don't do this, no matter what approach you are taking to implementing your next project, if you've still got that mismatch between actual requirements and the spec. for your next project then you'll once again be in the same situation.
It might be something as simple as bad communication or maybe customer requirement creep.
But at least if you know the cause and you can try and help minimise the chances of it happening again.
Not knocking what other answers are saying and there's some great stuff there, but please learn from what happened so that you're not condemned to repeat it.
HTH
cheers,
Sometimes a rewrite is the best solution!
If you are writing software for a camera, you could assume that the next version will also do video, or stereo video or 3d laser scanning and include all hooks for all this functionality, or you could write such a versatile extensible astronaut architecture that it could cope with the next camera including jet engines - but it will cost so much in money, resources and performance that you might have been better off not doing it.
A complete rewrite for new functionality in a new role isn't always a bad idea.
Like csunwold said, modularizing your code is very important. Write it so that if one piece falls prone to errors, it doesn't muck up the rest of the system. This way, you can debug a single buggy section while being able to safely rely on the rest.
Beyond this, documentation is key. If your code is neatly and clearly annotated, reworking it in the future will be infinitely easier for you or whoever happens to be debugging.
Using source control can be helpful too. If you find a piece of code doesn't work properly, there's always the opportunity to revert back to a past robust iteration.
Although it doesn't directly apply to your example, when writing code I try to keep an eye out for ways in which I can see the software evolving in the future.
Basically I try to anticipate where the software will go, but critically, I resist the temptation to implement any of the things I can imagine happening. All I am after is trying to make the APIs and interfaces support possible futures without implementing those features yet, in the hope that these 'possible scenarios' help me come up with a better and more future-proof interface.
Doesn't always work ofcourse.

Writing easily modified code

What are some ways in which I can write code that is easily modified?
The one I have learned from experience is that I almost always need to write one to throw away. That way I have developed a sense of the domain knowledge and program structure required before coding the actual application.
The general guidelines are offcourse
High cohesion, low coupling
Dont repeat yourself
Recognize design patterns and implement them
Dont recognize design patterns where they are not existing or necassary
Use a coding standard, stick to it
Comment everyting that should be commented, when in doubt : comment
Use unit tests
Write comments and tests before implementation, that way you know exactly what you want to do
And when it goes wrong : refactor, refactor, refactor. With good tests you can be sure nothing breaks
And oh yeah:
read this : http://www.pragprog.com/the-pragmatic-programmer
Everything (i think) above and more is in it
I think your emphasis on modifiability is more important than readability. It is not hard to make something easy to read, but the real test of how well it is understood comes when someone else (or you) has to modify it in repsonse to changing requirements.
What I try to do is assume that modifications will be necessary, and if it is not really clear how to do them, leave explicit directions in the code for how to do them.
I assume that I may have to do some educating of the reader of the code to get him or her to know how to modify the code properly. This requires energy on my part, and it requires energy on the part of the person reading the code.
So while I admire the idea of literate programming, that can be easily read and understood, sometimes it is more like math, where the only way to do it is for the reader to buckle down, pay close attention, re-read it a few times, and make sure they understand.
Readability helps a lot: If you do something non-obvious, or you are taking a shortcut, comment. Comments are places where you can go back and refactor if you have time later. Use sensible names for everything, makes it easier to understand what is going on.
Continuous revision will let you move from that first draft to a better one without throwing away (too much) work. Any time you rewrite from scratch you may lose lessons learned. As you code, use refactoring tools to eliminate code representing areas of exploration that are no longer needed, and to make obvious things that were obscure. The first one reduces the amount that you need to maintain; the second reduces the effort per square foot. (Sqft makes about as much sense as lines of code, really.)
Modularize appropriately and enforce encapsulation and separation of logic between your modules. You don't want too many dependencies on any one part of the code or that part becomes inherently harder to understand.
Considering using tried and true methods over cutting edge ones. You give up some functionality for predictability.
Finally, if this is code that people will be using before and after modification, you need(ed) to have an appropriate API insulating your code from theirs. Having a strong API lets you change things behind the scenes without needing to alert all your consumers. I think there's a decent article on Coding Horror about this.
Hang Your Code Out to D.R.Y.
I learned this early when assigned the task of changing the appearance of a web-interface. The code was in C, which I hated, and was compiled to a CGI executable. And, worse, it was built on a library that was abandoned—no updates, no support, and too many man-hours put into its use to change it. On top of the framework was a disorderly web of code, consisting of various form and element builders, custom string implementations, and various other arcane things (for a non-C programmer to commit suicide with).
For each change I made there were several, sometimes many, exceptions to the output HTML. Each one of these exceptions required a small change or improvement in the form builder, thanks to the language there's no inheritance and therefore only functions and structs, and instead of putting the hours in the team instead wrote these exceptions frequently.
In my inexperience I was forced to change the output of each exception, rather than consolidate the changes in an improved form builder. But, trawling through 15,000 lines of code for several hours after ineffective changes would induce code-burn, and a fogginess that took a night's sleep to cure.
Always run your code through the DRY-er.
The easiest way to modify a code is NOT to write code. Write pseudo code not just for algo but how your code should be structured if you are unsure.
Designing while writing code never works...for me :-)
Here is my current experience: I'm working (Java) with a kind of database schema that might often change (fields added/removed, data types modified). My strategy is to parse this schema and to generate the code with apache velocity. The BaseClass generated is never modified by the programmer. Else, a MyClass extends BaseClass is created and the logical components of this class (e.g. toString() ! )are implemented using the 'getters' and the 'setters' of the super class.

Vulnerability & Exploit Case Studies

I understand the general idea of how vulnerabilities are exploited. Buffer overflows and stuff like that, but I guess I don't REALLY get it.
Are there useful sources of information that explain this area well? Maybe case studies about how particular vulnerabilities were exploited?
Even more interesting would be how projects you have worked on suffered from these kinds of issues.
I'm not trying to learn about currently existing vulnerabilities that I could exploit. I'm trying to get a feel for how this area could have an impact on any projects I may work on.
iss.net has articles on different examples of exploits, mainly explaining how to secure your system.
The corelancoder tutorial! A must read
https://www.corelan.be/index.php/2009/07/19/exploit-writing-tutorial-part-1-stack-based-overflows/
Part 1 is a single BOF on windows, ... , Part 12 is ROP. It is hard, but the first one can be done in a day or two, and it should give you a real feeling on which difficulties one finds when writing an exploit, and on which countermeasures are useless/useful.
The problem with this area is that it is unclear until you try out something on your own, but that requires time. You could also check Metasploit to exploit problems directly (to have an idea of the impact) - you will find a list of exploit to fire to a target. If you need a target, use Metasploitable http://www.offensive-security.com/metasploit-unleashed/Requirements#Metasploitable
If you want practical examples of real life exploits, I totally recommend the book
"A Bug Hunter's Diary: A Guided Tour Through the Wilds of Software Security"
It's exactly what you want. It's full of case studies and real life examples of almost every type of exploits and it explains it from the finding to fully writing a working exploit.
Also there are some examples in the book "The shellcoders handbook" but it's not as comprehensive as "The bug hunter's diary" Also "The shellcoders handbook" is pretty big and I only use it as a reference when needed.
Also sometimes I keep reading exploits from "http://www.exploit-db.com" and it helped me a lot but keep in mind not everything can be taught so sometimes you will need to improvise based on what you have and what you can control it's hard at first but it will make you feel great when the exploit runs and you see that calc.exe :)
Of course corlan tutorials and other tutorials are a must to know the essentials but they only teach you the basic concepts and you have to see some real life exploits in action to really understand the possibilities.

Should Programmers Use Decompilers?

Hear lately I've been listening to Jeff Atwood and Joel Spolsky's radio show and they have been talking about dogfooding (the process of reusing your own code, see Jeff Atwood's blog post). So my question is should programmers use decompilers to see how that programmers code is implemented and works, to make sure it won't break your code. Or should you just trust that programmers code and adapt to it because using decompilers go against everything we as programmers have ever learn about hiding data (well OO programmers at least)?
Note: I wasn't sure which tags this would go under so feel free to retag it.
Edit: Just to clarify I was asking about decompilers as a last resort, say you can't get the source code for some reason. Sorry, I should have supplied this in the original question.
Yes, It can be useful to use the output of a decompiler, but not for what you suggest. The output of a compiler doesn't ever look much like what a human would write (except when it does.) It can't tell you why the code does what it does, or what a particular variable should mean. It's unlikely to be worth the trouble to do this unless you already have the source.
If you do have the source, then there are lots of good reasons to use a decompiler in your development process.
Most often, the reasons for using the output of a decompiler is to better optimize code. Sometimes, with high optimization settings, a compiler will just get it wrong. This can be almost impossible to sort out in some cases without comparing the output of the compiler at different levels of optimization.
Other times, when trying to squeeze the most performance out of a very hot code path, a developer can try arranging their code in a few different ways and compare the compiled results. As a last resort, this may be the simplest way to start when implementing a code block in assembly language, by duplicating the compiler's output.
Dogfooding is the process of using the code that you write, not necessarily re-using code.
However, code re-use typically means you have the source, hence 'code-reuse' otherwise its just using a library supplied by someone else.
Decompiling is hard to get right, and the output is typically very hard to follow.
You should use a decompiler if it is the tool that's required to get the job done. However, I don't think it's the proper use of a decompiler to get an idea of how well the code which is being decompiled was written. Depending on the language you use, the decompiled code can be very different from the code which was actually written. If you want to see some real code, look at open source code. If you want to see the code of some particular product, it's probably better to try to get access to the actual code through some legal means.
I'm not sure what exactly it is you are asking, what you expect "decompilers" to show you, or what this has to do with Atwood and Spolsky, or what the question is exactly. If you're programming to public interfaces then why would you need to see the original source of the the third party code to see if it will "break" your code? You could more effectively build tests to in order to determine this. As well, what the "decompiler" will tell you largely depends on the language/platform the software was written in, whether it is Java, .NET, C and so forth. It's not the same as having the original source to read, even in the case of .NET assemblies. Anyway, if you are worried about third party code not working for you then you should really be doing typical kinds of unit tests against the code rather than trying to "decompile" it. As far as whether you "should," if you mean whether you "should" in some other way other than what would be the best use of your time then I'm not sure what you mean.
Should Programmers Use Decompilers?
Use the right tool for the right job. Decompilers don't often produce results that are easy to understand, but sometimes they are what's needed.
should programmers use decompilers to
see how that programmers code is
implemented and works, to make sure it
won't break your code.
No, not unless you find a problem and need support. In general you don't use it if you don't trust it, and if you have to use it you even when you don't trust it you develop tests to prove the functionality and verify that later upgrades still work as expected.
Don't use functionality you don't test, unless you have very good support or a relationship of trust.
-Adam
Or should you just trust that programmers code and adapt to it because using decompilers go against everything we as programmers have ever learn about hiding data (well OO programmers at least)?
This is not true at all. You would use a decompiler not because you want to get around any sort of abstraction, encapsulation, or defeat OO principles, but because you want to understand why the code is behaving the way it is better.
Sometimes you need to use a decompiler (or in the Java world, a bytecode viewer) when you are troubleshooting an annoying bug with a 3rd party library where an exception is thrown with no useful error message, no logging, etc.
Use of a decompiler has nothing to do with OO principles.
The short answer to this... Program to a public and documented specification, not to an implementation. Relying on implementation specifics and side-effects will burn you.
Decompilation is not a tool to help you program correctly, though it might, in a pinch, assist you in understanding a problem with someone else's code for which you don't have source.
Also, beware of the possible legal risk of decompiling; many software companies have no-decompile clauses which could expose you and your employer to legal consequences.

Understanding a Large, Undocumented Set of Source Code? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I have always been astonished by Wine. Sometimes I want to hack on it, fix little things and generally understand how it works. So, I download the Wine source code and right after that I feel overwhelmed. The codebase is huge and - unlike the Linux Kernel - there are almost no guides about the code.
What are the best-practices for understanding such a huge codebase?
With a complex code base the biggest mistake you can make is trying to be a computer. Get the computer to run the code, and use a debugger to help find out what is going on.
Figure out how to compile, install and run your own version of Wine from the existing source code.
Learn how debug (e.g. use gdb) on a running instance of your version of Wine.
Run Wine under the debugger and make cause it to demonstrate the undesired behaviour.
The fun part: find where the code execution path goes and start learning how it all goes together.
Yes, reading lots and lots of code will help, but the compiler/debugger/computer can run code a lot faster than you.
A professor once told us to compare such a situation with climbing a mountain. You might be listening to someone who did this and tells you what it's like to look out into the country. And you believe without hesitation that that's a spectacular sight.
However, you have to start climbing yourself for real understanding what the view from the top is like.
And it's not that important to climb all the way to the top. It might be perfectly suficient just to reach a fair height above ground level.
But don't ever be afraid of start climbing. The view is always worth any efforts.
This has always been a nice analogy for me. I know this question was more about specific tips on how to efficiently deal with code bases once you started climbing. But nevertheless it instantly reminded me of our physics classes way back then.
(This is an answer I posted to a question a while back. I modified it a bit to fit this question.)
Experience has shown me that there are 3 major goals you have when learning a legacy system:
Learn what the code is supposed to do.
Learn how it does them.
(crucially) Learn why it does them the way it does.
All three of those parts are very important, and there's a few tricks to help you get started.
First, resist the temptation to just ctrl-click (or whatever your IDE uses) your way around the code to understand everything. You probably won't be able to keep everything in perspective in your mind this way, especially when each line forces you to look at multiple other classes in order to understand what it is, so you need to be able to hold several levels of the stack in your head.
Read documentation where possible; it usually helps you quickly gain a mental framework upon which to build everything that follows.
Run test cases where possible.
Don't be afraid to ask someone who knows if you have a question. Granted, you shouldn't waste others' time with inane queries, but if there's something that you simply don't understand (this is especially true with more conceptual questions like, "Wouldn't it make much more sense to implement this as a ___" or something), it's probably worth finding out the answer before you mess something up and don't know why.
When you do finally get down to reading the code, start at a logical "main" place and go from there. Don't just read the code top to bottom, or in alphabetical order, or anything (this is probably obvious).
The best way to get acquainted with a large codebase is to dive in. Many projects have a list of easy tasks that need to be done, and they're usually reserved to help ease people in. You should find and work on some of these; you'll learn a lot about the general code outline and structure, contribute to the project, and get an easy payoff that will help encourage you to take on larger tasks.
Like most projects, WINE has good resources available to its developers; IRC, wiki, mailing list, and guides/overviews. With most daunting codebases, it's not so scary after the first few fixes. WINE is truly large, and much like the kernel, I doubt there's any expert in all systems; don't feel like you need to be either. Start working on something that matters to you and take it from there.
I've started a few patches to WINE myself, and it's a good community and good structure. There's lots of very helpful debug messages, and it's a really cool project to work on, so that helps you hit it longer too.
We all appreciate your valor and willingness to help with WINE (it needs it). Thanks, and good luck.
Dig in. Think of a question you'd like to have answered, and try to find the answer. When you get tired of reading code, go read the dev mailing list, the developer's guide, or the wiki.
Unfortunately, there's no royal road to understanding a large code base. If you enjoy that sort of thing (I do) you're in for some fun. If not, guide books won't really help, so you aren't really that much worse off.
Look for one peculiar feature you are interested to improve. Search for its implementation. Once you found it, pull on that straw and all the rest will follow.
The best way is through comments.
I'm being ironic, as you understand tiny bits of the beast add comments so you can follow your trail.
The other developers will also enjoy it if you add the missing guides in the code.
Try to implement some tiny little change in the code, something that will be visible to you. That might be figuring out a workable way to output debugging statements (and figuring out where the output appears), it might be changing the default size of windows or desktop color, or something. Once you can make something happen in the codebase, you've scratched the surface of understanding and can begin to move on toward more complicated things. At that point, select a goal of something slightly more useful that you'd like the code to do, and implement that. Or check out the project's bug tracker and look for something small to start with.
Document as you go, and write unit tests as you go, and refactor as you go. When you figure out what a routine does, comment it!!
As others have suggested, dig in! Read all the available documentation you can absorb. Then see if you can find other people who are interested or knowledgeable and learn with/from them. It helps to have people to bounce ideas off of and ask questions.
For C source code, once you get a feel for what areas of the code you'd like to work on, generate ctags and cscope databases for that code. These tools make it a lot easier to jump around and understand the code. Many text editors (one example is gvim) have support for ctags and cscope so you can jump around easily.
(warning: shameless marketing ahead)
For Java developers using Eclipse, there's nWire. It is an Eclipse plugin for navigating and visualizing large codebases.
A good way to understand a large system is to break it down into it's constituent parts and focus on a specific paths through the application.
Your debugger is your friend here, set a breakpoint in the thread you want to investigate then step through it line by line looking at which each part does... hope that helps...

Resources