Related
Our users can enter questions that get answered by students. Our users need a extensible, flexible way to define the correct answers to these questions (which are stored as a simple string).
I would like to expose a library of domain specific functions that users can call on to describe the correct answer. Eg:
exact_match("puppy") // means the correct answer is the string 'puppy'
or
contains("yesterday") // means any answer with the word 'yesterday' is correct
The naive implementation would involve eval'ing user supplied strings in a sandboxed runtime (like a javascript vm or ruby vm). But I'd like to go further and only allow specific functions to be called. Any other scripting would be discarded. Such that:
puts("foo"); contains("yesterday")
would be illegal. Since we don't expose or allow puts().
How can I constrain the execution environment to only run a whitelist of functions? Or is there a different approach to build this kind of external-facing DSL instead of trying to constrain an existing language to a subset of functions?
I would check out MPS by JetBrains if I were you, its an open source DSL creation tool. I have never used it myself, but from everything I have seen on it, it's very intuitive; and all of their other products are incredibly powerful.
Just because you're creating a DSL, that doesn't necessarily mean that you have to give the user the ability to enter the code in text.
The key to this is providing a list of method names and your special keyword for them, the "FunCode" tag in the code example below:
Create a mapping from keyword to code, and letting them define everything they need, and then use it. And I would actually build my own XML parser so that it's not hackable, at least not on a list of zero-day-exploits hackable.
<strDefs>
<strDef><strNam>sickStr</strNam>
<strText>sick</strText><strNum>01</strNum><strDef>
<strDef><strNam>pupStr</strNam>
<strText>puppy</strText><strNum>02</strNum><strDef>
</strDefs>
<funDefs>
<funDef><funCode>pfContainsStr</funCode><funLabel>contains</funLabel>
<funNum>01</funNum></funDef>
<funDef><funCode>pfXact</funCode><funLabel>exact_match</funLabel>
<funNum>02</funNum></funDef>
</funDefs>
<queries>
<query><fun>01</fun><str>02</str>
</query>
</queries>
The above XML more represents the idea and the structure of what to do, but rather in a user interface, so the user is constrained. The user interface code that allows the data-entry of the above data should be running on your server, and they only interact with it. Any code that runs on their browser is hackable, because they can just save the page, edit the HTML (and/or JavaScript), and run that, which is their code now, not yours anymore.
You can't really open the door (pandora's box) and allow just anyone to write just any code and have it evaluated / interpreted by the language parser, because some hacker is going to exploit it. You must lock down the strings, probably by having them enter them into your database in an earlier step, and each string gets its own token that YOU generate (a SQL Server primary key is very simple, usable, and secure), but give them a display representation so it's readable to them.
Then give them a list of methods / functions they can use, along with a token (a primary key can also serve here, perhaps with a kind of table prefix) and also a display representation (label).
If you have them put all of their labels into yet another table, you can have SQL make sure that all of their labels are unique to each other in the whole "language", and then you can allow them to try to define their expressions in the language they want to use. This has the advantage that foreign languages can be used, but you don't have to do anything terribly special.
An important piece would be the verify button, that would translate their expression into unique tokens and back again, checking that the round-trip was successful. If it wasn't successful, there's some kind of ambiguity, and you might be able to allow them an option to use the list of tokens as the source in that case.
If you heavily rely on set-based logic for the underlying foundation of the language and your tables, you should be able to produce a coherent DSL that works. Many DSL creation problems are ones of integrity, where there are underlying assumptions that are contradictory, unintentionally mutually exclusive, or nonsensical. Truth is an unshakeable foundation. Anything else has a lie somewhere -- that you're trying to build on.
Sudoku is illustrative here. When you screw up a Sudoku, you often don't know that you have done so, and you keep building on that false foundation, until you get to the completion of the puzzle, and one whole string of assumptions disagrees with a different string of assumptions. They can't both be true. But you can't tell where you went wrong because you're too far away from the mistake and can not work backwards (easily). All steps taken look correct. A DSL, a database schema, and code, are all this way. Baby steps, that are double- and even triple-checked, and hopefully "correct by inspection", are the best way to "grow" a DSL, slowly, piece-by-piece. The best way to not have flaws is to not add them in the first place.
You don't want bugs in your DSL. Keep it spartan. KISS - Keep it simple, Sparticus! And I have personally found that keeping it set-based, if not overtly, under the covers, accomplishes this very well.
Finally, to be able to think this way, I've studied languages for a long time, and have cultivated a curiosity about how languages have come to be. Books are a good quality source of information, as they have a higher quality level than the internet, which is nevertheless also an indispensable source. Some of my favorite languages: Forth, Factor, SETL, F#, C#, Visual FoxPro (especially for its embedded SQL), T-SQL, Common LISP, Clojure, and probably my favorite, Dylan, an INFIX Lisp without parentheses that Apple experimented with and abandoned, with a syntax that seems to me reminiscent of Pascal, which I sort of liked. The language list is actually much longer than that (and I haven't written code for many of them -- just studied them or their genesis), but that's enough for now.
One of my favorite books, and immensely interesting for the "people" side of it, is "Masterminds of Programming: Conversations with the Creators of Major Programming Languages" (Theory in Practice (O'Reilly)) 1st Edition, Kindle Edition
by Federico Biancuzzi (Author), Chromatic (Author)
By the way, don't let them compromise the integrity of your DSL -- require that it is expressible set-based, and things should go well (IMHO). I hope it works out well for you. Add a comment to my answer telling me how it worked out, if you think of it. And don't forget to choose my answer if you think it's the best! We work hard for the money! ;-)
What are the methods for protecting an Exe file from Reverse Engineering.Many Packers are available to pack an exe file.Such an approach is mentioned in http://c-madeeasy.blogspot.com/2011/07/protecting-your-c-programexe-files-from.html
Is this method efficient?
The only good way to prevent a program from being reverse-engineered ("understood") is to revise its structure to essentially force the opponent into understanding Turing Machines. Essentially what you do is:
take some problem which generally proven to be computationally difficult
synthesize a version of that whose outcome you know; this is generally pretty easy compared to solving a version
make the correct program execution dependent on the correct answer
make the program compute nonsense if the answer is not correct
Now an opponent staring at your code has to figure what the "correct" computation is, by solving algorithmically hard problems. There's tons of NP-hard problems that nobody has solved efficiently in the literature in 40 years; its a pretty good bet if your program depends on one of these, that J. Random Reverse-Engineer won't suddenly be able to solve them.
One generally does this by transforming the original program to obscure its control flow, and/or its dataflow. Some techniques scramble the control flow by converting some control flow into essentially data flow ("jump indirect through this pointer array"), and then implementing data flow algorithms that require precise points-to analysis, which is both provably hard and has proven difficult in practice.
Here's a paper that describes a variety of techniques rather shallowly but its an easy read:
http://www.cs.sjsu.edu/faculty/stamp/students/kundu_deepti.pdf
Here's another that focuses on how to ensure that the obfuscating transformations lead to results that are gauranteed to be computationally hard:
http://www.springerlink.com/content/41135jkqxv9l3xme/
Here's one that surveys a wide variety of control flow transformation methods,
including those that provide levels of gaurantees about security:
http://www.springerlink.com/content/g157gxr14m149l13/
This paper obfuscates control flows in binary programs with low overhead:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.3773&rank=2
Now, one could go through a lot of trouble to prevent a program from being decompiled. But if the decompiled one was impossible to understand, you simply might not bother; that's the approach I'd take.
If you insist on preventing decompilation, you can attack that by considering what decompilation is intended to accomplish. Decompilation essentially proposes that you can convert each byte of the target program into some piece of code. One way to make that fail, is to ensure that the application can apparently use each byte
as both computer instructions, and as data, even if if does not actually do so, and that the decision to do so is obfuscated by the above kinds of methods. One variation on this is to have lots of conditional branches in the code that are in fact unconditional (using control flow obfuscation methods); the other side of the branch falls into nonsense code that looks valid but branches to crazy places in the existing code. Another variant on this idea is to implement your program as an obfuscated interpreter, and implement the actual functionality as a set of interpreted data.
A fun way to make this fail is to generate code at run time and execute it on the fly; most conventional languages such as C have pretty much no way to represent this.
A program built like this would be difficult to decompile, let alone understand after the fact.
Tools that are claimed to a good job at protecting binary code are listed at:
https://security.stackexchange.com/questions/1069/any-comprehensive-solutions-for-binary-code-protection-and-anti-reverse-engineeri
Packing, compressing and any other methods of binary protection will only every serve to hinder or slow reversal of your code, they have never been and never will be 100% secure solutions (though the marketing of some would have you believe that). You basically need to evaluate what sort of level of hacker you are up against, if they are script kids, then any packer that require real effort and skill (ie:those that lack unpacking scripts/programs/tutorials) will deter them. If your facing people with skills and resources, then you can forget about keeping your code safe (as many of the comments say: if the OS can read it to execute it, so can you, it'll just take a while longer). If your concern is not so much your IP but rather the security of something your program does, then you might be better served in redesigning in a manner where it cannot be attack even with the original source (chrome takes this approach).
Decompilation is always possible. The statement
This threat can be eliminated to extend by packing/compressing the
executable(.exe).
on your linked site is a plain lie.
Currently many solutions can be used to protect your application from being anti-compiled. Such as compressing, Obfuscation, Code snippet, etc.
You can looking for a company to help you achieve this.
Such as Nelpeiron, the website is:https://www.nalpeiron.com/
Which can cover many platforms, Windows, Linux, ARM-Linux, Android.
What is more Virbox is also can be taken into consideration:
The website is: https://lm-global.virbox.com/index.html
I recommend is because they have more options to protect your source code, such as import table protection, memory check.
I recently came across the term Polymorphic Code, and was wondering if anyone could suggest a legitimate (i.e. in legal and business appropriate software) reason to use it in a computer program? Links to real world examples would be appreciated!
Before someone answers, telling us all about the benefits of polymorphism in object oriented programming, please read the following definition for polymorphic code (taken from Wikipedia):
"Polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact. That is, the code changes itself each time it runs, but the function of the code in whole will not change at all."
Thanks, MagicAndi.
Update
Summary of answers so far:
Runtime optimization of the original code
Assigning a "DNA fingerprint" to each individual copy of an application
Obfuscate a program to prevent reverse-engineering
I was also introduced to the term 'metamorphic code'.
Runtime optimization of the original code, based on actual performance statistics gathered when running the application in its real environment and real inputs.
Digitally watermarking music is something often done to determine who was responsible for leaking a track, for example. It makes each copy of the music unique so that copies can be traced back to the original owner, but doesn't affect the audible qualities of the track.
Something similar could be done for compiled software by running each individual copy through a polymorphic engine before distributing it. Then if a cracked version of this software is released onto the Internet, the developer might be able to tell who cracked it by looking for specific variations produced the polymorphic engine (a sort of DNA test). As far as I know, this technique has never been used in practice.
It's not exactly what you were looking for I guess, since the polymorphic engine is not distributed with the code, but I think it's the closest to a legitimate business use you will find for this kind of technique.
Polymorphic code is a nice thing, but metamorphic is even nicer. To the legitimate uses: well, I can't think of anything other than anti-cracking and copy protection. Look at vx.org.ua if you wan't real world uses (not that legitimate though)
As Sami notes, on-the-fly optimisation is an excellent application of polymorphic code. A great example of this is the Fastest Fourier Transform in the West. It has a number of solvers at its disposal, which it combines with self-profiling to adjust the code path and solver parameters on subsequent executions. The result is the program optimises itself for your computing environment, getting faster with subsequent runs!
A related idea that may possibly be of interest is computational steering. This is the practice of altering the execution path of large simulations as the run proceeds, to focus on areas of interest to the researcher. The overall purpose of the simulation is not changed, but the feedback cycle acts to optimise the calculation. In this case the executable code is not being explicitly rewritten, but the effect from a user perspective is similar.
Polymorph code can be used to obfuscate weak or proprietary algorithms - that may use encryption e. g.. There're many "legitimate" uses for that. The term legitimate these days is kind of narrow-minded when it comes to IT. The core-paradigms of IT contain security. Whether you use polymorph shellcode in exploits or detect such code with an AV scanner. You have to know about it.
Obfuscate a program i.e. prevent reverse-engineering: goal being to protect IP (Intellectual Property).
Hear lately I've been listening to Jeff Atwood and Joel Spolsky's radio show and they have been talking about dogfooding (the process of reusing your own code, see Jeff Atwood's blog post). So my question is should programmers use decompilers to see how that programmers code is implemented and works, to make sure it won't break your code. Or should you just trust that programmers code and adapt to it because using decompilers go against everything we as programmers have ever learn about hiding data (well OO programmers at least)?
Note: I wasn't sure which tags this would go under so feel free to retag it.
Edit: Just to clarify I was asking about decompilers as a last resort, say you can't get the source code for some reason. Sorry, I should have supplied this in the original question.
Yes, It can be useful to use the output of a decompiler, but not for what you suggest. The output of a compiler doesn't ever look much like what a human would write (except when it does.) It can't tell you why the code does what it does, or what a particular variable should mean. It's unlikely to be worth the trouble to do this unless you already have the source.
If you do have the source, then there are lots of good reasons to use a decompiler in your development process.
Most often, the reasons for using the output of a decompiler is to better optimize code. Sometimes, with high optimization settings, a compiler will just get it wrong. This can be almost impossible to sort out in some cases without comparing the output of the compiler at different levels of optimization.
Other times, when trying to squeeze the most performance out of a very hot code path, a developer can try arranging their code in a few different ways and compare the compiled results. As a last resort, this may be the simplest way to start when implementing a code block in assembly language, by duplicating the compiler's output.
Dogfooding is the process of using the code that you write, not necessarily re-using code.
However, code re-use typically means you have the source, hence 'code-reuse' otherwise its just using a library supplied by someone else.
Decompiling is hard to get right, and the output is typically very hard to follow.
You should use a decompiler if it is the tool that's required to get the job done. However, I don't think it's the proper use of a decompiler to get an idea of how well the code which is being decompiled was written. Depending on the language you use, the decompiled code can be very different from the code which was actually written. If you want to see some real code, look at open source code. If you want to see the code of some particular product, it's probably better to try to get access to the actual code through some legal means.
I'm not sure what exactly it is you are asking, what you expect "decompilers" to show you, or what this has to do with Atwood and Spolsky, or what the question is exactly. If you're programming to public interfaces then why would you need to see the original source of the the third party code to see if it will "break" your code? You could more effectively build tests to in order to determine this. As well, what the "decompiler" will tell you largely depends on the language/platform the software was written in, whether it is Java, .NET, C and so forth. It's not the same as having the original source to read, even in the case of .NET assemblies. Anyway, if you are worried about third party code not working for you then you should really be doing typical kinds of unit tests against the code rather than trying to "decompile" it. As far as whether you "should," if you mean whether you "should" in some other way other than what would be the best use of your time then I'm not sure what you mean.
Should Programmers Use Decompilers?
Use the right tool for the right job. Decompilers don't often produce results that are easy to understand, but sometimes they are what's needed.
should programmers use decompilers to
see how that programmers code is
implemented and works, to make sure it
won't break your code.
No, not unless you find a problem and need support. In general you don't use it if you don't trust it, and if you have to use it you even when you don't trust it you develop tests to prove the functionality and verify that later upgrades still work as expected.
Don't use functionality you don't test, unless you have very good support or a relationship of trust.
-Adam
Or should you just trust that programmers code and adapt to it because using decompilers go against everything we as programmers have ever learn about hiding data (well OO programmers at least)?
This is not true at all. You would use a decompiler not because you want to get around any sort of abstraction, encapsulation, or defeat OO principles, but because you want to understand why the code is behaving the way it is better.
Sometimes you need to use a decompiler (or in the Java world, a bytecode viewer) when you are troubleshooting an annoying bug with a 3rd party library where an exception is thrown with no useful error message, no logging, etc.
Use of a decompiler has nothing to do with OO principles.
The short answer to this... Program to a public and documented specification, not to an implementation. Relying on implementation specifics and side-effects will burn you.
Decompilation is not a tool to help you program correctly, though it might, in a pinch, assist you in understanding a problem with someone else's code for which you don't have source.
Also, beware of the possible legal risk of decompiling; many software companies have no-decompile clauses which could expose you and your employer to legal consequences.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
When writing code do you consciously program defensively to ensure high program quality and to avoid the possibility of your code being exploited maliciously, e.g. through buffer overflow exploits or code injection ?
What's the "minimum" level of quality you'll always apply to your code ?
In my line of work, our code has to be top quality.
So, we focus on two main things:
Testing
Code reviews
Those bring home the money.
Similar to abyx, in the team I am on developers always use unit testing and code reviews. In addition to that, I also aim to make sure that I don't incorporate code that people may use - I tend to write code only for the basic set of methods required for the object at hand to function as has been spec'd out. I've found that incorporating methods that may never be used, but provide functionality can unintentionally introduce a "backdoor" or unintended/unanticipated use into the system.
It's much easier to go back later and introduce methods, attributes, and properties for which are asked versus anticipating something that may never come.
I'd recommend being defensive for data that enter a "component" or framework. Within a "component" or framework one should think that the data is "correct".
Thinking like this. It is up to the caller to supply correct parameters otherwise ALL functions and methods have to check every incomming parameter. But if the check is only done for the caller the check is only needed once. So, a parameter should be "correct" and thus can be passed through to lower levels.
Always check data from external sources, users etc
A "component" or framework should always check incomming calls.
If there is a bug and a wrong value is used in a call. What is really the right thing todo? One only have an indication that the "data" the program is working on is wrong and some like ASSERTS but others want to use advanced error reporting and possible error recovery. In any case the data is found to be faulty and in few cases it's good to continue working on it. (note it's good if servers don't die at least)
An image sent from a satellite might be a case to try advanced error recovery on...an image downloaded from the internet to put up an error icon for...
I recommend people write code that is fascist in the development environment and benevolent in production.
During development you want to catch bad data/logic/code as early as possible to prevent problems either going unnoticed or resulting in later problems where the root cause is hard to track.
In production handle problems as gracefully as possible. If something really is a non-recoverable error then handle it and present that information to the user.
As an example here's our code to Normalize a vector. If you feed it bad data in development it will scream, in production it returns a safety value.
inline const Vector3 Normalize( Vector3arg vec )
{
const float len = Length(vec);
ASSERTMSG(len > 0.0f "Invalid Normalization");
return len == 0.0f ? vec : vec / len;
}
I always work to prevent things like injection attacks. However, when you work on an internal intranet site, most of the security features feel like wasted effort. I still do them, maybe just not as well.
Well, there is a certain set of best practices for security. At a minimum, for database applications, you need to watch out for SQL Injection.
Other stuff like hashing passwords, encrypting connection strings, etc. are also a standard.
From here on, it depends on the actual application.
Luckily, if you are working with frameworks such as .Net, a lot of security protection comes built-in.
You have to always program defensively I would say even for internal apps, simply because users could just through sheer luck write something that breaks your app. Granted you probably don't have to worry about trying to cheat you out of money but still. Always program defensively and assume the app will fail.
Using Test Driven Development certainly helps. You write a single component at a time and then enumerate all of the potential cases for inputs (via tests) before writing the code. This ensures that you've covered all bases and haven't written any cool code that no-one will use but might break.
Although I don't do anything formal I generally spend some time looking at each class and ensuring that:
if they are in a valid state that they stay in a valid state
there is no way to construct them in an invalid state
Under exceptional circumstances they will fail as gracefully as possible (frequently this is a cleanup and throw)
It depends.
If I am genuinely hacking something up for my own use then I will write the best code that I don't have to think about. Let the compiler be my friend for warnings etc. but I won't automatically create types for the hell of it.
The more likely the code is to be used, even occasionally, I ramp up the level of checks.
minimal magic numbers
better variable names
fully checked & defined array/string lengths
programming by contract assertions
null value checks
exceptions (depending upon context of the code)
basic explanatory comments
accessible usage documentation (if perl etc.)
I'll take a different definition of defensive programming, as the one that's advocated by Effective Java by Josh Bloch. In the book, he talks about how to handle mutable objects that callers pass to your code (e.g., in setters), and mutable objects that you pass to callers (e.g., in getters).
For setters, make sure to clone any mutable objects, and store the clone. This way, callers cannot change the passed-in object after the fact to break your program's invariants.
For getters, either return an immutable view of your internal data, if the interface allows it; or else return a clone of the internal data.
When calling user-supplied callbacks with internal data, send in an immutable view or clone, as appropriate, unless you intend the callback to alter the data, in which case you have to validate it after the fact.
The take-home message is to make sure no outside code can hold an alias to any mutable objects that you use internally, so that you can maintain your invariants.
I am very much of the opinion that correct programming will protect against these risks. Things like avoiding deprecated functions, which (in the Microsoft C++ libraries at least) are commonly deprecated because of security vulnerabilities, and validating everything that crosses an external boundary.
Functions that are only called from your code should not require excessive parameter validation because you control the caller, that is, no external boundary is crossed. Functions called by other people's code should assume that the incoming parameters will be invalid and/or malicious at some point.
My approach to dealing with exposed functions is to simply crash out, with a helpful message if possible. If the caller can't get the parameters right then the problem is in their code and they should fix it, not you. (Obviously you have provided documentation for your function, since it is exposed.)
Code injection is only an issue if your application is able to elevate the current user. If a process can inject code into your application then it could easily write the code to memory and execute it anyway. Without being able to gain full access to the system code injection attacks are pointless. (This is why applications used by administrators should not be writeable by lesser users.)
In my experience, positively employing defensive programming does not necessarily mean that you end up improving the quality of your code. Don't get me wrong, you need to defensively program to catch the kinds of problems that users will come across - users don't like it when your program crashes on them - but this is unlikely to make the code any easier to maintain, test, etc.
Several years ago, we made it policy to use assertions at all levels of our software and this - along with unit testing, code reviews, etc. plus our existing application test suites - had a significant, positive effect on the quality of our code.
Java, Signed JARs and JAAS.
Java to prevent buffer overflow and pointer/stack whacking exploits.
Don't use JNI. ( Java Native Interface) it exposes you to DLL/Shared libraries.
Signed JAR's to stop class loading being a security problem.
JAAS can let your application not trust anyone, even itself.
J2EE has (admittedly limited) built-in support for Role based security.
There is some overhead for some of this but the security holes go away.
Simple answer: It depends.
Too much defensive coding can cause major performance issues.