Debug-able Domain Specific Language - dsl

My goal is to develop a DSL for my application but I want the user to be able to put a break-point in his/her DSL without the user to know anything about the underlying language that the DSL runs on and he/she see is the DSL related syntax, stack, watch variables and so on.
How can I achieve this?

It depends on your target platform. For example, if you're implementing your DSL compiler on top of .NET, it is trivial to annotate your bytecode with debugging information (variable names, source code location for expressions and statements, etc.).
If you also provide a Visual Studio extension for your language, you'll be able to reuse a royalty-free MSVS Isolated Shell for both editing and debugging for your DSL code.
Nearly the same approach is possible with JVM (you can use Eclipse or Netbeans as a debugging frontend).
Native code generation is a little bit more complicated, but it is still possible to do some simple things, like generating C code stuffed with line pragmas.

You basically need to generate code for your DSL with built-in opportunities for breakpoints, each with built-in facilities for observing the internal state variables. Then your debugger has know how to map locations in the DSL to the debug breakpoints, and for each breakpoint, simply call the observers. (If the observers have names, e.g., variable names, you can let the user choose which ones to call).

Related

How to add security to Spring boot jar file? [duplicate]

How can I package my Java application into an executable jar that cannot be decompiled (for example , by Jadclipse)?
You can't. If the JRE can run it, an application can de-compile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.
First you can't avoid people reverse engineering your code. The JVM bytecode has to be plain to be executed and there are several programs to reverse engineer it (same applies to .NET CLR). You can only make it more and more difficult to raise the barrier (i.e. cost) to see and understand your code.
Usual way is to obfuscate the source with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating use a decompiler to see what your code looks like...
Next to obfuscation you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code got smaller. Again this does not prevent people from seeing your code, just makes it harder for the casual attacker.
You could also try to convert the Java application to some windows EXE which would hide the clue that it's Java at all (to some degree) or really compile into machine code, depending on your need of JVM features. (I did not try this.)
GCJ is a free tool that can compile to either bytecode or native code. Keeping in mind, that does sort of defeat the purpose of Java.
A little late I know, but the answer is no.
Even if you write in C and compile to native code, there are dissasemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done, I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process, and if yes and it is a release build - terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
There are a whole variety of techniques that have cropped up - people who want to protect something and people who are out to crack it wide open, it is a veritable arms race! Once you go down this path - you will have to constantly keep updating/upgrading your defenses, there is no stopping.
This not my practical solution but , here i think good collection or resource and tutorials for making it happen to highest level of satisfaction.
A suggestion from this website (oracle community)
(clean way), Obfuscate your code, there are many open source and free
obfuscator tools, here is a simple list of them : [Open source
obfuscators list] .
These tools make your code unreadable( though still you can decompile
it) by changing names. this is the most common way to protect your
code.
2.(Not so clean way) If you have a specific target platform (like windows) or you can have different versions for different platforms,
you can write a sophisticated part of your algorithms in a low level
language like C (which is very hard to decompile and understand) and
use it as a native library in you java application. it is not clean,
because many of us use java for it's cross-platform abilities, and
this method fades that ability.
and this one below a step by step follow :
ProtectYourJavaCode
Enjoy!
Keep your solutions added we need this more.

Is it possible to use JetBrains MPS, or part of it, inside another application as JIT Compiler/Translator?

Does JetBrains MPS provide an JIT compiler which can be used inside other applications?
We have a legacy application with its on script language. Because this script language is very difficult to use to our customer, we would like to provide a new DSL to them.
So the question is: Can we use Jetbrains MPS to design our DSL and then use the MPS JITCompiler/Translator to transform it to Java or whatever after the user wrote his script in our Software?
If you mean by JITCompiler/Translator, to take your DSL generate Java from it and then run that compiled java code, yes that is possible. But it would be an extra transformation step like: write code -> generate/compile -> run (the resulting jar).
If you mean interpreting the model without doing a transformation step first then the answer is, not out of the box. We have build a interpreter framework for MPS and build two interpreters with it so far. One for Java and one for C. Though the focus is not on performance there. We use it for small calculations in formulas or REPL like things. It is currently work in progress but work quite nice. You can look here for Interpreter and find some more information and where to look. As a midterm project we might want to integrate this interpreter definition with the Graal compiler which would then be much more a JITCompiler then just a interpreter.

Securely running user's code

I am looking to create an AI environment where users can submit their own code for the AI and let them compete. The language could be anything, but something easy to learn like JavaScript or Python is preferred.
Basically I see three options with a couple of variants:
Make my own language, e.g. a JavaScript clone with only very basic features like variables, loops, conditionals, arrays, etc. This is a lot of work if I want to properly implement common language features.
1.1 Take an existing language and strip it to its core. Just remove lots of features from, say, Python until there is nothing left but the above (variables, conditionals, etc.). Still a lot of work, especially if I want to keep up to date with upstream (though I just could also just ignore upstream).
Use a language's built-in features to lock it down. I know from PHP that you can disable functions and searching around, similar solutions seem to exist for Python (with lots and lots of caveats). For this I'd need to have a good understanding of all the language's features and not miss anything.
2.1. Make a preprocessor that rejects code with dangerous stuff (preferably whitelist based). Similar to option 1, except that I only have to implement the parser and not implement all features: the preprocessor has to understand the language so that you can have variables named "eval" but not call the function named "eval". Still a lot of work, but more manageable than option 1.
2.2. Run the code in a very locked-down environment. Chroot, no unnecessary permissions... perhaps in a virtual machine or container. Something in that sense. I'd have to research how to achieve this and how to make it give me the results in a secure way, but that seems doable.
Manually read through all code. Doable on a small scale or with moderators, though still tedious and error-prone (I might miss stuff like if (user.id = 0)).
The way I imagine 2.2 to work is like this: run both AIs in a virtual machine (or something) and constrain it to communicate with the host machine only (no other Internet or LAN access). Both AIs run in a separate machine and communicate with each other (well, with the playing field, and thereby they see each other's positions) through an API running on the host.
Option 2.2 seems the most doable, but also relatively hacky... I let someone's code loose in a virtualized or locked down environment, hoping that that'll keep them in while giving them free game to DoS or break out of the environment. Then again, most other options are not much better.
TL;DR: in essence my question is: how do I let people give me 'logic' for an AI (which I think is most easily done using code) and then run that without compromising the functionality of the system? There must be at least 2 AIs working on the same playing field.
This is really just a plugin system, so researching how others implement plugins is a good starting point. In particular, I'd look at web browsers like Chrome and Safari and their plugin systems.
A common theme in modern plugins systems is process isolation. Ideally you should run the plugin in its own process space in a sandbox. In OS X look at XPC, which is designed explicitly for this problem. On Linux (or more portably), I would probably look at NaCl (Native Client). The JVM is also designed to provide sandboxing, and offers a rich selection of languages. (That said, I don't personally consider the JVM a very strong sandbox. It's had a history of security problems.)
In general, my preference on these kinds of projects is a language-agnostic API. I most often use REST APIs (or "REST-like"). This allows the plugin to be highly restricted, while not restricting the language choice. I like simple HTTP for communications whenever possible because it has rich support in numerous languages, so it puts little restriction on the plugin. In fact, given your description, you wouldn't even have to run the plugin on your hardware (and certainly not on the main server). Making the plugins remote clients removes many potential concerns.
But ultimately, I think something like your "2.2" is the right direction.

Can you disallow a common lisp script called from common lisp to call specific functions?

Common Lisp allows to execute/compile code at runtime. But I thought for some (scripting-like) purposes it would be good if one could disallow a user-script to call some functions (especially for application extensions). One could still ask the user if he will allow an extension to access files/... I'm thinking of something like the Android permission system for Common Lisp. Is this possible without rewriting the evaluation code?
The problem I see is, that in Common Lisp you would probably want a script to be able to use reader macros and normal macros and for the latter operators like intern, but those would allow you to get arbitrary symbols (by string manipulation & interning), so simply scanning the code before evaluation won't suffice to ensure that specific functions aren't called.
So, is there something like a lock for functions? I thought of using fmakunbound / makunbound (and keeping the values in a local variable), but would that be possible in a multi-threaded environment?
Thanks in advance.
This is not part of the Common Lisp specification and there is no Common Lisp implementation that is extended to make this kind of restriction easy.
It seems to me like it would be easier to use operating system restrictions (e.g. rlimit, capabilities, etc) to enforce what you want on the Common Lisp process.
This is not an unusual desire, i.e. to run untrusted 3rd party code in a sandbox.
You can hand craft a sandbox by creating a custom parser and interpreter for your scripting language. It is pedantic, but true, than any program with an API is providing such a service. API designers and implementors needs to worry about the vile users.
You can still call eval or the compiler to run your sandbox scripts. It just means you need to assure that your reader, parser and language decline to provide access to any risky functionality.
You can use a lisp package to create a good sandbox. You can still use s-expressions for your scripting language's syntax, but you must cripple the standard reader so the user can't escape package-sandbox. You can still use the evaluator and the compiler, but you need to be sure the package you have boxed the user into contains no functionality that he can use to do inappropriate things.
Successful sandbox design and construction is easier when you start with an empty sandbox and slowly add functionality. Common Lisp is a big language and that creates a huge surface for attacker to poke at. So if you create a sandbox out of a package it's best to start with an empty package and add functions one at a time. Thinking thru what risks they create. The same approach is good when creating your crippled reader. Don't start with the full reader and throw things away, start with a useless reader and add things. Sadly taking that advice creates a pretty significant cost to getting started. But, if you look around I suspect you can find an existing safe reader.
Xach's suggestion is another way to go and in many case more straight forward.

Are there real world applications that use metaprogramming?

We all know that MetaProgramming is a Concept of Code == Data (or programs that write programs).
But are there any applications that use it & what are the advantages of using it?
This Question can be closed but i didnt see any related questions.
IDEs are full with metaprogramming:
code completion
code generation
automated refactoring
Metaprogramming is often used to work around the limitations of Java:
code generation to work around the verbosity (e.g. getter/setter)
code generation to work around the complexity (e.g. generating Swing code from a WYSIWIG editor)
compile time/load time/runtime bytecode rewriting to work around missing features (AOP, Kilim)
generating code based on annotations (Hibernate)
Frameworks are another example:
generating Models, Views, Controllers, Helpers, Testsuites in Ruby on Rails
generating Generators in Ruby on Rails (metacircular metaprogramming FTW!)
In Ruby, you pretty much cannot do anything without metaprogramming. Even simply defining a method is actually running code that generates code.
Even if you just have a simple shell script that sets up your basic project structure, that is metaprogramming.
Since code as data is one of key concepts of Lisp, the best thing would be to see the real applications of projects written in these.
On this link you can see an article about a real world application written partly in Clojure, a dialect of Lisp.
The thing is not to write programs that write programs, just because you can, but to add new functionality to your language when you really need it. Just think if you could simply add new keyword to Java or C#...
If you implement metaprogramming in a language-independent way, you get a program analysis and transformation system. This is precisely a tool that treats (arbitrary) programs as data. These can be used to carry out arbitrary transformations on arbitrary programs.
It also means you aren't limited by the specific metaprogramming features that the compiler guys happened to put into your language. For instance, while C++ has templates, it has no "reflection". But a program transformation system can provide reflection even if the base langauge doesn't have it. In particular, having a program transformation engine means never having to say "I'm sorry, your language doesn't support metaprogramming (well enough) so I can't do much except write code manually".
See our DMS Software Reengineering Toolkit for such a program transformation system. It has been used to build test coverage and profiling tools, code generation tools, tools to reshape the architecture of large scale C++ applications, tools to migrate applications from one langauge to another, ... This is all extremely practical. Most of the tasks done with DMS would completely impractical to do by hand.
Not a real world application, but a talk about metaprogramming in ruby:
http://video.google.com/videoplay?docid=1541014406319673545
Google TechTalks August 3, 2006 Jack Herrington, the author of Code Generation in Action (Manning, July 2003) , will talk about code generation techniques using Ruby. He will cover both do-it-yourself and off-the-shelf solutions in a conversation about where Ruby is as a tool, and where it's going.
A real world example would be Django's model metaclass. It is the class of the class, from which models inherit from and responsible for the outfit of the model instances with all their attributes and methods.
Any ORM in a dynamic language is an instant example of practical metaprogramming. E.g. see how SQLAlchemy or Django's ORM creates classes for tables it discovers in the database, dynamically, in runtime.
ORMs and other tools in Java world that use #annotations to modify class behavior do a bit of metaprogramming, too.
Metaprogramming in C++ allows you to write code that will get transformed at compilation.
There are a few great examples I know about (google for them):
Blitz++, a library to write efficient code for manipulating arrays
Intel Array Building Blocks
CGAL
Boost::spirit, Boost::graph
Many compilers and interpreters are implemented with metaprogramming techniques internally - as a chain of code rewriting passes.
ORMs, project templates, GUI code generation in IDEs had been mentioned already.
Domain Specific Languages are widely used, and the best way to implement them is to use metaprogramming.
Things like Autoconf are obviously cases of metaprogramming.
Actually, it's unlikely one can find an area of software development which won't benefit from one or another form of metaprogramming.

Resources