Is "require()" safe in a sandbox? - duktape

I'm building a sandboxed Duktape application. The sandboxing doc (https://github.com/svaarala/duktape/blob/master/doc/sandboxing.rst) advises removing the default require() implementation. I'm not clear on why that is necessary. It seems that require() depends on modSearch() to determine what code to load and from where. If modSearch() doesn't permit loading data from anywhere that isn't permitted in the sandbox, is there anything else about the default require() implementation that is unsafe or gives cause for wariness?

The recommendation is there to emphasize that the default require() is not necessarily sandboxing-safe (even if the current implementation happens to be), so it's probably best to replace it when sandboxing potentially untrusted code (at least if the code can be actively malicious rather than just accidentally broken).
That said I don't know of any concrete issues right now.
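For illustration, here is a minimal sketch of a whitelist-based modSearch(), using Duktape's documented module hook. The readFile() function is a hypothetical native binding the host application would have to supply (Duktape itself has no built-in file I/O), and the module names are made up:

// Resolve modules only from an explicit whitelist, so require()
// cannot be steered at anything outside the sandbox.
var ALLOWED = {
    'util': 'sandbox/util.js',
    'config': 'sandbox/config.js'
};

Duktape.modSearch = function (id, require, exports, module) {
    var path = ALLOWED[id];
    if (!path) {
        throw new Error('module not allowed: ' + id);
    }
    // readFile() is a hypothetical host-provided native binding.
    return readFile(path);
};

Even with a modSearch() like this, the advice above stands: replacing the default require() removes one more piece of attack surface that you would otherwise have to audit.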

Related

Correctly dynamically loading PIEs

Many discussions like this and this have warned us, with examples, that trying to dlopen a PIE can never be correct. The reasons are various: copy relocations, TLS, etc.
However, these problems can be circumvented if we loosen the restrictions. This question showed us that compiling with -fPIC can eliminate copy relocations, and TLS seems to work all right.
This raises the question of how far we are from correctly dynamically loading a PIE. I agree with the point made again in link 1:
Bottom line: this was never designed to work, and you just happened to not step on many of the land-mines, so you thought it is working, when in fact you were exercising undefined behavior.
But I'm more interested in WHY we cannot do that, rather than in another failing example.
More specifically, users could write their own runtime dynamic linker, as this comment suggests, which could make some strong assumptions or compromises just for this purpose. Yet this requires extremely broad knowledge of compiling, linking, and loading, some of which is known to be poorly documented.
So again, how do users correctly dynamically load PIEs, or at least how can they try to find a way to do that (or not to do that)?
But I'm more interested in WHY we cannot do that, rather than in another failing example.
Because the designers of GLIBC didn't intend to allow for this to happen and don't consider this to be a valid use case.
More specifically, users could write their own runtime dynamic linker
Absolutely. You are free to design your own libc and the dynamic loader to allow for this use case. That requirement will add some complexity, but there is no fundamental reason it can't be done.
You may also find an existing alternate libc implementation which doesn't have this restriction (either because it has been designed in, or because the designers forgot to enforce it, as was the case with GLIBC before this patch).
how do users correctly dynamically load PIEs
They don't.
how can they try to find a way to do that (or not to do that)?
The usual solution is to "not do that", and in fact the need to "do that" seems to be very esoteric.
Why do you need to dlopen a PIE executable in the first place?

How to Decompile Bytenode "jsc" files?

I've just seen this library, ByteNode. It's the same idea as Java bytecode, but for Node.js.
This library compiles your JavaScript code into V8 bytecode, which protects your source code. I'm wondering: is there any way to decompile ByteNode output, meaning it's not secure enough? I'm asking because I would like to protect my source code using this library.
TL;DR It'll raise the bar to someone copying the code and trying to pass it off as their own. It won't prevent a dedicated person from doing so. But the primary way to protect your work isn't technical, it's legal.
This library compiles your JavaScript code into V8 bytecode, which protects your source code...
Well, we don't know it's V8 bytecode, but it's "compiled" in some sense. All we know is that it creates a "code cache" via the built-in vm.Script.prototype.createCachedData API, which is officially just a cache used to speed up recompiling the code a second time, third time, etc. In theory, you're supposed to also provide the original source code as a string to the vm.Script constructor. But if you dig into Node.js's vm.Script and V8 far enough, the cache seems to contain the actual code in some compiled form (whether actual V8 bytecode or not), and the code string you give it when running is ignored. (The ByteNode library provides a dummy string when running the code from the code cache, so clearly the actual code isn't [always?] needed.)
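Here's a minimal sketch of that mechanism using only documented Node.js APIs (vm.Script, createCachedData, and the cachedData option). Note that acceptance of a same-length dummy string is an observed V8 behavior, not a documented guarantee:

const vm = require('vm');
const fs = require('fs');

// "Compile": create the code cache for some source.
const source = '6 * 7';
fs.writeFileSync('demo.jsc', new vm.Script(source).createCachedData());

// "Run": feed the cache back with a dummy source of the same length,
// the way ByteNode does. V8 executes the cached compilation, not the
// dummy text (unless it rejects the cache and recompiles the dummy).
const cached = fs.readFileSync('demo.jsc');
const script = new vm.Script(' '.repeat(source.length), { cachedData: cached });
console.log(script.cachedDataRejected); // false: the cache was accepted
console.log(script.runInThisContext()); // 42, from the cache, not from the spaces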
I'm wondering: is there any way to decompile ByteNode output, meaning it's not secure enough?
Naturally, otherwise it would be useless because Node.js wouldn't be able to run it. I didn't find a tool to do it that already exists, but since V8 is open source, it would presumably be possible to find the necessary information to write a decompiler for it that outputs valid JavaScript source code which someone could then try to understand.
Experimenting with it, local variable names appear to be lost, although function names aren't. Comments appear to be lost as well (this may not be as obvious as it seems, given that Function.prototype.toString is required to either return the original source text or a synthetic version [details]).
So if you run the code through a minifier (particularly one that renames functions), then run it through ByteNode (or just do it with vm.Script yourself; ByteNode is a fairly thin wrapper), it will be feasible for someone to decompile it into something resembling source code, but that source code will be very hard to understand. This is very similar to shipping Java class files, which can be decompiled (there's even a standard tool to do it in the JDK, javap), except that the format of Java class files is well-documented and doesn't change from one dot release to the next (though it can change from one major release to another; new releases always support the older formats), whereas the format of this data is not documented (though it's an open source project) and is subject to change from one dot release to the next.
Certain changes, such as changing the copyright message, are probably fairly easy to make to said source code. More meaningful changes will be harder.
Note that the code cache appears to have a checksum or other similar integrity mechanism, since directly editing the .jsc file to swap one letter for another in a literal string makes the code cache fail to load. So someone tampering with it (for instance, to change a copyright notice) would either need to go the decompilation/recompilation route, or dive into the V8 source to find out how to correct the integrity check.
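A quick way to see that in action (this matches what I observed; the exact integrity mechanism is a V8 internal and could change between versions):

const vm = require('vm');

const source = '6 * 7';
const cached = new vm.Script(source).createCachedData();
cached[cached.length - 1] ^= 0xff; // flip one byte of the payload
const script = new vm.Script(source, { cachedData: cached });
console.log(script.cachedDataRejected); // true: the tampered cache is rejected
console.log(script.runInThisContext()); // still 42, recompiled from the source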
Fundamentally, the way to protect your work is to ensure that you've put all the relevant notices in the relevant places, such that it is clear that copying it is a violation of copyright, and then to pursue your legal recourse should you find out about someone passing it off as their own.
is there any way
You could get a hundred answers here saying "I don't know a way", but that still won't guarantee that there isn't one.
not secure enough
Secure enough for what? What's your deployment scenario? What kind of scenario/attack are you trying to defend against?
FWIW, I don't know of an existing tool that "decompiles" V8 bytecode (i.e. produces JavaScript source code with the same behavior). That said, considering that the bytecode is a fairly straightforward translation of the source code, I'm sure it wouldn't be very hard to write such a tool, if someone had a reason to spend some time on it. After all, V8's JS-to-bytecode compiler is open source, so one would only have to look at those sources and implement the reverse direction. So I would assume that shipping as bytecode provides about as much "protection" as shipping as uglified JavaScript, i.e. none that I would trust.
Before you make any decisions, please also keep in mind that bytecode is considered an internal implementation detail of V8; in particular it is not versioned and can change at any time, so it has to be created by exactly the same V8 version that consumes it. If you want to update your Node.js you'll have to recreate all the bytecode, and there is no checking or warning in place that will point out when you forgot to do that.
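One way to defend against that silent mismatch is to record the producing V8 version next to each cache and refuse to load on mismatch, rather than relying on V8 to notice. A sketch (the file-naming convention is invented for the example):

const fs = require('fs');

function writeCache(file, cachedData) {
    fs.writeFileSync(file, cachedData);
    fs.writeFileSync(file + '.v8', process.versions.v8);
}

function readCache(file) {
    const producedBy = fs.readFileSync(file + '.v8', 'utf8');
    if (producedBy !== process.versions.v8) {
        throw new Error('code cache built for V8 ' + producedBy +
                        ', running on V8 ' + process.versions.v8 + '; rebuild it');
    }
    return fs.readFileSync(file);
}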
The Node.js source already contains code (in V8) for disassembling binary bytecode.
You can get a text representation of your V8 bytecode, and then you would need to analyze it.
But the text output is very long and misses some important information, such as the constant pool, so you need to modify the Node.js source.
Please check https://github.com/3DGISKing/pkg10.17.0
I have attached an exported XML file.
If you study V8, it would be possible to analyze the bytecode and recover source code from it.
Keeping it short and sweet: you can try the Ghidra node.js package, which is based on the Ghidra reverse-engineering framework open-sourced by the NSA in 2019. Ghidra is capable of disassembling and decompiling V8 bytecode. The inner workings of the disassembly are quite complex; this answer is short but sufficient.

Recommendations for preventing RequireJS from interfering with legacy code

I'm developing a 'widget', for lack of a better word, that will be loaded in many different sites that I don't control.
We're using RequireJS to keep things easy, but this has the side effect of breaking A LOT of sites that don't already use/support it.
To be clear - we don't control the sites, and the cause is that many of the sites' existing libraries load into RequireJS instead of globally, while the code on these sites expects them to be loaded globally.
The only practical solution I can think of so far is to rename RequireJS' require() and define() (and perhaps others), then edit every library we rely on (using sed, of course) to load using the 'new' functions.
Has anyone else dealt with this? Is there a better method I'm missing?
Michael
For anyone who stumbles upon this, here's what I ended up doing...
There isn't a good solution for this at the moment as:
1) All libraries that load into RequireJS need define() to exist in their scope at execution time
and
2) There isn't any mechanism for asynchronously loading scripts that would allow define to be defined (pun not intended) and undefined before/after execution, aside from eval(), and that's just not a good option.
This means that it's not really possible to have some kind of scoped RequireJS without it possibly interfering with other scripts on the page that CAN use RequireJS but are intended to load globally on that particular site.
So... here's the hacky solution I did...
Instead of loading the JS libraries myself, I bundled them on the fly, along with RequireJS, and wrapped the whole thing in an immediately executing function (sketched below).
The reason for doing this on-the-fly, is that some site specific data is necessary for the program to function, and it saves an HTTP request to obtain it (at the expense of a larger file download).
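A rough sketch of the bundle layout, with the library contents elided and the module names invented. A minimal loader like almond declares define/require as plain vars, so wrapping the whole bundle in an IIFE keeps those names private to it (the r.js optimizer supports this via its wrap option); stock RequireJS would need a small patch, since it deliberately attaches itself to the global object:

(function () {
    var requirejs, require, define;
    /* ...almond.js (or a patched RequireJS) pasted here... */
    /* ...AMD libraries pasted here; they see the scoped define()... */
    require(['widget'], function (widget) {
        widget.init({ /* site-specific data baked in at bundle time */ });
    });
}());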
This allowed me to:
1) Use libraries that need to run under RequireJS (or similar) to work properly
2) Avoid cluttering up the global namespace for stuff like jQuery
3) Avoid editing library source (e.g. changing define() to my_special_define() or similar)
I hope this helps someone if they're trying to do the same thing as me :)

Languages with a NodeJS/CommonJS style module system

I really like the way NodeJS (and its browser-side counterparts) handle modules:
var $ = require('jquery');
var config = require('./config.json');
module.exports = function(){};
module.exports = {...}
I am actually rather disappointed by the ES2015 'import' spec, which is very similar to what the majority of languages do.
Out of curiosity, I decided to look for other languages which implement or even support a similar export/import style, but to no avail.
Perhaps I'm missing something, or more likely my Google-fu isn't up to scratch, but it would be really interesting to see which other languages work in a similar way.
Has anyone come across similar systems?
Or maybe someone can even provide reasons that it isn't used all that often.
It is nearly impossible to properly compare these features; one can only compare their implementations in specific languages. I collected my experience mostly with Java and Node.js.
I observed these differences:
You can use require for more than just making other modules available to your module. For example, you can use it to parse a JSON file.
You can use require everywhere in your code, while import is only available at the top of a file (see the sketch after this list).
require actually executes the required module (if it was not yet executed), while import has a more declarative nature. This might not be true for all languages, but it is a tendency.
require can load private dependencies from subdirectories, while import often uses one global namespace for all the code. Again, this is not true in general, but merely a tendency.
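To illustrate the second point, a minimal sketch (the module name is made up): require is an ordinary function call, so it can run lazily and conditionally anywhere, whereas an ES2015 import must sit at the top of the file.

let parser;
function parse(text) {
    if (!parser) {
        parser = require('./heavy-parser'); // loaded lazily, on first use only
    }
    return parser(text);
}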
Responsibilities
As you can see, the require method has multiple responsibilities: declaring module dependencies and reading data. This is better separated with the import approach, since import is supposed to handle only module dependencies. I guess what you like about being able to use require for reading JSON is that it provides a really easy interface to the programmer. I agree that it is nice to have this kind of easy JSON-reading interface, but there is no need to mix it with the module dependency mechanism. There could simply be another method, for example readJson() (sketched below). That would separate the concerns, so require would only be needed for declaring module dependencies.
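A minimal sketch of such a helper (readJson() is an invented name, not an existing API):

const fs = require('fs');

function readJson(path) {
    return JSON.parse(fs.readFileSync(path, 'utf8'));
}

const config = readJson('./config.json'); // instead of require('./config.json')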
Location in the Code
Now that we only use require for module dependencies, it is bad practice to use it anywhere other than at the top of your module. Using it everywhere in your code just makes the module dependencies hard to see. This is why the import statement can only be used at the top of a file.
I don't see the point that import creates a global variable. It merely creates a consistent identifier for each dependency, which is limited to the current file. As I said above, I recommend doing the same with require by using it only at the top of the file. It really helps to increase the readability of the code.
How it works
Executing code when loading a module can also be a problem, especially in big programs. You might run into a loop where one module transitively requires itself, which can be really hard to resolve. To my knowledge, Node.js handles this situation as follows (see the sketch after this list). When A requires B and B requires A, and you start by requiring A, then:
the module system remembers that it is currently loading A
it executes the code in A
it remembers that it is currently loading B
it executes the code in B
it tries to load A, but A is already loading
A is not yet finished loading
it returns the half-loaded A to B
B does not expect A to be half loaded
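A minimal reproduction of that walk-through (file names invented); running node a.js prints an incomplete view of A from inside B:

// a.js
exports.fromA = 'set before requiring b';
const b = require('./b');
exports.lateFromA = 'set after requiring b';

// b.js
const a = require('./a'); // receives the half-loaded a
console.log(a.fromA);     // "set before requiring b"
console.log(a.lateFromA); // undefined: not assigned yet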
This can be a problem. Now, one can argue that cyclic dependencies should really be avoided, and I agree. However, cyclic dependencies should only be avoided between separate components of a program; classes within a component often have legitimate cyclic dependencies. Since the module system is used for both abstraction layers, classes and components, this can be an issue.
Next, the require approach often leads to singleton modules, which cannot be used multiple times in the same program because they store global state. However, this is not really the fault of the system but of the programmer who uses the system in the wrong way. Still, my observation is that the require approach misleads new programmers in particular into doing this.
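For example (file name invented): a module is evaluated once and then cached, so every consumer shares the same state.

// counter.js
let count = 0;
module.exports = { next: () => ++count };

// Every file that calls require('./counter') receives the same cached
// instance, so next() advances one shared counter across the program.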
Dependency Management
The dependency management that underlies the different approaches is indeed an interesting point. For example, Java still lacks a proper module system in its current version. Again, one has been announced for the next version, but who knows whether that will ever come true. Currently, you can only get modules using OSGi, which is far from easy to use.
The dependency management underlying Node.js is very powerful. However, it is also not perfect. For example, non-private dependencies, i.e. dependencies that are exposed via a module's API, are always a problem. However, this is a common problem for dependency management, so it is not limited to Node.js.
Conclusion
I guess both are not that bad, since each is used successfully. However, in my opinion, import has some objective advantages over require, like the separation of responsibilities. It follows that import can be restricted to the top of the code, which means there is only one place to search for module dependencies. Also, import might be a better fit for compiled languages, since these do not need to execute code to load code.

What is the least obvious Go standard library package you can use to escape a sandbox?

I'm designing a project for a college-level computer security course, and I'm trying to include a vulnerability where code that looks "clean", by virtue of a number of risky packages being blacklisted (unsafe, os, ioutil, etc.), can still escape a sandbox. The question is this: can you think of a way to use other, non-obvious Go standard library packages to escape the sandbox? "Escape the sandbox" here means reading/writing files, making network connections, breaking memory safety (which would allow you to do any of the other things), etc.
Things I've tried so far that haven't worked:
Using the reflect package to do unsafe pointer conversions (the reflect package seems really safe against this sort of abuse)
Using the reflect package to get access to a reference held by a random stdlib package to some sensitive function like os.Open (I haven't found any that actually keep function pointers or anything like that)
