Is wasm safe to store client-side secrets?


The security context of my question is as follows:
I currently have an Electron desktop application that runs my web app. Inside the app there is a feature that lets an authenticated user access a password-protected document (PDF). The document is prepared on the server. Its password is the SHA-256 hash of a 10-character string made up of two parts: 1) a variable document ID and 2) a fixed salt.
On the client, inside Electron, I currently have a native Node module written in C and compiled with node-gyp. I call this module, and it generates the same password from the two parts. The whole purpose of having this native module on the client was that, because it is a compiled binary, it cannot be reverse engineered, so no one can get at my secret salt. If the salt is discovered, the other part is easy to find, and then all documents are accessible to an attacker.
I read about WebAssembly and how it compiles code written in lower-level languages to wasm, which can then be loaded into browsers. This effectively means I could use the same C code as in my native module, but on the web instead of in Electron.
My doubts are about the security of the resulting wasm module and how easily it can be reverse engineered. I read that it is possible to decompile wasm back into C. However, the output is not exactly the same as the original source, and how much it would help an attacker is an open question.
Some threads that I have read:
https://twitter.com/jebdec/status/1012749064696295425?lang=en
https://news.ycombinator.com/item?id=17507767
https://www.reddit.com/r/WebAssembly/comments/8qmxjv/can_we_decompile_wasm_to_ccsource/
Apart from the security concerns, I think I may be using wasm for the wrong purpose here. On the open web, code whose source cannot be viewed and audited presents a very large security issue overall.
Any comments/advice will be helpful.

First to answer your question. No, WASM is not safe to store client secrets for the scenario that you are describing.
Your current implementation doesn't seem to be secure either (from the limited info available).
If you are compiling your code to a binary just so that an attacker cannot find your fixed salt, I have bad news for you: it is very much possible to extract data from a binary file.
Just think of a case where you have your salt stored as a variable inside a C program.
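#include <stdio.h>

int main(void) {
    const char *mySaltVar = "my salt";
    const char *b = "my other string";
    printf("%s %s\n", mySaltVar, b); /* use the strings so an optimizing build keeps them */
    return 0;
}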
You compile this with gcc to create a binary. You might think these strings cannot be read because they are inside the binary. However, you can just run the strings program on the binary to print the strings it contains.
sk$ strings binary.out
my salt
my other string
The two strings are extracted from the binary.
This is an oversimplified example to show that everything in your code ends up inside the binary. In most cases the compiler throws away some information (such as variable names) to make the binary as efficient as possible. In those cases an attacker would use reverse-engineering tools to figure out what is going on inside the binary.
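As a rough illustration of why this works, here is a minimal C sketch of what the strings utility does. It scans raw bytes and reports every run of at least a few printable characters. The function name find_strings and its callback interface are hypothetical. The real GNU strings has more options and defaults to a minimum run length of 4.

```c
#include <ctype.h>
#include <stddef.h>

/* Minimal sketch of the idea behind `strings`: report every run of at least
 * min_len printable bytes in an arbitrary buffer (e.g. a binary read into memory).
 * Tab is treated as printable here. emit may be NULL if only the count is wanted.
 * Returns the number of runs found. find_strings is a hypothetical helper. */
size_t find_strings(const unsigned char *buf, size_t len, size_t min_len,
                    void (*emit)(const unsigned char *start, size_t n, void *ctx),
                    void *ctx)
{
    size_t count = 0;
    size_t run_start = 0;
    size_t run_len = 0;

    /* Iterate one past the end so that a run reaching the end is also reported. */
    for (size_t i = 0; i <= len; i++) {
        int printable = (i < len) && (isprint(buf[i]) || buf[i] == '\t');
        if (printable) {
            if (run_len == 0)
                run_start = i;
            run_len++;
        } else {
            if (run_len >= min_len) {
                count++;
                if (emit)
                    emit(buf + run_start, run_len, ctx);
            }
            run_len = 0;
        }
    }
    return count;
}
```

Suppose binary.out is read into a buffer with fread and scanned with a minimum length of 4. The scan should report "my salt" and "my other string" among the other runs. That is all an attacker needs when a salt is stored as a plain literal.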

Related

How to Decompile Bytenode "jsc" files?

I've just come across the Bytenode library. It's the same idea as Java bytecode, but for Node.js.
This library compiles your JavaScript code into V8 bytecode, which protects your source code. I'm wondering: is there any way to decompile Bytenode output? If so, it's not secure enough. I'm asking because I would like to protect my source code using this library.

TL;DR It'll raise the bar to someone copying the code and trying to pass it off as their own. It won't prevent a dedicated person from doing so. But the primary way to protect your work isn't technical, it's legal.
This library compiles your JavaScript code into V8 bytecode, which protect your source code...
Well, we don't know it's V8 bytecode, but it's "compiled" in some sense. All we know is that it creates a "code cache" via the built-in vm.Script.prototype.createCachedData API, which is officially just a cache used to speed up recompiling the code a second time, third time, etc. In theory, you're supposed to also provide the original source code as a string to the vm.Script constructor. But if you go digging into Node.js's vm.Script and V8 far enough it seems to be the actual code in some compiled form (whether actual V8 bytecode or not), and the code string you give it when running is ignored. (The ByteNode library provides a dummy string when running the code from the code cache, so clearly the actual code isn't [always?] needed.)
I'm wondering is there anyway to Decompile byteNode therefore it's not secure enough.
Naturally, otherwise it would be useless because Node.js wouldn't be able to run it. I didn't find a tool to do it that already exists, but since V8 is open source, it would presumably be possible to find the necessary information to write a decompiler for it that outputs valid JavaScript source code which someone could then try to understand.
Experimenting with it, local variable names appear to be lost, although function names aren't. Comments also appear to get lost. This may not be as obvious as it seems, given that Function.prototype.toString is required to return either the original source text or a synthetic version.
So suppose you run the code through a minifier (particularly one that renames functions) and then through ByteNode (or just do it with vm.Script yourself; ByteNode is a fairly thin wrapper). It will then be feasible for someone to decompile it into something resembling source code, but that source code will be very hard to understand. This is very similar to shipping Java class files, which can be decompiled (the JDK even ships a standard disassembler, javap). The difference is that the Java class file format is well documented and doesn't change from one dot release to the next. It can change from one major release to another, though new releases always support the older format. By contrast, the format of this data is not documented (though it's an open source project) and is subject to change from one dot release to the next.
Certain changes, such as changing the copyright message, are probably fairly easy to make to said source code. More meaningful changes will be harder.
Note that the code cache appears to have a checksum or other similar integrity mechanism, since directly editing the .jsc file to swap one letter for another in a literal string makes the code cache fail to load. So someone tampering with it (for instance, to change a copyright notice) would either need to go the decompilation/recompilation route, or dive into the V8 source to find out how to correct the integrity check.
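The integrity check described above can be pictured with the following C sketch. It does not reproduce V8's actual mechanism, which this answer does not describe. It simply assumes a 64-bit FNV-1a checksum stored alongside the payload. The names fnv1a64 and cache_is_intact are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a: a fast, NON-cryptographic hash, used here only as a stand-in
 * for whatever integrity check V8 really applies to its code cache. */
uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Returns 1 if the payload still matches the checksum recorded when the cache
 * was written, 0 if it was modified (e.g. one letter swapped in a string). */
int cache_is_intact(const unsigned char *payload, size_t len, uint64_t stored)
{
    return fnv1a64(payload, len) == stored;
}
```

Because a check like this is not cryptographic, it only catches naive edits. Anyone who knows the algorithm can recompute the checksum after patching, which is the "correct the integrity check" route mentioned above.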
Fundamentally, the way to protect your work is to ensure that you've put all the relevant notices in the relevant places such that the fact copying it is a violation of copyright is clear, then pursue your legal recourse should you find out about someone passing it off as their own.

is there any way
You could get a hundred answers here saying "I don't know a way", but that still won't guarantee that there isn't one.
not secure enough
Secure enough for what? What's your deployment scenario? What kind of scenario/attack are you trying to defend against?
FWIW, I don't know of an existing tool that "decompiles" V8 bytecode (i.e. produces JavaScript source code with the same behavior). That said, considering that the bytecode is a fairly straightforward translation of the source code, I'm sure it wouldn't be very hard to write such a tool, if someone had a reason to spend some time on it. After all, V8's JS-to-bytecode compiler is open source, so one would only have to look at those sources and implement the reverse direction. So I would assume that shipping as bytecode provides about as much "protection" as shipping as uglified JavaScript, i.e. none that I would trust.
Before you make any decisions, please also keep in mind that bytecode is considered an internal implementation detail of V8; in particular it is not versioned and can change at any time, so it has to be created by exactly the same V8 version that consumes it. If you want to update your Node.js you'll have to recreate all the bytecode, and there is no checking or warning in place that will point out when you forgot to do that.

Node.js (through V8) already contains code for disassembling binary bytecode.
You can get a text dump of your V8 bytecode, and then you would need to analyze it.
However, the text would be very long and would be missing some important information, such as the constant pool, so you need to modify the Node.js source.
Please check https://github.com/3DGISKing/pkg10.17.0
I have attached the exported XML file.
If you study V8, it would be possible to analyze it and get source code from it.

To keep it short: you can try the Ghidra Node.js package, which is based on the Ghidra reverse-engineering framework that the NSA open-sourced in 2019. With it, Ghidra can disassemble and decompile V8 bytecode. The inner workings of the disassembly are quite complex, so this answer is short, but it should be sufficient.

How to protect my script from copying and modifying in it?

I created an Expect script for a customer. I'm afraid he will customize it as he likes without coming back to me, so I tried to encrypt it, but I didn't find a way to do that.
Then I tried to convert it to an executable. However, some commands, like the "send" command, were not recognized by ActiveTcl, even though the script works perfectly on Red Hat.
So is there a way to protect my script from being read?
Thanks

It's usually enough to just package the code in a form that the user can't directly look inside. Even the smallest speed bump stops them.
You can use sdx qwrap to parcel your script up into a starkit. Those are reasonably resistant to random user poking, while being still technically open (the sdx tool is freely available, after all). You can convert the .kit file it creates into an executable by merging it with a packaged runtime.
In short, it's basically like this (with some complexity glossed over):
tclkit sdx.kit qwrap myapp.tcl
tclkit sdx.kit unwrap myapp.kit
# Copy additional assets into myapp.vfs if you need to
tclkit sdx.kit wrap myapp.exe -runtime C:\path\to\tclkit.exe
More discussion is here, the tclkit runtimes are here, and sdx itself can be obtained in .kit-packaged form here. Note that the runtime you use to run sdx does not need to be the same as the one you package. You can deploy code for platforms other than the one you are running on. This is a packaging-phase action, not compilation or linking.
Against more sophisticated users (i.e., not Joe Ordinary User) you'll want the Tcl Compiler out of the ActiveState TclDevKit. Formally it's a code obscurer (it doesn't actually improve the performance of anything), and the TDK isn't particularly well supported any more. Still, it's the main current solution for commercial protection of Tcl code. I'm on a small team working on a true compiler that will effectively offer much stronger protection, but it's not yet released (and really isn't ready yet).

One way is to keep the essential code running on your server as a back end. Give the user only a front-end application that sends requests. This way the essential processes stay under your control, and the user cannot access that code.

What is the difference between dynamically and statically generated grpc code?

In the gRPC client examples there are two types of implementation. In one, the .proto files are loaded and processed at runtime; in the other, they are compiled using protoc.
My question is: what is the difference? The docs say nothing more than 'they behave identically', but surely there has to be a difference right?

Fundamentally, the primary difference is the one you mentioned: with the dynamic code generation, the .proto file is loaded and parsed at run time, and with static code generation, the .proto file is preprocessed into JavaScript.
The dynamic code generation is simpler to use, potentially easier to debug, and generates code that accepts regular JavaScript objects.
The static code generation (using protoc) requires the user to create protobuf objects, which means that input validation will be done earlier. It is also a workflow that is more consistent with other languages.

Is it possible to extract constants and other predefined values from binary executables?

Let's say we have this program here
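using System;
using System.Security.Cryptography;
using System.Text;
class Message{
public static string SUPER_SECRET_STRING = "bar";
// Hex-encoded SHA-256, standing in for the sha() helper the snippet assumed
static string sha(string s) => BitConverter.ToString(SHA256.Create().ComputeHash(Encoding.UTF8.GetBytes(s))).Replace("-", "");
public static void Main(){
string SECRET = "foo";
Console.Write(sha(SUPER_SECRET_STRING) + "" + sha(SECRET));
}
}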
Now, after building this program, is there any way using a hex editor or some other utility to extract the values "foo" and "bar" from the compiled binary file?
Also let's assume that memory editors are not allowed.
Is this applicable to all compiled languages like C++? What about ones that are run in another environment like Java or C#?

The answer from Mene is correct, but I wanted to put in my two cents to let you know how ridiculously easy it is to extract strings from compiled binaries (regardless of the language). If you have Linux, all you have to do is run the command strings <compiled binary> and you have the extracted strings. You don't have to be any sort of reverse engineer to pull this off. I just ran it against the eclipse binary on my Ubuntu machine and check out the (truncated) output:
> strings eclipse
ATSH
0[A\
8.uCH
The %s executable launcher was unable to locate its
companion shared library.
There was a problem loading the shared library and
finding the entry point.
setInitialArgs
-vmargs
-name
--launcher.library
--launcher.suppressErrors
--launcher.ini
eclipse
Notice how the string "The %s executable launcher was unable to locate its companion shared library. There was a problem loading the shared library and finding the entry point." appears in the output. This string is no doubt hard coded into the program.
When strings (and other data) are hard coded into a program, most compilers place them into a special section in the binary where they can be mapped directly into memory for access by the program as it needs them. If you were to open the binary with a hex editor, you could find this string easily.
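Finding a known string in a binary is just as simple without a hex editor. The C sketch below mimics a hex editor's search: it returns the byte offset of the first occurrence of a byte sequence in a buffer, such as an executable loaded into memory. find_bytes is a hypothetical helper. On glibc, the nonstandard memmem function does the same job.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of a hex editor's "find": return the byte offset of the first
 * occurrence of needle in hay (e.g. a compiled binary loaded into memory),
 * or -1 if it does not occur. find_bytes is a hypothetical helper name. */
long find_bytes(const unsigned char *hay, size_t hay_len,
                const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;
    /* Naive scan: compare the needle against every possible start position. */
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```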

Yes, you could easily use a decompiler to extract those kinds of constants, especially strings (since they require a larger chunk of memory). This works even on machine-code binaries and is easier still for VM languages like Java and C#.
If you need to keep something secret in there, you will need to go to great lengths. Simply encrypting the string, for example, would add a layer of security, but for someone who knows what she's doing this won't be a big barrier. For example, scanning the file for places with unusual entropy is likely to reveal the key that was used for encryption. There are even systems that encode secrets by altering the low-level instructions used in the binary: those tools replace certain combinations of instructions with other, equivalent ones. But even those systems are not too hard to circumvent, as the unusual combinations of instructions reveal that such a tool was used.
And even if you manage to protect the string with some kind of encryption in your binary, at some point you will need a decrypted version during execution. A memory dump taken while the string is in use will therefore also contain a copy of the secret value. This is especially problematic in Java. There you cannot explicitly free a chunk of memory, and strings are immutable (meaning that a "change" to a string leads to a new chunk of memory).
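To make these points concrete, here is a deliberately weak C sketch of "simply encrypting the string" using a single-byte XOR. The key OBF_KEY and the function name xor_bytes are hypothetical. The sketch shows why this only raises the bar slightly. The key ships in the same binary as the ciphertext. The program also has to rebuild the plaintext in memory before it can use it, which is exactly the moment a memory dump would capture it.

```c
#include <stddef.h>

/* Hypothetical key: it must be stored in the same binary, so an attacker can read it. */
#define OBF_KEY 0x5A

/* Single-byte XOR, the weakest possible "encryption", chosen only for brevity.
 * XOR is its own inverse, so the same routine obfuscates and recovers data.
 * Recovering writes the plaintext back into memory (out), where a dump can see it. */
void xor_bytes(const unsigned char *in, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)(in[i] ^ OBF_KEY);
}
```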
As you see the problem is far from trivial. And of course there is no way to give you 100% security (think of all the cracked games and so on).
Something that can be implemented in a secure way is using Public-key cryptography. In that case you will need to keep the private key hidden. That might be possible if you could for example send things to your server to encrypt them or you have hardware which provides a Trusted Platform Module. But those things might not be feasible for your case.

How can I write a program that can detect by itself that it has been changed?

I need to write a small program that can detect that it has been changed. Please give me a suggestion!
Thank you.

The short answer is to create a hash or key of the program and have the program encrypt and store that key within itself. From time to time the program would make a checksum of itself and compare it against that hash/key. If there is a difference then handle it accordingly.
There are lots and lots of ways to go about this. There are lots of very smart engineers out there that know how to work around it if that is what you are trying to avoid.
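Here is a minimal C sketch of the hash-and-compare idea, using CRC-32 (the checksum used by zip and PNG) as the digest. The names crc32_update and crc32_file are hypothetical. The sketch leaves out how the expected value is stored. If it is embedded in the same executable, that field has to be excluded from the hash, because otherwise writing it would change the result. As noted above, a non-cryptographic checksum like this detects accidental changes, not a determined attacker.

```c
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Start with crc = 0; a previous result can be passed in to continue hashing. */
uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hash a whole file (e.g. the program's own executable) in 4 KiB chunks.
 * Returns 0 and stores the CRC in *out on success, -1 if the file can't be read. */
int crc32_file(const char *path, uint32_t *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    unsigned char buf[4096];
    uint32_t crc = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        crc = crc32_update(crc, buf, n);
    int err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    *out = crc;
    return 0;
}
```

A program would call crc32_file on its own executable path and compare the result against the stored value. How it finds its own path is platform specific (for example, /proc/self/exe on Linux).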

The simplest way would be to use a hash function to generate a short code which is a digest of the whole program and then check this.
It would be fairly easy to debug the code and replace the hash value to subvert this.
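The weakness of a plain embedded hash is easy to demonstrate. This C sketch uses Adler-32 (zlib's checksum) as the digest. A hypothetical struct image stands in for the program's bytes plus its embedded digest. The names adler32_digest, image_check and attacker_patch are illustrative only. The attacker never has to break the hash: after patching the code, they just recompute it and overwrite the stored value.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32 as defined for zlib: two running sums modulo 65521. */
uint32_t adler32_digest(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Hypothetical program image: code bytes plus the digest the author embedded. */
struct image {
    unsigned char code[16];
    uint32_t embedded_digest;
};

/* The program's self-check: does the code still match the embedded digest? */
int image_check(const struct image *img)
{
    return adler32_digest(img->code, sizeof img->code) == img->embedded_digest;
}

/* The attack: patch a byte, then recompute and overwrite the embedded digest,
 * so image_check still passes. */
void attacker_patch(struct image *img, size_t offset, unsigned char new_byte)
{
    img->code[offset] = new_byte;
    img->embedded_digest = adler32_digest(img->code, sizeof img->code);
}
```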
A better way would be to generate a digital signature using your private key and with the public key in the program to check it.
This would then require changing the public key and the hash as well as understanding the program, or changing the program code itself to subvert the check.
All you can do in the case described so far is make it more difficult to subvert but it will be possible with a certain amount of effort. I'd suggest looking into cryptographic techniques and copy protection for more information to suit your specific case.

Do you mean that program 'foo' should be able to tell whether some part of it was modified before or during run time? That's not the responsibility of the program; it's the responsibility of the security hooks in the target OS.
For instance, if the installed and trusted 'foo' has signature "xyz1234", the kernel should refuse to run a modified (or completely new) 'foo'. The same goes for 'foo' while it's running in memory. Look up 'Trusted Path Execution' (TPE) to start.
A better question to ask would be how to sign your released version of 'foo', which depends upon your target platform.

Try searching for "code signing".

The easiest way would be for the program to compute its own MD5 hash and store it in a separate file, but this isn't totally secure. An MD5 + CRC might work slightly better.
Or, as others here have suggested, use SHA-1, SHA-2, or SHA-3, which are more secure than MD5 (although SHA-1 is now also considered broken for collision resistance).

I'd ask an external tool to do the check. This problem reminds me of the challenge to write a program that prints itself. In Bash you could do something like this:
#!/bin/bash
cat "$0"
which really asks for an external tool to do the job. It's kind of solving the problem by getting away from solving the problem...

The best option is going to be code signing. You can either use a tool supplied by your local friendly OS, or roll your own scheme that stores MD5 hashes and compares them. For example, if you're targeting Windows, you probably want to look at Authenticode, where the operating system handles tamper detection.
It is important to remember that all bets are off if someone injects a thread into your process (to potentially kill your ongoing checks, etc.) or tampers with your compiled application to bypass those checks.

An alternative way which wasn't mentioned is to use a binary packer such as UPX.
If the binary gets changed on the disk then the unpacking code is likely to fail.
This however doesn't protect you if someone changes the binary while it is in memory.
