What is the difference between dynamically and statically generated grpc code? - node.js

In the gRPC client examples there are two types of implementation: one where the .proto files are loaded and processed at runtime, and one where they are precompiled using protoc.
My question is: what is the difference? The docs say nothing more than 'they behave identically', but surely there has to be a difference, right?

Fundamentally, the primary difference is the one you mentioned: with the dynamic code generation, the .proto file is loaded and parsed at run time, and with static code generation, the .proto file is preprocessed into JavaScript.
The dynamic code generation is simpler to use, potentially easier to debug, and generates code that accepts regular JavaScript objects.
The static code generation (using protoc) requires the user to create protobuf objects, which means that input validation will be done earlier. It is also a workflow that is more consistent with other languages.
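For a concrete picture, here is a minimal sketch of the dynamic approach using the @grpc/proto-loader package (the helloworld.proto file and its Greeter service are assumptions borrowed from the standard hello-world example; the older grpc.load API works along the same lines):

```js
// Dynamic code generation: the .proto file is parsed at run time.
const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');

const packageDefinition = protoLoader.loadSync('helloworld.proto', {
  keepCase: true,
  defaults: true,
  oneofs: true,
});
const helloworld = grpc.loadPackageDefinition(packageDefinition).helloworld;

// Plain JavaScript objects go in and come out.
const client = new helloworld.Greeter('localhost:50051',
                                      grpc.credentials.createInsecure());
client.sayHello({ name: 'world' }, (err, reply) => {
  if (err) throw err;
  console.log(reply.message);
});
```

With static generation (for example via grpc_tools_node_protoc), you would instead require the generated stub modules and construct a HelloRequest message object with setters such as setName before calling the stub; the exact shape of that code depends on the protoc plugin and options you use, and that explicit message construction is where the earlier input validation mentioned above comes from.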

Related

How to Decompile Bytenode "jsc" files?

I've just seen the library ByteNode; it's like Java bytecode, but for Node.js.
This library compiles your JavaScript code into V8 bytecode, which protects your source code. I'm wondering: is there any way to decompile ByteNode output? If so, it's not secure enough. I'm asking because I would like to protect my source code using this library.
TL;DR It'll raise the bar for someone copying the code and trying to pass it off as their own. It won't prevent a dedicated person from doing so. But the primary way to protect your work isn't technical, it's legal.
This library compiles your JavaScript code into V8 bytecode, which protects your source code...
Well, we don't know it's V8 bytecode, but it's "compiled" in some sense. All we know is that it creates a "code cache" via the built-in vm.Script.prototype.createCachedData API, which is officially just a cache used to speed up recompiling the code a second time, third time, etc. In theory, you're supposed to also provide the original source code as a string to the vm.Script constructor. But if you go digging into Node.js's vm.Script and V8 far enough it seems to be the actual code in some compiled form (whether actual V8 bytecode or not), and the code string you give it when running is ignored. (The ByteNode library provides a dummy string when running the code from the code cache, so clearly the actual code isn't [always?] needed.)
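As a rough illustration of that documented workflow (using only Node's built-in vm module, not ByteNode's exact internals):

```js
// Minimal sketch of the documented code-cache workflow in Node's vm module.
const vm = require('vm');
const fs = require('fs');

const source = 'function greet(name) { return "hello, " + name; } greet("world");';

// Compile once and serialize V8's code cache -- this is what a .jsc file holds.
const compiled = new vm.Script(source);
fs.writeFileSync('example.jsc', compiled.createCachedData());

// Later: hand the cache back so V8 can skip parsing/compiling the source again.
const cached = new vm.Script(source, {
  cachedData: fs.readFileSync('example.jsc'),
});
console.log(cached.cachedDataRejected); // false if the cache was accepted
console.log(cached.runInThisContext()); // "hello, world"
```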
Is there any way to decompile ByteNode output? If so, it's not secure enough.
Naturally, otherwise it would be useless because Node.js wouldn't be able to run it. I didn't find a tool to do it that already exists, but since V8 is open source, it would presumably be possible to find the necessary information to write a decompiler for it that outputs valid JavaScript source code which someone could then try to understand.
Experimenting with it, local variable names appear to be lost, although function names aren't. Comments appear to be lost as well (which may not be as obvious as it seems, given that Function.prototype.toString is required to return either the original source text or a synthetic version).
So if you run the code through a minifier (particularly one that renames functions), then run it through ByteNode (or just do it with vm.Script yourself; ByteNode is a fairly thin wrapper), it will be feasible for someone to decompile it into something resembling source code, but that source code will be very hard to understand. This is very similar to shipping Java class files, which can be decompiled (there's even a standard tool to do it in the JDK, javap), except that the format of Java class files is well-documented and doesn't change from one dot release to the next (though it can change from one major release to another; new releases always support the older format), whereas the format of this data is not documented (though it's an open-source project) and is subject to change from one dot release to the next.
Certain changes, such as changing the copyright message, are probably fairly easy to make to said source code. More meaningful changes will be harder.
Note that the code cache appears to have a checksum or other similar integrity mechanism, since directly editing the .jsc file to swap one letter for another in a literal string makes the code cache fail to load. So someone tampering with it (for instance, to change a copyright notice) would either need to go the decompilation/recompilation route, or dive into the V8 source to find out how to correct the integrity check.
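A small experiment along those lines (illustrative only; whether a particular byte flip trips the check depends on where it lands in V8's serialized format):

```js
// Sketch: corrupt the serialized cache and watch V8 reject it.
const vm = require('vm');

const source = 'const notice = "Copyright A"; notice;';
const cache = new vm.Script(source).createCachedData();

cache[cache.length - 4] ^= 0xff; // flip some bits in the payload

const tampered = new vm.Script(source, { cachedData: cache });
console.log(tampered.cachedDataRejected); // true when the cache fails to load
console.log(tampered.runInThisContext()); // still runs: V8 falls back to compiling `source`
```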
Fundamentally, the way to protect your work is to ensure that you've put all the relevant notices in the relevant places, so that it's clear that copying it is a violation of copyright, and then pursue your legal recourse should you find out about someone passing it off as their own.
is there any way
You could get a hundred answers here saying "I don't know a way", but that still won't guarantee that there isn't one.
not secure enough
Secure enough for what? What's your deployment scenario? What kind of scenario/attack are you trying to defend against?
FWIW, I don't know of an existing tool that "decompiles" V8 bytecode (i.e. produces JavaScript source code with the same behavior). That said, considering that the bytecode is a fairly straightforward translation of the source code, I'm sure it wouldn't be very hard to write such a tool, if someone had a reason to spend some time on it. After all, V8's JS-to-bytecode compiler is open source, so one would only have to look at those sources and implement the reverse direction. So I would assume that shipping as bytecode provides about as much "protection" as shipping as uglified JavaScript, i.e. none that I would trust.
Before you make any decisions, please also keep in mind that bytecode is considered an internal implementation detail of V8; in particular it is not versioned and can change at any time, so it has to be created by exactly the same V8 version that consumes it. If you want to update your Node.js you'll have to recreate all the bytecode, and there is no checking or warning in place that will point out when you forgot to do that.
The Node.js source already contains code for decompiling binary bytecode.
You can get a text string from your V8 bytecode, which you would then need to analyze.
But that text string is very long and misses some important information, such as the constant pool, so you need to modify the Node.js source.
Please check https://github.com/3DGISKing/pkg10.17.0
I have attached an exported XML file.
If you study V8, it should be possible to analyze it and recover the source code from it.
Keeping it short and sweet: you can try the Ghidra node.js package, which is based on the Ghidra reverse engineering framework open-sourced by the NSA in 2019. Ghidra is capable of disassembling and decompiling V8 bytecode. The inner workings of the disassembly are quite complex; this answer is short but sufficient.

Can I use the Rust lexer or parser to retrieve a list of functions within a Rust file?

The lexer/parser file located here is quite large and I'm not sure if it is suitable for just retrieving a list of Rust functions. Perhaps writing my own/using another library would be a better route to take?
The end objective would be to create a kind of execution manager. To contextualise: it would read a list of function calls wrapped in a function, and those function calls could then be reordered from some web interface. I thought it might be nice to manage larger applications this way.
No. I mean, not really. Whether you write your own parser or re-use syntex, you're going to hit a fundamental limitation: macros.
So let's say you go all-out and expand macro_rules!-based macros, including the ones defined in external crates (which means you'll also need to extract rustc's crate metadata loading... which isn't stable). What about procedural macros and custom derive attributes? Those are defined in code and depend on compiler-internal interfaces to function.
The only way this is likely to ever work correctly is if you build on top of the compiler, or duplicate a huge amount of work (which also involves unstable binary interfaces).
You could use syntex to parse the Rust code in a build script.

template instantiation statistics from compilers

Is there a way to get a summary of the instantiated templates (with what types and how many times - like a histogram) within a translation unit or for the whole project (shared object/executable)?
If I have a large codebase and I want to take advantage of C++11 extern template declarations, I would like to know which templates are most used within my project (or from the internals of the STL, like std::less<MyString> for example).
Also, is it possible to have a weight assigned to each template instantiation (the time spent on it by the compiler)?
Even if only one (C++11-enabled) compiler gives me such statistics, I would be happy.
How difficult would it be to implement such a thing with Clang's LibTooling?
And is this even reasonable? Many people have told me that I can work out which template instantiations I should extern without the use of a tool...
There are several ways to attack this problem.
If you are working with an open-source compiler, it's not hard to make a simple change to the source code that will trace all template instantiations.
If that sounds like too much hassle, you can also try to force the compiler to produce a warning on each template instantiation for a given symbol. Steven Watanabe has written a set of tools that can help you with that.
Finally, possibly the best option is to use the debugging symbols (or map files) generated by the compiler to track down how many times each function appears in the final image and, more importantly, how much it adds to the weight in bytes. The best example of such a tool is Adrian Stone's SymbolSort, which is based on Microsoft's toolset. Another similar tool is the Map File Browser.

Size of fay generated file

I tried fay-jquery and the included sample test.hs file results in a whopping 150 kB of JS.
Even with Closure compilation it is still 20 kB.
I understand that it must carry a runtime, the stdlib and jQuery wrappers with it.
I can tell Fay not to generate the stdlib (--no-stdlib and --no-builtins flags).
But I do not know how to tell it not to include the jQuery code.
So my question is: how can I split those static parts into a separate JS file and only generate module-specific code?
This way the large parts of the code will be loaded only once (and cached), and I can create many smaller JS files for separate web pages.
Yes, it's safe to split modules up; as of Fay 0.16 all modules can exist standalone (before that you could still keep the runtime and fay-base separate). There are some flags for this: --print-runtime and --no-stdlib. Compile with optimizations (-O; this increases the output size, but Closure will be able to minimize it even better).
Also remember that the web server should gzip this. That brings the code size down to 4.5 kiB. That's pretty decent, right?
You might want to consider putting all of your javascript in one file, that means a slower initial load but then users will have it cached for future page loads.
The reason the file size is so big is that fay-jquery has a lot of FFI bindings, which produce a lot of transcoding information. I think fay-jquery could be optimized a lot here, for instance by using Ptr JQuery rather than just JQuery in the types, by figuring out during compilation that a lot of this is unnecessary, or by abstracting the conversions more in the compiler's output.
Another possible issue I realized a couple of days ago is that the output now lives in the global scope rather than in a closure, which might mean that Google Closure can't remove redundant code as well as it could previously (I haven't had time to investigate this yet). The module generation should perhaps be changed to produce a closure for each module.
Also see Reducing output size on the wiki.

Static data definition as language feature

I am writing a program in C++ in which I use some constant data. I build that data procedurally at startup and never change it again. Ideally that data would be static data in the binary, but that is not the case, since I have to build it first.
In C++ I could define static const members in classes holding that data, or simply global consts. That would give me static data in the binary, but then I would not be able to generate its contents programmatically; I would have to build it elsewhere and paste the result into the code. In my case it would be a bunch of binary data uglily and nonsensically encoded inside the source file.
So I started to wonder: is there any language that supports such a feature, letting me define my static data procedurally but resolving it at compile time and embedding it in the binary? Could any kind of optimization handle these cases? If the procedure/function that generates the data takes no external parameters and has predictable results, the compiler could safely replace the call with its result. Do compilers take that path? Do any languages you know of treat this matter explicitly?
I know of the C preprocessor, but it is not Turing-complete, and its syntax is not as attractive as a function modifier saying the function should be resolved at compile time would be.
Build a program to generate the data (you already have that), have its output be in C++, add the generation step to your Makefile, and use the generated data with #include.
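As a sketch of that workflow, here is a hypothetical gen_table.js generator (all names made up; the generator could just as well be the C++ program you already have) that a Makefile rule runs before the C++ compile step, after which the C++ code simply does #include "generated_table.h":

```js
// gen_table.js (hypothetical): run at build time, before compiling the C++.
// Computes the constant data procedurally and emits it as a C++ header.
const fs = require('fs');

// Stand-in for whatever procedure currently builds the data at startup.
const table = Array.from({ length: 256 }, (_, i) => (i * i) % 251);

const header = [
  '// Generated by gen_table.js -- do not edit.',
  '#pragma once',
  `static const unsigned char kTable[${table.length}] = { ${table.join(', ')} };`,
  '',
].join('\n');

fs.writeFileSync('generated_table.h', header);
```

A Makefile rule like "generated_table.h: gen_table.js" with a "node gen_table.js" recipe makes the header an ordinary build artifact, so the pasted-in blob never has to live in the source file.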
C++ templates are Turing-complete, and are often used to compute compile-time constants.

Resources