Executing bytecode without source in v8 engine - node.js

I am trying to use this source, https://github.com/JoseExposito/v8-compiler/blob/master/addon/v8-compiler.cpp, in my project, but the function compilation_cache(), called from runScript, always returns NULL. What could be wrong? What do I need to change in the source to make it work with a later version of V8?
In addition, why is there no FLAG_serialize_toplevel flag in the latest version of V8?

Two parts here:
First a warning: you should definitely not be writing code using any part of the i (internal) namespace. The APIs in the internal namespace don't always reflect how JS actually runs (due to optimizations), plus they change often and without warning.
In the case of the code you copied from V8 internals, V8 already provides an API to produce and consume "cached data" which is just the bytecode in a serialized form. Here's an example from the source of Node.js for how to produce a code cache.
Second, your actual question: V8 always checks the bytecode against a special hash (basically the V8 version and the source text length), and if the hash doesn't match, it won't use the bytecode cache. This is why changing V8 versions causes the bytecode to be rejected.
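As a rough illustration of the public "cached data" route, here is a minimal sketch using Node's vm module rather than V8 internals. It assumes a Node version that provides script.createCachedData() (10.6 or later); the source string and the filename add.js are made up for the example. It also shows the sanity check at work: the same source accepts the cache, while a source of a different length rejects it.

```javascript
const vm = require('vm');

// Made-up source text for the example.
const source = 'function add(a, b) { return a + b; } add(2, 3);';

// Compile once and serialize V8's code cache through the public API.
const producer = new vm.Script(source, { filename: 'add.js' });
const cache = producer.createCachedData(); // a Buffer

// Consume the cache. Note that the matching source text is still supplied.
const consumer = new vm.Script(source, { filename: 'add.js', cachedData: cache });
const accepted = consumer.cachedDataRejected === false;
const result = consumer.runInThisContext();

// A source of a different length fails the check, so the cache is
// rejected and the script is compiled from source as usual.
const mismatched = new vm.Script(source + ' // edited', {
  filename: 'add.js',
  cachedData: cache,
});
const rejected = mismatched.cachedDataRejected === true;

console.log({ cacheBytes: cache.length, accepted, result, rejected });
```

The cache buffer can be written to disk and loaded later, but only by a Node binary with a matching V8 version, for the reason given above.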

Related

Trigger an interrupt when the value of a memory location is modified in FreeBSD/Linux

Is it possible to generate an interrupt when the value of a variable or memory location is modified in a FreeBSD or Linux environment, using a C program?
In a C application there is a dynamically allocated array which is used and modified from multiple locations. The application is pretty large and complex, so it is difficult to trace all the places the array is used or modified from.
The problem is that in some condition/flow the array[2] element becomes 0, which this application does not expect. I can't run the application under gdb to debug this issue (because of some constraints). The only way to debug this issue is to modify the source code and run the binary where the issue is happening.
Is it possible to generate an interrupt when the array[2] element is modified and print a backtrace to find out which part of the codebase modified it?
Thanks!!!
You want a data breakpoint, also called watchpoint; GDB provides the following commands:
watch for writes
rwatch for reads
awatch for both
You can ask GDB for a specific condition as well, so the following expression (or something similar) should work:
watch array[2] if array[2] == 0
You must run the expression in the scope of the variable. The easiest way is to set a breakpoint on the line after the allocation, then set the watchpoint once the breakpoint triggers, and resume execution.
OTOH, to implement such a debugging facility within the application is rather complex and hardware-specific (in case hardware support isn't available, software watchpoints require implementing an entire debugger), so I would recommend using liblldb (which is Apache-2.0 licensed IIRC), as it provides a lldb::SBWatchpoint class which you can leverage. The Python API is documented: https://lldb.llvm.org/python_api/lldb.SBWatchpoint.html.
The C++ API is similar, but there's a lot of boilerplate to write that I don't see documented anywhere, so the API is private; you'd have to look at LLDB's own source code.

Why is Node.js's bytecode output so large?

Running node --print-bytecode on a file containing code as minimal as just void 0; produces a 2.3 MB file. Looking through it, I thought it was including assembly code generated by JIT compilation, but that doesn't appear to be the case. Still, that doesn't explain why it's so big. Compare it to the output of javap which, even though it's for a different language, is much smaller and much more readable.
I thought it was including assembly code generated by JIT compilation, but that doesn't appear to be the case.
Correct, bytecode is not assembly code. Assembly code will be generated later, for functions that run hot enough that V8 decides optimizing them is likely worth the time investment.
Why is Node.js's bytecode output so large? [...] on a file containing code as minimal as just void 0;
A lot of Node itself is written in JavaScript, and all of that gets compiled to bytecode before being executed. So even if you run Node on an empty file, you'll see quite a bit of bytecode.
What is it even saying though?
It's saying the same as the JavaScript code it was generated from. What exactly the individual bytecode instructions do and how to read them is an internal implementation detail of V8 and can change at any time. As a user of Node (or V8), you're not supposed to have a reason to care; and in particular you shouldn't assume that the bytecode format is stable over time/versions. Printing it at all is meant for debugging engine-internal issues.

How to avoid multiple copies of the code for std::vector<double>?

I've got one shared library -- let's call it the master. It produces one or more slave shared libraries. The slave shared libraries are interacting with the master via an interface, exchanging std::string, std::vector and others.
The compile time of the slave shared libraries must be minimized as this compile is done at the customer site dynamically.
As long as the exchanged object is not an STL container, everything works fine, e.g.:
master compiles NonStlObject.cpp and NonStlObject.h and produces global text symbols (T)
slave uses NonStlObject.h and creates undefined global text symbols (U)
As soon as an STL container is being exchanged, I end up with 1 + numberOfSlaves copies of the STL code -- and matching compile time -- they are weak symbols (W) in both master and slaves.
Is there any way to avoid this, other than wrapping every STL container?
PS. I don't need to be told that the version of the compiler used for building the interacting shared libraries must be the same. Of course it must!
PPS. extern template seems to be ignored by the compiler when applied to std::vector
I end up with 1 + numberOfSlaves copies of the code
This is the least of your problems.
The much bigger problem is that it's really hard to achieve ABI compatibility in C++ across different versions of the compiler.
If you compile your code with g++-9.0, and your customer has g++-7.0 installed, chances are your code will not work at all.
You can of course ask the customer to install 9.0, but then they may not be able to build their other programs.
Or they could have g++-10.0, and then you can start seeing crashes if/when the std::string changes its ABI and is no longer compatible with your "master" library.
Is there any way to avoid this, other than wrapping every STL container?
Wrapping every STL class and passing it through a C interface definitely solves the "ABI incompatibility" problem, but I don't see how it solves the "I have N+1 copies of STL" problem (and I am not sure the latter problem needs solving in the first place).

Is there any JIT pre-caching support in NodeJS?

I am using a rather large and performance-intensive nodejs program to generate hinting data for CJK fonts (sfdhanautohint), and for some better dependency tracking I had to end up calling the nodejs program tens of thousands of times from a makefile like this.
This immediately raised the concern that doing so puts a lot of overhead into starting and pre-heating the JIT engine, so I decided to look for something like ngen.exe for nodejs. It appears that V8 already has some support for code caching, but is there anything I can do to use it in NodeJS?
Searching for kProduceCodeCache in NodeJS's GitHub repo doesn't return any non-bundled-v8 results. Perhaps it's time for a feature request…
Yes, this happens automatically. Node 5.7.0+ automatically pre-caches (pre-heats the JIT engine for your source) the first time you run your code (since PR #4845, January 2016: https://github.com/nodejs/node/pull/4845).
It's important to note that you can even pre-heat the pre-heat: before your code is ever run on a machine, you can pre-cache it and tell Node to load the cache.
Andres Suarez, a Facebook developer who works on Yarn, Atom and Babel, created v8-compile-cache, a tiny module that will JIT your code and require()s, save your Node cache into your $TMP folder, and then use it if it's found. Check out the source to see how it's done and adapt it to other needs.
You can, if you'd like, have a little check that runs on start and, if the machine architecture is in your set of cache files, just loads the cached files instead of letting Node JIT everything. This can cut your load time in half or more for a real-world large project with tons of requires, and it can do so on the very first run.
Good for speeding up containers and getting them under that 500ms "microservice" boot time.
It's important to note:
Caches are binaries; they contain machine-executable code. They aren't your original JS code.
Node cache binaries are different for each target CPU you intend to run on (IA-32, x64, ARM, etc.). If you want to pre-cache for your users, you must build a cache for each target architecture you want to support.
Enjoy a ridiculous speed boost :)
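The save-then-reuse idea described above can be sketched with Node's public vm API. This is not how v8-compile-cache itself is implemented; the helper name compileWithDiskCache, the cache key, and the file layout are all hypothetical choices made for illustration. The key includes process.version and process.arch so that a cache built by a different Node version or for a different CPU is never picked up.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');
const vm = require('vm');

// Hypothetical helper: compile `source`, reusing a code cache stored in cacheDir.
function compileWithDiskCache(source, filename, cacheDir) {
  // Key the cache on the Node version, the CPU architecture and the source.
  const key = crypto
    .createHash('sha1')
    .update([process.version, process.arch, source].join('\0'))
    .digest('hex');
  const cacheFile = path.join(cacheDir, key + '.cache');

  let cachedData;
  try {
    cachedData = fs.readFileSync(cacheFile);
  } catch {
    // No cache file yet; compile from source below.
  }

  const script = new vm.Script(source, { filename, cachedData });
  const hit = cachedData !== undefined && script.cachedDataRejected === false;
  if (!hit) {
    // First run, or V8 rejected the cache: (re)write it for the next run.
    fs.mkdirSync(cacheDir, { recursive: true });
    fs.writeFileSync(cacheFile, script.createCachedData());
  }
  return { script, hit };
}

// Demo: a throwaway directory under the system temp folder.
const cacheDir = fs.mkdtempSync(path.join(os.tmpdir(), 'code-cache-demo-'));
const source = '[1, 2, 3].map((n) => n * 2).join(",")';
const first = compileWithDiskCache(source, 'demo.js', cacheDir);
const second = compileWithDiskCache(source, 'demo.js', cacheDir);
console.log(first.hit, second.hit);
```

In a real setup the second call would happen in a later process, which is where the saved startup time would show up.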

Why does a Node.js heapdump shows compiled code?

Hi, this is my first time investigating a memory leak in a Node.js application. Reading through a heapdump snapshot in the Chrome Profiler, I see that there is an entry for (compiled code), see attached. I thought JavaScript is not compiled, unlike Java. Can anyone shed some light?
Further, unlike with JProfiler, and given the way the code was written (without a formal constructor), it is very hard to find the leak, and so far the info the snapshot provides is not very useful. I have searched for some time and have not found much useful info on reading these snapshots. Any suggestions?
Thanks!
(compiled code) indeed refers to the code generated by V8's JIT compiler. All JavaScript VMs employed by browsers today use tiered adaptive JIT compilation; it wouldn't be possible to achieve good performance otherwise. In fact, early versions of V8 had no interpreter at all.
That refers to host objects that are implemented in C++, such as the DOM, or the JS built-in functions.
