Why is Node.js's bytecode output so large? - node.js

Running node --print-bytecode on a file containing code as minimal as just void 0; produces about 2.3 MB of output. Looking through it, I thought it included assembly code generated by JIT compilation, but that doesn't appear to be the case. Still, that doesn't explain why it's so large. Compare it to the output of javap, which, even though it's for a different language, is much smaller and much more readable.

I thought it was including assembly code generated by JIT compilation, but that doesn't appear to be the case.
Correct, bytecode is not assembly code. Assembly code will be generated later, for functions that run hot enough that V8 decides optimizing them is likely worth the time investment.
Why is Node.js's bytecode output so large? [...] on a file containing code as minimal as just void 0;
A lot of Node itself is written in JavaScript, and all of that gets compiled to bytecode before being executed. So even if you run Node on an empty file, you'll see quite a bit of bytecode.
What is it even saying though?
It's saying the same as the JavaScript code it was generated from. What exactly the individual bytecode instructions do and how to read them is an internal implementation detail of V8 and can change at any time. As a user of Node (or V8), you're not supposed to have a reason to care; and in particular you shouldn't assume that the bytecode format is stable over time/versions. Printing it at all is meant for debugging engine-internal issues.
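If you just want to see the bytecode for your own code rather than for all of Node's built-in JavaScript, V8 has a filter flag that Node passes through (this is an internal V8 flag, so its name and behavior may change between versions):

```shell
# Dump bytecode only for functions whose name matches the filter,
# instead of for every internal Node.js function as well.
node --print-bytecode --print-bytecode-filter=add \
  -e 'function add(a, b) { return a + b; } add(1, 2);'
```

This cuts the dump from megabytes down to a screenful, which makes it practical to inspect what V8 generates for a single function.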

Related

Executing bytecode without source in v8 engine

I tried to use this source https://github.com/JoseExposito/v8-compiler/blob/master/addon/v8-compiler.cpp in my project, but a function compilation_cache() called from runScript always returns NULL. What could be wrong? What do I need to fix in the source to make it work with a later version of V8?
In addition, why is there no FLAG_serialize_toplevel flag in the latest version of V8?
Two parts here:
First, a warning: you should definitely not be writing code using any part of the i (internal) namespace. The APIs in the internal namespace don't always reflect how JS actually runs (due to optimizations), and they change often and without warning.
In the case of the code you copied from V8 internals: V8 already provides a public API to produce and consume "cached data", which is just the bytecode in serialized form. Here's an example from the Node.js source of how to produce a code cache.
Secondly, your actual question: V8 always validates bytecode against a special hash (essentially the V8 version plus the source text length), and if it doesn't match, it won't use the bytecode cache. This is why changing V8 versions causes the bytecode to be rejected.

I-7188ex's weird behaviour

I have a quite complex I/O program (written by someone else) for the ICPDAS i-7188ex controller, and I am writing a library (.lib) for it that does some calculations based on data from that program.
The problem is, if I import a function containing only one line, printf("123"), and embed it inside the I/O program, the program crashes at some point. Without the imported function the I/O works fine, and the same goes for the imported function without the I/O.
Maybe it is a memory issue, but why should considerable memory be allocated for a function which only outputs a string? Or am I completely wrong?
I am using Borland C++ 3.1. And yes, I can't use anything newer, since the controller only supports the 80186 instruction set.
If your code is complex, the compiler can sometimes get stuck and compile it wrongly, messing things up with unpredictable behavior. This has happened to me many times as code grows. In such cases, swapping a few lines of code (if you can without breaking functionality), or even adding a few empty or comment lines, sometimes helps. The problem is finding the place where it goes wrong. You can also divide your program into several files, compile each separately to an .obj, and then just link them into the final file.
The error description reminds me of one I fought with for a long time. If you are using class/struct/template, try this:
bds 2006 C hidden memory manager conflicts
Maybe it will help (I did not test this with old Turbo compilers).
What do you mean by "embed into I/O"? Are you creating a SYS driver file? If that is the case, you need to make sure you are not messing with CPU registers; that could cause a lot of problems. Try something like:
void some_function_or_whatever()
{
    asm { pusha };
    // your code here
    printf("123");
    asm { popa };
}
If you are writing ISR handlers, then you need to use the interrupt keyword so the compiler returns from them properly.
Without actual code and/or an MCVE it is hard to point out any specifics...
If you can port this to BDS2006 or a newer version (just for debugging, not as a real target), it will analyze your code more carefully and can detect a lot of hidden errors (I was surprised when I ported from the BCB series to BDS2006). There is also the CodeGuard option in the compiler, which is ideal for finding such errors at runtime (but I fear you will not be able to run your lib without the I/O hardware present in emulated DOS).

Is there any JIT pre-caching support in NodeJS?

I am using a rather large and performance-intensive nodejs program to generate hinting data for CJK fonts (sfdhanautohint), and for some better dependency tracking I had to end up calling the nodejs program tens of thousands of times from a makefile like this.
This immediately raised the concern that doing so puts a lot of overhead into starting and pre-heating the JIT engine on every invocation, so I decided to look for something like ngen.exe for Node.js. It appears that V8 already has some support for code caching, but is there anything I can do to use it in NodeJS?
Searching for kProduceCodeCache in NodeJS's GitHub repo doesn't return any non-bundled-v8 results. Perhaps it's time for a feature request…
Yes, this happens automatically. Node 5.7.0+ automatically pre-caches (pre-heats the JIT engine for your source) the first time you run your code (since PR #4845 / January 2016 here: https://github.com/nodejs/node/pull/4845).
It's important to note you can even pre-heat the pre-heat (before your code is ever even run on a machine, you can pre-cache your code and tell Node to load it).
Andres Suarez, a Facebook developer who works on Yarn, Atom and Babel, created v8-compile-cache, a tiny module that will JIT your code and its require()s, save the Node cache into your $TMP folder, and then use it if it's found. Check out the source for how it's done, to adapt it to other needs.
You can, if you'd like, run a little check on start, and if the machine architecture is in your set of cache files, just load the cached files instead of letting Node JIT everything. This can cut your load time in half or more for a real-world large project with tons of requires, and it can do so on the very first run.
Good for speeding up containers and getting them under that 500ms "microservice" boot time.
It's important to note:
Caches are binaries; they contain machine-executable code. They aren't your original JS code.
Node cache binaries are different for each target CPU you intend to run on (IA-32, x64, ARM, etc.). If you want to ship pre-built caches to your users, you must produce a cache for each target architecture you want to support.
Enjoy a ridiculous speed boost :)

Allocating a data page in linux with NX bit turned off

I would like to generate some machine code in my program and then run it. One way would be to write out a .so file and then load it into the program, but that seems too expensive.
Is there a way in Linux for me to write the code into my data pages, set a function pointer there, and just call it? I've seen something similar on Windows, where you can allocate a page with the NX protection turned off for that page, but I can't find a similar OS call for Linux.
The mmap(2) (with munmap(2)) and mprotect(2) syscalls are the elementary operations for doing that. Recall that syscalls are elementary operations from the point of view of an application. You want PROT_EXEC.
You could strace any dynamically linked executable to get a clue about how to call them, since the dynamic linker ld.so uses them.
Generating a shared object might be less expensive than you imagine. Actually, generating C code, running the compiler, then dlopen-ing the resulting shared object makes sense even when working interactively. My MELT domain-specific language (to extend GCC) does exactly this. Recall that you can do a great many dlopen-s without issues.
If you want to generate machine code in memory, you could use GNU lightning (quickly generates slow machine code), libjit from DotGNU (generates less-bad machine code), LuaJIT, asmjit (x86- or amd64-specific), or LLVM (slowly generates optimized machine code). BTW, the SBCL Common Lisp implementation compiles dynamically to memory and produces good machine code at runtime (and all the JITs for JVMs do that as well).

Typecheck generated code that access dynamically loaded code in Haskell

I need a fast way (1000's of typechecks per second) to typecheck generated Haskell source code.
We tried hint, which was fast enough except that it cannot access dynamically loaded code unless the source code is available, which we would not have in some cases. Maybe there is some way to register dynamically loaded code as a package or something, since hint seems to be able to access registered packages?
We tried using the GHC API, but it appears to require the files to be on disk, and all the file I/O required makes it too slow.
We can use haskell-src-exts to generate the code, but we need to typecheck it.
Thousands of type checks per second doesn't seem feasible sequentially -- you're doing these concurrently, with some hope for parallelism, I hope?
And I assume you are supporting the full GHC type system? So a stripped-down type checker (e.g. THIH) won't suffice.
Use the GHC API, with bytecode and no optimizations
Cache everything in memory
Submit modifications to GHC to ensure it can take FDs from memory buffers, if necessary
