what's the different between the hotspot jvm interpreter and jit - jvm-hotspot

what's the different between the hotspot jvm interpreter and jit?I had got confused from the opinion from the book i had read, the interpreter execute the code line by line, does it mean that the interpreter will translate the bytecode to the machine code and then execute them?

Compiler and Interpreter do the same job of converting High lavel code to machine understandable code
interpreter takes the single instruction of the code, translates it into an intermediate code and then into a machine code, executes it and takes another instruction and continue for all the instructions
JIT compiler does optimization on high lavel code and converts entire code into machine understandable code through scanning all the code.
You can find the more details here - http://www.whizlabs.com/blog/what-is-just-in-time-compiler-difference-between-compiler-and-interpreter/

Related

How to run multi-thread programs in rocket-chip?

I am trying to run multi-thread programs/benchmark in a rocket-chip SoC I generated from chipyard.
I generated the TutorialConfig SoC given in https://fires.im/isca22-slides-pdf/03_building_custom_socs.pdf, which consists of a Rocket core and a BOOM core.
To check whether I can run a multi-threaded program in this configuration, I compiled the mt-matmul benchmark in riscv-tests after changing the number of cores in crt.S file.
I ran it using the following command:
make CONFIG=TutorialStarterConfig run-binary BINARY=riscv-tests/benchmarks/mt-matmul.riscv
In the output trace, I can only see 'C0' at the beginning of each line, I am assuming I should see 'C1' if the program was executed in a second core.
Is this the correct way to run multi-threaded programs with RocketChip SoCs?
Do I need to change anything else in the programs or in the SoC?

Record dynamic instruction trace or histogram in QEMU?

I've written and compiled a RISC-V Linux application.
I want to dump all the instructions that get executed at run-time (which cannot be achieved by static analysis).
Is it possible to get a dynamic assembly instruction execution historgram from QEMU (or other tools)?
For instruction tracing, I go with -singlestep -d nochain,cpu, combined with some awk. This can become painfully slow and large depending on the code you run.
Regarding the statistics you'd like to obtain, delegate it to R/numpy/pandas/whatever after extracting the program counter.
The presentation or video of user "yvr18" on that topic, might cover some aspects of QEMU tracing at various levels (as well as some interesting heatmap visualization).
QEMU doesn't currently support that sort of trace of all instructions executed.
The closest we have today is that there are various bits of debug logging under the -d switch, and you can combine the tracing of "instructions translated from guest to native" with the "blocks of translated code executed" translation to work out what was executed, but this is pretty awkward.
Alternatively you could try scripting the gdbstub interface to do something like "disassemble instruction at PC; singlestep" which will (slowly!) give you all the instructions executed.
Note: There ongoing work to improve QEMU's ability to introspect guest execution so that you can write a simple 'plugin' with functions that are called back on events like guest instruction execution; with that it would be fairly easy to write a dump of guest instructions executed (or do more interesting processing), but this is still work-in-progress, so not available yet.
It seems you can do something similar with rv8 (https://github.com/rv8-io/rv8), using the command:
rv-jit -l
The "spike" RISC-V emulator allows tracing instructions executed, new values stored into registers, or just simply a histogram of PC values (from which you can extract what instruction was at each PC location).
It's not as fast as qemu, but runs at 100 to 200 MIPS on current x86 hardware (at least without tracing enabled)

How does Spark do bytecode to machine code instructions run time conversion?

After reading some articles about Whole State Code Generation, spark does bytecode optimizations to convert a query plan to an optimized execution plan.
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html
Now my next question is but still after doing these optimizations related to bytecodes and all, it might still be plausible that conversion of those bytecode instructions to machine code instructions could be a possible bottleneck because this is done by JIT alone during the runtime of the process and for this optimization to take place JIT has to have enough runs.
So does spark do anything related to dynamic/runtime conversion of optimized bytecode ( which is an outcome of whole stage code gen) to machine code or does it rely on JIT to convert those byte code instructions to machine code instructions. Because if it relies on JIT then there are certain uncertainties involved.
spark does bytecode optimizations to convert a query plan to an optimized execution plan.
Spark SQL does not do bytecode optimizations.
Spark SQL simply uses CollapseCodegenStages physical preparation rule and eventually converts a query plan into a single-method Java source code (that Janino compiles and generates the bytecode).
So does spark do anything related to dynamic/runtime conversion of optimized bytecode
No.
Speaking of JIT, WholeStageCodegenExec does this check whether the whole-stage codegen generates "too long generated codes" or not that could be above spark.sql.codegen.hugeMethodLimit Spark SQL internal property (that is 8000 by default and is the value of HugeMethodLimit in the OpenJDK JVM settings).
The maximum bytecode size of a single compiled Java function generated by whole-stage codegen. When the compiled function exceeds this threshold, the whole-stage codegen is deactivated for this subtree of the current query plan. The default value is 8000 and this is a limit in the OpenJDK JVM implementation.
There are not that many physical operators that support CodegenSupport so reviewing their doConsume and doProduce methods should reveal whether if at all JIT might not kick in.

Allocating a data page in linux with NX bit turned off

I would like to generate some machine code in my program and then run it. One way to do it would be to write out a .so file and then load it in the program but that seems too expensive.
IS there a way in linux for me to write out the code in my data pages and then set my function ointer there and just call it? I've seen something similar on windows where you can allocate a page with the NX protection turned off for that page, but I can't find a similar OS call for linux.
The mmap(2) (with munmap(2)) and mprotect(2) syscalls are the elementary operations to do that. Recall that syscalls are elementary operations from the point of view of an application. You want PROT_EXEC
You could just strace any dynamically linked executable to get a clue about how you might call them, since the dynamic linker ld.so is using them.
Generating a shared object might be less expensive than you imagine. Actually, generating C code, running the compiler, then dlopen-ing the resulting shared object has some sense, even when you work interactively. My MELT domain specific language (to extend GCC) is doing this. Recall that you can do a big lot of dlopen-s without issues.
If you want to generate machine code in memory, you could use GNU lightning (quick generation of slow machine code), libjit from dotgnu (generate less bad machine code), LuaJit, asmjit (x86 or amd64 specific), LLVM (slowly generate optimized machine code). BTW, the SBCL Common Lisp implementation is dynamically compiling to memory and produces good machine code at runtime (and there is also all the JIT for JVMs doing that).

Address space identifiers using qemu for i386 linux kernel

Friends, I am working on an in-house architectural simulator which is used to simulate the timing-effect of a code running on different architectural parameters like core, memory hierarchy and interconnects.
I am working on a module takes the actual trace of a running program from an emulator like "PinTool" and "qemu-linux-user" and feed this trace to the simulator.
Till now my approach was like this :
1) take objdump of a binary executable and parse this information.
2) Now the emulator has to just feed me an instruction-pointer and other info like load-address/store-address.
Such approaches work only if the program content is known.
But now I have been trying to take traces of an executable running on top of a standard linux-kernel. The problem now is that the base kernel image does not contain the code for LKM(Loadable Kernel Modules). Also the daemons are not known when starting a kernel.
So, my approach to this solution is :
1) use qemu to emulate a machine.
2) When an instruction is encountered for the first time, I will parse it and save this info. for later.
3) create a helper function which sends the ip, load/store address when an instruction is executed.
i am stuck in step2. how do i differentiate between different processes from qemu which is just an emulator and does not know anything about the guest OS ??
I can modify the scheduler of the guest OS but I am really not able to figure out the way forward.
Sorry if the question is very lengthy. I know I could have abstracted some part but felt that some part of it gives an explanation of the context of the problem.
In the first case, using qemu-linux-user to perform user mode emulation of a single program, the task is quite easy because the memory is linear and there is no virtual memory involved in the emulator. The second case of whole system emulation is a lot more complex, because you basically have to parse the addresses out of the kernel structures.
If you can get the virtual addresses directly out of QEmu, your job is a bit easier; then you just need to identify the process and everything else functions just like in the single-process case. You might be able to get the PID by faking a system call to get_pid().
Otherwise, this all seems quite a bit similar to debugging a system from a physical memory dump. There are some tools for this task. They are probably too slow to run for every instruction, though, but you can look for hints there.

Resources