Full system simulation at Gem5, RISC-V - riscv

I want to run a 64-bit RISC-V binary without OS in gem5’s fs mode. I tried --kernel= but it didn’t stop.

This will be a feature in the upcoming 21.0 release. See this Jira issue for the current status: https://gem5.atlassian.net/browse/GEM5-367.

Related

Executable gives different FPS values ​in Yocto and Raspbian(everything looks the same in terms of configuration)

In Yocto project, built my project which is running on Raspbian OS. When i run executable, i get half FPS compared to executable running on Raspbian OS.
The libraries i use:
OpenCV
Tensorflow-Lite, Flatbuffer, Libedgetpu
I use Libedgetpu1-std, Tensorflow-lite 2.4.0 on Raspbian and Libedgetpu 2.5.0, Tensorflow-lite 2.5.0 on Yocto.
Thinking that the problem is that the versions or configurations of the libraries are not the same, i followed these steps:
I ran the executable which i built in Raspbian directly in the runtime of the Yocto project.(I have set the required library versions to the same library versions available in raspbian for it to work in runtime.)
But i still got low FPS. Here is how i calculate that i get half the FPS:
I am using TFLite's interpreter invoke function. I set a timer when entering and exiting the function, i calculate FPS over it. I can exemplify like this:
Timer_Begin();
m_tf_interpreter->Invoke();
Timer_End();
Somehow i think the Interpreter Invoke function is running slower on the Yocto side. I checked Kernel versions, CPU speeds, /boot/config.txt contents, USB power consumes of Raspbian and Yocto. However, I couldn't catch anything from anywhere.
Note : Using RPI4 and Coral-TPU(Plugged into USB 2.0).
We spoke with #Paulo Neves. He recommend Perf profiling and i did . In the perf profiling, i noticed that the CPU is running slowly. Although the frequencies are the same.
When i check the "scaling_governor", i saw that it was in "powersave" mode. The problem solved when i switched from "powersave" to "performance" mode from virtual kernel.
In addition, if you want to make the governor change permanent, you need to create a kernel config fragment.

using BCM2835 with RT-PREEMPT kernel

I am making a project which send out 40khz signal from antenna.
I have found the signal is not too accurate so I have decided to try a real-time kernel.
I run Raspbian Jessie on my Raspberry-Pi 2B.
After clean install, the script run without any problem.
bcm2835_delayMicroseconds could be run.
I follow this tutorial http://www.frank-durr.de/?p=203 compiled and installed the RT kernal.
However, the script could no longer be run successfully.
After showing "HIGH SLEEP", and it is held up.
This is the code snippet:
fprintf(stdout , "HIGH\n");
bcm2835_gpio_write(PIN, HIGH);
fprintf(stdout , "SLEEP\n");
bcm2835_delayMicroseconds(12);
fprintf(stdout , "LOW\n");
bcm2835_gpio_write(PIN, LOW);
fprintf(stdout , "SLEEP\n");
bcm2835_delayMicroseconds(12);
Do I miss anything when compiling the kernel?
To use PREEMPT_RT you just have to:
retrieve the configuration of your current kernel
retrieve the kernel sources
patch the kernel sources with the PREEMPT_RT patch (or obtain an already patched kernel)
configure the new kernel as the current kernel (i.e., using make oldconfig)
enable full preemtability in kernel config (e.g., by running make menuconfig).
compile the kernel in the standard way
install the new kernel
Therefore, no particular action is needed.
Then, if performance is still not sufficient, you may want to tune priorities of specific IRQ threads.
From your specific error, it seems that the new kernel has been conpiled with a different configuration than the current kernel (e.g., GPIOs not enabled).
I have just seen and remembered this thread.
About half year ago, I want to generate 40khz from a Raspberry.
But finally I found I am using the wrong tool.
I believe Raspberry cannot handle such a task, since it is running an OS.
I switched to Arduino, and the problem is solved immediately without any problem.
Using the right tool for your task is very important!

ARMv8 - Running legacy 32 bit Applications on 64 bit OS

Going thru the ARMv8 manual, I have the following questions to help understand the big picture.
Can legacy 32 bit app. (ARMv7 or earlier) run as is on the ARMv8 OS?
If the legacy applications need to be rebuilt for ARMv8 and assuming that I rebuild the application as 32 bit (Aarch32), does this need 32 bit OS underlying support? (It is interesting to know how the addressing mechanism works here.)
Please provide references wherever possible.
PS: I am targeting Linux OS with Aarch64 support (3.7 and later)
Aarch64 platform may run 32bit ARM but this compatibility is optional.
To run AArch32 binaries you need all libraries application would use in 32bit versions. Same as with i686 binaries on x86-64 systems.
There is also a Linux arm64 CONFIG_COMPAT at: https://github.com/torvalds/linux/blob/v4.17/arch/arm64/Kconfig#L1274 which says:
This option enables support for a 32-bit EL0 running under a 64-bit
kernel at EL1. AArch32-specific components such as system calls,
the user helper functions, VFP support and the ptrace interface are
handled appropriately by the kernel.
which will likely be required, and an ARM employee mentioned on this thread: https://community.arm.com/processors/f/discussions/5535/running-armv7-binaries-on-armv8 that userland instructions are basically the same with some exceptions:
For something like a Linux application, then yes. ARMv8-A includes AArch32, which provides backwards compatibility with ARMv7-A. There are some limitations, such as the SWP instruction no longer being supported. But these are types of things that applications are unlikely to be using (and were deprecated in ARMv7).
For baremetal, you have all the usual problems of using a binary from one platform on another. So you are going to need to do some degree of porting in most cases.
I then tried it for myself with this QEMU full system setup but my attempt failed: I compiled a C hello world with the armv7 compiler as:
arm-linux-gcc -static hello_world.c
and put the built file into the aarch64 target, but when I tried to run it it failed with:
a.out: line 1: syntax error: unexpected word (expecting ")")
even though /proc/config.gz says that CONFIG_COMPAT is set.
It seems that the Linux kernel is not identifying it as an ELF file but rather falling back to /bin/sh, I get the same error if I do:
sh /mnt/9p/a.out
is trying to use the shell binfmt instead of ELF.
In particular, I know that the Linux kernel can choose between archs from the binfmt signature because qemu-user does so: https://unix.stackexchange.com/questions/41889/how-can-i-chroot-into-a-filesystem-with-a-different-architechture

CUDA device seems to be blocked

I'm running a small CUDA application: the QuickSort benchmark algorithm (see here). I have a dual system with a NVIDIA 660GTX (device 0) and 8600GTS (device 1).
Under Windows 8 and Visual Studio, the application compiles and runs flawlessly on device 0. Under Linux (Ubuntu 12.04 LTS), the app compiles with nvcc and gcc but suddenly stops in its tracks, returning a (unspecified launch failure).
I have two issues:
After this error, my GPU cannot perform some other operations, e.g., running the SDK example bandwidhtTest blocks when it performs the first data transfer, but running deviceQuery continues to perform well. How can I reset my GPU? I've already tried the cudaDeviceReset() method but it doesn't help
How can I find what is going wrong under linux? Has someone a clue or seen this before?
Thanks in advance for your help!
Using the nvidia-smi utility you can reset the GPU if it is compatible
To my knowledge and experience, (unspecified launch failure) usually referees to segmentation fault. Have you specified the right GPU to use? Try to use cuda-memcheck to see if there is any memory out of bound scenario.
From my experience XID 31 was always caused by accessing bad pointer (aka Memory access violation).
I'd first pursue this trail. Run your application with cuda memcheck. Like that cuda-memcheck you_app args to your app and see if it finds any wrong memory accesses.
Also try stepping though the code with cuda-gdb or Nsight Eclipse Edition.
I've found that using
cuda-memcheck -b ...
prevents the device from locking up.

Is QEMU good for learning programming in assembler for ARM and PowerPC?

I want to learn programming in assembler for PowerPC and ARM, but I'm unable to buy real hardware for this purpose. I'm thinking about using QEMU for that. However I'm not sure if it emulates both architectures enough well, that I'll compile and run my programs in native assembler on it?
QEMU works well for testing program correction (i.e. whether the code would properly run on an actual ARM or PowerPC) but it is not good for testing program efficiency: the emulation is not cycle accurate, and speed measured with QEMU cannot be reliably (or even unreliably) correlated with speed on true hardware.
Also, QEMU will not trap unaligned memory accesses, which is not a problem for PowerPC emulation (the PowerPC tolerates unaligned accesses) but may be for ARM (an unaligned access, e.g. reading a 32-bit word in RAM from an address which is not a multiple of 4, will work fine with QEMU but would trigger an exception on a true ARM processor).
Apart from these points, QEMU is fine for assembly development on ARM or MIPS (haven't tried PowerPC, because I found an old iBook on eBay for that; but I have done ARM and MIPS assembly with QEMU and then ran the resulting code on true hardware, and this worked). You can either emulate a whole system and run Debian in it (in which case the compiler, linker, text editor... will also run in emulation), or use the "user-mode emulation" where the ARM/MIPS executable is run directly, with a wrapper which converts system calls into those for the host PC (this assumes that the host is a PC running Linux). The latter is more convenient (you have access to your normal home directory, programming tools are native...) but requires installing cross-development tools. See buildroot for that (and link with -static, this will avoid many headaches).
Since I have found signs that Debian for PowerPC and for ARM can run on QEMU, I suppose this won't be a problem.

Resources