Debugging (possibly) OpenCV related crash on Jetson TK1 - linux

What I am looking for: I need help debugging consistently happening system crashes on my Jetson TK1.
System: I am using a Jetson TK1 board from NVIDIA. Updated to 21.3.4 Grinch Kernel. All drivers installed, libopencv4tegra installed alongside ROS (using hacked deb packages to not overwrite openCV). Everything used to work perfectly in this exact setup.
When the crashes happen: I am running a VSLAM program, which uses a camera connected on the USB port. The program makes heavy use of OpenCV. The program used to run for over 1 month without problems in the current setup. Now, I am getting consistent system crashes which result in a total system freeze. When I am connected over ssh, I lose the connection. When I connect a monitor to see what happens on the system while it crashes, I can see everything freeze. The USB port also seems to turn off, since not even a USB mouse and keyboard work anymore post-crash. The Jetson stays on though.
Crash Logs: I have tried looking into the /var/log/ logs, but none of them show any messages for when the crash happens.
I have run memtester before. It didn't return any bad memory. While running and crashing, the memory onboard is used at about 60-75% (as shown by "top"). CPU usage is around 60%.
The weird thing is that this exact setup has been running just like this for over a month now.
I need to know: are there any other logs I could find information about the crash in? How could I find out if this is related to a hardware failure or whether there's a software issue?
Thanks
-Marc

Related

Radeon developer panel not detecting running program

I have a vulkan application I want to profile (to find the bottlenecks on the gpu for optimizations). I am on linux and amd hardware so I downloaded the linux version of the radeon developer tools. I ran it and created a local server and that seems to work.
I then launched my program, but it does not appear on the list of profiling candidates in the panel.
As you can see the connection is fine (green dot), but no applications are detected. I have tried with advanced mode as well but no luck.
I know for a fact the program is running as I can see it and use it, recompile it... Has anyone run into this problem before?

How do I turn off the console on an embedded system built with Yocto?

I am running Linux kernel 4.14.149 built by Yocto Zeus, and I am running 2019.07 U-boot. At the recommendation of our security team, I am trying to get rid of the Linux console. I am not worried about debugging (once I get this to work anyways); we have other ways of getting the system logs out of the machine, and this will not be done on software development boards. That mechanism is already in place and is tested working. We have an i.MX6 as our core (this is an embedded system), and we have dedicated UART5 to our console on dev boards.
I have tried a few different methods to do this. The first was to disable the framebuffer console kernel config (CONFIG_FRAMEBUFFER_CONSOLE). The primary issue with this approach is that it disabled the splash screen. We have a splash screen that is put up in U-boot (and it is displayed again by Linux), but Linux appears to reset the framebuffer or something when it is booting, resulting in the display flickering and being blank for a bit before our applications start, which was unacceptable (and is the reason we put the splash screen up in both U-boot and Linux).
I also tried just setting "console=" on our command line. This is close to what we want to achieve in that the console doesn't come out the UART anymore, but we see it start to appear on the display on top of the splash screen. I haven't found any way to fix that (I can upload a screenshot if desired).
Just eliminating the console parameter entirely didn't appear to work; the console still came out the UART. This is to be expected based on the serial console documentation, which says it just uses the first available device.
I have tried commenting out the console initialization in main.c in the Linux source, which exploded rather quickly.
I tried setting it to be a netconsole (see Where do you send the kernel console on an embedded system?) but the splash screen still got overwritten, same as when setting it to nothing.
The last thing I have tried was just setting it to a bogus device ("console=ttymxc9" on the Linux command line). While this appears to work (there is no data on the display or the UART) it appears to stall (crash?) partway through bootup and without being able to get the logs (it stalls before our application service runs). I say stall because we have Linux configured for a heartbeat and we do still get proper LED heartbeat behavior. None of the systemd services I added to our build however appear to run (I added one to save the journalctl log file after boot to a file on an external SD card for debugging purposes until I get this working)
At this point, I have run out of ideas on how to get rid of the console while keeping the splash screen intact. What is the proper way to disable the Linux console?
For kernel versions 5.11 and newer:
In the submenu "Character devices" under "Device Drivers" from make menuconfig, there is an option called "Null TTY driver" (CONFIG_NULL_TTY) that you can enable and add console=ttynull to the kernel boot cmdline so that all console output will be simply discarded.
You can also disable CONFIG_VT and CONFIG_UNIX98_PTYS, since you don't need to interact with your program via console at all.
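One way to confirm on the target that the setting took effect is to inspect the kernel command line. The following is a minimal sketch, not part of the original answer: the helper names `console_args` and `only_null_console` are hypothetical, and it only checks the `console=` parameters in a command-line string, not the kernel configuration itself.

```python
from __future__ import annotations


def console_args(cmdline: str) -> list[str]:
    """Return the device name of every console= parameter, in order.

    Options after a comma (for example a baud rate) are dropped.
    """
    devices = []
    for token in cmdline.split():
        if token.startswith("console="):
            value = token.split("=", 1)[1]
            devices.append(value.split(",", 1)[0])
    return devices


def only_null_console(cmdline: str) -> bool:
    """True if at least one console= is given and all of them are ttynull."""
    devices = console_args(cmdline)
    return bool(devices) and all(dev == "ttynull" for dev in devices)
```

On a running system, the string to pass in would be the contents of /proc/cmdline.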
For older kernels (like my 4.14):
You can add this support with the diffs at: https://lore.kernel.org/lkml/20190403131213.GA4246@kroah.com/T/ and then follow the instructions above.
More recent versions of Yocto use systemd and a unit called getty.target to load the serial port console. Disable it by running the following command (once):
systemctl mask getty.target
This answer may not fully fit your question, however, it could serve as a research source for other users, just like me. I use the commands below to temporarily turn the console (ttyS0) on and off.
systemctl stop serial-getty@ttyS0.service
and
systemctl start serial-getty@ttyS0.service
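If the port name varies between boards, the unit name can be built from it, since systemd instantiates the per-port getty from the serial-getty@.service template. This is only a sketch under that assumption; the helper names `serial_getty_unit` and `systemctl_command` are hypothetical, and the functions only build the command, they do not run it.

```python
from __future__ import annotations

ALLOWED_ACTIONS = {"start", "stop", "mask", "unmask"}


def serial_getty_unit(tty: str) -> str:
    """Build the instance name of the serial-getty template for one port."""
    return f"serial-getty@{tty}.service"


def systemctl_command(action: str, tty: str) -> list[str]:
    """Return the argument list for a systemctl call on that unit."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["systemctl", action, serial_getty_unit(tty)]
```

The returned list can be handed to subprocess.run on the target, which needs root privileges for these actions.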

Jetson TK1 booting issues

I received my Jetson TK1 yesterday. After unboxing it and configuring the Linux GUI, rebooting the device with a (cordless) mouse attached to its USB 3.0 port takes it to some sort of command-line screen where it probably loads some files, and then the screen starts printing " [ . ] ". Nothing happens beyond that until I restart the board without any USB peripheral, at which point the device boots into the normal Linux GUI. I am unable to figure out what's wrong with my board and why it is not working properly. (I am a newbie to Linux.)
P.S.: Connecting the monitor via HDMI after switching on the device gives no visual output, just a blank screen. Is it possible to connect to the device over the network for remote access even if the screen is blank?
The question is quite old, but as some people might get frustrated with it, I'll provide the answer for most probable cause.
Upgrading a board running the 19.X release causes libglx.so to be corrupted. The issue has been actively discussed on the NVIDIA forums, and the best way to solve it is to upgrade to 21.X.
Otherwise, you can try recovering the libglx.so in the usr/lib/xorg/modules/extensions/ from Tegra124_Linux_R19.3.0_armhf.tbz2.
Could you possibly provide a bit more information about your situation?
Are you able to go to command mode by pressing 'CTRL+F1' or 'CTRL+ALT+F1'?
If that works, it means your Jetson operating system is working but only the GUI is not working properly.
Yes, you can ssh into your Jetson (that is what I do) if only the GUI is broken and the OS is working properly. Note that in order to do so you need to know the IP address of your Jetson, and you may need to configure your router.
Note: Sometimes, if you have a USB device connected to the Jetson, it might mistakenly assume the device is a storage device and try to boot from it. This fails since it cannot load any OS from that device. (I'm not sure if this is the case for you.)

Problems installation of camera drivers - linux

I want to install the drivers of the video camera on my linux computer.
I write the command:
modprobe usbserial vendor=... product=...
What I expected to get was ttyUSB0 (or something similar) in the /dev directory.
Instead, what gets created is sg3 (whatever that is), and when I run a program that is supposed to send a command to start recording I get no results (but no errors either).
(I changed what I had previously, fd = open(/dev/ttyUSB0,...), to use /dev/sg3, but I guess this is not a configuration that enables sending this kind of data.)
What might be the problem? (Sorry if it's a basic question)
Cameras and Linux can be tricky.
Start by plugging in the camera and running
lsusb
Google for the ID to see if anyone has a step-by-step tutorial or, at the very least, can tell you which modules are needed.
Most common drivers seem to have been migrated into the kernel, so rebuild your kernel and make sure the modules are built.
Some more obscure usb modules have to be built by hand.
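To pull the vendor and product IDs out of the lsusb output for searching, something like the following sketch may help. It assumes the usual one-line-per-device lsusb format; the function name `parse_lsusb` and the sample line in the usage note are made up for illustration.

```python
from __future__ import annotations

import re

# Matches lines such as: "Bus 001 Device 005: ID 1234:abcd Some Device Name"
LSUSB_LINE = re.compile(
    r"^Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$"
)


def parse_lsusb(output: str) -> list[dict]:
    """Turn lsusb text output into a list of device dictionaries."""
    devices = []
    for line in output.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a device line
        bus, device, vendor, product, name = match.groups()
        devices.append(
            {
                "bus": int(bus),
                "device": int(device),
                "vendor": vendor.lower(),
                "product": product.lower(),
                "name": name,
            }
        )
    return devices
```

For a hypothetical line "Bus 001 Device 005: ID 1234:abcd Example Camera", the search term would be the vendor and product pair joined by a colon.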

Linux process goes in D-state when I connect serial device via usb hub

I have a serial GPS connected to an embedded PC via a serial<->USB adapter (Prolific PL2303). Every 5 minutes a shell script runs a Python script that reads GPS data via pyserial and then uploads it to the Internet. If I plug my GPS directly into the PC (via the PL2303), everything is OK and my system runs forever, but if I put a USB hub between the PL2303 and the PC I have this problem: the Python script runs fine for about 3-6 hours, then it goes into D state (uninterruptible sleep) and the shell script cannot run it again (I can only shut down the system; kill is not possible). I checked my script and tried USB hubs from various vendors (powered and unpowered) with the same result.
PS my embedded pc (from Embeddedarm) runs an updated Debian Lenny.
How can I fix it?
A process in D state means the kernel (most probably a device driver), has put your process into uninterruptible sleep.
To be honest, there is probably quite little you can do about it as a user, unless you intend to debug the kernel USB stack and/or specific USB chipset device driver.
Here is what I would do:
Make sure the kernel configuration of your embedded device has the kernel config option for the magic SysRq key and the run-time configuration for it turned on. See: http://en.wikipedia.org/wiki/Magic_SysRq_key on how to do that.
Recreate the problem (have the process get stuck in D state).
Find out the PID of the stuck python script with ps and run strace -p PID on it. This will give you the specific system call that the process is sleeping in.
Send the magic SysRq key command 't', which lists all tasks and their kernel stacks to the console. Look for the task of the Python script by PID and see exactly where in the kernel code it is stuck.
Open the kernel code and try to debug the problem if you can, or report it to the relevant mailing list if you can't.
One more suggestion would be to see if the problem goes away in a more recent kernel version than the one Debian ships. If so, you know it is a bug fixed in a newer version of the kernel, and you have the choice of either using the newer version or trying to port the fix to the old version you are using.
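To spot the stuck process without eyeballing ps output, the state field of /proc/<pid>/stat can be read directly. The sketch below is an illustration only; the function names are hypothetical, and it relies on the documented layout of that file, where the state letter follows the parenthesised command name.

```python
from __future__ import annotations

import os


def parse_stat(stat_line: str) -> tuple[int, str, str]:
    """Return (pid, command name, state letter) from one stat line.

    The command name may itself contain spaces or parentheses, so it is
    taken as everything between the first "(" and the last ")".
    """
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, command name) for every task in uninterruptible sleep."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # the process exited or the line was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

A cron job calling find_d_state could log the moment the script gets stuck, which narrows down where to look in the kernel log.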
Good luck! you'll need it...
Ubuntu launchpad has a bug filed that is suspiciously like yours. The workaround suggested is:
modprobe -r pl2303
modprobe pl2303
See if this works around the bug?

Resources