Systemd service interrupting CAN bus process - linux

I've made a program that communicates with hardware over a CAN bus. When I start the program from the command line, everything seems to run fine (I'll quantify this in a second), but starting the process via a systemd service leads to paused traffic.
I then created a systemd service, shown below, to autostart the process on system power-up.
By plotting log timestamps, we noticed that there are periodic pauses in the CAN traffic, anywhere from 250 ms to a few seconds long, roughly every 5 minutes (not at a regular rate), within a 30 minute window. If we switch back to starting the program from the command line, we might get one 100 ms drop over a 3 hour period, which is essentially a non-issue.
Technically we can tolerate pauses like this in the traffic, but the issue is that we don't understand why the behavior differs between running under systemd and starting the program manually from the command line.
Does anyone have an inkling what's going on here?
Other notes:
- We don't use any environment variables or command-line parameters (settings are read from a config file).
- We've watched the CAN traffic with nothing running and saw no drops, so we're fairly confident it's not our hardware or the SocketCAN driver.
- We've tried starting via the service on an Arch laptop and didn't see this pausing behavior.
[Unit]
Description=Simple service to start CAN C2 process
[Service]
Type=simple
User=dzyne
WorkingDirectory=/home/thisguy/canProg/build/bin
ExecStart=/home/thisguy/canProg/build/bin/piccolo
Restart=on-failure
# or always, on-abort, etc
RestartSec=5
[Install]
WantedBy=multi-user.target
I'd expect no pauses between messages larger than about 20-100 ms (our tolerance) when run via the systemd service.

Related

Systemd get reason for watchdog timeout

I have to debug an application that always gets killed via SIGABRT signal due to some mysterious watchdog timeout in systemd after exactly 3 minutes. Is there any logging etc. that helps me find out which of the many systemd parameters triggers the abort?
The application needs to send watchdog notifications to systemd. There are several ways of doing this.
The watchdog interval is set in the systemd service file, and the line looks like
WatchdogSec=4s
3 minutes seems like a long time, so it looks like the app is not feeding the watchdog.
See https://www.freedesktop.org/software/systemd/man/sd_notify.html for documentation on how to feed the watchdog.
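For illustration, here is a rough Python sketch of feeding the watchdog by writing to the notify socket that systemd passes in NOTIFY_SOCKET (this is what sd_notify() does in C); the work loop is just a placeholder:
import os
import socket
import time

def sd_notify(state: bytes) -> None:
    # Send a notification datagram to the socket systemd passes in NOTIFY_SOCKET.
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return  # not started by systemd, or no notify/watchdog support configured
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.send(state)

# systemd sets WATCHDOG_USEC when WatchdogSec= is configured for the unit.
watchdog_usec = int(os.environ.get("WATCHDOG_USEC", "0"))
sd_notify(b"READY=1")  # only meaningful for Type=notify, harmless otherwise

while True:
    time.sleep(1)  # placeholder for one iteration of real work
    if watchdog_usec:
        sd_notify(b"WATCHDOG=1")  # must arrive well within each WatchdogSec window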

Running a cron every 1 day and 10 seconds?

Is it possible to run a cron every 86410 seconds or simply every 1 day and 10 seconds?
I have a service which takes 24 hours to process the data from the moment it is called. Now, I need to make sure that I am giving the service enough time to process the data, so instead of calling the service every 24 hours, I need to call it every 24 hours and a few seconds.
Is it possible using a cron?
I think it is interesting to note that if you do restart the processing every 86410 seconds, your service start times will drift over the days, later and later each time - so if you originally scheduled your process to start at 08:00, after about a year it would be starting at 09:00, and after about 23.6 years it would go all the way around the clock and start at 08:00 again.
Cron was definitely not designed for that kind of thing :-)
But if you are running on a recent Linux OS, you can use systemd timer units to do exactly that. You may be familiar with systemd service units, as this is how you write services for modern Linux systems, but systemd can do a lot more, and one of those things is running jobs on schedules that cron cannot express.
Supposing you run your processing job as a systemd service, it may look something like this:
/etc/systemd/system/data-processing.service
[Unit]
Description=Process some data
[Service]
# Type=simple is the default, but I thought I'd be explicit
Type=simple
ExecStart=/usr/bin/my-data-processor
You can then set up a timer unit to launch this service every 86410 seconds very simply - create a timer unit file in /etc/systemd/system/data-processing.timer with this content:
[Unit]
Description=start processing every day and 10 seconds
[Timer]
# Start immediately after bootup
OnBootSec=0
# Start the next processing 86410 seconds after the last start
OnUnitActiveSec=86410
# Tighten the accuracy from the default of 60 seconds, otherwise
# the service might start up to a minute later than intended
AccuracySec=1
[Install]
WantedBy=timers.target
Then just enable and start the timer unit - but not the service. If the service is enabled, you probably want to disable it as well - the timer will take care of running it as needed.
systemctl daemon-reload
systemctl enable data-processing.timer
systemctl start data-processing.timer
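You can verify that the timer is armed and see when the next run is due with:
systemctl list-timers data-processing.timer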
Looking at it a bit more, you mentioned that you want to start the next run of the service after the previous run has completed. What happens if it doesn't take exactly 86400 seconds to finish processing? If we change the requirement to "restart the data processing service after it finishes running, but give it 10 seconds to cool down first", then you don't need a timer at all - you just need to have systemd restart the service after a 10 second cooldown, whenever it is done.
We can change the service unit above to do exactly that:
[Unit]
Description=Process some data
[Service]
Type=simple
ExecStart=/usr/bin/my-data-processor
Restart=always
RestartSec=10
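After editing the unit file, reload systemd so it picks up the change and restart the service:
systemctl daemon-reload
systemctl restart data-processing.service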

What is the advantage of using supervisord over monit

We have a custom setup with several daemons (web apps + background tasks) running. I am looking for a tool that helps us monitor those daemons and restart them if their resource consumption exceeds a certain level.
I would appreciate any insight on when one is better than the other. As I understand it, monit spins up a new process while supervisord starts a subprocess. What are the pros and cons of each approach?
I will also be using upstart to monitor monit or supervisord itself. The webapp deployment will be done using Capistrano.
Thanks
I haven't used monit but there are some significant flaws with supervisord.
Programs should run in the foreground
This means you can't just execute /etc/init.d/apache2 start. Most of the time you can write a one-liner, e.g. "source /etc/apache2/envvars && exec /usr/sbin/apache2 -DFOREGROUND", but sometimes you need your own wrapper script. The problem with wrapper scripts is that you end up with two processes, a parent and a child. See the next flaw...
supervisord does not manage child processes
If your program starts child processes, supervisord won't detect this. If the parent process dies (or is restarted using supervisorctl), the child processes keep running but are "adopted" by the init process and stay running. This might prevent future invocations of your program from running or consume additional resources. The more recent config options stopasgroup and killasgroup are supposed to fix this, but didn't work for me.
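For reference, here is roughly how the foreground one-liner and the group-kill options fit into a program section (the apache paths just follow the example above):
[program:apache2]
command=/bin/bash -c "source /etc/apache2/envvars && exec /usr/sbin/apache2 -DFOREGROUND"
# kill the whole process group on stop/restart, not just the parent
stopasgroup=true
killasgroup=true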
supervisord has no dependency management - see #122
I recently set up squid with qlproxy. qlproxyd needs to start first, otherwise squid can fail. Even though both programs were managed with supervisord, there was no way to ensure this. I needed to write a start script for squid that made it wait for the qlproxyd process. Adding the start script resulted in the orphaned-process problem described in flaw 2.
supervisord doesn't allow you to control the delay between startretries
Sometimes when a process fails to start (or crashes), it's because it can't get access to another resource, possibly due to a network wobble. Supervisor can be set to restart the process a number of times. Between restarts the process will enter a "BACKOFF" state but there's no documentation or control over the duration of the backoff.
In its defence, supervisor does meet our needs 80% of the time. The configuration is sensible and the documentation is pretty good.
If you additionally want to monitor resources, you should go with monit. In addition to just checking whether a process is running (availability), monit can also perform checks of resource usage (performance, capacity), load levels, and even basic security checks (md5sum of a binary file, config file, etc). It has a rule-based config which is quite easy to comprehend. There are also a lot of ready-to-use configs: http://mmonit.com/wiki/Monit/ConfigurationExamples
Monit requires processes to create PID files, which can be a flaw, because if a process does not create a PID file you have to write some wrapper around it. See http://mmonit.com/wiki/Monit/FAQ#pidfile
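As a rough illustration of monit's rule-based style, a check might look like this (paths and thresholds are made up):
check process myapp with pidfile /var/run/myapp.pid
  start program = "/etc/init.d/myapp start"
  stop program = "/etc/init.d/myapp stop"
  if cpu > 80% for 5 cycles then restart
  if totalmem > 500 MB for 5 cycles then restart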
Supervisord, on the other hand, is more tightly bound to the process, since it spawns it itself. It cannot do resource-based checks the way monit can. It does have a nice CLI, supervisorctl, and a web GUI though.

Executing process on Linux from WSGI based web application

I have a dashboard and I want a process to run when the user clicks on a button. That process might take a long time to complete.
My options so far:
- using popen or something similar to execute the process
- having a daemon monitor a directory: when the directory is changed (a file is created), the daemon does the job and then deletes the file before idling again
- using cron, running every 5 seconds and also monitoring some directory
Which one is more Linux-friendly? Is there any I have not considered?
This is what task queueing systems like Celery and Redis Queue are for.
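As a minimal sketch of what that looks like with Celery (assuming a Redis broker on localhost and a made-up long_running_job task):
# tasks.py
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def long_running_job(arg):
    # ... the slow work that used to block the request goes here ...
    return f"done: {arg}"
The dashboard view then enqueues the job with long_running_job.delay(user_input) and returns immediately, while a separate worker process (started with celery -A tasks worker) picks it up and runs it outside the web server.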
Another option is to have a daemon (as in your 2nd option) that listens on some socket. Your WSGI application could then just connect and send a command. There are many possibilities for how the communication over the socket would take place; choosing the right one depends a lot on the actual case.
This has the advantage that you can eventually run the two applications (the WSGI app and the daemon) on different computers or VMs at some point.
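A minimal Python sketch of that arrangement, assuming a made-up socket path and a single hard-coded command:
# daemon side: wait for commands on a Unix socket and start the long job
import os
import socketserver
import subprocess

SOCKET_PATH = "/tmp/dashboard-jobs.sock"  # made-up path

class JobHandler(socketserver.StreamRequestHandler):
    def handle(self):
        command = self.rfile.readline().decode().strip()
        if command == "run-job":  # made-up command name
            subprocess.Popen(["/usr/local/bin/long-job"])  # made-up program

if os.path.exists(SOCKET_PATH):
    os.unlink(SOCKET_PATH)  # clean up a stale socket from a previous run
with socketserver.UnixStreamServer(SOCKET_PATH, JobHandler) as server:
    server.serve_forever()
The WSGI side then just opens an AF_UNIX socket, connects to the same path, and sends b"run-job\n".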

Maintaining a long-running task on Linux

My system includes a task which opens a network socket, receives pushed data from the network, processes it, and writes it out to disk or pings other machines depending on the messages. This task is intended to run forever, and the service is designed to have this task always running. But sometimes it crashes.
What's the best practice for keeping a task like this alive? Assume it's okay for the task to be dead for up to 30 seconds before we restart it.
Some obvious ideas include having a watchdog process that checks to make sure the process is still running. The watchdog could be triggered by cron. But how does it know whether the process is alive or not? Write a pidfile? Touch a heartbeat file? An ideal solution wouldn't continuously spin up more processes if the machine gets bogged down to the point where the watchdog is running faster than the heartbeat.
Are there standard linux tools for this? I can imagine a solution that uses a message queue, but I'm not sure if that's a good idea or not.
Depending on the nature of the task that you wish to monitor, one method is to write a simple wrapper to start up your task in a fork().
The wrapper task can then do a waitpid() on the child and restart it if it is terminated.
This does depend on modifying the source for the task that you wish to run.
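For instance, a bare-bones wrapper of that shape might look like this in Python (the task path is made up); the same pattern applies with fork()/waitpid() in C:
# supervise-and-restart wrapper: fork, run the task, wait, repeat
import os
import time

TASK = ["/usr/local/bin/network-task"]  # made-up path to the real task

while True:
    pid = os.fork()
    if pid == 0:
        os.execv(TASK[0], TASK)        # child: replace itself with the task
    _, status = os.waitpid(pid, 0)     # parent: block until the child exits
    print(f"task exited with status {status}, restarting in 5 seconds")
    time.sleep(5)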
sysvinit will restart processes that die, if added to inittab.
If you're worried about the process freezing without actually crashing and exiting, you can use a heartbeat and hard-kill the hung instance, letting init restart it.
You could use monit along with daemonize. There are lots of tools for this in the *nix world.
Supervisor was designed precisely for this task. From the project website:
Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.
It runs as a daemon (supervisord) controlled by a command line tool, supervisorctl. The configuration file contains a list of programs it is supposed to monitor, among other settings.
The number of options is quite extensive - have a look at the docs for a complete list. In your case, the relevant configuration section might be something like this:
[program:my-network-task]
# where your binary lives
command=/bin/my-network-task
# start when supervisor starts?
autostart=true
# restart automatically when stopped?
autorestart=true
# consider the start successful after how many seconds?
startsecs=10
# how many times should starting be retried?
startretries=3
I have used Supervisor myself and it worked really well once everything was set up. It requires Python, which should not be a big deal in most environments but might be.
