Opensips suddenly crash in two-three days running

Opensips suddenly crash in two-three days running - voip

I am using opensips, it is working fine but after 2-3 days it suddenly crash. Don't understand following log
CRITICAL:core:receive_fd: EOF on 17
INFO:core:handle_sigs: child process 14090 exited by a signal 11
INFO:core:handle_sigs: core was generated
INFO:core:handle_sigs: terminating due to SIGCHLD
CRITICAL:core:receive_fd: EOF on 17
INFO:core:handle_sigs: child process 14090 exited by a signal 11
INFO:core:handle_sigs: core was generated
INFO:core:handle_sigs: terminating due to SIGCHLD
INFO:core:sig_usr: signal 15 received
How can I investigate what is exactly going wrong with my opensips. I am using Ubuntu, should I change it to Centos or Debian? or what above log dictate error? any idea.

The log isn't telling you anything other than that it's crashed. The question is why.
If you run the same version & config on a different environment you'll probably have the same issues.
The time dependence of the crashes would suggest it's crashing when a specific race condition is met. This could be a call coming in with an invalid Caller ID you're trying to parse as an int, a routing block that's seldom called being called, a resource limitation on the system, or something totally different.
This is a pretty generic crash message, so without more debugging it's just guesswork, so let's enable debugging:
The start of the OpenSIPs config file is where we enable, here's how the default config looks (assuming you've built off the standard template):
####### Global Parameters #########
log_level=3
log_stderror=no
log_facility=LOG_LOCAL0
children=4
/* uncomment the following lines to enable debugging */
#debug_mode=yes
If you change yours to:
####### Global Parameters #########
log_level=8
log_stderror=yes
log_facility=LOG_LOCAL0
children=4
/* uncomment the following lines to enable debugging */
debug_mode=yes
You'll have debugging features enabled and a whole lot more info available in syslog.
Once you've done that sit back and wait for 2 days until it crashes, and you'll have an answer as to what module / routing block / packet is causing your instance to crash.
After that you can post the output here along with your config file, but there's a pretty high chance that someone on the OpenSIPs or Kamailio mailing lists will have had the same issue before.

Related

"shmop_open(): unable to attach or create shared memory segment 'No error':"?

I get this every time I try to create an account to ask this on Stack Overflow:
Oops! Something Bad Happened!
We apologize for any inconvenience, but an unexpected error occurred while you were browsing our site.
It’s not you, it’s us. This is our fault.
That's the reason I post it here. I literally cannot ask it on Overflow, even after spending hours of my day (on and off) repeating my attempts and solving a million reCAPTCHA puzzles. Can you maybe fix this error soon?
With no meaningful/complete examples, and basically no documentation whatsoever, I've been trying to use the "shmop" part of PHP for many years. Now I must find a way to send data between two different CLI PHP scripts running on the same machine, without abusing the database for this. It must work without database support, which means I'm trying to use shmop, but it doesn't work at all:
$shmopid = shmop_open(1, 'w', 0644, 99999); // I have no idea what the "key" is supposed to be. It says: "System's id for the shared memory block. Can be passed as a decimal or hex.", so I've given it a 1 and also tried with 123. It gave an error when I set the size to 64, so I increased it to 99999. That's when the error changed to the one I now face above.
shmop_write($shmopid, 'meow 123', 0); // Write "meow 123" to the shared variable.
while (1)
{
$shared_string = shmop_read($shmopid, 0, 8); // Read the "meow 123", even though it's the same script right now (since this is an example and minimal test).
var_dump($shared_string);
sleep(1);
}
I get the error for the first line:
shmop_open(): unable to attach or create shared memory segment 'No error':
What does that mean? What am I doing wrong? Why is the manual so insanely cryptic for this? Why isn't this just a built-in "superarray" that can be accessed across the scripts?

About CLI:
It cannot work in standalone CLI processes, as an answer here says:
https://stackoverflow.com/a/34533749
The master process is the one to hold the shared memory block, so you will have to use php-fpm or mod_php or some other web/service-running version, and maybe even start/request/stop it all from a CLI php script.
About shmop usage itself:
Use "c" mode in shmop_open() for creating the segment before it can be used with "a" or "w".
I stumbled on this error in a different scenario where shared memory is completely optional to speed up some repeated operations. So I wanted to try reading first without knowing memory size and then allocate from actual data when needed. In my case I had to call it as #shmop_open() to hide this error output.
About shmop on Windows:
PHP 7 crashed Apache worker process causing its restart with status 3221225477 when trying to reallocate a segment with the same predefined (arbitrary number) key but different size, even after shmop_delete(). As a workaround for this, I took filemtime() of the source file which contains data to be stored in memory, and used that timestamp as the key in shmop_open(). It still was not flawless IIRC, and I don't know if it would cause memory leaks, but it was enough to test my code which would mainly run on Linux anyway.
Finally, as of PHP version 8.0.7, shmop seems to work fine with Apache 2.4.46 and mod_php in Windows 10.

How to investigate which process causes wakeups during laptop sleep-mode in MacOS (or Linux)?

My MacBook spontaneously wakes up from sleep mode with high fan activity.
I want to do a investigate this in RTC or power settings? Or by strace-ing of processes, etc (using some process/kernel magic!).
Hint: It is probably managed by "rtcwake".
I am not even sure if this is a scheduled task, or from a WiFi wakeup, or something else.
I don't want guesses about what usually causes this in Mojave, etc. Instead:
I need to do a systematic investigation on this on my MacOS (Mojave). Linux-related answers are also appreciated.
This is about system standby, sleep-mode, suspended mode. (Note that this is not about standup and wakeup of individual processes. The whole laptop turns on spontaneously.)

Reading the log file is the best way to debug the problem.
So, try this command in your Terminal to fetch the system logs,
this will tell you "wake up" history.
log show --style syslog | fgrep "Wake reason: EC.LidOpen"
To see the wake reason:
For macOS Sierra, Mojave, Catalina, and newer
log show |grep -i "Wake reason"
Or for MacOS El Capitan, Yosemite, Mavericks, and older
syslog |grep -i "Wake reason"
This will look like:
MacBookPro kernel[0] : Wake reason = OHC1
MacBookPro kernel[0] : Wake reason = PWRB
MacBookPro kernel[0] : Wake reason = EHC2
MacBookPro kernel[0] : Wake reason = OHC1
So what do these wake reason codes mean?
OHC: stands for Open Host Controller, is usually USB or Firewire. If you see OHC1 or OHC2 it is almost certainly an external USB keyboard or mouse that has woken up the machine.
EHC: standing for Enhanced Host Controller, is another USB interface, but can also be wireless devices and bluetooth since they are also on the USB bus of a Mac.
USB: a USB device woke the machine up
LID0: this is literally the lid of your MacBook or MacBook Pro when you open the lid the machine wakes up from sleep.
PWRB: PWRB stands for Power Button, which is the physical power button on your Mac
RTC: Real Time Clock Alarm, is generally from wake-on-demand services like when you schedule sleep and wake on a Mac via the Energy Saver control panel. It can also be from launchd setting, user applications, backups, and other scheduled events.
There may be some other codes (like PCI, GEGE, etc) but the above are the ones that most people will encounter in the system logs. Once you find out these codes, you can really narrow down what is causing your Mac to wake up from sleep seemingly at random.
Hope this will help :)

This answer is based on Linux, so it might not apply strictly to Mac.
To determine whether rtcwake is responsible for your MacOS wakeups, you could replace the executable (in my Ubutnu it is /usr/sbin/rtcwake) with a wrapper script that leaves a sign of rtcwake having run, e.g.
$ cd /usr/sbin/rtcwake
$ sudo mv rtcwake rtcwake_orig
and then write script /usr/sbin/rtcwake containing
#!/bin/bash
touch $HOME/rtcwake_ran
/usr/sbin/rtcwake_orig
Variants of the script would depend on your shell.
In particular, in the last line you would possibly run rtcwake in some alternative way, so as to not own the process (nohup / disown).
See https://unix.stackexchange.com/questions/152310/how-to-correctly-start-an-application-from-a-shell
To inspect possible causes of wakeup, you can check various relevant logs, at /var/log.
E.g., syslog*, acpi*.
See also https://unix.stackexchange.com/questions/83036/where-is-the-log-for-acpi-events
Do you have wakeonlan?

Here I am documenting my systematic approach. It is loosely based on, and initiated by, the answer by #vijay-rajpurohit, which is in turn based on comment by #Robert #1431720 . Note that the final result is particular to my MacOS machine, based on the logs shown below. It will be different in your MacOS.
In first attempt, I first checked the logs using: log show --style syslog | grep ... but it is taking too long. I accidentally checked /var/log/wifi.log after exploring the /var/log/ (I am also curious about /var/log/powermanagement/*.asl).
This turned out to be most useful:
cat /var/log/wifi.log|grep -i "Wake reason"
Then found this line: (note the EC. bit)
Thu Apr 23 22:41:32.359 Info: <airportd[219]> _systemWokenByWiFi: System wake reason: <EC.ARPT>, was woken by WiFi
Then googled for EC.ARPT, I found the following commands:
pmset -g log Useful stats about "Total Sleep/Wakes since boot".
pmset -g assertions This turned out to show the full answer to this question:
2020-04-24 02:23:38 +0100
Assertion status system-wide:
BackgroundTask 1
ApplePushServiceTask 0
UserIsActive 1
PreventUserIdleDisplaySleep 0
PreventSystemSleep 0
ExternalMedia 0
PreventUserIdleSystemSleep 0
NetworkClientActive 0
Listed by owning process:
pid 111(hidd): [0x0000200a000986a9] 00:00:00 UserIsActive named: "com.apple.iohideventsystem.queue.tickle.4295010950.3"
pid 85(apsd): [0x0003b830000b90bd] 00:00:10 ApplePushServiceTask named: "com.apple.apsd-waitingformessages-push.apple.com"
Kernel Assertions: 0x100=MAGICWAKE
id=504 level=255 0x100=MAGICWAKE mod=24/04/2020, 01:57 description=en0 owner=en0
Idle sleep preventers: IODisplayWrangler
In short, in a systematic approach, I explored the following keywords based on the logs, and googled each :
EC.ARPT (example link)
iohideventsystem (example link)
MAGICWAKE (example link)
ApplePushServiceTask (see below)
Most informative item emerged from the output of pmset -g assertions. For example ApplePushServiceTask in the following line:
pid 85(apsd): [0x0003b830000b90bd] 00:00:10 ApplePushServiceTask named: "com.apple.apsd-waitingformessages-push.apple.com"
The solution that seems to work in my particular case (not a general solution) was to disable :
/System/Library/LaunchDaemons/com.apple.apsd.plist using launchctl. But this cannot be done until you do a csrutil disable in the safe mode. I don't write instructions here because it need caution and you need to enable it later.
(to be updated)

After enabling earlyprintk the kernel prints twice the logs

I'm working on a project at the office where I have to enable the earlyprintk so that I can see the crashes at boot. So far so good, I've added earlyprintk=mydriver ... in the command line + some custom platform settings and I can see now the messages printed at boot.
...mydriver is a custom (the SoC) serial driver which is registered as the earlyprintk driver and does the driver setup at boot time.
However, the issue here is that I can see earlyprintk sending the logs, and when the log timestamp reaches 4,1 seconds, I see again the whole dmesg printed from timestamp 0.
So basically the normal console doesn't resume the printing where the earlyprink left off, but does a complete kernel buffer printing.
So far I couldn't find a way to solve this issue, any help much appreciated.
Thanks & Regards
mdaniel.

linux kernel consoleblank argument ignored

I'm running an embedded Linux board with a console only (no graphical environment) based on the i.MX6 and a custom Yocto build.
I'm trying to stop the screen from shutting off after 15 minutes of inactivity. I think the correct way to do this is to pass consoleblank=0 to the boot arguments, which I have done. The problem is that when I do
cat /sys/module/kernel/parameters/consoleblank
I get 900. The results of cat /proc/cmdline are:
console=ttymxc0,115200 root=/dev/mmcblk0p2 rootwait rw consoleblank=0
Does anyone know where else that parameter could be set?
Thanks
Marlon

To avoid a console blank after a period of time, there's two things to change:
consoleblank=0 as kernel parameter as you mentioned
disable terminal blanking with: setterm -blank 0 -powerdown 0
The value that you see in proc, I suspect that in the boot process the setterm parameters are set, that will change consoleblank parameter, to be sure about this, you can make a simple test:
setterm -blank 600
cat /sys/module/kernel/parameters/consoleblank
# This must be 600
setterm -blank 0
cat /sys/module/kernel/parameters/consoleblank
# This must be 0
You can see additional info in this question: https://unix.stackexchange.com/questions/8056/disable-screen-blanking-on-text-console

I know this is a super old question, but I recently ran into a very similar problem and discovered that the 15 minute blanking timeout was being caused by Qt. If you're running any Qt programs, that's likely the source of the problem.
There is a function in Qt's source called setTTYCursor. If you look at the code in the linked file, it disables the blanking by setting the timeout to 0 when the Qt app loads, and then when it exits, it changes the blanking to a 15 minute timeout. After it does this, /sys/module/kernel/parameters/consoleblank reports a value of 900, regardless of what you initially supplied with the kernel command line. I spent a lot of time questioning my sanity before figuring out that Qt was changing this behind my back.
You can bypass this bizarre behavior by setting an environment variable prior to launching the Qt application:
export QT_QPA_PRESERVE_CONSOLE_STATE=1

Can syslog Performance Be Improved?

We have an application on Linux that used the syslog mechanism. After a week spent trying to figure out why this application was running slower than expected, we discovered that if we eliminated syslog, and just wrote directly to a log file, performance improved dramatically.
I understand why syslog is slower than direct file writes. But I was wondering: Are there ways to configure syslog to optimize its performance?

You can configure syslogd (and rsyslog at least) not to sync the log files after a log message by prepending a "-" to the log file path in the configuration file. This speeds up performance at the expense of the danger that log messages could be lost in a crash.

There are several options to improve syslog performance:
Optimizing out calls with a macro
int LogMask = LOG_UPTO(LOG_WARNING);
#define syslog(a, ...) if ((a) & LogMask ) syslog((a), __VA_ARGS__)
int main(int argc, char **argv)
{
LogMask = setlogmask(LOG_UPTO(LOG_WARNING));
...
}
An advantage of using a macro to filter syslog calls is that the entire call is
reduced to a conditional jump on a global variable, very helpful if you happen to
have DEBUG calls which are translating large datasets through other functions.
setlogmask()
setlogmask(LOG_UPTO(LOG_LEVEL))
setlogmask() will optimize the call by not logging to /dev/log, but the program will
still call the functions used as arguments.
filtering with syslog.conf
*.err /var/log/messages
"check out the man page for syslog.conf for details."
configure syslog to do asynchronous or buffered logging
metalog used to buffer log output and flushed it in blocks. stock syslog and syslog-ng
do not do this as far as I know.

Before embarking in new daemon writing you can check if syslog-ng is faster (or can be configured to be faster) than plain old syslog.

One trick you can use if you control the source to the logging application is to mask out the log level you want in the app itself, instead of in syslog.conf. I did this years ago with an app that generated a huge, huge, huge amount of debug logs. Rather than remove the calls from the production code, we just masked so that debug level calls never got sent to the daemon. I actually found the code, it's Perl but it's just a front to the setlogmask(3) call.
use Sys::Syslog;
# Start system logging
# setlogmask controls what levels we're going to let get through. If we mask
# them off here, then the syslog daemon doesn't need to be concerned by them
# 1 = emerg
# 2 = alert
# 4 = crit
# 8 = err
# 16 = warning
# 32 = notice
# 64 = info
# 128 = debug
Sys::Syslog::setlogsock('unix');
openlog($myname,'pid,cons,nowait','mail');
setlogmask(127); # allow everything but debug
#setlogmask(255); # everything
syslog('debug',"syslog opened");
Not sure why I used decimal instead of a bitmask... shrug

Write your own syslog implementation. :-P
This can be accomplished in two ways.
Write your own LD_PRELOAD hook to override the syslog functions, and make them output to stderr instead. I actually wrote a post about this many years ago: http://marc.info/?m=97175526803720 :-P
Write your own syslog daemon. It's just a simple matter of grabbing datagrams out of /dev/log! :-P
Okay, okay, so these are both facetious answers. Have you profiled syslogd to see where it's choking up most?

You may configure syslogd's level (or facility) to log asynchronously, by putting a minus before path to logfile (ie.: user.* [tab] -/var/log/user.log).
Cheers.

The syslog-async() implementation may help, at the risk of lost log lines / bounded delays at other times.
http://thekelleys.org.uk/syslog-async/
Note: 'asynchronous' here refers to queueing log events within your application, and not the asynchronous syslogd output file configuration option that other answers refer to.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string