running slattach on pseudo tty - linux

I'm trying to open a network connection through a pair of pseudo-ttys on Linux.
# slattach -v /dev/ptmx
cslip started on /dev/ptmx interface sl0
OK, this was the "creating side" for the pseudo tty.
I can look in /dev/pts and find the new pty there.
If I now try to use slattach on this side as well, I get:
slattach -v /dev/pts/3
slattach: tty_open(/dev/pts/3, RW): Input/output error
I traced with strace:
5505  write(1, "slattach: tty_open: trying to op"..., 46) = 46
5505  open("/dev/pts/3", O_RDWR|O_NONBLOCK) = -1 EIO (Input/output error)
5505  write(2, "slattach: tty_open(/dev/pts/3, R"..., 55) = 55
5505  exit_group(3)
This happens on different Ubuntu releases; I tested 10.04 and 11.04, and both fail.
What am I doing wrong?

You may want to take a look at the man page pty(7).
Basically, /dev/ptmx uses the Unix 98 pseudo-terminal interface and requires that your program call grantpt(3) and unlockpt(3). Here, slattach (the one that opens /dev/ptmx, not the other one) doesn't do so, and any program that tries to open the slave pseudo-terminal associated with that master will fail, as you experienced.
You can force slattach to call grantpt() and unlockpt() by overriding the open() call with an external routine; see this example.
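For reference, here is a minimal sketch of such a wrapper using LD_PRELOAD (my own illustration, not slattach code; the file name ptmx_unlock.c is made up). It intercepts open() and, whenever /dev/ptmx is opened, calls grantpt() and unlockpt() and prints the slave name. Depending on how slattach was built, open64() may need the same treatment.

/* ptmx_unlock.c
 * build: gcc -shared -fPIC -o ptmx_unlock.so ptmx_unlock.c -ldl
 * run:   LD_PRELOAD=./ptmx_unlock.so slattach -v /dev/ptmx
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;

    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    if (flags & O_CREAT) {               /* mode argument only present with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    int fd = real_open(path, flags, mode);

    /* If this is the pty master, grant and unlock the slave for the other side. */
    if (fd >= 0 && strcmp(path, "/dev/ptmx") == 0) {
        if (grantpt(fd) == 0 && unlockpt(fd) == 0)
            fprintf(stderr, "slave pty is %s\n", ptsname(fd));
    }
    return fd;
}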

Related

Identify other end of a unix domain socket connection

I'm trying to figure out what process is holding the other end of a unix domain socket. In some strace output I've identified a given file descriptor which is involved in the problem I'm currently debugging, and I'd like to know which process is on the other end of that. As there are multiple connections to that socket, simply going by path name won't work.
lsof provides me with the following information:
dbus-daem 4175 mvg 10u unix 0xffff8803e256d9c0 0t0 12828 #/tmp/dbus-LyGToFzlcG
So I know some address (“kernel address”?), I know some socket number, and I know the path. I can find that same information in other places:
$ netstat -n | grep 12828
unix 3 [ ] STREAM CONNECTED 12828 #/tmp/dbus-LyGToFzlcG
$ grep -E '12828|ffff8803e256d9c0' /proc/net/unix
ffff8803e256d9c0: 00000003 00000000 00000000 0001 03 12828 #/tmp/dbus-LyGToFzlcG
$ ls -l /proc/*/fd/* 2>/dev/null | grep 12828
lrwx------ 1 mvg users 64 10. Aug 09:08 /proc/4175/fd/10 -> socket:[12828]
However, none of this tells me what the other end of my socket connection is. How can I tell which process is holding the other end?
Similar questions have been asked on Server Fault and Unix & Linux. The accepted answer is that this information is not reliably available to user space on Linux.
A common suggestion is to look at adjacent socket numbers, but ls -l /proc/*/fd/* 2>/dev/null | grep 1282[79] gave no results here. Perhaps adjacent lines in the output from netstat can be used. It seems like there was a pattern of connections with and without an associated socket name. But I'd like some kind of certainty, not just guesswork.
One answer suggests a tool which appears to be able to address this by digging through kernel structures. Using that option requires debug information for the kernel, as generated by the CONFIG_DEBUG_INFO option and provided as a separate package by some distributions. Based on that answer, using the address provided by lsof, the following solution worked for me:
# gdb /usr/src/linux/vmlinux /proc/kcore
(gdb) p ((struct unix_sock*)0xffff8803e256d9c0)->peer
This will print the address of the other end of the connection. Grepping lsof -U for that number will provide details like the process id and the file descriptor number.
If debug information is not available, it might be possible to access the required information by knowing the offset of the peer member into the unix_sock structure. In my case, on Linux 3.5.0 for x86_64, the following code can be used to compute the same address without relying on debugging symbols:
(gdb) p ((void**)0xffff8803e256d9c0)[0x52]
I won't make any guarantees about how portable that solution is.
Update: It's been possible to do this using actual interfaces for a while now. Starting with Linux 3.3, the UNIX_DIAG feature provides a netlink-based API for this information, and lsof 4.89 and later support it. See https://unix.stackexchange.com/a/190606/1820 for more information.
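For completeness, here is a rough sketch of using that netlink API directly (my own example; it assumes a kernel built with CONFIG_UNIX_DIAG, and dumping all sockets may require CAP_NET_ADMIN on some systems). It dumps every AF_UNIX socket with UDIAG_SHOW_PEER and prints each socket's inode together with its peer's inode, so an inode such as 12828 from lsof or netstat can be matched to its other end:

/* unixpeers.c - dump AF_UNIX sockets and their peer inodes via UNIX_DIAG */
#include <stdio.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>     /* struct rtattr and the RTA_* helpers */
#include <linux/sock_diag.h>
#include <linux/unix_diag.h>

int main(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
    if (fd < 0) { perror("socket"); return 1; }

    struct {
        struct nlmsghdr nlh;
        struct unix_diag_req udr;
    } req = {
        .nlh = {
            .nlmsg_len   = sizeof(req),
            .nlmsg_type  = SOCK_DIAG_BY_FAMILY,
            .nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
        },
        .udr = {
            .sdiag_family = AF_UNIX,
            .udiag_states = ~0U,             /* sockets in any state */
            .udiag_show   = UDIAG_SHOW_PEER, /* ask for the peer inode */
        },
    };
    if (send(fd, &req, sizeof(req), 0) < 0) { perror("send"); return 1; }

    char buf[32768];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n <= 0) { perror("recv"); return 1; }

        for (struct nlmsghdr *h = (struct nlmsghdr *)buf; NLMSG_OK(h, n);
             h = NLMSG_NEXT(h, n)) {
            if (h->nlmsg_type == NLMSG_DONE)
                return 0;
            if (h->nlmsg_type == NLMSG_ERROR) {
                fprintf(stderr, "netlink error\n");
                return 1;
            }

            struct unix_diag_msg *m = NLMSG_DATA(h);
            struct rtattr *a = (struct rtattr *)(m + 1);
            int alen = h->nlmsg_len - NLMSG_LENGTH(sizeof(*m));
            unsigned int peer = 0;

            for (; RTA_OK(a, alen); a = RTA_NEXT(a, alen))
                if (a->rta_type == UNIX_DIAG_PEER)
                    peer = *(unsigned int *)RTA_DATA(a);

            printf("inode %u  peer %u\n", m->udiag_ino, peer);
        }
    }
}

Grep the output for the inode in question; the peer column gives the inode you can then look up in /proc/*/fd with ls -l, just as before.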

How to monitor number of syscalls executed by kernel?

I need to monitor the number of system calls executed by Linux.
I'm aware that vmstat can show this for BSD and AIX systems, but for Linux it can't (according to the man page).
Is there any counter in /proc? Or is there any other way to monitor it?
I wrote a simple SystemTap script (based on syscalls_by_pid.stp).
It produces output like this:
ProcessName #SysCalls
munin-graph 38609
munin-cron 8160
fping 4502
check_http_demo 2584
check_nrpe 2045
sh 1836
nagios 886
sendmail 747
smokeping 649
check_http 571
check_nt 376
pcscd 216
ping 108
check_ping 100
crond 87
stapio 69
init 56
syslog-ng 27
sshd 17
ntpd 9
hp-asrd 8
hald-addon-stor 7
automount 6
httpd 4
stap 3
flow-capture 2
gam_server 2
Total 61686
The script itself:
#! /usr/bin/env stap
#
# Print the system call count by process name in descending order.
#
global syscalls

probe begin {
  print ("Collecting data... Type Ctrl-C to exit and display results\n")
}

probe syscall.* {
  syscalls[execname()]++
}

probe end {
  printf ("%-20s %-s\n\n", "ProcessName", "#SysCalls")
  summary = 0
  foreach (procname in syscalls-) {
    printf("%-20s %-10d\n", procname, syscalls[procname])
    summary = summary + syscalls[procname]
  }
  printf ("\n%-20s %-d\n", "Total", summary)
}
You can use ptrace, as Jeff Foster said, to trace the system calls.
Also, you can use strace and ltrace
strace - trace system calls and signals
ltrace - A library call tracer
You can use ptrace to monitor all syscalls (see here)
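As a rough illustration of the ptrace approach (a sketch of my own, not production code: it traces a single child, counts syscall-entry stops only, and skips the signal forwarding and PTRACE_O_TRACESYSGOOD handling a real tracer would need):

/* count_syscalls.c - usage (hypothetical): ./count_syscalls ls -l */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* let the parent trace us */
        execvp(argv[1], &argv[1]);
        perror("execvp");
        _exit(127);
    }

    int status, entering = 1;
    long count = 0;

    waitpid(child, &status, 0);                  /* initial stop after execvp */
    while (1) {
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) < 0)
            break;                               /* resume until the next syscall stop */
        waitpid(child, &status, 0);
        if (WIFEXITED(status))
            break;
        if (WIFSTOPPED(status)) {
            if (entering)
                count++;                         /* count entry stops only */
            entering = !entering;                /* stops alternate entry/exit */
        }
    }
    printf("%s made roughly %ld system calls\n", argv[1], count);
    return 0;
}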
I believe OProfile can do this.
I am not aware of a centralized way to monitor syscalls throughout the entire OS. Maybe do a ptrace on the init process and follow all children? But I don't know if that will work.
Your best bet is to write a patch to the kernel itself to do this. The closest thing to this that I've seen is a cgroup implementation for enforcing permissions on what syscalls can be executed at runtime. You can find the patch here:
https://github.com/luksow/syscalls-cgroup
It shouldn't be too much more work to throw a counter in there, from a kernel programming perspective.

Less gets keyboard input from stderr?

I'm taking a look at the code to the 'less' utility, specifically how it gets keyboard input. Interestingly, on line 80 of ttyin.c, it sets the file descriptor to read from:
/*
* Try /dev/tty.
* If that doesn't work, use file descriptor 2,
* which in Unix is usually attached to the screen,
* but also usually lets you read from the keyboard.
*/
#if OS2
/* The __open() system call translates "/dev/tty" to "con". */
tty = __open("/dev/tty", OPEN_READ);
#else
tty = open("/dev/tty", OPEN_READ);
#endif
if (tty < 0)
tty = 2;
Isn't file descriptor 2 stderr? If so, WTH?! I thought keyboard input was sent through stdin.
Interestingly, even if you do ls -l * | less, after the file finishes loading, you can still use the keyboard to scroll up and down, but if you do ls -l * | vi, then vi will complain because its standard input is not a terminal. What's the big idea? How did I end up in this strange new land where stderr is both a way to report errors to the screen and a way to read from the keyboard? I don't think I'm in Kansas anymore...
$ ls -l /dev/fd/
lrwx------ 1 me me 64 2009-09-17 16:52 0 -> /dev/pts/4
lrwx------ 1 me me 64 2009-09-17 16:52 1 -> /dev/pts/4
lrwx------ 1 me me 64 2009-09-17 16:52 2 -> /dev/pts/4
When logged in at an interactive terminal, all three standard file descriptors point to the same thing: your TTY (or pseudo-TTY).
$ ls -fl /dev/std{in,out,err}
lrwxrwxrwx 1 root root 4 2009-09-13 01:57 /dev/stdin -> fd/0
lrwxrwxrwx 1 root root 4 2009-09-13 01:57 /dev/stdout -> fd/1
lrwxrwxrwx 1 root root 4 2009-09-13 01:57 /dev/stderr -> fd/2
By convention, we read from 0 and write to 1 and 2. However, nothing prevents us from doing otherwise.
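A tiny demonstration of that point (my own example): at an interactive terminal, reading from descriptor 2 works just as well as reading from descriptor 0, because both refer to the same TTY.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    ssize_t n = read(2, buf, sizeof(buf) - 1);   /* read a line from "stderr" */
    if (n > 0) {
        buf[n] = '\0';
        printf("read %zd bytes from fd 2: %s", n, buf);
    }
    return 0;
}

Run it at a terminal, type a line, and it comes back, even though nothing was read from stdin.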
When your shell runs ls -l * | less, it creates a pipe from ls's file descriptor 1 to less's file descriptor 0. Obviously, less can no longer read the user's keyboard input from file descriptor 0 – it tries to get the TTY back however it can.
If less has not been detached from the terminal, open("/dev/tty") will give it the TTY.
However, in case that fails... what can you do? less makes one last attempt at getting the TTY, assuming that file descriptor 2 is attached to the same thing that file descriptor 0 would be attached to, if it weren't redirected.
This is not foolproof:
$ ls -l * | setsid less 2>/dev/null
Here, less is given its own session (so it is no longer a part of the terminal's active process group, causing open("/dev/tty") to fail), and its file descriptor 2 has been changed – now less exits immediately, because it is outputting to a TTY yet it fails to get any user input.
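For illustration, here is roughly the plumbing the shell sets up for ls -l | less (a simplified sketch; a real shell also handles job control, signals, and errors). After the dup2() calls, less's descriptor 0 is the pipe, which is why it has to reopen /dev/tty (or fall back to descriptor 2) to reach the keyboard:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int pfd[2];
    if (pipe(pfd) < 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                 /* writer: ls -l */
        dup2(pfd[1], 1);               /* stdout -> pipe write end */
        close(pfd[0]); close(pfd[1]);
        execlp("ls", "ls", "-l", (char *)NULL);
        _exit(127);
    }
    if (fork() == 0) {                 /* reader: less */
        dup2(pfd[0], 0);               /* stdin <- pipe read end */
        close(pfd[0]); close(pfd[1]);
        execlp("less", "less", (char *)NULL);
        _exit(127);
    }
    close(pfd[0]); close(pfd[1]);      /* parent keeps neither end */
    wait(NULL);
    wait(NULL);
    return 0;
}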
Well... first off, you seem to be missing the open() call which opens '/dev/tty'. It only uses file descriptor 2 if that call to open() fails. On a standard Linux system, and probably many Unices, '/dev/tty' exists and is unlikely to cause a failure.
Secondly, the comment at the top provides a limited explanation as to why they fall back to file descriptor 2. My guess is that stdin, stdout, and stderr are all pretty much connected to '/dev/tty' anyway, unless redirected. And since the most common redirections are of stdin and/or stdout (via piping or < / >), but less often of stderr, the odds are that stderr is the descriptor most likely to still be connected to the "keyboard".
The same question, with an answer ultimately from the person who asked it, is on linuxquestions, although they quote slightly different source from less. And no, I don't understand most of it, so I can't help beyond that :)
It appears to be Linux-specific functionality that sends keyboard input to FD 2.

Who is refreshing hardware watchdog in Linux?

I have a processor AT91SAM9G20 running a 2.6 kernel. Watchdog is enabled at bootstrap level and configured for 16 seconds. Watchdog mode register can be configured only once.
When code hangs in the bootstrap, the bootloader, or the kernel, the board reboots. But once the kernel comes up, even though the watchdog is not refreshed by any application, the board is not reset after 16 seconds but only after about 15 minutes.
Who is refreshing the watchdog?
In our case, the watchdog should be fed by our applications, so that the board resets if our application hangs.
These are the running processes:
1 root init
2 root [kthreadd]
3 root [ksoftirqd/0]
4 root [watchdog/0]
5 root [events/0]
6 root [khelper]
63 root [kblockd/0]
72 root [ksuspend_usbd]
78 root [khubd]
85 root [kmmcd]
107 root [pdflush]
108 root [pdflush]
109 root [kswapd0]
110 root [aio/0]
740 root [mtdblockd]
828 root [rpciod/0]
982 root [jffs2_gcd_mtd10]
1003 root /sbin/udevd -d
1145 daemon portmap
1158 dbus dbus-daemon --system
1178 root /usr/sbin/ifplugd -i eth0 -fwI -u0 -d5 -l -q
1190 root /usr/sbin/ifplugd -i eth1 -fwI -u0 -d5 -l -q
1221 default avahi-daemon: running [SP14.local]
1226 root /usr/sbin/dropbear
1246 root /root/bin/host_app
1254 root /root/bin/mini_httpd -c *.cgi -d /root/bin -u root -E /root/bin/
1256 root -sh
1257 root /sbin/syslogd -n -m 0
1258 root /sbin/klogd -n
1259 root /usr/bin/tail -f /var/log/messages
1265 root ps -e
We are using the watchdog for soft lockups available in kernel-2.6.25-ts.at91sam9g20/kernel/softlockup.c
If you enabled the watchdog driver in your kernel, it sets up a kernel timer in charge of resetting the watchdog. The corresponding code is linux/drivers/watchdog/at91sam9_wdt.c. So it works like this:
If no application opens the /dev/watchdog file, the kernel takes care of resetting the watchdog. Since it is a timer, it won't appear as a dedicated kernel thread but is handled by the soft IRQ thread. Now, if an application opens this file, it becomes responsible for the watchdog and can reset it by writing to the file, as described in the documentation linked in Richard's post.
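As an illustration of that hand-over, a minimal userspace feeder might look like the sketch below (based on the standard watchdog character-device API from the documentation mentioned above; error handling mostly omitted). As soon as it opens /dev/watchdog, the kernel stops feeding the hardware and this process has to do it:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

int main(void)
{
    int fd = open("/dev/watchdog", O_WRONLY);
    if (fd < 0) { perror("/dev/watchdog"); return 1; }

    int timeout = 16;
    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);   /* request a 16 s hardware timeout */

    for (;;) {
        ioctl(fd, WDIOC_KEEPALIVE, 0);       /* feed the watchdog */
        sleep(5);                            /* if this loop ever stalls longer
                                                than the timeout, the board resets */
    }
    /* Drivers supporting "magic close" only disarm the watchdog if a 'V' is
     * written before close(); simply exiting here would lead to a reset. */
}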
Is the watchdog driver configured in your kernel?
If not, you should configure it, and see if the reset still happens. If it still happens, it is likely that your reset comes from somewhere else.
If your kernel is too old to have a proper watchdog driver (not present in 2.6.25), you should backport it from 2.6.28. Or you can try to disable the watchdog in your bootloader and see if the reset still occurs.
In July 2016, commit 3fbfe92647 (watchdog: change watchdog_need_worker logic) in the 4.7 kernel changed watchdog_dev.c to enable the same behavior described in shodanex's answer for all watchdog timer drivers. This doesn't seem to be documented anywhere other than this thread and the source code.
/*
 * A worker to generate heartbeat requests is needed if all of the
 * following conditions are true.
 * - Userspace activated the watchdog.
 * - The driver provided a value for the maximum hardware timeout, and
 *   thus is aware that the framework supports generating heartbeat
 *   requests.
 * - Userspace requests a longer timeout than the hardware can handle.
 *
 * Alternatively, if userspace has not opened the watchdog
 * device, we take care of feeding the watchdog if it is
 * running.
 */
return (hm && watchdog_active(wdd) && t > hm) ||
       (t && !watchdog_active(wdd) && watchdog_hw_running(wdd));
This may give you a hint: http://www.mjmwired.net/kernel/Documentation/watchdog/watchdog-api.txt
It makes perfect sense to have a user space daemon handling the watchdog. It probably defaults to a 15 minute timeout.
We had a similar problem with the WDT on the AT91SAM9263. The problem was with bit 29, WDIDLEHLT, of the WDT_MR register (address 0xFFFFFD44). This bit was set to 1, but it should be 0 for our application's needs.
Bit explanation from the datasheet:
• WDIDLEHLT: Watchdog Idle Halt
0: The Watchdog runs when the system is in idle mode.
1: The Watchdog stops when the system is in idle state.
This means that the WDT counter does not increment while the kernel is in the idle state, hence the delay of 15 minutes or more before the reset happens.
You can try "dd if=/dev/zero of=/dev/null" which will prevent kernel from entering idle state and you should get a reset in 16 seconds (or whatever period you have set in WDT_MR register).
So, the solution is to update the u-boot code, or whatever other piece of code sets the WDT_MR register. Remember, this register is write-once...
Wouldn't the kernel be refreshing the watchdog timer? The watchdog is designed to reset the board if the whole system hangs, not just a single application.

core dump files on Linux: how to get info on opened files?

I have a core dump file from a process that has probably a file descriptor leak (it opens files and sockets but apparently sometimes forgets to close some of them). Is there a way to find out which files and sockets the process had opened before crashing? I can't easily reproduce the crash, so analyzing the core file seems to be the only way to get a hint on the bug.
If you have a core file and you have compiled the program with debugging options (-g), you can see where the core was dumped:
$ gcc -g -o something something.c
$ ./something
Segmentation fault (core dumped)
$ gdb something core
You can use this to do some post-mortem debugging. A few gdb commands: bt prints the stack, fr jumps to a given stack frame (see the output of bt).
Now if you want to see which files are open at a segmentation fault, just handle the SIGSEGV signal, and in the handler dump the contents of the /proc/PID/fd directory (e.g. with system("ls -l /proc/PID/fd") or execv).
With this information at hand you can easily find what caused the crash, which files were open, and whether the crash and the file descriptor leak are connected.
Your best bet is to install a signal handler for whatever signal is crashing your program (SIGSEGV, etc.).
Then, in the signal handler, inspect /proc/self/fd, and save the contents to a file. Here is a sample of what you might see:
Anderson cxc # ls -l /proc/8247/fd
total 0
lrwx------ 1 root root 64 Sep 12 06:05 0 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 12 06:05 1 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 12 06:05 10 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Sep 12 06:05 11 -> socket:[124061]
lrwx------ 1 root root 64 Sep 12 06:05 12 -> socket:[124063]
lrwx------ 1 root root 64 Sep 12 06:05 13 -> socket:[124064]
lrwx------ 1 root root 64 Sep 12 06:05 14 -> /dev/driver0
lr-x------ 1 root root 64 Sep 12 06:05 16 -> /temp/app/whatever.tar.gz
lr-x------ 1 root root 64 Sep 12 06:05 17 -> /dev/urandom
Then you can return from your signal handler, and you should get a core dump as usual.
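A rough sketch of such a handler (my own example; opendir()/readdir() and fprintf() are not async-signal-safe, so treat it as a best-effort debugging aid):

#include <dirent.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void dump_fds_and_die(int sig)
{
    char path[PATH_MAX], target[PATH_MAX];
    DIR *d = opendir("/proc/self/fd");
    struct dirent *e;

    while (d && (e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        snprintf(path, sizeof(path), "/proc/self/fd/%s", e->d_name);
        ssize_t n = readlink(path, target, sizeof(target) - 1);
        if (n > 0) {
            target[n] = '\0';
            fprintf(stderr, "fd %s -> %s\n", e->d_name, target);
        }
    }

    signal(sig, SIG_DFL);   /* restore the default action ... */
    raise(sig);             /* ... and die again so the core dump is still written */
}

int main(void)
{
    signal(SIGSEGV, dump_fds_and_die);

    *(volatile int *)0 = 42;   /* deliberately crash for demonstration */
    return 0;
}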
One of the ways I get at this information is to just run strings on the core file. For instance, when I was running file on a core recently, I would get a truncated argument list due to the length of the directory names. I knew my run would have opened files from my home directory, so I just ran:
strings core.14930|grep jodie
But this is a case where I had a needle and a haystack.
If the program forgot to close those resources it might be because something like the following happened:
fd = open("/tmp/foo",O_CREAT);
//do stuff
fd = open("/tmp/bar",O_CREAT); //Oops, forgot to close(fd)
now I won't have the file descriptor for foo in memory.
If this didn't happen, you might be able to find the file descriptor number, but then again that is not very useful, because descriptors are continuously reused; by the time you get to debug, you won't know which file it actually referred to at the time.
I really think you should debug this live, with strace, lsof and friends.
If there is a way to do it from the core dump, I'm eager to know it too :-)
You can try using strace to see the open, socket and close calls the program makes.
Edit: I don't think you can get the information from the core; at most it will have the file descriptors somewhere, but this still doesn't give you the actual file/socket. (Assuming you can distinguish open from closed file descriptors, which I also doubt.)
Recently, during some troubleshooting and analysis, a customer provided me with a core dump that had been generated on his filesystem, and then he went out of station. To quickly scan through the file and read its contents, I used the command
strings core.67545 > coredump.txt
and was later able to open the output in a text editor.
A core dump is a copy of the memory the process had access to when it crashed. Depending on how the leak is occurring, it might have lost the references to the handles, so it may prove to be useless.
lsof lists all currently open files on the system; you could check its output to find leaked sockets or files. Yes, you'd need to have the process running. You could run it under a specific username to easily discern which open files belong to the process you are debugging.
I hope somebody else has better information :-)
Another way to find out what files a process has opened - again, only during runtime - is to look into /proc/PID/fd/, which contains symlinks to the open files.