Perf record hanging on armv7 - linux

I have a device with embedded Linux. The base image is built using ptxdist 2019.01 with the OSELAS toolchain build 2018.02 (gcc 7.3.1). Ptxdist has a native option to enable perf support, so I enabled it and installed perf on the device. The device runs Linux 4.19.72.
However, when I run perf record -g (with a process to trace) without explicitly specifying events, it seems to hang, using a lot of CPU and not responding to SIGINT. I am not sure what the default event is; it does not seem to say anywhere. How can I find what it is hanging on and/or which events to specify that will actually work?
Update #1: Running strace on perf record -g app… shows:
openat(AT_FDCWD, "/proc/sys/kernel/kptr_restrict", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(3, "0\n", 1024) = 2
geteuid32() = 0
getuid32() = 0
close(3) = 0
statfs64("/sys", 88, 0xbef2f388) = 0
stat64("/sys/bus/event_source/devices/cs_etm/format", 0xbef2f440) = -1 ENOENT (No such file or directory)
stat64("/sys/bus/event_source/devices/cs_etm/type", 0xbef2f440) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_CLOEXEC|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(3, /* 12 entries */, 32768) = 360
getdents64(3, /* 0 entries */, 32768) = 0
close(3) = 0
stat64("/sys/bus/event_source/devices/arm_spe_0/format", 0xbef2f440) = -1 ENOENT (No such file or directory)
stat64("/sys/bus/event_source/devices/arm_spe_0/type", 0xbef2f440) = -1 ENOENT (No such file or directory)
geteuid32() = 0
perf_event_open(
Unfortunately, the arguments of perf_event_open never get written out, so the call apparently never returns. Listing the process with ps shows it in the R state.
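As far as I know, perf record without -e samples the hardware cycles event, and the trace above stops inside perf_event_open(), so the hardware PMU is a reasonable suspect. One thing to try is a software event, for example perf record -e cpu-clock -g, which is driven by a kernel timer rather than by the hardware counters. To see which event sources this kernel registers at all, the sysfs directory that perf itself probes in the trace can be listed. The following is a minimal Python sketch, assuming Python is available on the device; list_event_sources is a hypothetical helper name, not part of perf.

```python
import os

def list_event_sources(root="/sys/bus/event_source/devices"):
    """Return {pmu_name: type_id} for every event source found under root."""
    sources = {}
    if not os.path.isdir(root):
        # perf events not enabled in the kernel, or sysfs not mounted
        return sources
    for name in sorted(os.listdir(root)):
        type_file = os.path.join(root, name, "type")
        try:
            with open(type_file) as fh:
                sources[name] = int(fh.read().strip())
        except (OSError, ValueError):
            # entry exists but its type id could not be read
            sources[name] = None
    return sources

# Print whatever the running kernel exposes (prints nothing if the directory is absent).
for name, type_id in list_event_sources().items():
    print("{}: type={}".format(name, type_id))
```

If the listing shows only entries such as software, tracepoint or breakpoint and no CPU PMU, that would suggest the hardware cycles event is not usable on this kernel configuration, and an explicit software event is the safer choice.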

Related

device driver fn is not being called when system call in userspace is called

I am rewriting a scull device driver.
I have written the open/read/write functions in the driver code.
echo "hello" > /dev/myscull0 shows the data being written successfully to my device driver, and open() -> write() -> release() are called in the driver as expected.
Similarly, other operations using standard bash commands succeed when run against the driver's device file. E.g., cat /dev/myscull0 executes the driver's read() call.
Now I am writing a user-space program to operate on my device file.
void
scull_open(){
    char device_name[DEV_NAME_LENGTH];
    int fd = 0;
    memset(device_name, 0, DEV_NAME_LENGTH);
    if((fgets(device_name, DEV_NAME_LENGTH -1, stdin) == NULL)){
        printf("error in reading from stdin\n");
        exit(EXIT_SUCCESS);
    }
    device_name[DEV_NAME_LENGTH -1] = '\0';
    if ((fd = open(device_name, O_RDWR)) == -1) {
        perror("open failed");
        exit(EXIT_SUCCESS);
    }
    printf("%s() : Success\n", __FUNCTION__);
}
But I see (confirmed from dmesg) that the driver's open() call is not being executed. I am running the program with sudo privileges, yet without success. I supply /dev/myscull0 as the input.
In fact, after executing the user program, I see two entries in the /dev directory:
vm#vm:/dev$ ls -l | grep scull
crw-r--r-- 1 root root 247, 1 Feb 27 14:38 myscull0
---Sr-S--- 1 root root 0 Feb 27 14:38 myscull0
vm#vm:/dev$
The first entry was created by me using the mknod command; the second entry, with a strange set of permissions, appears after executing the user program.
Output :
/dev/myscull0
scull_open() : Success
Can anyone please help me see what I am doing wrong here?
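One thing visible in the posted code: fgets() keeps the trailing newline, so the name passed to open() is "/dev/myscull0\n" rather than "/dev/myscull0". That is a different file name from the device node, which would fit the second directory entry in the listing (creating it would need O_CREAT, which the posted code does not show, so treat that part as a guess). In C the usual fix is device_name[strcspn(device_name, "\n")] = '\0'; right after fgets(). The same effect can be illustrated in Python, where a line-based read also keeps the terminator; clean_device_path is a hypothetical helper name used only for this sketch.

```python
def clean_device_path(line):
    """Drop the line terminator that a line-based read leaves on the input."""
    return line.rstrip("\r\n")

raw = "/dev/myscull0\n"               # what a line-based read returns for this input
print(repr(raw))                      # the newline is part of the name
print(repr(clean_device_path(raw)))   # the name of the actual device node
```

Printing the name with repr() (or with %s between brackets in C) before calling open() is a quick way to confirm whether a stray character is present.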

BadIDChoice RENDER in python 3.3 and tk/tcl displayed on X

I have a fairly complicated GUI written with Python's tkinter running on Linux, and one of the components (which has a Text widget that updates frequently) causes the GUI to crash infrequently (about once a day).
The GUI is displayed on X servers running on both Mac OS X (through X11) and Gnome 2.28.2, with the same behavior. My Python version is 3.3 and my Tk/Tcl version is 8.5. The error I get is:
X Error of failed request: BadIDChoice (invalid resource ID chosen for this connection)
Major opcode of failed request: 148 (RENDER)
Minor opcode of failed request: 4 (RenderCreatePicture)
Resource id in failed request: 0x116517f
Serial number of failed request: 15106831
Current serial number in output stream: 15106872
An strace looks like:
11:03:29.632041 recvfrom(13, 0x3bae1d4, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:03:29.632059 recvfrom(13, 0x3bae1d4, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:03:29.632147 poll([{fd=13, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=13, revents=POLLOUT}])
11:03:29.632164 writev(13, [{"\224\4\5\0D\304\361\0\17\274\361\0i\4\0\0\0\0\0\0\224\27\n\0\3\f\340\0\301\v\340\0"..., 5032}, {NULL, 0}, {"", 0}], 3) = 5032
11:03:29.632193 poll([{fd=13, events=POLLIN}], 1, -1) = 1 ([{fd=13, revents=POLLIN}])
11:03:29.637040 recvfrom(13, "\0\16\302\276x\304\361\0\4\0\224\0\1\0\0\0`\16\330\3\1\0\0\0\243\304\342\210\377\177\0\0"..., 4096, 0, NULL, NULL) = 136
11:03:29.637135 open("/usr/share/X11/XErrorDB", O_RDONLY) = 35
11:03:29.637217 fstat(35, {st_mode=S_IFREG|0644, st_size=41532, ...}) = 0
11:03:29.637360 read(35, "!\n! Copyright 1993, 1995, 1998 "..., 41532) = 41532
11:03:29.637387 close(35) = 0
11:03:29.637820 write(2, "X Error of failed request: BadI"..., 91) = 91
...
My GUI is single-threaded (and uses the after() call to monitor sockets for I/O).
Does anyone know what might be wrong? Is there any better debugging that I could be doing to figure out what the X Error part means?
Infrequent crashes (once a day) with the following logs...
X Error of failed request: BadIDChoice (invalid resource ID chosen for this connection)
Major opcode of failed request: 148 (RENDER)
Minor opcode of failed request: 4 (RenderCreatePicture)
...appear to be a telltale signature of a known issue within xcb as mentioned in the following thread:
Bug 458092 - Crashes with BadIdChoice X errors
The patch for it is available here.
Based on the git history, this xcb bug should be fixed in libX11-1.1.99.2 and above (released about 8 years ago).
For further reference here is the email-thread with the complete discussion.

Could it be that users are added/removed in the background from /etc/passwd?

from pwd import getpwuid
getpwuid(48).pw_name
This Python program prints apache 99% of the time. 48 is the id that appears in /etc/passwd for the apache user. Without any apparent reason, Python sometimes prints the error:
KeyError: 'getpwuid(): uid not found: 48'
I need to understand why this happens sometimes. Can the apache user be removed from the file for some reason?
Here is the CPython 2.7 source code for the pwd module, particularly the getpwuid() call: https://github.com/python/cpython/blob/2.7/Modules/pwdmodule.c#L114 It looks like a wrapper around the system getpwuid call with not very much code - Python doesn't read from /etc/passwd directly.
Here's a current Ubuntu manpage (you didn't mention any particular OS) for (3) getpwuid: http://manpages.ubuntu.com/manpages/wily/man3/getpwuid.3posix.html which includes:
ERRORS
The getpwuid() and getpwuid_r() functions may fail if:
EIO An I/O error has occurred.
EINTR A signal was caught during getpwuid().
EMFILE {OPEN_MAX} file descriptors are currently open in the calling
process.
ENFILE The maximum allowable number of files is currently open in the
system.
Since you haven't mentioned any user management processes which might be regenerating your user accounts, I'm going to answer that no, apache doesn't get removed from /etc/passwd, but your webserver does hit some heavy IO or too many open files condition where reading /etc/passwd becomes impossible.
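To check the file-descriptor theory, the failing lookup could be wrapped so that it records how many descriptors the process holds at the moment of failure. This is only a sketch under assumptions: lookup_name is a hypothetical helper, /proc/self/fd is Linux-specific, and the count only hints at an EMFILE condition rather than proving one.

```python
import os
import pwd

def lookup_name(uid, lookup=pwd.getpwuid):
    """Return (name, None) on success or (None, diagnostic) when the lookup fails."""
    try:
        return lookup(uid).pw_name, None
    except KeyError as exc:
        # Count the descriptors this process holds; Linux-specific path.
        try:
            open_fds = len(os.listdir("/proc/self/fd"))
        except OSError:
            open_fds = None
        return None, "{}; open fds={}".format(exc, open_fds)
```

Logging the diagnostic string each time the name comes back as None would show whether the failures coincide with an unusually high descriptor count. The lookup parameter exists only so the function can be exercised with a fake lookup.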
That’s a very interesting phenomenon (and a great question) but I doubt that Apache is being removed from your /etc/passwd file.
On a GNU/Linux system, there are a number of different authentication mechanisms that can be used. In modern systems, the Name Service Switch (NSS) is used to resolve user names and IDs. This is configured in the passwd line of /etc/nsswitch.conf, e.g., the following configuration means that the /etc/passwd will be searched first and if the user or ID is not found, then a configured NIS server is used to determine the user name/ID.
passwd: files nis
However, in some systems, the NSS library functions might not actually be used to resolve a name request. Some systems may have a service such as nscd running. This is a daemon that caches name service requests, e.g., if the Apache user had previously been looked up, its name would be stored in the nscd cache and it would return the correct name or ID without /etc/passwd being searched.
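To see which sources the passwd line of /etc/nsswitch.conf names on a given machine, the file can be parsed in a few lines. This is a rough sketch, not a full parser of the nsswitch.conf syntax; passwd_sources is a hypothetical helper name, and bracketed action items such as [NOTFOUND=return] are simply dropped.

```python
import re

def passwd_sources(nsswitch_text):
    """Return the ordered source list from the 'passwd:' line, or [] if absent."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()        # drop comments
        if line.startswith("passwd:"):
            rest = line[len("passwd:"):]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)  # drop [STATUS=action] items
            return rest.split()
    return []

print(passwd_sources("passwd: files nis\n"))
```

On a real system the text would come from reading /etc/nsswitch.conf; any source other than files in the result means a lookup can leave /etc/passwd and depend on another service.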
Debugging
I would try debugging this issue by running the Python program through strace. At the very end of the output file, you should see the system calls that are used to retrieve the name of the user.
strace -o getpwuid_test.trace getpwuid_test.py
You would need to run this command enough times to catch the call to getpwuid failing to see why it failed. I, for one, would be interested to see the results.
Examples
Here’s an example of the output where no caching daemon is running and NSS is used to read the /etc/passwd file:
open("/etc/nsswitch.conf", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=1717, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa9000
read(3, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1717
read(3, "", 4096) = 0
close(3) = 0
...
open("/etc/passwd", O_RDONLY) = 3
fcntl64(3, F_GETFD) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=3012, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa9000
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 3012
close(3) = 0
...
write(1, "apache\n", 7)
Here’s an example where the nscd service is running and the NSS library is bypassed:
socket(PF_FILE, SOCK_STREAM, 0) = 3
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = 0
send(3, "\2\0\0\0\v\0\0\0\7\0\0\0passwd\0", 19, MSG_NOSIGNAL) = 19
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN|POLLHUP}])
recvmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"passwd\0", 7}, {"\270O\3\0\0\0\0\0", 8}], msg_controllen=16, {cmsg_len=16, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {4}}, msg_flags=0}, 0) = 15
...
write(1, "apache\n", 7)

Why does Linux keep searching other paths for a .so after it has already found it in one path?

This seems odd to me.
I put the path "/home/nemo/foxyserver_strace_opt/lib" in LD_LIBRARY_PATH as the first path to search.
Then I used strace to trace my program. It shows the following:
10695 16:12:58.227132 open("/home/nemo/foxyserver_strace_opt/lib/tls/x86_64/libselinux.so.1", O_RDONLY) = 3 <0.000018>
10695 16:12:58.227486 open("/home/nemo/foxyserver_strace_opt/lib/tls/libselinux.so.1", O_RDONLY) = 3 <0.000023>
10695 16:12:58.227937 open("/home/nemo/foxyserver_strace_opt/lib/x86_64/libselinux.so.1", O_RDONLY) = 3 <0.000018>
10695 16:12:58.228360 open("/home/nemo/foxyserver_strace_opt/lib/libselinux.so.1", O_RDONLY) = 3 <0.000016>
As you can see, the search for libselinux.so.1 succeeds: the library is right there in "/home/nemo/foxyserver_strace_opt/lib".
But further down the trace I see the following:
11144 16:13:01.536915 open("/home/nemo/foxyserver/server/libselinux.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
11144 16:13:01.537058 open("/home/nemo/foxyserver/extra/lib/tls/x86_64/libselinux.so.1", O_RDONLY) = 3 <0.000021>
11144 16:13:01.537391 open("/home/nemo/foxyserver/extra/lib/tls/libselinux.so.1", O_RDONLY) = 3 <0.000019>
11144 16:13:01.537679 open("/home/nemo/foxyserver/extra/lib/x86_64/libselinux.so.1", O_RDONLY) = 3 <0.000019>
11144 16:13:01.537990 open("/home/nemo/foxyserver/extra/lib/libselinux.so.1", O_RDONLY) = 3 <0.000017>
11144 16:13:01.538469 open("/lib64/libselinux.so.1", O_RDONLY) = 3 <0.000014>
As you can see, Linux searches new paths such as "/home/nemo/foxyserver/extra/lib/x86_64/" for libselinux.so.1 again, even though it already found the library in "/home/nemo/foxyserver_strace_opt/lib".
Why?
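One detail in the two excerpts is that they carry different PIDs (10695 and 11144), so they come from different processes, and each process runs its own library search. Whether the second process also has a different LD_LIBRARY_PATH or run path cannot be told from the excerpt. Grouping the trace by PID makes this easier to see; the sketch below assumes the line layout shown above (PID, timestamp, call), and opens_by_pid is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 10695 16:12:58.227132 open("/path/lib.so.1", O_RDONLY) = 3 <0.000018>
OPEN_RE = re.compile(
    r'^(\d+)\s+\S+\s+open(?:at)?\((?:AT_FDCWD,\s*)?"([^"]+)".*\)\s+=\s+(-?\d+)'
)

def opens_by_pid(trace_lines):
    """Map pid -> list of (path, return value) for every open()/openat() line."""
    grouped = defaultdict(list)
    for line in trace_lines:
        match = OPEN_RE.match(line)
        if match:
            pid, path, ret = match.groups()
            grouped[int(pid)].append((path, int(ret)))
    return dict(grouped)
```

Feeding it the lines of the strace output file gives one list of attempted paths per process, which can then be compared against each process's environment.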

Shared Library can not be opened even though it exists: file not found

I am trying to test an ODBC driver which was delivered to me as a shared library. I've placed the shared library into /usr/local/lib and added an entry for it in /etc/odbcinst.ini as well as a connection entry (DSN) into /etc/odbc.ini
When I try to use the isql command to test this connection, I get the following:
isql -v test
[01000][unixODBC][Driver Manager]Can't open lib '/usr/local/lib/splcODBC.so' : file not found
Interestingly enough, when I do an strace -o t isql -v test I see the following info in the trace file:
open("/usr/local/lib/splcODBC.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \262R\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=15473838, ...}) = 0
mmap(NULL, 13298832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe2065a9000
mprotect(0x7fe206fcd000, 2093056, PROT_NONE) = 0
mmap(0x7fe2071cc000, 442368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xa23000) = 0x7fe2071cc000
mmap(0x7fe207238000, 130192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe207238000
close(3)
which seems to suggest that the file is found and successfully opened.
File information is as follows:
ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=0xa37555c3810ff5e5c78083139a3ca0232793a2e3, not stripped
Does anybody have any ideas what's going on here?
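Since the trace shows the file itself being opened and mapped, one possibility is that the load fails at a later step (for example an unresolved dependency or symbol) and the driver manager reports it with a generic message. Loading the library directly shows the dynamic loader's own error text. This is a minimal sketch, assuming a Python interpreter of the same architecture as the library; try_load is a hypothetical helper name.

```python
import ctypes

def try_load(path):
    """Return None if the library loads, else the loader's error message."""
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return str(exc)
    return None

print(try_load("/usr/local/lib/splcODBC.so"))
```

If this prints a message naming another shared object, running ldd on the driver should list the same dependency as not found.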
