How can I examine a process' image? - linux

First I find the process' pid with ps:
% ps -a | grep 'a.out'
output:
36296 pts/0 00:00:07 a.out
Then I get an image of this process with gcore:
% sudo gcore 36296
output:
0x0000558eab27d131 in main ()
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile core.36296
[Inferior 1 (process 36296) detached]
Then, hex dump object:
% hd core.36296 | grep 'HOME'
output:
001f4a90 3d 32 00 48 4f 4d 45 3d 2f 68 6f 6d 65 2f 63 61 |=2.HOME=/home/ca|
Now, I'm trying to find the section where environment variables is loaded. How can I do this ?

You should use a debugger!
For linux, gcc and gdb you can do:
> gdb <executable> <core-file>
Within gdb you now can examine the environment from the core file:
(gdb) print ((char**)__environ)[0]
$1 = 0x7ffc6aba0a58 "SHELL=/bin/bash"
(gdb) print ((char**)__environ)[1]
$2 = 0x7ffc6aba0a68 "SESSION_MANAGER=local/unix:#/tmp/.ICE-unix/1873,unix/unix:/tmp/.ICE-unix/1873"
unless you get a string with length 0.
If you do not have an executable with debug infos, you also can try to find the text with:
strings –a <core-file>
But before you write a core file and try to search in it, you simply can get the environment from a process by using ps if your program is still running:
ps eww <pid>

Related

Is there a bug fix for sporadic SIGSEGV crashes of the BlueZ bluetoothd (version 5.50) in Debian 10?

I am developing software for a commercial product that runs on a Moxa MPC-2070 panel computer (Intel Atom based) under Debian 10 (Buster) with BlueZ (5.50) bluetooth support. The application has been developed using Qt Creator. I have been struggling to find a robust and reliable method to scan for Bluetooth Low Energy devices.
Because of an extreme performance problem associated with the QBluetoothDiscoveryAgent::start() method in Qt (which I won't go into here), I am using the bluetoothctl command to perform BLE device scanning. A wrapper around bluetoothctl provides it with input commands and parses the output from bluetoothctl. Sporadically (once every 1 - 150 times) that I launch bluetoothctl to perform the BLE scan, the bluetooth daemon process (bluetoothd) crashes with a SIGSEGV.
Here is the tail of syslog after the bluetoothd crash:
[315398.536280] show_signal_msg: 8 callbacks suppressed
[315398.536293] bluetoothd[523]: segfault at a8ec8148fd ip 00007f681ba3e143 sp 00007ffc8110a858 error 4 in libdbus-1.so.3.19.11[7f681ba2f000+2e000]
[315398.536343] Code: 85 ed 74 13 0a 18 88 18 48 83 c4 08 5b 5d c3 0f 1f 84 00 00 00 00 00 f7 d3 22 18 88 18 48 83 c4 08 5b 5d c3 0f 1f 00 48 8b 07 <0f> b6 40 02 85 f0 0f
95 c0 0f b6 c0 c3 55 48 89 fd 53 89 f3 48 83
I have restarted bluetoothd with the -d flag to enable debug output via:
$ sudo bluetoothd -d &
And again ran the bluetoothctl scans in a loop until bluetoothd again crashed. The full syslog showing the bluetoothd crash can be found here: Complete syslog with bluetoothd SIGSEGV
In the above syslog, the initial bluetoothd (without -d) crash can be found at Jan 14 09:58:55.
The restart of bluetoothd with the -d flag is at Jan 14 10:03:16.
The looping use of bluetoothctl begins at Jan 14 10:06:03.
bluetoothd again SIGSEGVs at Jan 14 10:05:13.
Sometimes the bluetoothd crashes happen after only 1 or 2 bluetoothctl commands, and other times it takes many iterations before the crash occurs.
This shell script will reproduce the bluetoothd crash. It loops performing essentially the same function as my C bluetoothctl wrapper program, but without the bluetoothctl output processing. Note that this script must be run as root or by a user id which is a member of the 'bluetooth' group.
#! /bin/bash
COUNT=0
RESULT=0
while [ "${RESULT}" != "9" ]
do
COUNT=`expr ${COUNT} + 1`
echo "Loop #${COUNT}"
# uveTagScanner -s FEA0 ${#} # The compiled bluetoothctl wrapper program with output processing
# RESULT="$?"
( echo "menu scan" # Enter the bluetoothctl scan sub-menu
echo "clear" # Clear all filter parameters
echo "transport le" # Filter scanning for low-energy devices only
echo "duplicate-data off" # Disable reporting of duplicate-data
echo "back" # Exit the bluetoothctl scan sub-menu & return to main menu
echo "scan on" # Start scanning for LE devices
sleep 10 # Let scanning proceed for 10 seconds
echo "scan off" # Stop scanning for LE devices
echo "quit" # Quit the bluetoothctl command
) | bluetoothctl
done
Within my C wrapper program (uveTagScanner) which fork()/exec()s bluetoothctl and performs the output processing, I am able to detect if bluetoothd has crashed and then restart it. But this is only a band-aid solution, as it still leaves me with instances where the scanning for BLE devices does not provide the needed information.
I'm running out of ideas on how to reliably perform BLE device scanning! I could try using the BlueZ libraries and Dbus interface APIs instead of bluetoothctl, but I fear that the same bluetoothd crash would occur.

GNU debugger __text_start () at wrong filepath

First a little preface: I'm using the Windows Subsystem for Linux and the Gaisler BCC version of GCC for cross-compiling (aka "machine-gcc" where the machine is sparc-gaisler-elf in this case).
I compile a simple hello world program for debugging like this
$ sparc-gaisler-elf-gcc -g hello.c -o hello
Then I open the simulator for the particular processor with the GNU debugger (GDB) as a server
$ tsim-leon3 -gdb
...
gdb interface: using port 1234
Starting GDB server. Use Ctrl-C to stop waiting for connection.
In another bash session I start a remote GDB and connect to the server
$ sparc-gaisler-elf-gdb -ex "target extended-remote localhost:1234"
...
Remote debugging using localhost:1234
This works fine. But if I try to load the hello executable I get a problem
$ sparc-gaisler-elf-gdb -ex "target extended-remote localhost:1234" hello
...
Remote debugging using localhost:1234
__text_start () at /opt/bcc-2.0.4-gcc/src/libbcc/shared/trap/trap_table_mvt.S:167
167 /opt/bcc-2.0.4-gcc/src/libbcc/shared/trap/trap_table_mvt.S: No such file or directory.
in /opt/bcc-2.0.4-gcc/src/libbcc/shared/trap/trap_table_mvt.S
Current language: auto; currently asm
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /mnt/c/Users/<username>/bcc-2.0.4-gcc/src/examples/hello/hello
Program received signal SIGSEGV, Segmentation fault.
__text_start () at /opt/bcc-2.0.4-gcc/src/libbcc/shared/trap/trap_table_mvt.S:167
167 in /opt/bcc-2.0.4-gcc/src/libbcc/shared/trap/trap_table_mvt.S
Now, with my Windows Subsystem for Linux setup I have the particular file it's looking for at
/mnt/c/Users/<username>/bcc-2.0.4-gcc/src/libbcc/shared/trap/trap_table_mvt.S
instead of in /opt/bcc-2.0.4-gcc/...
How can I tell it where to find this file?
Update
I tried setting dir as per Employed Russian's answer
(gdb) dir /mnt/c/Users/<user>/bcc-2.0.4-gcc/src/libbcc/shared/trap
Source directories searched: /mnt/c/Users/<user>/bcc-2.0.4-gcc/src/libbcc/shared/trap:$cdir:$cwd
(gdb) list
162 BAD_TRAP; BAD_TRAP; BAD_TRAP; BAD_TRAP; ! 78 - 7B undefined
163 BAD_TRAP; BAD_TRAP; BAD_TRAP; BAD_TRAP; ! 7C - 7F undefined
164
165 /* trap_instruction 0x80 - 0xFF */
166 /* NOTE: "ta 5" can be generated by compiler. */
167 SOFT_TRAP; ! 0 System calls
168 SOFT_TRAP; ! 1 Breakpoints
169 SOFT_TRAP; ! 2 Division by zero
170 FLW_TRAP; ! 3 Flush windows
171 SOFT_TRAP; ! 4 Clean windows
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /mnt/c/Users/<user>/bcc-2.0.4-gcc/src/examples/hello/hello
Program received signal SIGSEGV, Segmentation fault.
__text_start () at /opt/bcc-2.0.4-gcc/src/libbcc/shared/trap/trap_table_mvt.S:167
167 SOFT_TRAP; ! 0 System calls
Even though it's still saying /opt/... it seems to have found the right file now. However, I don't know why it's giving a segmentation fault.
How can I tell it where to find this file?
With the directory command.
(gdb) dir /mnt/c/Users/<username>/bcc-2.0.4-gcc/src/libbcc/shared/trap
(gdb) list # should find the file in the new location
See also source path and set substitite-path.

Transferring a TLS/SSL certificate via serial

I need to send a PEM-formatted certificate for storaging on a module that can be communicated with through the AT command set via a serial interface on one of Linux device nodes in /dev.
So far I've been using mostly
echo 'AT' > /dev/ttyX
to issue the necessary commands and it has done the trick just fine.
Any output from the device is monitored with cat /dev/ttyX on another terminal window.
I now have a certificate file encoded with ANSI. The documentation tells me to input it to the module using only LF line breaks and to terminate the input with Ctrl+Z, which I believe is hex 0x1A. The document also specifies that the certificate file may not end with an EOF character. I have used a hex editor to verify that the file is formatted as it should be.
I've tried to use both echo and printf to send the certificate chars / string to the module.
I have tried to include the 0x1A character in both the file and send it separately after the certificate chars like so:
printf '\x1a' > /dev/ttyX
or alternatively
echo -n -e '\x1a' > /dev/ttyX
The module seems to acknowledge the 0x1A as it stops the >-prompt for certificate and gives me the most verbose reply ever: ERROR
Generally, I'm sending the certificate file contents as follows:
echo -e "$(cat certfile)" > /dev/ttyX
or
printf '%b' "$(cat certfile)" > /dev/ttyX
Please assume that I have access to basic Linux shell tools (such as echo, printf, nano, stty and so on) with no option to trivially install new ones. I use SSH to access the target device and pscp to transfer the file to the target device. I also have a Windows rig on the side.
Any suggestions what else I should take into consideration? Maybe an stty option that I've missed? Does cat do something nasty in the input phase? A revealing trick to investigate the actual character data about to be send to the module? Some weird kink with serial comms I've missed?
If I
printf '%b' "$(cat cert)" > ./testoutput
and
od -x testoutput
the file looks alright in hex (I reordered the output from od -x manually, it seems to make pairs of the hex digits and switch them around). For example the end is:
2d 2d 2d 2d 2d 45 4e 44 20 43 45 52 54 49 46 49 43 41 54 45 2d 2d 2d 2d 2d 0a 1a 00
There must be something in stty or the receiving end that's causing trouble. Right?
For example the end is:
2d 2d 2d 2d 2d 45 4e 44 20 43 45 52 54 49 46 49 43 41 54 45 2d 2d 2d 2d 2d 0a 00 1a
Wait a sec. What's that 00 doing there, right before the 1a?
That doesn't belong. Try removing it.

Differences between objdump and xxd

I am trying to find a call function in a binary file, so I tried this:
Compile my code (in C),
Use the command: mips-mti-linux-gnu-objdump -d myapp.elf> objdump.txt
My function in objdump.txt file: 9d003350: 42000828 myfunction 0x1
Now, I want to identify this function in myapp.bin when reading this from memory. But, I get this: 28080042.
I tried to use the command: xxd -ps myapp.bin> xxd.txt
Just can find: 28080042.
Is it possible to do that?
That's an endianness conflict. objdump and xxd are giving you the same bytes, they're just using different endianness.
Actual bytes in order:
28 08 00 42
Big endian value:
28 08 00 42
Little endian value:
42 00 08 28
xxd -p will print out the individual bytes in the file in the order in which they exist.
objdump is disassembling it, it knows that the bytes belong in groups of 4, and it's interpreting them as little-endian.
xxd can print in little-endian order, using the -e flag (with a default grouping of 4 bytes, use the -g flag to change the number of bytes per group). However, this is incompatible with the -p flag, because the -p flag ignores any grouping.
objdump can be made to print in big-endian order, using the -EB flag, however, this will affect what instructions it reports.

How to check which symbols on my shared library have non-position independent code (PIC)?

I'm trying to build a .deb package with debuild -i -us -uc -b and in the end I see:
Now running lintian...
warning: the authors of lintian do not recommend running it with root privileges!
W: libluajit-5.1-2: hardening-no-relro usr/lib/powerpc64le-linux-gnu/libluajit-5.1.so.2.1.0
E: libluajit-5.1-2: shlib-with-non-pic-code usr/lib/powerpc64le-linux-gnu/libluajit-5.1.so.2.1.0
W: luajit: hardening-no-relro usr/bin/luajit-2.1.0-alpha
W: luajit: binary-without-manpage usr/bin/luajit-2.1.0-alpha
Finished running lintian.
I have a hunch that I failed to define a "PIC code setup", which must be at the beginning of each external function:
The following code might appear in a PIC code setup sequence to compute
the distance from a function entry point to the TOC base:
addis 2,12,.TOC.-func#ha
addi 2,2,.TOC.-func#l
as specified by the ABI, page 99.
However I couldn't find the symbols which were non-PIC. Or maybe some relevant file that was not compiled with -fPIC?
Info:
system architecture: ppc64le
compiling .so library with: gcc -shared -fPIC
To find which symbols made your elf non-PIC/PIE (Position Independent Code/Executable), use scanelf from pax-utils package (on ubuntu, install it with sudo apt-get install pax-utils):
$ scanelf -qT /usr/local/lib/libluajit-5.1.so.2.1.0 | head -n 3
libluajit-5.1.so.2.1.0: buf_grow [0x7694] in (optimized out: previous lj_BC_MODVN) [0x7600]
libluajit-5.1.so.2.1.0: buf_grow [0x769C] in (optimized out: previous lj_BC_MODVN) [0x7600]
libluajit-5.1.so.2.1.0: buf_grow [0x76A0] in (optimized out: previous lj_BC_MODVN) [0x7600]
$ objdump -Sa /usr/local/lib/libluajit-5.1.so.2.1.0 | grep -A5 \ 7694:
7694: 00 00 80 39 li r12,0
7698: c6 07 8c 79 rldicr r12,r12,32,31
769c: 00 00 8c 65 oris r12,r12,0
76a0: 00 00 8c 61 ori r12,r12,0
76a4: a6 03 89 7d mtctr r12
76a8: 21 04 80 4e bctrl
On my case an absolute address was meant to be load on r12, but that's not possible for a dynamic library, so the linker used 0 for that parameter (I had to use #GOT operator, but that's the particular solution to my case).
On the luajit program, it's possible to define the address on linking time and it looks like this:
1003d0d4: 00 00 80 39 li r12,0
1003d0d8: c6 07 8c 79 rldicr r12,r12,32,31
1003d0dc: 07 10 8c 65 oris r12,r12,4103
1003d0e0: 30 ca 8c 61 ori r12,r12,51760
1003d0e4: a6 03 89 7d mtctr r12
Quite different right?
a much detailed explanation can be found on this wonderful Gentoo wiki page.
The failing lintian check is this:
# Now that we're sure this is really a shared library, report on
# non-PIC problems.
if ($objdump->{$cur_file}->{TEXTREL}) {
tag 'shlib-with-non-pic-code', $cur_file;
}
So you can probably find the offending file by looking for a .o that contains a TEXTREL dynamic section (which is making its way into your final link).
To do this, you can use readelf --dyanamic, in something like the following:
find . -name '*.o' |
while read obj
do
if readelf --dynamic "$obj" | grep -q TEXTREL
then
echo "$obj contains a TEXTREL section"
fi
done

Resources