Identify virtual network interface in c - linux

I am trying to find which interface my device is running on with C.
I scanned all interfaces with ioctl, I arranged the result as :
iface 0 ==> lo IP = 127.0.0.1 FLAGS = 00000049 MAC: 00 00 00 00 00 00
iface 1 ==> eth0 iface no IP FLAGS = 00001043 MAC: 40 D6 3C 02 74 10
iface 2 ==> eth0.1 iface no IP FLAGS = 00001043 MAC: 40 D6 3C 02 74 10
iface 3 ==> ra0 iface no IP FLAGS = 00001043 MAC: 40 D6 3C 02 74 10
iface 4 ==> br-lan IP = 192.168.100.1 FLAGS = 00001043 MAC: 40 D6 3C 02 74 11
iface 5 ==> apcli0 IP = 192.168.1.17 FLAGS = 00001043 MAC: 42 D6 3C 02 74 10
iface 6 ==> mon0 iface no IP FLAGS = 00001002 MAC: 40 D6 3C 02 74 10
I used getifaddrs() to get list of interfces, then ioctl (IP using SIOCGIFADDR), (flags using SIOCGIFFLAGS) enum net_device_flags, and (mac using SIOCGIFHWADDR).
From the list, I can identify the loopback, non-working interfaces that do not have IP. I still have two interfaces that are identical FLAGS and have IP (apcli0 & br-lan).
The br-lan is a virtual interface.
Is there away to identify if the interface is virtual (with C)?
Here is my code:
int i;
int fd;
int cnt = 0;
struct ifaddrs *addrs,*tmp;
char ibuf[256];
struct ifreq ifr;
fd = socket(AF_INET, SOCK_DGRAM, 0);
getifaddrs(&addrs);
tmp = addrs;
while (tmp)
{
if ( (tmp->ifa_addr) && (tmp->ifa_addr->sa_family == AF_PACKET)) {
printf("iface %i ==> %s\n", cnt, tmp->ifa_name);
// get ip
ifr.ifr_addr.sa_family = AF_INET;
strncpy(ifr.ifr_name, tmp->ifa_name, IFNAMSIZ);
if (ioctl(fd, SIOCGIFADDR, &ifr) == 0) {
printf("IP = %s\n", (char *)inet_ntoa(((struct sockaddr_in *)&ifr.ifr_addr)->sin_addr));
} else {
printf("iface no IP\n");
}
// flags
if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0) {
printf("FLAGS = %08X\n", ifr.ifr_flags);
} else {
printf("iface no flags\n");
}
// mac addr
if (ioctl(fd, SIOCGIFHWADDR, &ifr) == 0) {
memcpy(ibuf, ifr.ifr_hwaddr.sa_data, 6);
printf("MAC: ");
for (i=0; i<6; i++) {
printf("%02X ", ibuf[i] & 0xFF);
}
printf("\n");
} else {
printf("iface no mac\n");
}
cnt++;
}
tmp = tmp->ifa_next;
}
freeifaddrs(addrs);
Thanks.

I don't believe this is possible with information returned by getifaddrs(), because there are no SIOCGIFFLAGS which indicate that a device is virtual. You can find some of the virtual devices by, for example, checking for the IFF_LOOPBACK flag in iface->ifa_flags (loopback is a virtual interface), but there isn't information in the struct which says specifically whether the device is physical or virtual.
With reasonably recent Linux kernels (not sure starting one, but it has to at least provide a sysfs), I believe this can be determined by examining /sys/class/net -- the directories here will represent the interfaces reported by ifconfig and getifaddrs(), and will either be symlinked to a device in /sys/devices/platform (physical devices) or /sys/devices/virtual (virtual devices). I don't know of a clean way to do this in "regular C" (with nice sane function calls and then checking a flag for physical or virtual), but it seems possible to do it by examining the directory structure in /sys.
edit: looks like pci devices, which are physical devices, can also show up under /sys/devices/pci0000:00 or similar, so not all physical devices are going to be in /sys/devices/physical, so a better check is that /sys/class/net directories aren't symlinked to /sys/devices/virtual directories...

Following c++ snippet is based on #sinback answer.
#include <iostream>
#include <filesystem>
#include <cstring>
#include <vector>
#include <limits.h>
#include <stdlib.h>
namespace fs = std::filesystem;
char buf[PATH_MAX];
int main()
{
std::string path = "/sys/class/net";
for (const auto & entry : fs::directory_iterator(path))
{
auto res = realpath(entry.path().c_str(), buf);
if (res) {
auto abs_path = fs::path(buf);
auto abs_path_splits = std::vector<fs::path>(abs_path.begin(), abs_path.end());
if (abs_path_splits[3].filename() == "virtual") {
std::cout << "VIRTUAL ";
} else {
std::cout << "PHYSICAL ";
}
std::cout << "DEVICE " << entry.path() << " evaluates to " << buf << std::endl;
} else {
std::cerr << " Error (" << errno << "): " << strerror(errno) << std::endl;
return EXIT_FAILURE;
}
}
return 0;
}

Related

How to convert hex_dump of packets, which were captured in kernel module, to pcap file?

I am writing a kernel module on Linux (Xubuntu x64). The version of the kernel is 5.4.0-52-generic. My kernel module is capturing traffic from an interface and printing it in hex:
Nov 10 14:04:34 ubuntu kernel: [404009.566887] Packet hex dump:
Nov 10 14:04:34 ubuntu kernel: [404009.566889] 000000 00 00 00 00 00 00 00 00 00 00 00 00 08 00 45 00
Nov 10 14:04:34 ubuntu kernel: [404009.566899] 000010 00 54 49 4C 40 00 40 01 A7 EF C0 A8 64 0E C0 A8
Nov 10 14:04:34 ubuntu kernel: [404009.566907] 000020 64 0E 08 00 9E FE 00 03 00 08 72 0E AB 5F 00 00
Nov 10 14:04:34 ubuntu kernel: [404009.566914] 000030 00 00 7B B5 01 00 00 00 00 00 10 11 12 13 14 15
Nov 10 14:04:34 ubuntu kernel: [404009.566922] 000040 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25
Nov 10 14:04:34 ubuntu kernel: [404009.566929] 000050 26 27 28 29
This output I've got using this command under root: tail -f /var/log/kern.log
The whole problem is that I need to save this output as pcap-file. I know that there is text2pcap but its library (libpcap) is user-mode only so I can't use it in kernel module (or maybe not? Correct me if I'm wrong).
Is it possible to use text2pcap in kernel module? Otherwise, How can I save an output as pcap file while being in kernel module?
Source code:
#include <linux/module.h> // included for all kernel modules
#include <linux/kernel.h> // included for KERN_INFO
#include <linux/init.h> // included for __init and __exit macros
#include <linux/skbuff.h> // included for struct sk_buff
#include <linux/if_packet.h> // include for packet info
#include <linux/ip.h> // include for ip_hdr
#include <linux/netdevice.h> // include for dev_add/remove_pack
#include <linux/if_ether.h> // include for ETH_P_ALL
#include <linux/unistd.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Tester");
MODULE_DESCRIPTION("Sample linux kernel module program to capture all network packets");
struct packet_type ji_proto;
void pkt_hex_dump(struct sk_buff *skb)
{
size_t len;
int rowsize = 16;
int i, l, linelen, remaining;
int li = 0;
uint8_t *data, ch;
printk("Packet hex dump:\n");
data = (uint8_t *) skb_mac_header(skb);
if (skb_is_nonlinear(skb)) {
len = skb->data_len;
} else {
len = skb->len;
}
remaining = len;
for (i = 0; i < len; i += rowsize) {
printk("%06d\t", li);
linelen = min(remaining, rowsize);
remaining -= rowsize;
for (l = 0; l < linelen; l++) {
ch = data[l];
printk(KERN_CONT "%02X ", (uint32_t) ch);
}
data += linelen;
li += 10;
printk(KERN_CONT "\n");
}
}
int ji_packet_rcv (struct sk_buff *skb, struct net_device *dev,struct packet_type *pt, struct net_device *orig_dev)
{
printk(KERN_INFO "New packet captured.\n");
/* linux/if_packet.h : Packet types */
// #define PACKET_HOST 0 /* To us */
// #define PACKET_BROADCAST 1 /* To all */
// #define PACKET_MULTICAST 2 /* To group */
// #define PACKET_OTHERHOST 3 /* To someone else */
// #define PACKET_OUTGOING 4 /* Outgoing of any type */
// #define PACKET_LOOPBACK 5 /* MC/BRD frame looped back */
// #define PACKET_USER 6 /* To user space */
// #define PACKET_KERNEL 7 /* To kernel space */
/* Unused, PACKET_FASTROUTE and PACKET_LOOPBACK are invisible to user space */
// #define PACKET_FASTROUTE 6 /* Fastrouted frame */
switch (skb->pkt_type)
{
case PACKET_HOST:
printk(KERN_INFO "PACKET to us − ");
break;
case PACKET_BROADCAST:
printk(KERN_INFO "PACKET to all − ");
break;
case PACKET_MULTICAST:
printk(KERN_INFO "PACKET to group − ");
break;
case PACKET_OTHERHOST:
printk(KERN_INFO "PACKET to someone else − ");
break;
case PACKET_OUTGOING:
printk(KERN_INFO "PACKET outgoing − ");
break;
case PACKET_LOOPBACK:
printk(KERN_INFO "PACKET LOOPBACK − ");
break;
case PACKET_FASTROUTE:
printk(KERN_INFO "PACKET FASTROUTE − ");
break;
}
//printk(KERN_CONT "Dev: %s ; 0x%.4X ; 0x%.4X \n", skb->dev->name, ntohs(skb->protocol), ip_hdr(skb)->protocol);
struct ethhdr *ether = eth_hdr(skb);
//printk("Source: %x:%x:%x:%x:%x:%x\n", ether->h_source[0], ether->h_source[1], ether->h_source[2], ether->h_source[3], ether->h_source[4], ether->h_source[5]);
//printk("Destination: %x:%x:%x:%x:%x:%x\n", ether->h_dest[0], ether->h_dest[1], ether->h_dest[2], ether->h_dest[3], ether->h_dest[4], ether->h_dest[5]);
//printk("Protocol: %d\n", ether->h_proto);
pkt_hex_dump(skb);
kfree_skb (skb);
return 0;
}
static int __init ji_init(void)
{
/* See the <linux/if_ether.h>
When protocol is set to htons(ETH_P_ALL), then all protocols are received.
All incoming packets of that protocol type will be passed to the packet
socket before they are passed to the protocols implemented in the kernel. */
/* Few examples */
//ETH_P_LOOP 0x0060 /* Ethernet Loopback packet */
//ETH_P_IP 0x0800 /* Internet Protocol packet */
//ETH_P_ARP 0x0806 /* Address Resolution packet */
//ETH_P_LOOPBACK 0x9000 /* Ethernet loopback packet, per IEEE 802.3 */
//ETH_P_ALL 0x0003 /* Every packet (be careful!!!) */
//ETH_P_802_2 0x0004 /* 802.2 frames */
//ETH_P_SNAP 0x0005 /* Internal only */
ji_proto.type = htons(ETH_P_IP);
/* NULL is a wildcard */
//ji_proto.dev = NULL;
ji_proto.dev = dev_get_by_name (&init_net, "enp0s3");
ji_proto.func = ji_packet_rcv;
/* Packet sockets are used to receive or send raw packets at the device
driver (OSI Layer 2) level. They allow the user to implement
protocol modules in user space on top of the physical layer. */
/* Add a protocol handler to the networking stack.
The passed packet_type is linked into kernel lists and may not be freed until
it has been removed from the kernel lists. */
dev_add_pack (&ji_proto);
printk(KERN_INFO "Module insertion completed successfully!\n");
return 0; // Non-zero return means that the module couldn't be loaded.
}
static void __exit ji_cleanup(void)
{
dev_remove_pack(&ji_proto);
printk(KERN_INFO "Cleaning up module....\n");
}
module_init(ji_init);
module_exit(ji_cleanup);
The problem was solved using call_usermodehelper() to call user-mode text2pcap with arguments as if text2pcap was called using terminal.
Is it possible to use text2pcap in kernel module?
Not without putting it and the code it uses to write a pcap file (which isn't from libpcap, it's from a small library that's part of Wireshark, also used by dumpcap to write pcap and pcapng files) into the kernel.
How can I save an output as pcap file while being in kernel module?
You could write your own code to open a file and write to it in the kernel module; "Writing to a file from the Kernel" talks about that.
It also says
A "preferred" technique would be to pass the parameters in via IOCTLs and implement a read() function in your module. Then reading the dump from the module and writing into the file from userspace.
so you might want to consider that; the userspace code could just use libpcap to write the file.

recvfrom(2) receives UDP broadcast twice, but tcpdump(8) receives it only once

Summary: I want to receive packets from a single interface, but setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE, iface, 1 + strlen(iface)) doesn't play well with recvfrom and shows all packets on all interfaces. tcpdump works well, however.
I have a strong feeling that there's something wrong with the receiver program, but I haven't been able to figure it out.
I'm working with a Netronome Agilio CX SmartNIC. The two ports on the NIC are connected together with one cable, and the port on the motherboard are connected to the wall (so I can SSH into it). The board-loaded NIC is eth0 in the OS, while the SmartNIC presents two interfaces as enp1s0np0 and enp1s0np1.
Because the two interfaces on the SmartNIC has no associated IP addresses, I have to send broadcast to one port so it arrives at the other port. For now, I send to enp1s0np0 and expect it from enp1s0np1.
I have also deployed a XDP offload program that modifies part of the packet so I can know whether the packet arrives at enp1s0np1. The program changes the string at position 28~35 to another string (of the form !......!).
The problem I am having is, I wrote a receiver program myself, and it receives two packets for every packet I send - the first is the original, while the second is the XDP-modified packet. However, tcpdump only receives the modified packet (expected behavior).
I am unsure why my program is getting it twice - I don't think it should be able to see the unmodified packet.
This is the packet sender program. It reads 32 double-precision floating point numbers and packs them into a 256-byte block, and prepends the block with 16 bytes of "magic numbers".
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include "config.h"
#include "util.h"
typedef unsigned char byte;
void sanity_check(void);
int main(int argc, char** argv) {
sanity_check();
int sock = socket(AF_INET, SOCK_DGRAM, 0);
if (sock < 0)
errorexit("socket");
char iface[16] = "enp1s0np0";
if (setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE, iface, 1 + strlen(iface)))
errorexit("setsockopt");
int optval = -1;
if (setsockopt(sock, SOL_SOCKET, SO_BROADCAST | SO_REUSEADDR, &optval, sizeof(int)))
errorexit("setsockopt");
struct sockaddr_in target_addr = {
.sin_family = AF_INET,
.sin_addr.s_addr = 0xFFFFFFFF,
.sin_port = htons(6666)
};
byte buf[272];
// Prepare data
{
unsigned long magic = MAGIC;
memcpy(buf + 0, &magic, sizeof magic);
unsigned long zero = 0UL;
memcpy(buf + 8, &zero, sizeof zero);
double data;
for (int i = 0; i < 32; i++) {
scanf(" %lf", &data);
memcpy(buf + 16 + 8 * i, &data, sizeof data);
}
}
int sent = sendto(sock, buf, sizeof(buf), 0, (struct sockaddr*)&target_addr, sizeof(struct sockaddr));
if (sent != 0)
errorexit("send");
printf("%d bytes sent.\n", sent);
// if (shutdown(sock, SHUT_RDWR))
if (close(sock))
errorexit("close");
return 0;
}
void sanity_check(void) {
if (getuid() || geteuid()) {
fprintf(stderr, "Need root to proceed\n");
exit(1);
}
}
This is the receiver program. In fact, it's receiving every single packet that comes into the machine, with most of them being SSH data. I had to add checks for the magic number or it just spams the terminal. I guess it just failed to listen to the specific interface. (Check is if (buf[28] != '!' || buf[35] != '!') continue;)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include "config.h"
#include "util.h"
typedef unsigned char byte;
void sanity_check(void);
int main(int argc, char** argv) {
sanity_check();
int sock = socket(AF_PACKET, SOCK_DGRAM, htons(3));
if (sock < 0)
errorexit("socket");
char iface[16] = "enp1s0np1";
if (setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE, iface, 1 + strlen(iface)))
errorexit("setsockopt");
struct sockaddr_in target_addr = {
.sin_family = AF_INET,
.sin_addr.s_addr = htonl(INADDR_ANY),
.sin_port = htons(UDP_PORT)
};
size_t bufsize = 8192;
byte *buf = malloc(bufsize);
unsigned long magic = MAGIC;
int saddr_size = sizeof(struct sockaddr);
printf("Preparing to receive... ");
fflush(stdout);
while (1) {
int received = recvfrom(sock, buf, bufsize, 0,(struct sockaddr *)&target_addr , (socklen_t*)&saddr_size);
if (received < 0)
errorexit("receive");
if (received == 0)
break;
else if (buf[28] != '!' || buf[35] != '!') // Magic check
continue;
printf("%d bytes received.\n", received);
hexdump(buf, received);
}
if (close(sock))
errorexit("close");
free(buf);
return 0;
}
void sanity_check(void) {
if (getuid() || geteuid()) {
fprintf(stderr, "Need root to proceed\n");
exit(1);
}
}
The file util.c (link to Gist) contains two utility functions (errorexit which is just a wrapper of perror and exit, and a badly hand-crafted hexdump function for displaying) and is irrelevant here.
The constant MAGIC is defined as
#define MAGIC 0x216C7174786A7A21UL // string "!zjxtql!"
Here's the console output of my program (recv.c compiled into recv) and the tcpdump command, with irrelevant data truncated. Before both programs are killed, only one packet is sent from the sender program. The special thing to note is the data at position 28 (was originally !zjxtql!, should be modified to !wjfskb! by the XDP offload program).
$ sudo ./recv
Preparing to receive...
300 bytes received.
00000000 45 00 01 2C 4C 1A 40 00 40 11 B4 7F 72 D6 C6 51 |E..,L.#.#...r..Q|
00000010 FF FF FF FF B0 9F 1A 0A 01 18 3A 51 21 7A 6A 78 |..........:Q!zjx|
00000020 74 71 6C 21 00 00 00 00 00 00 00 00 29 5C 8F C2 |tql!........)\..|
00000120 14 AE F7 3F AE 47 E1 7A 14 AE F7 3F |...?.G.z...?|
0000012C
300 bytes received.
00000000 45 00 01 2C 4C 1A 40 00 40 11 B4 7F 72 D6 C6 51 |E..,L.#.#...r..Q|
00000010 FF FF FF FF B0 9F 1A 0A 01 18 C0 9B 21 77 6A 66 |............!wjf|
00000020 73 6B 62 21 00 00 00 00 00 00 00 00 29 5C 8F C2 |skb!........)\..|
00000120 14 AE F7 3F AE 47 E1 7A 14 AE F7 3F |...?.G.z...?|
0000012C
^C
$ sudo tcpdump -vv -X -i enp1s0np1 port 6666
tcpdump: listening on enp1s0np1, link-type EN10MB (Ethernet), capture size 262144 bytes
04:53:52.819657 IP (tos 0x0, ttl 64, id 8595, offset 0, flags [DF], proto UDP (17), length 300)
agilio1415.47585 > 255.255.255.255.ircu-2: [bad udp cksum 0xb759 -> 0xc274!] UDP, length 272
0x0000: 4500 012c 2193 4000 4011 df06 72d6 c651 E..,!.#.#...r..Q
0x0010: ffff ffff b9e1 1a0a 0118 b759 2177 6a66 ...........Y!wjf
0x0020: 736b 6221 0000 0000 0000 0000 295c 8fc2 skb!........)\..
0x0120: 14ae f73f ae47 e17a 14ae f73f ...?.G.z...?
^C
I have tried straceing tcpdump and trying its job with setsockopt:
sudo strace -e setsockopt tcpdump -vv -X -i enp1s0np1 port 6666
which gives
setsockopt(3, SOL_PACKET, PACKET_ADD_MEMBERSHIP, {mr_ifindex=if_nametoindex("enp1s0np1"), mr_type=PACKET_MR_PROMISC, mr_alen=0, mr_address=}, 16) = 0
setsockopt(3, SOL_PACKET, PACKET_AUXDATA, [1], 4) = 0
setsockopt(3, SOL_PACKET, PACKET_VERSION, [1], 4) = 0
setsockopt(3, SOL_PACKET, PACKET_RESERVE, [4], 4) = 0
setsockopt(3, SOL_PACKET, PACKET_RX_RING, 0x7ffe9d8d1510, 28) = 0
setsockopt(7, SOL_SOCKET, SO_RCVBUF, [8388608], 4) = 0
setsockopt(7, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0
setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, {len=1, filter=0x7fa5289de000}, 16) = 0
setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, {len=24, filter=0x56025631f280}, 16) = 0
tcpdump: listening on enp1s0np1, link-type EN10MB (Ethernet), capture size 262144 bytes
because I don't understand the others, I mimicked only the first call to setsockopt of tcpdump:
struct packet_mreq mreq = {
.mr_ifindex = if_nametoindex(iface),
.mr_type = PACKET_MR_PROMISC,
.mr_alen = 0
};
if (setsockopt(sock, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mreq, sizeof(mreq)))
errorexit("setsockopt");
The above code is a replacement for the setsockopt(SO_BINDTODEVICE) call in the receiver program, but I haven't observed any difference (still all packets from all interfaces are caught).
It looks like all that I'm missing is a bind(2). Unfortunately, SO_BINDTOINTERFACE doesn't work with AF_PACKET, so bind(2) is the only solution.
The code isn't any complex:
struct sockaddr_ll sll = {
.sll_family = AF_PACKET,
.sll_ifindex = if_nametoindex(iface),
.sll_protocol = htons(3) // 3 = ETH_P_ALL
};
if (bind(sock, (struct sockaddr*)&sll, sizeof sll))
errorexit("sock");
From the same socket.7 page:
SO_BINDTOSOCKET
... Note that this works only for some socket types, particularly AF_INET sockets. It is not supported for packet sockets (use normal bind(2) there).
Hmmm, guess I should've read the manual more thoroughly.

how to work with references using ffi?

I have folowing c++ function compiled with node-gyp.
void InterfaceTestOne(const string & mytest)
{
const string *ptr = &mytest;
printf("The variable X is at 0x%p\n", (void *)ptr);
cout << mytest.length() << endl;
cout << mytest << endl;
}
and nodejs code:
var math = ffi.Library(main, {
"_Z16InterfaceTestOneRKSsi": ["void", [ref.refType(string)]],
});
var output = ref.allocCString('hello alex');
console.log("length", output.length) #12
console.log("address", output.address()) #4353704472
console.log("ref", output.ref()) # <Buffer#0x103804228 18 42 80 03 01 00 00 00>
math._Z16InterfaceTestOneRKSsi(output.ref())
#c++ output
The variable X is at 0x0x103804230
4347093136 <-- mytest size
This is TestOne
hello alex�����s�P��X��0��
����������������������p
p�]���^#��psp���5�����#4������(�����G����� ��qf���6�ՃP���f���6�Ճ���q�X؃ \���u������������3������3������!Ą
p�?�����p �J
����h����������8n���������8v
�d�����s����� ����������(������a......
So instead of string size 12 i got in c++ size 4353704472
ref address in nodejs and c++ also is difference 0x103804228 and 0x0x103804230.
and looks that's why i got in addition to my text a ton of garbage.
Any idea how to fix that?

Why my spi test C code get this result?

At the bottom is the spi test code (spitest.c) I used, and when running it on my linux kit, I got this result:
root#abcd-kit:/system # ./spitest
open device: /dev/spidev0.0
set spi mode: 0
set bits per word: 8
set max speed: 2000000 Hz (2 MHz)
the received data is below:
00 00 00 00 30 30
30 0A 00 00 00 00
00 00 00 00 2F 73
the received data is below:
00 00 00 00 30 30
30 0A 00 00 00 00
00 00 00 00 2F 73
...
dmesg output:
<7>[ 1254.714088] usif-spi e1100000.usif1: Pushing msg a8085ed0
<6>[ 1254.714367] SPI XFER :ae81c700 , Length : 18
<6>[ 1254.714404] TX Buf :a6207000 , TX DMA : (null)
<6>[ 1254.714425] RX Buf :92bf5000 , RX DMA : (null)
<6>[ 1254.714445] CS change:0, bits/w :8, delay : 0 us, speed : 2000000 Hz
<7>[ 1254.714471] TX--->:31 a5 bb 00 00 bb fc 76 80 84 1e 00 5c 29 7d 77
<7>[ 1254.714491] TX--->:44 b9
<7>[ 1254.714511] RX--->:00 00 00 00 30 30 30 0a 00 00 00 00 00 00 00 00
<7>[ 1254.714534] RX--->:2f 73
<7>[ 1254.714558] usif-spi e1100000.usif1: Msg a8085ed0 completed with status 0
<7>[ 1255.725936] usif-spi e1100000.usif1: Pushing msg a8085ed0
<6>[ 1255.726472] SPI XFER :ae81cc40 , Length : 18
<6>[ 1255.726604] TX Buf :a6207000 , TX DMA : (null)
<6>[ 1255.726656] RX Buf :92bf5000 , RX DMA : (null)
<6>[ 1255.726706] CS change:0, bits/w :8, delay : 0 us, speed : 2000000 Hz
<7>[ 1255.726773] TX--->:31 a5 bb 00 00 bb fc 76 94 29 7d 77 5c 29 7d 77
<7>[ 1255.726829] TX--->:44 b9
<7>[ 1255.726875] RX--->:00 00 00 00 30 30 30 0a 00 00 00 00 00 00 00 00
<7>[ 1255.726925] RX--->:2f 73
And the biggest problem is that I cannot get correct result from miso pin (read is wrong, can do write correctly). Whatever I do, e.g. connect miso to ground or 1.8V, it always give this kind of result. The fisrt 5 data are always zero (I think it is because tx buffer has size of 5 and it is half duplex), and then followed random data, even that I used memset() to set rx buffer data to be zero before each spi transfer. And if I stop the program and run it again, the data changed but they are still random.
How could I read correct data from miso pin?
Thanks!
spitest.c
/*
* SPI testing utility (using spidev driver)
*
* Copyright (c) 2007 MontaVista Software, Inc.
* Copyright (c) 2007 Anton Vorontsov <avorontsov#ru.mvista.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License.
*
* Cross-compile with cross-gcc -I/path/to/cross-kernel/include
*/
#include <stdint.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include "spidev.h"
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
static void pabort(const char *s)
{
perror(s);
abort();
}
static const char *device = "/dev/spidev0.0";
static uint8_t mode;
static uint8_t bits = 8;
static uint32_t speed = 2000000;
static uint16_t delay;
#define LENGTH 18
static void transfer(int fd)
{
int ret, i;
uint8_t tx[5] = {0x31, 0xa5, 0xbb};
uint8_t rx[LENGTH] = {0, };
struct spi_ioc_transfer tr = {
.tx_buf = (unsigned long)tx,
.rx_buf = (unsigned long)rx,
.len = LENGTH,
.delay_usecs = delay,
.speed_hz = speed,
.bits_per_word = bits, //important, bits = 8 means byte transfer is possible
};
memset(rx, 0, LENGTH);
ret = ioctl(fd, SPI_IOC_MESSAGE(1), &tr);
if (ret < 1)
pabort("can't send spi message\n");
printf("the received data is below:\n");
for (ret = 0; ret < LENGTH; ret++) { //print the received data, by Tom Xue
if (!(ret % 6))
puts("");
printf("%.2X ", rx[ret]);
}
puts("");
}
int main(int argc, char *argv[])
{
int ret = 0;
int fd;
unsigned char rd_buf[32];
fd = open(device, O_RDWR);
if (fd < 0)
pabort("can't open device\n");
/*
* * spi mode
* */
ret = ioctl(fd, SPI_IOC_WR_MODE, &mode);
if (ret == -1)
pabort("can't set spi mode\n");
ret = ioctl(fd, SPI_IOC_RD_MODE, &mode);
if (ret == -1)
pabort("can't get spi mode\n");
/*
* * bits per word
* */
ret = ioctl(fd, SPI_IOC_WR_BITS_PER_WORD, &bits);
if (ret == -1)
pabort("can't set bits per word\n");
ret = ioctl(fd, SPI_IOC_RD_BITS_PER_WORD, &bits);
if (ret == -1)
pabort("can't get bits per word\n");
/*
* * max speed hz
* */
ret = ioctl(fd, SPI_IOC_WR_MAX_SPEED_HZ, &speed);
if (ret == -1)
pabort("can't set max speed hz\n");
ret = ioctl(fd, SPI_IOC_RD_MAX_SPEED_HZ, &speed);
if (ret == -1)
pabort("can't get max speed hz\n");
printf("open device: %s\n", device);
printf("set spi mode: %d\n", mode);
printf("set bits per word: %d\n", bits);
printf("set max speed: %d Hz (%d MHz)\n", speed, speed/1000000);
while(1){
transfer(fd);
//read(fd, rd_buf, 4);
//printf("rd_buf = %s, %d, %d, %d, %d\n", rd_buf, rd_buf[0], rd_buf[1], rd_buf[2], rd_buf[3]);
//memset(rd_buf, 0, 10);
sleep(1);
}
close(fd);
return ret;
}
More:
My CPU is Intel Sofia-3gr, I guess its spec is not publicly released. I see the Tx data from my oscilloscope, and confirmed that TX is right.
I can also printk the pinmux/pinctrl setting (use ioremap and ioread32), it is also right. I say it right also because that I can see how to set it as SPI from those dts reference files, I just follow them.
Key:
I just find that the SPI TX interrupt is pending each time a spi transfer starts, but no SPI RX interrupt pending. Hence the spi driver code will not read the RX data at all. As the reason, I don't know.

Capturing user-space assembly with ftrace and kprobes (by using virtual address translation)?

Apologies for the longish post, I'm having trouble formulating it in a shorter way. Also, maybe this is more appropriate for Unix & Linux Stack Exchange, but I'll try here at SO first, as there is an ftrace tag.
Anyways - I'd like to observe do machine instructions of a user program execute in the context of a full function_graph capture using ftrace. One problem is that I need this for an older kernel:
$ uname -a
Linux mypc 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 18:00:43 UTC 2012 i686 i686 i386 GNU/Linux
... and in this edition, there is no UPROBES - which, as Uprobes in 3.5 [LWN.net] notes, should be able to do something like that. (As long as I don't have to patch the original kernel, I would be willing to try a kernel module built out of tree, as User-Space Probes (Uprobes) [chunghwan.com] seems to demonstrate; but as far as I can see from 0: Inode based uprobes [LWN.net], the 2.6 would probably need a full patch)
However, on this version, there is a /sys/kernel/debug/kprobes, and /sys/kernel/debug/tracing/kprobe_events; and Documentation/trace/kprobetrace.txt implies that a kprobe can be set directly on an address; even if I cannot find an example anywhere on how this is used.
In any case, I would still not be sure what addresses to use - as a small example, let's say I want to trace the start of the main function of the wtest.c program (included below). I can do this to compile and obtain an machine instruction assembly listing:
$ gcc -g -O0 wtest.c -o wtest
$ objdump -S wtest | less
...
08048474 <main>:
int main(void) {
8048474: 55 push %ebp
8048475: 89 e5 mov %esp,%ebp
8048477: 83 e4 f0 and $0xfffffff0,%esp
804847a: 83 ec 30 sub $0x30,%esp
804847d: 65 a1 14 00 00 00 mov %gs:0x14,%eax
8048483: 89 44 24 2c mov %eax,0x2c(%esp)
8048487: 31 c0 xor %eax,%eax
char filename[] = "/tmp/wtest.txt";
...
return 0;
804850a: b8 00 00 00 00 mov $0x0,%eax
}
...
I would set up ftrace logging via this script:
sudo bash -c '
KDBGPATH="/sys/kernel/debug/tracing"
echo function_graph > $KDBGPATH/current_tracer
echo funcgraph-abstime > $KDBGPATH/trace_options
echo funcgraph-proc > $KDBGPATH/trace_options
echo 0 > $KDBGPATH/tracing_on
echo > $KDBGPATH/trace
echo 1 > $KDBGPATH/tracing_on ; ./wtest ; echo 0 > $KDBGPATH/tracing_on
cat $KDBGPATH/trace > wtest.ftrace
'
You can see a portion of the (otherwise complex) resulting ftrace log in debugging - Observing a hard-disk write in kernel space (with drivers/modules) - Unix & Linux Stack Exchange (where I got the example from).
Basically, I'd want a printout in this ftrace log, when the first instructions of main - say, the instructions at 0x8048474, 0x8048475, 0x8048477, 0x804847a, 0x804847d, 0x8048483 and 0x8048487 - are executed by (any) CPU. The problem is, as far as I can understand from Anatomy of a Program in Memory : Gustavo Duarte, these addresses are the virtual addresses, as seen from the perspective of the process itself (and I gather, the same perspective is shown by /proc/PID/maps)... And apparently, for krpobe_event I'd need a physical address?
So, my idea would be: if I can find the physical addresses corresponding to the virtual addresses of the program disassembly (say by coding a kernel module, which would accept pid and address, and return the physical address via procfs), I could set up addresses as a sort of "tracepoints" via /sys/kernel/debug/tracing/kprobe_events in the above script - and hopefully get them in the ftrace log. Could this work, in principle?
One problem with this, I found on Linux(ubuntu), C language: Virtual to Physical Address Translation - Stack Overflow:
In user code, you can't know the physical address corresponding to a virtual address. This is information is simply not exported outside the kernel. It could even change at any time, especially if the kernel decides to swap out part of your process's memory.
...
Pass the virtual address to the kernel using systemcall/procfs and use vmalloc_to_pfn. Return the Physical address through procfs/registers.
However, vmalloc_to_pfn doesn't seem to be trivial either:
x86 64 - vmalloc_to_pfn returns 32 bit address on Linux 32 system. Why does it chop off higher bits of PAE physical address? - Stack Overflow
VA: 0xf8ab87fc PA using vmalloc_to_pfn: 0x36f7f7fc. But I'm actually expecting: 0x136f7f7fc.
...
The physical address falls between 4 to 5 GB. But I can't get the exact physical address, I only get the chopped off 32-bit address. Is there another way to get true physical address?
So, I'm not sure how reliably I could extract the physical addresses so they are traced by kprobes - especially since "it could even change at any time". But here, I would hope that since the program is small and trivial, there would be a reasonable chance that the program would not swap while being traced, allowing for a proper capture to be obtained. (So even if I have to run the debug script above multiple times, as long as I can hope to obtain a "proper" capture once out of 10 times (or even 100 times), I'd be OK with it.).
Note that I'd want an output through ftrace, so that the timestamps are expressed in the same domain (see Reliable Linux kernel timestamps (or adjustment thereof) with both usbmon and ftrace? - Stack Overflow for an illustration of a problem with timestamps). Thus, even if I could come up with, say, a gdb script, to run and trace the program from userspace (while simultaneously an ftrace capture is obtained) - I'd like to avoid that, as the overhead from gdb itself will show in the ftrace logs.
So, in summary:
Is the approach of obtaining (possibly through a separate kernel module) physical addresses from the virtual (from a disassembly of an executable) addresses - so they are used to trigger a kprobe_event logged by ftrace - worth pursuing? If so, are there any examples of kernel modules that can be used for this address translation purpose?
Could I otherwise use a kernel module to "register" a callback/handler function when a particular memory address is being executed? Then I could simply use a trace_printk in that function to have an ftrace log (or even without that, the handler function name itself should show in the ftrace log), and it doesn't seem there will be too much overhead with that...
Actually, in this 2007 posting, Jim Keniston - utrace-based uprobes: systemtap mailing list, there is a 11. Uprobes Example (added to Documentation/uprobes.txt), which seems to be exactly that - a kernel module registering a handler function. Unfortunately, it uses linux/uprobes.h; and I have only kprobes.h in my /usr/src/linux-headers-2.6.38-16/include/linux/. Also, on my system, even systemtap complains about CONFIG_UTRACE not being enabled (see this comment)... So if there's any other approach I could use to obtain a debug trace like I want, without having to recompile the kernel to get uprobes, it would be great to know...
wtest.c:
#include <stdio.h>
#include <fcntl.h> // O_CREAT, O_WRONLY, S_IRUSR
int main(void) {
char filename[] = "/tmp/wtest.txt";
char buffer[] = "abcd";
int fd;
mode_t perms = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;
fd = open(filename, O_RDWR|O_CREAT, perms);
write(fd,buffer,4);
close(fd);
return 0;
}
Obviously, this would be much easier with built-in uprobes on kernels 3.5+; but given that uprobes for my kernel 2.6.38 is a very deep-going patch (which I couldn't really isolate in a separate kernel module, so as to avoid patching the kernel), here is what I can note for a standalone module on 2.6.38. (Since I'm still unsure of many things, I would still like to see an answer that would corrects any misunderstandings in this post.)
I think I got somewhere, but not with kprobes. I'm not sure, but it seems I managed to get physical addresses right; however, kprobes documentation is specific that when using "#ADDR : fetch memory at ADDR (ADDR should be in kernel)"; and the physical addresses I get are below kernel boundary of 0xc0000000 (but then, 0xc0000000 is usually together with the virtual memory layout?).
So I used a hardware breakpoint instead - the module is below, however caveat emptor - it behaves randomly, and occasionally can cause a kernel oops!. By compiling the module, and running in bash:
$ sudo bash -c 'KDBGPATH="/sys/kernel/debug/tracing" ;
echo function_graph > $KDBGPATH/current_tracer ; echo funcgraph-abstime > $KDBGPATH/trace_options
echo funcgraph-proc > $KDBGPATH/trace_options ; echo 8192 > $KDBGPATH/buffer_size_kb ;
echo 0 > $KDBGPATH/tracing_on ; echo > $KDBGPATH/trace'
$ sudo insmod ./callmodule.ko && sleep 0.1 && sudo rmmod callmodule && \
tail -n25 /var/log/syslog | tee log.txt && \
sudo cat /sys/kernel/debug/tracing/trace >> log.txt
... I get a log. I want to trace the first two instructions of the main() of wtest, which for me are:
$ objdump -S wtest/wtest | grep -A3 'int main'
int main(void) {
8048474: 55 push %ebp
8048475: 89 e5 mov %esp,%ebp
8048477: 83 e4 f0 and $0xfffffff0,%esp
... at virtual addresses 0x08048474 and 0x08048475. In the syslog output, I could get, say:
...
[ 1106.383011] callmodule: parent task a: f40a9940 c: kworker/u:1 p: [14] s: stopped
[ 1106.383017] callmodule: - wtest [9404]
[ 1106.383023] callmodule: Trying to walk page table; addr task 0xEAE90CA0 ->mm ->start_code: 0x08048000 ->end_code: 0x080485F4
[ 1106.383029] callmodule: walk_ 0x8048000 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is # f63e5d80; *virtual (page_address) # (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
[ 1106.383049] callmodule: walk_ 0x80483c0 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is # f63e5d80; *virtual (page_address) # (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
[ 1106.383067] callmodule: walk_ 0x8048474 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is # f63e5d80; *virtual (page_address) # (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
[ 1106.383083] callmodule: physaddr : (0x080483c0 ->) 0x639ec3c0 : (0x08048474 ->) 0x639ec474
[ 1106.383106] callmodule: 0x08048474 id [3]
[ 1106.383113] callmodule: 0x08048475 id [4]
[ 1106.383118] callmodule: (( 0x08048000 is_vmalloc_addr 0 virt_addr_valid 0 ))
[ 1106.383130] callmodule: cont pid task a: eae90ca0 c: wtest p: [9404] s: runnable
[ 1106.383147] initcall callmodule_init+0x0/0x1000 [callmodule] returned with preemption imbalance
[ 1106.518074] callmodule: < exit
... meaning that it mapped the virtual address 0x08048474 to physical address 0x639ec474. However, the physical is not used for hardware breakpoints - there we can supply a virtual address directly to register_user_hw_breakpoint; however, we also need to supply the task_struct of the process too. With that, I can get something like this in the ftrace output:
...
597.907256 | 1) wtest-5339 | | handle_mm_fault() {
...
597.907310 | 1) wtest-5339 | + 35.627 us | }
597.907311 | 1) wtest-5339 | + 46.245 us | }
597.907312 | 1) wtest-5339 | + 56.143 us | }
597.907313 | 1) wtest-5339 | 1.039 us | up_read();
597.907317 | 1) wtest-5339 | 1.285 us | native_get_debugreg();
597.907319 | 1) wtest-5339 | 1.075 us | native_set_debugreg();
597.907322 | 1) wtest-5339 | 1.129 us | native_get_debugreg();
597.907324 | 1) wtest-5339 | 1.189 us | native_set_debugreg();
597.907329 | 1) wtest-5339 | | () {
597.907333 | 1) wtest-5339 | | /* callmodule: hwbp hit: id [3] */
597.907334 | 1) wtest-5339 | 5.567 us | }
597.907336 | 1) wtest-5339 | 1.123 us | native_set_debugreg();
597.907339 | 1) wtest-5339 | 1.130 us | native_get_debugreg();
597.907341 | 1) wtest-5339 | 1.075 us | native_set_debugreg();
597.907343 | 1) wtest-5339 | 1.075 us | native_get_debugreg();
597.907345 | 1) wtest-5339 | 1.081 us | native_set_debugreg();
597.907348 | 1) wtest-5339 | | () {
597.907350 | 1) wtest-5339 | | /* callmodule: hwbp hit: id [4] */
597.907351 | 1) wtest-5339 | 3.033 us | }
597.907352 | 1) wtest-5339 | 1.105 us | native_set_debugreg();
597.907358 | 1) wtest-5339 | 1.315 us | down_read_trylock();
597.907360 | 1) wtest-5339 | 1.123 us | _cond_resched();
597.907362 | 1) wtest-5339 | 1.027 us | find_vma();
597.907364 | 1) wtest-5339 | | handle_mm_fault() {
...
... where the traces corresponding to the assembly are marked by breakpoint id. Thankfully, they are right after another, as expected; however, ftrace has also captured some debug commands in-between. In any case, this is what I wanted to see.
Here are some notes about the module:
Most of the module is from Execute/invoke user-space program, and get its pid, from a kernel module ; where a user process is started and pid obtained
Since we have to get to the task_struct to get to the pid; here I save both (which is kind of redundant)
Where functions symbols are not exported; if the symbol is in kallsyms, then I use a function pointer to the address; else other needed functions are copied from source
I didn't know how to start the user-space process stopped, so after spawning I issue a SIGSTOP (which on its own, seems kind of unreliable at that point), and set state to __TASK_STOPPED).
I may still get status "runnable" where I don't expect it sometimes - however, if the init exits early with an error, I've noticed wtest hanging in process list long after it would have terminated naturally, so I guess that works.
To get absolute/physical addresses, I used Walking page tables of a process in Linux to get to the page corresponding to a virtual address, and then digging through kernel sources I found page_to_phys() to get to the address (internally via page frame number); LDD3 ch.15 helps with understanding relationship between pfn and physical address.
Since here I expect to have physical address, I don't use PAGE_SHIFT, but calculate offsets directly from objdump's assembly output - I am not 100% sure this is correct, though.
Note, ( see also How to get a struct page from any address in the Linux kernel ), the module output says that the virtual address 0x08048000 is neither is_vmalloc_addr nor virt_addr_valid; I guess, this should tell me, one couldn't have used neither vmalloc_to_pfn() nor virt_to_page() to get to its physical address !?
Setting up kprobes for ftrace from kernel space is kinda tricky (needs functions copied)
Trying to set a kprobe on the physical addresses I get (e.g. 0x639ec474), always results with "Could not insert probe(-22)"
Just to see if the format is parsed, I'm trying with the kallsyms address of the tracing_on() function (0xc10bcf60) below; that seems to work - because it raises a fatal "BUG: scheduling while atomic" (apparently, we're not meant to set breakpoints in module_init?). Bug is fatal, because it makes the kprobes directory dissapear from the ftrace debug directory
Just creating the kprobe would not make it appear in the ftrace log - it also needs to be enabled; the necessary code for enabling is there - but I've never tried it, because of the previous bug
Finally, the breakpoint setting is from Watch a variable (memory address) change in Linux kernel, and print stack trace when it changes?
I've never seen an example for setting an executable hardware breakpoint; it kept failing for me, until through kernel source search, I found that for HW_BREAKPOINT_X, attr.bp_len need to be set to sizeof(long)
If I try to printk the attr variable(s) - from _init or from the handler - something gets seriously messed up, and whatever variable I try to print next, I get value 0x5 (or 0x48) for it (?!)
Since I'm trying to use a single handler function for both breakpoints, the only reliable piece of info that survives from _init to the handler, able to differentiate between the two, seems to be bp->id
These id's are autoassigned, and seems they are not re-claimed if you unregister the breakpoints (I do not unregister them to avoid extra ftrace printouts).
As far as the randomness goes, I think this is because the process is not started in a stopped state; and by the time it gets stopped, it ends up in a different state (or, quite possibly, I'm missing some locking somewhere). Anyways, you can also expect in syslog:
[ 1661.815114] callmodule: Trying to walk page table; addr task 0xEAF68CA0 ->mm ->start_code: 0x08048000 ->end_code: 0x080485F4
[ 1661.815319] callmodule: walk_ 0x8048000 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is # f5772000; *virtual (page_address) # c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
[ 1661.815837] callmodule: walk_ 0x80483c0 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is # f5772000; *virtual (page_address) # c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
[ 1661.816846] callmodule: walk_ 0x8048474 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is # f5772000; *virtual (page_address) # c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
... that is, even with a proper task pointer (judging by start_code), only 0x0 is obtained as physical address. Sometimes you get the same outcome, but with start_code: 0x00000000 ->end_code: 0x00000000. And sometimes, a task_struct cannot be obtained, even if pid can:
[ 833.380417] callmodule:c: pid 7663
[ 833.380424] callmodule: everything all right; pid 7663 (7663)
[ 833.380430] callmodule: p is NULL - exiting
[ 833.516160] callmodule: < exit
Well, hopefully someone will comment and clarify some of the behavior of this module :)
Hope this helps someone,
Cheers!
Makefile:
EXTRA_CFLAGS=-g -O0
obj-m += callmodule.o
all:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
callmodule.c:
#include <linux/module.h>
#include <linux/slab.h> //kzalloc
#include <linux/syscalls.h> // SIGCHLD, ... sys_wait4, ...
#include <linux/kallsyms.h> // kallsyms_lookup, print_symbol
#include <linux/highmem.h> // ‘kmap_atomic’ (via pte_offset_map)
#include <asm/io.h> // page_to_phys (arch/x86/include/asm/io.h)
struct subprocess_infoB; // forward declare
// global variable - to avoid intervening too much in the return of call_usermodehelperB:
static int callmodule_pid;
static struct subprocess_infoB* callmodule_infoB;
#define TRY_USE_KPROBES 0 // 1 // enable/disable kprobes usage code
#include <linux/kprobes.h> // enable_kprobe
// for hardware breakpoint:
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
// define a modified struct (with extra fields) here:
struct subprocess_infoB {
struct work_struct work;
struct completion *complete;
char *path;
char **argv;
char **envp;
int wait; //enum umh_wait wait;
int retval;
int (*init)(struct subprocess_info *info);
void (*cleanup)(struct subprocess_info *info);
void *data;
pid_t pid;
struct task_struct *task;
unsigned long long last_page_physaddr;
};
struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
char **envp, gfp_t gfp_mask);
static inline int
call_usermodehelper_fnsB(char *path, char **argv, char **envp,
int wait, //enum umh_wait wait,
int (*init)(struct subprocess_info *info),
void (*cleanup)(struct subprocess_info *), void *data)
{
struct subprocess_info *info;
struct subprocess_infoB *infoB;
gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
int ret;
populate_rootfs_wait();
infoB = call_usermodehelper_setupB(path, argv, envp, gfp_mask);
printk(KBUILD_MODNAME ":a: pid %d\n", infoB->pid);
info = (struct subprocess_info *) infoB;
if (info == NULL)
return -ENOMEM;
call_usermodehelper_setfns(info, init, cleanup, data);
printk(KBUILD_MODNAME ":b: pid %d\n", infoB->pid);
// this must be called first, before infoB->pid is populated (by __call_usermodehelperB):
ret = call_usermodehelper_exec(info, wait);
// assign global pid (and infoB) here, so rest of the code has it:
callmodule_pid = infoB->pid;
callmodule_infoB = infoB;
printk(KBUILD_MODNAME ":c: pid %d\n", callmodule_pid);
return ret;
}
static inline int
call_usermodehelperB(char *path, char **argv, char **envp, int wait) //enum umh_wait wait)
{
return call_usermodehelper_fnsB(path, argv, envp, wait,
NULL, NULL, NULL);
}
static void __call_usermodehelperB(struct work_struct *work)
{
struct subprocess_infoB *sub_infoB =
container_of(work, struct subprocess_infoB, work);
int wait = sub_infoB->wait; // enum umh_wait wait = sub_info->wait;
pid_t pid;
struct subprocess_info *sub_info;
// hack - declare function pointers
int (*ptrwait_for_helper)(void *data);
int (*ptr____call_usermodehelper)(void *data);
// assign function pointers to verbatim addresses as obtained from /proc/kallsyms
int killret;
struct task_struct *spawned_task;
ptrwait_for_helper = (void *)0xc1065b60;
ptr____call_usermodehelper = (void *)0xc1065ed0;
sub_info = (struct subprocess_info *)sub_infoB;
if (wait == UMH_WAIT_PROC)
pid = kernel_thread((*ptrwait_for_helper), sub_info, //(wait_for_helper, sub_info,
CLONE_FS | CLONE_FILES | SIGCHLD);
else
pid = kernel_thread((*ptr____call_usermodehelper), sub_info, //(____call_usermodehelper, sub_info,
CLONE_VFORK | SIGCHLD);
spawned_task = pid_task(find_vpid(pid), PIDTYPE_PID);
// stop/suspend/pause task
killret = kill_pid(find_vpid(pid), SIGSTOP, 1);
if (spawned_task!=NULL) {
// does this stop the process really?
spawned_task->state = __TASK_STOPPED;
printk(KBUILD_MODNAME ": : exst %d exco %d exsi %d diex %d inex %d inio %d\n", spawned_task->exit_state, spawned_task->exit_code, spawned_task->exit_signal, spawned_task->did_exec, spawned_task->in_execve, spawned_task->in_iowait);
}
printk(KBUILD_MODNAME ": : (kr: %d)\n", killret);
printk(KBUILD_MODNAME ": : pid %d (%p) (%s)\n", pid, spawned_task,
(spawned_task!=NULL)?((spawned_task->state==-1)?"unrunnable":((spawned_task->state==0)?"runnable":"stopped")):"null" );
// grab and save the pid (and task_struct) here:
sub_infoB->pid = pid;
sub_infoB->task = spawned_task;
switch (wait) {
case UMH_NO_WAIT:
call_usermodehelper_freeinfo(sub_info);
break;
case UMH_WAIT_PROC:
if (pid > 0)
break;
/* FALLTHROUGH */
case UMH_WAIT_EXEC:
if (pid < 0)
sub_info->retval = pid;
complete(sub_info->complete);
}
}
struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
char **envp, gfp_t gfp_mask)
{
struct subprocess_infoB *sub_infoB;
sub_infoB = kzalloc(sizeof(struct subprocess_infoB), gfp_mask);
if (!sub_infoB)
goto out;
INIT_WORK(&sub_infoB->work, __call_usermodehelperB);
sub_infoB->path = path;
sub_infoB->argv = argv;
sub_infoB->envp = envp;
out:
return sub_infoB;
}
#if TRY_USE_KPROBES
// copy from /kernel/trace/trace_probe.c (is unexported)
int traceprobe_command(const char *buf, int (*createfn)(int, char **))
{
char **argv;
int argc, ret;
argc = 0;
ret = 0;
argv = argv_split(GFP_KERNEL, buf, &argc);
if (!argv)
return -ENOMEM;
if (argc)
ret = createfn(argc, argv);
argv_free(argv);
return ret;
}
// copy from kernel/trace/trace_kprobe.c?v=2.6.38 (is unexported)
#define TP_FLAG_TRACE 1
#define TP_FLAG_PROFILE 2
typedef void (*fetch_func_t)(struct pt_regs *, void *, void *);
struct fetch_param {
fetch_func_t fn;
void *data;
};
typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *, void *);
enum {
FETCH_MTD_reg = 0,
FETCH_MTD_stack,
FETCH_MTD_retval,
FETCH_MTD_memory,
FETCH_MTD_symbol,
FETCH_MTD_deref,
FETCH_MTD_END,
};
// Fetch type information table * /
struct fetch_type {
const char *name; /* Name of type */
size_t size; /* Byte size of type */
int is_signed; /* Signed flag */
print_type_func_t print; /* Print functions */
const char *fmt; /* Fromat string */
const char *fmttype; /* Name in format file */
// Fetch functions * /
fetch_func_t fetch[FETCH_MTD_END];
};
struct probe_arg {
struct fetch_param fetch;
struct fetch_param fetch_size;
unsigned int offset; /* Offset from argument entry */
const char *name; /* Name of this argument */
const char *comm; /* Command of this argument */
const struct fetch_type *type; /* Type of this argument */
};
struct trace_probe {
struct list_head list;
struct kretprobe rp; /* Use rp.kp for kprobe use */
unsigned long nhit;
unsigned int flags; /* For TP_FLAG_* */
const char *symbol; /* symbol name */
struct ftrace_event_class class;
struct ftrace_event_call call;
ssize_t size; /* trace entry size */
unsigned int nr_args;
struct probe_arg args[];
};
static int probe_is_return(struct trace_probe *tp)
{
return tp->rp.handler != NULL;
}
static int probe_event_enable(struct ftrace_event_call *call)
{
struct trace_probe *tp = (struct trace_probe *)call->data;
tp->flags |= TP_FLAG_TRACE;
if (probe_is_return(tp))
return enable_kretprobe(&tp->rp);
else
return enable_kprobe(&tp->rp.kp);
}
#define KPROBE_EVENT_SYSTEM "kprobes"
#endif // TRY_USE_KPROBES
// <<<<<<<<<<<<<<<<<<<<<<
static struct page *walk_page_table(unsigned long addr, struct task_struct *intask)
{
pgd_t *pgd;
pte_t *ptep, pte;
pud_t *pud;
pmd_t *pmd;
struct page *page = NULL;
struct mm_struct *mm = intask->mm;
callmodule_infoB->last_page_physaddr = 0ULL; // reset here, in case of early exit
printk(KBUILD_MODNAME ": walk_ 0x%lx ", addr);
pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd) || pgd_bad(*pgd))
goto out;
printk(KBUILD_MODNAME ": Valid pgd ");
pud = pud_offset(pgd, addr);
if (pud_none(*pud) || pud_bad(*pud))
goto out;
printk( ": Valid pud");
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd) || pmd_bad(*pmd))
goto out;
printk( ": Valid pmd");
ptep = pte_offset_map(pmd, addr);
if (!ptep)
goto out;
pte = *ptep;
page = pte_page(pte);
if (page) {
callmodule_infoB->last_page_physaddr = (unsigned long long)page_to_phys(page);
printk( ": page frame struct is # %p; *virtual (page_address) # %p (is_vmalloc_addr %d virt_addr_valid %d virt_to_phys 0x%llx) page_to_pfn %lx page_to_phys 0x%llx", page, page_address(page), is_vmalloc_addr((void*)page_address(page)), virt_addr_valid(page_address(page)), (unsigned long long)virt_to_phys(page_address(page)), page_to_pfn(page), callmodule_infoB->last_page_physaddr);
}
//~ pte_unmap(ptep);
out:
printk("\n");
return page;
}
static void sample_hbp_handler(struct perf_event *bp,
struct perf_sample_data *data,
struct pt_regs *regs)
{
trace_printk(KBUILD_MODNAME ": hwbp hit: id [%llu]\n", bp->id );
//~ unregister_hw_breakpoint(bp);
}
// ----------------------
static int __init callmodule_init(void)
{
int ret = 0;
char userprog[] = "/path/to/wtest";
char *argv[] = {userprog, "2", NULL };
char *envp[] = {"HOME=/", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
struct task_struct *p;
struct task_struct *par;
struct task_struct *pc;
struct list_head *children_list_head;
struct list_head *cchildren_list_head;
char *state_str;
unsigned long offset, taddr;
int (*ptr_create_trace_probe)(int argc, char **argv);
struct trace_probe* (*ptr_find_probe_event)(const char *event, const char *group);
//int (*ptr_probe_event_enable)(struct ftrace_event_call *call); // not exported, copy
#if TRY_USE_KPROBES
char trcmd[256] = "";
struct trace_probe *tp;
#endif //TRY_USE_KPROBES
struct perf_event *sample_hbp, *sample_hbpb;
struct perf_event_attr attr, attrb;
printk(KBUILD_MODNAME ": > init %s\n", userprog);
ptr_create_trace_probe = (void *)0xc10d5120;
ptr_find_probe_event = (void *)0xc10d41e0;
print_symbol(KBUILD_MODNAME ": symbol # 0xc1065b60 is %s\n", 0xc1065b60); // shows wait_for_helper+0x0/0xb0
print_symbol(KBUILD_MODNAME ": symbol # 0xc1065ed0 is %s\n", 0xc1065ed0); // shows ____call_usermodehelper+0x0/0x90
print_symbol(KBUILD_MODNAME ": symbol # 0xc10d5120 is %s\n", 0xc10d5120); // shows create_trace_probe+0x0/0x590
ret = call_usermodehelperB(userprog, argv, envp, UMH_WAIT_EXEC);
if (ret != 0)
printk(KBUILD_MODNAME ": error in call to usermodehelper: %i\n", ret);
else
printk(KBUILD_MODNAME ": everything all right; pid %d (%d)\n", callmodule_pid, callmodule_infoB->pid);
tracing_on(); // earlier, so trace_printk of handler is caught!
// find the task:
rcu_read_lock();
p = pid_task(find_vpid(callmodule_pid), PIDTYPE_PID);
rcu_read_unlock();
if (p == NULL) {
printk(KBUILD_MODNAME ": p is NULL - exiting\n");
return 0;
}
state_str = (p->state==-1)?"unrunnable":((p->state==0)?"runnable":"stopped");
printk(KBUILD_MODNAME ": pid task a: %p c: %s p: [%d] s: %s\n",
p, p->comm, p->pid, state_str);
// find parent task:
par = p->parent;
if (par == NULL) {
printk(KBUILD_MODNAME ": par is NULL - exiting\n");
return 0;
}
state_str = (par->state==-1)?"unrunnable":((par->state==0)?"runnable":"stopped");
printk(KBUILD_MODNAME ": parent task a: %p c: %s p: [%d] s: %s\n",
par, par->comm, par->pid, state_str);
// iterate through parent's (and our task's) child processes:
rcu_read_lock(); // read_lock(&tasklist_lock);
list_for_each(children_list_head, &par->children){
p = list_entry(children_list_head, struct task_struct, sibling);
printk(KBUILD_MODNAME ": - %s [%d] \n", p->comm, p->pid);
if (p->pid == callmodule_pid) {
list_for_each(cchildren_list_head, &p->children){
pc = list_entry(cchildren_list_head, struct task_struct, sibling);
printk(KBUILD_MODNAME ": - - %s [%d] \n", pc->comm, pc->pid);
}
}
}
rcu_read_unlock(); //~ read_unlock(&tasklist_lock);
// NOTE: here p == callmodule_infoB->task !!
printk(KBUILD_MODNAME ": Trying to walk page table; addr task 0x%X ->mm ->start_code: 0x%08lX ->end_code: 0x%08lX \n", (unsigned int) callmodule_infoB->task, callmodule_infoB->task->mm->start_code, callmodule_infoB->task->mm->end_code);
walk_page_table(0x08048000, callmodule_infoB->task);
// 080483c0 is start of .text; 08048474 start of main; for objdump -S wtest
walk_page_table(0x080483c0, callmodule_infoB->task);
walk_page_table(0x08048474, callmodule_infoB->task);
if (callmodule_infoB->last_page_physaddr != 0ULL) {
printk(KBUILD_MODNAME ": physaddr ");
taddr = 0x080483c0; // .text
offset = taddr - callmodule_infoB->task->mm->start_code;
printk(": (0x%08lx ->) 0x%08llx ", taddr, callmodule_infoB->last_page_physaddr+offset);
taddr = 0x08048474; // main
offset = taddr - callmodule_infoB->task->mm->start_code;
printk(": (0x%08lx ->) 0x%08llx ", taddr, callmodule_infoB->last_page_physaddr+offset);
printk("\n");
#if TRY_USE_KPROBES // can't use this here (BUG: scheduling while atomic, if probe inserts)
//~ sprintf(trcmd, "p:myprobe 0x%08llx", callmodule_infoB->last_page_physaddr+offset);
// try symbol for c10bcf60 - tracing_on
sprintf(trcmd, "p:myprobe 0x%08llx", (unsigned long long)0xc10bcf60);
ret = traceprobe_command(trcmd, ptr_create_trace_probe); //create_trace_probe);
printk("%s -- ret: %d\n", trcmd, ret);
// try find probe and enable it (compiles, but untested):
tp = ptr_find_probe_event("myprobe", KPROBE_EVENT_SYSTEM);
if (tp != NULL) probe_event_enable(&tp->call);
#endif //TRY_USE_KPROBES
}
hw_breakpoint_init(&attr);
attr.bp_len = sizeof(long); //HW_BREAKPOINT_LEN_1;
attr.bp_type = HW_BREAKPOINT_X ;
attr.bp_addr = 0x08048474; // main
sample_hbp = register_user_hw_breakpoint(&attr, (perf_overflow_handler_t)sample_hbp_handler, p);
printk(KBUILD_MODNAME ": 0x08048474 id [%llu]\n", sample_hbp->id); //
if (IS_ERR((void __force *)sample_hbp)) {
int ret = PTR_ERR((void __force *)sample_hbp);
printk(KBUILD_MODNAME ": Breakpoint registration failed (%d)\n", ret);
//~ return ret;
}
hw_breakpoint_init(&attrb);
attrb.bp_len = sizeof(long);
attrb.bp_type = HW_BREAKPOINT_X ;
attrb.bp_addr = 0x08048475; // first instruction after main
sample_hbpb = register_user_hw_breakpoint(&attrb, (perf_overflow_handler_t)sample_hbp_handler, p);
printk(KBUILD_MODNAME ": 0x08048475 id [%llu]\n", sample_hbpb->id); //45
if (IS_ERR((void __force *)sample_hbpb)) {
int ret = PTR_ERR((void __force *)sample_hbpb);
printk(KBUILD_MODNAME ": Breakpoint registration failed (%d)\n", ret);
//~ return ret;
}
printk(KBUILD_MODNAME ": (( 0x08048000 is_vmalloc_addr %d virt_addr_valid %d ))\n", is_vmalloc_addr((void*)0x08048000), virt_addr_valid(0x08048000));
kill_pid(find_vpid(callmodule_pid), SIGCONT, 1); // resume/continue/restart task
state_str = (p->state==-1)?"unrunnable":((p->state==0)?"runnable":"stopped");
printk(KBUILD_MODNAME ": cont pid task a: %p c: %s p: [%d] s: %s\n",
p, p->comm, p->pid, state_str);
return 0;
}
static void __exit callmodule_exit(void)
{
tracing_off(); //corresponds to the user space /sys/kernel/debug/tracing/tracing_on file
printk(KBUILD_MODNAME ": < exit\n");
}
module_init(callmodule_init);
module_exit(callmodule_exit);
MODULE_LICENSE("GPL");

Resources