Handling long probe paths in systemtap userspace scripts (variables)? - linux

I am trying to instrument a user-space program using systemtap, Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP; stap --version says: "Systemtap translator/driver (version 2.3/0.158, Debian version 2.3-1ubuntu1.4 (trusty))"
So I first tried getting a list of all callable functions, with:
stap -L 'process("/home/username/pathone/subpathtwo/subpaththree/subpathfour/subpathFive/subpathsix/subpathSeven/subpatheight/myprogramab").function("*").call' 2>&1 | tee /tmp/stap
This worked - however, note that the absolute path to my program is massive. So from the /tmp/stap file, I read some probes of interest, however, they are even longer, such as:
process("/home/username/pathone/subpathtwo/subpaththree/subpathfour/subpathFive/subpathsix/subpathSeven/subpatheight/myprogramab").function("DoSomething#/home/username/pathone/subpathtwo/subpaththree/subpathfour/subpathFive/src/BasicTestCodeFileInterface.cpp:201").call
... and as this is very difficult to read/unreadable for me, I wanted to do something to split this line into more readable pieces. The first thing I thought of was using variables, so I tried this test.stp script:
#!/usr/bin/env stap
global exepath = "/home/username/pathone/subpathtwo/subpaththree/subpathfour/subpathFive/subpathsix/subpathSeven/subpatheight/myprogramab"
global srcpath = "/home/username/pathone/subpathtwo/subpaththree/subpathfour/subpathFive/src"
probe begin {
printf("%s\n", exepath)
exit() # must have; else probe end runs only upon Ctrl-C
}
probe end {
newstr = "DoSomething#" . srcpath # concatenate strings
printf("%s\n", newstr)
}
This works, I can run sudo stap /path/to/test.stp, and I get the two strings printed.
However, when I try to use these strings in probes, they fail:
Writing a probe like this:
probe process(exepath).function("DoSomething#".srcpath."/BasicTestCodeFileInterface.cpp:201").call { ...
... fails with: parse error: expected literal string or number; saw: identifier 'exepath' ....
Trying to put the "process" part in a variable:
global tproc = process("/home/username/pathone/subpathtwo/subpaththree/subpathfour/subpathFive/subpathsix/subpathSeven/subpatheight/myprogramab")
... fails with parse error: expected literal string or number; saw: identifier 'process' ....
So, what options do I have, to somehow shorten the probe lines in a script?

Your best bet is probably to use macros:
#define part1 %( "/path/part/1" %)
#define part2 %( "/part/2" %)
probe process(#part1 #part2).function("...") { }
Note no explicit concatenation operator between the (macros that expand to) string literals. The parser will auto-concatenate them, just as in C.

Related

Get a symbol's value by its name in a sub

I'm making a package, where I have to get a symbol's value by its name in a sub, while the symbol is defined outside the sub.
Here is the simplified code, it works as expected:
#! /usr/bin/env perl6
sub dump_value($symbol) {
say ::("$symbol")
}
# usage:
my $x = 10;
dump_value('$x');
# expected output: 10
# actual output: 10
Then I put the 'dump_value' in a standalone file as below:
# somelib.pm6
unit module somelib;
sub dump_value($symbol) is export {
say ::("$symbol")
}
# client.pl6
#! /usr/bin/env perl6
use lib ".";
use somelib;
my $x = 10;
dump_value('$x');
The compiler complained:
No such symbol '$x'
in sub dump_value at xxx/somelib.pm6 (somelib) line 3
in block <unit> at ./client.pl6 line 8
Following are some experiments. None of them succeeded.
say ::("MY::$symbol")
say ::("OUR::$symbol")
say ::("OUTER::$symbol")
say ::("CLIENT::$symbol")
...
So how to fix the code?
UPDATE:
Thank you! CALLERS::($symbol) solved my original problem. But in a bit more complex situation, the complier complained again:
# somelib.pm6
unit module somelib;
sub dump_value(#symbols) is export {
# output: 6
say CALLERS::('$x');
# error: No such symbol 'CALLERS::$x'
say #symbols.map({ CALLERS::($^id) } )
}
# client.pl6
#! /usr/bin/env perl6
use lib ".";
use somelib;
my $x = 6;
my $y = 8;
dump_value(<$x $y>);
UPDATE AGAIN:
use OUTER::CALLERS::($^id).
UPDATE AGAIN AND AGAIN:
After I put the 'dump_value' in another sub, it didn't work any more!
# somelib.pm6
unit module somelib;
sub dump_value(#symbols) is export {
say #symbols.map({ OUTER::CALLERS::($^id) } )
}
sub wrapped_dump_value(#symbols) is export {
dump_value(#symbols)
}
#! /usr/bin/env perl6
use lib ".";
use somelib;
my $x = 6;
my $y = 8;
# ouput: (6 8)
dump_value(<$x $y>);
# error: No such symbol 'OUTER::CALLERS::$x'
wrapped_dump_value(<$x $y>);
According to the documentation:
An initial :: doesn't imply global. Here as part of the interpolation
syntax it doesn't even imply package. After the interpolation of the
::() component, the indirect name is looked up exactly as if it had
been there in the original source code, with priority given first to
leading pseudo-package names, then to names in the lexical scope
(searching scopes outwards, ending at CORE).
So when you write say ::("$symbol") in dump_value() in the somelib package, it will first lookup $symbol in the current scope, which has value '$x' then try to look up $x (also in the current scope), but the variable $x is defined in the caller's lexical scope, so you get the No such symbol '$x' error.
You can refer to the caller's lexical symbol given by the value of $symbol using either:
CALLER::MY::($symbol); # lexical symbols from the immediate caller's lexical scope
or
CALLERS::($symbol); # Dynamic symbols in any caller's lexical scope
see the package documentation page.
Couple of things:
use lib ".";
use somelib;
our $x = 10; # You need to export the value into the global scope
dump_value('$x');
Then, use the global scope:
unit module somelib;
sub dump_value($symbol) is export {
say GLOBAL::("$symbol")
}

Segmentation fault while trying to print parts of a pointer struct

I'm writing a program that must take user input to assign values to parts of a structure. I need to create a pointer to the structure that I will pass through as a one and only parameter for a function that will print each part of the structure individually. I also must malloc memory for the structure. As it is now, the program compiles and runs through main and asks the user for inputs. A segmentation fault occurs after the last user input is collected and when I'm assuming the call to the printContents function is run. Any help would be appreciated!
#include <stdio.h>
#include <stdlib.h>
struct info
{
char name[100], type;
int size;
long int stamp;
};
void printContents(struct info *iptr);
int main(void)
{
struct info *ptr=malloc(sizeof(struct info));
printf("Enter the type: \n");
scanf("%c", &(*ptr).type);
printf("Enter the filename: \n");
scanf("%s", (*ptr).name);
printf("Enter the access time: \n");
scanf("%d", &(*ptr).stamp);
printf("Enter the size: \n");
scanf("%d", &(*ptr).size);
printf("%c", (*ptr).type);
printContents(ptr);
}
void printContents(struct info *iptr)
{
printf("Filename %s Size %d Type[%s] Accessed # %d \n", (*iptr).name, (*iptr).size, (*iptr).type, (*iptr).stamp);
}
Check the operator precedence. Is this &(*ptr).type the thing you're trying to do? Maybe &((*ptr).type) ?
ptr->member is like access to structure variable right? Also same for scanf() usr &ptr->member to get value. For char input use only ptr->charmember .
First let's do it the hard way. We'll assume that the code is already written, the compiler tells us nothing useful, and we don't have a debugger. First we put in some diagnostic output statements, and we discover that the crash happens in printContents:
printf("testing four\n"); /* we see this, so the program gets this far */
printf("Filename %s Size %d Type[%s] Accessed # %d \n", (*iptr).name, (*iptr).size, (*iptr).type, (*iptr).stamp);
printf("testing five\n"); /* the program crashes before this */
If we still can't see the bug, we narrow the problem down by preparing a minimal compete example. (This is a very valuable skill.) We compile and run the code over and over, commenting things out. When we comment something out and the code still segfaults, we remove it entirely; but if commenting it out makes the problem go away, we put it back in. Eventually we get down to a minimal case:
#include <stdio.h>
int main(void)
{
char type;
type = 'a';
printf("Type[%s]\n", type);
}
Now it should be obvious: when we printf("%s", x) something, printf expects x to be a string. That is, x should be a pointer to (i.e. the address of) the first element of a character array which ends with a null character. Instead we've given it a character (in this case 'a'), which it interprets as a number (in this case 97), and it tries to go to that address in memory and start reading; we're lucky to get nothing worse than a segfault. The fix is easy: decide whether type should be a char or a char[], if it's char then change the printf statement to "%c", if it's char[] then change its declaration.
Now an easy way. If we're using a good compiler like gcc, it will warn us that we're doing something fishy:
gcc foo.c -o foo
foo.c:35: warning: format ‘%s’ expects type ‘char *’, but argument 4 has type ‘int’
In future, there's a way you can save yourself all this trouble. Instead of writing a lot of code, getting a mysterious bug and backtracking, you can write in small increments. If you had added one term to that printf statement at a time, you would have seen exactly when the bug appeared, and which term was to blame.
Remember: start small and simple, add complexity a little at a time, test at every step, and never add to code that doesn't work.

Fixing 'no symbolic type information' from dtrace in Linux?

Just documenting this: (self-answer to follow)
I'm aware that Sun's dtrace is not packaged for Ubuntu due to licensing issues; so I downloaded it and built it from source on Ubuntu - but I'm having an issue pretty much like the one in Simple dtraces not working · Issue #17 · dtrace4linux/linux · GitHub; namely loading of the driver seems fine:
dtrace-20130712$ sudo make load
tools/load.pl
23:20:31 Syncing...
23:20:31 Loading: build-2.6.38-16-generic/driver/dtracedrv.ko
23:20:34 Preparing symbols...
23:20:34 Probes available: 364377
23:20:44 Time: 13s
... however, if I try to run a simple script, it fails:
$ sudo ./build/dtrace -n 'BEGIN { printf("Hello, world"); exit(0); }'
dtrace: invalid probe specifier BEGIN { printf("Hello, world"); exit(0); }: "/path/to/src/dtrace-20130712/etc/sched.d", line 60: no symbolic type information is available for kernel`dtrace_cpu_id: Invalid argument
As per the issue link above:
(ctf requires a private and working libdwarf lib - most older releases have broken versions).
... I then built libdwarf from source, and then dtrace based on it (not trivial, requires manually finding the right placement of symlinks); and I still get the same failure.
Is it possible to fix this?
Well, after a trip to gdb, I figured that the problem occurs in dtrace's function dt_module_getctf (called via dtrace_symbol_type and, I think, dt_module_lookup_by_name). In it, I noticed that most calls propagate the attribute/variable dm_name = "linux"; but when the failure occurs, I'd get dm_name = "kernel"!
Note that original line 60 from sched.d is:
cpu_id = `dtrace_cpu_id; /* C->cpu_id; */
Then I found thr3ads.net - dtrace discuss - accessing symbols without type info [Nov 2006]; where this error message is mentioned:
dtrace: invalid probe specifier fbt::calcloadavg:entry {
printf("CMS_USER: %d, CMS_SYSTEM: %d, cpu_waitrq: %d\n",
`cpu0.cpu_acct[0], `cpu0.cpu_acct[1], `cpu0.cpu_waitrq);}: in action
list: no symbolic type information is available for unix`cpu0: No type
information available for symbol
So:
on that system, the request `cpu0.cpu_acct[0] got resolved to unix`cpu0;
and on my system, the request `dtrace_cpu_id got resolved to kernel`dtrace_cpu_id.
And since "The backtick operator is used to read the
value of kernel variables, which will be specific to the running kernel." (howto measure CPU load - DTrace General Discussion - ArchiveOrange), I thought maybe explicitly "casting" this "backtick variable" to linux would help.
And indeed it does - only a small section of sched.d needs to be changed to this:
translator cpuinfo_t < dtrace_cpu_t *C > {
cpu_id = linux`dtrace_cpu_id; /* C->cpu_id; */
cpu_pset = -1;
cpu_chip = linux`dtrace_cpu_id; /* C->cpu_id; */
cpu_lgrp = 0; /* XXX */
/* cpu_info = *((_processor_info_t *)`dtrace_zero); /* ` */ /* XXX */
};
inline cpuinfo_t *curcpu = xlate <cpuinfo_t *> (&linux`dtrace_curcpu);
... and suddenly - it starts working!:
dtrace-20130712$ sudo ./build/dtrace -n 'BEGIN { printf("Hello, world"); exit(0); }'
dtrace: description 'BEGIN ' matched 1 probe
CPU ID FUNCTION:NAME
1 1 :BEGIN Hello, world
PS:
Protip 1: NEVER do dtrace -n '::: { printf("Hello"); }' - this means "do a printf on each and every kernel event", and it will completely freeze the kernel; not even CTRL-Alt-Del will work!
Protip 2: If you want to use DTRACE_DEBUG as in Debugging DTrace, use sudo -E:
dtrace-20130712$ DTRACE_DEBUG=1 sudo -E ./build/dtrace -n 'BEGIN { printf("Hello, world"); exit(0); }'
libdtrace DEBUG: reading kernel .ctf: /path/to/src/dtrace-20130712/build-2.6.38-16-generic/linux-2.6.38-16-generic.ctf
libdtrace DEBUG: opened 32-bit /proc/kallsyms (syms=75761)
...

How to chown filesystem element by device:inode pair in perl, or a better solution

I have a relatively complex perl script which is walking over a filesystem and storing a list of updated ownership, then going over that list and applying the changes. I'm doing this in order to update changed UIDs. Because I have several situations where I'm swapping user a's and user b's UIDs, I can't just say "everything which is now 1 should be 2 and everything which is 2 should be 1", as it's also possible that this script could be interrupted, and the system would be left in a completely busted, pretty much unrecoverable state outside of "restore from backup and start over". Which would pretty much suck.
To avoid that problem, I do the two-pas approach above, creating a structure like $changes->{path}->\%c, where c has attributes line newuid, olduid, newgid, and olduid. I then freeze the hash, and once it's written to disk, I read the hash back in and start making changes. This way, if I'm interrupted, I can check to see if the frozen hash exists or not, and just start applying changes again if it does.
The drawback is that sometimes a changing user has literally millions of files, often with very long paths. This means I'm storing a lot of really long strings as hash keys, and I'm running out of memory sometimes. So, I've come up with two options. The one relevant to this question is to instead store the elements as device:inode pairs. That'd be way more space-efficient, and would uniquely identify filesystem elements. The drawback is that I haven't figured out a particularly efficient way to either get a device-relative path from the inode, or to just apply the stat() changes I want to the inode. Yes, I could do another find, and for each file do a lookup against my stored list of devices and inodes to see if a change is needed or not. But if there's a perl-accessible system call - which is portable across HP-UX, AIX, and Linux - from which I can directly just say "on this device make these changes to this inode", it'd be notably better from a performance perspective.
I'm running this across several thousand systems, some of which have filesystems in the petabyte range, having trillions of files. So, while performance may not make much of a differece on my home PC, it's actually somewhat significant in this scenario. :) That performance need, BTW, is why I really don't want to do the other option - which would be to bypass the memory problem by just tie-ing a hash to a disk-based file. And is why I'd rather do more work to avoid having to traverse the whole filesystem a second time.
Alternate suggestions which could reduce memory consumption are, of course, also welcome. :) My requirement is just that I need to record both the old and new UID/GID values, so I can back the changes out / validate changes / update files restored from backups taken prior to the cleanup date. I've considered making /path/to/file look like ${changes}->{root}->{path}->{to}->{file}, but that's a lot more work to traverse, and I dont know that it'll really save me enough memory space to resolve my problem. Collapsing the whole thing to ->{device}->{inode} makes it basically just the size of two integers rather than N characters, which is substatial for any path longer than, say, 2 chars. :)
Simplified idea
When I mentioned streaming, I didn't mean uncontrolled. A database journal (e.g.) is also written in streaming mode, for comparison.
Also note, that the statement that you 'cannot afford to sort even a single subdirectory' directly contradicts the use of a Perl hash to store the same info (I won't blame you if you don't have the CS background).
So here is a really simple illustration of what you could do. Note that every step on the way is streaming, repeatable and logged.
# export SOME_FIND_OPTIONS=...?
find $SOME_FIND_OPTIONS -print0 | ./generate_script.pl > chownscript.sh
# and then
sh -e ./chownscript.sh
An example of generate_script.pl (obviously, adapt it to your needs:)
#!/usr/bin/perl
use strict;
use warnings;
$/="\0";
while (<>)
{
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) = stat;
# demo purpose, silly translation:
my ($newuid, $newgid) = ($uid+1000, $gid+1000);
print "./chmod.pl $uid:$gid $newuid:$newgid '$_'\n"
}
You could have a system dependent implementation of chmod.pl (this helps to reduce complexity and therefore: risk):
#!/usr/bin/perl
use strict;
use warnings;
my $oldown = shift;
my $newown = shift;
my $path = shift;
($oldown and $newown and $path) or die "usage: $0 <uid:gid> <newuid:newgid> <path>";
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) = stat $path;
die "file not found: $path" unless $ino;
die "precondition failed" unless ($oldown eq "$uid:$gid");
($uid, $gid) = split /:/, $newown;
chown $uid, $gid, $path or die "unable to chown: $path"
This will allow you to restart when things bork midway, it will even allow you to hand-pick exceptions if necessary. You can save the scripts so you'll have accountability. I've done a reasonable stab at making the scripts operate safely. However, this is obviously just a starting point. Most importantly, I do not deal with filesystem crossings, symbolic links, sockets, device nodes where you might want to pay attention to them.
original response follows:
Ideas
Yeah, if performance is the issue, do it in C
Do not do persistent logging for the whole filesystem (by the way, why the need to keep them in a single hash? streaming output is your friend there)
Instead, log completed runs per directory. You could easily break the mapping up in steps:
user A: 1 -> 99
user B: 2 -> 1
user A: 99 -> 2
Ownify - what I use (code)
As long as you can reserve a range for temporary uids/guids like the 99 there won't be any risk on having to restart (not any more than doing this transnumeration on a live filesystem, anyway).
You could start from this nice tidbit of C code (which admittedly is not very highly optmized):
// vim: se ts=4 sw=4 et ar aw
//
// make: g++ -D_FILE_OFFSET_BITS=64 ownify.cpp -o ownify
//
// Ownify: ownify -h
//
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
/* old habits die hard. can't stick to pure C ... */
#include <string>
#include <iostream>
#define do_stat(a,b) lstat(a,b)
#define do_chown(a,b,c) lchown(a,b,c)
//////////////////////////////////////////////////////////
// logic declarations
//
void ownify(struct stat& file)
{
// if (S_ISLNK(file.st_mode))
// return;
switch (file.st_uid)
{
#if defined(PASS1)
case 1: file.st_uid = 99; break;
case 99: fputs(err, "Unexpected existing owned file!"); exit(255);
#elif defined(PASS2)
case 2: file.st_uid = 1; break;
#elif defined(PASS3)
case 99: file.st_uid = 1; break;
#endif
}
switch (file.st_gid) // optionally map groups as well
{
#if defined(PASS1)
#elif defined(PASS2)
#elif defined(PASS3)
#endif
}
}
/////////////////////////////////////////////////////////
// driver
//
static unsigned int changed = 0, skipped = 0, failed = 0;
static bool dryrun = false;
void process(const char* const fname)
{
struct stat s;
if (0==do_stat(fname, &s))
{
struct stat n = s;
ownify(n);
if ((n.st_uid!=s.st_uid) || (n.st_gid!=s.st_gid))
{
if (dryrun || 0==do_chown(fname, n.st_uid, n.st_gid))
printf("%u\tchanging owner %i:%i '%s'\t(was %i:%i)\n",
++changed,
n.st_uid, n.st_gid,
fname,
s.st_uid, s.st_gid);
else
{
failed++;
int e = errno;
fprintf(stderr, "'%s': cannot change owner %i:%i (%s)\n",
fname,
n.st_uid, n.st_gid,
strerror(e));
}
}
else
skipped++;
} else
{
int e = errno;
fprintf(stderr, "'%s': cannot stat (%s)\n", fname, strerror(e));
failed++;
}
}
int main(int argc, char* argv[])
{
switch(argc)
{
case 0: //huh?
case 1: break;
case 2:
dryrun = 0==strcmp(argv[1],"-n") ||
0==strcmp(argv[1],"--dry-run");
if (dryrun)
break;
default:
std::cerr << "Illegal arguments" << std::endl;
std::cout <<
argv[0] << " (Ownify): efficient bulk adjust of owner user:group for many files\n\n"
"Goal: be flexible and a tiny bit fast\n\n"
"Synopsis:\n"
" find / -print0 | ./ownify -n 2>&1 | tee ownify.log\n\n"
"Input:\n"
" reads a null-delimited stream of filespecifications from the\n"
" standard input; links are _not_ dereferenced.\n\n"
"Options:\n"
" -n/--dry-run - test run (no changes)\n\n"
"Exit code:\n"
" number of failed items" << std::endl;
return 255;
}
std::string fname("/dev/null");
while (std::getline(std::cin, fname, '\0'))
process(fname.c_str());
fprintf(stderr, "%s: completed with %u skipped, %u changed and %u failed%s\n",
argv[0], skipped, changed, failed, dryrun?" (DRYRUN)":"");
return failed;
}
Note that this comes with quite a few safety measures
paranoia check in first pass (check no fiels with reserved uid exists)
ability to change behaviour of do_stat and do_chown with regards to links
a --dry-run option (to observe what would be done) -n
The program will gladly tell you how to use it with ownify -h:
./ownify (Ownify): efficient bulk adjust of owner user:group for many files
Goal: be flexible and a tiny bit fast
Synopsis:
find / -print0 | ./ownify -n 2>&1 | tee ownify.log
Input:
reads a null-delimited stream of file specifications from the
standard input;
Options:
-n/--dry-run - test run (no changes)
Exit code:
number of failed items
A few possible solutions that come to mind:
1) Do not store a hash in the file, just a sorted list in any format that can be reasonably parsed serially. By sorting the list by filename, you should get the equivalent of running find again, without actually doing it:
# UID, GID, MODE, Filename
0,0,600,/a/b/c/d/e
1,1,777,/a/b/c/f/g
...
Since the list is sorted by filename, the contents of each directory should be bunched together in the file. You do not have to use Perl to sort the file, sort will do nicely in most cases.
You can then just read in the file line-by-line - or with any delimiter that will not mangle your filenames - and just perform any changes. Assuming that you can tell which changes are needed for each file at once, it does not sound as if you actually need the random-access capabilities of a hash, so this should do.
So the process would happen in three steps:
Create the change file
Sort the change file
Perform changes per the change file
2) If you cannot tell which changes each file needs at once, you could have multiple lines for each file, each detailing a part of the changes. Each line would be produced the moment you determine a needed change at the first step. You can then merge them after sorting.
3) If you do need random access capabilities, consider using a proper embedded database, such as BerkeleyDB or SQLite. There are Perl modules for most embedded databases around. This will not be quite as fast, though.

linux get process name from pid within kernel

hi
i have used sys_getpid() from within kernel to get process id
how can I find out process name from kernel struct? does it exist in kernel??
thanks very much
struct task_struct contains a member called comm, it contains executable name excluding path.
Get current macro from this file will get you the name of the program that launched the current process (as in insmod / modprobe).
Using above info you can use get the name info.
Not sure, but find_task_by_pid_ns might be useful.
My kernel module loads with "modprobe -v my_module --allow-unsupported -o some-data" and I extract the "some-data" parameter. The following code gave me the entire command line, and here is how I parsed out the parameter of interest:
struct mm_struct *mm;
unsigned char x, cmdlen;
mm = get_task_mm(current);
down_read(&mm->mmap_sem);
cmdlen = mm->arg_end - mm->arg_start;
for(x=0; x<cmdlen; x++) {
if(*(unsigned char *)(mm->arg_start + x) == '-' && *(unsigned char *)(mm->arg_start + (x+1)) == 'o') {
break;
}
}
up_read(&mm->mmap_sem);
if(x == cmdlen) {
printk(KERN_ERR "inject: ERROR - no target specified\n");
return -EINVAL;
}
strcpy(target,(unsigned char *)(mm->arg_start + (x+3)));
"target" holds the string after the -o parameter. You can compress this somewhat - the caller (in this case, modprobe) will be the first string in mm->arg_start - to suit your needs.
you can look at the special files in /proc/<pid>/
For example, /proc/<pid>/exe is a symlink pointing to the actual binary.
/proc/<pid>/cmdline is a null-delimited list of the command line, so the first word is the process name.

Resources