Unix and Linux /proc PID system

For my intro to operating systems class we were introduced to the /proc directory and to many of the ways it can be used to access data about processes, which appear in /proc as directories named by process ID.
When I was trying out some commands I had learned (and a few I looked up) on the UNIX server hosted by my school, I noticed that some entries inside the directory of a process I created had a file type called "TeX font metric data", i.e. a .tfm file. I figured that was the file type used for the files my professor showed us how to read data from, such as status and map.
When I entered the command cat /proc/(PID)/status to look at the status file, I got a random assortment of characters and white space. When I tried the same command on a process I created on my school's Linux server, I was shown the information I expected to see in the status and map files.
My question is:
Why did the Unix server produce random characters from my process's /proc/(PID)/status file, while the Linux server gave me the data I expected from the same command? Also, is there a way to read the Unix /proc data through the /proc directory?

The Linux procfs you are familiar with, mounted at /proc/, is not a POSIX thing. It's OS-specific; several OSes just happen to implement similar things that are also called /proc.
Because no formal standard covers it, it can be, and is, different on every *nix-like system that implements it.
My guess with /proc/(PID)/status is that your UNIX is dumping the process status in a binary form instead of easy-to-read plain text.
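For contrast, on Linux /proc/<pid>/status is line-oriented text of the form "Key:<tab>value", so a few lines of code turn it into a dictionary. The following is a minimal Python sketch that assumes a Linux procfs. The helper name read_status is made up for illustration. It will not work on a Unix whose status file is binary. On Solaris, for instance, proc(4) describes the file as a C structure (pstatus_t) meant to be read through the system headers.

```python
import os


def read_status(pid="self"):
    """Parse Linux /proc/<pid>/status ("Key:\tvalue" lines) into a dict."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep:  # ignore anything that is not a "Key: value" line
                fields[key] = value.strip()
    return fields
```

For example, read_status()["State"] gives the state of the calling process, and read_status(1234)["Name"] gives the command name of PID 1234.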
See also:
Knowing the process status using procf/<pid>/status
If you can determine which Unix you're on (odds are Solaris, since there's a free variant), you should be able to find a more specific answer.

Related

Is there a go command equivalent to ps?

I want a way to iterate through a list of PIDs scanning for processes with a particular command. For example the columns of ps ax are
PID TTY STAT TIME COMMAND
I was wondering if there was a way for me to determine the COMMAND column of a PID given its number.

The Go language and the ps command are unrelated.
The ps command is part of the POSIX specification and is available on all Unix-like systems (including Linux, Solaris, *BSD, ...). Read ps(1). It is related to your operating system (and you probably don't have it on Windows). Read Operating Systems: Three Easy Pieces to learn more about OSes, and a Linux programming book like ALP to learn more about Linux programming. See also intro(2) and syscalls(2) (and find their Go equivalents).
This is unrelated to Go. You could use the /proc/ pseudo file system, see proc(5), which exists on all Linux systems, both with and without Go installed on them. /proc/ is internally used by ps(1), top(1), pmap(1), etc...
To iterate over the list of processes (on Linux), read the /proc/ directory and look for numerical entries (e.g. /proc/1234/ exists if there is a process with pid 1234). To read a directory in C, use opendir(3), readdir(3), closedir(3) and stat(2); they all have Go equivalents, e.g. os.ReadDir (or ReadDir in the older io/ioutil package).
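The directory scan itself is short. This minimal Python sketch is Linux-only, and the function name is illustrative. It lists /proc and keeps the entries whose names are all digits:

```python
import os


def list_pids():
    """Return the PIDs that currently have a directory under /proc (Linux)."""
    # Entries such as "self", "meminfo" or "sys" are not processes; only the
    # all-digit names are. The result is a snapshot: processes may exit or
    # start right after the listing is taken.
    return sorted(int(name) for name in os.listdir("/proc") if name.isdigit())
```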
In particular, for process 1234, you could read /proc/1234/cmdline (which contains NUL byte separated strings). Of course you could read that file from some Go program. Try the od -cx /proc/self/cmdline command (using od(1)) to understand the format of that file ...
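As a sketch of the cmdline format in Python (Linux only; read_cmdline and find_by_command are hypothetical names), splitting on NUL bytes recovers argv. Matching the basename of argv[0] is one possible way to answer "which PIDs run command X". Note that a process is free to rewrite its own argv, so this is a heuristic.

```python
import os


def read_cmdline(pid):
    """Return the argument vector of a process from /proc/<pid>/cmdline."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    # Each argument is NUL-terminated, so splitting leaves one empty piece
    # after the final NUL; drop it. Kernel threads have an empty cmdline.
    args = raw.split(b"\0")
    if args and args[-1] == b"":
        args.pop()
    return [a.decode(errors="replace") for a in args]


def find_by_command(name):
    """Return PIDs whose argv[0] has the given basename."""
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            argv = read_cmdline(entry)
        except OSError:  # the process exited or is not readable
            continue
        if argv and os.path.basename(argv[0]) == name:
            matches.append(int(entry))
    return matches
```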
Pseudofiles in /proc/ are "pipe-like", have an apparent size (as given by stat(2) or by ls(1)...) of 0, and should be read sequentially, see this.
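Because the reported size is 0, code that allocates a buffer from st_size and reads once gets nothing. This Python sketch reads until EOF instead. It is Linux only, and read_proc_file is an illustrative name.

```python
import os


def read_proc_file(path, chunk_size=4096):
    """Read a /proc pseudo-file sequentially until EOF."""
    # stat() reports st_size == 0 for most /proc files, so the size cannot
    # be used to size the buffer; keep reading until read() returns b"".
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        os.close(fd)
```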

go-ps may be useful for you if you want to do it in a portable way.

Linux Console / standard out save by default

I have been attempting to find an answer for the following question everywhere:
Does the standard output / console on linux save the contents to a file by default?
I am not looking to save the contents or redirect the output (I already know about that); I am just wondering whether it already happens by some default process included with Linux and run by root. Finding an answer has been difficult because of all the redirection questions.
Thanks.

Unix based systems won't save the console outputs anywhere by default.
As you may know, hardware terminals (tty) and pseudo-terminals (pty) are just channels through which a process emits bytes, and there seems to be no system process that captures and logs that output.
What is stored in /dev/pts files and can we open them?

How to check if a file is opened in Linux?

The thing is, I want to track if a user tries to open a file on a shared account. I'm looking for any record/technique that helps me know if the concerned file is opened, at run time.
I want to create a script which monitors if the file is open, and if it is, I want it to send an alert to a particular email address. The file I'm thinking of is a regular file.
I tried using lsof | grep filename for checking if a file is open in gedit, but the command doesn't return anything.
Actually, I'm trying this for a pet project, and thus the question.

The command lsof -t filename shows the IDs of all processes that have the particular file opened. lsof -t filename | wc -w gives you the number of processes currently accessing the file.
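On Linux, lsof obtains this from /proc. Each open descriptor of a process shows up as a symbolic link under /proc/<pid>/fd. Here is a rough Python sketch of the same idea. It is Linux only, and pids_with_file_open is a hypothetical name. Without root it only sees processes it is allowed to inspect, typically your own.

```python
import os


def pids_with_file_open(path):
    """Return PIDs that have a descriptor pointing at `path`."""
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # the process exited, or belongs to another user
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:  # descriptor closed between listdir and readlink
                continue
    return pids
```

The result only reflects descriptors open at the moment of the scan. An editor that reads the file and closes it, as described in the next answer, will not show up.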

The fact that a file has been read into an editor like gedit does not mean that the file is still open. The editor most likely opens the file, reads its contents and then closes the file. After you have edited the file you have the choice to overwrite the existing file or save as another file.

You could (in addition to other answers) use the Linux-specific inotify(7) facilities.
I understand that you want to track one (or a few) particular files with a fixed file path (actually a given i-node). E.g. you would want to track when /var/run/foobar is accessed or modified, and do something when that happens.
In particular, you might want to install and use incrond(8) and configure it through incrontab(5).
If you want to run a script when some given file (on a native local file system, e.g. Ext4, Btrfs, ... but not NFS) is accessed or modified, use inotify; incrond is made exactly for that purpose.
PS. AFAIK, inotify doesn't work well for remote network files, e.g. on NFS filesystems (in particular when another NFS client machine is modifying a file).
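Python's standard library has no inotify wrapper. This sketch calls the libc functions inotify_init1(2) and inotify_add_watch(2) through ctypes and decodes the raw struct inotify_event records. It assumes Linux with a glibc-compatible libc, a local filesystem, and the event constants from <sys/inotify.h>. The helper names are illustrative. incrond does the same job without any code.

```python
import ctypes
import ctypes.util
import os
import struct

# Event bits from <sys/inotify.h>
IN_ACCESS = 0x00000001
IN_MODIFY = 0x00000002
IN_OPEN = 0x00000020

_libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)
# struct inotify_event { int wd; uint32_t mask, cookie, len; char name[]; }
_EVENT = struct.Struct("iIII")


def _check(ret):
    """Turn a negative libc return value into an OSError."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def watch_file(path, mask=IN_OPEN | IN_ACCESS | IN_MODIFY):
    """Return an inotify descriptor that reports `mask` events on `path`."""
    fd = _check(_libc.inotify_init1(0))
    try:
        _check(_libc.inotify_add_watch(fd, os.fsencode(path), ctypes.c_uint32(mask)))
    except OSError:
        os.close(fd)
        raise
    return fd


def read_events(fd):
    """Block until events are queued, then return their masks."""
    buf = os.read(fd, 4096)
    masks, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, name_len = _EVENT.unpack_from(buf, offset)
        masks.append(mask)
        offset += _EVENT.size + name_len  # skip the optional name field
    return masks
```

A monitor would loop on read_events(watch_file(path)) and send its alert whenever a mask has IN_OPEN set.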
If the files you care about are source files, you might be interested in revision control systems (like git) or build systems (like GNU make); in a certain way these tools are related to file modification.
You could also put the particular file in a FUSE filesystem and write your own FUSE daemon.
If you can restrict and modify the programs accessing the file, you might want to use advisory locking, e.g. flock(2), lockf(3).
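A small Python illustration of flock(2) advisory locking follows. The helper names are made up. The lock is only advisory: programs that never call flock, such as gedit or cat, ignore it. Two separate open() calls create separate open file descriptions, so their locks conflict even inside one process.

```python
import fcntl


def try_lock(f):
    """Try to take an exclusive flock(2) lock on file object `f` without blocking."""
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:  # another open file description holds the lock
        return False


def unlock(f):
    """Release a lock taken with try_lock."""
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```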
Perhaps the data sitting in the file should be in a database (e.g. sqlite or a real DBMS like PostgreSQL or MongoDB). ACID properties are important.
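With SQLite, concurrent writers are serialized by the database, and each change can be wrapped in a transaction. Here is a small sketch using Python's built-in sqlite3 module. The table name kv and the helper names are hypothetical.

```python
import sqlite3


def open_store(path):
    """Open (and if needed create) a tiny key/value table in an SQLite file."""
    conn = sqlite3.connect(path, timeout=10)  # wait up to 10 s for a lock
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    return conn


def put(conn, key, value):
    # "with conn" commits on success and rolls back if an exception escapes
    with conn:
        conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))


def get(conn, key):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None
```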
Notice that the filesystem and the mount options may matter a lot.
You might want to use the stat(1) command.
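stat(1) and Python's os.stat expose the last access, modification and status-change times. Those times tell you when the file was last used, not whether it is open right now. The access time also depends on mount options such as relatime or noatime. The following is a Python sketch; file_times is an illustrative name.

```python
import os
from datetime import datetime


def file_times(path):
    """Return the access, modification and status-change times of `path`."""
    st = os.stat(path)
    return {
        "accessed": datetime.fromtimestamp(st.st_atime),
        "modified": datetime.fromtimestamp(st.st_mtime),
        "changed": datetime.fromtimestamp(st.st_ctime),
    }
```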
It is difficult to help more without understanding the real use case and the motivation. You should avoid an XY problem.
Probably the workflow is wrong (a shared file that several users can write), and you should approach the overall issue in some other way. For a pet project I would at least recommend using an advisory lock, and accessing and modifying the information only through your own programs (perhaps setuid) that use flock (this excludes ordinary editors like gedit or commands like cat ...). However, your implicit use case seems well suited to a DBMS approach (a database does not have to contain a lot of data; it might be tiny), or to an indexed, locked file such as the ones the GDBM library handles.
Remember that on POSIX systems and Linux, several processes can access (and even modify) the same file simultaneously (unless you use some locking or synchronization).
Reading the Advanced Linux Programming book (freely available) would give you a broader picture (but it does not mention inotify, which appeared after the book was written).

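You can use ls -lrt, which lists files sorted by modification time with the most recently modified last. That shows when the file was last written, which can hint at recent use, but it cannot tell you whether the file is currently open. Make sure that you are in the right directory.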

Can you load a tree structure in memory with Linux shell?

I want to create an application with a Linux shell script like this — but can it be done?
This application will create a tree containing data. The tree should be loaded in the memory. The tree (loaded in memory) could be readable from any other external Linux script.
Is it possible to do it with a Linux shell?
If yes, how can you do it?
And are there any simple examples for that?

There are a large number of misconceptions on display in the question.
Each process normally has its own memory; there's no trivial way to load 'the tree' into one process's memory and make it available to all other processes. You might devise a system of related programs that know about a shared memory segment (somehow — there's a problem right there) that contains the tree, but that's about it. They'd be special programs, not general shell scripts. That doesn't meet your 'any other external Linux script' requirement.
What you're seeking is simply not available in the Linux shell infrastructure. That answers your first question; the other two are moot given the answer to the first.

There is a related discussion here. They use shared memory device /dev/shm and, ostensibly, it works for multiple users. At least, it's worth a try:
http://www.linuxquestions.org/questions/linux-newbie-8/bash-is-it-possible-to-write-to-memory-rather-than-a-file-671891/
Edit: just tried it with two users on Ubuntu - looks like a normal directory and REALLY WORKS with the right chmod.
See also:
http://www.cyberciti.biz/tips/what-is-devshm-and-its-practical-usage.html

I don't think there is a way to do this if you want to keep all of these requirements:
Building this as a shell script
In-memory
Usable across terminals / from external scripts
You would have to give up at least one requirement:
1. Give up the shell script requirement: build this in C to run as a Linux process. I only understand this well enough to say that it would be non-trivial.
2. Give up the in-memory requirement: you can serialize the tree and keep the data in a temp file. This works as long as the file is small and access to the tree isn't the performance bottleneck. The good news is that you can use the data across terminals and from external scripts.
3. Give up the external-scripts requirement: you can technically build a script and run it by sourcing it, which adds many (read: a mess of) variables representing the tree to your current shell session.
None of these alternatives are great, but if you had to go with one, number 2 is probably the least problematic.
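The serialize-to-a-file option can be sketched in Python by writing the tree as JSON. Putting the file under /dev/shm, which is a RAM-backed tmpfs on most distributions as the /dev/shm answer above suggests, keeps it in memory in practice. The file location and helper names are assumptions for illustration. The sketch writes to a temporary file and renames it, so readers never see a half-written tree.

```python
import json
import os
import tempfile


def shared_dir():
    # /dev/shm is usually a tmpfs, i.e. memory-backed; fall back to /tmp.
    return "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()


def save_tree(tree, path):
    """Atomically replace `path` with the JSON form of `tree`."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(tree, f)
        os.chmod(tmp, 0o644)  # readable by other users (mkstemp creates 0600)
        os.replace(tmp, path)  # rename is atomic within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise


def load_tree(path):
    with open(path) as f:
        return json.load(f)
```

Any other process, including a shell script, can then read the file. The script needs some JSON-aware tool to interpret the contents.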

Accessing /proc

I'm currently developing an application which needs a lot of system and process information, some of which is only available through /proc, and I have some general questions about accessing the structures.
The application will be run on Linux (kernel >= 2.6), not on any other Unix-flavored OS. It should have access to any data in /proc; I can't say what is necessary yet, as the specifications are not clear, but the whole /proc directory is relevant to the application.
First of all: is there good documentation that covers all the features added to or removed from /proc from kernel version to kernel version? One thing I'm curious about in particular is the format of the individual files. Can I take that for granted? Does it change between kernel versions?
Adapting the parsing to the kernel version wouldn't be a problem at all; it's just that I couldn't find any good docs on what has changed from version to version, which would help me catch parsing errors beforehand.
In addition: is there a definitive list of features that can be activated or deactivated by kernel options (except, of course, the /proc feature itself)? I'm looking for a list of files and directories that only exist when the appropriate options are set in the kernel.
As an example of what I'm thinking of, this is a link to the proc manpage (http://linux.die.net/man/5/proc), which includes a lot of good information: e.g. some entries give the earliest kernel version they were available in, and some note whether a module needs to be loaded. It does not describe the output format of all the information, though, which is something I need if I want to parse it (e.g. whether it is consistent across all kernel versions or changed at some point).
The second thing I'm wondering about is what happens if the process queried dies while being queried. What is my time interval? For example if I'm going to fetch a list of processes reading all the structures, and parse them one after another, what happens if my process x dies before I get to read it? Even if I check if the directory exists, it could still be gone one application call later.
Last but not least: Is there any major distribution out there that is not mounting proc?
From what I understand, a lot of common tools are based on the /proc interface such as lsmod or free, so I'm guessing that I can expect /proc to exist almost always.

The /proc interfaces are pretty stable (unlike the /sys interfaces), even if nothing is guaranteed. Almost all changes are backwards compatible, at least once an interface has been around for a few versions. You should stick to the documented interfaces to be safe. If a file exists, its format may be extended in later versions, but normally in a backwards-compatible way, e.g. by adding columns to a table. The parts most at risk of disappearing are those concerning hardware subsystems such as ACPI or SCSI, which are migrating to /sys (with a long transition period when both exist).
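In practice, this means looking fields up by name and tolerating fields you do not know, rather than relying on fixed line or column positions. The following is a Python sketch for /proc/meminfo. available_kib is a hypothetical helper, and its fallback formula for kernels without MemAvailable is only a rough approximation.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (mostly in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue
        try:
            info[key.strip()] = int(fields[0])
        except ValueError:  # tolerate a line format we do not understand
            continue
    return info


def available_kib(info):
    """Prefer MemAvailable; older kernels lack it, so fall back to an estimate."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    # Rough approximation only: free memory plus buffers and page cache.
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
```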
Most of the information is architecture-independent, except for hardware information (e.g. /proc/cpuinfo has very different fields on different architectures).
The main documentation is Documentation/filesystems/proc.txt in the kernel source (Documentation/filesystems/proc.rst in newer kernels). Consider proc(5) to be the overview and proc.txt to be the fine details. The kernel documentation is often incomplete, so don't be surprised if you sometimes need to resort to reading the source.
Most optional parts of /proc are activated by default if the driver whose data it exposes is included in the kernel. The exceptions are mostly related to hardware features that rarely need to be accessed from outside the kernel; if you need to access these features, you're probably already expecting to need to dig deeper. Look through Kconfig files in the kernel source for detailed information.
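Rather than relying on a fixed list, an application can probe for optional entries at startup. The mapping below covers only a few examples and is illustrative, not complete. When the kernel exposes its own configuration through /proc/config.gz (CONFIG_IKCONFIG_PROC), that file answers the question directly. The following is a Python sketch with hypothetical helper names:

```python
import gzip
import os

# A few optional /proc entries and the kernel options that provide them.
OPTIONAL_ENTRIES = {
    "/proc/config.gz": "CONFIG_IKCONFIG_PROC",
    "/proc/kallsyms": "CONFIG_KALLSYMS",
    "/proc/kcore": "CONFIG_PROC_KCORE",
    "/proc/sys": "CONFIG_PROC_SYSCTL",
}


def probe_optional_entries():
    """Report which of the optional /proc entries exist on the running kernel."""
    return {path: os.path.exists(path) for path in OPTIONAL_ENTRIES}


def kernel_config():
    """Return the running kernel's config text if /proc/config.gz exists, else None."""
    if not os.path.exists("/proc/config.gz"):
        return None
    with gzip.open("/proc/config.gz", "rt") as f:
        return f.read()
```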
Process data (or hardware data related to removable hardware or provided by unloadable modules) can disappear under your nose. Most files under /proc can be read atomically, with a single read call with a reasonably-sized buffer; if you perform multiple read calls in sequence, drivers are supposed to guarantee that you get well-formed data. There is no way to guarantee atomicity between reads of separate files; if you're reading information about a process, this process can die at any time, and in principle could even be replaced by another process with the same PID before you're finished.
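The following Python sketch applies that advice. It is Linux only, and the helper name is made up. It reads each /proc/<pid>/stat with a single read() and skips PIDs that disappear between listing and reading. It also keeps the start time (field 22 in proc(5)), so a caller can tell a reused PID from the original process.

```python
import os


def snapshot_processes():
    """Map PID -> (comm, state, starttime) for processes alive while scanned."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            fd = os.open(f"/proc/{entry}/stat", os.O_RDONLY)
        except FileNotFoundError:  # exited after listdir()
            continue
        try:
            data = os.read(fd, 65536)  # a single read() call
        except ProcessLookupError:  # exited after open(): read fails with ESRCH
            continue
        finally:
            os.close(fd)
        if not data:
            continue
        text = data.decode(errors="replace")
        # comm is wrapped in parentheses and may itself contain spaces or ')',
        # so the remaining fields start after the *last* ')'.
        close_paren = text.rindex(")")
        comm = text[text.index("(") + 1:close_paren]
        rest = text[close_paren + 2:].split()
        # rest[0] is field 3 (state); field 22 (starttime) is rest[19].
        result[int(entry)] = (comm, rest[0], int(rest[19]))
    return result
```

If the same PID later reports a different starttime, it belongs to a different process.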
As it says in the description of /proc, “everyone should say Y here”. All desktop/server Linux systems and most embedded Linux systems must have /proc; a lot of things, including ps and other process management commands, many filesystem and device-related tools, and module loading, require it. The only systems that might be able to dispense with /proc are very small single-purpose embedded systems that support a single hardware configuration and run a fixed set of programs. You can count on its being here.
