Split output of command by columns using Bash? - linux

I want to do this:
run a command
capture the output
select a line
select a column of that line
Just as an example, let's say I want to get the command name from a $PID (please note this is just an example, I'm not suggesting this is the easiest way to get a command name from a process id - my real problem is with another command whose output format I can't control).
If I run ps I get:
PID TTY TIME CMD
11383 pts/1 00:00:00 bash
11771 pts/1 00:00:00 ps
Now I do ps | egrep 11383 and get
11383 pts/1 00:00:00 bash
Next step: ps | egrep 11383 | cut -d" " -f 4. Output is:
<absolutely nothing/>
The problem is that cut splits the output on single spaces, and since ps adds some spaces between the 2nd and 3rd columns to keep some resemblance of a table, cut picks an empty string. Of course, I could use cut to select the 7th and not the 4th field, but how can I know which one, especially when the output is variable and not known beforehand?

One easy way is to add a pass of tr to squeeze any repeated field separators out:
$ ps | egrep 11383 | tr -s ' ' | cut -d ' ' -f 4

I think the simplest way is to use awk. Example:
$ echo "11383 pts/1 00:00:00 bash" | awk '{ print $4; }'
bash
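To see why awk needs no squeezing step, here is a minimal sketch that runs it on a canned copy of the ps output from the question. The sample variable is made up for illustration, so the result does not depend on what is running on your machine. awk's default field splitting treats any run of blanks as one separator.

```shell
# Canned ps-like output with irregular spacing (illustrative data only).
sample='  PID TTY          TIME CMD
11383 pts/1    00:00:00 bash
11771 pts/1    00:00:00 ps'

# Select the line containing 11383, then print its 4th whitespace-separated field.
cmd_name=$(printf '%s\n' "$sample" | grep 11383 | awk '{ print $4 }')
echo "$cmd_name"
```

The same pipeline with cut -d ' ' -f 4 instead of awk would print an empty string for this input.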

Please note that the tr -s ' ' option will not remove any single leading spaces. If your column is right-aligned (as with ps pid)...
$ ps h -o pid,user -C ssh,sshd | tr -s " "
 1543 root
19645 root
19731 root
Then cutting will result in a blank line for some of those fields if it is the first column:
$ <previous command> | cut -d ' ' -f1

19645
19731
Unless you precede it with a space, obviously
$ <command> | sed -e "s/.*/ &/" | tr -s " "
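An alternative to prefixing every line with a space is to strip the leading spaces, so that field 1 is always the first column. The sketch below uses made-up right-aligned sample data in place of the real ps output.

```shell
# Right-aligned pid column, as ps prints it (illustrative data only).
sample=' 1543 root
19645 root
19731 root'

# Drop leading spaces, squeeze the remaining runs, then cut the first field.
pids=$(printf '%s\n' "$sample" | sed 's/^ *//' | tr -s ' ' | cut -d ' ' -f 1)
echo "$pids"
```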
Now, for this particular case of pid numbers (not names), there is a function called pgrep:
$ pgrep ssh
Shell functions
However, in general it is actually still possible to use shell functions in a concise manner, because there is a neat thing about the read command:
$ <command> | while read a b; do echo $a; done
The first parameter to read, a, selects the first column, and if there is more, everything else will be put in b. As a result, you never need more variables than the number of your column +1.
So,
while read a b c d; do echo $c; done
will then output the 3rd column. As indicated in my comment...
A piped read will be executed in an environment that does not pass variables to the calling script.
out=$(ps whatever | { read a b c d; echo $c; })
arr=($(ps whatever | { read a b c d; echo $c $b; }))
echo ${arr[1]} # will output the value of b
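The subshell behaviour is easy to check for yourself. This is a small sketch with a made-up sample line, assuming Bash with default options (with shopt -s lastpipe, or in ksh and zsh, the last pipeline stage runs in the current shell and the first result differs).

```shell
line='11383 pts/1    00:00:00 bash'   # illustrative sample line

# In a pipeline, read runs in a subshell, so the assignment is lost afterwards.
piped=''
printf '%s\n' "$line" | read -r a b c piped
echo "after pipe: '$piped'"

# A here-document keeps read in the current shell, so the variables survive.
read -r pid tty cputime cmd <<EOF
$line
EOF
echo "after here-document: pid=$pid cmd=$cmd"
```

In Bash you could also write read -r pid tty cputime cmd <<< "$line", which does the same with a here-string.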
The Array Solution
So we then end up with the answer by @frayser, which is to use the shell variable IFS, which defaults to a space, to split the string into an array. It only works in Bash though. Dash and Ash do not support it. I have had a really hard time splitting a string into components in a Busybox environment. It is easy enough to get a single component (e.g. using awk) and then to repeat that for every parameter you need. But then you end up repeatedly calling awk on the same line, or repeatedly using a read block with echo on the same line, which is neither efficient nor pretty. So you end up splitting using ${name%% *} and so on. It makes you yearn for some Python skills, because shell scripting is not a lot of fun anymore if half or more of the features you are accustomed to are gone. But you can assume that even Python would not be installed on such a system, and it wasn't ;-).
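For shells without arrays, here is a minimal sketch of two ways to split a line without starting any external process. It only uses POSIX sh features; the sample line and variable names are made up for illustration.

```shell
line='11383 pts/1    00:00:00 bash'   # illustrative sample line

# Option 1: parameter expansion.
first=${line%% *}      # remove the longest suffix that starts at a space
last=${line##* }       # remove the longest prefix that ends at a space

# Option 2: let the shell word-split the line into $1, $2, ...
set -f                 # disable globbing so characters like * are not expanded
set -- $line           # intentionally unquoted; replaces the positional parameters
set +f
third=$3

echo "$first $third $last"
```

Note that set -- overwrites the script's own arguments, so save them first if you still need them.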

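Try this. Note that ps |& followed by read -p is ksh co-process syntax; in Bash, use an ordinary pipe and a plain read:
ps |
while read -r first second third fourth etc ; do
if [[ $first == '11383' ]]
then
echo got: "$fourth"
fi
done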

Your command
ps | egrep 11383 | cut -d" " -f 4
misses a tr -s to squeeze spaces, as unwind explains in his answer.
However, you maybe want to use awk, since it handles all of these actions in a single command:
ps | awk '/11383/ {print $4}'
This prints the 4th column of those lines containing 11383. If you want it to match only when 11383 appears at the beginning of the line, then you can say ps | awk '/^11383/ {print $4}'.
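A regex match can still pick up the wrong line: /11383/ also matches pid 113830, and /^11383/ misses a pid that ps has right-aligned with a leading space. A stricter variant, sketched here on made-up sample data, compares the first field itself.

```shell
# Illustrative data: the wanted pid is right-aligned, and a longer pid contains it.
sample=' 11383 pts/1    00:00:00 bash
113830 pts/2    00:00:00 vim'
pid=11383

# Regex match: any line containing the digits 11383 is selected.
by_regex=$(printf '%s\n' "$sample" | awk '/11383/ { print $4 }')

# Field comparison: only the line whose first field equals the pid is selected.
by_field=$(printf '%s\n' "$sample" | awk -v pid="$pid" '$1 == pid { print $4 }')

echo "regex: $by_regex"
echo "field: $by_field"
```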

Using array variables
set $(ps | egrep "^11383 "); echo $4
or
A=( $(ps | egrep "^11383 ") ) ; echo ${A[3]}

Similar to brianegge's awk solution, here is the Perl equivalent:
ps | egrep 11383 | perl -lane 'print $F[3]'
-a enables autosplit mode, which populates the @F array with the column data.
Use -F, if your data is comma-delimited, rather than space-delimited.
Field 3 is printed since Perl starts counting from 0 rather than 1

Getting the correct line (example for line no. 6) is done with head and tail and the correct word (word no. 4) can be captured with awk:
command|head -n 6|tail -n 1|awk '{print $4}'

Instead of doing all these greps and stuff, I'd advise you to use ps capabilities of changing output format.
ps -o cmd= -p 12345
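You get the command line of a process with the pid specified and nothing else.
The -o and -p options are specified by POSIX, so the approach is portable, but the POSIX names for this column are comm (command name) and args (full command line); cmd is an alias for args in the Linux procps ps.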

Bash's set will parse all output into positional parameters.
For instance, with set $(free -h) command, echo $7 will show "Mem:"

Related

Set part of grep to variable

mysqladmin proc status | grep "Threads"
Output:
Uptime: 2304 Threads: 14 Questions: 2652099 Slow queries: 0 Opens: 48791 Flush tables: 3 Open tables: 4000 Queries per second avg: 1151.08
I would like to set it so $mysqlthread would output 14 after running echo $mysqlthread
Probably the easiest way is with Perl instead of grep.
mysqladmin proc status | perl -nle'/Threads: (\d+)/ && print $1'
perl -n means "go through each line of input".
perl -l means "print a \n at the end of every print"
perl -e means "here is my program"
/Threads: (\d+)/ means "match Threads: followed by one or more digits". And print $1 means "print the digits I found, as denoted by the parentheses around \d+".
Using grep
$ mysqlthread=$(mysqladmin proc status | grep -Po 'Threads: \K\d+')
$ echo "$mysqlthread"
14
There are many ways to solve this. This is one:
mysqladmin proc status | grep "Threads" | tr -s ' ' | cut -d' ' -f4
The tr command with the -s flag squeezes each run of consecutive spaces into a single space. Then the cut command returns the 4th field, using a single space as the delimiter.
The advantage of piping commands is that you can build up the pipeline interactively, one step at a time. And whenever you aren't sure which flag to use, the manual pages are there to help: man grep, man tr, man cut, etc.
Add awk to split the output,
mysqlthread=$(mysqladmin proc status | grep "Threads" | awk '{print $4}')
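If you would rather not depend on Threads: being the 3rd word, awk can look for the label and print whatever follows it. This is only a sketch: the status variable holds a shortened copy of the output shown in the question and stands in for the real mysqladmin call.

```shell
# Shortened copy of the mysqladmin status line (illustrative data only).
status='Uptime: 2304  Threads: 14  Questions: 2652099  Slow queries: 0'

# Walk the fields; when the label is found, print the next field and stop.
mysqlthread=$(printf '%s\n' "$status" |
    awk '{ for (i = 1; i < NF; i++) if ($i == "Threads:") { print $(i + 1); exit } }')

echo "$mysqlthread"
```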

Shell script to get count of a variable from a single line output

How can I get the count of the # character in the following output? I used the tr command to extract it. I am curious to know what the best way to do it is, and what other ways there are of doing the same thing.
{running_device,[test#01,test#02]},
My solution was:
echo '{running_device,[test#01,test#02]},' | tr ',' '\n' | grep '#' | wc -l
I think it is simpler to use:
echo '{running_device,[test#01,test#02]},' | tr -cd '#' | wc -c
This yields 2 for me (tested on Mac OS X 10.7.5). The -c option to tr means 'complement' (of the set of specified characters) and -d means 'delete', so that deletes every non-# character, and wc counts what's provided (no newline, so the line count is 0, but the character count is 2).
Nothing wrong with your approach. Here are a couple of other approaches:
echo $(echo '{running_device,[test#01,test#02]},' | awk -F"#" '{print NF - 1}')
or
echo $((`echo '{running_device,[test#01,test#02]}' | sed 's+[^#]++g' | wc -c` - 1 ))
The only concern I would have is if you are running this command in a loop (e.g. once for every line in a large file). If that is the case, then execution time could be an issue as stringing together shell utilities incurs the overhead of launching processes which can be sloooow. If this is the case, then I would suggest writing a pure awk version to process the entire file.
Use GNU Grep to Avoid Character Translation
Here's another way to do this that I personally find more intuitive: extract just the matching characters with grep, then count grep's output lines. For example:
echo '{running_device,[test#01,test#02]},' |
grep --fixed-strings --only-matching '#' |
wc -l
yields 2 as the result.
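If process start-up cost matters, for example inside a loop, the count can also be done in the shell itself. The following is a sketch using only POSIX parameter expansion; the variable names are made up for illustration.

```shell
s='{running_device,[test#01,test#02]},'

# Repeatedly cut off everything up to and including the next '#', counting as we go.
rest=$s
count=0
while case $rest in *'#'*) true ;; *) false ;; esac; do
    rest=${rest#*'#'}
    count=$((count + 1))
done

echo "$count"
```

In Bash specifically, the shorter form only=${s//[^#]/}; echo ${#only} does the same thing.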

How to determine the exact character of a whitespace in linux?

I'm trying to get the PID field of ps -aux. I know I can achieve this using ps -aux | awk '{print $2}', but as practice I wanted to see if I can do the same using the cut command. My idea is to specify a delimiter and choose the second field like this:
ps -aux | cut -d[delimiter] -f2
Using space as a delimiter (' ') did not work, neither did tab (\t).
In general, how do I find out the exact character of a white-space in linux?
To identify otherwise unprintable or similar-looking characters (like whitespace), pipe output to a tool like xxd or od -c. For example, this outputs both the hex values of each character as well as the text for easy lookup:
ps -aux | xxd -g 1 # -g 1 outputs each character individually
However I think your issue is that ps -aux uses multiple spaces between the fields; cut does not handle multiple consecutive delimiters, so it prints whatever's between the first and second space, i.e. nothing.
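If xxd is not installed, od with POSIX options gives the same information. Here is a small sketch on a made-up string containing two spaces and a tab, so you can see how the two kinds of whitespace differ in the dump.

```shell
# 'a', two spaces, a tab, 'b' (illustrative input only).
hex=$(printf 'a  \tb' | od -An -tx1)

# 20 is a space and 09 is a tab, so this shows 61 20 20 09 62.
echo "$hex"
```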
If you really want to use cut you have to remove both leading spaces and duplicate spaces:
ps -aux | sed 's/^ *//;s/ \{1,\}/ /g' | cut -d' ' -f2
cut doesn't support multi-character delimiters.
There are multiple spaces between the fields, so if you really want to use cut, squeeze them first:
ps aux | sed 's/ \{1,\}/ /g' | cut -d ' ' -f 2
To get the PID of a ps command you can do this:
ps -aux | cut -c10-15
For information: the u that you use in ps aux means, according to man ps:
u Display user-oriented format
So you're explicitly asking for a human readable output and then you parse it with some tool? That's not very appropriate (to say the least). If you need to format the output of ps, please use the -o (or --format) option, if your version of ps accepts it. Hence:
ps ax -o pid
will be much better.

selecting only required lines from unix shell prompt

Let's say I am running
$: ps au
in a shell prompt and want to select the 2nd field of the 5th entry, no matter which process it is. How do I do that?
With awk.
ps au | awk 'NR==6 { print $2 }'
The 6th record because you need to skip the header.
If you don't want to use awk or the equivalent perl or ruby commands, you can also use more low-level tools:
ps au | head -6 | tail -1 | tr -s ' ' | cut -d ' ' -f 2
In "ps au" output, the second field is the process ID; you can extract it directly by telling ps what you need:
ps a -o pid=
Then you just need to output the fifth line:
ps a -o pid= | sed '5!d'
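The "line N, field M" selection can be wrapped in a small function. This is a sketch: field_at is a hypothetical helper name, and the sample table stands in for real ps output.

```shell
# field_at ROW COL: print whitespace-separated field COL of input line ROW.
# (Hypothetical helper, not part of any standard tool.)
field_at() {
    awk -v row="$1" -v col="$2" 'NR == row { print $col; exit }'
}

# Illustrative data: one header line followed by two entries.
sample='USER   PID %CPU
root     1  0.0
alice  742  0.3'

# Line 3 is the 2nd entry because line 1 is the header.
pid=$(printf '%s\n' "$sample" | field_at 3 2)
echo "$pid"
```

With real output you would call it as ps au | field_at 6 2.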

How to pass AWK output into variable?

I have a small bash script that uses grep/awk to extract a paragraph by keyword.
But after adding the extra code, set var = "(......)", it only prints a blank line and not the paragraph.
So I would like to ask if anyone knows how to properly pass the awk output into a variable for printing?
My code:
#!/bin/sh
set var = "(awk 'BEGIN{RS=ORS="\n\n";FS=OFS="\n"}/FileHeader/' /root/Desktop/logs/Default.log)"
echo $var;
Thanks!
Use command substitution to capture the output of a process.
#!/bin/sh
VAR="$(awk 'BEGIN{RS=ORS="\n\n";FS=OFS="\n"}/FileHeader/' /root/Desktop/logs/Default.log)"
echo "$VAR"
Some general advice with regard to shell scripting:
(Almost) always quote every variable reference.
Never put spaces around the equals sign in a variable assignment.
You need to use "command substitution". Place the command inside either backticks, `COMMAND` or, in a pair of parentheses preceded by a dollar sign, $(COMMAND).
To set a variable you don't use set and you can't have spaces before and after the =.
Try this:
var=$(awk 'BEGIN{RS=ORS="\n\n";FS=OFS="\n"}/FileHeader/' /root/Desktop/logs/Default.log)
echo "$var"
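Quoting matters when the captured output spans several lines, as a paragraph does. A short sketch, with printf standing in for the awk command:

```shell
# Capture three lines of output (printf stands in for the real awk call).
var=$(printf 'FileHeader\nline two\nline three\n')

# Unquoted: the shell word-splits the value, so echo joins the words with spaces.
unquoted=$(echo $var)

# Quoted: the newlines inside the value are preserved.
quoted=$(echo "$var")

echo "$unquoted"
echo "$quoted"
```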
You gave me the idea of using this for killing a process :). Just change chromium to whatever process you want to kill.
Try this:
VAR=$(ps -ef | grep -i chromium | awk '{print $2}'); kill -9 $VAR 2>/dev/null; unset VAR;
Anytime you see grep piped to awk, you can drop the grep. For the vault example below, that is:
awk '/^password/ {print $2}'
awk can replace most text commands such as cut, tail, wc and tr, and especially several greps piped one after another, e.g.
some_command | grep a | grep b ... becomes some_command | awk '/a/ && /b/ {some action}' (use /a|b/ if a match on either pattern is enough).
Try to create a variable coming from vault/Hashicorp, when using packer template variables, like so:
BUILD_PASSWORD=$(vault read secret/buildAccount| grep ^password | awk '{print $2}')
echo $BUILD_PASSWORD
You can do the same with grep ^user

Resources