Finding a process by argument string - linux

I'm using ps, grep, and sed to try to identify some java processes that are uniquely identified by a specific argument, e.g. -DAppService=DDDABC_456 or -DAppService=DDDXYZ_456_cazorla. I want to return a comma-separated list: PID,argument,process
I'm working on CentOS 7. So far I'm only about halfway there but getting tangled up.
I'm shooting for this:
1234,-DAppService=DDDABC_456,/usr/java/jdk1.8.0_112/bin/java
2345,-DAppService=DDDABC_456_cazorla,/usr/java/jdk1.8.0_112/bin/java
3456,-DAppService=DDDXYZ_789,/usr/java/jdk1.8.0_112/bin/java
4567,-DAppService=DDDXYZ_789_cazorla,/usr/java/jdk1.8.0_112/bin/java
Note that the argument may or may not have a suffix of "_cazorla".
I tried this, but it loses the arguments (and the number of arguments may vary, so I don't think I can continue with $9, $10, etc.):
ps -ef | grep -E 'DAppService=DDD[A-Z]*_[0-9]*(_[a-z]*)?' | grep -v grep | awk '{OFS=","; print $2,$8}'
Gives me:
1234,/usr/java/jdk1.8.0_112/bin/java
2345,/usr/java/jdk1.8.0_112/bin/java
3456,/usr/java/jdk1.8.0_112/bin/java
4567,/usr/java/jdk1.8.0_112/bin/java
I also tried this, but it comma-separates every column of the ps output, including all the arguments, which I don't want:
ps -aef | grep -E 'DAppService=DDD[A-Z]*_[0-9]*(_[a-z]*)?' | grep -v grep | sed -e "s/\s\+/,/g"
The actual result is too long to list here, but it looks like this:
user,1234,1,0,Jul03,pts/0,00:03:21,/usr/java/jdk1.8.0_112/bin/java,arg1,arg2,arg3,argn...
user,2345,1,0,Jul03,pts/0,00:03:21,/usr/java/jdk1.8.0_112/bin/java,arg1,arg2,arg3,argn...
user,3456,1,0,Jul03,pts/0,00:03:21,/usr/java/jdk1.8.0_112/bin/java,arg1,arg2,arg3,argn...
user,4567,1,0,Jul03,pts/0,00:03:21,/usr/java/jdk1.8.0_112/bin/java,arg1,arg2,arg3,argn...
My sed knowledge is pretty poor (as is my awk, but I'd be open to that as an option too). Once I'm happy with the commands I want to put them into a bash script that I can call from elsewhere.

ps -eo pid=,args= |
awk '
  {
    for (i = 3; i <= NF; i++)
      if ($i ~ regex) {
        print $1, $i, $2
        next
      }
  }
' OFS=, regex='-DAppService=DDD[A-Z]+_[0-9]+(_[a-z]+)?'   # ERE derived from the question's examples
Ask ps to output just the PID and the full command line.
Have awk check each argument (fields 3 through NF) against a regex.
If one matches, print the PID ($1), the matching argument ($i), and the command ($2); a ready-to-call script sketch follows the notes below.
Notes:
awk can't distinguish cmd "arg1 with spaces" from cmd arg1 arg2 arg3, but that may not matter here
spaces in the command (e.g. in a directory name in the path) will cause it to be truncated at the first space
commas in the command (or in the matched argument) will break the CSV output
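Since the stated goal is to drop this into a callable bash script, here is a minimal sketch along those lines (the script name and the default regex are my assumptions, derived from the question's examples):

#!/usr/bin/env bash
# find_appservice.sh - print PID,argument,command for processes carrying
# a -DAppService=... argument; pass a different ERE as $1 to override.
regex="${1:--DAppService=DDD[A-Z]+_[0-9]+(_[a-z]+)?}"

ps -eo pid=,args= |
awk -v regex="$regex" '
  BEGIN { OFS = "," }
  {
    for (i = 3; i <= NF; i++)
      if ($i ~ regex) {
        print $1, $i, $2
        next
      }
  }
'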

Related

A Shell Script to simulate the wc command with its options?

We have to write a shell script that works like the wc command and receives -l, -c and -w as its options.
Shell scripting syntax aside, my question is: can we simulate the logic of wc -c, wc -l or wc -w using sed, grep, or anything else? If yes, how?
Important: don't use wc in the script.
A single awk command that you can parameterize by setting the appropriate -v variables to 0:
LC_ALL=C awk -v l=1 -v w=1 -v c=1 '
{ wc+=NF; cc+=1+length($0) }
END { printf "%s\t%s\t%s\n", l ? NR : "", w ? wc : "", c ? cc : "" }
' file
Note:
For simplicity, you always get 3 \t-separated output fields, with the fields whose output wasn't requested left empty; it wouldn't be hard to modify this to emulate wc's output behavior, however.
As explained in choroba's grep answer, you must prepend LC_ALL=C to the awk command if you really want to count bytes (-c) rather than (potentially multi-byte) characters (-m).
To count characters (the equivalent of wc -m), remove LC_ALL=C above.
Caveat: this won't work with BSD awk, as found on macOS, unfortunately, because it is not Unicode-aware and always counts bytes (try awk '{print length($0)}' <<<ü).
wc -l strictly counts the number of \n characters, so it doesn't count an incomplete line - one missing a trailing \n - at the end of its input; the above awk command, by contrast, does count that line (and an implied trailing newline in the byte/character count).
How it works:
awk's NF variable contains the number of fields on each input line, where the line is broken into fields by runs of whitespace by default; in other words: by default, fields are words.
$0 is the input line at hand, whose length() tells you the number of characters / bytes, with 1 added to account for the \n character at the end of the line.
Note how variables wc and cc need no initialization, because awk implicitly treats empty/undefined variables as 0 in a numeric context (such as with the compound assignment +=).
NR contains the current, 1-based line number, which in the END block is equal to the total number of input lines.
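For example, to emulate just wc -w, zero out the other two flags (the sample file here is made up):

printf 'hello world\nfoo\n' > sample.txt
LC_ALL=C awk -v l=0 -v w=1 -v c=0 '
{ wc+=NF; cc+=1+length($0) }
END { printf "%s\t%s\t%s\n", l ? NR : "", w ? wc : "", c ? cc : "" }
' sample.txt
# -> two empty fields around the word count: "<TAB>3<TAB>"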
Using awk:
-l:
awk 'END{print NR}' inFile
-w:
awk '{words+=NF}END{print words}' inFile
-c:
ls -l inFile | awk '{print $5}'
(Note that this reads the byte count from ls's size column rather than counting anything itself.)
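If you'd rather count the bytes yourself, a sketch in the same style as the parameterized answer above:

LC_ALL=C awk '{ c += 1 + length($0) } END { print c }' inFile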
If you can use grep, simulating the line count is easy: just count how many times a pattern that always matches occurs:
grep -c '^' filename
This should output the same as wc -l (but it will report one more line if the file doesn't end in a newline, since grep counts that incomplete line and wc -l doesn't).
To get the number of words, you can use the following pipeline:
grep -o '[^[:space:]]\+' filename | grep -c '^'
You need a grep that supports the -o option, which prints each matching string on a line of its own. The expression matches all non-space sequences, and piping them into what we used in the previous case just counts them.
To get the number of characters (wc -c), you can use
LC_ALL=C grep -o . filename | grep -c '^'
Setting LC_ALL=C is needed if your locale uses UTF-8; otherwise you'd be counting characters (wc -m). You also need to add the number of newlines to the output number, so:
echo $(( $( grep -c '^' filename )
+ $( LC_ALL=C grep -o . filename | grep -c '^' ) ))
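As a quick sanity check, all three grep-based counts can be compared against a small file (contents invented):

printf 'hello world\nfoo\n' > sample.txt
grep -c '^' sample.txt                                    # lines: 2
grep -o '[^[:space:]]\+' sample.txt | grep -c '^'         # words: 3
echo $(( $( grep -c '^' sample.txt )
       + $( LC_ALL=C grep -o . sample.txt | grep -c '^' ) ))   # bytes: 16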

awk system does not take hyphens

I want to redirect the output of some command to awk and use the system call in awk, but awk does not seem to accept flags with a hyphen. For example, let's say I have a bunch of files and I want to cat them. I would use:
ls -1 | awk '{ system(" cat " $1)}'
Now, if I also want to print line numbers with -n, it does not work:
ls -1 | awk '{ system(" cat -n" $1)}'
You need a space between -n and the file name:
ls -1 | awk '{ system(" cat -n " $1)}'
Notes
-1 is not needed. ls implicitly prints 1 file per line when its output goes to a pipe.
Any file name with whitespace in it will cause this code to fail.
Parsing the output of ls is generally a bad idea. Both find and the shell offer superior handling of difficult file names.
John1024's answer fixes your problem and contains good advice, but let me focus on the syntax aspects:
As a command string, cat -n <file> requires at least one space (or tab) between -n, which is an option, and <file>, which is an operand.
String concatenation works differently in awk than in the shell:
" cat -n" $1, despite the presence of a space between " cat -n" and $1, does not insert that space in the resulting string, because awk's string concatenation directly joins the strings placed next to one another, irrespective of intervening whitespace.
For instance, the following commands all yield the string ab, irrespective of any whitespace between the operands of the concatenation:
awk 'BEGIN { print "a""b" }'
awk 'BEGIN { print "a" "b" }'
awk 'BEGIN { s = "b"; print "a"s }'
awk 'BEGIN { s = "b"; print "a" s }'
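A quick way to see the effect with the command from the question (the file name is hypothetical):

echo file1 | awk '{ print " cat -n" $1 }'    # ->  cat -nfile1   (no space inserted)
echo file1 | awk '{ print " cat -n " $1 }'   # ->  cat -n file1  (space kept: it is part of the literal)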
This is not a proper use case for awk; you're better off with something like this:
find . -maxdepth 1 -type f -exec cat -n {} \;

search for a line that contains a special character using sed or awk

I wonder if there is a command in Linux that can help me find a line that begins with "*" and contains the special character "|". For example:
* Date | Auteurs
Simply use:
grep -e '^\*.*|' "${filename}"
Or if you want to use sed:
sed -n '/^\*.*|/p' "${filename}"
Or the (GNU) awk equivalent (the pipe needs to be backslashed):
awk '/^\*.*\|/' "${filename}"
Where:
^ : start of the line
\*: a literal *
.*: zero or more of any character (except newline)
| : a literal pipe
NB: "${filename}": I've assumed you're using the command in a script, with the target file passed in a double-quoted variable as "${filename}". In the shell, simply use the actual name of the file (or the path to it).
UPDATE (line numbers)
Modify the above commands to also obtain the line numbers of the matched lines. With grep it's as simple as adding the -n switch:
grep -ne '^\*.*|' "${filename}"
We obtain an output like this:
81806:* Date | Auteurs
To obtain exactly the same output from sed and awk we have to complicate the commands a little bit:
awk '/^\*.*\|/{print NR ":" $0}' "${filename}"
# = prints the line number and p prints the matching line, but on two separate lines, hence the second sed call
sed -n '/^\*.*|/{=;p}' "${filename}" | sed '{N;s/\n/:/}'
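To sanity-check any of the variants, a throwaway file based on the question's example works (contents invented):

printf '* Date | Auteurs\nno star | here\n* no pipe here\n' > sample.txt
grep -ne '^\*.*|' sample.txt
# -> 1:* Date | Auteurs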

return all lines that match String1 in a file after the last match of String2 in the same file

I figured out how to get the line number of the last matching word in the file:
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
It gave me the value 1787. So I passed it manually to the sed command to search for the lines that contain the phrase "blades are down" after that line number, and it returned all the lines successfully:
sed -n '1787,$s/blades are down/&/p' myfile.txt
Is there a way that I can pass the line number from the first command to the second one, through a variable or a file, so I can put them in a script to be executed automatically?
Thank you.
You can do this by connecting your two commands with xargs. xargs -I % allows you to take the stdin from a previous command and place it wherever you want in the next command. The % is where your 1787 will be written:
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1 | xargs -I % sed -n %',$s/blades are down/&/p' myfile.txt
You can use:
command substitution to capture the result of the first command in a variable.
simple string concatenation to use the variable in your sed command
startLine=$(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)
sed -n ${startLine}',$s/blades are down/&/p' myfile.txt
You don't strictly need the intermediate variable - you could simply use:
sed -n $(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)',$s/blades are down/&/p' myfile.txt
but it may make sense to do error checking on the result of the command substitution first.
Note that I've streamlined the first command by using grep's -n option, which puts the line number separated with : before each match.
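For instance, a minimal sketch of the error check mentioned above (the message text is mine):

startLine=$(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)
if [ -z "$startLine" ]; then
    echo "no line matching ' b ' found" >&2
    exit 1
fi
sed -n "${startLine}"',$s/blades are down/&/p' myfile.txt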
First we can get the "half" of the file after the last match of string2; then you can use grep to match all the string1 lines:
tac your_file | awk '{ if (match($0, "string2")) { exit } else { print } }' | \
grep "string1"
The order of the output is reversed, which is fine if you don't care about the order; if you do, just pipe the result through another tac (see below).
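Putting that together, an order-preserving version would look like this (your_file, string1 and string2 are the placeholders from above):

tac your_file | awk '{ if (match($0, "string2")) { exit } else { print } }' | \
grep "string1" | tac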
This might work for you (GNU sed):
sed -n '/\n/ba;/ b /h;//!H;$!d;x;//!d;s/$/\n/;:a;/\`.*blades are down.*$/MP;D' file
This reads through the file storing all lines following the last match of the first string (" b ") in the hold space.
At the end of file, it swaps to the hold space, checks that it does indeed have at least one match, then prints out those lines that match the second string ("blades are down").
N.B. It makes the end case (/\n/) possible by appending a newline to the end of the hold space, which is eventually thrown away. This also caters for the last-line edge condition.

Adding an || regex to a backticked bash one-liner in a Perl script

I am trying to add an || regex to a backticked bash one-liner inside a Perl script, if that makes any sense.
my $result = `df -H | grep -vE '^Filesystem|tmpfs|cdrom|none' | awk '{ print \$1 "\t" \$5 " used."}'`;
# .Private maybe the same as /dev/sdb1 so I'm trying to remove it too
# by trying to add || (m/\.Private/) to the above
print "$result";
So at present I am removing lines from the output that start with Filesystem, tmpfs, cdrom or none, but I would also like to add "or lines containing .Private" to the one-liner, if possible...
I have the below also, but want to reproduce its results with the above code...
my @result2 = Shell::df("-H");
shift @result2; # get rid of "Filesystem..."
for (@result2) {
    next if ((/^tmpfs|cdrom|none/) || (m/\.Private/));
    my @words2 = split('\s+', $_);
    print $words2[0], "\t", $words2[4], " used\.\n";
}
I'd recommend that you get rid of the "awk" part entirely. Calling awk from inside perl is silly.
Instead, rely on capturing lines using list context, and then do your processing inside perl.
my @lines = `df -H`;
my @results = grep ... @lines; # perl 'grep' builtin
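For completeness, a minimal sketch of that all-in-Perl approach, reusing the exclusions from the question's snippet (the exact pattern and formatting are my assumptions):

#!/usr/bin/perl
use strict;
use warnings;

# Capture df's output in list context, filter with Perl's grep builtin,
# then format - no external grep/awk needed.
my @lines = `df -H`;
shift @lines;    # drop the "Filesystem ..." header line
for my $line (grep { !/^(?:tmpfs|cdrom|none)/ && !/\.Private/ } @lines) {
    my @words = split /\s+/, $line;
    print "$words[0]\t$words[4] used.\n";
}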
If you insist on using the unix grep, why not just add '|.Private' to your grep exclusion pattern?
You just need to add the \.Private part to the current regexp:
grep -vE '^Filesystem|tmpfs|cdrom|none|\.Private'
On a side note, the pattern ^Filesystem|tmpfs|cdrom|none might not actually do what you want, as only Filesystem is matched at the beginning of the line; the other alternatives will match if they appear anywhere in the input. To match them all at the beginning, change it to:
'^Filesystem|^tmpfs|^cdrom|^none'
Your regex doesn't do what you think it does. It matches strings that start with Filesystem or that contain the other words anywhere.
Try with this:
grep -vE '^(Filesystem|tmpfs|cdrom|none)|\.Private'
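A quick illustration of the difference the grouping makes (sample input invented):

printf 'Filesystem  Size\n/dev/sdb1   20G\nmytmpfsdir  1G\n' |
grep -vE '^Filesystem|tmpfs|cdrom|none'
# -> /dev/sdb1   20G        (mytmpfsdir was dropped: tmpfs matched anywhere)

printf 'Filesystem  Size\n/dev/sdb1   20G\nmytmpfsdir  1G\n' |
grep -vE '^(Filesystem|tmpfs|cdrom|none)|\.Private'
# -> /dev/sdb1   20G
#    mytmpfsdir  1G         (kept: the alternatives are now anchored)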
Like this?
my $result = `df -H | grep -vE '(^Filesystem|tmpfs|cdrom|none)|\.Private' | awk '{ print \$1 "\t" \$5 " used."}'`;
