I have been trying to extract specific strings from command output in Linux.
For example:
ps -eo pid,args | grep PRD_ | egrep startscen.sh | more
gives the following output
(Full-size image: http://i.imgur.com/reS7wZ1.png)
I am aware that awk, sed, and tr can be used to extract details like the PID, but I am not sure how to write a command that gets exactly the PID of the row where the fourth column has a specific string like 'PROCESS_ALL_BETS'.
Or how do I extract every character after _NAME=?
Awk to the rescue.
ps -eo pid,args | awk '/PRD_/ && /startscen\.sh/ && $4 ~ /PROCESS_ALLBETS/'
(In the image, you have PROCESS_ALLBETS, so I guess that's what you actually want, even though your text says PROCESS_ALL_BETS.)
This selects for printing every line which matches all the following conditions:
/PRD_/ -- there is a "PRD_" somewhere in the line. Maybe you would tighten this to something like $6 ~ /^-NAME=PRD_/ to only match on the beginning of the sixth field.
/startscen\.sh/ -- there is a match for this regex somewhere on the line. Again, for improved precision, you might want to change this to $3 ~ /startscen\.sh/ or even $3 == "startscen.sh" if you only want exact matches.
$4 ~ /PROCESS_ALLBETS/ -- the fourth field matches this regular expression.
The above will simply print all matching lines. To print just the first field and the eighth field with the prefix -SESSION_NAME= removed, add something like
{ n=$8; sub(/^-SESSION_NAME=/,"",n); print $1, n }
just before the closing single quote.
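Assembled into one command, the filter and the action could look like the sketch below. The real ps output is only available as an image, so the sample lines and their values (PIDs 4242 to 4244, PRD_AGENT, NIGHTLY_RUN and so on) are hypothetical, chosen only so that the field positions match the ones discussed above.

```shell
# Hypothetical stand-in for `ps -eo pid,args`; the field layout is an assumption.
sample_ps() {
  cat <<'EOF'
4242 /bin/sh startscen.sh PROCESS_ALLBETS 001 -NAME=PRD_AGENT -V=2 -SESSION_NAME=NIGHTLY_RUN
4243 /bin/sh startscen.sh PROCESS_OTHER 001 -NAME=PRD_AGENT -V=2 -SESSION_NAME=OTHER_RUN
4244 /bin/sh startscen.sh PROCESS_ALLBETS 001 -NAME=DEV_AGENT -V=2 -SESSION_NAME=DEV_RUN
EOF
}

# Print the PID and the session name (prefix stripped) of every matching row.
pid_and_session() {
  awk '/PRD_/ && /startscen\.sh/ && $4 ~ /PROCESS_ALLBETS/ {
         n = $8
         sub(/^-SESSION_NAME=/, "", n)
         print $1, n
       }'
}

sample_ps | pid_and_session
```

With real data, replace sample_ps with ps -eo pid,args and adjust the field numbers to whatever your own output shows.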
Related
I'm experiencing some issues with an awk command right now. The original script was developed using awk on macOS and was then ported to Linux, where awk shows different behavior.
What I want to do is to count the occurrences of single strings provided via /tmp/test.uniq.txt in the file /tmp/test.txt.
awk '{print $1, system("cat /tmp/test.txt | grep -o -c " $1)}' /tmp/test.uniq.txt
Mac delivers an expected output like:
test1 2
test2 1
The output is on one line: the string and the number of occurrences, separated by a space.
Linux delivers an output like:
2
test1 1
test2
The output is not on one line, and the output of the system command is printed first.
Sample input:
test.txt looks like:
test1 test test
test1 test test
test2 test test
test.uniq.txt looks like:
test1
test2
As the comments suggested, calling grep and cat through the system function is not recommended, because awk is a complete language that can perform most of these tasks itself.
You can use the following awk command to replace your cat | grep functionality:
awk 'FNR == NR {a[$1]=0; next} {for (i=1; i<=NF; i++) if ($i in a) a[$i]++}
END { for (i in a) print i, a[i] }' uniq.txt test.txt
test1 2
test2 1
Note that this output doesn't match the count of 5 stated in your question, probably because your sample data is different.
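For anyone who wants to reproduce this, here is a small self-contained sketch that rebuilds the two sample files from the question in a scratch directory and runs the same two-file logic. The function name count_keys and the trailing sort are additions for illustration; the sort is there because the order of a for (k in a) loop is unspecified in awk.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'test1 test test' 'test1 test test' 'test2 test test' > "$workdir/test.txt"
printf '%s\n' 'test1' 'test2' > "$workdir/test.uniq.txt"

# First file: register each key with a count of 0.
# Second file: bump the count for every field that is a registered key.
count_keys() {
  awk 'FNR == NR { a[$1] = 0; next }
       { for (i = 1; i <= NF; i++) if ($i in a) a[$i]++ }
       END { for (k in a) print k, a[k] }' "$1" "$2" | sort
}

count_keys "$workdir/test.uniq.txt" "$workdir/test.txt"
```

Because the keys are looked up with in, the comparison is an exact string match on each field, not a regular expression match.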
It looks to me as if you're trying to count the number of lines containing each unique string in the uniq file. But the way you're doing it is .. awkward, and as you've demonstrated, inconsistent between versions of awk.
The following might work a little better:
$ awk '
    NR==FNR {
      a[$1]
      next
    }
    {
      for (i in a) {
        if ($1~i) {
          a[i]++
        }
      }
    }
    END {
      for (i in a)
        printf "%6d\t%s\n",a[i],i
    }
  ' test.uniq.txt test.txt
2 test1
1 test2
This loads your uniq file into an array, then for every line in your text file, steps through the array to count the matches.
Note that these are being compared as regular expressions, without word boundaries, so test1 will also be counted as part of test12.
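If the regular-expression behaviour is not wanted, one possible variation is to compare the first field as an exact string instead. This is only a sketch: the function name count_exact, the scratch files, and the extra test12 line are made up to show the difference.

```shell
# Scratch data: note the "test12" line, which a regex match on "test1" would also count.
exactdir=$(mktemp -d)
printf '%s\n' 'test1' 'test2' > "$exactdir/keys.txt"
printf '%s\n' 'test1 a' 'test12 b' 'test2 c' > "$exactdir/data.txt"

# Count lines whose first field is exactly equal to one of the keys.
count_exact() {
  awk 'NR == FNR { a[$1] = 0; next }
       ($1 in a) { a[$1]++ }
       END { for (k in a) printf "%d\t%s\n", a[k], k }' "$1" "$2" | sort -k2
}

count_exact "$exactdir/keys.txt" "$exactdir/data.txt"
```

Here test12 is ignored, so test1 is counted once rather than twice.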
Another way might be to use grep+sort+uniq:
grep -o -w -F -f uniq.txt test.txt | sort | uniq -c
It's a pipeline, but a short one.
From man grep:
-F, --fixed-strings, --fixed-regexp Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
(-F is specified by POSIX, --fixed-regexp is an obsoleted alias, please do not use it in new scripts.)
-f FILE, --file=FILE Obtain patterns from FILE, one per line. The empty file contains zero patterns and therefore matches nothing. (-f is specified by POSIX.)
-o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
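To see the options working together, here is a small sketch run against the question's sample data plus one extra line (test12 ...) that is an addition for illustration. The function name count_with_grep and the final awk step, which only swaps the columns of the uniq -c output, are also assumptions rather than part of the original pipeline.

```shell
# Scratch copies of the sample files, plus a "test12" line to exercise -w.
grepdir=$(mktemp -d)
printf '%s\n' 'test1' 'test2' > "$grepdir/uniq.txt"
printf '%s\n' 'test1 test test' 'test1 test test' 'test2 test test' 'test12 test test' > "$grepdir/test.txt"

# Emit every whole-word, fixed-string match on its own line, then count duplicates.
count_with_grep() {
  grep -o -w -F -f "$1" "$2" | sort | uniq -c | awk '{ print $2, $1 }'
}

count_with_grep "$grepdir/uniq.txt" "$grepdir/test.txt"
```

One difference from the awk versions: a string that never occurs produces no line at all here, rather than a line with a count of 0.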
Case example:
$ cat data.txt
foo,bar,moo
I can obtain the field data by using cut, assuming , as separator, but only if I know which position it has. Example to obtain value bar (second field):
$ cat data.txt | cut -d "," -f 2
bar
How can I obtain that same bar (or field number == 2) if I only know that it contains the letter a?
Something like:
$ cat data.txt | reversecut -d "," --string "a"
[results could be both "2" or "bar"]
In other words: how can I know what is the field containing a substring in a text-delimited file using linux shell commands/tools?
Of course, programming is allowed, but do I really need looping and conditional structures? Isn't there a command that solves this?
If the choice of shell matters, I would prefer Bash solutions.
A close solution here, but not exactly the same.
More scenarios based on the same example (upon request):
For a search pattern of m or mo, the results could be both 3 or moo.
For a search pattern of f or fo, the results could be both 1 or foo.
The following simple awk command may help here.
awk -F, '$2~/a/{print $2}' data.txt
Output will be bar in this case.
Explanation:
-F,: Setting field separator for lines as comma, to identify the fields easily.
$2~/a/: checks whether the 2nd field contains the letter a; if it does, that 2nd field is printed.
EDIT: Adding solution as per OP's comment and edited question too now.
Let's say following Input_file is there
cat data.txt
foo,bar,moo
mo,too,far
foo,test,test1
fo,test2,test3
Then following is the code for same:
awk -F, '{for(i=1;i<=NF;i++){if($i ~ /fo/){print $i}}}' data.txt
foo
foo
fo
OR
awk -F, '{for(i=1;i<=NF;i++){if($i ~ /mo/){print $i}}}' data.txt
moo
mo
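The question also asked for the field number, not only the value. A possible sketch that prints both is shown below. The function name find_field is made up, and index() is used instead of a regex so that the search string is treated as a literal substring (apart from backslash escapes, which awk still interprets in -v assignments).

```shell
# Print "<field number> <field value>" for every comma-separated field
# on standard input that contains the literal substring given as $1.
find_field() {
  awk -F, -v s="$1" '{
    for (i = 1; i <= NF; i++)
      if (index($i, s)) print i, $i
  }'
}

printf '%s\n' 'foo,bar,moo' | find_field a
printf '%s\n' 'foo,bar,moo' | find_field mo
printf '%s\n' 'foo,bar,moo' | find_field fo
```

For the one-line sample file this reports field 2 for a, field 3 for mo, and field 1 for fo.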
I have a rather large file. What all sections have in common is a hostname line that starts each section, for example:
HOSTNAME:host1
data 1
data here
data 2
text here
section 1
text here
part 4
data here
comm = 2
HOSTNAME:host-2
data 1
data here
data 2
text here
section 1
text here
part 4
data here
comm = 1
The above prints
As you see above, in between each section there are other sections broken down by key words or lines that have specific values
I would like to use a one-liner to print the host name for each section and then print whichever lines I want to extract under each hostname section.
Can you please help? I am currently using grep -C 10 HOSTNAME | grep -C pattern
but this assumes that there are 10 lines in each section. This is not an optimal way to do it; can someone show a better way? I also need to be able to print more than one line under each pattern that I find. So if I find data 1 and there are additional lines under it, I would like to grab and print them.
So output of command would be like
grep -C 10 HOSTNAME | grep data 1
grep -C 10 HOSTNAME | grep -A 2 data 1
HOSTNAME:Host1
data 1
HOSTNAME:Hoss2
data 1
Beside Grep I use this sed command to print my output
sed -r '/HOSTNAME|shared/!d' filename
The only problem with this sed command is that it only prints the lines that have patterns shared & HOSTNAME in them. I also need to specify the number of lines I like to print in my case under the line that matched patterns shared. So I like to print HOSTNAME and give the number of lines I like to print under second search pattern shared.
Thanks
awk to the rescue!
$ awk -v lines=2 '/HOSTNAME/{c=lines} NF&&c&&c--' file
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
This prints the number of lines given by the lines variable, counting the line that matched the pattern, and skips empty lines.
If you want to specify a secondary keyword instead of a number of lines
$ awk -v key='data 1' '/HOSTNAME/{h=1; print} h&&$0~key{print; h=0}' file
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
Here is a sed two-liner:
sed -n -r '/HOSTNAME/ { p }
/^\s+data 1/ {p }' hostnames.txt
It prints (p)
when the line contains a HOSTNAME
when the line starts with some whitespace (\s+) followed by your search criterion (data 1)
non-matching lines are not printed (due to the sed -n option)
Edit: Some remarks:
this was tested with GNU sed 4.2.2 under linux
you don't need the -r; if your sed version does not support it, replace the second pattern with /^.*data 1/
we can squash everything in one line with ;
Putting it all together, here is a revised version in one line, without the need for the extended regex ( i.e without -r):
sed -n '/HOSTNAME/ { p } ; /^.*data 1/ {p }' hostnames.txt
The OP requirements seem to be very unclear, but the following is consistent with one interpretation of what has been requested, and more importantly, the program has no special requirements, and the code can easily be modified to meet a variety of requirements. In particular, both search patterns (the HOSTNAME pattern and the "data 1" pattern) can easily be parameterized.
The main idea is to print all lines in a specified subsection, or at least a certain number up to some limit.
If there is a limit on how many lines in a subsection should be printed, specify a value for limit, otherwise set it to 0.
awk -v limit=0 '
/^HOSTNAME:/ { subheader=0; hostname=1; print; next}
/^ *data 1/ { subheader=1; print; next }
/^ *data / { subheader=0; next }
subheader && (limit==0 || (subheader++ < limit)) { print }'
Given the lines provided in the question, the output would be:
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
(Yes, I know the variable 'hostname' in the awk program is currently unused, but I included it to make it easy to add a test to satisfy certain obvious requirements regarding the preconditions for identifying a subheader.)
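As a rough illustration of the parameterization mentioned above, the patterns can be passed in as awk variables. Everything named here (the function print_subsection, the variables host_re, sub_re and any_re, and the indented sample input with its value lines) is hypothetical; only the logic is taken from the program above.

```shell
# $1 = limit (0 means no limit), $2 = regex for the wanted subsection header.
# Reads the file on standard input.
print_subsection() {
  awk -v limit="$1" -v host_re='^HOSTNAME:' -v sub_re="$2" -v any_re='^ *data ' '
    $0 ~ host_re { subheader = 0; print; next }   # new host: reset state
    $0 ~ sub_re  { subheader = 1; print; next }   # wanted subsection starts
    $0 ~ any_re  { subheader = 0; next }          # some other subsection starts
    subheader && (limit == 0 || subheader++ < limit) { print }'
}

# Hypothetical sample input, made up for this sketch.
sample_hosts() {
  printf '%s\n' 'HOSTNAME:host1' '  data 1' '    value a' '    value b' \
                '  data 2' '    value c' 'HOSTNAME:host-2' '  data 1' '    value d'
}

sample_hosts | print_subsection 0 '^ *data 1'
sample_hosts | print_subsection 2 '^ *data 1'
```

Note that with this counting scheme the subheader line itself counts towards the limit, so a limit of 2 prints the data 1 line plus one line below it.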
sed -n -e '/hostname/,+p' -e '/Duplex/,+p'
The simplest way to do it is to combine two sed commands.
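The command as written leaves out the count after the +; GNU sed's addr,+N address form needs a number N of following lines. A sketch of the same idea, assuming GNU sed and using the HOSTNAME and data 1 patterns from the question rather than hostname and Duplex, might look like this. The function name and the sample input are made up.

```shell
# $1 = number of extra lines to print after each "data 1" line.
# Relies on the GNU sed extension addr1,+N; reads standard input.
host_and_following() {
  sed -n -e '/^HOSTNAME:/p' -e "/data 1/,+${1}p"
}

printf '%s\n' 'HOSTNAME:host1' '  data 1' '    value a' '    value b' \
              '  data 2' 'HOSTNAME:host-2' '  data 1' '    value d' |
  host_and_following 1
```

One caveat with two independent expressions: a line that matches the first pattern and also falls inside a range of the second is printed twice.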
I want to print the lines of a log file whose 10th field starts with one of the digit sequences below.
So far I have only succeeded in printing the 10th field. How do I print the whole line?
In short, I want to compare the 10th field with multiple strings at the same time.
tail -f *log|awk -F" " '/Gateway request received for service type Ussd/ {print $10}'|egrep '254720|254721|254722|254723|254724|254725|254726|254727|254728|254729|254710|254712|254713|254714|254715|254716|254717|254718|254719|254700|254701|254702'.*
tail -f *log|awk '/Gateway (and rest)/ && $10~/254720|2542...and rest/'
should do.
-F" " is not necessary
grep could (should) be saved
If the $10 matches number exactly, you can use $10>lower-bound && $10<upper-bound instead of $10~/regex/
perhaps condensing numbers into regex ranges will help
$ ... | awk '/Gate... etc./ && $10~/25471[02-9]|25472[0-9]|25470[0-2]/'
or grouping further
$ ... | awk '/Gate... etc./ && $10~/2547(1[02-9]|2[0-9]|0[0-2])/'
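When condensing a long alternation by hand it is easy to drop a value (the list in the question includes 254702, so the last group has to cover 0 through 2). A small check like the following sketch can compare a candidate regex against the explicit list. The function name check_prefix_regex is made up, and the leading ^ anchor is an assumption based on the question's "starts with" wording.

```shell
# $1 = candidate regex, $2 = space-separated list of wanted prefixes.
# Prints every number in 254700..254729 on which the regex and the list disagree.
check_prefix_regex() {
  awk -v re="$1" -v list="$2" 'BEGIN {
    n = split(list, p, " ")
    for (i = 1; i <= n; i++) want[p[i]] = 1
    for (num = 254700; num <= 254729; num++) {
      s = num ""                        # force string context
      matched = (s ~ re) ? 1 : 0
      listed  = (s in want) ? 1 : 0
      if (matched != listed) print s, "matched=" matched, "listed=" listed
    }
  }'
}

wanted='254720 254721 254722 254723 254724 254725 254726 254727 254728 254729 254710 254712 254713 254714 254715 254716 254717 254718 254719 254700 254701 254702'
check_prefix_regex '^2547(1[02-9]|2[0-9]|0[0-2])' "$wanted"
```

No output means the condensed pattern and the explicit list agree over that range.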
I have written a little script using sed to transform this:
kaefert#Ultrablech ~ $ cat /sys/class/power_supply/BAT0/uevent
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_STATUS=Full
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_CYCLE_COUNT=0
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=7400000
POWER_SUPPLY_VOLTAGE_NOW=8370000
POWER_SUPPLY_POWER_NOW=0
POWER_SUPPLY_ENERGY_FULL_DESIGN=45640000
POWER_SUPPLY_ENERGY_FULL=44541000
POWER_SUPPLY_ENERGY_NOW=44541000
POWER_SUPPLY_MODEL_NAME=UX32-65
POWER_SUPPLY_MANUFACTURER=ASUSTeK
POWER_SUPPLY_SERIAL_NUMBER=
into a csv file format like this:
kaefert#Ultrablech ~ $ Documents/Asus\ Zenbook\ UX32VD/power_to_csv.sh
"date";"status";"voltage µV";"power µW";"energy full µWh";"energy now µWh"
2012-07-30 11:29:01;"Full";8369000;0;44541000;44541000
2012-07-30 11:29:02;"Full";8369000;0;44541000;44541000
2012-07-30 11:29:04;"Full";8369000;0;44541000;44541000
... (in a loop)
What I would like now is to divide each of those numbers by 1.000.000 so that they represent V instead of µV and W instead of µW, and are easy to interpret at a quick glance. Of course I could do this manually afterwards once I've opened this csv inside libre office calc, but I would like to automate it.
So what I found is, that I can call external programs in between sed, like this:
...
s/\nPOWER_SUPPLY_PRESENT=1\nPOWER_SUPPLY_TECHNOLOGY=Li-ion\nPOWER_SUPPLY_CYCLE_COUNT=0\nPOWER_SUPPLY_VOLTAGE_MIN_DESIGN=7400000\nPOWER_SUPPLY_VOLTAGE_NOW=\([0-9]\{1,\}\)/";'`echo 0`'\1/
and that I could get values like I want by something like this:
echo "scale=6;3094030/1000000" | bc | sed 's/0\{1,\}$//'
But the problem now is, how do I pass my match "\1" into the external command?
If you are interested in looking at the full script, you'll find it there:
http://koega.no-ip.org/mediawiki/index.php/Battery_info_to_csv
If your sed is GNU sed, you can use the 'e' flag to pass a matched group to external commands/tools from within the sed command.
an example might be helpful to make it clear:
say, you have a problem:
you have a string "100+20foobar" and you want to get the calculation result of the 100+20 part, and replace "oo" with "xx" in the "foobar" part.
Note that this example is not for solving the problem above, just for showing the sed 'e' usage.
So you could capture 100+20 in the first match group and the rest in the 2nd group, then pass the two groups to different commands/tools and get the result, like:
kent$ echo "100+20foobar"|sed -r 's#([0-9+]*)(.*)#echo \1 \|bc\;echo \2 \| sed "s/oo/xx/g"#ge'
120
fxxbar
in this way, you could nest many seds one in another one, till you get lost. :D
As sed doesn't do arithmetic on its own I would recommend using awk for something like this, e.g. to divide 3rd, 5th and 6th field by a million do something like this:
awk -F';' -v OFS=';' '
  NR == 1
  NR != 1 {
    $3 /= 1e6
    $5 /= 1e6
    $6 /= 1e6
    print
  }'
Explanation
-F';' and -v OFS=';' specify the input and output field separator.
NR == 1 pass first line through without change.
NR != 1 if it is not the first line, divide and print.
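A quick way to try this out is to feed it the header and one data row from the question. The wrapper function micro_to_unit is a made-up name, and the header rule is written with an explicit print here, which behaves the same as the bare NR == 1 pattern.

```shell
# Divide fields 3, 5 and 6 of every non-header line by one million.
# Reads semicolon-separated CSV on standard input.
micro_to_unit() {
  awk -F';' -v OFS=';' '
    NR == 1 { print; next }                               # header passes through
    { $3 /= 1e6; $5 /= 1e6; $6 /= 1e6; print }'
}

printf '%s\n' \
  '"date";"status";"voltage µV";"power µW";"energy full µWh";"energy now µWh"' \
  '2012-07-30 11:29:01;"Full";8369000;0;44541000;44541000' |
  micro_to_unit
```

For the sample row the data line becomes 2012-07-30 11:29:01;"Full";8.369;0;44.541;44.541. The header text is passed through untouched, so its unit labels would still need adjusting by hand.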
To divide by 1,000,000 using sed alone, you can pipe the expression through a script like this:
Q='3094030/1000000'
echo "$Q" | sed ':r /^[[:digit:]]\{7\}/{s$\([[:digit:]]*\)\([[:digit:]]\{6\}\)/1000000$\1.\2$;p;d};s:^:0:;br;d'