Simpler way of extracting text from a file - Linux

I've put together a batch script to generate panoramas using the command line tools used by Hugin. One interesting thing about several of those tools is that they allow multi-core usage, but this option has to be flagged on the command line.
What I've come up with so far:
# get the last field of each line in the file, initialize the line counter
results=$(more /proc/cpuinfo | awk '{print ($NF)}')
count=0
# loop through the results until the 12th line for the cpu core count
for result in $results; do
    if [ $count == 12 ]; then
        echo "Core Count: $result"
    fi
    count=$((count+1))
done
Is there a simpler way to do this?

result=$(awk 'NR==12{print $NF}' /proc/cpuinfo)

To answer your question about getting the first/last so many lines, you could use head and tail, e.g.:
cat /proc/cpuinfo | awk '{print ($NF)}' | head -12 | tail -1
But instead of searching for the 12th line, how about searching semantically for any line containing "cores"? For example, some machines may have multiple cores, so you may want to sum the results:
cat /proc/cpuinfo | grep "cores" | awk '{s+=$NF} END {print s}'

count=$(getconf _NPROCESSORS_ONLN)
see getconf(1) and sysconf(3) constants.
According to the Linux man page, _SC_NPROCESSORS_ONLN "may not be standard". My guess is this requires glibc, or even a Linux system specifically. If that doesn't work, I'd probably look at /sys/class/cpuid (perhaps there's something better?) rather than parsing /proc/cpuinfo. None of the above is completely portable.
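If getconf is unavailable, a hedged alternative is nproc from GNU coreutils:
# nproc reports the number of processing units available to the current
# process; it honours CPU affinity masks, so it can be lower than the
# machine-wide total
count=$(nproc)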

There are many ways:
head -n 12 /proc/cpuinfo | tail -1 | awk -F: '{print $2}'
grep 'cpu cores' /proc/cpuinfo | head -1 | awk -F: '{print $2}'
and so on.
But I must note that this takes the information only from the first section of /proc/cpuinfo, and I am not sure that is what you need.

And what if cpuinfo changes its format? ;) Maybe something like this would be better:
sed -n 's/cpu cores\s\+:\s\+\(.*\)/\1/p' /proc/cpuinfo | tail -n 1
And make sure to sum the cores. Mine has got like 12 or 16 of them ;)

I'm unsure what you are trying to do, and why what ormaaj said above wouldn't work either. My instinct, based on your description, would have been something much simpler, along the lines of:
grep processor /proc/cpuinfo | wc -l
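A small hedged refinement: grep can do the counting itself with -c, which saves the extra wc process (anchoring the pattern with ^ just avoids accidental matches elsewhere in the file):
grep -c '^processor' /proc/cpuinfo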

Related

Optimizing search in Linux

I have a huge log file, close to 3 GB in size.
My task is to generate some reporting based on the number of times certain things are logged.
I need to find the number of times StringA, StringB, and StringC appear, each counted separately.
What I am doing right now is:
grep "StringA" server.log | wc -l
grep "StringB" server.log | wc -l
grep "StringC" server.log | wc -l
This is a long process, and my script takes close to 10 minutes to complete. What I want to know is whether this can be optimized. Is it possible to run one grep command and find out the number of times StringA, StringB, and StringC occur, individually?
You can use grep -c instead of wc -l:
grep -c "StringA" server.log
grep can't report per-string counts in a single run. You can use awk:
out=$(awk '/StringA/{a++;} /StringB/{b++;} /StringC/{c++;} END{print a, b, c}' server.log)
Then you can extract each count with a simple bash array:
arr=($out)
echo "StringA="${arr[0]}
echo "StringA="${arr[1]}
echo "StringA="${arr[2]}
This (grep -c instead of piping to wc) is certainly going to be faster, and the awk solution is possibly faster still, but I haven't measured either.
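One hedged refinement to the awk one-liner above: forcing each counter to a number with +0 prints 0 instead of an empty field when a string never appears, so the positions in the bash array stay aligned:
out=$(awk '/StringA/{a++} /StringB/{b++} /StringC/{c++} END{print a+0, b+0, c+0}' server.log)
arr=($out)   # arr[0]=StringA count, arr[1]=StringB count, arr[2]=StringC count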
Certainly this approach could be optimized, since grep doesn't perform any text indexing. I would use a text indexing engine, like one of those from this review or this Stack Exchange Q&A. You may also consider using journald from systemd, which stores logs in a structured and indexed format, so lookups are more efficient.
So many greps so little time... :-)
According to David Lyness, a straight grep search is about 7 times as fast as an awk in large file searches.
If that is the case, the current approach could be optimized by changing grep to fgrep, but only if the patterns being searched for are not regular expressions. fgrep is optimized for fixed patterns.
If the number of instances is relatively small compared to the original log file entries, it may be an improvement to use the egrep version of grep to create a temporary file filled with all three instances:
egrep "StringA|StringB|StringC" server.log > tmp.log
grep "StringA" tmp.log | wc -c
grep "StringB" tmp.log | wc -c
grep "StringC" tmp.log | wc -c
The egrep variant of grep allows a | (vertical bar/pipe) character to be used between two or more separate search strings, so that you can find multiple strings in a single statement. You can use grep -E to do the same thing.
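If you want per-string counts in a single pass over the original file, here is a hedged sketch using grep -o with sort and uniq (note this counts every occurrence, not just matching lines):
# -o prints each match on its own line; uniq -c then tallies per string
grep -oE 'StringA|StringB|StringC' server.log | sort | uniq -c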
Full documentation is in the man grep page; information about the extended regular expressions that egrep uses is in man 7 re_format.

Filter lines by number of fields

I am filtering very long text files on Linux (usually > 1 GB) to get only the lines I am interested in. I use this command:
cat ./my/file.txt | LC_ALL=C fgrep -f ./my/patterns.txt | $decoder > ./path/to/result.txt
$decoder is the path to a program I was given to decode these files. The problem is that it only accepts lines with 7 fields, that is, 7 strings separated by spaces (e.g. "11 22 33 44 55 66 77"). Whenever a line with more or fewer fields is passed to this program, it crashes and I get a broken-pipe error message.
To fix it, I wrote a super simple script in Bash:
while read line ; do
    if [[ $( echo $line | awk '{ print NF }') == 7 ]]; then
        echo $line;
    fi;
done
But the problem is that now it takes ages to finish. Before, it took seconds; now it takes ~30 minutes.
Does anyone know a better/faster way to do this? Thank you in advance.
Well, perhaps you can insert awk in between instead. No need to rely on Bash:
LC_ALL=C fgrep -f ./my/patterns.txt ./my/file.txt | awk 'NF == 7' | "$decoder" > ./path/to/result.txt
Perhaps awk can go first; performance may be better that way:
awk 'NF == 7' ./my/file.txt | LC_ALL=C fgrep -f ./my/patterns.txt | "$decoder" > ./path/to/result.txt
You could merge fgrep and awk into a single awk command; however, I'm not sure whether that would affect anything that requires LC_ALL=C, or whether it would give better performance.
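For completeness, here is a hedged sketch of what that merge might look like, assuming each line of ./my/patterns.txt is a plain substring (as fgrep -f treats it); with many patterns, the per-line scan could well be slower than fgrep:
LC_ALL=C awk '
    NR == FNR { pats[$0]; next }                 # first file: load the patterns
    NF == 7 {                                    # second file: only 7-field lines
        for (p in pats)
            if (index($0, p)) { print; break }   # keep lines containing any pattern
    }
' ./my/patterns.txt ./my/file.txt | "$decoder" > ./path/to/result.txt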

Shell script to get count of a variable from a single line output

How can I get the count of the # character from the following output? I used the tr command to extract it. I am curious to know what the best way to do it is; I mean, other ways of doing the same thing.
{running_device,[test#01,test#02]},
My solution was:
echo '{running_device,[test#01,test#02]},' | tr ',' '\n' | grep '#' | wc -l
I think it is simpler to use:
echo '{running_device,[test#01,test#02]},' | tr -cd '#' | wc -c
This yields 2 for me (tested on Mac OS X 10.7.5). The -c option to tr means 'complement' (of the set of specified characters) and -d means 'delete', so that deletes every non-# character, and wc counts what's provided (no newline, so the line count is 0, but the character count is 2).
Nothing wrong with your approach. Here are a couple of other approaches:
echo '{running_device,[test#01,test#02]},' | awk -F'#' '{print NF - 1}'
or
echo $(( $(echo '{running_device,[test#01,test#02]},' | sed 's+[^#]++g' | wc -c) - 1 ))
The only concern I would have is if you are running this command in a loop (e.g. once for every line in a large file). If that is the case, then execution time could be an issue as stringing together shell utilities incurs the overhead of launching processes which can be sloooow. If this is the case, then I would suggest writing a pure awk version to process the entire file.
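As a hedged illustration of that last point, here is a pure-awk sketch that counts the # characters on every line of a file in a single process (input.txt is just a placeholder name):
# gsub() returns the number of substitutions; replacing "#" with itself
# counts the occurrences without changing the line
awk '{ print gsub(/#/, "#") }' input.txt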
Use GNU Grep to Avoid Character Translation
Here's another way to do this that I personally find more intuitive: extract just the matching characters with grep, then count grep's output lines. For example:
echo '{running_device,[test#01,test#02]},' |
grep --fixed-strings --only-matching '#' |
wc -l
yields 2 as the result.

Calculate percentage free swap space with `free` and `awk`

I'm trying to calculate the percentage of free swap space available.
Using something like this:
free | grep 'Swap' | awk '{t = $2; f = $4; print ($f/$t)}'
but awk is throwing:
awk: program limit exceeded: maximum number of fields size=32767
And I don't really understand why; my program is quite simple. Is it possible I'm hitting some weird range error?
Try this one:
free | grep 'Swap' | awk '{t = $2; f = $4; print (f/t)}'
In your code you are trying to print $f and $t, and awk treats those as field references: field number f and field number t. With a few gigabytes of swap, t is a number in the millions (free reports kB), which is far beyond the maximum field number awk supports in its standard configuration (32767 here). Apart from the easier approach via meminfo, just print f/t, which refers to the variables, and you get your answer.
Note that it might be easier/more robust to read the info by using /proc/meminfo's SwapFree line.
Something like:
$ grep SwapFree /proc/meminfo | awk '{print $2}'
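Building on that, here is a hedged sketch that computes the percentage of free swap directly from /proc/meminfo and guards against machines with no swap configured:
# SwapTotal and SwapFree are both reported in kB, so the units cancel out
awk '/SwapTotal/ {t=$2} /SwapFree/ {f=$2} END { if (t) printf "%.1f%% free\n", 100*f/t; else print "no swap" }' /proc/meminfo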
You do not need the variables. You can use plain
awk '{ print $4/$2 }'
Read it from /proc/meminfo:
lennart@trololol:~$ grep SwapFree /proc/meminfo | awk '{print $2}'
0
I realise that the question is about using "free" and "awk", but if you have SAR running, then this will give you the most recently recorded percentage value:
sar -S|tail -2|head -1|awk '{print $5}'

Apply two greps and awk to same input

I'm using two short UNIX commands in my python script to get some data about nearby wireless access points.
No. 1 gets the ESSID of the access point:
"iwlist NIC scan | grep ESSID | awk '{print $1}'"
No. 2 gets the signal strength of the access point:
"iwlist NIC scan | grep level | awk '{print $3}'"
My problem is that I run these two commands one after the other, which means they don't generate "symmetric" data: you might get 6 ESSIDs and only 4 signal-strength values.
Because the first time, the script found 6 APs (A, B, C, D, E and F) and the next time only 4 APs (A, C, E and F).
So my question is the following:
Is there a way to "split" the result of the first iwlist NIC scan and then apply two different grep and awk sequences to the same input?
Just so that you at least get a symmetric list of results.
Thank you in advance!
What about using awk as grep:
iwlist NIC scan | awk '/ESSID/ {print $1} /level/ {print $3}'
This gives you the ESSID and level lines all at once. You'd probably want to be a little more sophisticated and at least tag the lines with what it represents; the options are legion. It isn't clear from your code how you're going to use the output, so I'm not going to try and second-guess how best to present it (but I would expect that network ID and level on the same line would be a nice output — and it is doable).
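A hedged sketch of that pairing, assuming (as iwlist output typically shows) that each cell prints its "Signal level" line before its ESSID line:
iwlist NIC scan | awk '
    /Signal level/ { level = $3 }        # remember the most recent level seen
    /ESSID/        { print $1, level }   # print it next to the ESSID
'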
In general, you can accomplish this type of routing using tee and process substitution:
iwlist NIC scan | tee >( grep -i ESSID | awk '{print $1}' ) | grep -i level | awk '{print $3}'
but this is inferior in this situation for several reasons:
grep is superfluous, since awk can do the filtering itself
The two branches are similar enough to fold into a single awk command, as Jonathan Leffler points out.
The two output streams are merged together in a nondeterministic manner, so it may be difficult or impossible to determine which level corresponds to which ESSID. Storing the output of each branch in a file and later matching them line by line helps, but then this is not much better than asgs's solution.
But the technique of passing one command's output to two different pipelines without an explicit temporary file may be useful elsewhere; consider this answer just a demonstration.
#!/bin/bash
iwlist NIC scan > tmpfile
grep -i ESSID tmpfile | awk '{print $1}'
grep -i level tmpfile | awk '{print $3}'
rm tmpfile
A script something like this might just do what you're expecting.
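A hedged variant of the same idea, using mktemp and a trap so the temporary file is removed even if the script is interrupted (NIC is still a placeholder for the interface name):
#!/bin/bash
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT       # clean up the temp file on any exit
iwlist NIC scan > "$tmpfile"
grep -i ESSID "$tmpfile" | awk '{print $1}'
grep -i level "$tmpfile" | awk '{print $3}'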
