Output line counts for files with 1 number filename difference - linux

We have a requirement to report on the number of lines written to a 7 day cycle of log files. The log files are called [filename].log.1.gz for today, [filename].log.2.gz for yesterday, up to [filename].log.7.gz for the 7th day.
I was hoping to create a script that would output all the numbers at once, instead of running the zcat [filename].log.1.gz | wc -l command against each file individually. I was also hoping to have a meaningful message against each output value.
I can write a bash script that runs the command for each file on its own line, since the filenames all follow the same pattern, but I was hoping for something a bit more elegant.
Instead of this
zcat position.log.3.gz | wc -l
zcat position.log.4.gz | wc -l
zcat position.log.5.gz | wc -l
zcat position.log.6.gz | wc -l
zcat position.log.7.gz | wc -l
I was hoping for something more like this
for i in {1..7}
c=$(zcat position.log.$i.gz | wc -l)
message=$"The count for "
date1=$(date --date='$i days ago')
result=$"$message$date1$c"
echo $result
done
However, I can't get this to run.
Any ideas?

Your script with small fixes:
# for loop starts with a 'do'
for i in {1..7}; do
# indentation makes everything readable
# always quote your variables
c=$(zcat "position.log.$i.gz" | wc -l)
# The "" is just a string, no '$' sign before it
message="The count for "
# The '$i' is literally `$i`, if you want to _expand_ a variable, you have to use "$i"
date1=$(date --date="$i days ago")
# same as above, drop the $
result="$message$date1$c"
# always quote your variables
echo "$result"
done
Or maybe a tiny little bit shorter:
for i in {1..7}; do
echo "The count for $(date --date="$i days ago") is $(zcat "position.log.$i.gz" | wc -l)"
done
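If a day's file might be missing, a guard keeps the loop from erroring out partway through; a minimal sketch, assuming GNU date (the +%F format is my addition, for a compact date):
for i in {1..7}; do
    f="position.log.$i.gz"
    # skip days whose log is missing (assumption: gaps can happen)
    [[ -e $f ]] || { echo "No log for $(date --date="$i days ago" +%F)"; continue; }
    printf 'The count for %s is %s\n' "$(date --date="$i days ago" +%F)" "$(zcat "$f" | wc -l)"
done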


Suppressing summary information in `wc -l` output

I use the command wc -l to count the number of lines in my text files (I also want to sort everything through a pipe), like this:
wc -l $directory-path/*.txt | sort -rn
The output includes a "total" line, which is the sum of the lines of all files:
10 total
5 ./directory/1.txt
3 ./directory/2.txt
2 ./directory/3.txt
Is there any way to suppress this summary line? Or even better, to change the way the summary line is worded? For example, instead of "10", the word "lines" and instead of "total" the word "file".
Yet another sed solution!
1. Short and quick
As the total comes on the last line, $d is the sed command for deleting the last line.
wc -l $directory-path/*.txt | sed '$d'
2. With a header line added:
wc -l $directory-path/*.txt | sed '$d;1ilines total'
Unfortunately, there is no alignment.
3. With alignment: formatting the left column at 11 characters wide.
wc -l $directory-path/*.txt |
sed -e '
s/^ *\([0-9]\+\)/ \1/;
s/^ *\([0-9 ]\{11\}\) /\1 /;
/^ *[0-9]\+ total$/d;
1i\ lines filename'
will do the job:
lines filename
5 ./directory/1.txt
3 ./directory/2.txt
2 ./directory/3.txt
4. But what if your wc version really could put the total on the 1st line?
This one is for fun, because I don't believe there is a wc version that puts the total on the 1st line, but...
This version drops the total line wherever it appears and adds a header line at the top of the output.
wc -l $directory-path/*.txt |
sed -e '
s/^ *\([0-9]\+\)/ \1/;
s/^ *\([0-9 ]\{11\}\) /\1 /;
1{
/^ *[0-9]\+ total$/ba;
bb;
:a;
s/^.*$/ lines file/
};
bc;
:b;
1i\ lines file' -e '
:c;
/^ *[0-9]\+ total$/d
'
This is more complicated because we can't simply drop the 1st line, even if it is the total line.
This is actually fairly tricky.
I'm basing this on the GNU coreutils version of the wc command. Note that the total line is normally printed last, not first (see my comment on the question).
wc -l prints one line for each input file, consisting of the number of lines in the file followed by the name of the file. (The file name is omitted if there are no file name arguments; in that case it counts lines in stdin.)
If and only if there's more than one file name argument, it prints a final line containing the total number of lines and the word total. The documentation indicates no way to inhibit that summary line.
Other than the fact that it's preceded by other output, that line is indistinguishable from output for a file whose name happens to be total.
So to reliably filter out the total line, you'd have to read all the output of wc -l and remove the final line only if there is more than one line of output. (Even that can fail if you have files with newlines in their names, but you can probably ignore that possibility.)
A more reliable method is to invoke wc -l on each file individually, avoiding the total line:
for file in $directory-path/*.txt ; do wc -l "$file" ; done
And if you want to sort the output (something you mentioned in a comment but not in your question):
for file in $directory-path/*.txt ; do wc -l "$file" ; done | sort -rn
If you happen to know that there are no files named total, a quick-and-dirty method is:
wc -l $directory-path/*.txt | grep -v ' total$'
If you want to run wc -l on all the files and then filter out the total line, here's a bash script that should do the job. Adjust the *.txt as needed.
#!/bin/bash
# Save the wc output, then drop the final "total" line
# only if there was more than one line of output.
wc -l *.txt > .wc.out
lines=$(wc -l < .wc.out)
if [[ $lines -eq 1 ]] ; then
    cat .wc.out
else
    (( lines-- ))
    head -n $lines .wc.out
fi
rm .wc.out
Another option is this Perl one-liner:
wc -l *.txt | perl -e '@lines = <>; pop @lines if scalar @lines > 1; print @lines'
@lines = <> slurps all the input into an array of strings. pop @lines discards the last line if there is more than one, i.e., if the last line is the total line.
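For systems without Perl, the same buffer-and-drop logic could be sketched in awk (my translation, not from the original answer):
wc -l *.txt | awk '{ a[NR] = $0 }
    END { n = (NR > 1) ? NR - 1 : NR; for (i = 1; i <= n; i++) print a[i] }'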
The program wc always displays the total when there are two or more files (fragment of wc.c):
if (argc > 2)
report ("total", total_ccount, total_wcount, total_lcount);
return 0;
So the easiest fix is to run wc on only one file at a time, letting find present the files to wc one after the other:
find $dir -name '*.txt' -exec wc -l {} \;
Or as specified by liborm.
dir="."
find $dir -name '*.txt' -exec wc -l {} \; | sort -rn | sed 's/\.txt$//'
This is a job tailor-made for head:
wc -l $directory-path/*.txt | head --lines=-1
This way, wc still runs as a single process.
Can you use another wc ?
The POSIX wc (man -s1p wc) says:
If more than one input file operand is specified, an additional line shall be written, of the same format as the other lines, except that the word total (in the POSIX locale) shall be written instead of a pathname and the total of each column shall be written as appropriate. Such an additional line, if any, is written at the end of the output.
You said the total line was the first line; the manual states it's the last, and other wc implementations don't show it at all. Removing the first or last line is dangerous, so I would grep -v the line with the total (in the POSIX locale...), or just grep for the slash that's part of all the other lines:
wc -l $directory-path/*.txt | grep "/"
Not the most optimized way, since you could use combinations of cat, echo, coreutils, awk, sed, tac, etc., but this will get you what you want:
wc -l ./*.txt | awk 'BEGIN{print "Line\tFile"}1' | sed '$d'
wc -l ./*.txt will produce the line counts. awk 'BEGIN{print "Line\tFile"}1' will add the header titles; the trailing 1 is an always-true pattern that makes awk print every input line. sed '$d' will print all lines except the last one.
Example Result
Line File
6 ./test1.txt
1 ./test2.txt
You can solve it (and many other problems that appear to need a for loop) quite succinctly using GNU Parallel like this:
parallel wc -l ::: tmp/*txt
Sample Output
3 tmp/lines.txt
5 tmp/unfiltered.txt
42 tmp/file.txt
6 tmp/used.txt
The simplicity of using just grep -c
I rarely use wc -l in my scripts because of these issues. I use grep -c instead. Though it is not as efficient as wc -l, with grep -c we don't need to worry about other issues like the summary line, whitespace, or forking extra processes.
For example:
/var/log# grep -c '^' *
alternatives.log:0
alternatives.log.1:3
apache2:0
apport.log:160
apport.log.1:196
apt:0
auth.log:8741
auth.log.1:21534
boot.log:94
btmp:0
btmp.1:0
<snip>
Very straightforward for a single file:
line_count=$(grep -c '^' my_file.txt)
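One behavioural difference worth knowing: wc -l counts newline characters, while grep -c '^' counts lines, so the two can disagree on a file whose last line has no trailing newline. A quick demonstration (the /tmp path is arbitrary):
printf 'one\ntwo' > /tmp/no_newline.txt    # second line lacks a trailing newline
wc -l /tmp/no_newline.txt                  # prints 1: wc counts newline characters
grep -c '^' /tmp/no_newline.txt            # prints 2: grep counts matching lines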
Performance comparison: grep -c vs wc -l
/tmp# ls -l *txt
-rw-r--r-- 1 root root 721009809 Dec 29 22:09 x.txt
-rw-r----- 1 root root 809338646 Dec 29 22:10 xyz.txt
/tmp# time grep -c '^' *txt
x.txt:7558434
xyz.txt:8484396
real 0m12.742s
user 0m1.960s
sys 0m3.480s
/tmp/# time wc -l *txt
7558434 x.txt
8484396 xyz.txt
16042830 total
real 0m9.790s
user 0m0.776s
sys 0m2.576s
Similar to Mark Setchell's answer, you can also use xargs with an explicit replace-string:
ls | xargs -I% wc -l %
With -I, xargs doesn't pass all the inputs to wc at once, but runs one wc per input line.
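If file names may contain spaces or newlines, a null-delimited pipeline is safer than parsing ls; a sketch assuming GNU find and xargs:
find . -maxdepth 1 -name '*.txt' -print0 | xargs -0 -n 1 wc -l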
Shortest answer:
ls | xargs -l wc
What about using sed to delete any line matching total, as below? This removes the total line if it is present (but also the line for any file with total in its name).
wc -l $directory-path/*.txt | sort -rn | sed '/total/d'

What is the unix/linux way to divide the number of lines contained from two different files?

I have a file and I am processing it line by line, producing another file with the result. I want to monitor the percentage of completion. In my case, it is just the number of lines in the new file divided by the number of lines in the input file. A simple example would be:
$ cat infile
unix
is
awesome
$ cat infile | process.sh >> outfile &
Now, if I run my command, I should get 0.33 once process.sh has completed the first line.
Any suggestions?
You can use pv to show progress (on Debian/Ubuntu it is in the package pv):
pv -l -s "$(wc -l < file.txt)" file.txt | process.sh
This will use number of lines for progress.
Or you can use just the number of bytes:
pv file.txt | process.sh
The above commands will show you the percentage of completion and ETA.
You can use bc:
echo "scale=2; $(cat outfile | wc -l) / $(cat infile | wc -l) * 100" | bc
In addition, combine this with watch for updated progress:
watch -d "echo \"scale=2; \$(cat outfile | wc -l) / \$(cat infile | wc -l) * 100\" | bc"
TOTAL_LINES=$(wc -l < infile)
LINES=$(wc -l < outfile)
PERCENT=$(echo "scale=2;${LINES}/${TOTAL_LINES}" | bc | sed -e 's_^\.__')
echo "${PERCENT} % Complete"
scale=2 means you get two digits after the decimal point.
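Putting the pieces together, a small polling loop could report progress until the output catches up with the input; a rough sketch (the five-second interval is arbitrary, and outfile must already exist):
#!/bin/bash
total=$(wc -l < infile)
while : ; do
    done_lines=$(wc -l < outfile)
    echo "$(echo "scale=2; 100 * $done_lines / $total" | bc) % Complete"
    [ "$done_lines" -ge "$total" ] && break
    sleep 5
done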

How to get count of file with T date as part of file name

Consider today's date as 24/02/14.
I have a set of files in the directory "/apps_kplus/KplusOpenReport/crs_scripts/rconvysya", with file names such as
INTTRADIVBMM20142402
INTTRADIVBFX20142402
INTTRADIVBFI20142402
INTTRADIVBDE20142402
INTPOSIVBIR20142402
INTPOSIVBIR20142302
INTTRADIVBDE20142302
INTTRADIVBFI20142302
INTTRADIVBFX20142302
INTTRADIVBMM20142302
I want to get the count of files with the current date. For that I am using the script below:
#! /bin/bash
tm=$(date +%y%d%m)
x=$(ls /apps_kplus/KplusOpenReport/crs_scripts/rconvysya/INT*$tm.txt 2>/dev/null | wc -l)
if [ $x -eq 5 ]
then
exit 4
else
exit 3
fi
But I am not getting the desired output.
What is wrong?
You're searching for files with a .txt extension, and the files you list have none. If you remove the .txt from the pathname, it works.
With Awk, anything is possible!
tm=$(date +%Y%d%m) # Capital Y is for the four digit year!
ls -a | awk -v tm="$tm" '$0 ~ tm' | wc -l
This is the Great and Awful Awk Command. (Awful as in both full of awe and as in Starbucks Coffee).
The -v parameter allows me to set Awk variables. The $0 represents the entire line, which is the file name from the ls command. The $0 ~ tm means I'm looking for file names that contain the time string you specified.
I could do this too:
ls -a | awk "/$tm\$/"
Which lets the shell interpolate $tm into the Awk program. This is looking only for files that end in your $tm stamp. The /.../ by itself means matching the entire line. It's an even shorter Awk shortcut than I had before. Plus, it makes sure you're only matching the timestamp on the end of the file.
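If you'd rather not parse ls at all, a glob captured in an array gives the same count; a sketch assuming the script runs inside the target directory:
tm=$(date +%Y%d%m)
files=( INT*"$tm"* )
# without nullglob, a non-matching glob stays literal, so check the first entry exists
[[ -e ${files[0]} ]] && echo "${#files[@]}" || echo 0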
You can try the following awk:
awk -v d="$date" '{split(d,ary,/\//);if($0~"20"ary[3]ary[1]ary[2]) cnt++}END{print cnt}' file
where $date is your shell variable containing the date you wish to search for.
$ cat file
INTTRADIVBMM20142402
INTTRADIVBFX20142402
INTTRADIVBFI20142402
INTTRADIVBDE20142402
INTPOSIVBIR20142402
INTPOSIVBIR20142302
INTTRADIVBDE20142302
INTTRADIVBFI20142302
INTTRADIVBFX20142302
INTTRADIVBMM20142302
$ awk -v d="$date" '{split(d,ary,/\//);if($0~"20"ary[3]ary[1]ary[2]) cnt++}END{print cnt}' file
5
ls /apps_kplus/KplusOpenReport/crs_scripts/rconvysya | grep "`date +%Y%d%m`" | wc -l
This worked for me and it seems like the simplest solution; I hope it fits your needs.
To get exit status 4 when none of the files are empty (0 bytes), use the code below:
count=$(ls /apps_kplus/KplusOpenReport/crs_scripts/rconvysya | grep "$(date +%Y%d%m)" | xargs -n 1 stat -c%s | grep "^0$" | wc -l)
if [ "$count" -eq "0" ]
then
exit 4
else
exit 3
fi
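The size check can also be pushed into find, avoiding both the ls parsing and the stat loop; a sketch assuming GNU find and the YYYYDDMM stamp from the question:
count=$(find /apps_kplus/KplusOpenReport/crs_scripts/rconvysya -maxdepth 1 -name "INT*$(date +%Y%d%m)*" -size 0 | wc -l)
if [ "$count" -eq 0 ]; then
    exit 4
else
    exit 3
fi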

Find a maximum and minimum in subdirectory (less than 10 seconds with a shell script)

I have several files of this type:
Sensor Location Temp Threshold
------ -------- ---- ---------
#1 PROCESSOR_ZONE 23C/73F 62C/143F
#2 CPU#1 30C/86F 73C/163F
#3 I/O_ZONE 32C/89F 68C/154F
#4 CPU#2 22C/71F 73C/163F
#5 POWER_SUPPLY_BAY 17C/62F 55C/131F
There are approximately 124,630 of these files in several sub-directories.
I am trying to determine the maximum and minimum temperature of PROCESSOR_ZONE.
Here is my script at the moment:
#!/bin/bash
max_value=0
min_value=50
find $1 -name hp-temps.txt -exec grep "PROCESSOR_ZONE" {} + | sed -e 's/\ \+/,/g' | cut -d, -f3 | cut -dC -f1 | while read current_value ;
do
echo $current_value;
done
output after my script:
30
28
26
23
...
My script is not finished yet, and it already takes 10 minutes to show all the temperatures.
I think that to get there, I have to put the result of my command in a file, sort it, and take the first line (the minimum) and the last one (the maximum). But I do not know how to do that.
Instead of this bit:
... | while read current_value ;
do
echo $current_value;
done
just direct the output after cut to a file:
... > temperatures.txt
If you need them sorted, sort them first:
... | sort -n > temperatures.txt
Then the first line of the file will be the minimum temperature returned, and the last line will be the maximum temperature.
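To pull those two lines straight out, sed can label and print the first and last lines of the sorted stream; a small sketch on top of the pipeline above:
... | sort -n | sed -n '1s/^/MIN: /p; $s/^/MAX: /p'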
Performance suggestion:
This find command runs a new grep process on each file. If there are hundreds of thousands of these files in your directory, it will run grep hundreds of thousands of times. You can speed it up by telling find to run a grep command once for each batch of a few thousand files:
find $1 -name hp-temps.txt -print | xargs grep -h "PROCESSOR_ZONE" | sed ...
The find command prints the filenames out on standard output; the xargs command reads these in and runs grep on a batch of files at once. The -h option to grep means "don't include the filename in the output".
Running it this way should speed up your search considerably if there are thousands of files to be processed.
If your script is slow, you might want to analyse first which command is slow. Using find on for example Windows/Cygwin with a lot of files is going to be slow.
Perl is a perfect match for your problem:
find $1 -name hp-temps.txt -exec perl -ne '/PROCESSOR_ZONE\s+(\d+)C/ and print "$1\n"' {} +
In this way you do a (Perl) regular expression match on a lot of files at the same time. The brackets match the temperature digits (\d+) and $1 references that. The and makes sure print only is executed if the match succeeds.
You could even consider using opendir and readdir in Perl to recursively descend into directories and get rid of the find, but it is not going to be faster.
To get the minimum and maximum values:
find $1 -name hp-temps.txt -exec perl -ne 'if (/PROCESSOR_ZONE\s+(\d+)C/){ $min=$1 if !defined $min or $1<$min; $max=$1 if $1>$max }; END { print "$min - $max\n" }' {} +
With 100k+ lines of output on your terminal this should save quite a bit of time.
#!/bin/bash
max_value=0
min_value=50
find $1 -name hp-temps.txt -exec grep "PROCESSOR_ZONE" {} + | sed -e 's/\ \+/,/g' | cut -d, -f3 | cut -dC -f1 |
{
    while read current_value ; do
        # For maximum
        if [[ $current_value -gt $max_value ]]; then
            max_value=$current_value
        fi
        # For minimum
        if [[ $current_value -lt $min_value ]]; then
            min_value=$current_value
            echo "new min $min_value"
        fi
    done
    echo "NEW MAX : $max_value °C"
    echo "NEW MIN : $min_value °C"
}
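Alternatively, awk can track both extremes in a single pass, sidestepping the subshell variable issues of a piped while loop; a sketch assuming the same hp-temps.txt layout:
find "$1" -name hp-temps.txt -exec grep -h "PROCESSOR_ZONE" {} + |
    awk '{ t = $3; sub(/C.*/, "", t)            # "23C/73F" -> "23"
           if (min == "" || t+0 < min+0) min = t
           if (t+0 > max+0) max = t }
         END { print "MIN : " min " °C"; print "MAX : " max " °C" }'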

Linux Shell - String manipulation then calculating age of file in minutes

I am writing a script that calculates the age of the oldest file in a directory. The first commands run are:
OLDFILE=`ls -lt $DIR | grep "^-" | tail -1 `
echo $OLDFILE
The output contains a lot more than just the filename, e.g.:
-rwxrwxr-- 1 abc abc 334 May 10 2011 ABCD_xyz20110510113817046.abc.bak
Q1/. How do I obtain the output after the last space of the above line? This would give me the filename. I realise some sort of string manipulation is required but am new to shell scripting.
Q2/. How do I obtain the age of this file in minutes?
To obtain just the oldest file's name,
ls -lt | awk '/^-/{file=$NF}END{print file}'
However, this is not robust if you have files with spaces in their names, etc. Generally, you should try to avoid parsing the output from ls.
With stat you can obtain a file's modification date in machine-readable format, expressed as seconds since Jan 1, 1970; with date +%s you can obtain the current time in the same format. Subtract and divide by 60. (More Awk skills would come in handy for the arithmetic.)
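No awk is strictly needed for the arithmetic; bash can do it, assuming GNU stat ($file is a placeholder for the name found above):
mtime=$(stat -c %Y "$file")        # modification time, seconds since the epoch
now=$(date +%s)
age_minutes=$(( (now - mtime) / 60 ))
echo "$file is $age_minutes minutes old"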
Finally, for an alternative solution, look at the options for find; in particular, its -printf format strings allow you to extract a file's age. The following will directly get you the age in seconds and the inode number of the oldest file:
find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1
Using the inode number avoids the issues of funny file names; once you have a single inode, converting that to a file name is a snap:
find . -maxdepth 1 -inum "$number"
Tying the two together, you might want something like this:
# set -- replaces "$@" with the output from the command
set -- $(find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1)
# now $1 is the timestamp and $2 is the inode
oldest_filename=$(find . -maxdepth 1 -inum "$2")
age_in_minutes=$(date +%s | awk -v d="$1" '{ print ($1 - d) / 60 }')
An awk solution, giving you how old the file is in minutes (as your ls output does not contain the minutes of the timestamp, 00 is assumed by default). Also, as tripleee pointed out, ls output is inherently risky to parse.
[[bash_prompt$]]$ echo $l; echo "##############";echo $l | awk -f test.sh ; echo "#############"; cat test.sh
-rwxrwxr-- 1 abc abc 334 May 20 2013 ABCD_xyz20110510113817046.abc.bak
##############
File ABCD_xyz20110510113817046.abc.bak is 2074.67 min old
#############
BEGIN{
m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
for(o=1;o<=m;o++){
date[d[o]]=sprintf("%02d",o)
}
}
{
month=date[$6];
old_time=mktime($8 " " month " " $7 " 00 00 00");
curr_time=systime();
print "File " $NF " is " (curr_time-old_time)/60 " min old";
}
For Q2 a bash one-liner could be:
let mtime_minutes=\(`date +%s`-`stat -c%Y "$file_to_inspect"`\)/60
You probably want ls -t. Including the '-l' option explicitly asks for ls to give you all that other info that you actually want to get rid of.
However what you probably REALLY want is find, first:
NEWEST=$(find . -maxdepth 1 -type f | cut -c3- |xargs ls -t| head -1)
this will get you the plain filename of the newest item, no other processing needed.
Directories will be correctly excluded (thanks to tripleee for pointing out that's what you were aiming for).
For the second question, you can use stat and bc:
TIME_MINUTES=$(stat --format %Y/60 "$NEWEST" |bc -l)
remove the '-l' option from bc if you only want a whole number of minutes.
