I am writing a script that calculates the age of the oldest file in a directory. The first commands run are:
OLDFILE=`ls -lt $DIR | grep "^-" | tail -1 `
echo $OLDFILE
The output contains a lot more than just the filename, e.g.
-rwxrwxr-- 1 abc abc 334 May 10 2011 ABCD_xyz20110510113817046.abc.bak
Q1/. How do I obtain the output after the last space of the above line? This would give me the filename. I realise some sort of string manipulation is required but am new to shell scripting.
Q2/. How do I obtain the age of this file in minutes?
To obtain just the oldest file's name,
ls -lt | awk '/^-/{file=$NF}END{print file}'
However, this is not robust if you have files with spaces in their names, etc. Generally, you should try to avoid parsing the output from ls.
With stat you can obtain a file's last-modification time in machine-readable format, expressed as seconds since Jan 1, 1970; with date +%s you can obtain the current time in the same format. Subtract and divide by 60. (More Awk skills would come in handy for the arithmetic.)
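For example, a minimal sketch along those lines (GNU stat assumed; the filename is only a placeholder):

file="ABCD_xyz20110510113817046.abc.bak"   # placeholder name
now=$(date +%s)                            # current time, seconds since epoch
mtime=$(stat -c %Y "$file")                # file's modification time, same format
echo $(( (now - mtime) / 60 ))             # age in minutes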
Finally, for an alternate solution, look at the options for find; in particular, its printf format strings allow you to extract a file's age. The following will directly get you the age in seconds and inode number of the oldest file:
find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1
Using the inode number avoids the issues of funny file names; once you have a single inode, converting that to a file name is a snap:
find . -maxdepth 1 -inum "$number"
Tying the two together, you might want something like this:
# set -- Replace $@ with output from command
set -- $(find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1)
# now $1 is the timestamp and $2 is the inode
oldest_filename=$(find . -maxdepth 1 -inum "$2")
age_in_minutes=$(date +%s | awk -v d="$1" '{ print ($1 - d) / 60 }')
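For illustration, you could then report both values (using only the variables defined above):

printf 'Oldest file: %s (%s minutes old)\n' "$oldest_filename" "$age_in_minutes"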
An awk solution, giving you how old the file is in minutes (since your ls output does not contain the minute of modification, 00 is assumed by default). Also, as tripleee pointed out, ls output is inherently risky to parse.
[[bash_prompt$]]$ echo $l; echo "##############";echo $l | awk -f test.sh ; echo "#############"; cat test.sh
-rwxrwxr-- 1 abc abc 334 May 20 2013 ABCD_xyz20110510113817046.abc.bak
##############
File ABCD_xyz20110510113817046.abc.bak is 2074.67 min old
#############
BEGIN{
  # map month abbreviations to two-digit month numbers
  m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
  for(o=1;o<=m;o++){
    date[d[o]]=sprintf("%02d",o)
  }
}
{
  month=date[$6];
  # ls gave us no time of day, so assume midnight
  old_time=mktime($8" "month" "$7" 00 00 00");
  curr_time=systime();
  print "File " $NF " is " (curr_time-old_time)/60 " min old";
}
For Q2 a bash one-liner could be:
let mtime_minutes=\(`date +%s`-`stat -c%Y "$file_to_inspect"`\)/60
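If you prefer arithmetic expansion over let, an equivalent form (still assuming GNU stat's -c%Y) is:

mtime_minutes=$(( ( $(date +%s) - $(stat -c %Y "$file_to_inspect") ) / 60 ))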
You probably want ls -t. Including the '-l' option explicitly asks ls to give you all that other info that you actually want to get rid of.
However, what you probably REALLY want is find, first:
NEWEST=$(find . -maxdepth 1 -type f | cut -c3- |xargs ls -t| head -1)
this will get you the plain filename of the newest item, with no other processing needed (for the oldest item, replace head -1 with tail -1).
Directories will be correctly excluded. (thanks to tripleee for pointing out that's what you were aiming for.)
For the second question, you can use date, stat and bc:
TIME_MINUTES=$(echo "( $(date +%s) - $(stat --format %Y "$NEWEST") ) / 60" | bc -l)
remove the '-l' option from bc if you only want a whole number of minutes.
I want to rename multiple jpg files in a directory so they have a 9-digit sequence number. I also want the files to be sorted by date from oldest to newest. I came up with this:
ls -tr | nl -v 100000000 | while read n f; do mv "$f" "$n.jpg"; done
this renames the files as I want them but the sequence numbers do not follow the date. I have also tried doing
ls -tr | cat -n .....
but that does not allow me to specify the starting sequence number.
Any suggestions what's wrong with my syntax?
Any other ways of achieving my goal?
Thanks
If any of your filenames contains whitespace, you can use the following:
i=100000000
find -type f -printf '%T@ %p\0' | \
sort -zk1nr | \
sed -z 's/^[^ ]* //' | \
xargs -0 -I % echo % | \
while read f; do
mv "$f" "$(printf "%09d" $i).jpg"
let i++
done
Note that this doesn't use ls for parsing, but uses the null byte as field separator in the different commands, respectively set as \0, -z, -0.
The find command prints the file time together with the name.
Then the files are sorted and sed removes the timestamp. xargs hands the filenames to the mv command through read.
DIR="/tmp/images"
FILELIST=$(ls -tr ${DIR})
n=1
for file in ${FILELIST}; do
printf -v digit "%09d" $n
mv "$DIR/${file}" "$DIR/${digit}.jpg"
n=$((n + 1))
done
Something like this? Then you can use n to specify the starting sequence number. However, if you have spaces in your file names this would not work.
If using external tool is acceptable, you can use rnm:
rnm -ns '/i/.jpg' -si 100000000 -s/mt *.jpg
-ns: Name string (new name).
/i/: Index (A name string rule).
-si: Option that sets starting index.
-s/mt: Sort according to modification time.
If you want an arbitrary increment value:
rnm -ns '/i/.jpg' -si 100000000 -inc 45 -s/mt *.jpg
-inc: Specify an increment value.
I have several files of this type:
Sensor Location Temp Threshold
------ -------- ---- ---------
#1 PROCESSOR_ZONE 23C/73F 62C/143F
#2 CPU#1 30C/86F 73C/163F
#3 I/O_ZONE 32C/89F 68C/154F
#4 CPU#2 22C/71F 73C/163F
#5 POWER_SUPPLY_BAY 17C/62F 55C/131F
There are approximately 124,630 of these files in several sub-directories.
I am trying to determine the maximum and minimum temperature of PROCESSOR_ZONE.
Here is my script at the moment:
#!/bin/bash
max_value=0
min_value=50
find $1 -name hp-temps.txt -exec grep "PROCESSOR_ZONE" {} + | sed -e 's/\ \+/,/g' | cut -d, -f3 | cut -dC -f1 | while read current_value ;
do
echo $current_value;
done
Output of my script:
30
28
26
23
...
My script is not finished, and it takes 10 minutes to show all the temperatures.
I think that to get there, I have to put the result of my command in a file, sort it, and get back the first line (the max) and the last one (the min). But I do not know how to do that.
Instead of this bit:
... | while read current_value ;
do
echo $current_value;
done
just direct the output after cut to a file:
... > temperatures.txt
If you need them sorted, sort them first:
... | sort -n > temperatures.txt
Then the first line of the file will be the minimum temperature returned, and the last line will be the maximum temperature.
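For example, with the sorted file from above:

min=$(head -n 1 temperatures.txt)   # smallest temperature
max=$(tail -n 1 temperatures.txt)   # largest temperature
echo "min=${min}C max=${max}C"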
Performance suggestion:
A find command that runs grep with -exec ... \; starts a new grep process for each file. If there are hundreds of thousands of these files in your directories, that means hundreds of thousands of grep invocations. You can speed it up by handing grep a batch of a few thousand filenames at a time:
find $1 -name hp-temps.txt -print | xargs grep -h "PROCESSOR_ZONE" | sed ...
The find command prints the filenames out on standard output; the xargs command reads these in and runs grep on a batch of files at once. The -h option to grep means "don't include the filename in the output".
Running it this way should speed up your search considerably if there are thousands of files to be processed.
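Putting the filtering suggestion and the sorting suggestion together, a sketch of the full pipeline might look like this (it reuses the sed/cut steps from the question; untested against your data):

find "$1" -name hp-temps.txt -print |
  xargs grep -h "PROCESSOR_ZONE" |
  sed -e 's/\ \+/,/g' | cut -d, -f3 | cut -dC -f1 |
  sort -n > temperatures.txt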
If your script is slow, you might want to analyse first which command is slow. Using find with a lot of files on, for example, Windows/Cygwin is going to be slow.
Perl is a perfect match for your problem:
find $1 -name hp-temps.txt -exec perl -ne '/PROCESSOR_ZONE\s+(\d+)C/ and print "$1\n"' {} +
In this way you do a (Perl) regular expression match on a lot of files at the same time. The brackets match the temperature digits (\d+) and $1 references that. The and makes sure print only is executed if the match succeeds.
You could even consider using opendir and readdir to recursively descend into directories in Perl to get rid of the find, but it is not going to be faster.
To get the minimum and maximum values:
find $1 -name hp-temps.txt -exec perl -ne 'if (/PROCESSOR_ZONE\s+(\d+)C/){ $min=$1 if !defined $min or $1<$min; $max=$1 if $1>$max }; sub END { print "$min - $max\n" }' {} +
With 100k+ lines of output on your terminal this should save quite a bit of time.
#!/bin/bash
max_value=0
min_value=50
find $1 -name hp-temps.txt -exec grep "PROCESSOR_ZONE" {} + | sed -e 's/\ \+/,/g' | cut -d, -f3 | cut -dC -f1 |
{
    while read current_value ; do
        # For maximum
        if [[ $current_value -gt $max_value ]]; then
            max_value=$current_value
        fi
        # For minimum
        if [[ $current_value -lt $min_value ]]; then
            min_value=$current_value
            echo "new min $min_value"
        fi
    done
    echo "NEW MAX : $max_value °C"
    echo "NEW MIN : $min_value °C"
}
I'm trying to wc -l an entire directory and then display the filename in an echo with the number of lines.
To add to my frustration, the directory has to come from a passed argument. So without looking stupid, can someone first tell me why a simple wc -l $1 doesn't give me the line count for the directory I type in the argument? I know I'm not understanding it completely.
On top of that I need validation too, if the argument given is not a directory or there is more than one argument.
wc works on files rather than directories so, if you want the word count on all files in the directory, you would start with:
wc -l $1/*
With various gyrations to get rid of the total, sort it and extract only the largest, you could end up with something like (split across multiple lines for readability but should be entered on a single line):
pax> wc -l $1/* 2>/dev/null
| grep -v ' total$'
| sort -n -k1
| tail -1l
2892 target_dir/big_honkin_file.txt
As to the validation, you can check the number of parameters passed to your script with something like:
if [[ $# -ne 1 ]] ; then
echo 'Whoa! Wrong parameter count'
exit 1
fi
and you can check if it's a directory with:
if [[ ! -d $1 ]] ; then
echo 'Whoa!' "[$1]" 'is not a directory'
exit 1
fi
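Putting those fragments together, a full script might look something like this (just the pieces above combined; no new behaviour):

#!/bin/bash
if [[ $# -ne 1 ]] ; then
    echo 'Whoa! Wrong parameter count'
    exit 1
fi
if [[ ! -d $1 ]] ; then
    echo 'Whoa!' "[$1]" 'is not a directory'
    exit 1
fi
# line counts for all files in the directory, largest last
wc -l "$1"/* 2>/dev/null | grep -v ' total$' | sort -n -k1 | tail -1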
Is this what you want?
> find ./test1/ -type f|xargs wc -l
1 ./test1/firstSession_cnaiErrorFile.txt
77 ./test1/firstSession_cnaiReportFile.txt
14950 ./test1/exp.txt
1 ./test1/test1_cnaExitValue.txt
15029 total
so your directory which is the argument should go here:
find $your_complete_directory_path/ -type f|xargs wc -l
I'm trying to wc -l an entire directory and then display the
filename in an echo with the number of lines.
You can do a find on the directory and use -exec option to trigger wc -l. Something like this:
$ find ~/Temp/perl/temp/ -exec wc -l '{}' \;
wc: /Volumes/Data/jaypalsingh/Temp/perl/temp/: read: Is a directory
11 /Volumes/Data/jaypalsingh/Temp/perl/temp//accessor1.plx
25 /Volumes/Data/jaypalsingh/Temp/perl/temp//autoincrement.pm
12 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless1.plx
14 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless2.plx
22 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr1.plx
27 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr2.plx
7 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee1.pm
18 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee2.pm
26 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee3.pm
12 /Volumes/Data/jaypalsingh/Temp/perl/temp//ftp.plx
14 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit1.plx
16 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit2.plx
24 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit3.plx
33 /Volumes/Data/jaypalsingh/Temp/perl/temp//persisthash.pm
Nice question!
I saw the answers; some are pretty good. The find ...|xargs one is my favourite. It could be simplified using the find ... -exec wc -l {} + syntax. But there is a problem: when the command-line buffer is full, a wc -l ... is called, and each time a <number> total line is printed. As wc has no option to disable this feature, wc has to be reimplemented. Filtering out these lines with grep is not nice:
So my complete answer is
#!/usr/bin/bash
[ $# -ne 1 ] && echo "Bad number of args">&2 && exit 1
[ ! -d "$1" ] && echo "Not dir">&2 && exit 1
find "$1" -type f -exec awk '{++n[FILENAME]}END{for(i in n) printf "%8d %s\n",n[i],i}' {} +
Or, using less temporary space but with slightly longer awk code:
find "$1" -type f -exec awk 'function pr(){printf "%8d %s\n",n,f}FNR==1{f&&pr();n=0;f=FILENAME}{++n}END{pr()}' {} +
Misc
If files in subdirectories should not be counted, add -maxdepth 1 before -type f in the find command.
It is pretty fast. I was afraid that it would be much slower than the find ... wc + version, but for a directory containing 14770 files (in several subdirs) the wc version ran 3.8 sec and the awk version ran 5.2 sec.
awk and wc treat lines not terminated by \n differently: a final line without a trailing \n is not counted by wc. I prefer to count it, as awk does (see the small demonstration below).
It does not print empty files.
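A tiny demonstration of that trailing-newline difference (demo.txt is just a throwaway example file):

printf 'one\ntwo' > demo.txt        # second line has no trailing newline
wc -l demo.txt                      # reports 1
awk 'END{print NR}' demo.txt        # reports 2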
To find the file with most lines in the current directory and its subdirectories, with zsh:
lines() REPLY=$(wc -l < "$REPLY")
wc -l -- **/*(D.nO+lines[1])
That defines a lines function which is going to be used as a glob sorting function that returns in $REPLY the number of lines of the file whose path is given in $REPLY.
Then we use zsh's recursive globbing **/* to find regular files (.), numerically (n) reverse sorted (O) with the lines function (+lines), and select the first one [1]. (D to include dotfiles and traverse dotdirs).
Doing it with standard utilities is a bit tricky if you don't want to make assumptions on what characters file names may contain (like newline, space...). With GNU tools as found on most Linux distributions, it's a bit easier as they can deal with NUL terminated lines:
find . -type f -exec sh -c '
for file do
size=$(wc -l < "$file") &&
printf "%s\0" "$size:$file"
done' sh {} + |
tr '\n\0' '\0\n' |
sort -rn |
head -n1 |
tr '\0' '\n'
Or with zsh or GNU bash syntax:
biggest= max=-1
find . -type f -print0 |
{
while IFS= read -rd '' file; do
size=$(wc -l < "$file") &&
((size > max)) &&
max=$size biggest=$file
done
[[ -n $biggest ]] && printf '%s\n' "$max: $biggest"
}
Here's one that works for me with the git bash (mingw32) under Windows:
find . -type f -print0| xargs -0 wc -l
This will list the files and line counts in the current directory and subdirectories. You can also direct the output to a text file and import it into Excel if needed:
find . -type f -print0| xargs -0 wc -l > fileListingWithLineCount.txt
I have a file structure that looks like this
./501.res/1.bin
./503.res/1.bin
./503.res/2.bin
./504.res/1.bin
and I would like to find the file path to the .bin file in each directory which have the highest number as filename. So the output I am looking for would be
./501.res/1.bin
./503.res/2.bin
./504.res/1.bin
The highest number a file can have is 9.
Question
How do I do that in BASH?
I have come as far as find .|grep bin|sort
Globs are guaranteed to be expanded in lexical order.
for dir in ./*/
do
files=("$dir"/*)          # create an array
echo "${files[@]: -1}"    # access its last member
done
What about using awk? You can get the FIRST occurrence really simply:
[ghoti@pc ~]$ cat data1
./501.res/1.bin
./503.res/1.bin
./503.res/2.bin
./504.res/1.bin
[ghoti@pc ~]$ awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' data1
./501.res/1.bin
./503.res/1.bin
./504.res/1.bin
[ghoti@pc ~]$
To get the last occurrence you could pipe through a couple of sorts:
[ghoti@pc ~]$ sort -r data1 | awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' | sort
./501.res/1.bin
./503.res/2.bin
./504.res/1.bin
[ghoti@pc ~]$
Given that you're using "find" and "grep", you could probably do this:
find . -name \*.bin -type f -print | sort -r | awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' | sort
How does this work?
The find command has many useful options, including the ability to select your files by glob, select the type of file, etc. Its output you already know, and that becomes the input to sort -r.
First, we sort our input data in reverse (sort -r). This ensures that within any directory, the highest numbered file will show up first. That result gets fed into awk. FS is the field separator, which makes $2 into things like "/501", "/502", etc. Awk scripts have sections in the form of condition {action} which get evaluated for each line of input. If a condition is missing, the action runs on every line. If "1" is the condition and there is no action, it prints the line. So this script is broken out as follows:
a[$2] {next} - If the array a with the subscript $2 (i.e. "/501") exists, just jump to the next line. Otherwise...
{a[$2]=1} - set the array a subscript $2 to 1, so that in future the first condition will evaluate as true, then...
1 - print the line.
The output of this awk script will be the data you want, but in reverse order. The final sort puts things back in the order you'd expect.
Now ... that's a lot of pipes, and sort can be a bit resource hungry when you ask it to deal with millions of lines of input at the same time. This solution will be perfectly sufficient for small numbers of files, but if you're dealing with large quantities of input, let us know, and I can come up with an all-in-one awk solution (that will take longer than 60 seconds to write).
UPDATE
Per Dennis' sage advice, the awk script I included above could be improved by changing it from
BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1
to
BEGIN{FS="."} $2 in a {next} {a[$2]} 1
While this is functionally identical, the advantage is that you simply define array members rather than assigning values to them, which may save memory or cpu depending on your implementation of awk. At any rate, it's cleaner.
Tested:
find . -type d -name '*.res' | while read dir; do
find "$dir" -maxdepth 1 | sort -n | tail -n 1
done
I came up with something like this:
for dir in $(find . -mindepth 1 -type d | sort); do
file=$(ls "$dir" | sort | tail -n 1);
[ -n "$file" ] && (echo "$dir/$file");
done
Maybe it can be made simpler.
If invoking a shell from within find is an option, try this:
find * -type d -exec sh -c "echo -n './'; ls -1 {}/*.bin | sort -n -r | head -n 1" \;
And here is a one-liner:
find . -mindepth 1 -type d | sort | sed -e "s/.*/ls & | sort | tail -n 1 | xargs -I{} echo &\/{}/" | bash
If I have a folder of text files, how can I get the average words per file using Bash commands?
I know that I can use wc -w to get words per file, but I'm unsure of how to get the total number of words across all files, and then divide that number by the number of text files.
This recursively traverses the filesystem and counts all words and files. In the end it divides the total number of words by the number of files:
find . -type f -exec wc -w {} \; | awk '{numfiles=numfiles+1;total += $1} END{print total/numfiles}'
You can get total word count by:
cat *.txt | wc -w
and file number by:
ls *.txt | wc -l
Then you can divide them.
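For example, a minimal sketch of that division (integer arithmetic, assuming the files end in .txt):

words=$(cat *.txt | wc -w)    # total words across all files
files=$(ls *.txt | wc -l)     # number of files
echo $(( words / files ))     # average (integer division)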
Just a piece of advice: you might use loops and variable assignment.
Huang's solution is very good, but will emit errors on any directories. And division is a bit of a pain, since all arithmetic in the shell is with integers. Here's a script that does what you want:
#!/bin/sh
for file in *; do
test -f "$file" || continue
c=$( wc -w "$file" | awk '{print $1}' )
: $(( total += $c ))
: $(( count += 1 ))
done
echo $total $count 10k / p | dc | sed 's/0*$//'
But bunting's awk solution is the way to go.