Extracting filename & modification time from ls output


I have a set of files in a directory. I want to extract just the filename without the absolute path & the modification timestamp from the ls output.
/apps/dir/file1.txt
/apps/dir/file2.txt
Now, from the ls output, I extract the fields for the timestamp and filename:
ls -ltr /apps/dir | awk '{print $6, $7, $8, $9}'
Sep 25 2013 /apps/dir/file1.txt
Dec 20 2013 /apps/dir/file2.txt
Dec 20 2013 /apps/dir/file3.txt
whereas I want it to look like this:
Sep 25 2013 file1
Dec 20 2013 file2
Dec 20 2013 file3
One solution is to cd into that directory and run the command from there, but is there a solution without cd? I also tried substr(), but since the filenames are not of equal length, passing a constant value to substr() didn't work.

With GNU find, you can do the following to get the filenames without path:
find /apps/dir -type f -printf "%f\n"
and as Kojiro mentioned in the comments, you can use %t or %T(format) to get modification time.
or do as BroSlow suggested
find /apps/dir -printf "%Tb %Td %TY %f\n"
Do not try the following. It will break on filenames with spaces, and on other systems where the ls -l output has fewer or more columns:
ls -ltr /apps/dir | awk '{n=split($9,f,/\//);print $6,$7,$8,f[n]}'
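A minimal sketch of the find approach, assuming GNU find and GNU touch. The scratch directory and the two file names are made up for the demonstration; in real use the directory would be /apps/dir. The %T directives refer to the modification time, and %f is the name without leading directories.

```shell
#!/usr/bin/env bash
export LC_ALL=C                      # fixed month names and sort order for the demo

demo_dir=$(mktemp -d)                # hypothetical stand-in for /apps/dir
touch -d '2013-09-25 12:00' "$demo_dir/file1.txt"
touch -d '2013-12-20 12:00' "$demo_dir/file 2.txt"   # name contains a space

# %Tb %Td %TY: abbreviated month, day of month, year of the modification time
# %f: file name with the leading directories removed
listing=$(find "$demo_dir" -maxdepth 1 -type f -printf '%Tb %Td %TY %f\n' | sort)
printf '%s\n' "$listing"

rm -rf "$demo_dir"
```

Because find prints the name itself, a space inside a file name does not shift any columns, which is the failure mode of the ls and awk pipeline.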

Don't parse the output of the ls command; use stat instead:
stat -c%y filename
This will print the last modification time in human-readable format.
Or, if you are using GNU date, you can use date with a format parameter and the reference flag:
date '+%b %d %Y' -r filename
You can use basename to get just the filename portion of the path:
basename /path/to/filename
Or as Kojiro suggested with parameter expansion:
To get just the filename:
filename="${filename##*/}"
And then to strip off the extension, if any:
filename="${filename%.*}"
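To see the two expansions in isolation, here is a small sketch using the sample path from the question. The variable names path, name and stem are chosen only for this illustration.

```shell
path=/apps/dir/file1.txt   # sample path from the question

name=${path##*/}           # remove the longest prefix matching "*/"
stem=${name%.*}            # remove the shortest suffix matching ".*"

printf '%s\n' "$name"      # file1.txt
printf '%s\n' "$stem"      # file1
```

Note that % removes only the last extension, so a name such as archive.tar.gz would become archive.tar.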
Putting it all together:
#!/usr/bin/env bash
for filename in *; do
    timestamp=$(stat -c%y "$filename")
    #Uncomment below for a neater timestamp
    #timestamp="${timestamp%.*}"
    filename="${filename##*/}"
    filename="${filename%.*}"
    echo "$timestamp $filename"
done

#!/bin/bash
# NUL-delimited names and quoted variables keep filenames with spaces intact
find /apps/dir -maxdepth 1 -type f -print0 |
while IFS= read -r -d '' i; do
    file=$(basename "$i")
    timestamp=$(stat -c%y "$i")
    printf "%-50s %s\n" "$timestamp" "$file"
done

If you want to reproduce the ls time format:
dir=/apps/dir
now=$(date +%s)
sixmo=$(date -d '6 months ago' +%s)
find "$dir" -maxdepth 1 -print0 |
while read -d '' -r filename; do
    mtime=$(stat -c %Y "$filename")
    if (( sixmo <= mtime && mtime <= now )); then
        fmt="%b %d %H:%M"
    else
        fmt="%b %d %Y"
    fi
    printf "%12s %s\n" "$(date -d "@$mtime" "+$fmt")" "$(basename "$filename")"
done |
sort -k 4
Assuming the GNU set of tools
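
The six-month rule can also be isolated in a small function, which makes it easy to check against fixed inputs. This is only a sketch: the function name ls_style_time is hypothetical, the half-year constant (15778476 seconds, half an average Gregorian year) is an assumption standing in for date -d '6 months ago', and TZ=UTC is forced so the output is reproducible. It relies on GNU date.

```shell
#!/usr/bin/env bash
# ls_style_time MTIME NOW
# Print MTIME (epoch seconds) with the clock time if it lies within roughly
# the last six months before NOW, and with the year otherwise.
ls_style_time() {
    local mtime=$1 now=$2
    local half_year=15778476      # assumed length of "six months" in seconds
    local fmt
    if (( mtime > now - half_year && mtime <= now )); then
        fmt='%b %d %H:%M'
    else
        fmt='%b %d %Y'
    fi
    LC_ALL=C TZ=UTC date -d "@$mtime" "+$fmt"
}

# 1380110400 is 2013-09-25 12:00:00 UTC
ls_style_time 1380110400 1388000000   # "now" is about three months later
ls_style_time 1380110400 1420000000   # "now" is more than a year later
```

Passing the current time as a parameter, instead of calling date inside the function, is what keeps the two calls above deterministic.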

Related

How to combine `stat` and `md5sum` output line by line?

stat part:
$ find * -depth -exec stat --format '%n %U %G' {} + | sort -d > acl_file
$ cat acl_file
xfce4/desktop/icons screen0-3824x1033.rc john john
Code/CachedData/f30a9b73e8ffc278e71575118b6bf568f04587c8/index-ec362010a4d520491a88088c200c853d.code john john
VirtualBox/selectorwindow.log.6 john john
md5sum part:
$ find * -depth -exec md5sum {} + | sort -d > md5_file
$ cat md5_file
3da180c2d9d1104a17db0749d527aa4b xfce4/desktop/icons screen0-3824x1033.rc
3de44d64a6ce81c63f9072c0517ed3b9 Code/CachedData/f30a9b73e8ffc278e71575118b6bf568f04587c8/index-ec362010a4d520491a88088c200c853d.code
3f85bb5b59bcd13b4fc63d5947e51294 VirtualBox/selectorwindow.log.6
How to combine stat --format '%n %U %G' and md5sum and output to file line by line,such as:
3da180c2d9d1104a17db0749d527aa4b xfce4/desktop/icons screen0-3824x1033.rc john john
3de44d64a6ce81c63f9072c0517ed3b9 Code/CachedData/f30a9b73e8ffc278e71575118b6bf568f04587c8/index-ec362010a4d520491a88088c200c853d.code john john
3f85bb5b59bcd13b4fc63d5947e51294 VirtualBox/selectorwindow.log.6 john john

This is really just a minor variation on @Zilog80's solution. My time testing had it a few seconds faster by skipping reads on a smallish dataset of a few hundred files running on a Windows laptop under Git Bash. YMMV.
mapfile -t lst< <( find . -type f -exec md5sum "{}" \; -exec stat --format '%U %G' "{}" \; )
for ((i=0; i < ${#lst[@]}; i++)); do if (( i%2 )); then echo "${lst[i]}"; else printf "%s " "${lst[i]}"; fi done | sort -d
edit
My original solution was pretty broken. It was skipping files in hidden subdirectories, and the printf botched filenames with spaces. If you don't have hidden directories to deal with, or if you want to skip those (e.g., you're working in a git repo and would rather skip the .git tree...), here's a rework.
shopt -s dotglob # check hidden files
shopt -s globstar # process at arbitrary depth
for f in **/*; do # this properly handles odd names
[[ -f "$f" ]] && echo "$(md5sum "$f") $(stat --format "%U %G" "$f")"
done | sort -d

The quickest way should be:
find * -type f -exec stat --format '%n %U %G' "{}" \; -exec md5sum "{}" \; |
{ while read -r line1 && read -r line2; do printf "%s %s\n" "${line2/ */}" "${line1}";done; } |
sort -d
We use two -exec actions to apply stat and md5sum file by file, then read both output lines and use printf to build one output line per file from the stat and md5sum output. Finally, we pipe the whole output to sort.
Warning: Because the whole output is piped to sort, you have to wait until every stat/md5sum call has finished before any output appears on the console.
And if only md5sum and not stat fails on a file (or vice versa), the output will be garbled.
Edit: A slightly safer way to build the output:
find * -type f -exec md5sum "{}" \; -exec stat --format '%n %U %G' "{}" \; |
{ while read -r line; do
mdsum="${line/[0-9a-f]* /}";
[ "${mdsum}" != "${line}" ] &&
{ mdsumdisp="${line% ${mdsum}}"; mdsumfile="${mdsum}"; } ||
{ [ "${line#${mdsumfile}}" != "${line}" ] &&
printf "%s %s\n" "${mdsumdisp}" "${line}"; };
done; } | sort -d
Here, at least, we check we have something like a md5sum on the expected line matching the file in the line.
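
Another way to avoid pairing up alternating lines is to build each output line per file inside a loop. The following is a sketch under a few assumptions: GNU find, stat and md5sum are available, bash is the shell, and the scratch tree (a hidden directory with an empty file whose name contains spaces) exists only for the demonstration.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)                        # hypothetical scratch tree
mkdir -p "$demo_dir/.hidden dir"
: > "$demo_dir/.hidden dir/empty file.rc"    # empty file, name with spaces

combined=$(
    cd "$demo_dir" &&
    find . -type f -print0 |
    while IFS= read -r -d '' f; do
        # Reading from stdin keeps the file name out of md5sum's output
        sum=$(md5sum < "$f")
        printf '%s %s %s\n' "${sum%% *}" "${f#./}" "$(stat --format '%U %G' "$f")"
    done | sort -d
)
printf '%s\n' "$combined"

rm -rf "$demo_dir"
```

Since the hash and the owner/group are fetched for the same name in one iteration, a failure of one command cannot shift the pairing for the following files.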

Delete files older than the epoch date of a file in a list

I have a file that lists the full path of each file (the paths contain spaces), with the change date of the file in epoch seconds as the last column.
/data/owncloud/c/files/Walkthrough 2019/#25 SEC-C03/Group Enterprise.jpg 1569314988
I want to delete all listed files whose epoch number is smaller than 1568187717.
The script looks like this at the moment, but with the spaces in the paths it can't work :(
#!/bin/bash
IFS=$'\n'
while read i
do printf "%s " "$i"
stat --format=%Z $i
done < <(find /data/owncloud/*/files -type f) > filelistwithchangeddate
filetodelete=expr `date +'%s'` - 2592000
awk '{print $(NF)}' gives the last column, so I somehow need to compare the awk output with filetodelete and delete the matching files, including the ones with spaces in their names.
Update:
I think it should be something like this:
for i in `cat filelistwithchangeddate `
do
if [ $(awk '{print $(NF)}' $i) -lt $filetodelete ]
then
echo "this will be deleted:"
awk '{$NF=""}1' $i
fi
done
But I still need to handle the spaces somehow and run the delete.

OK, thank you tripleee, I think this will work:
IFS=$'\n'
while read i
do printf "%s " "$i"
stat --format=%Z $i
done < <(find /data/owncloud/*/files -type f) > /root/script/newpurge/filelistwithchangeddate
filetodelete=$(expr `date +'%s'` - 2592000)
awk -v epoch="$filetodelete" '$NF<epoch' /root/script/newpurge/filelistwithchangeddate > oldfiles
# Drop the trailing epoch column; sub() leaves spaces inside the path untouched
awk '{sub(/ [0-9]+$/, "")}1' /root/script/newpurge/oldfiles > marktodelete
# Read one path per line and quote it, instead of wrapping it in literal quotes
while IFS= read -r i
do
rm -f -- "$i"
done < /root/script/newpurge/marktodelete

This answer is based on this:
This can easily be done using find. Normally you would do:
$ find . -type f ! -newermt "@1569314988" -delete
but if you want to pick the time from a file (example from OP):
$ t=$(awk '{print $NF}' file)
$ [[ "$t" != "" ]] && find . -type f ! -newermt "@${t}" -delete
See man find for details on the meaning of the flags and for extra modifications which might be needed.
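
A short sketch of that find invocation on a scratch directory, assuming GNU find and GNU touch. The two file names are invented for the demonstration, and the test is on the modification time (set here with touch), not on the change time that the question records with stat %Z. Listing the matches with -printf first is a cheap dry run before switching to -delete.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)                             # hypothetical scratch directory
touch -d '@1568000000' "$demo_dir/old file.jpg"   # older than the cutoff
touch -d '@1569314988' "$demo_dir/new file.jpg"   # newer than the cutoff
cutoff=1568187717                                 # epoch value from the question

# Dry run: print what would be removed (files not newer than the cutoff)
old_files=$(find "$demo_dir" -type f ! -newermt "@$cutoff" -printf '%f\n')
printf 'would delete: %s\n' "$old_files"

# The real thing: same test, -delete instead of -printf
find "$demo_dir" -type f ! -newermt "@$cutoff" -delete
remaining=$(find "$demo_dir" -type f -printf '%f\n')
printf 'kept: %s\n' "$remaining"

rm -rf "$demo_dir"
```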

Filter for files with embedded timestamp in the file name

Let's say I have these files:
file-foo-1514764800.log
file-foo-1514851200.log
file-foo-1514937600.log
file-foo-1515024000.log
file-foo-1515110400.log
file-bar-1514764800.log
file-bar-1514851200.log
file-bar-1514937600.log
file-bar-1515024000.log
file-bar-1515110400.log
The timestamps in the file names correspond to Jan 1st to Jan 5th. If I want to filter for files which have the timestamps in the range Jan 2nd up to Jan 4th, I would need to write an expression such as >= 1514851200 && <= 1515024000 (>= Jan 2nd && <= Jan 4th), and use it to filter on the third item in the file name, if we use - as the delimiter.
Note that in my case I can't rely on the modification date of the files, as they may have been modified at an arbitrary time. If I could, the solution would be rather simple:
find . -maxdepth 1 -newermt "2018-01-04" ! -newermt "2017-01-06"
What's a simple way to solve this using bash (zsh is fine too), and common linux tools?

I've one idea, not perfect but rather easy. Go to the directory of log files and run command below:
for f in *.log; do m=${f/*-*-}; n=${m/.log}; [[ "$n" -ge 1514851200 && "$n" -le 1515024000 ]] && echo "$f"; done
More details about bash parameter expansion: ${PARAMETER/PATTERN/STRING}
I checked this command line working in bash 4.4.12 and zsh 5.4.2.

Extract the date part, convert it to a regular date, and touch the files using that date. Then use a regular find command:
for f in *.log; do
    fdate=$(basename "$f" .log | cut -d '-' -f3)
    touch -d "$(date -d "@$fdate")" "$f"
done
# as you wrote
find . -maxdepth 1 -newermt "2018-01-04" ! -newermt "2017-01-06"

This is a bit of a hack, but I think you can do it with Awk.
awk '{ split(FILENAME, a, "-");
if (a[3] >= 1514851200 && a[3] <= 1515024000) print FILENAME;
nextfile }' /path/to/*
This obviously hardcodes an assumption about the number of dashes in the file name. Maybe you can use some other pattern to easily extract the date stamp if that's problematic (substr with an index calculated from the end of the filename?)

Here's a version using no shell scripting, which does not alter the modification times of the files (it assumes there are no pipe '|' characters in the filenames):
find . -maxdepth 1 -regextype egrep -regex '^.*-[0-9]{1,10}\.log$' |
sed -r 's/^(.*-([0-9]{1,10}))\.log$/&|\2/' |
awk -F '|' '(("/bin/date -Is -d@" $2) | getline line)
{if(line >= "2017-01-06" && line <= "2018-01-04"){print $1}else{next}}'

Using AWK to sort lines and columns together

I'm doing an assignment where I've been asked to find files between a certain size and sha1sum them. I've specifically been asked to provide the output in the following format:
Filename Sha1 File Size
all on one line. I'm not able to create additional files.
I've tried the following:
for listed in $(find /my/output -type f -size -20000c):
do
sha1sum "$listed" | awk '{print $1}'
ls -l "$listed" | awk '{print $9 $5}'
done
which gives me the required output fields, but not in the requested format, i.e.
sha1sum
filename filesize
Could anyone suggest a manner in which I'd be able to get all of this on a single line?
Thank you :)

If you use the stat command to avoid needing to parse the output of ls, you can simply echo all the values you need:
while IFS= read -r -d '' listed
do
    echo "$listed" $(sha1sum "$listed" | cut -d ' ' -f 1) $(stat -c "%s" "$listed")
done < <(find /my/output -type f -size -20000c -print0)
Check your version of stat, though; the above is GNU. On OS X, for example, it would be:
stat -f "%z" "$listed"
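
A sketch of the one-line-per-file report on a scratch directory, assuming GNU find, stat and sha1sum under bash. The directory and the empty file are created only for the demonstration; printf is used in place of echo so the three fields are placed explicitly.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)          # hypothetical stand-in for /my/output
: > "$demo_dir/empty.txt"      # zero-byte file, well under the 20000-byte limit

report=$(
    find "$demo_dir" -type f -size -20000c -print0 |
    while IFS= read -r -d '' f; do
        sum=$(sha1sum < "$f")  # stdin input, so the output is "<hash>  -"
        # filename, SHA-1, size in bytes, all on one line
        printf '%s %s %s\n' "${f##*/}" "${sum%% *}" "$(stat -c %s "$f")"
    done
)
printf '%s\n' "$report"

rm -rf "$demo_dir"
```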

With single pipeline:
find /my/path -type f -size -20000c -printf "%s " -exec sha1sum {} \; | awk '{ print $3,$2,$1 }'
Sample output (from my local test) in the requested format FILENAME SHA1 FILESIZE:
./GSE11111/GSE11111_RAW.tar 9ed615fcbcb0b771fcba1f2d2e0aef6d3f0f8a11 25446400
./artwork_tmp 3a43f1be6648cde0b30fecabc3fa795a6ae6d30a 40010166

Linux Shell - String manipulation then calculating age of file in minutes

I am writing a script that calculates the age of the oldest file in a directory. The first commands run are:
OLDFILE=`ls -lt $DIR | grep "^-" | tail -1 `
echo $OLDFILE
The output contains a lot more than just the filename. eg
-rwxrwxr-- 1 abc abc 334 May 10 2011 ABCD_xyz20110510113817046.abc.bak
Q1/. How do I obtain the output after the last space of the above line? This would give me the filename. I realise some sort of string manipulation is required but am new to shell scripting.
Q2/. How do I obtain the age of this file in minutes?

To obtain just the oldest file's name,
ls -lt | awk '/^-/{file=$NF}END{print file}'
However, this is not robust if you have files with spaces in their names, etc. Generally, you should try to avoid parsing the output from ls.
With stat you can obtain a file's modification time in machine-readable format, expressed as seconds since Jan 1, 1970; with date +%s you can obtain the current time in the same format. Subtract and divide by 60. (More Awk skills would come in handy for the arithmetic.)
Finally, for an alternate solution, look at the options for find; in particular, its printf format strings allow you to extract a file's modification time. The following will directly get you the modification time in seconds since the epoch and the inode number of the oldest file:
find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1
Using the inode number avoids the issues of funny file names; once you have a single inode, converting that to a file name is a snap:
find . -maxdepth 1 -inum "$number"
Tying the two together, you might want something like this:
# set -- Replace $@ with output from command
set -- $(find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1)
# now $1 is the timestamp and $2 is the inode
oldest_filename=$(find . -maxdepth 1 -inum "$2")
age_in_minutes=$(date +%s | awk -v d="$1" '{ print ($1 - d) / 60 }')
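
For comparison, here is a sketch that skips the inode lookup by printing the path next to the timestamp, separated by a tab. It assumes GNU find, stat and touch, bash arithmetic, and file names without tabs or newlines; the scratch directory and its two files are invented for the demonstration.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)                          # hypothetical scratch directory
touch -d '90 minutes ago' "$demo_dir/old one.bak"
touch -d '10 minutes ago' "$demo_dir/newer.bak"

# %T@ is the modification time in epoch seconds; sorting numerically puts the
# oldest file first, and cut keeps everything after the tab (the path).
oldest=$(find "$demo_dir" -maxdepth 1 -type f -printf '%T@\t%p\n' |
         sort -n | head -n 1 | cut -f 2-)

# Integer age in minutes: (now - mtime) / 60
age_minutes=$(( ( $(date +%s) - $(stat -c %Y "$oldest") ) / 60 ))
printf '%s is %d minutes old\n' "${oldest##*/}" "$age_minutes"

rm -rf "$demo_dir"
```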

An awk solution that gives you how old the file is in minutes (since your ls output does not contain the time of day, 00:00:00 is assumed). Also, as tripleee pointed out, ls output is inherently risky to parse.
[[bash_prompt$]]$ echo $l; echo "##############";echo $l | awk -f test.sh ; echo "#############"; cat test.sh
-rwxrwxr-- 1 abc abc 334 May 20 2013 ABCD_xyz20110510113817046.abc.bak
##############
File ABCD_xyz20110510113817046.abc.bak is 2074.67 min old
#############
BEGIN{
m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
for(o=1;o<=m;o++){
date[d[o]]=sprintf("%02d",o)
}
}
{
month=date[$6];
old_time=mktime($8" "month" "$7" "00" "00" "00);
curr_time=systime();
print "File " $NF " is " (curr_time-old_time)/60 " min old";
}

For Q2 a bash one-liner could be:
let mtime_minutes=\(`date +%s`-`stat -c%Y "$file_to_inspect"`\)/60

You probably want ls -t. Including the '-l' option explicitly asks for ls to give you all that other info that you actually want to get rid of.
However what you probably REALLY want is find, first:
NEWEST=$(find . -maxdepth 1 -type f | cut -c3- |xargs ls -t| head -1)
this will get you the plain filename of the newest item, no other processing needed.
Directories will be correctly excluded. (thanks to tripleee for pointing out that's what you were aiming for.)
For the second question, you can use stat and bc:
TIME_MINUTES=$(echo "($(date +%s) - $(stat --format %Y "$NEWEST")) / 60" | bc -l)
remove the '-l' option from bc if you only want a whole number of minutes.
