Proper way to format timestamp in bash - linux

So I need to create an output file for an outside contractor containing a list of filenames from a directory and creation dates in a specific format like this:
FileName YYYYmmDDHHMMSS
So far I've come up with:
find ./ -type f -printf " %f %a\n"
which returns:
FileName Fri Apr 21 18:21:15.0458585800 2017
Or:
ls -l | awk '{print $9" "$6" "$7" "$8}'
which returns:
FileName Apr 21 18:21
But neither is quite the output I need, as it needs to be purely numerical and include seconds.
Keeping in mind that the list of files could be very large, so efficiency is a priority, how can I get this output?

Something like this:
find ./ -type f -printf "%f %AY%Am%Ad%AH%AM%AS\n" | sed -e 's/\.[0-9]*$//'
(the sed is needed to remove the fractional part after the seconds)
(Edit) With ls it would be:
ls -l --time-style=+%Y%m%d%H%M%S | awk '{print $7" "$6}'
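Note that %a (and the %A* directives) refer to the last access time. Linux doesn't generally expose a file's creation time, so if the last modification time is an acceptable stand-in for the contractor, the same approach should work with %T in place of %A (a sketch, untested):
find ./ -type f -printf "%f %TY%Tm%Td%TH%TM%TS\n" | sed -e 's/\.[0-9]*$//'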

Related

How to write a Linux shell script that removes files older than X days, but leaves the first file of the day by modification time?

As the title says, how could that shell script be implemented? I know it's easy to find and delete files older than, e.g., 30 days using:
find /some_folder/ -name "file_prefix*" -mtime +30 -exec rm {} \;
But how do I add an exception so that the first file of each day (by modification time) is not removed?
Not the most elegant - but it is a combination of a few answers - something like this will work:
d=2020-01-01
end_date=2020-02-03
while [ "$d" != "$end_date" ]; do
    d2=$d
    d=$(date -I -d "$d + 1 day")
    echo "$d2"
    echo "$d"
    find -type f -newermt "$d2" ! -newermt "$d" -printf "%T@ %p\n" | sort -n | tail -n +2 | cut -d' ' -f2- | xargs rm
done
I'd suggest adding explicit paths and commenting out the xargs rm bit at first (just to print and double-check what you're removing).
There's probably a more elegant way to do this other than the print stuff, but it works.
For common filenames
find /some_folder/ -name "file_prefix*" -mtime +30 -printf '%TD %TT %p\n' |
sort |
awk '{if ($1==prevdate) print $3; prevdate=$1}' |
xargs rm
The find command will print %TD %TT %p, i.e. the last modification date followed by the last modification time and then the filepath (folder and filename).
The list is sorted by sort. Because of the date/time/filepath structure, lines sharing a date are grouped together and ordered by time, so the oldest file of each day comes first, which is important afterward.
awk parses each line and runs {if ($1==prevdate) print $3; prevdate=$1}. Because of the date/time/filepath structure, the date is $1, the time is $2 and the filepath is $3. This prints the filepath whenever the date is identical to the previously parsed date. So the first file of each day is not printed, because its date differs from that of the preceding line, while all subsequent files of the same day are printed. Please note that prevdate is initially unassigned, which is roughly equivalent to the null string. You can write this if you find it more readable:
awk 'BEGIN{prevdate=""} {if ($1==prevdate) print $3; prevdate=$1}'
Finally, xargs rm will call rm for each line from the standard input which, at this moment, contains the list of files printed by awk.
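Before running this for real, it may be worth a dry run: letting xargs echo the rm commands instead of executing them shows exactly what would be deleted.
find /some_folder/ -name "file_prefix*" -mtime +30 -printf '%TD %TT %p\n' |
sort |
awk '{if ($1==prevdate) print $3; prevdate=$1}' |
xargs echo rm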
Handling spaces
If your filenames contain space characters, the solution can be slightly adjusted:
find /some_folder/ -name "file_prefix*" -mtime +30 -printf '%TD %TT %p\n' |
sort |
awk '{if ($1==prevdate) print; prevdate=$1}' |
cut -d ' ' -f3- |
xargs -d '\n' rm
awk prints the whole line instead of the filepath only; the filepath is then extracted with cut -d ' ' -f3- before calling xargs -d '\n' rm (the -d '\n' stops xargs from splitting the paths on the very spaces we are trying to preserve).
Handling weird filenames
The above solutions do not work with filenames containing newlines and possibly won’t work with backslashes either.
I assume you won’t run into these issues because, if you want to clean up the directory, chances are you already know what’s inside it, and these are probably log files or other types of files created automatically.
However, should you need to handle all types of filenames, the command below will do the trick:
unset prevdate currentdate filepath
find /some_folder/ -name "file_prefix*" -mtime +30 -printf '%TD %TT %p\0' |
sort -z |
while IFS= read -r -d '' line
do
    currentdate=${line%% *}
    if [ "$currentdate" = "$prevdate" ]
    then
        filepath=$(cut -d ' ' -f3- <<< "$line")
        rm -- "$filepath"
    fi
    prevdate=$currentdate
done
It behaves essentially like the initial solution but strings are separated by the null character (which is a forbidden character in a filename) instead of the traditional newline separation.
find outputs results with %TD %TT %p\0 instead of %TD %TT %p\n, which means the results are null-separated; then sort -z makes use of this null-separated input, and finally the while loop is a rewrite of the awk script that works on null-separated strings (which is hard to do portably with awk). There is no call to xargs rm because rm is called directly inside the while loop.
While the ability to handle all types of filenames is tempting, please note that this solution is significantly less efficient than the other solutions. The code that I wrote is non-optimal for educational purposes, but it would still be slower even if I optimized it.
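As one illustration, the cut subshell forked for every matching file can be replaced with plain parameter expansion, which stays inside bash:
# strip the first two space-separated fields (%TD and %TT) without forking cut
rest=${line#* }        # drop the date field
filepath=${rest#* }    # drop the time field
rm -- "$filepath"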
Same date and time
If several "first files of the day" occur at the exact same time within the same day, only the one with the "lowest" file path (sorted by its alphanumeric characters) will be skipped. If you want to keep all first files of the day that share the exact same time, it is slightly more complicated but doable.
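For the record, here is a minimal sketch of that variant, assuming the same %TD %TT %p input (and the same caveats about spaces) as above; it keeps every file that shares the earliest timestamp of its day:
find /some_folder/ -name "file_prefix*" -mtime +30 -printf '%TD %TT %p\n' |
sort |
awk '$1 != prevdate { prevdate = $1; firsttime = $2; next }
     $2 != firsttime { print $3 }' |
xargs rm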

Grep files with numeric extension

Consider a directory of 20 files numbered as follows:
ll *test*
> test.dat
> test.dat.1
> test.dat.2
...
> test.dat.20
A subset of the files that match a given date can be found via
ll *test* | grep "Sep 29"
> test.dat
> test.dat.1
> test.dat.2
How can I search for a line pattern in ONLY this subset of files? I want to grep for the string WARNING in each line of the above three files. How can I tell grep to limit its search to only this subset?
The -l option is made for that: it lists the files that match.
The -L option does the opposite: it lists the files that don't match.
grep WARNING $(grep -l "Sep 29" *test.dat*)
EDIT
I misunderstood the question: you don't want to grep "WARNING" in files already containing "Sep 29"; you want to grep "WARNING" in files last modified on Sep 29.
Therefore I suggest:
grep WARNING $(ll *test.dat* | grep "Sep 29")
But I wouldn't rely on ll output.
Use a subshell:
grep "WARNING" $(ll *test* | grep "Sep 29")
That way, the output of your command will become the <files_to_search_in> argument of your outer-most grep command.
Keep in mind that since you are using ll in your original command, the output of it will give you not only the file names you want, but other file details (permissions, date, etc). You might have to do further processing in your "inner" grep, so that the information passed to the outer-most grep command will be limited to file names.
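For example, assuming ll is the usual alias for ls -l, awk can pick off the last field (note that this still breaks on filenames containing spaces):
grep "WARNING" $(ls -l *test* | grep "Sep 29" | awk '{print $NF}')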
While at it, consider doing your file filtering in your inner-most subshell with the find command (see man find) instead of a combination of ll + grep: use the right tool for the job (:
Another way of doing this:
find . -type f -name "test.dat*" -newermt 2017-09-29 ! -newermt 2017-09-30 -exec grep WARNING {} \;
Details
-type f: match regular files only
-name "test.dat*": only files whose names begin with "test.dat"
-newermt 2017-09-29 ! -newermt 2017-09-30: only files with a modification date of 29 September 2017
-exec grep WARNING {} \;: each time a file is found, run grep WARNING on it
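One possible refinement: with \; grep runs once per file and therefore won't prefix matches with the filename. Batching with + and adding -H (a GNU grep option) gives you the filename and fewer grep invocations:
find . -type f -name "test.dat*" -newermt 2017-09-29 ! -newermt 2017-09-30 -exec grep -H WARNING {} +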

Get last modified file date in a folder structure

I'm trying to get the most recently modified file's datetime (as a Unix timestamp) from a folder structure. There are many files, but I only need the datetime of the most recently updated.
I've tried the following but I think I'm way off the mark:
stat --printf="%y %n\n" $(ls -tr $(find * -type f))
Try this:
ls -trF | grep -v '/\|@' | tail -1 | xargs -I{} date +%s -r {}
ls -trF gives you symbols to filter out: '/' for directories and '@' for symbolic links. After that, grep out those entries, pick the last one, and pass it to the date command.
EDIT: Also of note is the date -r option, which displays the last-modified date of the file given as its argument.
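For a single known file that looks like this (the path is a placeholder):
date +%s -r /path/to/some/file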
something like this?
ls -ltr | tail -n1 | awk '{print "date -d\"" $6FS$7FS$8 "\" +%s"}' | sh
EDIT:
actually, better yet, try the following:
find -type f -exec ls -l --time-style=+%s {} + | sort -n -k6 | tail -n1
This will iterate over the folder structure you want, print each file's time as a Unix timestamp and sort, so the newest is at the end (hence tail -n1).
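If GNU find is available, a variant of the same idea lets -printf emit the timestamp directly, skipping ls entirely; %T@ prints fractional seconds, hence the final cut:
find . -type f -printf '%T@\n' | sort -n | tail -n1 | cut -d. -f1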

shell must parse ls -Al output and get last field (file or directory name) ANY SOLUTION

I must parse ls -Al output and get the file or directory name.
ls -Al output:
drwxr-xr-x 12 s162103 studs 12 march 28 2012 personal domain
drwxr-xr-x 2 s162103 studs 3 march 28 22:32 public_html
drwxr-xr-x 7 s162103 studs 8 march 28 13:59 WebApplication1
I should use only ls -Al | <something>
for example:
ls -Al | awk '{print $8}'
but this doesn't work because $8 is not the full name when there are spaces in the directory name; it is only part of the name. Maybe there's some utility that cuts out the last field or deletes everything before it? I need to find any solution. Please, help!
EDITED: I know that parsing ls -Al is a bad idea, but I must parse exactly that, with the construction above! There is no way to use something like this:
for f in *; do
    somecommand "$f"
done
Don't parse ls -Al, if all you need is the file name.
You can put all file names in an array:
files=( * )
or you can iterate over the files directly:
for f in *; do
    echo "$f"
done
If there is something specific from ls that you need, update your question to specify what you need.
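For instance, if it's the size and modification time you're after, stat can report them per file without any parsing (a sketch, assuming GNU coreutils):
for f in *; do
    stat -c '%s %Y %n' -- "$f"    # size in bytes, mtime as a Unix timestamp, name
done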
How about this:
ls -Al | awk '{$1=$2=$3=$4=$5=$6=$7=$8="";print $0}'
I know it's a cheap trick, but since you don't want to use anything other than ls -Al I can't think of anything better...
Based on @squiguy's request in the comments, I post my comment as an answer:
What about just this?
ls -1A
instead of l (the letter L), use 1 (the number one). It will list only the names of the files.
It's also worth noting that find can do what you're looking for:
Everything in this directory, equivalent to ls:
find . -maxdepth 1
Recursively, similar to ls -R:
find .
Only directories in a given directory:
find /path/to/some/dir -maxdepth 1 -type d
md5sum every regular file:
find . -type f -exec md5sum {} \;
Hope awk works for you:
ls -Al | awk 'NR>1{for(i=9;i<NF;i++)printf "%s ",$i;print $i}'
In case you're interested in sed:
ls -Al | sed '1d;s/^\([^ ]* *\)\{8\}//'

Copy the three newest files under one directory (recursively) to another specified directory

I'm using bash.
Suppose I have a log file directory /var/myprogram/logs/.
Under this directory I have many sub-directories and sub-sub-directories that include different types of log files from my program.
I'd like to find the three newest files (modified most recently), whose name starts with 2010, under /var/myprogram/logs/, regardless of sub-directory and copy them to my home directory.
Here's what I would do manually
1. Go through each directory and do ls -lt 2010*
to see which files starting with 2010 are modified most recently.
2. Once I go through all directories, I'd know which three files are the newest. So I copy them manually to my home directory.
This is pretty tedious, so I wondered if maybe I could somehow pipe some commands together to do this in one step, preferably without using shell scripts?
I've been looking into find, ls, head, and awk, which I might be able to use, but haven't figured out the right way to glue them together.
Let me know if I need to clarify. Thanks.
Here's how you can do it:
find -type f -name '2010*' -printf "%C@\t%P\n" | sort -r -k1,1 | head -3 | cut -f 2-
This outputs a list of files prefixed by their last change time, sorts them based on that value, takes the top 3 and removes the timestamp.
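To finish the original task of copying the three files to the home directory, one possibility with GNU tools (xargs -d and cp -t), using %T@/%p so that cp receives full paths, and assuming no newlines in the filenames:
find /var/myprogram/logs -type f -name '2010*' -printf '%T@\t%p\n' |
sort -rn -k1,1 | head -3 | cut -f2- | xargs -d '\n' cp -t ~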
Your answers feel very complicated; how about
for FILE in $(find . -type d); do (cd "$FILE" && ls -t -1 -F | grep -v "/" | grep "^2010" | head -n3 | xargs -I{} mv {} ~); done
or laid out nicely
for FILE in $(find . -type d);
do
    (cd "$FILE" && ls -t -1 -F | grep -v "/" | grep "^2010" | head -n3 | xargs -I{} mv {} ~);
done;
My "shortest" answer after quickly hacking it up.
for file in $(find . -iname '*.php' -mtime -1 | xargs ls -l | awk '{ print $6" "$7" "$8" "$9 }' | sort -r | sed -n '1,3p' | awk '{ print $4 }'); do cp "$file" ../; done
The main command stored in $() does the following:
Find all files recursively in the current directory matching (case-insensitively) the name *.php that were modified within the last 24 hours.
Pipe to ls -l, required to be able to sort by modification date, so we can have the first three
Extract the modification date and file name/path with awk
Sort these files based on datetime, newest first
With sed print only the first 3 files
With awk print only their name/path
Used in a for loop and as action copy them to the desired location.
Or use @Hasturkun's variant, which popped up as a response while I was editing this post :)
