How to find the last updated file with a given prefix in bash? - linux

How can I find the most recently updated file with a specific prefix in bash?
For example, I have three files, and I just want to see the file whose name starts with "abc", ordered by Last_UpdatedDateTime descending.
fileName    Last_UpdatedDateTime
abc123      7/8/2020 10:34am
abc456      7/6/2020 10:34am
def123      7/8/2020 10:34am

You can list files sorted in the order they were modified with ls -t:
-t sort by modification time, newest first
You can use globbing (abc*) to match all files starting with abc.
Since you will get more than one match and only want the newest (that is first):
head -1
Combined:
ls -t abc* | head -1
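If you want to capture that result in a variable, a minimal sketch (note that parsing ls output like this is fragile for filenames containing newlines):
newest=$(ls -t abc* | head -n 1)
echo "most recent: $newest"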

If there are a lot of these files scattered across a variety of directories, find might be better.
find -name abc\* -printf "%T# %f\n" |sort -nr|sed 's/^.* //; q;'
Breaking that out -
find -name 'abc*' -printf "%T# %f\n" |
find has a ton of options. This is the simplest case, assuming the current directory as the root of the search. You can add a lot of refinements, or just give / to search the whole system.
-name 'abc*' picks just the filenames you want. Quote it to protect any globs, but you can use normal globbing rules. -iname makes the search case-insensitive.
-printf defines the output. %f prints the filename, but you want it ordered on the date, so print that first for sorting so the filename itself doesn't change the order. %T accepts another character to define the date format - # is the unix epoch, seconds since 00:00:00 01/01/1970, so it is easy to sort numerically. On my git bash emulation it returns fractions as well, so it's great granularity.
$: find -name abc\* -printf "%T# %f\n"
1594219755.7741618000 abc123
1594219775.5162510000 abc321
1594219734.0162554000 abc456
find may not return them in the order you want, though, so -
sort -nr |
-n makes it a numeric sort. -r sorts in reverse order, so that the latest file will pop out first and you can ignore everything after that.
sed 's/^.* //; q;'
Since the first record is the one we want, sed can just use s/^.* //; to strip off everything up to the space, which we know will be the timestamp numbers since we controlled the output explicitly. That leaves only the filename. q explicitly quits after the s/// scrubs the record, so sed spits out the filename and stops without reading the rest, which prevents the need for another process (head -1) in the pipeline.
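If %T# is not supported by your find (the documented GNU directive for seconds since the epoch is %T@), a roughly equivalent sketch would be:
find . -name 'abc*' -printf '%T@ %f\n' | sort -nr | sed 's/^[^ ]* //; q'
Here the sed strips only the leading timestamp field rather than everything up to the last space, so filenames containing spaces survive.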

Related

How to sort by name then date modification in BASH

Let's say I have a folder of .txt files whose names have a dd-MM-yyyy_HH-mm-ss timestamp followed by _name.txt. I want to be able to sort by name first, then by time. Example:
BEFORE
15-2-2010_10-01-55_greg.txt
10-2-1999_10-01-55_greg.txt
10-2-1999_10-01-55_jason.txt
AFTER
greg_1_10-2-1999_10-01-55
greg_2_15-2-2010_10-01-55
jason_1_10-2-1999_10-01-55
Edit: Apologies; as my "cp" line shows, I meant to copy them into another directory under a different name.
Something I tried is making a copy with a count appended, but it doesn't order files with the same name correctly by date:
cd data/unfilteredNames
for filename in *.txt; do
    n=${filename%.*}
    n=${filename##*_}
    filteredName=${n%.*}
    count=0
    find . -type f -name "*_$n" | while read name; do
        count=$(($count+1))
        cp -p $name ../filteredNames/"$filteredName"_"$count"
    done
done
I'm not sure that renaming the files is really part of what you need. For sorting the file names alone, you don't need to rename anything.
You can do this with the GNU sort command by itself:
sort -t- -k5.4 -k3.1,3.4 -k2.1,2.1 -k1.1,1.2 -k3.6,3.13 <(printf "%s\n" *.txt)
-t sets the field separator to a dash -.
-k enables sorting based on fields. As explained in the sort man page, the syntax is -k<start>,<stop>, where <start> and <stop> are composed of <field number>.<position>. Adding several -k options to the command allows sorting on multiple fields; keys given earlier on the command line take precedence over later ones.
For example, the first key, -k5.4, tells sort to use the 5th field starting at the 4th character. There is no stop position because this key runs to the end of the filename.
The -k3.1,3.4 option sorts based on the 3rd field starting from offset 1 to 4.
The same principle applies to other -k options.
In your example the month field only has one digit. If you also have files whose month is coded with two digits, you might want to zero-pad the month in every filename. This can be done by piping the printf output through sed inside the process substitution, i.e. <(printf "%s\n" *.txt | sed 's/-0\?\([0-9]\)-/-0\1-/'), and changing -k2.1,2.1 to -k2.1,2.2.
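Putting it together, a sketch of that padded variant (the trailing dash in the sed pattern leaves two-digit months untouched; edge cases such as single-digit days are not handled):
sort -t- -k5.4 -k3.1,3.4 -k2.1,2.2 -k1.1,1.2 -k3.6,3.13 <(printf "%s\n" *.txt | sed 's/-0\?\([0-9]\)-/-0\1-/')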

Rename the most recent file in each group

I'm trying to create a script that detects the latest file in each group and adds a prefix to its original name.
ll $DIR
asset_10.0.0.1_2017.11.19 #latest
asset_10.0.0.1_2017.10.28
asset_10.0.0.2_2017.10.02 #latest
asset_10.0.0.2_2017.08.15
asset_10.1.0.1_2017.11.10 #latest
...
2 questions:
1) how to find the latest file of each group?
2) how to rename it, adding only a prefix?
I tried the following procedure, but it looks for the latest file in the entire directory, and doesn't keep the original name to add a prefix to it:
find $DIR -type f ! -name 'asset*' -print | sort -n | tail -n 1 | xargs -I '{}' cp -p '{}' $DIR...
What would be the best approach to achieve this? (keeping xargs if possible)
Selecting the latest entry in each group
You can use sort to select only the latest entry in each group:
find . -print0 | sort -r -z | sort -t_ -k2,2 -u -z | xargs ...
First, sort all files in reverse lexicographical order (so that the latest entry appears first within each group). Then, by sorting on the group name only (the second field, -k2,2, when split on underscores via -t_) and printing unique groups, we get only the first entry for each group, which is also the latest.
Note that this works because sort uses a stable sorting algorithm, meaning the order of already-sorted items will not be altered by sorting them again. Also note we can't use uniq here because we can't specify a custom field delimiter for uniq (it's always whitespace).
Copying with prefix
To add a prefix to each filename found, we need to split each path find produces into a directory and a filename (basename), because we need to add the prefix to the filename only. The xargs part above could look like:
... | xargs -0 -I '{}' sh -c 'd="${1%/*}"; f="${1##*/}"; cp -p "$d/$f" "$d/prefix_$f"' _ '{}'
Path splitting is done with shell parameter expansion, namely prefix (${1##*/}) and suffix (${1%/*}) substring removal.
Note the use of NUL-terminated output (paths) in find (-print0 instead of -print), and the accompanying use of -z in sort and -0 in xargs. That way the complete pipeline will properly handle filenames (paths) with "special" characters like newlines and similar.
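Putting the pieces together, the whole pipeline might look like this (a sketch, run from inside the directory in question; GNU find/sort/xargs are assumed, find is narrowed to the asset_ files, and "latest_" is just a placeholder prefix):
find . -type f -name 'asset_*' -print0 | sort -r -z | sort -t_ -k2,2 -u -z | xargs -0 -I '{}' sh -c 'd="${1%/*}"; f="${1##*/}"; cp -p "$d/$f" "$d/latest_$f"' _ '{}'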
If you want to do this in bash alone, rather than using external tools like find and sort, you'll need to parse the "fields" in each filename.
Something like this might work:
declare -A o=()                           # declare an associative array (requires bash 4)
for f in asset_*; do                      # step through the list of files,
    IFS=_ read -r -a a <<<"$f"            # assign filename elements to an array
    b="${a[0]}_${a[1]}"                   # define a "base" of the first two elements
    if [[ "${a[2]}" > "${o[$b]}" ]]; then # compare the date with the last value
        o[$b]="${a[2]}"                   # for this base and reassign if needed
    fi
done
for i in "${!o[@]}"; do                   # now that we're done, step through results
    printf "%s_%s\n" "$i" "${o[$i]}"      # and print them.
done
This doesn't exactly sort; it just walks the list of files and keeps the highest-sorting date value for each filename base.
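If you then want the copy-with-prefix behaviour from the question, you could extend the final loop along these lines (a sketch; "latest_" is again just a placeholder prefix):
for i in "${!o[@]}"; do
    cp -p "${i}_${o[$i]}" "latest_${i}_${o[$i]}"
done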

Building a file index in Linux

I have a filesystem with deeply nested directories. Inside the bottom level directory for any node in the tree is a directory whose name is the guid of a record in a database. This folder contains the binary file(s) (pdf, jpg, etc) that are attached to that record.
Two Example paths:
/g/camm/MOUNT/raid_fs0/FOO/042014/27/123.456.789/04.20.30--27.04.2014--RJ123.pdf
/g/camm/MOUNT/raid_fs1/FOO/052014/22/321.654.987/04.20.30--27.04.2014--RJ123.pdf
In the above example, 123.456.789 and 321.654.987 are guids
I want to build an index of the complete filesystem so that I can create a lookup table in my database to easily map the guid of the record to the absolute path(s) of its attached file(s).
I can easily generate a straight list of files with:
find /g/camm/MOUNT -type f > /g/camm/MOUNT/files.index
but I want to parse the output of each file path into a CSV file which looks like:
GUID ABSOLUTEPATH FILENAME
123.456.789 /g/camm/MOUNT/raid_fs0/FOO/042014/27/123.456.789/04.20.30--27.04.2014--RJ123.pdf 04.20.30--27.04.2014--RJ123.pdf
321.654.987 /g/camm/MOUNT/raid_fs1/FOO/052014/22/321.654.987/04.20.30--27.04.2014--RJ123.pdf 04.20.30--27.04.2014--RJ123.pdf
I think I need to pipe the output of my find command into xargs and again into awk to process each line of the output into the desired format for the CSV output... but I can't make it work...
Wait for your long-running find to finish, then you
can pass the list of filenames through awk:
awk -F/ '{printf "%s,%s,%s\n",$(NF-1),$0,$NF}' /g/camm/MOUNT/files.index
and this will convert lines like
/g/camm/MOUNT/raid_fs0/FOO/042014/27/123.456.789/04.20.30--27.04.2014--RJ123.pdf
into
123.456.789,/g/camm/MOUNT/raid_fs0/FOO/042014/27/123.456.789/04.20.30--27.04.2014--RJ123.pdf,04.20.30--27.04.2014--RJ123.pdf
The -F/ splits the line into fields using "/" as separator, NF is the
number of fields, so $NF means the last field, and $(NF-1) the
next-to-last, which seems to be the directory you want in the first column
of the output. I used "," in the printf to separate the output columns, as
is typical in a csv; you can replace it by any character such as space or ";".
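If you want to produce the CSV in one go, with a header row like in your example, a sketch reusing the files.index you already generated (files.csv is just an assumed output name):
{ echo "GUID,ABSOLUTEPATH,FILENAME"; awk -F/ '{printf "%s,%s,%s\n",$(NF-1),$0,$NF}' /g/camm/MOUNT/files.index; } > /g/camm/MOUNT/files.csv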
I don't think there can be anything much faster than your find command, but you may be interested in the locate package. It uses the updatedb command, usually run each night by cron, to traverse the filesystem and create a file holding all the filenames in a form that can be easily searched by another command.
The locate command reads that database to find matching directories, files, and so on, even using glob wildcard or regex pattern matching. Once tried, it is hard to live without it.
For example, on my system locate -S lists the statistics:
Database /var/lib/mlocate/mlocate.db:
59945 directories
505330 files
30401572 bytes in file names
12809265 bytes used to store database
and I can do
locate rc-dib0700-nec.ko
locate -r rc-.*-nec.ko
locate '*/media/*rc-*-nec.ko*'
to find files like /usr/lib/modules/4.1.6-100.fc21.x86_64/kernel/drivers/media/rc/keymaps/rc-dib0700-nec.ko.xz in no time at all.
You can nearly do what you want with find's -printf option.
The difficulty is extracting the GUID.
Assuming prefixes have the same length as in your example, I would probably do:
find /g/camm/MOUNT -type f -printf "%h %p %f\n" | colrm 1 37 > /g/camm/MOUNT/files.index
Or, if the number of / separators is constant:
find /g/camm/MOUNT -type f -printf "%h %p %f\n" | cut -d '/' -f 9- > /g/camm/MOUNT/files.index
Otherwise, I would use sed to strip the leading directories from the first field:
find /g/camm/MOUNT -type f -printf "%h %p %f\n" | sed 's#^[^ ]*/##' > /g/camm/MOUNT/files.index
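Each of these variants turns the first example path into a line of the form:
123.456.789 /g/camm/MOUNT/raid_fs0/FOO/042014/27/123.456.789/04.20.30--27.04.2014--RJ123.pdf 04.20.30--27.04.2014--RJ123.pdf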

Listing entries in a directory using grep

I'm trying to list all entries in a directory whose names contain ONLY upper-case letters. Directories need "/" appended.
#!/bin/bash
cd ~/testfiles/
ls | grep -r *.*
Since grep by default looks for upper-case letters only (right?), I'm just recursively searching through the directories under testfiles for all names that contain only upper-case letters.
Unfortunately this doesn't work.
As for appending directories, I'm not sure why I need to do this. Does anyone know where I can start with some detailed explanations on what I can do with grep? Furthermore how to tackle my problem?
No, grep does not only consider uppercase letters.
Your question is a bit unclear. For example:
from your usage of the -r option, it seems you want to search recursively; however, you don't say so. For simplicity I assume you don't need to; consider looking into #twm's answer if you need recursion.
you want to look for uppercase letters only. Does that mean you don't want to accept any other (non-letter) characters that are still valid in file names (like digits, dashes, dots, etc.)?
since you don't say that it is not permissible to have only one file per line, I am assuming it is OK (thus using ls -1).
The naive solution would be:
ls -1 | grep "^[[:upper:]]\+$"
That is, print all lines containing only uppercase letters. In my TEMP directory that prints, for example:
ALLBIG
LCFEM
WPDNSE
This however would exclude files like README.TXT or FILE001, which depending on your requirements (see above) should most likely be included.
Thus, a better solution would be:
ls -1 | grep -v "[[:lower:]]\+"
That is, print all lines not containing a lowercase letter. In my TEMP directory that prints, for example:
ALLBIG
ALLBIG-01.TXT
ALLBIG005.TXT
CRX_75DAF8CB7768
LCFEM
WPDNSE
~DFA0214428CD719AF6.TMP
Finally, to "properly mark" directories with a trailing '/', you could use the -F (or --classify) option.
ls -1F | grep -v "[[:lower:]]\+"
Again, example output:
ALLBIG
ALLBIG-01.TXT
ALLBIG005.TXT
CRX_75DAF8CB7768
LCFEM/
WPDNSE/
~DFA0214428CD719AF6.TMP
Note that a different option would be to use find (e.g. find ! -regex ".*[a-z].*"), if you can live with its different output format.
The exact regular expression depends on the output format of your ls command. Assuming that you do not use an alias for ls, you can try this:
ls -R | grep -o -w "[A-Z]*"
Note that with -R, ls recursively lists directories and files under the current directory. The grep option -o tells grep to print only the matched part of the text. The -w option tells grep to match whole words only. "[A-Z]*" is a regexp that keeps only upper-case words.
Note that this regexp will print TEST.txt as well as TEXT.TXT. In other words, it will only consider names that are formed by letters.
It's ls which lists the files, not grep, so that is where you need to specify that you want "/" appended to directories. Use ls --classify to append "/" to directories.
grep is used to process the results from ls (or some other source, generally speaking) and only show lines that match the pattern you specify. It is not limited to uppercase characters. You can limit it to just upper-case characters and "/" with grep -E '^[A-Z/]*$', or if you also want numbers, periods, etc. you could instead filter out lines that contain lowercase characters with grep -v -E '[a-z]'.
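For example, combining the two for the non-recursive case (a sketch; this keeps only names made of uppercase letters, with / appended to directories):
ls --classify | grep -E '^[A-Z/]*$'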
As grep is not the program which lists the files, it is not where you want to perform the recursion. ls can list paths recursively if you use ls -R. However, you're just going to get the last component of the file paths that way.
You might want to consider using find to handle the recursion. This works for me:
find . -exec ls -d --classify {} \; | egrep -v '[a-z][^/]*/?$'
I should note, using ls --classify to append "/" to the end of directories may also append some other characters to other types of paths that it can classify. For instance, it may append "*" to the end of executable files. If that's not OK, but you're OK with listing directories and other paths separately, this could be worked around by running find twice - once for the directories and then again for other paths. This works for me:
find . -type d | egrep -v '[a-z][^/]*$' | sed -e 's#$#/#'
find . -not -type d | egrep -v '[a-z][^/]*$'

How to tell how many files match description with * in unix

Pretty simple question: say I have a set of files:
a1.txt
a2.txt
a3.txt
b1.txt
And I use the following command:
ls a*.txt
It will return:
a1.txt a2.txt a3.txt
Is there a way in a bash script to tell how many results will be returned when using the * pattern? In the above example, if I were to use a*.txt the answer should be 3, and if I used *1.txt the answer should be 2.
Comment on using ls:
I see all the other answers attempt this by parsing the output of
ls. This is very unpredictable because this breaks when you have
file names with "unusual characters" (e.g. spaces).
Another pitfall is that the output is ls-implementation dependent: a particular implementation might format it differently.
There is a very nice discussion on the pitfalls of parsing ls output on the bash wiki maintained by Greg Wooledge.
Solution using bash arrays
For the above reasons, using bash syntax would be the more reliable option. You can use a glob to populate a bash array with all the matching file names. Then you can ask bash the length of the array to get the number of matches. The following snippet should work.
files=(a*.txt) && echo "${#files[@]}"
To save the number of matches in a variable, you can do:
files=(a*.txt)
count="${#files[#]}"
One more advantage of this method is you now also have the matching files in an array which you can iterate over.
Note: Although I keep repeating bash syntax above, I believe the above solution applies to other sh-family shells that support arrays (such as ksh and zsh).
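For example, a small sketch that also guards against the no-match case (shopt -s nullglob is my addition, so the array is empty and the count is 0 when nothing matches):
shopt -s nullglob
files=(a*.txt)
echo "${#files[@]} file(s) match"
for f in "${files[@]}"; do
    printf 'matched: %s\n' "$f"
done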
You can't know ahead of time, but you can count how many results are returned. I.e.
ls -l *.txt | wc -l
ls -l will display the directory entries matching the specified wildcard, wc -l will give you the count.
You can save the value of this command in a shell variable with either
num=$(ls -l *.txt | wc -l)
or
num=`ls -l *.txt | wc -l`
and then use $num to access it. The first form is preferred.
You can use ls in combination with wc:
ls a*.txt | wc -l
The ls command lists the matching files one per line, and wc -l counts the number of lines.
I like suvayu's answer, but there's no need to use an array:
count() { echo $#; }
count *
In order to count files that might have unpredictable names, e.g. containing new-lines, non-printable characters etc., I would use the -print0 option of find and awk with RS='\0':
num=$(find . -maxdepth 1 -print0 | awk -v RS='\0' 'END { print NR }')
Adjust the options to find to refine the count; e.g. if the criterion is regular files starting with a lowercase a and ending in .txt in the current directory, use:
find . -maxdepth 1 -type f -name 'a*.txt' -print0
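Combining the two, a sketch that counts a*.txt files in the current directory even when the names contain unusual characters:
num=$(find . -maxdepth 1 -type f -name 'a*.txt' -print0 | awk -v RS='\0' 'END { print NR }')
echo "$num"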
