Separating the time from a date string - linux

I have some files that contain in their name the following string: "20171011095942", which is the date and time "2017/10/11 09:59:42".
text_text_20171011095937_155.DAT.gz
text_text_20171011095942_156.DAT.gz
I need to select all files whose timestamp falls in hour 09 and move them to another folder. If I use the command:
date -d '20171011095942' +'%R'
It says "invalid date". How can I separate the time from that string so I can then select only those files?
Thank you!

With find + mv commands:
find . -type f -regextype posix-egrep -regex ".*_2017101109[0-9]{4}_.*\.gz" -exec mv {} dest_dir/ \;
In the above command change dest_dir to your "another folder".
.*_2017101109[0-9]{4}_.*\.gz - regex pattern to match all filenames containing the needed sequence.
.* - matches any character(s)
_2017101109 - matches the crucial numeric sequence (<year><month><day><hours>)
[0-9]{4}_ - ensures that the above-mentioned sequence is followed by 4 digits, which correspond to <minutes><seconds>
\.gz - ensures a file extension to be .gz
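To come back to the original date question: GNU date rejects the bare 14-digit string, but it accepts the same fields once they are reformatted. A minimal bash sketch (assuming GNU date):
ts=20171011095942
# rebuild a format date understands, then print hours:minutes
date -d "${ts:0:4}-${ts:4:2}-${ts:6:2} ${ts:8:2}:${ts:10:2}:${ts:12:2}" +'%R'   # -> 09:59
# for the hour-09 test alone, plain substring extraction is enough
[ "${ts:8:2}" = "09" ] && echo "starts at hour 09"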

Related

Linux count files with a specific string at a specific position in filename

I have a directory which contains data for several years and several months.
Filenames have the format yyyymmdd, e.g.
20150415,
20170831,
20121205
How can I find all data with month = 3?
E.g.
20150302,
20160331,
20190315
Thanks for your help!
ls -ltra ????03??
A question mark is a wildcard which stands for exactly one character, so as your format seems to be YYYYmmDD, the glob pattern ????03?? should match all files having 03 as mm.
Edit
Apparently the files have format YYYYmmDDxxx, where xxx is the rest of the filename, of unknown length. That corresponds to the glob wildcard *, so instead of ????03?? you might use ????03??*.
As far as find is concerned: the same pattern holds here, but as you seem to be working inside a single directory (no subdirectories, at first sight), you might consider the -maxdepth switch:
find . -name "????03??*" | wc -l              # including subdirectories
find . -maxdepth 1 -name "????03??*" | wc -l  # only current directory
I would highly advise you to run the command without wc -l first to check the results. (Oh, I just see the switch -type f; that one might still be useful too :-) )
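Combining that suggestion with -type f, a cautious run might look like this:
find . -maxdepth 1 -type f -name "????03??*"          # eyeball the matches first
find . -maxdepth 1 -type f -name "????03??*" | wc -l  # then count them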

Locate files older than 7 days that contain the letter "t" as the third char of the file name

I am trying to figure out how to find all the files that are older than 7 days and contain the letter "t" as the third character (of the filename).
I have only figured out how to find the files that are older than 7 days:
find /home -mtime +7 -print
To restrict to filenames having a "t" in the third position, like "25t.txt" or "data-19.doc", add this clause:
-name "??t*"
to the command. -name looks only at the base name, i.e. with the path removed.
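Putting both pieces together, the full command becomes:
find /home -mtime +7 -name "??t*" -print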
Alternatively, you can specialize your find with a regex:
find /home -mtime +7 -regextype posix-extended -regex '^.*/.{2}t.*' -print
Explanation of the command:
The regular expression filters the results of find. Since find returns each result as a full path, in the form "./dir/dir1/filename.extension", the leading ^.*/ greedily consumes everything up to the last "/"; .{2} then matches the first two characters of the filename, so the literal t must be its third character; the trailing .* allows any remainder, including the extension.
Annotation: you can substitute "t" with any character you want.

'find' files containing an integer in a specified range (in bash)

You'd think I could find an answer to this already somewhere, but I am struggling to do so. I want to find some log files with names like
myfile_3.log
however I only want to find the ones with numbers in a certain range. I tried things like this:
find <path> -name myfile_{0..67}.log #error: find: paths must precede expression
find <path> -name myfile_[0-67].log #only return 0-7, not 67
find <path> -name myfile_[0,67].log #only returns 0,6,7
find <path> -name myfile_*([0,67]).log # returns only 0,6,7,60,66,67,70,76,77
Any other ideas?
If you want to match an integer range using a regular expression, use the -regex option in your find command.
For example, to match all files from 0 to 67, use this:
find <path> -regextype egrep -regex '.*myfile_([0-9]|[0-5][0-9]|6[0-7])\.log'
There are 3 parts in the regex:
[0-9] matches the single digits 0-9
[0-5][0-9] matches the range 00-59
6[0-7] matches the range 60-67
Note the option -regextype egrep to get extended regular expressions.
Note also that -regex matches the whole filename, including the path; that's the reason for the .* at the beginning of the regex.
You can do this simply and concisely, but admittedly not very efficiently, with GNU Parallel:
parallel find . -name "myfile_{}.log" ::: {0..67}
In case you are wondering why I say it is not that efficient, it is because it starts 68 parallel instances of find, each looking for a different number in the filename... but that may be ok.
The following will find all files named myfile_X.log - whereby the X part is a digit ranging from 0-67.
find <path> -type f | grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"
Explanation:
-type f restricts the results to regular files.
| pipes the filepath(s) to grep for filtering.
grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$" performs an extended (-E) regexp to find the last part of the path (i.e. the filename) which:
begins with myfile_
followed by a number from 0 to 67
ends with .log
Edit:
Alternatively, as suggested by @ghoti in the comments, you can utilize the -regex option in the find command instead of piping to grep. For example:
find -E <path> -type f -regex ".*/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"
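For reference, -E is the BSD/macOS spelling for extended regexps; on GNU find the equivalent uses -regextype instead, e.g.:
find <path> -type f -regextype egrep -regex ".*/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log"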
Note: The regexp is very similar to the previous grep example. However, it begins with .*/ to match all parts of the filepath up to and including the final forward slash. The reason grep does not need the leading .*/ is that grep searches for the pattern anywhere in the line, whereas find's -regex must match the whole path from start to end.
One possibility is to build up the range from several ranges that can be matched by glob patterns. For example:
find . -name 'myfile_[0-9].log' -o -name 'myfile_[1-5][0-9].log' -o -name 'myfile_6[0-7].log'
You cannot represent a general range with a regular expression, although you can craft a regex for any specific range. A more general approach is to use find to get the files carrying a number and filter the output with another tool that performs the range checking, like awk or the shell itself.
START=0
END=67
while IFS= read -r -d '' file
do
    N=$(echo "$file" | sed 's/.*myfile_\([0-9]\+\)\.log$/\1/')
    if [ "$N" -ge "$START" ] && [ "$N" -le "$END" ]
    then
        echo "$file"
    fi
done < <(find <path> -name "myfile_*.log" -print0)
In that script, you perform a find of all the files that have the desired name pattern, then loop through the found files; sed captures the number in each filename, and that number is compared with the range limits. If both comparisons succeed, the file is printed.
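Since the range check is just an integer comparison, the same filtering can also be done in a single awk pipeline; a sketch assuming filenames contain no newlines (<path> remains your placeholder):
# split on "_": the last field of ./dir/myfile_3.log is "3.log", and
# awk's numeric coercion ($NF + 0) reads its leading number, here 3
find <path> -name "myfile_*.log" | awk -F'_' -v lo=0 -v hi=67 '{ n = $NF + 0 } n >= lo && n <= hi'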
Several other answers give you a regex for the specific range in the example, but they are not general; this approach, by contrast, allows for easy modification of the range involved.

Recursively remove pattern from filenames without changing paths

I have thousands of files in a directory tree with filenames like:
/Folder 0001 - 0500/0001 - Portrait - House.jpg
/Folder 2500 - 3000/2505 - Landscape - Mountain.jpg
Using the linux command line I would like to remove everything up to the first word in the filenames, i.e. "0001 - " and "2505 - ". The new filenames would look like:
/Folder 0001 - 0500/Portrait - House.jpg
/Folder 2500 - 3000/Landscape - Mountain.jpg
I have modified a script that kind of works:
find . -type f -name "*-*" -exec bash -c 'f="$1"; g="${f/[[:digit:]]/ -/ /}"; echo mv -- "$f" "$g"' _ '{}' \;
The problem here is that it butchers part of the path instead of the filename, so actual output generates filenames like:
/Folder -/ /001 - 0500/0001 - Portrait - House.jpg
/Folder -/ /500 - 3000/2505 - Landscape - Mountain.jpg
How can I modify this script to rename files using the pattern I described?
find . -mindepth 2 -type f -name "*-*" -exec bash -c '
shopt -s extglob
for arg do
dir=${arg%/*}
basename_old=${arg##*/}
basename_new=${basename_old##+([[:digit:]]) - }
[[ "$basename_new" = "$basename_old" ]] && continue # skip when no rename needed
printf "%q " mv -- "$dir/$basename_old" "$dir/$basename_new"
printf "\n"
done
' _ {} +
You can see this code running at https://ideone.com/YJNL9c
Using parameter expansions to split the directory name out from the filename allows these to be manipulated individually.
${arg%/*} removes the last / and everything after it from the value in arg -- thus removing the filename, leaving the directories, when a path has at least one directory segment (providing this assurance is the reason for the -mindepth 2).
${arg##*/} removes the longest match to */ from the beginning -- thus removing the directories, leaving the basic filename.
By enabling the extglob shell option, we get regex-like capabilities in our fnmatch/glob-style expressions, including the ability to match one or more repetitions of a pattern; this is why +([[:digit:]]) - matches "one or more digits, followed by ' - '".
By using printf '%q ' instead of echo when generating shell commands, we generate safely-quoted output even without control of our filenames.
By using -exec ... {} +, we're passing multiple arguments to each bash instance, rather than invoking a separate interpreter for each file found. With for arg do, we iterate over all those arguments.
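Once the printed mv commands look right, the same loop can do the renames for real; this is simply the script above with the printf preview swapped for an actual mv:
find . -mindepth 2 -type f -name "*-*" -exec bash -c '
shopt -s extglob
for arg do
    dir=${arg%/*}
    basename_old=${arg##*/}
    basename_new=${basename_old##+([[:digit:]]) - }
    [[ "$basename_new" = "$basename_old" ]] && continue  # skip when no rename needed
    mv -- "$dir/$basename_old" "$dir/$basename_new"
done
' _ {} +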

Delete files that don't match a particular string format

I have a set of files that are named similarly:
TEXT_TEXT_YYYYMMDD
Example file name:
My_House_20170426
I'm trying to delete all files that don't match this format. Every file should have a string of text followed by an underscore, followed by another string of text and another underscore, then a date stamp of YYYYMMDD.
Can someone provide some advice on how to build a find or a remove statement that will delete files that don't match this format?
Using find, add -delete to the end once you're sure it works.
# gnu find
find . -regextype posix-egrep -type f -not -iregex '.*/[a-z]+_[a-z]+_[0-9]{8}'
# OSX find
find -E . -type f -not -iregex '.*/[a-z]+_[a-z]+_[0-9]{8}'
Intentionally only matching alphabetical characters for TEXT. Add 0-9 to each TEXT area like this [a-z0-9] if you need numbers.
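A cautious workflow is therefore to run the match first and append -delete only after reviewing the listing (-delete is a GNU find extension):
# preview the files that would be removed
find . -regextype posix-egrep -type f -not -iregex '.*/[a-z]+_[a-z]+_[0-9]{8}'
# delete once the listing looks right
find . -regextype posix-egrep -type f -not -iregex '.*/[a-z]+_[a-z]+_[0-9]{8}' -delete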
grep -v 'pattern'
will filter out lines that match a pattern, leaving those that don't match. You might try piping in the output of ls. And if you're particularly brave, you could pipe the output to something like xargs rm. But deleting is kinda scary, so maybe save the output to a file first, look at it, then delete the files listed.
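That review-first workflow might look like this; a sketch assuming GNU xargs and filenames without newlines (to_delete.txt is just a hypothetical scratch file):
ls | grep -vE '^[A-Za-z]+_[A-Za-z]+_[0-9]{8}$' > to_delete.txt
# inspect to_delete.txt by hand, then:
xargs -d '\n' rm -- < to_delete.txt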
