Delete files that don't match a particular string format - linux

I have a set of files that are named similarly:
TEXT_TEXT_YYYYMMDD
Example file name:
My_House_20170426
I'm trying to delete all files that don't match this format. Every file should have a string of text followed by an underscore, followed by another string of text and another underscore, then a date stamp of YYYYMMDD.
Can someone provide some advice on how to build a find or a remove statement that will delete files that don't match this format?

Using find, add -delete to the end once you're sure it works.
# gnu find
find . -regextype posix-egrep -type f -not -iregex '.*/[a-z]+_[a-z]+_[0-9]{8}'
# OSX find
find -E . -type f -not -iregex '.*/[a-z]+_[a-z]+_[0-9]{8}'
Intentionally only matching alphabetical characters for TEXT. Add 0-9 to each TEXT area like this [a-z0-9] if you need numbers.
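A minimal dry-run sketch of the approach above, assuming GNU find and a scratch directory made with mktemp. Apart from My_House_20170426, the sample file names are invented for illustration. The non-matching files are listed first, and the same expression is repeated with -delete only afterwards.

```shell
# Scratch directory with sample files (only My_House_20170426 comes from the question).
workdir=$(mktemp -d)
touch "$workdir/My_House_20170426" "$workdir/Old_Barn_20160101" \
      "$workdir/notes.txt" "$workdir/My_House_2017"

# Step 1: dry run. List regular files whose path does NOT match TEXT_TEXT_YYYYMMDD.
doomed=$(find "$workdir" -regextype posix-egrep -type f \
    -not -iregex '.*/[a-z]+_[a-z]+_[0-9]{8}' | sort)
printf 'would delete:\n%s\n' "$doomed"

# Step 2: the same expression with -delete appended, once the list looks right.
find "$workdir" -regextype posix-egrep -type f \
    -not -iregex '.*/[a-z]+_[a-z]+_[0-9]{8}' -delete

# Step 3: show what survived, then remove the scratch directory.
kept=$(ls "$workdir")
printf 'kept:\n%s\n' "$kept"
rm -rf "$workdir"
```

Because -regex and -iregex match against the whole path, the leading .*/ is what anchors the pattern to the file name itself.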

grep -v '(pattern)'
will filter out lines that match a pattern, leaving those that don't match. You might try piping in the output of ls. And if you're particularly brave, you could pipe the output to something like xargs rm. But deleting is kinda scary, so maybe save the output to a file first, look at it, then delete the files listed.
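The cautious version of that idea might look like the sketch below. It assumes file names without spaces or newlines (xargs splits on whitespace), and the sample files and the variable names workdir and list are hypothetical.

```shell
# Scratch directory plus a separate file that will hold the deletion list.
workdir=$(mktemp -d)
list=$(mktemp)
touch "$workdir/My_House_20170426" "$workdir/Old_Barn_20160101" \
      "$workdir/notes.txt" "$workdir/My_House_2017"

# Keep only the names that do NOT match TEXT_TEXT_YYYYMMDD and save them for review.
ls "$workdir" | grep -Ev '^[A-Za-z]+_[A-Za-z]+_[0-9]{8}$' > "$list"
to_delete=$(cat "$list")
printf 'listed for deletion:\n%s\n' "$to_delete"

# After looking at the list, feed it to rm from inside the directory.
(cd "$workdir" && xargs rm -- < "$list")

remaining=$(ls "$workdir")
printf 'remaining:\n%s\n' "$remaining"
rm -rf "$workdir" "$list"
```

The -E flag is needed here because the pattern uses + and {8}; the ^ and $ anchors make grep test the whole name rather than any substring.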

Related

How to ignore a file using find command

I'm trying to find an artifact using the command
name: Get path to Java artifact
run: echo JAVA_ARTIFACT=$(find build/libs/*.jar -type f) >> $GITHUB_ENV
The problem is I have 2 artifacts in that directory
build/libs/abc.jar
build/libs/abc-plain.jar
I want to pick only abc.jar file.
Can anyone suggest how I can achieve this?
The find command can be used with regular expressions, which makes it easy to express fairly complex searches. How it works:
Use find with -regex instead of -name.
Write a regular expression that matches the whole path find reports, not just the file name.
How does find pass the filename to the regular expression?
Assume we have the following directory structure:
/home/someone/build/libs/abc.jar
/home/someone/build/libs/abc-plain.jar
and our current directory is /home/someone
if we execute find . without any further arguments, we get:
./build/libs/abc.jar
./build/libs/abc-plain.jar
So we can, for example, build a regex for a path that:
starts with any leading path ending in a slash (the . and the directories)
continues with one or more characters, none of which is -
ends with .jar
This results in:
'.*/'
'[^-]+'
'\.jar'
The dot before jar is escaped so that it matches a literal dot rather than any character.
And all together:
find . -regex '.*/[^-]+\.jar'
or if you ONLY want to search in build/libs/
find ./build/libs -regex '.*/[^-]+\.jar'
An online regex tester is handy for trying out the expression.
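A small sketch of the regex approach, assuming GNU find (whose default regex flavour accepts + as a quantifier) and a scratch copy of the build/libs layout from the question. Excluding / as well as - inside the bracket is an extra tightening, not part of the answer above, so the bracket can only ever cover the file name.

```shell
# Recreate the two artifacts from the question in a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/build/libs"
touch "$workdir/build/libs/abc.jar" "$workdir/build/libs/abc-plain.jar"

# -regex is matched against the whole path that find prints (./build/libs/abc.jar).
matches=$(cd "$workdir" && find ./build/libs -type f -regex '.*/[^-/]+\.jar')
printf '%s\n' "$matches"

rm -rf "$workdir"
```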
The find command supports regular expressions for including or excluding files. It recursively descends the directory tree under each path you give it and evaluates the expression against every entry, so fairly complex queries are possible.
You haven't said so explicitly, but I'm assuming you want only the files without a hyphen - in the name.
I would try something like this, which matches names made of lower-case letters, upper-case letters, and digits, followed by the .jar extension.
find build/libs/ -regextype posix-egrep -regex '.*/[a-zA-Z0-9]+\.jar'
I got below output when tested locally.
touch abc.jar
touch abc-plain.jar
find . -regextype posix-egrep -regex '.*/[a-zA-Z0-9]+\.jar'
./abc.jar
You can try above commands here
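If the only goal is to skip the -plain artifact, a plain glob exclusion avoids regular expressions altogether. This is a different technique from the answers above, shown as a sketch: env_file is a hypothetical stand-in for the file that $GITHUB_ENV points to in a workflow.

```shell
# Scratch layout with the two artifacts from the question.
workdir=$(mktemp -d)
mkdir -p "$workdir/build/libs"
touch "$workdir/build/libs/abc.jar" "$workdir/build/libs/abc-plain.jar"
env_file="$workdir/github_env"   # stand-in for "$GITHUB_ENV"

# Select every .jar, then negate the names ending in -plain.jar.
(
  cd "$workdir" || exit 1
  echo "JAVA_ARTIFACT=$(find build/libs -type f -name '*.jar' ! -name '*-plain.jar')" >> "$env_file"
)

env_line=$(cat "$env_file")
printf '%s\n' "$env_line"
rm -rf "$workdir"
```

In the workflow itself the redirect target would be "$GITHUB_ENV" rather than the stand-in file.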

Linux count files with a specific string at a specific position in filename

I have a directory which contains data for several years and several months.
Filenames have the format YYYYMMDD, e.g.
20150415,
20170831,
20121205
How can I find all data with month = 3?
E.g.
20150302,
20160331,
20190315
Thanks for your help!
ls -ltra ????03??
A question mark is a wildcard that stands for exactly one character. As your format seems to be YYYYmmDD, the glob pattern ????03?? matches all files having 03 as mm.
Edit
Apparently the files have the format YYYYmmDDxxx, where xxx is the rest of the filename, of unknown length. The wildcard * matches any run of characters, so instead of ????03?? you might use ????03??*.
As far as find is concerned: the same pattern works with -name, but as you seem to be working inside a single directory (no subdirectories, at first sight), you might consider the -maxdepth switch:
find . -name "????03??*" | wc -l # including subdirectories
find . -maxdepth 1 -name "????03??*" | wc -l # only current directory
I would highly advise you to check without wc -l first for checking the results. (Oh, I just see the switch -type f, that one might still be useful too :-) )
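The difference between the two counts can be seen in a scratch directory. This is only a sketch: the archive subdirectory and the _report suffix are invented for illustration, and -maxdepth is assumed to be available (GNU and BSD find both have it).

```shell
# Three March files (one of them in a subdirectory) and one April file.
workdir=$(mktemp -d)
mkdir "$workdir/archive"
touch "$workdir/20150302" "$workdir/20160331_report" "$workdir/20150415" \
      "$workdir/archive/20190315"

# The pattern is quoted so that find, not the shell, expands it.
all_levels=$(find "$workdir" -type f -name '????03??*' | wc -l)
top_only=$(find "$workdir" -maxdepth 1 -type f -name '????03??*' | wc -l)

echo "including subdirectories: $all_levels"
echo "only the top directory: $top_only"
rm -rf "$workdir"
```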

Unix, search for string in multiple files. ( Case sensitive, and accept if the string is in a string )

I've been using this command:
find /path~ -type f | xargs grep -iR STRING1
to find strings in multiple files, but I was wondering how I can find a string in multiple files case-sensitively, even when the string is part of another string.
For example:
I'm searching for: Encoder
if a file contains: abcdEncoder — should appear
if a file contains: abcdencoder — shouldn't appear
if a file contains: encoderEncoder — should appear
Maybe the question is a duplicate, but I haven't found it!
Remove the -i switch to make the matches case-sensitive. Your command already searches multiple files and doesn't care whether the string is inside another string, so that'll give you what you want.
Also note that using both find -type f and -R is redundant: as -type f ensures find will only print normal files for grep to examine, the -R (recurse through directories) option won't change anything. Alternatively, you can use -R to get rid of find and xargs: grep -R STRING1 /path~
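A quick check against the three examples from the question, as a sketch: the file names a.txt, b.txt and c.txt are made up, and each file holds one of the sample strings.

```shell
# One file per sample string from the question.
workdir=$(mktemp -d)
printf 'abcdEncoder\n'    > "$workdir/a.txt"
printf 'abcdencoder\n'    > "$workdir/b.txt"
printf 'encoderEncoder\n' > "$workdir/c.txt"

# No -i, so the match is case-sensitive; substrings match by default.
matches=$(cd "$workdir" && grep -R Encoder . | sort)
printf '%s\n' "$matches"

rm -rf "$workdir"
```

Only a.txt and c.txt are reported; b.txt contains the word in lower case only.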

Problem using 'find' in BASH

I'm following this guide to get some basic skills in Linux.
At the exercises of chapter 3 section, there are two exercises:
*Change to your home directory. Create a new directory and copy all
the files of the /etc directory into it. Make sure that you also copy
the files and directories which are in the subdirectories of /etc!
(recursive copy)
*Change into the new directory and make a directory for files starting
with an upper case character and one for files starting with a lower
case character. Move all the files to the appropriate directories. Use
as few commands as possible.
The first part was simple but I have encountered problems in the second part (although I thought it should be simple as well).
I did the first part successfully - that is, I have a copy of the /etc folder in ~/newetc - with all the files copied recursively into subdirectories.
I've created ~/newetc/upper and ~/newetc/lower directories.
My intention was to do something like mv 'find ... ' ./upper for example.
But first I thought I should make sure that I can find all the files with upper/lower case names separately. At this I failed.
I thought that find ~/newetc [A-Z].* (also tried: find ~/newetc -name [A-Z].*) would find all the upper case files, but it simply returns no results.
What's even stranger: find ~/newetc -name [a-z].* returns only two files, although of course there are a lot more than that...
Any idea what I am doing wrong?
Thank you for your time!
Edit: (I have tried to read the Man for find command btw, but didn't come up with anything)
The -name argument does not take a full regular expression by default. So [A-Z].* will match only if the second character is a dot.
Use the expression [A-Z]*, or use -regex and -regextype to match using a real regex.
You need to use quotes, otherwise the shell may expand the pattern against the current directory before find ever sees it:
find ~/newetc -name "[A-Z]*"
find ~/newetc -name "[a-z]*"
If you want to use regexp, then you must use -regex (or -iregex).
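For finding stuff, the other answers tell you how to do it.
For moving the results of find, use the -exec flag (while being in newetc). Restrict the search to regular files, skip the two target directories, and give mv the directory as its destination; upper/{} would expand to a path such as upper/./sub/File, whose parent directory does not exist:
find . -type f -name "[A-Z]*" ! -path "./upper/*" ! -path "./lower/*" -exec mv {} upper/ \;
find . -type f -name "[a-z]*" ! -path "./upper/*" ! -path "./lower/*" -exec mv {} lower/ \;
Note that files with the same name in different subdirectories will overwrite each other this way.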
The -name parameter takes a glob, not a regular expression (those are both very useful pages). So the dot does not have a special meaning for this parameter - It is interpreted as a literal dot character. Also, in a regular expression the * means "0 or more of the previous expression" while in a glob it means "any number of any character." So, as others have pointed out, the following should get you any files below the current directory which start with an uppercase character:
find . -name '[A-Z]*'
If you want to find all the names beginning with a capital letter, you have to use
find . -name "[A-Z]*"
NOT
find . -name "[A-Z].*"
otherwise you will only locate the files that begin with a capital letter followed immediately by a .
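Putting the pieces of this thread together, the exercise could be sketched as follows. The sample tree is invented, and LC_ALL=C is an added assumption: in some locales a range such as [A-Z] can also match lower-case letters, and the C locale avoids that.

```shell
# Scratch tree standing in for ~/newetc, with the two target directories.
workdir=$(mktemp -d)
mkdir -p "$workdir/sub" "$workdir/upper" "$workdir/lower"
touch "$workdir/Alpha" "$workdir/beta" \
      "$workdir/sub/Gamma.conf" "$workdir/sub/delta.conf"

(
  cd "$workdir" || exit 1
  # Quoted globs reach find unexpanded; the target directories are excluded
  # so that files already moved are not picked up again.
  LC_ALL=C find . -type f -name '[A-Z]*' ! -path './upper/*' ! -path './lower/*' \
      -exec mv {} upper/ \;
  LC_ALL=C find . -type f -name '[a-z]*' ! -path './upper/*' ! -path './lower/*' \
      -exec mv {} lower/ \;
)

upper_files=$(ls "$workdir/upper")
lower_files=$(ls "$workdir/lower")
printf 'upper:\n%s\nlower:\n%s\n' "$upper_files" "$lower_files"
rm -rf "$workdir"
```

Files whose names start with a digit or punctuation match neither pattern and stay where they are.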

find printf unable to print newlines or carriage returns

I am making a script to check and list the outputs of certain files in our job folder. I am wanting to check files that have been there today (or in the past 24 hours I guess).
Currently I am doing the following:
find /folder/jobfolder/rrz* -type f -mtime 0
so I get something like this:
/folder/jobfolder/rrzabc_1234.lis
/folder/jobfolder/rrzdef_4567.lis
/folder/jobfolder/rrzgre_8901.log
Ideally I would like to use -printf to add the date, so I can see whether it was today, and separate the results line by line. If I do
find /folder/jobfolder/rrz* -type f -mtime 0 -printf %p\t%AD\n
it does not use the escape characters:
/folder/jobfolder/rrzabc_1234.lis/folder/jobfolder/rrzdef_4567.lis/folder/jobfolder/rrzgre_8901.log
Is there a better way to approach this? Also, being able to use -iname to match capital or lowercase letters might be helpful.
You are missing quotes; try
-printf '%p\t%AD\n'
so that bash does not strip the backslashes before find sees them.
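A sketch of the quoted form, assuming GNU find (-printf is a GNU extension) and invented file names in a scratch directory. It uses %TD, the last-modification date, instead of %AD, the last-access date, because -mtime also tests modification time; that swap is a suggestion and not part of the answer above. -iname covers the case-insensitive match asked about.

```shell
# Two job files with different letter case, plus one unrelated file.
workdir=$(mktemp -d)
touch "$workdir/rrzabc_1234.lis" "$workdir/RRZDEF_4567.LIS" "$workdir/other.log"

# Single quotes keep \t and \n intact so that find interprets them itself.
listing=$(find "$workdir" -type f -iname 'rrz*' -mtime 0 -printf '%p\t%TD\n' | sort)
printf '%s\n' "$listing"

line_count=$(printf '%s\n' "$listing" | wc -l)
rm -rf "$workdir"
```

Each line of the listing holds a path, a tab, and the date in mm/dd/yy form.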

Resources