grep files based on time stamp - linux

This should be pretty simple, but I am not figuring it out. I have a large code base, more than 4 GB, under Linux. A few header files and XML files are generated during the build (using GNU make); if it matters, the header files are generated from the XML files.
I want to search for a keyword in the header files that were last modified after a given time instant (my compile start time), and similarly in the XML files, but as separate grep queries.
If I run the search on all possible header or XML files it takes a lot of time; I only want to search the ones that were auto-generated. The search also has to be recursive, since there are a lot of directories and sub-directories.
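A minimal sketch of that workflow, assuming you can touch a marker file such as .build_start right before the build starts (the marker name and 'keyword' are placeholders):
touch .build_start
make
# header files modified after the marker
find . -type f -name '*.h' -newer .build_start -exec grep -l 'keyword' {} +
# xml files, as a separate query
find . -type f -name '*.xml' -newer .build_start -exec grep -l 'keyword' {} +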

You could use the find command:
find . -mtime 0 -type f
prints a list of all regular files (-type f) in and below the current directory (.) that were modified in the last 24 hours (-mtime 0; -mtime 1 would be 24-48 hours ago, -mtime 2 would be 48-72 hours ago, and so on). Try
grep "pattern" $(find . -mtime 0 -type f)

To find 'pattern' in all files newer than some_file in the current directory and its sub-directories recursively:
find -newer some_file -type f -exec grep 'pattern' {} +
You could specify the timestamp directly in date -d format and use other find tests e.g., -name, -mmin.
The file list could also be generated by your build system if find is too slow.
More specific tools such as ack, etags, GCCSense might be used instead of grep.
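As a sketch of the timestamp idea above, assuming GNU find (which supports -newermt with a date -d style timestamp; the date, names, and pattern are placeholders):
# headers modified after an explicit timestamp
find . -name '*.h' -newermt '2019-04-22 10:00' -exec grep -l 'pattern' {} +
# or: xml files modified in the last 90 minutes
find . -name '*.xml' -mmin -90 -exec grep -l 'pattern' {} +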

Use this instead, because with the command substitution above, if find doesn't return any files, grep will sit waiting for input on stdin and the script will hang.
find . -mtime 0 -type f | xargs grep "pattern"
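If any of the generated file names might contain spaces, a slightly more robust variant (assuming GNU find and xargs) uses NUL-delimited output; with -r it also skips running grep when nothing matches:
find . -mtime 0 -type f -print0 | xargs -0 -r grep "pattern"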

Related

Create ZIP of hundred thousand files based on date newer than one year on Linux

I have a /folder with over a half million files created in the last 10 years. I'm restructuring the process so that in the future there are subfolders based on the year.
For now, I need to backup all files modified within the last year. I tried
zip -r /backup.zip $(find /folder -type f -mtime -365)
but get error: Argument list too long.
Is there any alternative to get the files compressed and archived?
Zip has an option to read the file list from stdin. Below is from the zip man page:
-@ file lists. If a file list is specified as -@ [Not on MacOS],
zip takes the list of input files from standard input instead of
from the command line. For example,
zip -@ foo
will store the files listed one per line on stdin in foo.zip.
This should do what you need:
find /folder -type f -mtime -365 | zip -@ /backup.zip
Note that I've removed the -r option because it isn't doing anything - you are explicitly selecting standard files with the find command (-type f)
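If "within the last year" should mean since this date last year rather than the last 365 24-hour periods, a sketch assuming GNU find and GNU date (-newermt accepts a date -d style timestamp):
find /folder -type f -newermt "$(date -d '1 year ago' +%F)" | zip -@ /backup.zip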
You'll have to switch from passing all the files at once to piping the files one at a time to the zip command.
find /folder -type f -mtime -365 | while read -r FILE; do zip -r /backup.zip "$FILE"; done
You can also work with the -exec parameter in find, like this:
find /folder -type f -mtime -365 -exec zip -r /backup.zip {} \;
(or whatever your command is). For every file, the given command is executed with the file passed as a last parameter.
Find the files and then execute the zip command on as many files as possible using + as opposed to ;
find /folder -type f -mtime -365 -exec zip -r /backup.zip '{}' +
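Another option that avoids the original "Argument list too long" error is to let xargs batch the file names (a sketch assuming GNU find and xargs; zip adds to an existing archive, so several batches are fine):
find /folder -type f -mtime -365 -print0 | xargs -0 zip /backup.zip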

list base files in a folder with numerous date stamped versions of a file

I've got a folder with numerous versions of files (thousands of them), each with a unique date/time stamp as the file extension. For example:
./one.20190422
./one.20190421
./one.20190420
./folder/two.txt.20190420
./folder/two.txt.20190421
./folder/folder/three.mkv.20190301
./folder/folder/three.mkv.20190201
./folder/folder/three.mkv.20190101
./folder/four.doc.20190401
./folder/four.doc.20190329
./folder/four.doc.20190301
I need to get a unique list of the base files. For example, for the above example, this would be the expected output:
./one
./folder/two.txt
./folder/folder/three.mkv
./folder/four.doc
I've come up with the below code, but am wondering if there is a better, more efficient way.
# find all directories
find ./ -type d | while read -r folder ; do
  # go into that directory (in a subshell, so the cd doesn't leak into the next iteration)
  # then find all the files in that directory, excluding sub-directories
  # remove the extension (date/time stamp)
  # sort and remove duplicates
  # then loop through each base file
  (
    cd "$folder" && find . -maxdepth 1 -type f -exec bash -c 'printf "%s\n" "${@%.*}"' _ {} + | sort -u | while read -r file ; do
      # and find all the versions of that file
      ls "$file".* | customFunctionToProcessFiles
    done
  )
done
If it matters, the end goal is to find all the versions of a specific file, grouped by base file, and process each group. So my plan was to get the base files, then loop through that list and find all the version files for each one. Using the above example again, I'd process all the one.* files first, then the two.* files, and so on.
Is there a better, faster, and/or more efficient way to accomplish this?
Some notes:
There are potentially thousands of files. I know I could just search for all files from the root folder, remove the date/time extension, sort and get unique, but since there may be thousands of files I thought it might be more efficient to loop through the directories.
The date/time stamp extension of the file is not in my control and it may not always be just numbers. The only thing I can guarantee is it is on the end after a period. And, whatever format the date/time is in, all the files will share it -- there won't be some files with one format and other files with another format.
You can use find ./ -type f -regex to look for files directly
find ./ -type f -regex '.*\.[0-9]+'
./some_dir/asd.mvk.20190422
./two.txt.20190420
Also, you can pipe the result to your function through xargs without needing while loops:
re='(.*)(\.[0-9]{8})'
find ./ -type f -regextype posix-egrep -regex "$re" | \
sed -re "s/$re/\1/" | \
sort -u | \
xargs -r customFunctionToProcessFiles
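If the end goal is to process each base file's group of versions, as the question describes, a single-pass sketch along the same lines, assuming the date/time stamp is always whatever follows the final dot and that customFunctionToProcessFiles reads the version file names on stdin as in the original loop:
find . -type f |
  sed 's/\.[^./]*$//' |
  sort -u |
  while IFS= read -r base; do
    # all versions of this base file, one per line, to the processing function
    printf '%s\n' "$base".* | customFunctionToProcessFiles
  done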

Counting number of files in a directory with an OSX terminal command

I'm looking for a specific directory file count that returns a number. I would type it into the terminal and it can give me the specified directory's file count.
I've already tried echo find "directory" | wc -l, but that didn't work. Any ideas?
You seem to have the right idea. I'd use -type f to find only files:
$ find some_directory -type f | wc -l
If you only want files directly under this directory and not to search recursively through subdirectories, you could add the -maxdepth flag:
$ find some_directory -maxdepth 1 -type f | wc -l
Open the terminal and switch to the location of the directory.
Type in:
find . -type f | wc -l
This searches inside the current directory (that's what the . stands for) for all files, and counts them.
The fastest way to obtain the number of files within a directory is by obtaining the value of that directory's kMDItemFSNodeCount metadata attribute.
mdls -name kMDItemFSNodeCount directory_name -raw | xargs
The above command has a major advantage over find . -type f | wc -l in that it returns the count almost instantly, even for directories which contain millions of files.
Note that this counts the number of items in the directory, not just regular files.
I don't understand why folks are using 'find' because for me it's a lot easier to just pipe in 'ls' like so:
ls *.png | wc -l
to find the number of png images in the current directory.
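One caveat with piping ls: wc -l counts lines, so file names containing newlines are miscounted. A pure-shell sketch that sidesteps this, assuming bash:
shopt -s nullglob     # so the count is 0 when nothing matches
files=(*.png)
echo "${#files[@]}"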
I'm using tree; this is the way:
tree ph

"find" specific contents [linux]

I would like to go through all the files in the current directory (and its sub-directories) and echo back the names of only those files that contain a certain word.
More detail:
find -type f -name "*hello*" will give me all files that have "hello" in their names. But instead of that, I want to search through the files' contents, and if a file's content contains "hello", print out the name of that file.
Is there a way to approach this?
You can use GNU find and GNU grep as
find /path -type f -exec grep -Hi 'hello' {} +
This is efficient in that it doesn't invoke a separate grep instance for every file returned by find. It works under the assumption that find actually returns files for grep to search. If you are unsure whether any files will match, a more robust way is to use xargs with the -r flag, in which case the command following xargs is executed only if the pipe produces any input:
find /path -type f -print0 | xargs -r0 grep -Hi 'hello'
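If your grep supports recursive search (GNU grep does), you can also skip find entirely; adding -l prints only the names of matching files:
grep -rli 'hello' /path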

Find file according to the access time using "grep" and "find" commands

My goal is to find all text files with the extension .log that were last accessed more than 24 hours ago and contain the required text.
Here is what I have already tried:
find / *.log -mtime +1 -print | grep "next" *.log
but this doesn't work.
Question is: how can I reach the goal I have described above? Maybe there is some way to modify my find expression?
The problem with your command is that you are running grep on the output of the find command, which means you are running it on the file names, not their contents (actually, since you have *.log at the end, you run grep on all *.log files in the current directory, completely ignoring what your find command found). Also, you need -name in order to filter only the .log files.
You can use the -exec flag of find to execute a command on each of the files that matches your find criteria:
find / -name "*.log" -mtime +1 -exec grep 'next' {} \;
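Since the goal mentions access time and only the file names are needed, a variant of the above (a sketch, assuming GNU find and grep) would be:
find / -name "*.log" -atime +1 -exec grep -l 'next' {} +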
Try with xargs:
find / -name "*.log" -mtime +1 | xargs grep "next"
But also, note what the find manual says about the argument to -atime, which also applies to -mtime. That is, -mtime +1 as written probably doesn't cover the time period you want:
When find figures out how many 24-hour periods ago the file was last
accessed, any fractional part is ignored, so to match -atime +1, a
file has to have been accessed at least two days ago.
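If you literally want "more than 24 hours ago" rather than "at least two whole days ago", a minute-based test avoids that rounding (a sketch; check that your find supports -mmin):
find / -name "*.log" -mmin +1440 -exec grep -l 'next' {} +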
