use find to generate a list of files, then use md5sum - linux

I'd like to use find (or something else if that is more appropriate) to generate a file listing of an entire filesystem, except for /proc. I will then install some software and run this command again to see which files have changed.
This is what I have tried, with no success:
find / -type f -xdev -exec md5sum {} + > md5sum.txt
That hangs on /proc.
find / \( -path /proc -o -path /var/run \) -prune -o -print -type f -exec md5sum {} + > md5sum.txt
I get a partial file listing, then a listing of files with an md5sum. (??)
Is there a way to use the updatedb command to do this? Thanks.
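No answer is recorded here, but the mixed output of the second command has a simple cause. -print is always true, so every non-pruned path is printed as soon as it is found. The following -type f -exec md5sum {} + then collects the regular files and checksums them in batches. If -print is dropped, -exec becomes the only action and each file gets exactly one checksum line. The sketch below assumes GNU find and coreutils md5sum. It runs the same expression on a throwaway directory, where a fake proc subdirectory stands in for /proc, and then diffs two sorted snapshots, which is what the question is after. The directory and file names are invented for the demo.

```shell
#!/bin/sh
# Sketch, assuming GNU find and coreutils md5sum. On a real system (run as root)
# the snapshot command would look something like:
#   find / -xdev \( -path /proc -o -path /sys -o -path /var/run \) -prune \
#        -o -type f -exec md5sum {} + | sort -k 2 > /tmp/md5-before.txt
# Below, the same expression runs on a temporary tree so it is safe to try.

root=$(mktemp -d)
mkdir -p "$root/proc" "$root/etc"
echo volatile > "$root/proc/status"   # stands in for /proc: must be skipped
echo original > "$root/etc/app.conf"

snapshot() {
    # Prune the excluded directory first; only regular files reach md5sum.
    # Sorting on the path field makes two snapshots comparable with diff.
    find "$root" -path "$root/proc" -prune -o -type f -exec md5sum {} + | sort -k 2
}

before=$(mktemp)
after=$(mktemp)
snapshot > "$before"

# Simulate installing software: modify one file and add another.
echo changed > "$root/etc/app.conf"
echo new > "$root/etc/new.conf"
snapshot > "$after"

# Lines marked < or > are files whose checksum changed, or that appeared or vanished.
diff "$before" "$after" || true   # diff exits with status 1 when the files differ
```

The -xdev option from the first attempt is kept in the real-system form because it stops find from crossing into other mounted filesystems.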

Related

Using linux find command to identify files that (A) match either of two names (with wildcards) and (B) that also contain a string

The find command is really useful to identify files with a given name that also contain a string somewhere inside of them.
For instance, let's say I'm looking for the string "pacf(" in an R Markdown file somewhere in my current directory.
find . -name "*.Rmd" -exec grep -ls "pacf(" {} \;
I get useful results.
However, sometimes I'm not sure whether the file I am looking for is an .R file or an .Rmd file, so I might also run:
find . -name "*.R" -exec grep -ls "pacf(" {} \;
And let's say there are no .R files containing this string, so that returns nothing.
What I'd like to do is look in both .R and .Rmd files for this string. I would think that I could run:
find . -name "*.Rmd" -o -name "*.R" -exec grep -ls "pacf(" {} \;
But that returns no results.
However, if I run
find . -name "*.R" -o -name "*.Rmd" -exec grep -ls "pacf(" {} \;
I get the same results as searching only the .Rmd files. So it seems that -exec only runs for the files matching the second -name.
Is there a way I could change these commands to look through both the .R and .Rmd files at once?
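Add parentheses '()' to group the two -name tests. Without them, the implicit -a between the second -name and -exec binds more tightly than -o, so -exec only applies to files matching the second test: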
find . \( -name '*.R' -o -name '*.Rmd' \) -exec grep -ls "pacf(" {} \;
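The effect of the missing parentheses can be reproduced on a tiny hypothetical tree. It contains model.R and report.Rmd, both of which include pacf(; the names are invented for the demo. find joins adjacent tests with an implicit -a, which binds more tightly than -o, so the unparenthesised command runs grep only on the .R file. Because the expression already contains an action (-exec), find adds no implicit -print either, which is why the .Rmd matches produce no output at all.

```shell
#!/bin/sh
# Demo of -o precedence in find (the implicit -a binds tighter than -o).
# Hypothetical test tree: one .R file and one .Rmd file, both containing "pacf(".
dir=$(mktemp -d)
printf 'pacf(x)\n' > "$dir/model.R"
printf 'pacf(y)\n' > "$dir/report.Rmd"

# Parsed as:  -name '*.Rmd'  -o  ( -name '*.R'  -a  -exec grep ... )
# so grep only ever sees the .R file, and the .Rmd match prints nothing.
without=$(find "$dir" -name '*.Rmd' -o -name '*.R' -exec grep -ls 'pacf(' {} \;)

# Parentheses group the two name tests, so -exec applies to both.
with=$(find "$dir" \( -name '*.R' -o -name '*.Rmd' \) -exec grep -ls 'pacf(' {} \;)

echo "without parentheses: $without"
echo "with parentheses:"
printf '%s\n' "$with"
```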
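You can also pass "*[.Rmd]" to -name, like this:
find . -name "*[.Rmd]" -exec grep -ls "pacf(" {} \;
Be aware that [.Rmd] is a bracket expression matching a single character. The pattern therefore matches any name ending in ., R, m or d (for example notes.md or build), not just .R and .Rmd files, and grep then searches all of them.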

"find" command but it stops going deep if it finds a directory starting with "."

I have to make a script that goes through a whole folder (/home, in my case).
I have to save all the files except the ones whose names start with a dot. Also, if I find a directory whose name starts with a dot, I must skip it entirely and not read what's inside.
For the first part we use the command
for path in $(find /home \! -name ".*");do
where path is a variable that contains the path. But we don't know how to do the directory part.
I thought I'd split the path on / and check whether any component starts with a dot. If one does, an if would skip the file. But I don't know how to split a string, save the parts in a variable, and then loop over them.
You can prune all files and directories whose names start with a dot.
From the man page of GNU find:
-prune True; if the file is a directory, do not descend into it. If -depth is given, false; no effect. Because -delete implies -depth, you cannot usefully use -prune and -delete together.
You should not loop over the result from find. You will get unexpected results if you have filenames with spaces or newlines.
Use xargs or -exec, e.g.
find /home -path "*/.*" -prune -o -print0 | xargs -0I{} sh -c 'echo "doing something with $1"' sh {}
or
find /home -path "*/.*" -prune -o -exec sh -c 'for i; do echo "doing something with $i"; done' sh {} +
The -prune part removes all filenames (files and directories) starting with a dot and does not descend into directories starting with a dot.
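All other filenames are handled in one of two ways. With -o -print0, they are printed NUL-terminated instead of newline-terminated and piped to xargs. With -o -exec ... {} +, they are passed in batches to a shell script that performs your action, so the script runs as few times as possible.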
To save all filenames into a file:
find /home -path "*/.*" -prune -o -print > allfiles.txt
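Here is a small sketch of the -prune approach on a temporary tree standing in for /home; the user and file names are made up. The dot-file and everything under the dot-directory are left out of the saved list, while ordinary files and directories are kept.

```shell
#!/bin/sh
# Sketch: list everything under a tree except dot-files and the contents of dot-directories.
# A temporary tree (hypothetical names) stands in for /home so the command can be tried safely.
top=$(mktemp -d)
mkdir -p "$top/alice/docs" "$top/alice/.cache/sub"
touch "$top/alice/docs/report.txt" "$top/alice/.bashrc" "$top/alice/.cache/sub/blob"

# -path '*/.*' is true for any path with a component that starts with a dot;
# -prune stops find from descending into such directories, and -o -print lists the rest.
list=$(mktemp)
find "$top" -path '*/.*' -prune -o -print > "$list"
cat "$list"
```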
Try this
for path in $(find /home -type d -name ".*" -prune -o -type f \! -name ".*" -print);do echo $path; done
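I think I would do something like this:
for path in $(find . -type f | grep -v '/\.'); do
    echo "$path"    # your commands here
done
The grep -v '/\.' filter drops every path that has a component starting with a dot, which covers both dot-files and anything inside dot-directories. Note that this loop splits find's output on whitespace, so it breaks on filenames containing spaces or newlines. The -print0 | xargs -0 approach avoids that.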
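To see why looping over $(find ...) is fragile, here is a minimal demo on a temporary directory containing a single file whose name has a space. The file name my notes.txt is invented for the demo, and the demo assumes the temporary directory path itself contains no spaces. The unquoted command substitution is split on whitespace, so the for loop sees two words. By contrast, -exec ... {} + hands sh the name intact.

```shell
#!/bin/sh
# Sketch: a name with a space is split by $(find ...) but passed intact by -exec ... {} +.
dir=$(mktemp -d)
touch "$dir/my notes.txt"

split_count=0
for path in $(find "$dir" -type f); do   # unquoted substitution: split at the space
    split_count=$((split_count + 1))
done

# Each name arrives as one positional parameter, so this prints one line per file.
intact_count=$(find "$dir" -type f -exec sh -c 'for p; do printf "%s\n" "$p"; done' sh {} + | wc -l | tr -d ' ')

echo "for-loop iterations: $split_count"
echo "names seen via -exec: $intact_count"
```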

Recursively recode all project files excluding some directories and preserving permissions

How to recursively recode all project files excluding some directories and preserving permissions?
Based on this question, but its solution does not preserve permissions, so I had to modify it.
WARNING: since the solution includes recursive removal, use it at your own risk.
Task:
Recursively recode all project files (iso8859-8 -> utf-8) excluding '.git' and '.idea' dirs and preserving permissions.
Solution (worked well in my case):
Backup your project's dir, then cd there. Run:
find . -not -path "./.git/*" -not -path "./.idea/*" -type f -not -name '*.converted' -print -exec iconv -f iso8859-8 -t utf-8 -o {}.converted {} \; -exec sh -c 'cat "$1.converted" > "$1"' sh {} \; -exec rm {}.converted \;
Binary and image files will fail to recode since they aren't text, so files like 'image.jpeg.converted' will be left alongside 'image.jpeg'. To clean up this mess:
find . -not -path "./.git/*" -not -path "./.idea/*" -type f -regex '.*\.converted' -exec rm {} \;
Before you do that, you may want to replace -exec rm {} \; with -print to check that only the files you really want to remove are listed.
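Below is a sketch of a variant of the same recode-in-place idea. It assumes an iconv that knows iso8859-8 (glibc's does), and it uses a temporary project with made-up files in place of the real one. It differs from the command above in two ways. First, .git and .idea are pruned rather than filtered with -not -path. Second, all three steps run in one sh -c script that receives the file name as $1. As a result, names containing spaces or quotes are never spliced into shell code, and no {}.converted substitution (a GNU find extension) is needed. Overwriting the original with cat keeps its inode, which is what preserves the permissions.

```shell
#!/bin/sh
# Sketch, assuming an iconv that knows iso8859-8. A temporary project with
# hypothetical files stands in for the real one.
proj=$(mktemp -d)
mkdir -p "$proj/.git" "$proj/src"
printf '\340\n' > "$proj/src/hello.txt"   # byte 0xE0 is Hebrew alef in ISO-8859-8
printf '\340\n' > "$proj/.git/config"     # inside .git: must stay untouched
chmod 751 "$proj/src/hello.txt"

# Prune .git and .idea, skip leftover *.converted files, then for each file:
# convert into FILE.converted; on success, overwrite FILE's contents with cat
# (same inode, so mode and owner are kept) and remove the temporary copy.
# If iconv fails, the && chain stops and FILE.converted is left for cleanup.
find "$proj" \( -path "$proj/.git" -o -path "$proj/.idea" \) -prune -o \
    -type f ! -name '*.converted' -exec sh -c '
        iconv -f iso8859-8 -t utf-8 "$1" > "$1.converted" &&
        cat "$1.converted" > "$1" &&
        rm "$1.converted"' sh {} \;
```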

How to write a unix command or script to remove files of the same type in all sub-folders under current directory?

Is there a way to remove all temp files and executables under one folder AND its sub-folders?
All that I can think of is:
rm -rf *.~
but this only removes temp files in the current directory. It does NOT remove temp files in sub-folders, and it doesn't remove executables either.
I know there are similar questions which get very well answered, like this one:
find specific file type from folder and its sub folder
but that is Java code, and I only need a unix command or a short script to do this.
Any help please?
Thanks a lot!
Perl from the command line; this deletes a file if its name ends with ~ or it is executable:
perl -MFile::Find -e 'find(sub{ unlink if -f and (/~\z/ or (stat)[2] & 0111) }, ".")'
You can achieve the result with find:
find /path/to/directory \( -name '*.~' -o \( -perm /111 -a -type f \) \) -exec rm -f {} +
This will execute rm -f <path> for any <path> under (and including) /path/to/directory which either:
matches the glob expression *.~
or is a regular file with an executable bit set (be it owner, group or world)
The above applies to the GNU version of find.
A more portable version is:
find /path/to/directory \( -name '*.~' -o \( \( -perm -01 -o -perm -010 -o -perm -0100 \) \
-a -type f \) \) -exec rm -f {} +
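Since these commands delete files, it may be worth trying the portable expression on a throwaway tree first. The sketch below uses invented file names: it creates two *.~ files, one executable script and one plain source file, and only the plain file should survive. The -type f test matters because directories normally have their execute bits set.

```shell
#!/bin/sh
# Sketch: try the portable delete expression on a throwaway tree (hypothetical names).
dir=$(mktemp -d)
mkdir -p "$dir/sub"
touch "$dir/notes.~" "$dir/sub/draft.~" "$dir/sub/keep.c"
printf '#!/bin/sh\n' > "$dir/sub/a.out"
chmod 755 "$dir/sub/a.out"

# Matches names ending in ".~", or regular files with any execute bit set.
# -type f keeps directories (which normally have x bits) out of the match.
find "$dir" \( -name '*.~' -o \( \( -perm -01 -o -perm -010 -o -perm -0100 \) \
    -a -type f \) \) -exec rm -f {} +

find "$dir" -type f   # only sub/keep.c should remain
```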
find . -name "*~" -exec rm {} \;
or whatever pattern is needed to match the tmp files.
If you want to use Perl to do it, use a specific module like File::Remove
This should do the job
find -type f -name "*~" -print0 | xargs -r -0 rm

cat files in subdirectories using linux commands

I have the following directories:
P922_101
P922_102
...
Each directory, for instance P922_101 has following subdirectories:
140311_AH8MHGADXX 140401_AH8CU4ADXX
Each subdirectory, for instance 140311_AH8MHGADXX has the following files:
1_140311_AH8MH_P922_101_1.fastq.gz 1_140311_AH8MH_P922_101_2.fastq.gz
2_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_2.fastq.gz
And files in 140401_AH8CU4ADXX are:
1_140401_AH8CU_P922_101_1.fastq.gz 1_140401_AH8CU_P922_4001_2.fastq.gz
2_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_4001_2.fastq.gz
I want to do 'cat' for the files in the subdirectories in the following way:
cat 1_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_1.fastq.gz \
1_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_101_1.fastq.gz > P922_101_1.fastq.gz
which means that files ending with _1.fastq.gz should be concatenated into one file and files ending with _2.fastq.gz into another.
It should be run for all files in subdirectories in all directories. Could someone give a linux solution to do this?
Since they're compressed, you could use gzip -dc (decompress and write to stdout) to get uncompressed output:
find /somePath -type f -name "*.fastq.gz" -exec gzip -dc {} \; | \
tee -a /someOutFolder/out.txt
You can use find for this:
find /top/path -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > one_file
find /top/path -mindepth 2 -type f -name "*_2.fastq.gz" -exec cat {} \; > another_file
This looks for all files under /top/path whose names match *_1.fastq.gz (or *_2.fastq.gz) and concatenates them into the given output file. -mindepth 2 makes find skip anything directly inside /top/path, so only files at least one directory level below it are matched. find does not sort its results, so pipe the names through sort first if the concatenation order matters.
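Note that cat works for .gz files if you want compressed output, because concatenated gzip files form a valid multi-member gzip file. Use zcat instead of cat only if you want the output uncompressed.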
As you keep adding details in comments, let's see what else we can do:
Say you have the list of directories in a file directories_list, each line containing one:
while IFS= read -r directory
do
find "$directory" -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > "$directory/output"
done < directories_list
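Putting the pieces together for the layout in the question, here is a sketch that writes P922_xxx_1.fastq.gz and P922_xxx_2.fastq.gz for every P922_* directory in the current directory. It assumes that sample directories match P922_* and that file names contain no newlines. The miniature tree it builds first is made up for the demo. The file list is sorted so the lanes are concatenated in a predictable order. Plain cat is enough because concatenated gzip files form a valid multi-member gzip.

```shell
#!/bin/sh
# Sketch: build a made-up miniature of the layout in the question, then write
# P922_xxx_1.fastq.gz and P922_xxx_2.fastq.gz for every sample directory.
work=$(mktemp -d)
cd "$work" || exit 1
for run in 140311_AH8MHGADXX 140401_AH8CU4ADXX; do
    mkdir -p "P922_101/$run"
    for lane in 1 2; do
        for mate in 1 2; do
            echo "lane${lane} ${run} mate${mate}" |
                gzip > "P922_101/$run/${lane}_${run}_P922_101_${mate}.fastq.gz"
        done
    done
done

for sample in P922_*/; do
    sample=${sample%/}                 # "P922_101/" -> "P922_101"
    for mate in 1 2; do
        # sort fixes the order (run directory, then lane); find alone does not guarantee one.
        # Concatenating .gz files yields a valid multi-member gzip, so no recompression is needed.
        find "$sample" -mindepth 2 -type f -name "*_${mate}.fastq.gz" | sort |
            while IFS= read -r f; do cat "$f"; done > "${sample}_${mate}.fastq.gz"
    done
done
```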
