How to tar certain file types in all subdirectories? - linux

I want to tar all .php and .html files in a directory and its subdirectories. If I use
tar -cf my_archive *
it tars all the files, which I don't want. If I use
tar -cf my_archive *.php *.html
it ignores subdirectories. How can I make it tar recursively but include only two types of files?

find ./someDir -name "*.php" -o -name "*.html" | tar -cf my_archive -T -

If you're using bash 4.0 or later, you can use shopt -s globstar to make short work of this:
shopt -s globstar; tar -czvf deploy.tar.gz **/Alice*.yml **/Bob*.json
This adds every .yml file whose name starts with Alice and every .json file whose name starts with Bob, from the current directory or any subdirectory.

One method is:
tar -cf my_archive.tar $( find -name "*.php" -or -name "*.html" )
There are some caveats with this method, however:
It will fail if any file or directory name contains spaces, and
it will fail if there are so many files that the maximum command-line length is exceeded.
A workaround to these could be to output the contents of the find command into a file, and then use the "-T, --files-from FILE" option to tar.
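The workaround described above could look like the following sketch. It assumes GNU find and GNU tar (for -print0 and --null); the temporary directory, the file names, and the list file name files.list are made up for illustration.

```shell
# Hypothetical sandbox so the example is self-contained.
workdir=$(mktemp -d)
mkdir -p "$workdir/site/sub dir"
: > "$workdir/site/index.html"
: > "$workdir/site/sub dir/page one.php"
: > "$workdir/site/notes.txt"

cd "$workdir/site" || exit 1

# Write the matching paths to a list file kept outside the searched tree.
# NUL separators let names with spaces (or even newlines) survive.
find . -type f \( -name '*.php' -o -name '*.html' \) -print0 > "$workdir/files.list"

# --null tells GNU tar that the list read by -T is NUL-separated.
tar -cf "$workdir/my_archive.tar" --null -T "$workdir/files.list"

# Show what ended up in the archive.
archived=$(tar -tf "$workdir/my_archive.tar" | sort)
printf '%s\n' "$archived"
```

Because tar reads the names from a file rather than from its argument list, neither spaces nor the command-line length limit come into play.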

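This will handle paths with spaces. The parentheses are needed so that -type f and -exec apply to both patterns rather than only to the last one:
find ./ -type f \( -name "*.php" -o -name "*.html" \) -exec tar uvf myarchives.tar {} +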

If you want to produce a zipped tar file (.tgz) and want to avoid problems with spaces in filenames:
find . \( -name \*.php -o -name \*.html \) -print0 | xargs -0 tar -cvzf my_archive.tgz
The -print0 “primary” of find separates output filenames using the NUL (\0) byte, which pairs with the -0 option of xargs: xargs splits its input on NUL bytes and appends the resulting filenames as arguments to the command that follows it.
The parentheses around the two -name primaries are needed, because otherwise the -print0 would only output the filenames of the second -name (there is no implied printing if -print or -print0 is present, and these only have an effect if they are evaluated).
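The effect of the parentheses can be checked with a small experiment. This is only a sketch: the temporary directory and the two file names are invented, and -print0 assumes GNU or BSD find.

```shell
# Hypothetical sandbox with one file of each type.
workdir=$(mktemp -d)
: > "$workdir/a.php"
: > "$workdir/b.html"

# Without parentheses, the implicit "and" binds tighter than -o,
# so -print0 belongs only to the second -name: a.php is never printed.
without=$(find "$workdir" -name '*.php' -o -name '*.html' -print0 | tr '\0' '\n' | grep -c .)

# With parentheses, -print0 runs when either -name matches.
with=$(find "$workdir" \( -name '*.php' -o -name '*.html' \) -print0 | tr '\0' '\n' | grep -c .)

echo "without parentheses: $without file(s); with parentheses: $with file(s)"
```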
If you need to skip some filenames or directories (e.g., the node_modules directory if you work with Node.js), prepend one or more -prune primaries like this:
find . -name skipThisName -prune -o \
-name skipThisOtherName -prune -o \
\( -name \*.php -o -name \*.html \) -print0 | xargs -0 tar -cvzf my_archive.tgz
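To see what -prune does before piping anything into tar, the same expression can be run with a plain -print. A minimal sketch, assuming a made-up tree with a node_modules directory:

```shell
# Hypothetical tree: one wanted file, one file inside a directory to skip.
workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/node_modules/pkg"
: > "$workdir/src/a.php"
: > "$workdir/node_modules/pkg/b.php"

cd "$workdir" || exit 1

# -prune stops find from descending into node_modules. The explicit -print
# on the right-hand side keeps the pruned directory itself out of the output.
found=$(find . -name node_modules -prune -o \( -name '*.php' -o -name '*.html' \) -print)
printf '%s\n' "$found"
```

Only ./src/a.php is listed; the file under node_modules is never visited.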

Put them in a file
find . \( -name "*.php" -o -name "*.html" \) -print > files.txt
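Then use the file as input to tar. The option is -T in GNU tar and bsdtar and -I in some other implementations such as Solaris tar (in GNU tar, -I selects a compression program instead), so pick the one that matches the version of tar you use.
Use h to follow symbolic links and archive the files they point to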
tar cfh my.tar -I files.txt

Easy with zsh:
tar cvzf foo.tar.gz **/*.(php|html)

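find ./ -type f \( -name "*.php" -o -name "*.html" \) -printf '%P\n' | xargs tar -I 'pigz -9' -cf target.tgz
compresses on multiple cores with pigz (the parentheses make -type f and -printf apply to both patterns). To compress on a single core with gzip:
find ./ -type f \( -name "*.php" -o -name "*.html" \) -printf '%P\n' | xargs tar -czf target.tgz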

Related

cscope for files which are symlinks

I have a source directory with several files. Some of them are symlinks to other files.
I created a cscope.files file, but when I run cscope it complains about the files that are symlinks:
cscope: cannot find file /home/bla/source/file.cc
It doesn't feel ideal, but maybe the right approach is to change the find command so that it writes the destination of each symlink instead?
Currently I'm using:
# Write only the files which are NOT symlinks
find `pwd` \( \( -iname "*.c" -o -iname "*.cc" -o -iname "*.h" \) -and \( -not -type l \) \) -print > cscope.files
# Add the target of the symlink for all files matching the right extension, and are symlinks
find `pwd` \( \( -iname "*.c" -o -iname "*.cc" -o -iname "*.h" \) -and -type l \) -printf "%l\n" >> cscope.files
But this seems like a terrible solution. Still looking for a better one.
You can have find print the real path of every match in the folder you search:
find -L [your searched folder] -name [your searched pattern] -exec realpath {} \; >> cscope.files
For example, to add my development folder and the Linux kernel headers to cscope.files, I use these commands:
find -L `pwd` -iname "*.c" -o -iname "*.h" > cscope.files
find -L /usr/src/linux-headers-3.19.0-15-generic/ -iname '*.h' -exec realpath {} \; >> cscope.files
I hope the answer can help you.
For example if you want to give / as your path for cscope, and want cscope to search files with extensions .c/.h/.x/.s/.S you can give the find command as:
find / -type f -name "*.[chxsS]" -print -exec readlink -f {} \;> cscope.files
This will include regular files, including targets of symbolic links.
I just do the following to avoid symbolic links and to get absolute paths in cscope.files. With absolute paths you can search from any directory in your sandbox when cscope is integrated with the vim editor
find /"path-to-your-sandbox" -name .git -prune -o -name "*.[ch]" -exec readlink -f {} \; > cscope.files
Note: because -print is omitted, find does not write the symbolic link's own path to cscope.files, only the resolved path.
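A quick way to convince yourself that only resolved paths are written is the sketch below. It assumes a readlink that supports -f (GNU coreutils does); the directory and file names are invented, and the sort -u step is an addition that drops the duplicate produced when both a symlink and its target match.

```shell
# Hypothetical sandbox: one real source file and one symlink pointing at it.
workdir=$(mktemp -d)
: > "$workdir/real.c"
ln -s real.c "$workdir/link.c"

# Both names match *.c. readlink -f prints the canonical target for each,
# so the symlink and the real file collapse to the same absolute path.
find "$workdir" -name '*.c' -exec readlink -f {} \; | sort -u > "$workdir/cscope.files"
cat "$workdir/cscope.files"
```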
Better in a bash script:
#!/bin/bash
#
# find_cscope_files.sh
extension_list=(c cpp cxx cc h hpp hxx hh)
for x in "${extension_list[@]}"; do
find . -name "*.$x" -print -exec readlink -f {} \;
done
For reference, this is what I'm currently using:
find "$(pwd)" \( -name "*.[chCS]" -o -name "*.[ch][ci]" -o -name "*.[ch]pp" -o -name "*.[ch]++" -o -name "*.[ch]xx" \) -not \( -ipath "*unittest*" -or -ipath "*regress*" \) \( \( -type l -xtype f -exec readlink -f {} \; \) -o \( -type f -print \) \) >cscope.files
cscope -q -R -b -i cscope.files

How to write a unix command or script to remove files of the same type in all sub-folders under current directory?

Is there a way to remove all temp files and executables under one folder AND its sub-folders?
All that I can think of is:
$rm -rf *.~
but this removes only the temp files in the current directory. It does NOT remove temp files in sub-folders, and it doesn't remove any executables.
I know there are similar questions which get very well answered, like this one:
find specific file type from folder and its sub folder
but that is Java code; I only need a Unix command or a short script to do this.
Any help please?
Thanks a lot!
Perl from the command line; this deletes a file if its name ends with ~ or if it is executable:
perl -MFile::Find -e 'find(sub{ unlink if -f and (/~\z/ or (stat)[2] & 0111) }, ".")'
You can achieve the result with find:
find /path/to/directory \( -name '*.~' -o \( -perm /111 -a -type f \) \) -exec rm -f {} +
This will execute rm -f <path> for any <path> under /path/to/directory which:
matches the glob expression *.~
or which is a regular file with an executable bit set (be it owner, group or world)
The above applies to the GNU version of find.
A more portable version is:
find /path/to/directory \( -name '*.~' -o \( \( -perm -01 -o -perm -010 -o -perm -0100 \) \
-a -type f \) \) -exec rm -f {} +
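Since rm -f is unforgiving, it can help to run the same tests with -print first and delete only once the list looks right. The following is a sketch under assumptions: the sandbox and its file names are invented, and -type f is applied to both branches here so that only regular files are ever removed.

```shell
# Hypothetical sandbox: one file to keep, one temp file, one executable.
workdir=$(mktemp -d)
: > "$workdir/keep.txt"
: > "$workdir/notes.~"
printf '#!/bin/sh\n' > "$workdir/tool"
chmod +x "$workdir/tool"

# Dry run: identical tests, but -print instead of rm.
find "$workdir" -type f \( -name '*.~' -o -perm -0100 -o -perm -0010 -o -perm -0001 \) -print

# The real thing, once the dry run shows only what should go.
find "$workdir" -type f \( -name '*.~' -o -perm -0100 -o -perm -0010 -o -perm -0001 \) -exec rm -f {} +

remaining=$(ls "$workdir")
echo "$remaining"
```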
find . -name "*~" -exec rm {} \;
or whatever pattern is needed to match the tmp files.
If you want to use Perl to do it, use a specific module like File::Remove
This should do the job
find -type f -name "*~" -print0 | xargs -r -0 rm

Error exit delayed from previous error

I am trying to find the files in a directory and then tar and gzip them.
The script:
find /home -type f -name "*.log" -newer /home/path/start_date \
! -newer /home/path/end_date | xargs -0 tar -cvzf files.tar.gz
The tar file is still created, but I am getting some errors:
tar:/home/path/filename.log\n Cannot stat : No such file or directory
tar:Error exit delayed from previous errors.
Can someone explain what these errors mean? Thanks.
You forgot -print0.
-print0
True; print the full file name on the standard output, followed by a
null character (instead of the newline character that -print uses).
This allows file names that contain newlines or other types of white
space to be correctly interpreted by programs that process the
find output. This option corresponds to the -0 option of xargs.
Also quote your exclamation mark to prevent history expansion just in case:
find /home -type f -name "*.log" -newer /home/path/start_date \! -newer /home/path/end_date -print0 | xargs -0 tar -cvzf files.tar.gz
It's not POSIX but if you can use -not, use -not instead:
... -not -newer ...
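The mismatch behind the error can be reproduced without find or tar. In this sketch the two paths are made up, and the inner sh -c simply reports how many arguments xargs handed over.

```shell
# Newline-separated input (what plain -print produces) read with -0:
# no NUL byte ever arrives, so the whole stream becomes ONE argument
# with the newlines embedded in it.
args_newline=$(printf '/home/a.log\n/home/b.log\n' | xargs -0 sh -c 'echo "$#"' sh)

# NUL-separated input (what -print0 produces): two separate arguments.
args_nul=$(printf '/home/a.log\0/home/b.log\0' | xargs -0 sh -c 'echo "$#"' sh)

echo "newline-separated: $args_newline argument(s); NUL-separated: $args_nul argument(s)"
```

That single argument with embedded newlines is the "filename" tar cannot stat, which is why the \n shows up in the error message.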

How can I make find pass file names to exec without the leading directory name?

Someone created directories with names like source.c. I am doing a find over all the directories in a tree. I do want find to search in the source.c directory, but I do not want source.c to be passed to the grep I am doing on what is found.
How can I make find not pass directory names to grep? Here is what my command line looks like:
find sources* \( -name "*.h" -o -name "*.cpp" -o -name "*.c" \) -exec grep -Hi -e "ThingToFind" {} \;
Add -a -type f to your find command. This will force find to only output files, not directories. (It will still search directories):
find sources* \( -name "*.h" -o -name "*.cpp" -o -name "*.c" \) -a -type f -exec grep -Hi -e "ThingToFind" {} \;
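A small experiment shows the difference. It is only a sketch: the tree below, with a directory named source.c, is invented to mimic the situation in the question.

```shell
# Hypothetical tree: a directory whose name looks like a C file,
# containing a real C file.
workdir=$(mktemp -d)
mkdir -p "$workdir/sources/source.c"
echo 'ThingToFind' > "$workdir/sources/source.c/real.c"

cd "$workdir" || exit 1

# Without -type f the directory source.c matches the pattern too.
without_type=$(find sources \( -name '*.h' -o -name '*.cpp' -o -name '*.c' \) -print | grep -c .)

# With -type f only the regular file is reported, but find still
# descends into the directory to reach it.
with_type=$(find sources \( -name '*.h' -o -name '*.cpp' -o -name '*.c' \) -type f -print | grep -c .)

echo "without -type f: $without_type path(s); with -type f: $with_type path(s)"
```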

How to change encoding in many files?

I tried this:
find . -exec iconv -f iso8859-2 -t utf-8 {} \;
but the output goes to the screen, not back into the same file. How can I do that?
Try this:
find . -type f -print -exec iconv -f iso8859-2 -t utf-8 -o {}.converted {} \; -exec mv {}.converted {} \;
It writes the converted text to a temporary file with a '.converted' suffix and then moves that file over the original, so be careful if you already have files with a '.converted' suffix (I don't think you have).
Because find passes each path to iconv and mv as a single argument without going through a shell, filenames containing spaces are handled as is; writing "{}" instead of {} is harmless and only matters in shells that treat braces specially.
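A variant of the same idea that replaces the original only when iconv succeeds could look like this. It is a sketch under assumptions: the sandbox, the file name, and the -name '*.txt' filter are invented, and the encoding names are spelled the way glibc iconv accepts them.

```shell
# Hypothetical sandbox. Byte 0xB1 (octal 261) is "ą" in ISO-8859-2.
# The file name contains a space on purpose.
workdir=$(mktemp -d)
printf '\261\n' > "$workdir/old file.txt"

find "$workdir" -type f -name '*.txt' -exec sh -c '
  for f in "$@"; do
    # Convert into a temporary file; replace the original only on success.
    if iconv -f ISO-8859-2 -t UTF-8 "$f" > "$f.converted"; then
      mv "$f.converted" "$f"
    else
      rm -f "$f.converted"
    fi
  done
' sh {} +

# Hex dump of the result: c4 85 is the UTF-8 encoding of "ą", 0a the newline.
hex=$(od -An -tx1 "$workdir/old file.txt" | tr -d ' \n')
echo "$hex"
```

Note that iconv does not detect the source encoding, so running the conversion a second time on files that are already UTF-8 would double-encode them.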
Read about enconv.
If you need to convert to your current terminal encoding you can do it like this:
find . -exec enconv -L czech {} \;
Or exactly what you wanted:
find . -exec enconv -L czech -x utf8 {} \;
I found this method worked well for me, especially where I had multiple file encodings and multiple file extensions.
Create a vim script called script.vim:
set bomb
set fileencoding=utf-8
wq
Then run the script on the file extensions you wish to target:
find . -type f \( -iname "*.html" -o -iname "*.htm" -o -iname "*.php" -o -iname "*.css" -o -iname "*.less" -o -iname "*.js" \) -exec vim -S script.vim {} \;
No one has proposed a way to detect the encoding automatically and recode.
Here is an example that recodes to UTF-8 all HTM/HTML files on the master branch of a Git repository.
git ls-tree master -r --name-only | grep htm | xargs -n1 -I{} bash -c 'recode "$(file -b --mime-encoding {})..utf-8" {}'
