How to change encoding in many files? (Linux)

I tried this:
find . -exec iconv -f iso8859-2 -t utf-8 {} \;
but the output goes to the screen instead of back into each file. How can I do that?

Try this:
find . -type f -print -exec iconv -f iso8859-2 -t utf-8 -o {}.converted {} \; -exec mv {}.converted {} \;
It uses a temporary file with a '.converted' suffix (extension) and then moves it over the original name, so be careful if you already have files with '.converted' suffixes (you probably don't).
Also, this command is not safe for tricky filenames in every shell, so for more safety you should double-quote: "{}" instead of {} and "{}.converted" instead of {}.converted (as shown below).
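With the quoting applied, the command from above becomes:
find . -type f -print -exec iconv -f iso8859-2 -t utf-8 -o "{}.converted" "{}" \; -exec mv "{}.converted" "{}" \;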

Read about enconv.
If you need to convert to your current terminal encoding you can do it like that:
find . -exec enconv -L czech {} \;
Or exactly what you wanted:
find . -exec enconv -L czech -x utf8 {} \;
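enconv ships as part of the enca package; if it is missing, installing it on a Debian-based system would look something like this (package name assumed from Debian/Ubuntu; adjust for your distribution):
sudo apt-get install enca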

I found this method worked well for me, especially where I had multiple file encodings and multiple file extensions.
Create a vim script called script.vim:
set bomb
set fileencoding=utf-8
wq
Then run the script on the file extensions you wish to target:
find . -type f \( -iname "*.html" -o -iname "*.htm" -o -iname "*.php" -o -iname "*.css" -o -iname "*.less" -o -iname "*.js" \) -exec vim -S script.vim {} \;
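To spot-check the results afterwards, file's -i option prints each file's detected MIME type and charset (assuming GNU/Linux file(1)):
find . -type f \( -iname "*.html" -o -iname "*.php" \) -exec file -i {} \;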

No one has proposed a way to automatically detect the encoding and recode.
Here is an example that recodes all HTM/HTML files from the master branch of a Git repository to UTF-8.
git ls-tree master -r --name-only | grep htm | xargs -n1 -I{} bash -c 'recode "$(file -b --mime-encoding "$0")..utf-8" "$0"' {}
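The same detect-and-recode idea works outside a Git repository with plain find; a minimal sketch, assuming file and recode are installed (the filename is passed as $0 so unusual names can't break the quoting):
find . -type f \( -name '*.htm' -o -name '*.html' \) \
  -exec sh -c 'recode "$(file -b --mime-encoding "$0")..utf-8" "$0"' {} \;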

Related

Sed and grep in multiple files

I want to use "sed" and "grep" to search and replace in multiple files, excluding some directories.
I run this command:
$ grep -RnI --exclude-dir={node_modules,install,build} 'chaine1' /projets/ | sed -i 's/chaine1/chaine2/'
I get this message (sed's "no input files" error, shown here in a French locale):
sed: pas de fichier d'entrée
I also tried with these two commands:
$ grep -RnI --exclude-dir={node_modules,install,build} 'chaine1' . | xargs -0 sed -i 's/chaine2/chaine2/'
$ grep -RnI --exclude-dir={node_modules,install,build} 'chaine2' . -exec sed -i 's/chaine1/chaine2/g' {} \;
But it doesn't work!
Could you help me, please?
Thanks in advance.
You want find with -exec. Don't bother running grep; sed will only change lines containing your pattern anyway.
find \( -name node_modules -o -name install -o -name build \) -prune \
-o -type f -exec sed -i 's/chaine1/chaine2/' {} +
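To preview which files would be touched before the in-place edit, swap the sed for -print:
find \( -name node_modules -o -name install -o -name build \) -prune \
-o -type f -print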
First, the direct output of grep is not a list of file paths; each line looks like {file_path}:{line_no}:{content}. So the first thing to do is extract the file paths. We can do this with the cut command, or with grep's -l option.
# This will print {file_path}
$ echo {file_path}:{line_no}:{content} | cut -f 1 -d ":"
# This is a better solution: it prints each file only once, even if the
# grep pattern appears on many lines of the file.
$ grep -RlI --exclude-dir={node_modules,install,build} "chaine1" /projets/
Second, sed -i does not read from stdin. We can use xargs to read each file path from stdin and pass it to sed as an argument; you had already tried this.
The complete command looks like this:
$ grep -RlI --exclude-dir={node_modules,install,build} "chaine1" /projets/ | xargs -I{} sed -i 's/chaine1/chaine2/' {}
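If filenames may contain spaces, GNU grep's -Z (--null) option pairs with xargs -0 to pass NUL-terminated names safely (a sketch, assuming GNU grep and xargs):
$ grep -RlIZ --exclude-dir={node_modules,install,build} "chaine1" /projets/ | xargs -0 sed -i 's/chaine1/chaine2/'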
Edit: Thanks to @EdMorton's comment, I dug into find. My previous solution walks the files outside the excluded directories once with grep, and then processes the files containing the pattern a second time with sed. With find, however, we can filter files by path name first and let sed process each file only once.
My find solution is almost the same as @knittl's, but with a bug fixed. I also try to explain why it gives results similar to grep's. One gap remains: I still haven't found a way to make find skip binary files like grep's -I option does (a possible workaround is sketched after the explanation below).
$ find \( \( -name node_modules -o -name install -o -name build \) -prune -type f \
-o -type f \) -exec echo {} +
or
find \( \( -name node_modules -o -name install -o -name build \) -prune \
-o -type f \) -type f -exec echo {} +
\( -name pat1 -o -name pat2 \) gives paths matching pat1 or pat2 (both files and directories), where -o means logical OR. -prune ignores a directory and the files under it. Combined, they achieve a similar effect to grep's --exclude-dir.
-type f gives paths of regular files.
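As for the binary-file gap mentioned above: one possible workaround (a sketch, assuming GNU grep and xargs) is to let find do the pruning and an intermediate grep -Il pass do the binary detection:
$ find \( -name node_modules -o -name install -o -name build \) -prune \
-o -type f -exec grep -IlZ 'chaine1' {} + | xargs -0 sed -i 's/chaine1/chaine2/'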

Linux find: re-encode all files into a subdirectory

I'm trying to reencode all files in a directory and put the results in a subdirectory.
I'm using
find . -type f -name '*.txt' -execdir iconv -f utf-16 -t utf-8 {} > reencoded/{} \;
But the filename does not replace the second occurrence of '{}'; the output ends up in a literal file named reencoded/{} instead.
The redirection is performed by your shell before find ever runs, so find never gets a chance to substitute the second {}. Wrap the command inside a call to sh -c, which can then reference the filename as $0:
find . -type f -name '*.txt' -execdir sh -c 'iconv -f utf-16 -t utf-8 "$0" > reencoded/"$0"' {} \;
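Note that -execdir runs the command from each file's own directory, so the reencoded subdirectory must exist in every directory containing a .txt file; a variant that creates it on the fly:
find . -type f -name '*.txt' -execdir sh -c 'mkdir -p reencoded && iconv -f utf-16 -t utf-8 "$0" > reencoded/"$0"' {} \;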

How to write a Unix command or script to remove files of the same type in all sub-folders under the current directory?

Is there a way to remove all temp files and executables under one folder AND its sub-folders?
All that I can think of is:
$ rm -rf *.~
but this removes only temp files in the current directory; it does NOT remove temp files in any SUB-folders, and it doesn't remove executables either.
I know there are similar questions which get very well answered, like this one:
find specific file type from folder and its sub folder
but that is Java code; I only need a Unix command or a short script to do this.
Any help please?
Thanks a lot!
Perl from the command line; this should delete a file if its name ends with ~ or if it is executable:
perl -MFile::Find -e 'find(sub{ unlink if -f and (/~\z/ or (stat)[2] & 0111) }, ".")'
You can achieve the result with find:
find /path/to/directory \( -name '*.~' -o \( -perm /111 -a -type f \) \) -exec rm -f {} +
This will execute rm -f <path> for any <path> under (and including) /path/to/directory which:
matches the glob expression *.~
or which has an executable bit set (be it owner, group or world)
The above applies to the GNU version of find.
A more portable version is:
find /path/to/directory \( -name '*.~' -o \( \( -perm -01 -o -perm -010 -o -perm -0100 \) \
-a -type f \) \) -exec rm -f {} +
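Since this is destructive, it's worth a dry run first: replace the -exec rm -f {} + with -print to see what would be deleted:
find /path/to/directory \( -name '*.~' -o \( -perm /111 -a -type f \) \) -print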
find . -name "*~" -exec rm {} \;
or whatever pattern is needed to match the tmp files.
If you want to use Perl to do it, use a specific module like File::Remove
This should do the job:
find -type f -name "*~" -print0 | xargs -r -0 rm

How to pipe the results of 'find' to mv in Linux

How do I pipe the results of a 'find' (in Linux) to be moved to a different directory? This is what I have so far.
find ./ -name '*article*' | mv ../backup
but it's not right yet (I get a "missing file argument" error, because I didn't specify a file; I was trying to get it from the pipe).
find ./ -name '*article*' -exec mv {} ../backup \;
OR
find ./ -name '*article*' | xargs -I '{}' mv {} ../backup
xargs is commonly used for this, and mv on Linux has a -t option to facilitate that.
find ./ -name '*article*' | xargs mv -t ../backup
If your find supports -exec ... \+, you can equivalently do
find ./ -name '*article*' -exec mv -t ../backup {} \+
The -t option is a GNU extension, so it is not portable to systems which do not have GNU coreutils (though every proper Linux I have seen has that, with the possible exception of Busybox). For complete POSIX portability, it's of course possible to roll your own replacement, maybe something like
find ./ -name '*article*' -exec sh -c 'mv "$@" "$0"' ../backup {} \+
where we shamelessly abuse the convenient fact that the first argument after sh -c 'commands' ends up as the "script name" parameter in $0 so that we don't even need to shift it.
See also https://mywiki.wooledge.org/BashFAQ/020
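If that trick feels too clever, a more conventional spelling passes a dummy value for $0 and takes the destination from $1 (a functionally equivalent sketch):
find ./ -name '*article*' -exec sh -c 'dest=$1; shift; mv "$@" "$dest"' sh ../backup {} \+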
I found this really useful when I had thousands of files in one folder:
ls -U | head -10000 | egrep '\.png$' | xargs -I '{}' mv {} ./png
This moves all PNGs among the first 10000 directory entries into the png subfolder.
mv $(find . -name '*article*') ../backup
(Note that this breaks if any matched path contains whitespace.)
Here are a few variations that additionally filter on modification date:
find . -type f -newermt "2019-01-01" ! -newermt "2019-05-01" \
-exec mv {} path \;
or
find path -type f -newermt "2019-01-01" ! -newermt "2019-05-01" \
-exec mv {} path \;
or
find /Directory/filebox/ -type f -newermt "2019-01-01" \
! -newermt "2019-05-01" -exec mv {} ../filemove/ \;
The backslash + newline is just for legibility; you can equivalently use a single long line.
xargs is your buddy here (when you have multiple actions to take)!
Using it the way shown here also gives you fine control.
find ./ -name '*article*' | xargs -I{} sh -c "mv {} <path/to/target/directory>"
Explanation:
-I{}
Take one line of input at a time and substitute it for {} in the command that follows.
sh -c
The shell command to execute, given the substituted line as described above.
"mv {} <path/to/target/directory>"
The move command takes two arguments:
1) the line from the input, i.e. {}, whose value xargs substitutes automatically
2) the target path for the move command, as specified
Note: the "Double Quotes" are specified to allow any number of spaces or arguments in the shell command which receives arguments from xargs.

How to tar certain file types in all subdirectories?

I want to tar all .php and .html files in a directory and its subdirectories. If I use
tar -cf my_archive *
it tars all the files, which I don't want. If I use
tar -cf my_archive *.php *.html
it ignores subdirectories. How can I make it tar recursively but include only two types of files?
find ./someDir -name "*.php" -o -name "*.html" | tar -cf my_archive -T -
If you're using bash 4.0 or newer, you can exploit shopt -s globstar to make short work of this:
shopt -s globstar; tar -czvf deploy.tar.gz **/Alice*.yml **/Bob*.json
this will add all .yml files that start with Alice from any sub-directory, and all .json files that start with Bob from any sub-directory.
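Applied to the question's file types, and with nullglob set so that a pattern matching nothing expands to an empty list instead of a literal string (a sketch):
shopt -s globstar nullglob
tar -czvf my_archive.tar.gz **/*.php **/*.html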
One method is:
tar -cf my_archive.tar $( find -name "*.php" -or -name "*.html" )
There are some caveats with this method however:
It will fail if there are any files or directories with spaces in them, and
it will fail if there are so many files that the maximum command-line length is exceeded.
A workaround for both is to write the output of the find command to a file, and then use tar's "-T, --files-from FILE" option.
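That workaround might look like this with GNU tar (for filenames containing newlines, use find -print0 together with tar's --null option):
find . \( -name '*.php' -o -name '*.html' \) > files.txt
tar -cf my_archive.tar -T files.txt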
This will handle paths with spaces (note the parentheses, so that -exec applies to both -name patterns):
find ./ -type f \( -name "*.php" -o -name "*.html" \) -exec tar uvf myarchives.tar {} +
If you want to produce a zipped tar file (.tgz) and want to avoid problems with spaces in filenames:
find . \( -name \*.php -o -name \*.html \) -print0 | xargs -0 tar -cvzf my_archive.tgz
The -print0 “primary” of find separates output filenames using the NUL (\0) byte, thus playing well with the -0 option of xargs, which appends its (NUL-separated, in this case) input as arguments to the command it runs.
The parentheses around the two -name primaries are needed, because otherwise the -print0 would only output the filenames of the second -name (there is no implied printing if -print or -print0 is present, and these only have an effect if they are evaluated).
If you need to skip some filenames or directories (e.g., the node_modules directory if you work with Node.js), prepend one or more -prune primaries like this:
find . -name skipThisName -prune -o \
-name skipThisOtherName -prune -o \
\( -name \*.php -o -name \*.html \) -print0 | xargs -0 tar -cvzf my_archive.tgz
Put them in a file:
find . \( -name "*.php" -o -name "*.html" \) -print > files.txt
Then use the file as input to tar; use -I or -T depending on the version of tar you have.
Use h to dereference (follow) symbolic links:
tar cfh my.tar -I files.txt
Easy with zsh:
tar cvzf foo.tar.gz **/*.(php|html)
find ./ -type f \( -name "*.php" -o -name "*.html" \) -printf '%P\n' | xargs tar -I 'pigz -9' -cf target.tgz
for multicore compression, or with just one core:
find ./ -type f \( -name "*.php" -o -name "*.html" \) -printf '%P\n' | xargs tar -czf target.tgz
