Only list files; iconv and directories? - linux

I want to convert the encoding of some CSV files with iconv. It has to be a script, so I am working with while ... do ... done. The script lists every item in a specific directory and converts them to another encoding (UTF-8).
Currently, my script lists EVERY item, including directories... So here are my questions:
Does iconv have a problem with directories, or does it ignore them?
And if there is a problem, how can I list/search for files only?
I tried the approach from How to list only files in Bash?, but it puts a ./ at the beginning of every item, which is kind of annoying (and my program doesn't like it either).
Another possibility is ls -p | grep -v /, but wouldn't this also affect files with / in the name?
I hope you can help me. Thank you.
Here is the code:
for item in $(ls directory/); do
    FileName=$item
    iconv -f "windows-1252" -t "UTF-8" "$FileName" -o "$FileName"
done
Yes, I know, the input and output file cannot be the same^^

Use find directly:
find . -maxdepth 1 -type f -exec bash -c 'iconv -f "windows-1252" -t "UTF-8" "$1" > "$1.converted" && mv "$1.converted" "$1"' -- {} \;
find . -maxdepth 1 -type f finds all files in the working directory
-exec ... executes a command on each such file (including correct handling of e.g. spaces or newlines in the filename)
bash -c '...' executes the command in '...' in a subshell (easier to do the subsequent steps, involving multiple expansions of the filename, this way)
-- terminates bash's option processing and fills the $0 slot, so everything find supplies after it becomes a positional argument ($1, $2, ...).
{} is replaced by find with the name of the file found
$1 in the bash command expands to the first (and only) argument, i.e. the filename that find substituted for {} (see above)
\; tells find where the -exec'ed command ends.
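For completeness: if you would rather keep a plain loop, a glob plus a -f test also skips directories. A minimal sketch along the same lines (directory/ stands in for your folder):
for f in directory/*; do
    [ -f "$f" ] || continue    # skip directories and anything else that is not a regular file
    iconv -f "windows-1252" -t "UTF-8" "$f" > "$f.converted" && mv "$f.converted" "$f"
done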

Building upon the existing question that you referenced: why don't you just remove the first two characters, i.e. the leading ./?
find . -maxdepth 1 -type f | cut -c 3-
Edit: I agree with @DevSolar about the word-splitting problem in the for loop. While I think his solution is better for this problem, I just want to give an alternative way to get around that issue.
OLD_IFS=$IFS
IFS=$'\n'
for item in $(find . -maxdepth 1 -type f | cut -c 3-); do
    FileName=$item
    iconv -f "windows-1252" -t "UTF-8" "$FileName" -o "$FileName"
done
IFS=$OLD_IFS
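As an aside, a null-delimited read sidesteps the IFS juggling entirely and also survives newlines in filenames. A sketch, assuming bash and a find that supports -print0:
while IFS= read -r -d '' item; do
    iconv -f "windows-1252" -t "UTF-8" "$item" > "$item.converted" && mv "$item.converted" "$item"
done < <(find . -maxdepth 1 -type f -print0)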

Related

How to read out a file line by line and for every line do a search with find and copy the search result to destination?

I hope you can help me with the following problem:
The Situation
I need to find files in various folders and copy them to another folder. The files and folders can contain whitespace and umlauts.
The filenames contain an ID and a string like:
"2022-01-11-02 super important file"
The filenames I need to find are collected in a text file named ids.txt. This file contains only the IDs, not the whole filename as a string.
What I want to achieve:
I want to read out ids.txt line by line.
For every line in ids.txt I want to do a find search and cp the result to the destination.
So far I tried:
for n in $(cat ids.txt); do find /home/alex/testzone/ -name "$n" -exec cp {} /home/alex/testzone/output \; ; done
while read -r ids; do find /home/alex/testzone -name "$ids" -exec cp {} /home/alex/testzone/output \; ; done < ids.txt
The output folder remains empty. Leaving out -exec also gives no (search) results.
I suspect that -name "$ids" is the root cause here. My files contain the ID plus a string, so I should search for names containing the ID followed by a variable string (star *).
As the argument for -name I also tried "$ids *", "$ids"" *", and so on, with no luck.
Is there an argument that I can use in conjunction with find instead of using the star in the -name argument?
Do you have any solution for me to automate this process in a bash script that reads the ids.txt file, searches for the filenames, and copies them over to the specified folder?
In the end I would like to create a bash script that takes ids.txt and the search-folder and the output-folder as arguments like:
my-id-search.sh /home/alex/testzone/ids.txt /home/alex/testzone/ /home/alex/testzone/output
EDIT:
This is some example content of the ids.txt file where only ids are listed (not the whole filename):
2022-01-11-01
2022-01-11-02
2020-12-01-62
EDIT II:
Going on with the solution from tripleee:
#!/bin/bash
grep . $1 | while read -r id; do
    echo "The search term is: $id"; echo
    find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/ausgabe \;
done
In case my ids.txt file contains empty lines, the -name "$id*" becomes -name *, which in turn matches and copies all files.
Trying to prevent empty lines from being read does not seem to work. They should be filtered out by the grep . $1 | expression. What am I doing wrong?
If your destination folder is always the same, the quickest and absolutely most elegant solution is to run a single find command to look for all of the files.
sed 's/.*/-o\n-name\n&*/' ids.txt |
xargs sh -c 'find /home/alex/testzone \( -false "$@" \) -exec cp {} /home/alex/testzone/output \;' _
The -false predicate is a bit of a hack to allow the list of actual predicates to start with -o (as in "or").
This could fail if ids.txt is too large to fit into a single xargs invocation, or if your sed does not understand \n to mean a literal newline.
(Here's a fix for the latter case:
xargs printf '-o\n-name\n%s*\n' <ids.txt |
...
Still, the inherent problem with using xargs with find like this is that xargs could split the list between -o and -name, or between -name and the actual file name pattern, if it needs to run more than one find command to process all the arguments.
A slightly hackish solution to that is to ensure that each pair is a single string, and then separately split them back out again:
xargs printf '-o_-name_%s*\n' <ids.txt |
xargs bash -c 'arr=("$@"); find /home/alex/testzone \( -false ${arr[@]/-o_-name_/-o -name } \) -exec cp {} "$0" \;' /home/alex/testzone/ausgabe
where we temporarily hold the arguments in an array in which each name pattern and its flags form a single item, and then split the flags back out into separate tokens. This still won't work correctly if the file names you operate on contain literal shell metacharacters like * etc.)
A more mundane solution fixes your while read attempt by adding the missing wildcard in the -name argument. (I also took the liberty of renaming the variable; since read only reads one line at a time, the variable name should be singular.)
while read -r id; do
    find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/output \;
done < ids.txt
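To arrive at the my-id-search.sh invocation you described, the same loop can be wrapped with positional parameters, plus an explicit guard against the empty-line problem from your EDIT II. A sketch (the argument order follows your example; error checking omitted):
#!/bin/bash
# usage: my-id-search.sh ids.txt /search/folder /output/folder
ids_file=$1
search_dir=$2
out_dir=$3
while read -r id; do
    [ -n "$id" ] || continue    # skip empty lines in ids.txt
    find "$search_dir" -name "$id*" -exec cp {} "$out_dir" \;
done < "$ids_file"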
Please try the following bash script copier.sh
#!/bin/bash
IFS=$'\n'        # make newlines the only separator
set -f           # disable globbing
file="files.txt" # name of file containing filenames
finish="finish"  # destination directory
while read -r n; do
    du -a | awk '{for(i=2;i<=NF;++i)printf $i" " ; print " "}' | grep "$n" | sed 's/ *$//g' | xargs -I '{}' cp '{}' "$finish"
done < "$file"
which recursively copies all the files named in files.txt from . and its subdirectories to ./finish.
This new version works even if there are spaces in the directory names or file names.
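For comparison, the du/awk/grep pipeline can also be replaced by letting find do the matching itself. A sketch using the same $file and $finish variables; note that -name "*$n*" mirrors the substring matching of the grep above, which is an assumption about the intended behavior:
while IFS= read -r n; do
    [ -n "$n" ] || continue    # skip empty lines
    find . -name "*$n*" -exec cp {} "$finish" \;
done < "$file"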

Passing filename as variable from find's exec into a second exec command

From reading this stackoverflow answer I was able to remove the file extension from the files using find:
find . -name "S4*" -execdir basename {} .fastq.gz ';'
returned:
S9_S34_R1_001
S9_S34_R2_001
I'm making a batch script where I want to extract the filename with the above prefix to pass as arguments into a program. At the moment I'm doing this with a loop, but I am wondering if it can be achieved using find.
for i in $(ls | grep 'S9_S34*' | cut -d '.' -f 1); do echo "$i"_trim.log "$i"_R1_001.fastq.gz "$i"_R2_001.fastq.gz; done >> trim_script.sh
Is it possible to do something as follows:
find . -name "S4*" -execdir basename {} .fastq.gz ';' | echo {}_trim.log {}_R1_001.fastq.gz {}_R2_001.fastq.gz {}\ ; >> trim_script.sh
You don't need basename at all, or -exec, if all you're doing is generating a series of strings that contain your files' basenames within them; the -printf action included in GNU find can do all that for you, as it provides a %P directive that inserts the name of each file with the starting-point directory stripped off:
find . -name "S4*" \
-printf '%P_trim.log %P_R1_001.fastq.gz %P_R2_001.fastq.gz %P\n' \
>trim_script.sh
That said, be sure you only do this if you trust your filenames. If you're truly running the result as a script, there are serious security concerns if someone could create a S4$(rm -rf ~).txt file, or something with a similarly malicious name.
What if you don't trust your filenames, or don't have the GNU version of find? Then consider making find pass them into a shell (like bash or ksh) that supports the %q extension, to generate a safely-escaped version of those names (note that you should run the script with the same interpreter you used for this escaping):
find . -name "S4*" -exec bash -c '
for file do # iterates over "$@", so processes each file in turn
file=${file##*/} # get the basename
printf "%q_trim.log %q_R1_001.fastq.gz %q_R2_001.fastq.gz %q\n" \
"$file" "$file" "$file" "$file"
done
' _ {} + >trim_script.sh
Using -exec ... {} + invokes the smallest possible number of subprocesses -- not one per file found, but instead one per batch of filenames (using the largest possible batch that can fit on a command line).
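To see what the %q escaping buys you, try it on the malicious name from above; the printed form can be pasted back into a script and will be read as a literal filename:
# %q re-quotes its argument so the shell reads it back literally
printf '%q\n' 'S4$(rm -rf ~).txt'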

Embedding a bash command inside the mv command

I have a directory that contains a list of files having the following format:
240-timestamp1.ts
240-timestamp2.ts
...
360-timestamp1.ts
360-timestamp2.ts
Now, I want to implement a bash command which matches the files that start with '240' and renames them so that instead of '240-timestampX.ts' the files look like '240-human-readable-timestampX.ts'.
I have tried the following:
find . -maxdepth 1 -mmin +5 -type f -name "240*"
-exec mv $0 {$0/240-(and here I want to insert either 'stat -c %y filename' or 'date -d @timestampX')} '{}' \;
I am stuck here because I don't know if I can embed a bash command inside the mv command. I know the task may look a bit confusing and over-complicated, but I would like to know if it is possible. Of course I could write a bash script that loops through all the files in the directory and changes their names one by one, but somehow I think a single command would be more efficient (even if less readable).
The OS is Linux Ubuntu 12.04.5
The shell is bash
Thank you both Kenavoz and Kurt Stutsman for the proposed solutions. Both your answers perform the task; however, I marked Kenavoz's answer as the accepted one because of the degree of similarity between my question and Kenavoz's answer. Even if it is indeed possible to do it in a cleaner way with omitting the find command, it is necessary in my case to use the respective command because I need to find files older than X units of time. So thank you both once again!
In case you want to keep your mmin option, you can use find and process the found files with a bash command using xargs:
find . -maxdepth 1 -mmin +5 -type f -name "240*.ts" | xargs -L 1 bash -c 'mv "${1}" "240-$(stat -c %y "${1}").ts"' \;
In bash, if all your files are in a single directory, you don't need to use find at all. You can do a for loop:
for file in 240-*; do
    hr_timestamp=$(date -d "@$(echo "$file" | sed 's/.*-\([0-9]*\)\.ts/\1/')")
    mv "$file" "240-$hr_timestamp.ts"
done
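As a quick illustration of the epoch conversion this loop relies on (the timestamp here is hypothetical):
# hypothetical input file: 240-1455172800.ts
date -d @1455172800    # prints that epoch as a human-readable date; the exact format depends on your locale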

No such file or directory when piping. Each command works separately, but not when piping

I have 2 folders: folder_a & folder_b. In each of these folders there are a bunch of files. I am trying to use sed to move all of these files out of these folders and into my current working directory.
My folder structure looks like this:
mytest:
    a:
        1.txt
        2.txt
        3.txt
    b:
        4.txt
        5.txt
The command I am trying to use is:
find . -type d ! -iname '*.*' # find all folders other than root
| sed -r 's/.*/&\/*/' # add '/*' to each of the arguments
| sed -r 'p;s/.*/./' # output: a/* . b/* .
| xargs -n 2 mv # should be creating two commands: 'mv a/* .' and 'mv b/* .'
Unfortunately I get an error:
mv: cannot stat './aaa/*': No such file or directory
I also get the same error when I try this other strategy (using ls instead of mv):
for dir in */; do
ls $dir;
done;
Even if I use sed to replace the spaces in each directory name with '\ ', or surround the directory names with quotes I get the same error.
I'm not sure if these 2 examples are related in my misunderstanding of bash but they both seem to demonstrate my ignorance of how bash translates the output from one command into the input of another command.
Can anyone shed some light on this?
Update: Completely rewritten.
As @EtanReisner and @melpomene have noted, mv */* . or, more specifically, mv a/* b/* . is the most straightforward solution, but you state that this is in part a learning exercise, so the remainder of the answer shows an efficient find-based solution and explains the problem with the original command.
An efficient find-based solution
Generally, if feasible, it's best and most efficient to let find itself do the work, without involving additional tools; find's -exec action is like a built-in xargs, with {} representing the path at hand (with terminator \;) / all paths (with +):
find . -type f -exec echo mv -t . {} +
To be safe, this will just print the mv commands that would be executed; remove the echo to actually execute them.
This will execute a single[1] mv command to which all matching files are passed, and -t . moves them all to the current dir.
[1] If the resulting command line is too long (which is unlikely), it is split up into multiple commands, just as with xargs.
Operating on files (-type f) bypasses the need for globbing, as find will then enumerate all files for you (it also bypasses the need to exclude . explicitly).
Note that this solution works on entire subtrees, not just (immediate) subdirectories.
It's tempting to consider turning on Bash 4's globstar option and using mv */** ., but that won't work, because it will attempt to move directories as well, not just the files in them.
A caveat re -exec with +: it only works if {} - the placeholder for all paths - is the token immediately before the +.
Since you're on Linux, we can satisfy this condition by specifying the target folder for mv with option -t before the {}; on BSD-based systems such as OSX, you could not do that, because mv doesn't support -t there, so you'd have to use terminator \;, which means that mv is called once for every path, which is obviously much slower.
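A sketch of that per-file BSD fallback, with -mindepth 2 added so files already in the current directory are left alone:
# one mv per file; portable, but slower than the -t ... {} + batch form
# note: same-named files from different subdirectories will overwrite each other
find . -mindepth 2 -type f -exec mv {} . \;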
Why your command didn't work:
As @EtanReisner points out in a comment, xargs invokes the command specified without (implicitly) involving a shell, so globbing won't work; you can verify this with the following command:
echo '*' | xargs echo # -> '*' - NO globbing
If we leave the globbing issue aside, additional work would have been necessary to make your xargs command work correctly with folder names with embedded spaces (or other shell metacharacters):
find . -mindepth 1 -type d |
sed -r "s/.*/'&'\/* ./" | # -> '<input-path>'/* . (including single-quotes)
xargs -n 2 echo mv # NOTE: still won't work due to lack of globbing
Note how the (combined) sed command now produces a single output line '<input-path>'/* ., with the input path enclosed in embedded single-quotes, which is required for xargs to recognize <input-path> as a single argument, even if it contains embedded spaces.
(If your filenames contain single-quotes, you'd have to do more work; also note that since now all arguments for a given dir. are on a single line, you could use xargs -L 1 ....)
Also note how -mindepth 1 (only process paths at the subdirectory level or below) is used to skip processing of . itself.
The only way to make globbing happen is to get the shell involved:
find . -mindepth 1 -type d |
sed -r "s/.*/'&'\/* ./" | # -> '<input-path>'/* . (including single-quotes)
xargs -I {} sh -c 'echo mv {}' # works, but is inefficient
Note the use of xargs' -I option to treat each input line as its own argument ({} is a self-chosen placeholder for the input).
sh -c invokes the (default) shell to execute the resulting command, at which point globbing does happen.
However, overall, this is quite inefficient:
A pipeline with 3 segments is used.
A shell instance is invoked for every input path, which in turn calls the mv utility.
Compare this to the efficient find-only solution above, which (typically) creates only 2 processes in total.
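For completeness, a while-read variant of your original pipeline in which the shell itself expands the glob; a sketch for the immediate-subdirectory case, assuming bash and a find with -print0:
find . -mindepth 1 -maxdepth 1 -type d -print0 |
while IFS= read -r -d '' dir; do
    mv "$dir"/* .    # the glob is expanded by the shell, not by xargs
done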

find and copy all images in directory using terminal linux mint, trying to understand syntax

OS Linux Mint
Like the title says, in the end I would like to find and copy all images in a directory.
I found:
find all jpg (or JPG) files in a directory and copy them into the folder /home/joachim/neu2:
find . -iname \*.jpg -print0 | xargs -I{} -0 cp -v {} /home/joachim/neu2
and
find all image files in a directory:
find . -name '*' -exec file {} \; | grep -o -P '^.+: \w+ image'
My problem is, first of all, that I don't really understand the syntax. Could someone explain the code?
And secondly, can someone combine the two commands into one that does what I want? ;)
Greetings and thanks in advance!
First, understand that the pipe "|" links commands, piping the output of the first into the second as input. Your two shell commands both pipe the output of the find command into other commands (grep and xargs). Let's look at those commands one after another:
First command: find
find is a program to "search for files in a directory hierarchy" (that is the explanation from find's man page). The syntax is (in this case)
find <search directory> <search pattern> <action>
In both cases the search directory is . (that is the current directory). Note that it does not just search the current directory but all its subdirectories as well (the directory hierarchy).
The search pattern accepts, among others, the options -name (search for files whose name matches the pattern given as an argument to this option) and -iname (same as -name but case-insensitive).
The action may be -print0 (print the path to each file found, relative to the search directory, terminated with a null character instead of a newline) or -exec (execute the given command on each file; the command is ended with ";" and every instance of "{}" is replaced by the filename).
That is, the first shell code (first part, left of the pipe)
find . -iname \*.jpg -print0
searches all files with ending ".jpg" in the current directory hierarchy and prints their paths and names. The second one (first part)
find . -name '*' -exec file {} \;
finds all files in the current directory hierarchy and executes
file <filename>
on them. File is another command that determines and prints the file type (have a look at the man page for details, man file).
Second command: xargs
xargs is a command that "builds and executes command lines from standard input" (man xargs), i.e. from the find output that is piped into xargs. The command that it builds and executes is in this case
cp -v {} /home/joachim/neu2
Option -I{} defines the replacement string, i.e. every instance of {} in the command is replaced by the input xargs receives (that is, the filenames). Option -0 specifies that input items are terminated (separated) not by whitespace or newlines but by a null character; it pairs with find's -print0 and is the standard way to safely hand find output to xargs.
The command that is built and executed is then of course the copy command with option -v (verbose), and it copies each of the filenames it gets from find to the directory.
Third command: grep
grep filters its input, giving only those lines or strings that match a particular pattern. Option -o tells grep to print only the matching string, not the entire line (see man grep); -P tells it to interpret the following pattern as a Perl regexp. In Perl regex, ^ is the start of the line and .+ is any arbitrary string; this arbitrary string should then be followed by a colon, a space, a number of word characters (denoted \w+ in Perl regex), a space, and the string "image". Essentially this grep command filters the file output so that only the names of image files are passed on. (Read about Perl regexes for instance here: http://www.comp.leeds.ac.uk/Perl/matching.html )
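For example, fed one hypothetical line of file output:
echo './cat.png: PNG image data, 16 x 16' | grep -o -P '^.+: \w+ image'
# output: ./cat.png: PNG image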
The command you actually wanted
Now what you want to do is (1) take the output of the second shell command (which lists the image files), (2) bring it into the appropriate form, and (3) pipe it into the xargs command from the first shell command line (which then builds and executes the copy command you wanted). So this time we have a three (actually four) stage shell command with two pipes. Not a problem. We already have stages (1) and (3) (though in stage (3) we need to leave out the -0 option, because the input is no longer find output; we need it to treat newlines as item separators).
Stage (2) is still missing. I suggest using the cut command for this. cut edits strings by splitting them into different fields (separated by a delimiter character in the original string) that can then be rearranged. I will choose ":" as the delimiter character (it ends the filename in the grep output, option -d':') and tell it to give us just the first field (option -f1, essentially: print only the filename, not the part that comes after the ":"), i.e. stage (2) is then
cut -d':' -f1
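Applied to a line of the grep output above, this gives back just the path (assuming the filename itself contains no colon):
echo './cat.png: PNG image' | cut -d':' -f1
# output: ./cat.png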
And the entire command you wanted will then be:
find . -name '*' -exec file {} \; | grep -o -P '^.+: \w+ image' | cut -d':' -f1 | xargs -I{} cp -v {} /home/joachim/neu2
Note that you can find all the man pages for instance here: http://www.linuxmanpages.com
I figured out a command only using awk that does the job as well:
find . -name '*' -exec file {} \; |
awk '{
if ($3=="image"){
print substr($1, 0, length($1)-1);
system("cp " substr($1, 0, length($1)-1) " /home/joachim/neu2" )
}
}'
the substr($1, 0, length($1)-1) is needed because in the first column file returns the filename followed by a colon.
The above answer is really good, but it could take a long time on a huge directory.
Here is a shorter version, if you already know your file extension:
find . -name \*.jpg | cut -d':' -f1 | xargs -I{} cp --parents -v {} ~/testimage/
Here's another one which works like a charm.
It adds the EPOCH time to prevent overwriting files with the same name.
cd /media/myhome/'Local station'/
find . -path ./jpg -prune -o -type f -iname '*.jpg' -exec sh -c '
    for file do
        newname="${file##*/}"
        newname="${newname%.jpg}"
        mv -T -- "$file" "/media/myhome/Local station/jpg/$newname-$(date +%s).jpg"
    done
' find-sh {} +
cd ~/
It's been designed by Kamil in this post here.
Find a specific type of file in a directory:
find /home/user/find/data/ -name '*' -exec file {} \; | grep -o -P '^.+: \w+ image'
Copy a specific type of file from one directory to another directory:
find /home/user/find/data/ -name '*' -exec file {} \; | grep -o -P '^.+: \w+ image' | cut -d':' -f1 | xargs -I{} cp -v {} /home/user/copy/data/
