How to rename multiple files with the same name when moving them from all subdirectories into one new location - linux

I wanted to group some log files from multiple subdirectories into one new directory.
The problem is moving multiple files with the exact same name from multiple locations into one place.
I wanted to use something like the command below, but I need to add something that changes the names of the files when moving them.
find . -name "raml-proxy.log" -exec mv {} raml_log_files/ \;

A bit of find and awk could produce the list of mv commands you need. Let's first assume that your log file names are well behaved and contain no newline characters:
find . -type f -name "raml-proxy.log" |
awk -v dir="raml_log_files" '{s=$0; sub(/.*\//,""); n[$0]++;
printf("mv \"%s\" \"%s/%s.%d\"\n", s, dir, $0, n[$0])}' > rename.sh
And then, after careful inspection of the generated rename.sh, just execute it. Explanation: sub(/.*\//,"") removes the directory part, if any, from the current record, including the last / character. n is an associative array whose keys are the log file names and whose values are counters that increment each time a log file with that name is encountered. Demo:
$ mkdir -p a b c d
$ touch a/a b/a c/a d/b
$ find . -type f | awk -v dir="raml_log_files" '{s=$0; sub(/.*\//,""); n[$0]++;
printf("mv \"%s\" \"%s/%s.%d\"\n", s, dir, $0, n[$0])}'
mv "./b/a" "raml_log_files/a.1"
mv "./a/a" "raml_log_files/a.2"
mv "./d/b" "raml_log_files/b.1"
mv "./c/a" "raml_log_files/a.3"
If there can be newline characters in the names of your log files we can use the NUL character as record separator, instead of the newline:
find . -type f -name "raml-proxy.log" -print0 |
awk -v dir="raml_log_files" -v RS=$'\\0' '{s=$0; sub(/.*\//,""); n[$0]++;
printf("mv \"%s\" \"%s/%s.%d\"\n", s, dir, $0, n[$0])}' > rename.sh

Related

How to read out a file line by line and for every line do a search with find and copy the search result to destination?

I hope you can help me with the following problem:
The Situation
I need to find files in various folders and copy them to another folder. The files and folders can contain white spaces and umlauts.
The filenames contain an ID and a string like:
"2022-01-11-02 super important file"
The filenames I need to find are collected in a textfile named ids.txt. This file only contains the IDs but not the whole filename as a string.
What I want to achieve:
I want to read out ids.txt line by line.
For every line in ids.txt I want to do a find search and cp the result to the destination.
So far I tried:
for n in $(cat ids.txt); do find /home/alex/testzone/ -name "$n" -exec cp {} /home/alex/testzone/output \; ; done
while read -r ids; do find /home/alex/testzone -name "$ids" -exec cp {} /home/alex/testzone/output \; ; done < ids.txt
The output folder remains empty. Not using -exec also gives no (search) results.
I was thinking that -name "$ids" is the root cause here. My files contain the ID plus a string, so I should search for names containing the ID followed by a variable part (star).
As the argument for -name I also tried "$ids *", "$ids"" *", and so on, with no luck.
Is there an argument that I can use in conjunction with find instead of using the star in the -name argument?
Do you have any solution for me to automate this process in a bash script to read out ids.txt file, search the filenames and copy them over to specified folder?
In the end I would like to create a bash script that takes ids.txt and the search-folder and the output-folder as arguments like:
my-id-search.sh /home/alex/testzone/ids.txt /home/alex/testzone/ /home/alex/testzone/output
EDIT:
This is some example content of the ids.txt file where only ids are listed (not the whole filename):
2022-01-11-01
2022-01-11-02
2020-12-01-62
EDIT II:
Going on with the solution from tripleee:
#!/bin/bash
grep . $1 | while read -r id; do
echo "Der Suchbegriff lautet:"$id; echo;
find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/ausgabe \;
done
In case my ids.txt file contains empty lines, -name "$id*" becomes -name *, which in turn finds and copies all files.
Trying to prevent empty lines from being read does not seem to work. They should be filtered out by the grep . $1 | stage. What am I doing wrong?
If your destination folder is always the same, the quickest and absolutely most elegant solution is to run a single find command to look for all of the files.
sed 's/.*/-o\n-name\n&*/' ids.txt |
xargs -I {} find -false {} -exec cp {} /home/alex/testzone/output +
The -false predicate is a bit of a hack to allow the list of actual predicates to start with -o (as in "or").
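With the sample IDs from the question, the sed stage expands each line into three find predicates, roughly like this (illustrative; GNU sed assumed, since \n in the replacement must produce a newline):
$ sed 's/.*/-o\n-name\n&*/' ids.txt
-o
-name
2022-01-11-01*
-o
-name
2022-01-11-02*
-o
-name
2020-12-01-62*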
This could fail if ids.txt is too large to fit into a single xargs invocation, or if your sed does not understand \n to mean a literal newline.
(Here's a fix for the latter case:
xargs printf '-o\n-name\n%s*\n' <ids.txt |
...
Still the inherent problem with using xargs find like this is that xargs could split the list between -o and -name or between -name and the actual file name pattern if it needs to run more than one find command to process all the arguments.
A slightly hackish solution to that is to ensure that each pair is a single string, and then separately split them back out again:
xargs printf '-o_-name_%s*\n' <ids.txt |
xargs bash -c 'arr=("$@"); find -false ${arr[@]/-o_-name_/-o -name } -exec cp {} "$0" \;' /home/alex/testzone/ausgabe
where we temporarily hold the arguments in an array where each file name and its flags is a single item, and then replace the flags into separate tokens. This still won't work correctly if the file names you operate on contain literal shell metacharacters like * etc.)
A more mundane solution fixes your while read attempt by adding the missing wildcard in the -name argument. (I also took the liberty to rename the variable, since read will only read one argument at a time, so the variable name should be singular.)
while read -r id; do
find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/output \;
done < ids.txt
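If you also want the argument-driven script described at the end of the question (ids file, search directory and destination as arguments), here is a minimal sketch along the same lines; it additionally skips empty lines so that -name "$id*" can never degenerate into a bare *:
#!/bin/bash
# Sketch: my-id-search.sh IDS_FILE SEARCH_DIR DEST_DIR
ids=$1 src=$2 dest=$3
while IFS= read -r id; do
    [ -n "$id" ] || continue                    # ignore empty lines in the ids file
    find "$src" -type f -name "$id*" -exec cp {} "$dest" \;
done < "$ids"
Called as my-id-search.sh /home/alex/testzone/ids.txt /home/alex/testzone /home/alex/testzone/output it should behave like the loop above.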
Please try the following bash script copier.sh
#!/bin/bash
IFS=$'\n' # make newlines the only separator
set -f # disable globbing
file="files.txt" # name of file containing filenames
finish="finish" # destination directory
while read -r n ; do (
du -a | awk '{for(i=2;i<=NF;++i)printf $i" " ; print " "}' | grep $n | sed 's/ *$//g' | xargs -I '{}' cp '{}' $finish
);
done < $file
which recursively copies all the files named in files.txt from . and its subdirectories to ./finish.
This new version works even if there are spaces in the directory names or file names.

using grep in single-line files to find the number of occurrences of a word/pattern

I have json files in the current directory, and subdirectories. All the files have a single line of content.
I want a list of all files that contain the word XYZ, and the number of times it occurs in each file.
I want to print the list according to the following format:
file_name pattern_occurence_times
It should look something like:
.\x1\x2\file1.json 3
.\x1\file3.json 2
The problem is that grep counts the NUMBER of lines containing XYZ, not the number of occurrences.
Since the whole content of the files is always contained in a single line, the count is always 1 (if the pattern occurs in the file).
I used this command for that:
find . -type f -name "*.json" -exec grep --files-with-match -i 'xyz' {} \; -exec grep -wci 'xyz' {} \;
I wrote a Python script, and it works, but I would like to know if there is any way of doing this using find and grep or other command-line tools.
Thanks
The classical approach to this problem is the pipeline grep -o regex file | wc -l. However, to execute a pipeline in find's -exec you have to run a shell (e.g. sh -c ... ). But all these things together will only print the number of matches, not the file names. Also, files with no matches have to be filtered out.
Because of all of this I think a single awk command would be preferable:
find ... -type f -exec awk '{$0=tolower($0); c+=gsub(/xyz/,"")}
END {if(c>0) print FILENAME " " c}' {} \;
Here the tolower($0) emulates grep's -i option. Make sure to write your search pattern xyz only in lowercase.
If you want to combine this with subsequent filters in find you can add else exit 1 at the end of the last awk block to continue (inside find) only with the printed files.
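For example (a sketch; -ls here just stands in for whatever subsequent find action you want), with else exit 1 the awk command doubles as a find test, so later actions only run for files that actually contained the pattern:
find . -type f -name '*.json' -exec awk '{$0=tolower($0); c+=gsub(/xyz/,"")}
END {if(c>0) print FILENAME " " c; else exit 1}' {} \; -ls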
Use the -o option of grep in conjunction with wc, e.g.
find . -name "*.json" | while read -r f ; do
echo "$f" : $(grep -ow XYZ "$f" | wc -l)
done
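If you have GNU grep, a single recursive grep plus a small post-processing step is another option; a sketch, assuming file names contain no colons, spaces or newlines:
grep -riow --include='*.json' 'xyz' . |
cut -d: -f1 |
sort | uniq -c |
awk '{print $2, $1}'   # reorder to "file_name occurrence_count"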

How to iterate on files in a directory robustly and portably, oldest first?

"Robustly" -- I'm trying to handle files with spaces and newlines and other special characters in the file names -- only because I've seen some special characters despite being told my target environment wouldn't have them.
"Portably" -- needs to run on a wide variety of Linux machines, including BusyBox.
"Oldest" -- by modification time, with names as a tiebreaker would suffice for my purpose.
Normally I'd use find -type f -printf '%T+ %p\n' 2>/dev/null | sort | cut -d' ' -f2 | while read filename; do mything "$filename"; done
But on this particular BusyBox v1.18.4 find does not support -printf so I'm not sure I can use find to expose the modification times for sorting.
Usage: find [PATH]... [EXPRESSION]
Search for files. The default PATH is the current directory,
default EXPRESSION is '-print'
EXPRESSION may consist of:
-follow Follow symlinks
-xdev Don't descend directories on other filesystems
-maxdepth N Descend at most N levels. -maxdepth 0 applies
tests/actions to command line arguments only
-mindepth N Don't act on first N levels
-name PATTERN File name (w/o directory name) matches PATTERN
-iname PATTERN Case insensitive -name
-path PATTERN Path matches PATTERN
-regex PATTERN Path matches regex PATTERN
-type X File type is X (X is one of: f,d,l,b,c,...)
-perm NNN Permissions match any of (+NNN), all of (-NNN),
or exactly NNN
-mtime DAYS Modified time is greater than (+N), less than (-N),
or exactly N days
-mmin MINS Modified time is greater than (+N), less than (-N),
or exactly N minutes
-newer FILE Modified time is more recent than FILE's
-inum N File has inode number N
-user NAME File is owned by user NAME (numeric user ID allowed)
-group NAME File belongs to group NAME (numeric group ID allowed)
-depth Process directory name after traversing it
-size N[bck] File size is N (c:bytes,k:kbytes,b:512 bytes(def.))
+/-N: file size is bigger/smaller than N
-links N Number of links is greater than (+N), less than (-N),
or exactly N
-print Print (default and assumed)
-print0 Delimit output with null characters rather than
newlines
-exec CMD ARG ; Run CMD with all instances of {} replaced by the
matching files
-prune Stop traversing current subtree
-delete Delete files, turns on -depth option
(EXPR) Group an expression
And I read that looping over the output of ls is a bad idea.
My environment also does not support stat -c format but I can use stat -t ...
So my current idea is to
cd "$1"
if [ $? -ne 0 ]; then
printf 'failed cd'
exit 1
fi
while read name
do
mything "$name"
done <<< "$(stat -t ./* \
| awk -F' ' '{print $13 " " $1}' \
| sort -r \
| awk -F' ' '{print $2}')"
Is there a better way? I'm relying on stat's 'terse' format being consistent.
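One way to make that idea a bit more tolerant of spaces is to take the mtime from a fixed position at the end of stat -t's output instead of from the front, since only the leading file name can contain spaces. A rough sketch, assuming the terse format ends with atime, mtime, ctime and block size (so mtime is the third field from the end), that names contain no newlines, and that, like the ./* version above, only the top-level directory is scanned:
cd "$1" || { printf 'failed cd\n' >&2; exit 1; }
for f in ./*; do
    [ -f "$f" ] || continue
    mtime=$(stat -t "$f" | awk '{print $(NF-2)}')   # mtime: third field from the end
    printf '%s\t%s\n' "$mtime" "$f"
done | sort -n | cut -f2- | while IFS= read -r name; do
    mything "$name"
done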

How do I classify files in Linux server by their names?

How can I use the ls command and its options to list the repeated filenames that are in different directories?
You can't use a single, basic ls command to do this. You'd have to use a combination of other POSIX/Unix/GNU utilities. For example, to find the duplicate filenames first:
find . -type f -exec basename "\{}" \; | sort | uniq -d > dupes
This means: find all the files (-type f) through the entire directory hierarchy in the current directory (.), and execute (-exec) the command basename (which strips the directory portion) on the found file (\{}), end of command (\;). The resulting names are then sorted (sort) and only the duplicated lines are printed (uniq -d). The result goes in the file dupes. Now you have the filenames that are duplicated, but you don't know what directory they are in. Use find again to find them. Using bash as your shell:
while read filename; do find . -name "$filename" -print; done < dupes
This means: loop (while) through the contents of the file dupes, reading each line into the variable filename. For each line, execute find again and search for the specific -name of the $filename and print it out (-print, but it's implicit so this is redundant).
Truth be told you can combine these without using an intermediate file:
find . -type f -exec basename "\{}" \; | sort | uniq -d | while read filename; do find . -name "$filename" -print; done
If you're not familiar with it, the | operator means, execute the following command using the output of the previous command as the input to the following command. Example:
eje@EEWANCO-PC:~$ mkdir test
eje@EEWANCO-PC:~$ cd test
eje@EEWANCO-PC:~/test$ mkdir 1 2 3 4 5
eje@EEWANCO-PC:~/test$ mkdir 1/2 2/3
eje@EEWANCO-PC:~/test$ touch 1/0000 2/1111 3/2222 4/2222 5/0000 1/2/1111 2/3/4444
eje@EEWANCO-PC:~/test$ find . -type f -exec basename "\{}" \; | sort | uniq -d | while read filename; do find . -name "$filename" -print; done
./1/0000
./5/0000
./1/2/1111
./2/1111
./3/2222
./4/2222
Disclaimer: The requirement stated that the filenames were all numbers. While I have tried to design the code to handle filenames with spaces (and in tests on my system, it works), the code may break when it encounters special characters, newlines, nuls, or other unusual situations. Please note that the -exec parameter has special security considerations and should not be used by root over arbitrary user files. The simplified example provided is intended for illustrative and didactic purposes only. Please consult your man pages and relevant CERT advisories for full security implications.
I have a function in my bash profile (bash 4.4) for duplicate files.
It is true that find is the correct tool.
I use find combined with the -print0 option, which separates the find results with the NUL character instead of newlines (find's default). Now I can catch all files under the current directory and subdirectories.
This will ensure that the results are correct no matter whether filenames contain special chars like spaces or newlines (in some very rare cases). Instead of running find against find twice, you can build an array and just locate the duplicate files in this array. Then you grep the whole array using the "duplicates" as the pattern.
So something like this works ok for my function:
$ IFS= readarray -t -d '' fn< <(find . -name 'file*' -print0)
$ dupes=$(LC_ALL=C sort <(printf '\<%s\>$\n' "${fn[@]##*/}") |uniq -d)
$ grep -e "$dupes" <(printf '%s\n' "${fn[@]}") |awk -F/ '{print $NF,"==>",$0}' |LC_ALL=C sort
This is a test:
$ IFS= readarray -t -d '' fn< <(find . -name 'file*' -print0)
# find all files and load them in an array using null delimiter
$ printf '%s\n' "${fn[@]}" #print the array
./tmp/file7
./tmp/file14
./tmp/file11
./tmp/file8
./tmp/file9
./tmp/tmp2/file09 99
./tmp/tmp2/file14.txt
./tmp/tmp2/file15.txt
./tmp/tmp2/file$100
./tmp/tmp2/file14.txt.bak
./tmp/tmp2/file15.txt.bak
./tmp/file1
./tmp/file4
./file09 99
./file14
./file$100
./file1
$ dupes=$(LC_ALL=C sort <(printf '\<%s\>$\n' "${fn[@]##*/}") |uniq -d)
#Locate duplicate files
$ echo "$dupes"
\<file$100\>$ #Mind this one with special char $ in filename
\<file09 99\>$ #Mind also this one with spaces
\<file14\>$
\<file1\>$
#I have on purpose enclosed the results between \<...\> to force grep later to match full words, so that file1 does not match file1.txt or file11
$ grep -e "$dupes" <(printf '%s\n' "${fn[#]}") |awk -F/ '{print $NF,"==>",$0}' |LC_ALL=C sort
file$100 ==> ./file$100 #File with special char correctly captured
file$100 ==> ./tmp/tmp2/file$100
file09 99 ==> ./file09 99 #File with spaces in name also correctly captured
file09 99 ==> ./tmp/tmp2/file09 99
file1 ==> ./file1
file1 ==> ./tmp/file1
file14 ==> ./file14 #other files named file14 like file14.txt and file14.txt.bak not captured since they are not duplicates.
file14 ==> ./tmp/file14
Tips:
This one <(printf '\<%s\>$\n' "${fn[@]##*/}") uses process substitution on the basenames of the find results, using bash's built-in parameter expansion.
LC_ALL=C is required for the sorting so that the filenames are sorted correctly.
In bash versions before 4.4, readarray does not accept the -d (delimiter) option. In this case you can transform the find results into an array with
while IFS= read -r -d '' res;do fn+=( "$res" );done < <(find ... -print0)

grep filenames matching a pattern and move to desired folder

I have a list of patterns in a .txt file [list.txt]. For each line in list.txt, I want to find all the files at a location whose names begin with that pattern, and then move these files to another location.
Consider an example case.
at ~/home/ana/folder_a I have list.txt, which looks like this...
list.txt
1abc
2def
3xyz
At this location, i.e. /home/ana/folder_a/, there are multiple files whose names begin with the patterns in list.txt, such as 1abc_a.txt, 1abc_c.txt, 1abc_f.txt, 2def_g.txt, 3xyz_a.txt.
So what I want to achieve is this:
for i in cat list.txt; do
ls | grep '^$i' [that's the pattern] |
mv [files containing the pattern] to /home/ana/folder_b/
Please note that at the other location, i.e /home/ana/folder_b/ I have already created directories, specific for each pattern.
So /home/ana/folder_b/ contains subdirectories like 1abc/ , 2def/ , 3xyz/
In effect, I wish to move all the files matching pattern '1abc', '2def' and '3xyz' from /home/ana/folder_a/ to their respective sub-directories in /home/ana/folder_b/, such that /home/ana/folder_b/1abc will have 1abc_a.txt , 1abc_c.txt , and 1abc_f.txt ; /home/ana/folder_b/2def/ will have 2def_g.txt and /home/ana/folder_b/3xyz/ will have 3xyz_a.txt
Grep's -f option matches patterns from a file so you don't have to loop over each line in the file in shell:
$ ls # List all files in dir, some match, some don't
1abc_a.txt 1abc_c.txt 1abc_f.txt 2def_g.txt 3xyz_a.txt file1 file2 list.txt
$ cat list.txt # List patterns to match against
1abc
2def
3xyz
$ ls | grep -f list.txt # grep for files that only match pattern
1abc_a.txt
1abc_c.txt
1abc_f.txt
2def_g.txt
3xyz_a.txt
Pipe to xargs to do the move:
ls | grep -f list.txt | xargs -i -t mv {} ../folder_B
mv 1abc_a.txt ../folder_B
mv 1abc_c.txt ../folder_B
mv 1abc_f.txt ../folder_B
mv 2def_g.txt ../folder_B
mv 3xyz_a.txt ../folder_B
Edit: Realised I missed the subdirectory part of the question. @Thor's answer is the best approach for this, but I think you might still find some use in this answer.
I think glob expansion is the way to go here:
while read pattern; do
mv "${pattern}"* ../folder_b/"$pattern"
done < list.txt
Start with an echo in front of the mv command, and remove it when you're happy with the output.
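A dry run with the echo prefix might look like this for the example files from the question (illustrative):
$ cd /home/ana/folder_a
$ while read -r pattern; do
>   echo mv "${pattern}"* ../folder_b/"$pattern"
> done < list.txt
mv 1abc_a.txt 1abc_c.txt 1abc_f.txt ../folder_b/1abc
mv 2def_g.txt ../folder_b/2def
mv 3xyz_a.txt ../folder_b/3xyz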
I'd suggest using the -exec action of find to call mv in your loop.
beginning file structure (as you can see, I'm calling this from the parent of folder_a and folder_b):
$ find
.
./folder_a
./folder_a/1abc_a.txt
./folder_a/1abc_c.txt
./folder_a/1abc_f.txt
./folder_a/2def_g.txt
./folder_a/3xyz_a.txt
./folder_b
./folder_b/1abc
./folder_b/2def
./folder_b/3xyz
./list.txt
$ cat list.txt
1abc
2def
3xyz
command:
while read pattern
do
find ./folder_a -type f -name "$pattern*" -exec mv "{}" "./folder_b/$pattern" \;
done <list.txt
alternate command (same thing, just all on one line):
while read pattern; do find ./folder_a -type f -name "$pattern*" -exec mv "{}" "./folder_b/$pattern" \;; done <list.txt
resulting file structure:
$ find
.
./folder_a
./folder_b
./folder_b/1abc
./folder_b/1abc/1abc_a.txt
./folder_b/1abc/1abc_c.txt
./folder_b/1abc/1abc_f.txt
./folder_b/2def
./folder_b/2def/2def_g.txt
./folder_b/3xyz
./folder_b/3xyz/3xyz_a.txt
./list.txt

Resources