copy files along with their (last) folders - linux

I can find and copy all the files to a given folder using find with -exec.
But what I need is to find and copy all the files within a given path, each along with the folder it is saved in. So:
/path/to/file/is/abc.txt
/another/file/is/here/xyz.txt
I need to copy these 2 files along with their path to the following folder:
/mysql/data/
The new file structure will look like this...
/mysql/data/is/abc.txt
/mysql/data/here/xyz.txt
This is done in order to avoid possible overwrite of duplicate file names. The last folder names will be unique but file names may be the same.
What is the best way to do this?

Here's a concise script with a rather long explanation* to accompany it.
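for oldpath in $your_file_list; do
  newpath="/mysql/data${oldpath##"$(dirname "$(dirname "$oldpath")")"}"
  mkdir -p "$(dirname "$newpath")" # create the target folder if it does not exist yet
  cp "$oldpath" "$newpath"
done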
How it works
The dirname utility removes the last forward slash (/) in a path and everything after it. Invoking it twice also removes the second-to-last slash and everything after that.
The idiom $(command with params) executes command with the parameters with params and substitutes its output.
The idiom ${var##prefix} expands to the contents of the variable var with the longest leading match of prefix removed.
Step-by-step Analysis
If oldpath is /path/to/file/is/abc.txt, then:
dirname $oldpath is /path/to/file/is
dirname $(dirname $oldpath) is /path/to/file
${oldpath##$(dirname $(dirname $oldpath))} is /is/abc.txt
which is the portion of the original path that will be appended to the new path.
* Elegant (adj.) software: any software that implements an algorithm, whose explanation is longer than the implementation itself.
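The expansion can be checked on one of the sample paths from the question without touching any files. This is only a minimal sketch: the variable names parent, grandparent and suffix are illustrative, and it uses the single # form, which gives the same result as ## here because the quoted prefix contains no wildcards.

```bash
#!/bin/bash
# Sample path taken from the question; nothing is copied or moved here.
oldpath=/path/to/file/is/abc.txt

parent=$(dirname "$oldpath")          # /path/to/file/is
grandparent=$(dirname "$parent")      # /path/to/file
suffix=${oldpath#"$grandparent"}      # /is/abc.txt
newpath=/mysql/data$suffix

echo "$newpath"
```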

You're going to have to script/program this solution.
Quick python example follows:
import os
import shutil

src_root = '/path/to/walk/'
dst_root = '/mysql/data/'
for root, dirs, files in os.walk(src_root):
    for file in files:
        # keep only the last folder of the source path under dst_root
        dst_path = os.path.split(root)[1]
        dst_path = os.path.join(dst_root, dst_path)
        os.makedirs(dst_path, exist_ok=True)  # do not fail if it already exists
        src = os.path.join(root, file)
        dst = os.path.join(dst_path, file)
        shutil.copyfile(src, dst)

This might work for you:
a=/mysql/data
sed 's|.*\(/[^/]*/[^/]*\)|mv -v & '"$a"'\1|' file
mv -v /path/to/file/is/abc.txt /mysql/data/is/abc.txt
mv -v /another/file/is/here/xyz.txt /mysql/data/here/xyz.txt
Study the output and if all OK, then run:
sed 's|.*\(/[^/]*/[^/]*\)|mv -v & '"$a"'\1|' file | bash

To copy a file, you first need to create a directory for it. You can do both with a single find command, though I'm not sure how efficient this solution is.
#!/bin/bash
# $1 - source directory, $2 - destination directory
find "$1" -type f -exec bash -c '
dest="$2"; # $2 is the second argument passed to the inline script
dir=$(basename "$(dirname "$1")"); # last folder of the file that was found
mkdir -p "$dest/$dir";
cp "$1" "$dest/$dir/";
' -- {} "$2" \; # {} becomes $1 inside bash -c '..' and the outer $2 becomes $2
usage: ./copy copy_from copy_to
Edit: this version looks better:
#!/bin/bash
from=$1
dest=$2
# copy_file file dest
copy_file() {
  dir=$(basename "$(dirname "$1")") # last folder of the file, not of the source root
  mkdir -p "$2/$dir"
  cp "$1" "$2/$dir"
}
find "$from" -type f | while IFS= read -r file; do copy_file "$file" "$dest"; done
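For file names that may contain spaces or newlines, a null-delimited loop is one possible variant. The sketch below is not the answer's own script: it builds a hypothetical sample tree in a temporary directory (the names work, src and dst are assumptions) and relies on find -print0, which both GNU and BSD find provide.

```bash
#!/bin/bash
# Sketch: copy every file under "$from" into "$dest/<last folder>/".
# The sample tree is hypothetical and lives in a throwaway temp directory.
work=$(mktemp -d)
from=$work/src
dest=$work/dst
mkdir -p "$from/path/is" "$from/another/here" "$dest"
echo one > "$from/path/is/abc.txt"
echo two > "$from/another/here/abc.txt"

# -print0 together with read -d '' keeps unusual file names intact
while IFS= read -r -d '' file; do
  dir=$(basename "$(dirname "$file")")   # last folder, e.g. "is"
  mkdir -p "$dest/$dir"
  cp "$file" "$dest/$dir/"
done < <(find "$from" -type f -print0)

# List what ended up in the destination, then clean up
copied=$(cd "$dest" && find . -type f | sort)
echo "$copied"
rm -rf "$work"
```

Both files are called abc.txt, yet neither overwrites the other because each lands under its own last folder.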

Related

Linux bash How to use a result of a wildcard as a file name in a copy command

I'm writing a Linux script to copy files from a folder structure in to one folder. I want to use a varying folder name as the prefix of the file name.
My current script looks like this. But, I can't seem to find a way to use the folder name from the wildcard as the file name;
for f in /usr/share/storage/*/log/myfile.log*; do cp "$f" /myhome/docs/log/myfile.log; done
My existing folder structure/files as follows and I want the files copied as;
>/usr/share/storage/100/log/myfile.log --> /myhome/docs/log/100.log
>/usr/share/storage/100/log/myfile.log.1 --> /myhome/docs/log/100.log.1
>/usr/share/storage/102/log/myfile.log --> /myhome/docs/log/102.log
>/usr/share/storage/103/log/myfile.log --> /myhome/docs/log/103.log
>/usr/share/storage/103/log/myfile.log.1 --> /myhome/docs/log/103.log.1
>/usr/share/storage/103/log/myfile.log.2 --> /myhome/docs/log/103.log.2
You could use a regular expression match to extract the desired component, but it is probably easier to simply change to /usr/share/storage so that the desired component is always the first one on the path.
Once you do that, it's a simple matter of using various parameter expansion operators to extract the parts of paths and file names that you want to use.
cd /usr/share/storage
for f in */log/myfile.log*; do
pfx=${f%%/*} # 100, 102, etc
dest=$(basename "$f") # myfile.log, myfile.log.1, etc
dest=$pfx.${dest#*.} # 100.log, 100.log.1, etc
cp -- "$f" /myhome/docs/log/"$dest"
done
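To see what each parameter expansion operator contributes, the steps can be run on a single sample path from the question. This is a small illustration only; the variable names base and ext are assumptions made for the example.

```bash
#!/bin/bash
f=103/log/myfile.log.2   # sample path, relative to /usr/share/storage

pfx=${f%%/*}             # remove the longest suffix matching "/*"   -> 103
base=${f##*/}            # remove the longest prefix matching "*/"   -> myfile.log.2
ext=${base#*.}           # remove the shortest prefix matching "*."  -> log.2
dest=$pfx.$ext

echo "$dest"
```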
One option is to wrap the for loop in another loop:
for d in /usr/share/storage/*; do
dir="$(basename "$d")"
for f in "$d"/log/myfile.log*; do
file="$(basename "$f")"
# test we found a file - glob might fail; ${file#*.} drops "myfile." so the name becomes 100.log etc.
[ -f "$f" ] && cp "$f" /myhome/docs/log/"${dir}.${file#*.}"
done
done
for f in /usr/share/storage/*/log/myfile.log*; do cp "$f" "$(echo $f | sed -re 's%^/usr/share/storage/([^/]*)/log/myfile(\.log.*)$%/myhome/docs/log/\1\2%')"; done

How to rename file based on parent and child folder name in bash script

I would like to rename file based on parent/subparent directories name.
For example:
test.xml file located at
/usr/local/data/A/20180101
/usr/local/data/A/20180102
/usr/local/data/B/20180101
How can I save each test.xml file in /usr/local/data/output as
A_20180101_test.xml
A_20180102_test.xml
B_20180101_test.xml
I tried the shell script below, but it does not help.
#!/usr/bin/env bash
target_dir_path="/usr/local/data/output"
for file in /usr/local/data/*/*/test.xml; do
l1="${file%%/*}"
l2="${file#*/}"
l2="${l2%%/*}"
filename="${file##*/}"
target_file_name="${l1}_${l2}_${filename}"
echo cp "$file" "${target_dir_path}/${target_file_name}"
done
Am I doing anything wrong in this shell script?
You can use the following command to do this operation:
source_folder="usr/local/data/";target_folder="target"; find $source_folder -type f -name test.xml | awk -v targetF=$target_folder 'BEGIN{FS="/"; OFS="_"}{printf $0" "; print targetF"/"$(NF-2),$(NF-1),$NF}' | xargs -n2 cp;
or on several lines for readability:
source_folder="usr/local/data/";
target_folder="target";
find $source_folder -type f -name test.xml |\
awk -v targetF=$target_folder 'BEGIN{FS="/"; OFS="_"}{printf $0" "; print targetF"/"$(NF-2),$(NF-1),$NF}' |\
xargs -n2 cp;
where
target_folder is your target folder
source_folder is your source folder
the find command will search for all the test.xml named files present under this source folder
the awk command receives the target folder as a variable; the BEGIN block sets the field separator and the output field separator, and the main block prints the original file name followed by the new one
xargs passes the output to cp two arguments at a time, and the trick is done
TODO:
you just need to set the source_folder and target_folder variables to match your environment, optionally put the commands in a script, and you are good to go!
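The awk step can be tried on one path without running find or cp. The sketch below feeds it a single sample line; it uses printf "%s " rather than passing $0 as the format string, which is a small deviation from the command above so that a % in a file name cannot be misread as a format specifier.

```bash
#!/bin/bash
# One sample path, as find would print it for source_folder="usr/local/data/"
sample="usr/local/data/A/20180101/test.xml"

# FS="/" splits the path into fields; OFS="_" joins the last three fields
pair=$(printf '%s\n' "$sample" |
  awk -v targetF="target" 'BEGIN{FS="/"; OFS="_"}
       {printf "%s ", $0; print targetF"/"$(NF-2),$(NF-1),$NF}')

echo "$pair"
```

Each output line holds the source path and the new path, which is why xargs -n2 can hand them to cp as a pair.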
I've modified your code a little to get it to work. See comments in code
target_dir_path="/usr/local/data/output"
for file in /usr/local/data/*/*/test.xml; do
  tmp=${file%/*/*/*} # Strip the last three components, leaving /usr/local/data
  curr="${file#"$tmp/"}" # Extract wanted part of the filename
  mod=${curr//[\/]/_} # Replace forward slash with underscore
  mv "$file" "$target_dir_path/$mod" # Move the file
done
If you have the Perl-based rename command:
$ for f in tst/*/*/test.xml; do
rename -n 's|.*/([^/]+)/([^/]+)/(test.xml)|./$1_$2_$3|' "$f"
done
rename(tst/A/20180101/test.xml, ./A_20180101_test.xml)
rename(tst/A/20180102/test.xml, ./A_20180102_test.xml)
rename(tst/B/20180101/test.xml, ./B_20180101_test.xml)
-n option is for dry run, remove it after testing
change tst to /usr/local/data and ./ to /usr/local/data/output/ for your usecase
.*/ to ignore file path
([^/]+)/([^/]+)/(test.xml) capture required portions
$1_$2_$3 re-arrange as required
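If the Perl-based rename is not installed, the same capture-and-rearrange idea can be sketched with bash's own regex matching. This is an alternative technique, not the rename command itself; the variable names re and new are illustrative, and the example only builds the new name without renaming anything.

```bash
#!/bin/bash
f=tst/A/20180101/test.xml

# Three capture groups: parent folder, child folder, file name
re='([^/]+)/([^/]+)/(test\.xml)$'
if [[ $f =~ $re ]]; then
  new="${BASH_REMATCH[1]}_${BASH_REMATCH[2]}_${BASH_REMATCH[3]}"
fi

echo "$new"
```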

Delete files in one directory that do not exist in another directory or its child directories

I am still a newbie in shell scripting and trying to come up with a simple code. Could anyone give me some direction here. Here is what I need.
Files in path 1: /tmp
100abcd
200efgh
300ijkl
Files in path2: /home/storage
backupfile_100abcd_str1
backupfile_100abcd_str2
backupfile_200efgh_str1
backupfile_200efgh_str2
backupfile_200efgh_str3
Now I need to delete file 300ijkl in /tmp as the corresponding backup file is not present in /home/storage. The /tmp file contains more than 300 files. I need to delete the files in /tmp for which the corresponding backup files are not present and the file names in /tmp will match file names in /home/storage or directories under /home/storage.
Appreciate your time and response.
You can also approach the deletion using grep. You can loop through the files in /tmp, checking each with ls piped to grep, and delete it if there is no match:
#!/bin/bash
[ -z "$1" -o -z "$2" ] && { ## validate input
printf "error: insufficient input. Usage: %s tmpfiles storage\n" ${0//*\//}
exit 1
}
for i in "$1"/*; do
fn=${i##*/} ## strip path, leaving filename only
## if file in backup matches filename, skip rest of loop
ls "${2}"* | grep -q "$fn" &>/dev/null && continue
printf "removing %s\n" "$i"
# rm "$i" ## remove file
done
Note: the actual removal is commented out above. Test and ensure there are no unintended consequences before performing the actual delete. Call it passing the path to tmp (without trailing /) as the first argument and with /home/storage as the second argument:
$ bash scriptname /path/to/tmp /home/storage
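Since the backups may also sit in directories under /home/storage, one possible variant lets find do the substring match recursively. The sketch below is a dry run on a hypothetical tree in a temporary directory (work, orphans and match are names invented for the example); it only reports what it would remove, and it assumes a find that supports -quit, as GNU and BSD find do.

```bash
#!/bin/bash
# Hypothetical sample layout mirroring the question
work=$(mktemp -d)
mkdir -p "$work/tmp" "$work/storage/sub"
touch "$work/tmp/100abcd" "$work/tmp/200efgh" "$work/tmp/300ijkl"
touch "$work/storage/backupfile_100abcd_str1" "$work/storage/sub/backupfile_200efgh_str1"

orphans=()
for f in "$work/tmp"/*; do
  name=${f##*/}
  # Print the first backup whose name contains $name, then stop searching
  match=$(find "$work/storage" -type f -name "*${name}*" -print -quit)
  if [ -z "$match" ]; then
    orphans+=("$name")   # no backup found anywhere under storage
  fi
done

printf 'would remove: %s\n' "${orphans[@]}"
rm -rf "$work"
```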
You can solve this by
making a list of the files in /home/storage
testing each filename in /tmp to see if it is in the list from /home/storage
Given the linux+shell tags, one might use bash:
make the list of files from /home/storage an associative array
make the subscript of the array the filename
Here is a sample script to illustrate ($1 and $2 are the parameters to pass to the script, i.e., /home/storage and /tmp):
#!/bin/bash
declare -A InTarget
while IFS= read -r path
do
  name=${path##*/}
  InTarget[$name]=$path
done < <(find "$1" -type f)
while IFS= read -r path
do
  name=${path##*/}
  [[ -z ${InTarget[$name]} ]] && rm -f "$path"
done < <(find "$2" -type f)
It uses two interesting shell features:
name=${path##*/} is a POSIX shell feature which allows the script to perform the basename function without an extra process (per filename). That makes the script faster.
done < <(find $2 -type f) is a bash feature which lets the script read the list of filenames from find without making the assignments to the array run in a subprocess. Here the reason for using the feature is that if the array is updated in a subprocess, it would have no effect on the array value in the script which is passed to the second loop.
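The difference between the two ways of feeding a while loop can be shown with a simple counter. This is a minimal sketch assuming bash with its default settings (the lastpipe option is off); the variable names are made up for the illustration.

```bash
#!/bin/bash
# Piping into the loop: the loop body runs in a subshell,
# so the assignment is lost when the pipeline ends.
count_pipe=0
printf '%s\n' a b c | while read -r line; do
  count_pipe=$((count_pipe + 1))
done

# Process substitution: the loop runs in the current shell,
# so the assignment is still visible afterwards.
count_procsub=0
while read -r line; do
  count_procsub=$((count_procsub + 1))
done < <(printf '%s\n' a b c)

echo "pipe: $count_pipe, process substitution: $count_procsub"
```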
For related discussion:
Extract File Basename Without Path and Extension in Bash
Bash Script: While-Loop Subshell Dilemma
I spent some really nice time on this today because I needed to delete files which have same name but different extensions, so if anyone is looking for a quick implementation, here you go:
#!/bin/bash
# We need a reference list of the files we want to keep and not delete.
# Let's assume you want to keep files in the first folder with a .jpeg
# extension, so map the names to the desired extension first.
FILES_TO_KEEP=$(ls -1 "${2}" | sed 's/\.pdf$/.jpeg/')
# Iterate through the files in the first argument path
for file in "${1}"/*; do
  # In my case, I did not want to do anything with directories, so skip them.
  if [[ -d $file ]]; then
    continue
  fi
  # Strip the path with basename so the name can be compared to the files we want to keep
  NAME_WITHOUT_PATH=$(basename "$file")
  # I use a Mac, whose command-line tools are weak at string handling,
  # so this substring test is a safe way to check whether
  # FILES_TO_KEEP contains NAME_WITHOUT_PATH
  if [[ $FILES_TO_KEEP == *"$NAME_WITHOUT_PATH"* ]]; then
    echo "Not deleting: $NAME_WITHOUT_PATH"
  else
    # The other directory has no matching file, so remove it.
    echo "deleting: $NAME_WITHOUT_PATH"
    rm -f "$file"
  fi
done
Usage: sh deleteDifferentFiles.sh path/from/where path/source/of/truth

Append directory name to the end of the files and move them

I am finding some files in a directory using this command:
find /Users/myname -type f
output is:
/Users/myname/test01/logs1/err.log
/Users/myname/test01/logs1/std
/Users/myname/test01/logs2/std
/Users/myname/test02/logs2/velocity.log
/Users/myname/test03/logs3/err.log
/Users/myname/test03/logs3/invalid-arg
I need to move these files to a different directory, appending the test directory name to the end of each file name, like below:
err.log-test01
std-test01
std-test01
velocity.log-test02
err.log-test03
invalid-arg-test03
I am trying with the cut command but not getting the desired output.
find /Users/myname -type f | cut -d'/' -f6,4
Plus, I also need to move the files to a different directory. I guess there could be a suitable way using the sed command, but I am not proficient with sed. How can this be achieved efficiently?
You can let find create the mv command, use sed to modify it and then have it run by the shell:
find /Users/myname -type f -printf "mv %p /other/dir/%f\n" |
sed 's,/\(test[0-9]*\)/\(.*\),/\1/\2-\1,' | sh
This assumes there are no spaces in any argument, otherwise liberally add ' or ". Also run it without the final | sh to see what it actually wants to do. If you need to anchor the test[0-9]* pattern better you can include part of the left or right string to match:
's,myname/\(test[0-9]*\)/\(.*\),myname/\1/\2-\1,'
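To see what the substitution does to one generated line, it can be applied to a hand-written mv command instead of real find output. The sketch below only prints the rewritten command; the sample line is what find -printf would be expected to emit for the first file in the question.

```bash
#!/bin/bash
# One line as the find -printf step would produce it
cmd='mv /Users/myname/test01/logs1/err.log /other/dir/err.log'

# \1 captures the test directory, \2 the rest of the line;
# the replacement appends "-\1" to the end of the line
out=$(printf '%s\n' "$cmd" | sed 's,/\(test[0-9]*\)/\(.*\),/\1/\2-\1,')

echo "$out"
```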
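Using awk, you can build the target name by appending the test directory to the file name: awk -F/ '{print $6 "-" $4}'. With -F/ the path /Users/myname/test01/logs1/err.log splits into an empty first field followed by Users, myname, test01, logs1 and err.log, so $6 is the file name and $4 is the test directory. The full command could be as simple as:
for i in `find /Users/myname -type f`
do mv "$i" /dst_dir/`echo "$i" | awk -F/ '{print $6 "-" $4}'`
done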
There are a number of things going on here, so you may want to use a helper script with find to ensure you can validate the existence of the directory to move the files to, etc. A script might take the form of:
#!/bin/bash
[ -z "$1" -o -z "$2" ] && { # validate at least 2 arguments
printf "error: insufficient input\n"
exit 1
}
ffn="$1" # full file name provided by find
newdir="$2" # the target directory
# validate existence of 'newdir' or create/exit on failure
[ -d "$newdir" ] || mkdir -p "$newdir"
[ -d "$newdir" ] || { printf "error: unable to create '%s'\n" "$newdir"; exit 1; }
tmp="${ffn##*test}" # get the test## number
num="${tmp%%/*}"
fn="${ffn##*/}" # remove existing path from ffn
mv "$ffn" "${newdir}/${fn}-test${num}" # move to new location
exit 0
Save it in a location where it is accessible under a name like myscript and make it executable (e.g. chmod 0755 myscript). You may also choose to put it in a directory within your path. You can then call the script for every file returned by find with:
find /Users/myname -type f -exec ./path/to/myscript '{}' somedir \;
Where somedir is the target directory for the renamed file. Helper scripts generally provide the ability to do required validation that would otherwise not be done in one-liners.

Move files and rename - one-liner

I'm encountering many files with the same content and the same name on some of my servers. I need to quarantine these files for analysis so I can't just remove the duplicates. The OS is Linux (centos and ubuntu).
I enumerate the file names and locations and put them into a text file.
Then I do a for statement to move the files to quarantine.
for file in $(cat bad-stuff.txt); do mv $file /quarantine ;done
The problem is that they have the same file name and I just need to add something unique to the filename to get it to save properly. I'm sure it's something simple but I'm not good with regex. Thanks for the help.
Since you're using Linux, you can take advantage of GNU mv's --backup.
while read -r file
do
mv --backup=numbered "$file" "/quarantine"
done < "bad-stuff.txt"
Here's an example that shows how it works:
$ cat bad-stuff.txt
./c/foo
./d/foo
./a/foo
./b/foo
$ while read -r file; do mv --backup=numbered "$file" "./quarantine"; done < "bad-stuff.txt"
$ ls quarantine/
foo foo.~1~ foo.~2~ foo.~3~
$
I'd use this
for file in $(cat bad-stuff.txt); do mv "$file" "/quarantine/$(basename "$file").$(date -u +%s%N)"; done
You'll get every file with a timestamp appended (in nanoseconds).
You can create a new file name composed by the directory and the filename. Thus you can add one more argument in your original code:
for ...; do mv $file /quarantine/$(echo $file | sed 's:/:_:g') ; done
Please note that you should replace the _ with a proper character which is special enough.
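The same flattened name can also be built with bash parameter expansion instead of sed. This is a sketch only: the sample path comes from the earlier example list, the variable name flat is an assumption, and the final line just prints the mv command rather than running it.

```bash
#!/bin/bash
file=./c/foo           # one entry as it might appear in bad-stuff.txt

flat=${file#./}        # drop a leading "./"      -> c/foo
flat=${flat//\//_}     # replace every "/" by "_" -> c_foo

echo mv "$file" "/quarantine/$flat"
```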

Resources