recursively copy files while stripping a prefix - linux

I'm trying to create a GNU Makefile rule that copies files (found via VPATH) from one directory to another, preserving their directory structure.
There are zillions of ways to do this (starting with cp -r), but it seems that none of them work in the context of make, where the copying is initiated in the target directory.
E.g.
cp ../src/foo.c ../src/bar.c .
All the source files share a common directory (only known at runtime), and this common directory should be stripped away.
E.g.
$ srcdir=../../knurgl
$ cp ${srcdir}/src/foo.c ${srcdir}/src/bar.c .
$ find . -type f
./src/foo.c
./src/bar.c
Even though the common directory is known at runtime, it can be arbitrary and can even be the current directory . (in which case the operation should be a no-op).
This is what I tried:
cp
cp --parents ${srcdir}/src/foo.c ${srcdir}/src/bar.c .
but, rightly, this refuses to work when called from the target directory (as it would always copy the files onto themselves).
tar
tar c ${srcdir}/src/foo.c ${srcdir}/src/bar.c | tar x
This strips away the leading ../ components but keeps the rest (so I end up with ./knurgl/src/foo.c instead of ./src/foo.c).
The --strip-components option doesn't help me much, as I don't know the depth of ${srcdir}.

Instead of
cp --parents ${srcdir}/src/foo.c ${srcdir}/src/bar.c .
(which doesn't work because it doesn't strip $srcdir) you can write
(wd=$PWD; cd "$srcdir" && cp --parents src/foo.c src/bar.c "$wd")
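Here is a slightly more defensive version of that one-liner. It is a sketch, not a drop-in make rule. `copy_stripped` is a made-up name, GNU cp is assumed for `--parents`, and the file names are given relative to `$srcdir`. It also makes the case where `$srcdir` is the current directory the no-op the question asks for. Without that check, cp would complain that source and destination are the same file.

```shell
#!/bin/bash
# copy_stripped SRCDIR FILE...  (hypothetical helper name, not a standard tool)
# Copies each FILE, given relative to SRCDIR, into the current directory,
# recreating its sub-path (e.g. src/foo.c) without the SRCDIR prefix.
# Assumes GNU cp, which provides --parents.
copy_stripped() {
    local srcdir=$1
    shift
    # If SRCDIR is the current directory, the files are already in place.
    if [ "$srcdir" -ef . ]; then
        return 0
    fi
    local wd=$PWD
    # The subshell keeps the cd from affecting the caller.
    ( cd "$srcdir" && cp --parents -- "$@" "$wd" )
}
```

Run from the target directory, `copy_stripped ../../knurgl src/foo.c src/bar.c` should produce ./src/foo.c and ./src/bar.c.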

make has built-in functions for handling strings. To replace old_base_dir with new_base_dir in the variable path, call:
$(path:old_base_dir/%=new_base_dir/%)
You can also let it perform the substitution on a list:
$(foreach path,$(path_list),$(path:old_base_dir/%=new_base_dir/%))
Here, the variable path_list contains multiple files. Note, though, that this will break if the file names contain spaces.
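Outside of make, plain shell parameter expansion can do the same prefix stripping. `${var#pattern}` removes the shortest leading match of pattern. This is a minimal sketch; `strip_prefix` is a hypothetical helper name. As with make's substitution reference, paths that do not start with the prefix pass through unchanged.

```shell
#!/bin/bash
# strip_prefix PREFIX PATH...  (hypothetical helper name)
# Prints each PATH with a leading "PREFIX/" removed.
strip_prefix() {
    local prefix=$1
    shift
    local path
    for path in "$@"; do
        # Quoting "$prefix" makes it match literally, not as a glob pattern.
        printf '%s\n' "${path#"$prefix"/}"
    done
}

strip_prefix ../../knurgl ../../knurgl/src/foo.c ../../knurgl/src/bar.c
# prints:
# src/foo.c
# src/bar.c
```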
The manual of GNU make describes many more useful functions.
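The tar pipeline from the question can also be made to drop `$srcdir`. GNU and BSD tar both accept `-C DIR` to change directory before archiving. The member names are then stored relative to `$srcdir`, so no `--strip-components` depth is needed. This is a sketch under the assumption that the file names are given relative to `$srcdir`; `copy_via_tar` is a made-up name.

```shell
#!/bin/bash
# copy_via_tar SRCDIR FILE...  (hypothetical helper name)
# FILEs are given relative to SRCDIR; they are recreated under the
# current directory with the SRCDIR prefix dropped.
copy_via_tar() {
    local srcdir=$1
    shift
    # No-op if SRCDIR is the current directory (reading and extracting
    # the same files at the same time would also be unsafe).
    if [ "$srcdir" -ef . ]; then
        return 0
    fi
    # -C makes the first tar change into SRCDIR before archiving, so members
    # are stored as src/foo.c rather than knurgl/src/foo.c.
    tar -C "$srcdir" -cf - "$@" | tar -xf -
}
```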

Related

moving files from a folder into subfolders based on the prefix number with Linux

I'm relatively new to bash and I have tried multiple solutions that I could find here, but none of them seem to work in my case. It's pretty simple: I have a folder that looks like this:
- images/
  - 0_image_1.jpg
  - 0_image_2.jpg
  - 0_image_3.jpg
  - 1_image_1.jpg
  - 1_image_2.jpg
  - 1_image_3.jpg
and I would like to move these jpg files into subfolders based on the prefix number like so:
- images_0/
  - 0_image_1.jpg
  - 0_image_2.jpg
  - 0_image_3.jpg
- images_1/
  - 1_image_1.jpg
  - 1_image_2.jpg
  - 1_image_3.jpg
Is there a bash command that could do that in a simple way?
Thank you
for src in *_*.jpg; do
dest=images_${src%%_*}/
echo mkdir -p "$dest"
echo mv -- "$src" "$dest"
done
Remove both echos if the output looks good.
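To try the loop without touching real files, you can run it in a scratch directory. This sketch uses made-up file names and already has the echos removed. Because `${src%%_*}` keeps everything before the first underscore, a multi-digit prefix such as 12_ also works.

```shell
#!/bin/bash
# Scratch-directory trial of the loop above, echos removed.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 0_image_1.jpg 0_image_2.jpg 1_image_1.jpg 12_image_1.jpg
for src in *_*.jpg; do
    dest=images_${src%%_*}/    # text before the first "_", e.g. images_12/
    mkdir -p "$dest"
    mv -- "$src" "$dest"
done
find . -type f | sort
```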
I would do this with rename a.k.a. Perl rename. It is extremely powerful and performant. Here's a command for your use case:
rename --dry-run -p '$_="images_" . substr($_,0,1) . "/" . $_' ?_*jpg
Let's dissect that. At the right end, the glob ?_*jpg limits the command to files that have a single character/digit before an underscore, so we don't do damage by applying it to files it wasn't meant for. Then --dry-run means it doesn't actually do anything; it just shows you what it would do, which is a very useful feature. Then -p, which handily means "create any necessary directories for me as you go". Then the meat of the command: rename puts the current filename in the variable $_, and the Perl expression changes $_ to whatever we want the file to be called. In this case we just want the word images_ followed by the first character of the existing filename, then a slash and the original name. Simples!
Sample Output
'0_image_1.jpg' would be renamed to 'images_0/0_image_1.jpg'
'0_image_2.jpg' would be renamed to 'images_0/0_image_2.jpg'
'1_image_3.jpg' would be renamed to 'images_1/1_image_3.jpg'
Remove the --dry-run and run again for real, if the output looks good.
Using rename has several benefits:
that it will warn and avoid any conflicts if two files rename to the same thing,
that it can rename across directories, creating any necessary intermediate directories on the way,
that you can do a dry run first to test it,
that you can use arbitrarily complex Perl code to specify the new name.
Note: On macOS, you can install rename using homebrew:
brew install rename
Note: On some systems, Perl rename is installed as prename.

Using bash to loop through nested folders to run script in current working directory

I've got (what feels like) a fairly simple problem but my complete lack of experience in bash has left me stumped. I've spent all day trying to synthesize a script from many different SO threads explaining how to do specific things with unintuitive commands, but I can't figure out how to make them work together for the life of me.
Here is my situation: I've got a directory full of nested folders each containing a file with extension .7 and another file with extension .pc, plus a whole bunch of unrelated stuff. It looks like this:
Folder A
    Folder 1
        Folder x
            data_01.7
            helper_01.pc
            ...
        Folder y
            data_02.7
            helper_02.pc
            ...
        ...
    Folder 2
        Folder z
            data_03.7
            helper_03.pc
            ...
        ...
Folder B
    ...
I've got a script that I need to run in each of these folders that takes in the name of the .7 file as an input.
pc_script -f data.7 -flag1 -other_flags
The current working directory needs to be the folder with the .7 file when running the script and the helper.pc file also needs to be present in it. After the script is finished running, there are a ton of new files and directories. However, I need to take just one of those output files, result.h5, and copy it to a new directory maintaining the same folder structure but with a new name:
Result Folder/Folder A/Folder 1/Folder x/new_result1.h5
I then need to run the same script again with a different flag, flag2, and copy the new version of that output file to the same result directory with a different name, new_result2.h5.
The folders all have pretty arbitrary names, though there aren't any spaces or special characters beyond underscores.
Here is an example of what I've tried:
#!/bin/bash
DIR=".../project/data"
for d in */ ; do
    for e in */ ; do
        for f in */ ; do
            for PFILE in *.7 ; do
                echo "$d/$e/$f/$PFILE"
                cd "$DIR/$d/$e/$f"
                echo "Performing operation 1"
                pc_script -f "$PFILE" -flag1
                mkdir -p ".../results/$d/$e/$f"
                mv "results.h5" ".../project/results/$d/$e/$f/new_results1.h5"
                echo "Performing operation 2"
                pc_script -f "$PFILE" -flag 2
                mv "results.h5" ".../project/results/$d/$e/$f/new_results2.h5"
            done
        done
    done
done
Obviously, this didn't work. I've also tried using find with -execdir but then I couldn't figure out how to insert the name of the file into the script flag. I'd appreciate any help or suggestions on how to carry this out.
Another, perhaps more flexible, approach to the problem is to use the find command with the -exec option to run a short "helper-script" for each file ending in ".7" found below a given directory. The -name option lets find locate those files using simple file-globbing (wildcards). The helper-script then performs the same operation for each file found by find and handles copying result.h5 to the proper directory.
The form of the command will be:
find /path/to/search -type f -name "*.7" -exec /path/to/helper-script '{}' \;
Here, -type f tells find to return only regular files (not directories), and -name "*.7" restricts that to names ending in ".7". Your helper-script needs to be executable (e.g. chmod +x helper-script), and unless it is in your PATH, you must provide the full path to the script in the find command. The '{}' is replaced by the filename (including the path from the starting point) and passed as an argument to your helper-script. The \; simply terminates the command executed by -exec.
(Note that there is another form of -exec called -execdir, which runs the command from the directory containing the matched file, and another terminator, '+', which passes many matched files to a single invocation of the command. -execdir is a bit safer, but it has additional PATH requirements for the command being run. Since you have only one ".7" file per directory, there isn't much benefit here.)
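Before wiring in the helper-script, you can see exactly what `{}` expands to by letting `printf` stand in for it. This sketch builds a throwaway tree; the folder names are made up, with underscores in place of spaces.

```shell
#!/bin/bash
# Throwaway tree; folder names are made up for the demo.
root=$(mktemp -d)
mkdir -p "$root/Folder_A/Folder_1/Folder_x" "$root/Folder_B/Folder_2/Folder_z"
touch "$root/Folder_A/Folder_1/Folder_x/data_01.7" \
      "$root/Folder_A/Folder_1/Folder_x/helper_01.pc" \
      "$root/Folder_B/Folder_2/Folder_z/data_03.7"
cd "$root" || exit 1
# printf stands in for /path/to/helper-script: it runs once per .7 file,
# and '{}' is replaced by that file's path relative to the starting point.
find . -type f -name '*.7' -exec printf 'helper-script would get: %s\n' '{}' \;
```

Starting find at "." from the data directory keeps these paths relative, which is what the helper-script's destdir="/Result_Folder/$dir" line expects. Starting from an absolute path would make $dir absolute and nest that whole path under /Result_Folder.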
The helper-script just does what you need to do in each directory. Based on your description it could be something like the following:
#!/bin/bash
dir="${1%/*}"                   ## trim file.7 from end of path
cd "$dir" || {                  ## change to directory or handle error
    printf "unable to change to directory %s\n" "$dir" >&2
    exit 1
}
destdir="/Result_Folder/$dir"   ## set destination dir for result.h5
mkdir -p "$destdir" || {        ## create with all parent dirs or exit
    printf "unable to create directory %s\n" "$destdir" >&2
    exit 1
}
ls *.pc >/dev/null 2>&1 || exit 1   ## check .pc file exists (quietly) or exit
file7="${1##*/}"                ## trim path from file.7 name
pc_script -f "$file7" -flag1 -other_flags   ## first run
## check result.h5 exists and non-empty and copy to destdir
[ -s "result.h5" ] && cp -a "result.h5" "$destdir/new_result1.h5"
pc_script -f "$file7" -flag2 -other_flags   ## second run
## check result.h5 exists and non-empty and copy to destdir
[ -s "result.h5" ] && cp -a "result.h5" "$destdir/new_result2.h5"
This essentially stores the path part of the file.7 argument in dir and changes to that directory. If it is unable to change to the directory (due to permissions, etc.), the error is handled and the script exits. Next, the full directory structure is created below your Result_Folder with mkdir -p, with the same error handling if the directory cannot be created.
ls is used as a simple check to verify that a file ending in ".pc" exists in that directory. There are other ways to do this, such as piping the results to wc -l, but that spawns additional processes that are best avoided.
(Also note that Linux and Mac have files ending in ".pc" for use by pkg-config when building programs from source. They should not conflict with your files, but be aware that they exist in case you start chasing why weird ".pc" files are found.)
After all tests are performed, the path is trimmed from the current ".7" filename, storing just the filename in file7. The file7 variable is then used in your pc_script command (which should also include the full path to the script if it is not in your PATH). After each pc_script run, [ -s "result.h5" ] is used to verify that result.h5 exists and is non-empty before copying that file to your Result_Folder location.
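The two parameter expansions the helper-script relies on can be checked in isolation. This is a small sketch; the argument value is a made-up example of a path that find might pass.

```shell
#!/bin/bash
# A hypothetical argument, shaped like what find passes as '{}'.
arg=./Folder_A/Folder_1/Folder_x/data_01.7
dir="${arg%/*}"      # remove the shortest "/*" match from the end: the directory
file7="${arg##*/}"   # remove the longest "*/" match from the start: the file name
printf 'dir=%s\nfile7=%s\n' "$dir" "$file7"
# prints:
# dir=./Folder_A/Folder_1/Folder_x
# file7=data_01.7
```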
That should get you started. Using find to locate all .7 files is a simple way to let the tool designed to find files do its job, rather than trying to hand-roll your own solution. That way you only have to concentrate on what should be done for each file found. (Note: I don't have pc_script or the files, so I have not tested this end-to-end, but it should be very close if not right on the money.)
There is nothing wrong with writing your own routine, but using find eliminates a lot of places where bugs can hide in your own solution.
Let me know if you have further questions.

How to compress multiple folders with certain name?

I have the following folder,
(Project) [Usr#hpc FOB]$ ls
exec_train.sh FOB_RE2250_BS4ES025.py network_checkpoint_FOB_RE2250_BS2ES05
FOB_RE1150.py FOB_RE2250_BS4ES05.py network_checkpoint_FOB_RE2250_BS2ES1
FOB_RE1200.py FOB_RE2250_BS4ES1.py network_checkpoint_FOB_RE2250_BS4ES025
FOB_RE2250_BS05ES1.py FOB_RE2250.py network_checkpoint_FOB_RE2250_BS4ES05
FOB_RE2250_BS05ES2.py FOB_RE50.py network_checkpoint_FOB_RE2250_BS4ES1
FOB_RE2250_BS1ES1.py network_checkpoint_FOB_RE2250_BS05ES1
FOB_RE2250_BS2ES05.py network_checkpoint_FOB_RE2250_BS05ES2
FOB_RE2250_BS2ES1.py network_checkpoint_FOB_RE2250_BS1ES1
How do I compress all the network_checkpoint_FOB... folders into one .tar.gz archive?
I know I could manually use $ tar -czf FOB.tar.gz network_checkpoint_FOB_RE2250_BS1ES1 network_checkpoint_FOB_RE2250_BS05ES1 ... but this seems cumbersome. I think there should be a way to use string matching, but I haven't been able to find a clear, concise solution.
You can use wildcard character * in Bash:
$ tar -czf FOB.tar.gz network_checkpoint_FOB*
Bash expands the network_checkpoint_FOB* pattern before tar runs, replacing it with all matching file/folder names (sorted), each passed to tar as a separate argument.
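To see which names the glob picks up before creating the archive, you can print the expansion first. This is a sketch in a scratch directory. It uses two of the folder names from the listing above; the `ckpt` file inside them is made up.

```shell
#!/bin/bash
# Throwaway copy of part of the directory listing from the question.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir network_checkpoint_FOB_RE2250_BS05ES1 network_checkpoint_FOB_RE2250_BS1ES1
touch network_checkpoint_FOB_RE2250_BS05ES1/ckpt \
      network_checkpoint_FOB_RE2250_BS1ES1/ckpt FOB_RE50.py
# Preview which names the glob expands to (one per line).
printf '%s\n' network_checkpoint_FOB*
# Archive exactly those names, then list the archive to check it.
tar -czf FOB.tar.gz network_checkpoint_FOB*
tar -tzf FOB.tar.gz
```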

Changing suffix on bash file backup

I have been trying to change the suffix on my backup files using the --suffix option, but I'm not quite sure how to do it. Currently this line of code
find ./$1 -name "IMG_****.JPG" -exec cp --backup=t {} ./$2 \;
searches the directory given as the first command-line argument for images in the IMG_****.JPG format and copies them to the directory given as the second, backing up any existing files with the same name. Because of --backup=t, the backups get a numbered suffix, giving IMG_****.JPG.~1~ etc. Instead of .~1~ I would like to add something like .JPG. Any ideas on how to use --suffix to do this?
Read the man page:
The backup suffix is '~', unless set with --suffix or SIMPLE_BACKUP_SUFFIX.
It should be pretty obvious from this sentence that supplying --suffix is equivalent to setting SIMPLE_BACKUP_SUFFIX, which as its name suggests only applies to simple backups (i.e., --backup=simple or --backup=never). E.g.,
> touch src dst
> cp --backup=simple --suffix=.bak src dst
> ls src* dst*
dst dst.bak src
However, you are requesting numbered backups through --backup=t, so the suffixes you will get will always be .~1~, .~2~, etc., unaffected by --suffix.
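A quick check in a scratch directory confirms this. The sketch assumes GNU coreutils cp.

```shell
#!/bin/bash
# Assumes GNU coreutils cp; run in a scratch directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch src dst
# Numbered backups are requested, so --suffix has no effect here.
cp --backup=t --suffix=.bak src dst
ls    # expect dst.~1~ to appear, and no dst.bak
```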

Using Rsync include and exclude options to include directory and file by pattern

I'm having problems getting my rsync syntax right and I'm wondering if my scenario can actually be handled with rsync. First, I've confirmed that rsync is working just fine between my local host and my remote host. Doing a straight sync on a directory is successful.
Here's what my filesystem looks like:
uploads/
    1260000000/
        file_11_00.jpg
        file_11_01.jpg
        file_12_00.jpg
    1270000000/
        file_11_00.jpg
        file_11_01.jpg
        file_12_00.jpg
    1280000000/
        file_11_00.jpg
        file_11_01.jpg
        file_12_00.jpg
What I want to do is run rsync only on files that begin with "file_11_" in the subdirectories and I want to be able to run just one rsync job to sync all of these files in the subdirectories.
Here's the command that I'm trying:
rsync -nrv --include="**/file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
This results in 0 files being marked for transfer in my dry run. I've tried various other combinations of --include and --exclude statements, but either continued to get no results or got everything as if no include or exclude options were set.
Anyone have any idea how to do this?
The problem is that --exclude="*" says to exclude (for example) the 1260000000/ directory, so rsync never examines the contents of that directory, so never notices that the directory contains files that would have been matched by your --include.
I think the closest thing to what you want is this:
rsync -nrv --include="*/" --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
(which will include all directories, and all files matching file_11*.jpg, but no other files), or maybe this:
rsync -nrv --include="/[0-9][0-9][0-9]0000000/" --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
(same concept, but much pickier about the directories it will include).
rsync include/exclude pattern examples (each one used as --include="PATTERN" followed by --exclude="*"):
"*" means everything
"dir1" transfers empty directory [dir1]
"dir*" transfers empty directories like: "dir1", "dir2", "dir3", etc...
"file*" transfers files whose names start with [file]
"dir**" transfers every path that starts with [dir] like "dir1/file.txt", "dir2/bar/ffaa.html", etc...
"dir***" same as above
"dir1/*" does nothing
"dir1/**" does nothing
"dir1/***" transfers [dir1] directory and all its contents like "dir1/file.txt", "dir1/fooo.sh", "dir1/fold/baar.py", etc...
A final note: don't rely on asterisks at the beginning of a pattern for matching paths, like "**dir" (it's OK to use them for single folders or files, but not for paths), and note that more than two asterisks don't work for file names.
Here's my "teach a person to fish" answer:
Rsync's syntax is definitely non-intuitive, but it is worth understanding.
First, use -vvv to see the debug info for rsync.
$ rsync -nr -vvv --include="**/file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
[sender] hiding directory 1280000000 because of pattern *
[sender] hiding directory 1260000000 because of pattern *
[sender] hiding directory 1270000000 because of pattern *
The key concept here is that rsync applies the include/exclude patterns to each file and directory as it walks the tree recursively, and for each name the first matching pattern wins; later patterns are not consulted. An excluded directory is never entered, so nothing inside it is ever checked.
The first directory it evaluates is /Storage/uploads/, which contains the directories 1260000000/, 1270000000/ and 1280000000/. None of them match **/file_11*.jpg, so they are not included; all of them match *, so they are excluded, and rsync never looks inside them.
The solution is to include all directories (*/) first. Now 1260000000/, 1270000000/ and 1280000000/ match */ and are included, so rsync descends into them. Inside 1260000000/, file_11_00.jpg matches --include="file_11*.jpg", so it is included, while file_12_00.jpg falls through to --exclude="*". And so forth.
$ rsync -nrv --include='*/' --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
./
1260000000/
1260000000/file_11_00.jpg
1260000000/file_11_01.jpg
1270000000/
1270000000/file_11_00.jpg
1270000000/file_11_01.jpg
1280000000/
1280000000/file_11_00.jpg
1280000000/file_11_01.jpg
https://download.samba.org/pub/rsync/rsync.1
