How to copy the contents of a folder to multiple folders based on number of files? - linux

I want to copy the files from a folder (named: 1) to multiple folders based on the number of files (here: 50).
The code given below works: it moves all the files from the folder into subfolders of at most 50 files each, then copies everything from those subfolders back to the initial folder. However, I need something cleaner and more efficient. Apologies for the mess below, I'm a newbie.
bf=1 #breakfolder
cd 1 #the folder I want to copy from; contains 179 files
files_exist=$(ls -1q * | wc -l) #count the files in folder 1
#move 50 files at a time from 1 into numbered subfolders
while [ $files_exist -gt 50 ]
do
    mkdir ../CompiledPdfOutput/temp/1-$bf
    set --
    for f in .* *; do
        [ "$#" -lt 50 ] || break
        [ -f "$f" ] || continue
        [ -L "$f" ] && continue
        set -- "$@" "$f"
    done
    mv -- "$@" ../CompiledPdfOutput/temp/1-$bf/
    files_exist=$(ls -1q * | wc -l)
    bf=$(($bf + 1))
done
#move the rest of the files into one final subdir
mkdir ../CompiledPdfOutput/temp/1-$bf
set --
for f in .* *; do
    [ "$#" -lt 50 ] || break
    [ -f "$f" ] || continue
    [ -L "$f" ] && continue
    set -- "$@" "$f"
done
mv -- "$@" ../CompiledPdfOutput/temp/1-$bf/
#get out of 1
cd ..
# copy back the contents from subdir to 1
find CompiledPdfOutput/temp/ -exec cp {} 1 \;
The required directory structure is:
                parent
        __________|__________
       |                     |
       1             CompiledPdfOutput
       |                     |
     (179)                 temp
                             |
             -----------------------------
             |       |       |       |
            1-1     1-2     1-3     1-4
            (50)    (50)    (50)    (29)
The number inside "()" denotes the number of files.
BTW, the final step of my code gives this warning, would be glad if anyone can explain what's happening and a solution.
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-4'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-3'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-1'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-2'
I don't want to copy the directories themselves, just the files, so adding -r would be bad.

Assuming that you need something more compact/efficient, you can leverage existing tools (find, xargs) to create a pipeline, eliminating the need to program each step using bash.
The following will move the files into the split folders: it finds the files, groups them 50 per output folder, uses awk to generate the output folder names, and moves the files. Solution not as elegant as the original one :-(
find 1 -type f |
xargs -L50 echo |
awk '{ print "CompliedOutput/temp/1-" NR, $0 }' |
xargs -L1 echo mv -t
As a side note, the current script moves the files from the '1' folder to the numbered folders, and then copies the files back to the original folder. Why not just copy the files to the numbered folders in the first place? You can use 'cp -p' to preserve timestamps, if that's needed.
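A minimal sketch of that copy-only variant, assuming GNU cp (for -t) and file names without spaces or newlines; the target folders must already exist, as in the NUL-separated snippet below:
find 1 -type f |                                    # list the source files
    xargs -L50 echo |                               # group 50 file names per line
    awk '{ print "CompiledOutput/temp/1-" NR, $0 }' | # prepend the target folder
    xargs -L1 cp -p -t                              # cp -p -t <folder> <files...>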
Supporting file names with newlines (and spaces)
A clarification to the question indicates the solution should work with file names containing embedded newlines (and white space). This requires a minor change: using the NUL character as separator.
# Count the files, then compute the number of output folders (50 files each)
FILE_COUNT=$(find 1 -type f -print0 | xargs -0 -I{} echo X | wc -l)
DIR_COUNT=$(( (FILE_COUNT + 49) / 50 ))
# Remove any previous tree, and create the folders
OUT=CompiledOutput/temp
rm -rf $OUT
eval mkdir -p $OUT/1-{1..$DIR_COUNT}
# Process file, use NUL as separator
find 1 -type f -print0 |
awk -vRS="\0" -v"OUT=$OUT" 'NR%50 == 1 { printf "%s/1-%d%s",OUT,1+int(NR/50),RS } { printf "%s", ($0 RS) }' |
xargs -0 -L51 -t mv -t
I did limited testing with both spaces and newlines in file names; it looks OK on my machine.

I find a couple of issues with the posted script:
The logic of copying maximum 50 files per folder is overcomplicated, and the code duplication of an entire loop is error-prone.
It reuses $@, the array of positional parameters, for internal storage. That variable was not intended for this purpose; it would be better to use a new, dedicated array.
Instead of moving files to sub-directories and then copying them back, it would be simpler to just copy them in the first step, without ever moving.
Parsing the output of ls is not recommended.
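For instance, a glob-based count is a safe replacement for the question's ls -1q * | wc -l line; a minimal sketch:
shopt -s nullglob     # make an empty glob expand to nothing instead of itself
files=( * )           # all non-hidden entries in the current directory
echo "${#files[@]}"   # number of entries, safe even with spaces or newlines in names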
Consider this alternative, simpler logic:
Initialize an empty array to_copy, to keep files that should be copied
Initialize a folder counter, to use to compute the target folder
Loop over the source files
Apply filters as before (skip if not file)
Add file to to_copy
If to_copy contains the target number of files, then:
Create the target folder
Copy the files contained in to_copy
Reset the content of to_copy to empty
Increment folder_counter
If to_copy is not empty
Create the target folder
Copy the files contained in to_copy
Something like this:
#!/usr/bin/env bash
set -euo pipefail
distribute_to_folders() {
    local src=$1
    local target=$2
    local max_files=$3
    local to_copy=()
    local folder_counter=1
    for file in "$src"/* "$src"/.*; do
        [ -f "$file" ] || continue
        to_copy+=("$file")
        if (( ${#to_copy[@]} == max_files )); then
            mkdir -p "$target/$folder_counter"
            cp -v "${to_copy[@]}" "$target/$folder_counter/"
            to_copy=()
            ((++folder_counter))
        fi
    done
    if (( ${#to_copy[@]} > 0 )); then
        mkdir -p "$target/$folder_counter"
        cp -v "${to_copy[@]}" "$target/$folder_counter/"
    fi
}
distribute_to_folders "$@"
To distribute files in path/to/1 into directories of maximum 50 files under path/to/compiled-output, you can call this script with:
./distribute.sh path/to/1 path/to/compiled-output 50
BTW, the final step of my code gives this warning, would be glad if anyone can explain what's happening and a solution.
Sure. The command find CompiledPdfOutput/temp/ -exec cp {} 1 \; finds files and directories, and tries to copy them. When cp encounters a directory and the -r parameter is not specified, it issues the warning you saw. You could add a filter for files, with -type f. If there are not excessively many files then a simple shell glob will do the job:
cp -v CompiledPdfOutput/temp/*/* 1
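For completeness, a sketch of the find variant with that -type f filter (using -exec ... + to batch the files and GNU cp's -t to name the target directory first):
find CompiledPdfOutput/temp/ -type f -exec cp -t 1 {} +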

This will copy files to multiple folders of fixed size. Change source, target, and folderSize as per your requirements. This also works with file names containing special characters (e.g. 'file 131!@#$%^&*()_+-=;?').
source=1
target=CompiledPDFOutput/temp
folderSize=50
find $source -type f -printf "\"%p\"\0" \
| xargs -0 -L$folderSize \
| awk '{system("mkdir -p '$target'/1-" NR); printf "'$target'/1-" NR " %s\n", $0}' \
| xargs -L1 cp -t
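If you want to preview the result first, a dry-run sketch of the same pipeline (the mkdir call is dropped and echo is prefixed to cp, so nothing is created or copied):
find $source -type f -printf "\"%p\"\0" \
| xargs -0 -L$folderSize \
| awk '{printf "'$target'/1-" NR " %s\n", $0}' \
| xargs -L1 echo cp -t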

Related

How to iterate through folders and subfolders to delete n number of files randomly?

I have 4 folders (named W1, W3, W5, W7) and each one of those folders has approximately 30 subfolders (named M1 - M30). Each subfolder contains 24 .tif files (named Image_XX.tif).
I need to randomly "sample" each subfolder, more specifically, I need to get rid of 14 .tif files while keeping 10 .tif files in each subfolder.
I figure that deleting 14 files at random is easier than choosing 10 files at random and copying them to new subfolders within folders.
I thought that writing a bash script to do so would be the way, but I'm fairly new to programming and I'm stuck.
Below is one of the several scripts I've tried:
#!/bin/bash
for dir in /Users/Fer/Subsets/W1/; do
    if [ -d "$dir" ]; then
        cd "$dir"
        gshuf -zn14 -e *.tif | xargs -0 rm
        cd ..
    fi
done
It runs for a second, but nothing seems to happen. Any help is appreciated.
For every subdirectory:
List all the .tif files.
Choose 14 of them at random from the list.
Delete them.
I think something along these lines:
for dir in /Users/Fer/Subsets/W*/M*/; do
    printf "%s\0" "$dir"/*.tif |
        shuf -z -n 14 |
        xargs -0 -t echo rm -v
done
The NUL separators ("%s\0", shuf -z, xargs -0) keep file names with odd characters safe, and the echo makes this a dry run; remove it to actually delete the files.
I used some of the suggestions above, and the code below worked:
for dir in /Users/Fer/Subsets/W*/M*; do
    gshuf -zn14 -e "$dir"/*.tif | xargs -0 rm
done

How can I make a bash script where I can move certain files to certain folders which are named based on a string in the files?

This is the script that I'm using to move files with the string "john" in them (124334_john_rtx.mp4 , 3464r64_john_gty.mp4 etc) to a certain folder
find /home/peter/Videos -maxdepth 1 -type f -iname '*john*' -print0 | \
xargs -0 --no-run-if-empty echo mv --target-directory=/home/peter/Videos/john/
Since I have a large number of videos with various names in the files, I want to make a bash script that moves each video into a folder named after the string between the underscores. So, for example, if a file is named 4345655_ben_rts.mp4, the script would identify the string "ben" between the underscores, create a folder named "ben", and move the file into that folder. Any advice is greatly appreciated!
My way to do it:
cd /home/peter/Videos                                # change to your start directory
for name in $(ls *.mp4 | cut -d'_' -f2 | sort -u)    # loop over the names found after the first underscore
do
    mkdir -p /home/peter/Videos/${name}              # create the target directory if it doesn't exist
    mv *_${name}_*.mp4 /home/peter/Videos/${name}    # move the files
done
(Note that parsing the output of ls breaks on file names containing spaces; the find-based loop below avoids that.)
This bash loop should do what you need:
find dir -maxdepth 1 -type f -iname '*mp4' -print0 | while IFS= read -r -d '' file
do
    if [[ $file =~ _([^_]+)_ ]]; then
        TARGET_DIR="/PARENTPATH/${BASH_REMATCH[1]}"
        mkdir -p "$TARGET_DIR"
        mv "$file" "$TARGET_DIR"
    fi
done
It'll only move the files if it finds a directory token.
I used _([^_]+)_ to make sure there is no _ in the dir name, but you didn't specify what you want if there are more than two _ in the file name. _(.+)_ will work if foo_bar_baz_buz.mp4 is meant to go into directory bar_baz.
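You can check what each pattern captures directly in an interactive shell, for example:
f=foo_bar_baz_buz.mp4
[[ $f =~ _([^_]+)_ ]] && echo "${BASH_REMATCH[1]}"   # prints: bar
[[ $f =~ _(.+)_ ]] && echo "${BASH_REMATCH[1]}"      # prints: bar_baz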
And this answer to a different question explains the find | while logic: https://stackoverflow.com/a/64826172/3216427 .
EDIT: As per a question in the comments, I added mkdir -p to create the target directory. The -p means recursively create any part of the path that doesn't already exist, and will not error out if the full directory already exists.

How to remove all but a few selected files in a directory?

I want to remove all files in a directory except some, through a shell script. The names of the files to keep will be passed as command-line arguments, and the number of arguments may vary.
Suppose the directory has these 5 files:
1.txt, 2.txt, 3.txt, 4.txt, 5.txt
I want to remove two files from it through a shell script using their file names. Also, the number of files may vary.
There are several ways this could be done, but the approach that's most robust and performs best with large directories is probably to construct a single find command.
#!/usr/bin/env bash
# first argument is the directory name to search in
dir=$1; shift
# subsequent arguments are filenames to absolve from deletion
find_args=( )
for name; do
    find_args+=( -name "$name" -prune -o )
done
if [[ $dry_run ]]; then
    exec find "$dir" -mindepth 1 -maxdepth 1 "${find_args[@]}" -print
else
    exec find "$dir" -mindepth 1 -maxdepth 1 "${find_args[@]}" -exec rm -f -- '{}' +
fi
Thereafter, to list files which would be deleted (if the above is in a script named delete-except):
dry_run=1 delete-except /path/to/dir 1.txt 2.txt
or, to actually delete those files:
delete-except /path/to/dir 1.txt 2.txt
A simple, straightforward way could be using the GLOBIGNORE variable.
GLOBIGNORE is a colon-separated list of patterns defining the set of filenames to be ignored by pathname expansion. If a filename matched by a pathname expansion pattern also matches one of the patterns in GLOBIGNORE, it is removed from the list of matches.
Thus, the solution is to iterate through the command-line args, appending the file names to the list, then call rm *. Don't forget to unset the GLOBIGNORE variable at the end.
#!/bin/bash
for arg in "$@"
do
    if [ "$arg" = "$1" ]
    then
        GLOBIGNORE=$arg
    else
        GLOBIGNORE=${GLOBIGNORE}:$arg
    fi
done
rm *
unset GLOBIGNORE
In case you had set GLOBIGNORE before, you can store the value in a temporary variable and restore it at the end, as sketched below.
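A minimal sketch of that save-and-restore, with hypothetical file names:
saved_globignore=$GLOBIGNORE    # save any pre-existing value
GLOBIGNORE="1.txt:2.txt"        # patterns for the files to keep (hypothetical names)
rm *                            # note: setting GLOBIGNORE also makes * match dotfiles
GLOBIGNORE=$saved_globignore    # restore the previous value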
We can accomplish this in pure Bash, without the need for any external tools:
#!/usr/bin/env bash
# build an associative array that contains all the filenames to be preserved
declare -A skip_list
for f in "$@"; do
    skip_list[$f]=1
done
# walk through all files and build an array of files to be deleted
declare -a rm_list
for f in *; do                           # loop through all files
    [[ -f "$f" ]] || continue            # not a regular file
    [[ "${skip_list[$f]}" ]] && continue # skip this file
    rm_list+=("$f")                      # now it qualifies for rm
done
# remove the files
printf '%s\0' "${rm_list[@]}" | xargs -0 rm -- # Thanks to Charles' suggestion
This solution will also work for files that have whitespaces or glob characters in them.
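For example, if the script above is saved as keep-files.sh (a hypothetical name) and run from inside the target directory:
bash keep-files.sh 1.txt 2.txt   # deletes every regular file except 1.txt and 2.txt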
Thanks all for your answers, I have figured out my solution. Below is the solution that worked for me:
find /home/mydir -type f | grep -vw "goo" | xargs rm

How to prefix folders and files within?

I'm stuck looking for a one-liner to add a prefix to all subfolder names and file names in a directory
eg "AAA" in the examples below
/folder/AAAfile.txt
/folder/AAAread/AAAdoc.txt
/folder/AAAread/AAAfinished/AAAread.txt
I've tried using xargs and find, but can't get them to go recursively through the subdirectories and their contents. Any suggestions?
James
You could use something like this:
find . -mindepth 1 | sort -r | xargs -l -I {} bash -c 'mv "$1" "${1%/*}/AAA${1##*/}"' _ {}
The sort -r processes deeper paths first, so entries are renamed before their parent directories. Tested with your folder structure, executed from the root (the directory that contains AAAfile.txt).
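If you want to check the renames before committing to them, a dry-run sketch that only prints each mv command:
find . -mindepth 1 | sort -r | xargs -l -I {} bash -c 'echo mv "$1" "${1%/*}/AAA${1##*/}"' _ {}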
The following script should meet your need (run it from inside your folder directory):
for i in $(ls -R); do
    dname=$(dirname "$i")
    fname=AAA$(basename "$i")
    if [ -f "$i" ]
    then
        mv "$i" "$dname/$fname"
    fi
    # this could be merged with the previous condition, but is kept separate
    # to avoid an invalid-directory warning
    if [ -d "$i" ]
    then
        mv "$i" "$dname/$fname"
    fi
done

How can I delete the directory with the highest number name?

I have a directory containing sub-directories, some of whose names are numbers. Without looking, I don't know what the numbers are. How can I delete the sub-directory with the highest number name? I reckon the solution might sort the sub-directories into reverse order and select the first sub-directory that begins with a number but I don't know how to do that. Thank you for your help.
cd $yourdir             #go to that dir
ls -q -p |              #list all entries directly in the dir, making directories end with /
grep '^[0-9][0-9]*/$' | #select directories (ending with /) whose names are made of digits
sort -n |               #sort numerically
tail -n1 |              #select the last one (the largest)
xargs -r rmdir          #or rm -r if nonempty
I recommend running it first without the xargs -r rmdir (or xargs -r rm -r) part, to make sure you're deleting the right thing.
A pure Bash solution:
#!/bin/bash
shopt -s nullglob extglob
# Make an array of all the dir names that only contain digits
dirs=( +([[:digit:]])/ )
# If none found, exit
if ((${#dirs[@]}==0)); then
    echo >&2 "No dirs found"
    exit
fi
# Loop through all elements of array dirs, saving the greatest number
max=${dirs[0]%/}
for i in "${dirs[@]%/}"; do
    ((10#$max<10#$i)) && max=$i
done
# Finally, delete the dir with the largest number found
echo rm -r "$max"
Note:
This will have unpredictable behavior when there are dirs with the same number written differently, e.g., 2 and 0002 (the 10# prefix forces base-10 arithmetic, so leading zeros don't trigger octal parsing, but both spellings compare equal).
It will fail if the numbers overflow Bash's integer arithmetic.
It doesn't take into account negative numbers and non-integer numbers.
Remove the echo in the last line if you're happy with it.
It is to be run from within your directory.
Let's make some directories to test the script:
mkdir test; cd test; mkdir $(seq 100)
Now
find -mindepth 1 -maxdepth 1 -type d | cut -c 3- | sort -k1n | tail -n 1 | xargs -r echo rm -r
Result:
rm -r 100
Now, remove the word echo from the command and xargs will execute rm -r 100.
