How to iterate through folders and subfolders to delete n number of files randomly? - linux

I have 4 folders (named W1, W3, W5, W7) and each one of those folders has approximately 30 subfolders (named M1 - M30). Each subfolder contains 24 .tif files (named Image_XX.tif).
I need to randomly "sample" each subfolder, more specifically, I need to get rid of 14 .tif files while keeping 10 .tif files in each subfolder.
I figure that deleting 14 files at random is easier than choosing 10 files at random and copying them to new subfolders within folders.
I thought that writing a bash script to do so would be the way, but I'm fairly new to programming and I'm stuck.
Below is one of the several scripts I've tried:
#!/bin/bash
for dir in /Users/Fer/Subsets/W1/; do
    if [ -d "$dir" ]; then
        cd "$dir"
        gshuf -zn14 -e *.tif | xargs -0 rm
        cd ..
    fi
done
It runs for a second, but nothing seems to happen. Any help is appreciated.

For every subdirectory:
Find all its .tif files.
Choose 14 of them at random.
Delete them.
I think something along these lines (GNU shuf; the echo makes it a dry run that only prints the rm commands):
for dir in /Users/Fer/Subsets/W*/M*/; do
    printf "%s\0" "$dir"/*.tif |
        shuf -z -n 14 |
        xargs -0 -t echo rm -v
done
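Once the printed commands look right, a sketch of the live version with the echo (and the -t trace) removed:
for dir in /Users/Fer/Subsets/W*/M*/; do
    printf "%s\0" "$dir"/*.tif |
        shuf -z -n 14 |
        xargs -0 rm -v
done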

Used some of the suggestions above and the code below worked:
for dir in /Users/Fer/Subsets/W*/M*; do
    gshuf -zn14 -e "$dir"/*.tif | xargs -0 rm
done
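For reference, gshuf is GNU shuf as installed by Homebrew's coreutils on macOS; on Linux the same loop should work with plain shuf:
for dir in /Users/Fer/Subsets/W*/M*; do
    shuf -zn14 -e "$dir"/*.tif | xargs -0 rm
done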

Related

How to randomly distribute the files across 3 folders using Bash script?

I have many subdirectories and files in the folder mydata/files. I want to take files and copy them randomly into 3 folders:
train
test
dev
For example, mydata/files/ss/file1.wav could be copied into train folder:
train
    file1.wav
And so on and so forth, until all files from mydata/files are copied.
How can I do it using Bash script?
Steps to solve this:
Need to gather all the files in the directory
Assign directories to a map
Generate random number for each file
Copy the file to the corresponding directory
The script:
#!/bin/bash
original_dir=test/

## define 3 directories to copy into
# define an associative array (like a map)
declare -A target_dirs
target_dirs[0]="/path/to/train/"
target_dirs[1]="/path/to/test/"
target_dirs[2]="/path/to/dev/"

# recursively find all the files, and loop through them
find "$original_dir" -type f | while IFS= read -r file ; do
    # pick a random index between 0 and (size of target_dirs - 1)
    num=$(( RANDOM % ${#target_dirs[@]} ))
    # get the entry at that index in the associative array
    target_dir=${target_dirs[$num]}
    # copy the file to that directory
    echo "Copying $file to $target_dir"
    cp "$file" "$target_dir"
done
Things you'll need to change:
Change the destination of the directories to match the path in your system
Add executable privileges to the file so that you can run it.
chmod 744 copy_script_name
./copy_script_name
Notes:
This script should easily be extendable to any number of directories if needed (just add the new directories, and the script will adjust the random numbers).
If you need to only get the files in the current directory (not recursively), you can add -maxdepth 1 (see How to list only files and not directories of a directory Bash?).
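For example, the find invocation in the script above would become:
find "$original_dir" -maxdepth 1 -type f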
I was able to leverage previous bash experience plus the bash documentation (it's generally pretty good). If you end up writing any scripts, be very careful about spaces.
You can create a temp file, echo your destination folders to it, then use the shuf command.
dest=$(mktemp)
echo -e "test\ndev\ntrain" >> "$dest"

while IFS= read -r file; do
    mv "$file" "$(shuf -n1 < "$dest")/."
done < <(find mydata/files -type f 2>/dev/null)

rm -f "$dest"
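If you'd rather skip the temp file, shuf can pick from its arguments directly with -e; a sketch of the same loop:
while IFS= read -r file; do
    mv "$file" "$(shuf -n1 -e train test dev)/."
done < <(find mydata/files -type f 2>/dev/null)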

How can I make a bash script where I can move certain files to certain folders which are named based on a string in the files?

This is the script that I'm using to move files with the string "john" in them (124334_john_rtx.mp4, 3464r64_john_gty.mp4, etc.) to a certain folder:
find /home/peter/Videos -maxdepth 1 -type f -iname '*john*' -print0 | \
    xargs -0 --no-run-if-empty echo mv --target-directory=/home/peter/Videos/john/
Since I have a large amount of videos with various names written in the files, I want to make a bash script which moves videos with a string between the underscores to a folder named based on the string between the underscores. So for example if a file is named 4345655_ben_rts.mp4 the script would identify the string "ben" between the underscores, create a folder named as the string between the underscores, which in this case is "ben", and move the file to that folder. Any advice is greatly appreciated!
My way to do it:
cd /home/peter/Videos  # change directory to your start directory
for name in $(ls *.mp4 | cut -d'_' -f2 | sort -u)  # loop over the names found after the first underscore
do
    mkdir -p /home/peter/Videos/"${name}"  # create the target directory if it doesn't exist
    mv *_"${name}"_*.mp4 /home/peter/Videos/"${name}"  # move the files
done
This bash loop should do what you need:
find dir -maxdepth 1 -type f -iname '*mp4' -print0 | while IFS= read -r -d '' file
do
if [[ $file =~ _([^_]+)_ ]]; then
TARGET_DIR="/PARENTPATH/${BASH_REMATCH[1]}"
mkdir -p "$TARGET_DIR"
mv "$file" "$TARGET_DIR"
fi
done
It'll only move the files if it finds a directory token.
I used _([^_]+)_ to make sure there is no _ in the dir name, but you didn't specify what you want if there are more than two _ in the file name. _(.+)_ will work if foo_bar_baz_buz.mp4 is meant to go into directory bar_baz.
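A quick way to see the difference between the two patterns (the captured group is shown in the comments):
file=foo_bar_baz_buz.mp4
[[ $file =~ _([^_]+)_ ]] && echo "${BASH_REMATCH[1]}"   # bar
[[ $file =~ _(.+)_ ]] && echo "${BASH_REMATCH[1]}"      # bar_baz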
And this answer to a different question explains the find | while logic: https://stackoverflow.com/a/64826172/3216427 .
EDIT: As per a question in the comments, I added mkdir -p to create the target directory. The -p means recursively create any part of the path that doesn't already exist, and will not error out if the full directory already exists.

How to list last 10 files in all the subdirectories in Linux

I have a directory with multiple subdirectories under it. I want to display the last 10 files recursively from all the subdirectories; it would also be helpful if I could pass some date parameter to the listing.
Save the names of all directories (ls -R prints each directory header with a trailing colon, which sed strips):
ls -R "$PWD"/* | grep ':$' | sed 's/:$//' > allDirectories
The next line shows 10 files of each directory (only if the directory names don't contain spaces). You can add more options to the ls command (e.g. sort by change time using -c):
for directory in $(cat allDirectories); do printf '\n\n\n%s\n' "$directory"; ls "$directory" | head -n 10; done 2> /dev/null
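As for the date parameters mentioned in the question, GNU find can filter by modification date directly; a sketch (the dates are placeholders):
# files modified in January 2021, newest first
find . -type f -newermt '2021-01-01' ! -newermt '2021-02-01' -printf '%T@ %p\n' | sort -rn | cut -d' ' -f2-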

How to copy the contents of a folder to multiple folders based on number of files?

I want to copy the files from a folder (named: 1) to multiple folders based on the number of files (here: 50).
The code given below works. It transfers all the files from the folder to the subfolders based on the number of files, and then copies all the files back to the initial folder.
However, I need something cleaner and more efficient. Apologies for the mess below, I'm a newbie.
bf=1 # breakfolder
cd 1 # the folder I want to copy from, contains 179 files
flies_exist=$(ls -1q * | wc -l) # count the number of files in folder 1
# move 50 files at a time from 1 to various subfolders
while [ $flies_exist -gt 50 ]
do
    mkdir ../CompiledPdfOutput/temp/1-$bf
    set --
    for f in .* *; do
        [ "$#" -lt 50 ] || break
        [ -f "$f" ] || continue
        [ -L "$f" ] && continue
        set -- "$@" "$f"
    done
    mv -- "$@" ../CompiledPdfOutput/temp/1-$bf/
    flies_exist=$(ls -1q * | wc -l)
    bf=$(($bf + 1))
done
# move the rest of the files into one final subdir
mkdir ../CompiledPdfOutput/temp/1-$bf
set --
for f in .* *; do
    [ "$#" -lt 50 ] || break
    [ -f "$f" ] || continue
    [ -L "$f" ] && continue
    set -- "$@" "$f"
done
mv -- "$@" ../CompiledPdfOutput/temp/1-$bf/
# get out of 1
cd ..
# copy back the contents from the subdirs to 1
find CompiledPdfOutput/temp/ -exec cp {} 1 \;
The required directory structure is:
               parent
         ________|________
        |                 |
        1         CompiledPdfOutput
        |                 |
      (179)             temp
                          |
           ----------------------------
           |       |       |       |
          1-1     1-2     1-3     1-4
          (50)    (50)    (50)    (29)
The number inside "()" denotes the number of files.
BTW, the final step of my code gives this warning; I'd be glad if anyone could explain what's happening and suggest a solution.
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-4'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-3'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-1'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-2'
I don't want to copy the directories as well, just the files, so giving -r would be bad.
Assuming that you need something more compact/efficient, you can leverage existing tools (find, xargs) to create a pipeline, eliminating the need to program each step in bash.
The following will move the files into the split folders. It finds the files, groups them 50 per output folder, uses awk to generate the output folder names, and moves the files (the echo makes it a dry run; remove it to actually move). Solution not as elegant as the original one :-(
find 1 -type f |
    xargs -L50 echo |
    awk '{ print "CompiledPdfOutput/temp/1-" NR, $0 }' |
    xargs -L1 echo mv -t
As a side note, the current script moves the files from the '1' folder to the numbered folders, and then copies the files back to the original folder. Why not just copy the files to the numbered folders? You can use 'cp -p' to preserve timestamps, if that's needed.
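A sketch of the pipeline above adjusted to copy instead of move (-p preserves timestamps; as before, the echo makes it a dry run, and the target folders must already exist):
find 1 -type f |
    xargs -L50 echo |
    awk '{ print "CompiledPdfOutput/temp/1-" NR, $0 }' |
    xargs -L1 echo cp -p -t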
Supporting file names with newlines (and spaces)
A clarification to the question indicates the solution should work with file names that contain embedded newlines (and white space). This requires a minor change to use the NUL character as separator.
# Count the input files, and compute the number of output folders (50 files each)
FILE_COUNT=$(find 1 -type f -print0 | xargs -0 -I{} echo X | wc -l)
DIR_COUNT=$(( (FILE_COUNT + 49) / 50 ))

# Remove the previous tree, and create the output folders
OUT=CompiledPdfOutput/temp
rm -rf "$OUT"
eval mkdir -p "$OUT"/1-{1..$DIR_COUNT}

# Process the files, using NUL as separator
find 1 -type f -print0 |
    awk -vRS="\0" -v"OUT=$OUT" 'NR%50 == 1 { printf "%s/1-%d%s",OUT,1+int(NR/50),RS } { printf "%s", ($0 RS) }' |
    xargs -0 -L51 -t mv -t
I did limited testing with both spaces and newlines in file names. Looks OK on my machine.
I find a couple of issues with the posted script:
The logic of copying a maximum of 50 files per folder is overcomplicated, and the code duplication of an entire loop is error-prone.
It reuses the $@ array of positional parameters for internal storage purposes. This variable was not intended for that; it would be better to use a new dedicated array.
Instead of moving files to sub-directories and then copying them back, it would be simpler to just copy them in the first step, without ever moving.
Parsing the output of ls is not recommended.
Consider this alternative, simpler logic:
Initialize an empty array to_copy, to keep files that should be copied
Initialize a folder counter, to use to compute the target folder
Loop over the source files
Apply filters as before (skip if not file)
Add file to to_copy
If to_copy contains the target number of files, then:
Create the target folder
Copy the files contained in to_copy
Reset the content of to_copy to empty
Increment folder_counter
If to_copy is not empty
Create the target folder
Copy the files contained in to_copy
Something like this:
#!/usr/bin/env bash
set -euo pipefail

distribute_to_folders() {
    local src=$1
    local target=$2
    local max_files=$3
    local to_copy=()
    local folder_counter=1
    for file in "$src"/* "$src"/.*; do
        [ -f "$file" ] || continue
        to_copy+=("$file")
        if (( ${#to_copy[@]} == max_files )); then
            mkdir -p "$target/$folder_counter"
            cp -v "${to_copy[@]}" "$target/$folder_counter/"
            to_copy=()
            ((++folder_counter))
        fi
    done
    if (( ${#to_copy[@]} > 0 )); then
        mkdir -p "$target/$folder_counter"
        cp -v "${to_copy[@]}" "$target/$folder_counter/"
    fi
}

distribute_to_folders "$@"
To distribute files in path/to/1 into directories of maximum 50 files under path/to/compiled-output, you can call this script with:
./distribute.sh path/to/1 path/to/compiled-output 50
BTW, the final step of my code gives this warning; I'd be glad if anyone could explain what's happening and suggest a solution.
Sure. The command find CompiledPdfOutput/temp/ -exec cp {} 1 \; finds files and directories, and tries to copy them. When cp encounters a directory and the -r parameter is not specified, it issues the warning you saw. You could add a filter for files, with -type f. If there are not excessively many files then a simple shell glob will do the job:
cp -v CompiledPdfOutput/temp/*/* 1
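Or, adding the -type f filter mentioned above to your original command:
find CompiledPdfOutput/temp/ -type f -exec cp {} 1 \;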
This will copy the files to multiple folders of fixed size. Change source, target, and folderSize as per your requirements. This also works with filenames containing special characters (e.g. 'file 131!##$%^&*()_+-=;?').
source=1
target=CompiledPdfOutput/temp
folderSize=50

find "$source" -type f -printf "\"%p\"\0" \
    | xargs -0 -L$folderSize \
    | awk '{system("mkdir -p '$target'/1-" NR); printf "'$target'/1-" NR " %s\n", $0}' \
    | xargs -L1 cp -t

How can I generate a folder with the last X files added?

So I have a huge folder full of subfolders with tons of files, and I add files to it all the time.
I need a subfolder in the root of that folder with symlinks to the last 10-20 files added, so that I can quickly find the things I recently added. This is located on a NAS, but I have a Linux box running Arch connected through NFS, so I assume the best way is to run a bash script with a find command followed by a loop of ln -sf, but I can't do it safely without help.
Something like this is required:
mkdir -p subfolder
find /dir/ -type f -printf '%T@ %p\n' | sort -n | tail -n 10 | cut -d' ' -f2- | while IFS= read -r file ; do ln -s "$file" subfolder ; done
This will create symlinks in subfolder pointing to the 10 most recently modified files in the directory tree rooted at /dir/.
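Since you add files all the time, you may want to re-run this and have the subfolder track only the current batch; a minimal sketch, assuming the subfolder contains nothing but these links (note that find -type f does not match the symlinks themselves, so they won't be re-linked):
mkdir -p subfolder
# drop the links from the previous run
find subfolder -maxdepth 1 -type l -delete
# recreate them; -sf overwrites in case of name collisions
find /dir/ -type f -printf '%T@ %p\n' | sort -n | tail -n 10 | cut -d' ' -f2- | while IFS= read -r file ; do ln -sf "$file" subfolder ; done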
You could just create a shell function like:
recent() { ls -lt ${1+"$@"} | head -n 20; }
which will give you a listing of the 20 most recent items in the specified directories, or the current directory if no arguments are given.
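For example:
recent              # 20 most recent items in the current directory
recent ~/Videos     # 20 most recent items in ~/Videos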
