How to randomly distribute the files across 3 folders using Bash script? - linux

I have many subdirectories and files in the folder mydata/files. I want to take files and copy them randomly into 3 folders:
train
test
dev
For example, mydata/files/ss/file1.wav could be copied into train folder:
train
file1.wav
And so on and so forth, until all files from mydata/files are copied.
How can I do it using Bash script?

Steps to solve this:
Need to gather all the files in the directory
Assign directories to a map
Generate random number for each file
Copy the file to the corresponding directory
The script:
#!/bin/bash
original_dir=mydata/files
# define the 3 directories to copy into, as an associative array (like a map)
declare -A target_dirs
target_dirs[0]="/path/to/train/"
target_dirs[1]="/path/to/test/"
target_dirs[2]="/path/to/dev/"
# recursively find all the files, and loop through them
find "$original_dir" -type f | while IFS= read -r file ; do
# pick a random number from 0 to (size of target_dirs - 1)
num=$((RANDOM % ${#target_dirs[@]}))
# get that index in the associative array
target_dir=${target_dirs[$num]}
# copy the file to that directory
echo "Copying $file to $target_dir"
cp "$file" "$target_dir"
done
Things you'll need to change:
Change the destination of the directories to match the path in your system
Add executable privileges to the file so that you can run it.
chmod 744 copy_script_name
./copy_script_name
Notes:
This script should be easy to extend to any number of directories if needed (just add the new directories, and the script will adjust the range of the random numbers).
If you only need the files in the current directory (not recursively), you can add -maxdepth 1 to the find command (see How to list only files and not directories of a directory Bash?).
I was able to leverage previous bash experience plus the bash documentation (it's generally pretty good). If you end up writing any scripts, be very careful about spaces: quote your variables so that file names containing spaces are not split into several words.
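On the point about spaces, here is a minimal sketch of the same idea that tolerates whitespace in file names. It assumes bash and a find that supports -print0 (GNU and BSD find both do). The function name split_randomly and the demo paths are made up for illustration and are not part of the original answer.

```bash
#!/usr/bin/env bash
# Sketch: copy every file under a source tree into one of several
# target directories chosen at random. All names here are illustrative.

split_randomly() {
    local src=$1; shift
    local targets=("$@")          # remaining arguments are the target dirs
    local file
    mkdir -p "${targets[@]}"
    # -print0 with read -d '' keeps names containing spaces or newlines intact
    while IFS= read -r -d '' file; do
        cp -- "$file" "${targets[RANDOM % ${#targets[@]}]}/"
    done < <(find "$src" -type f -print0)
}

# Demo in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/files/ss" "$demo/files/tt"
touch "$demo/files/ss/file 1.wav" "$demo/files/ss/file2.wav" "$demo/files/tt/file3.wav"
split_randomly "$demo/files" "$demo/train" "$demo/test" "$demo/dev"
```

Note that files with the same base name in different subdirectories would overwrite each other if they land in the same target folder; whether that matters depends on your data.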

You can create a temp file, write your destination folders to it, then pick one per file with the shuf command. Note that this moves the files; use cp instead of mv if the originals should stay in place.
dest=$(mktemp)
printf '%s\n' test dev train > "$dest"
while IFS= read -r file; do
mv "$file" "$(shuf -n1 < "$dest")/."
done < <(find mydata/files -type f 2>/dev/null)
rm -f "$dest"
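If your shuf is the GNU coreutils one, the temp file is optional, because -e treats its arguments as the input lines. A small sketch under that assumption; pick_dir is a hypothetical helper name and the demo files are invented.

```bash
#!/usr/bin/env bash
# Sketch: pick a destination at random without a temp file.
# Assumes GNU shuf, whose -e option shuffles its arguments.

pick_dir() {
    shuf -n1 -e "$@"
}

demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/train" "$demo/test" "$demo/dev"
touch "$demo/src/a.wav" "$demo/src/b.wav" "$demo/src/c.wav" "$demo/src/d.wav"

# copy each file into a randomly chosen folder
while IFS= read -r file; do
    cp -- "$file" "$(pick_dir "$demo/train" "$demo/test" "$demo/dev")/"
done < <(find "$demo/src" -type f)
```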

Related

How to iterate through folders and subfolders to delete n number of files randomly?

I have 4 folders (named W1, W3, W5, W7) and each one of those folders has approximately 30 subfolders (named M1 - M30). Each subfolder contains 24 .tif files (named Image_XX.tif).
I need to randomly "sample" each subfolder, more specifically, I need to get rid of 14 .tif files while keeping 10 .tif files in each subfolder.
I figure that deleting 14 files at random is easier than choosing 10 files at random and copying them to new subfolders within folders.
I thought that writing a bash script to do so would be the way, but I'm fairly new to programming and I'm stuck.
Below is one of the several scripts I've tried:
#!/bin/bash
for dir in /Users/Fer/Subsets/W1/; do
if [ -d "$dir" ]; then
cd "$dir"
gshuf -zn14 -e *.tif | xargs -0 rm
cd ..
fi
done
It runs for a second, but nothing seems to happen. Any help is appreciated.
For every subdirectory.
Find all files.
Choose a random number of files from the list.
Delete.
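I think something along these lines:
for dir in /Users/Fer/Subsets/W*/M*/; do
printf "%s\0" "$dir"*.tif |
shuf -z -n 14 |
xargs -0 -t echo rm -v
done
The echo makes this a dry run that only prints the rm commands; remove it once the output looks right.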
Used some of the suggestions above and the code below worked:
for dir in /Users/Fer/Subsets/W*/M*; do
gshuf -zn14 -e "$dir"/*.tif | xargs -0 rm
done
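A sketch of the "keep N, delete the rest" variant, which does not depend on each subfolder holding exactly 24 files. It assumes GNU shuf (installed as gshuf by Homebrew coreutils on macOS) and bash 4.4 or newer for mapfile -d; keep_random is a hypothetical name, and the demo tree only imitates the layout from the question.

```bash
#!/usr/bin/env bash
# Sketch: keep a fixed number of randomly chosen .tif files per folder.
# Assumes GNU shuf and bash >= 4.4 (for mapfile -d '').

keep_random() {
    local dir=$1 keep=$2
    local files=()
    # shuffle the .tif files; NUL delimiters survive odd file names
    mapfile -d '' -t files < <(find "$dir" -maxdepth 1 -type f -name '*.tif' -print0 | shuf -z)
    # nothing to delete if the folder already has $keep files or fewer
    (( ${#files[@]} > keep )) || return 0
    # everything after the first $keep shuffled entries is removed
    rm -- "${files[@]:keep}"
}

# Demo in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/W1/M1" "$demo/W1/M2"
for i in $(seq -w 1 24); do
    touch "$demo/W1/M1/Image_$i.tif" "$demo/W1/M2/Image_$i.tif"
done
for dir in "$demo"/W*/M*/; do
    keep_random "$dir" 10
done
```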

Linux - How to zip files per subdirectory separately

I have directory structure like this.
From this I want to create different zip files such as
data-A-A_1-A_11.zip
data-A-A_1-A_12.zip
data-B-B_1-B_11.zip
data-C-C_1-C_11.zip
while IFS= read -r line;
do
echo "zip -r ${line//\//-}.zip $line";
# zip -r "${line//\//-}.zip" "$line"
done <<< "$(find data -maxdepth 3 -mindepth 2 -type d)"
Feed the result of a find command into a while loop. The find command searches the directory data for directories only, at depths 2 to 3 below it. In the while loop we use bash parameter expansion to convert every forward slash to "-" and append ".zip", which gives a zip command for each directory. Once you are happy that the echoed zip commands look right, uncomment the actual zip command.
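To see what the ${line//\//-} expansion produces before involving zip at all, here is a small sketch. The directory names mirror the archive names in the question, and zip_name is a made-up helper. It uses -mindepth 3 so that only the third-level directories are listed, which is an assumption about the intended layout.

```bash
#!/usr/bin/env bash
# Sketch: derive archive names from directory paths (no zip call).

zip_name() {
    # replace every "/" in the path with "-" and append .zip
    printf '%s.zip\n' "${1//\//-}"
}

# Demo tree in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/data/A/A_1/A_11" "$demo/data/A/A_1/A_12" "$demo/data/B/B_1/B_11"
cd "$demo" || exit 1

# print one archive name per third-level directory
while IFS= read -r dir; do
    zip_name "$dir"
done < <(find data -mindepth 3 -maxdepth 3 -type d | sort)
```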

Add name of each directory to files inside the corresponding directory in linux

I have a directory containing multiple directories. here is an example of the list of directories:
dir1_out
dir2_out
dir3_out
dir4_out
Each directory contains multiple files.
For example, dir1_out contains the following files:
file1
file2
file3
In the same fashion, the other directories contain several files.
I would like to add the name of each directory to file name in the corresponding directory.
I would like to have the following result in first directory(dir1_out):
dir1.file1
dir1.file2
dir1.file3
Since I have around 50 directories, I would like to write a loop that takes the name of each directory and adds it to the beginning of every file inside.
Do you have any idea how I can do that in Linux?
A simple bash one-liner, if there aren't too many files, is:
for p in */*; do [ -f "$p" ] && mv -i "$p" "${p%/*}/${p/\//.}"; done
This uses parameter expansions to generate new filenames, after checking that we are trying to rename an actual file - See bash manpage descriptions of ${parameter%word} and ${parameter/pattern/string}
If there may be too many files to safely expand them all into a single list:
#!/bin/bash
# -mindepth 2 and -type f skip "." and the directories themselves
find . -mindepth 2 -maxdepth 2 -type f -print |\
while IFS= read -r p; do
p="${p#./}"
mv -i "$p" "${p%/*}/${p/\//.}"
done
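Both versions above produce names like dir1_out.file1. If the prefix should be dir1 as in the question, one possible sketch strips the _out suffix first with ${parameter%word}. It assumes every directory name ends in _out; prefix_files is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: prefix each file with its directory name minus the "_out" suffix.

prefix_files() {
    local root=$1 p dir name prefix
    for p in "$root"/*/*; do
        [ -f "$p" ] || continue
        dir=${p%/*}            # path of the containing directory
        name=${p##*/}          # file name without the directory part
        prefix=${dir##*/}      # directory name only, e.g. dir1_out
        prefix=${prefix%_out}  # drop the trailing _out, giving dir1
        mv -i -- "$p" "$dir/$prefix.$name"
    done
}

# Demo in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/dir1_out" "$demo/dir2_out"
touch "$demo/dir1_out/file1" "$demo/dir1_out/file2" "$demo/dir2_out/file1"
prefix_files "$demo"
ls "$demo/dir1_out"
```

Running it a second time would add the prefix again, so it is meant as a one-off rename.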

Moving a file and renaming it after the directory which contains it on Bash

I'm trying to learn bash on Linux, just for fun. I thought it would be pretty useful to have a .sh that would group together similar files. For example, let's say we have the directory
/home/docs/
Inside the directory we have /mathdocs/, /codingdocs/, etc.
Inside those sub-directories we have doc.txt, in all of them. Same name for all the files on the subdirectories.
Let's say I want to group them together, and I want to move all the files to /home/allthedocs/ and rename them after the directories they were in. (mathdocs.txt, codingdocs.txt, etc.)
How could I do that?
I've tried to create a script based on the ls and cp commands, but I don't know how to take the names of the directories and rename the files after them once I have moved them. I guess it has to be some sort of loop (for X in Y directories), but I don't know how to do it.
You can move and rename your file in one shot with mv, with a loop that grabs all your files through a glob:
#!/bin/bash
dest_dir=/home/allthedocs
cd /home/docs || exit 1 # stop if the directory is missing
for file in */doc.txt; do
[[ -f "$file" ]] || continue # skip if not a regular file
dir="${file%/*}" # get the dir name from path
mv "$file" "$dest_dir/$dir.txt"
done
See this post for more info:
Copying files from multiple directories into a single destination directory
Here is a one-liner that handles whitespace in file names, just as @codeforester's solution does with the glob.
Whitespace is handled by the "-print0" option passed to "find", by the empty internal field separator (IFS) and the -d '' delimiter in the while loop, and by wrapping the file variables in quotes.
The parameter substitution from file2 into file3 gets rid of the leading "./".
The parameter substitution inside the move command removes the remaining slashes to turn the path into a file name (run under /home/docs/):
find . -maxdepth 2 -mindepth 2 -print0 | while IFS= read -r -d '' file; \
do file2=$(printf '%s\n' "$file"); file3=${file2#*\/*}; \
mv "$file2" ../allsil/"${file3//\//}"; done
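A sketch of a variant that names each file after its directory and keeps the file's own extension, rather than hard-coding .txt or concatenating the two names. It assumes one level of sub-directories as in the question; collect_docs and the demo paths are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: move <src>/<dir>/<anything>.<ext> to <dest>/<dir>.<ext>.

collect_docs() {
    local src=$1 dest=$2 file dir ext
    mkdir -p "$dest"
    while IFS= read -r -d '' file; do
        dir=${file%/*}; dir=${dir##*/}   # name of the containing directory
        ext=${file##*.}                  # text after the last dot
        mv -- "$file" "$dest/$dir.$ext"
    done < <(find "$src" -mindepth 2 -maxdepth 2 -type f -name '*.*' -print0)
}

# Demo in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/docs/mathdocs" "$demo/docs/coding docs"
echo math > "$demo/docs/mathdocs/doc.txt"
echo code > "$demo/docs/coding docs/doc.txt"
collect_docs "$demo/docs" "$demo/allthedocs"
```

If a directory holds several files with the same extension, later ones would overwrite earlier ones, so this only fits the one-file-per-directory case described in the question.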

Linux Bash: Move multiple different files into same directory

As a rather novice Linux user, I can't seem to find how to do this.
I am trying to move unique files all in one directory into another directory.
Example:
$ ls
vehicle car.txt bicycle.txt airplane.html train.docx (more files)
I want car.txt, bicycle.txt, airplane.html, and train.docx inside vehicle.
Right now I do this by moving the files individually:
$ mv car.txt vehicle
$ mv bicycle.txt vehicle
...
How can I do this in one line?
You can do
mv car.txt bicycle.txt vehicle/
(Note that the / above is unnecessary, I include it merely to ensure that vehicle is a directory.)
You can test this as follows:
cd #Move to home directory
mkdir temp #Make a temporary directory
touch a b c d #Make test (empty) files ('touch' also updates the modification date of an existing file to the current time)
ls #Verify everything is there
mv a b c d temp/ #Move files into temp
ls #See? They are gone.
ls temp/ #Oh, there they are!
rm -rf temp/ #DESTROY (Be very, very careful with this command)
Shorthand command to move all .txt files
You can try using a wildcard. In the code below, * will match all the files which have any name ending with .txt or .docx, and move them to the vehicle folder.
mv *.txt *.docx vehicle/
If you want to move specific files to a directory
mv car.txt bicycle.txt vehicle/
Edit: As mentioned in a comment, if you are moving files by hand, I suggest using mv -i ..., which warns you if the destination file already exists and lets you choose not to overwrite it. Other 'file destroyer' commands such as cp and rm also have a -i option.
The mv command in Linux allows us to move more than one file into another directory. All you have to do is write the name of each file you want to move, separated by a space.
Following command will help you:
mv car.txt bicycle.txt airplane.html train.docx vehicle
or
mv car.txt bicycle.txt airplane.html train.docx vehicle/
both of them will work.
You can move multiple files to a specific directory with the mv command.
In your scenario it can be done with:
mv car.txt bicycle.txt airplane.html train.docx vehicle/
Note that the last argument is the destination, and everything between mv and the destination is a source.
Another scenario is that the destination is not in the current directory; in that case, use an absolute path in place of vehicle/.
Note: An absolute path always starts with /, which means it is resolved from the root directory.
I have written a small bash script that moves multiple files (matched by a pattern) from multiple directories (matched by a pattern) to a single location, using the mv and find commands.
#!/bin/bash
for i in $(find /path/info/*/*.fna -type f) # find files and return their path
do
mv -iv "$i" -t ~/path/to/destination/directory # move files
done
$() is command substitution (it expands to the output of the command inside it); because that output is split on whitespace, this loop assumes paths without spaces
/*/ is a wildcard matching any directory; you can replace it with any wildcard expression
*.fna matches any file with the .fna extension
-type f restricts the results to regular files
-i in mv prompts before overwriting (extra caution in case the wildcard expression was wrong)
-v is for verbose output
-t names the destination directory (a GNU mv option)
NOTE: the -i and -v flags are optional
Hope this helps
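A sketch that avoids the word-splitting problem of looping over $(find ...) by letting find call mv directly. It assumes GNU mv for the -t option; the .fna layout below is invented for the demo.

```bash
#!/usr/bin/env bash
# Sketch: move matching files from several directories into one place.
# Assumes GNU mv (for -t). Paths are illustrative.

demo=$(mktemp -d)
mkdir -p "$demo/info/sample one" "$demo/info/sample2" "$demo/dest"
touch "$demo/info/sample one/a.fna" "$demo/info/sample2/b.fna" "$demo/info/sample2/notes.txt"

# -exec ... {} + passes many paths to one mv call, each as its own argument,
# so directory or file names containing spaces are not split
find "$demo/info" -mindepth 2 -maxdepth 2 -type f -name '*.fna' \
    -exec mv -t "$demo/dest" -- {} +
```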
