How to delete the first 50 directories within a directory in Linux bash

I am looking to run a script which moves 50 directories to a new directory; once it has carried out that action, it should delete those 50 from the original directory.
I have the below so far in my bash script:
cd /folder1/subfolder1/directories
mv `ls | head -50` ../subfolder2/

cd /folder1/subfolder1/directories
dirs=( ./*/ )                          # an array of directories
mv -t ../subfolder2/ "${dirs[@]:0:50}" # first 50 array elements
For GNU coreutils mv, -t is the "target" (aka, destination) directory. This can be very handy if there are hundreds/thousands of files to move (more than can fit in one command):
some-process-that-produces-filenames-on-stdout | xargs mv -t dest_dir/
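If the directory names may contain spaces, a NUL-delimited pipeline is a safer sketch (assuming GNU find, head, and xargs; note that find does not return names in alphabetical order):

find . -mindepth 1 -maxdepth 1 -type d -print0 |
    head -z -n 50 |
    xargs -0 mv -t ../subfolder2/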

Related

Is it possible in this command to cd into the directory that's printed in the output?

When I do ls | grep -e *-folder1 it prints my-folder1, which is the name of the folder matched by the command in the current directory.
Is there a way I can add something like cd to change into this directory? This is more of an attempt to learn Bash or commands on Linux, rather than about doing what I am trying to accomplish.
You could do
ls | grep -- -folder1 | while read -r dir
do
    cd "$dir"
    # do things in $dir
    cd .. # back out so the next iteration starts from the original directory
done
# do things in the original directory
but parsing the output of ls is not recommended. You could instead use globbing:
for dir in *-folder1*
do
    cd "$dir"
    # do things in $dir
    cd .. # need to back out again
done
# do things in the original directory
If the purpose isn't to grep on all folders matching a certain pattern and to cd down into each one of them, but to simply cd into a directory ending with -folder1, then:
cd *-folder1
If you get zero or multiple hits, cd will show an error.
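If you want to guard against that, here is a minimal sketch (reusing the pattern from the question) that only changes directory when the glob matches exactly one existing directory:

matches=( *-folder1/ )
# cd only when there is exactly one existing match; otherwise report the problem
if [ "${#matches[@]}" -eq 1 ] && [ -d "${matches[0]}" ]; then
    cd "${matches[0]}"
else
    echo "expected exactly one *-folder1 directory, found: ${matches[*]}" >&2
fi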

How to iterate through folders and subfolders to delete n files randomly?

I have 4 folders (named W1, W3, W5, W7) and each one of those folders has approximately 30 subfolders (named M1 - M30). Each subfolder contains 24 .tif files (named Image_XX.tif).
I need to randomly "sample" each subfolder; more specifically, I need to get rid of 14 .tif files while keeping 10 .tif files in each subfolder.
I figure that deleting 14 files at random is easier than choosing 10 files at random and copying them to new subfolders within folders.
I thought that writing a bash script to do so would be the way, but I'm fairly new to programming and I'm stuck.
Below is one of the several scripts I've tried:
#!/bin/bash
for dir in /Users/Fer/Subsets/W1/; do
    if [ -d "$dir" ]; then
        cd "$dir"
        gshuf -zn14 -e *.tif | xargs -0 rm
        cd ..
    fi
done
It runs for a second, but nothing seems to happen. Any help is appreciated.
For every subdirectory:
Find all the .tif files.
Choose 14 of them at random from the list.
Delete them.
I think something along these lines:
for dir in /Users/Fer/Subsets/W*/M*/; do
    printf "%s\0" "$dir"*.tif |
        shuf -z -n 14 |
        xargs -0 -t echo rm -v
done
(The echo keeps this as a dry run; remove it to actually delete.)
Used some of the suggestions above and the code below worked:
for dir in /Users/Fer/Subsets/W*/M*; do
    gshuf -zn14 -e "$dir"/*.tif | xargs -0 rm
done

How to copy the contents of a folder to multiple folders based on number of files?

I want to copy the files from a folder (named: 1) to multiple folders based on the number of files (here: 50).
The code given below works: I transferred all the files from the folder to the subfolders based on the number of files, and then copied all the files from the subfolders back to the initial folder.
However, I need something cleaner and more efficient. Apologies for the mess below, I'm a newbie.
bf=1 #breakfolder
cd 1 #the folder from where I want to copy stuff, contains 179 files
files_exist=$(ls -1q * | wc -l) #assign the number of files in folder 1
#move 50 files from 1 to various subfolders
while [ $files_exist -gt 50 ]
do
    mkdir ../CompiledPdfOutput/temp/1-$bf
    set --
    for f in .* *; do
        [ "$#" -lt 50 ] || break
        [ -f "$f" ] || continue
        [ -L "$f" ] && continue
        set -- "$@" "$f"
    done
    mv -- "$@" ../CompiledPdfOutput/temp/1-$bf/
    files_exist=$(ls -1q * | wc -l)
    bf=$(($bf + 1))
done
#move the rest of the files into one final subdir
mkdir ../CompiledPdfOutput/temp/1-$bf
set --
for f in .* *; do
    [ "$#" -lt 50 ] || break
    [ -f "$f" ] || continue
    [ -L "$f" ] && continue
    set -- "$@" "$f"
done
mv -- "$@" ../CompiledPdfOutput/temp/1-$bf/
#get out of 1
cd ..
#copy back the contents from the subdirs to 1
find CompiledPdfOutput/temp/ -exec cp {} 1 \;
The required directory structure is:
parent
├── 1                   (179)
└── CompiledPdfOutput
    └── temp
        ├── 1-1         (50)
        ├── 1-2         (50)
        ├── 1-3         (50)
        └── 1-4         (29)
The number inside "()" denotes the number of files.
BTW, the final step of my code gives this warning; I would be glad if anyone could explain what's happening and suggest a solution.
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-4'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-3'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-1'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-2'
I don't want to copy the directories as well, just the files, so giving -r would be bad.
Assuming that you need something more compact/efficient, you can leverage existing tools (find, xargs) to create a pipeline, eliminating the need to program each step using bash.
The following will move the files into the split folders. It will find the files, group them 50 per folder, use awk to generate the output folder name, and move the files. With the echo left in, it is a dry run that only prints the mv commands. The solution is not as elegant as the original one :-(
find 1 -type f |
    xargs -L50 echo |
    awk '{ print "CompiledOutput/temp/1-" NR, $0 }' |
    xargs -L1 echo mv -t
As a side note, the current script moves the files from the '1' folder to the numbered folders, and then copies the files back to the original folder. Why not just copy the files to the numbered folders in the first place? You can use 'cp -p' to preserve timestamps, if that's needed.
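For illustration, a copy-only variant of the pipeline above (a sketch with the same caveat about file names containing spaces; the echo keeps it a dry run, and the target folders must already exist):

find 1 -type f |
    xargs -L50 echo |
    awk '{ print "CompiledOutput/temp/1-" NR, $0 }' |
    xargs -L1 echo cp -p -t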
Supporting file names with newlines (and spaces)
A clarification to the question indicates that a solution should work with file names containing embedded newlines (and whitespace). This requires a minor change: use the NUL character as the separator.
# Count the files (an upper bound on the number of output folders)
DIR_COUNT=$(find 1 -type f -print0 | xargs -0 -I{} echo X | wc -l)
# Remove any previous tree, and create the output folders
OUT=CompiledOutput/temp
rm -rf $OUT
eval mkdir -p $OUT/1-{1..$DIR_COUNT}
# Process the files, using NUL as the separator
find 1 -type f -print0 |
    awk -v RS="\0" -v OUT="$OUT" 'NR%50 == 1 { printf "%s/1-%d%s", OUT, 1+int(NR/50), RS } { printf "%s", ($0 RS) }' |
    xargs -0 -L51 -t mv -t
I did limited testing with both spaces and newlines in the file names. Looks OK on my machine.
I find a couple of issues with the posted script:
The logic of copying maximum 50 files per folder is overcomplicated, and the code duplication of an entire loop is error-prone.
It reuses the $@ array of positional parameters for internal storage purposes. This variable was not intended for that; it would be better to use a new, dedicated array.
Instead of moving files to sub-directories and then copying them back, it would be simpler to just copy them in the first step, without ever moving.
Parsing the output of ls is not recommended.
Consider this alternative, simpler logic:
Initialize an empty array to_copy, to keep files that should be copied
Initialize a folder counter, to use to compute the target folder
Loop over the source files:
    Apply filters as before (skip if not a regular file)
    Add the file to to_copy
    If to_copy contains the target number of files, then:
        Create the target folder
        Copy the files contained in to_copy
        Reset the content of to_copy to empty
        Increment folder_counter
If to_copy is not empty:
    Create the target folder
    Copy the files contained in to_copy
Something like this:
#!/usr/bin/env bash
set -euo pipefail

distribute_to_folders() {
    local src=$1
    local target=$2
    local max_files=$3
    local to_copy=()
    local folder_counter=1
    for file in "$src"/* "$src"/.*; do
        [ -f "$file" ] || continue
        to_copy+=("$file")
        if (( ${#to_copy[@]} == max_files )); then
            mkdir -p "$target/$folder_counter"
            cp -v "${to_copy[@]}" "$target/$folder_counter/"
            to_copy=()
            ((++folder_counter))
        fi
    done
    if (( ${#to_copy[@]} > 0 )); then
        mkdir -p "$target/$folder_counter"
        cp -v "${to_copy[@]}" "$target/$folder_counter/"
    fi
}

distribute_to_folders "$@"
To distribute files in path/to/1 into directories of maximum 50 files under path/to/compiled-output, you can call this script with:
./distribute.sh path/to/1 path/to/compiled-output 50
BTW, the final step of my code gives this warning; I would be glad if anyone could explain what's happening and suggest a solution.
Sure. The command find CompiledPdfOutput/temp/ -exec cp {} 1 \; finds files and directories, and tries to copy them. When cp encounters a directory and the -r parameter is not specified, it issues the warning you saw. You could add a filter for files, with -type f. If there are not excessively many files then a simple shell glob will do the job:
cp -v CompiledPdfOutput/temp/*/* 1
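For example, the original find command with that file filter added (and -v for visibility):

find CompiledPdfOutput/temp/ -type f -exec cp -v {} 1 \;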
This will copy files to multiple folders of fixed size. Change source, target, and folderSize as per your requirements. This also works with filenames containing special characters (e.g. 'file 131!@#$%^&*()_+-=;?').
source=1
target=CompiledPDFOutput/temp
folderSize=50
find $source -type f -printf "\"%p\"\0" \
| xargs -0 -L$folderSize \
| awk '{system("mkdir -p '$target'/1-" NR); printf "'$target'/1-" NR " %s\n", $0}' \
| xargs -L1 cp -t

How to find which files / folders are on both computers?

I have a folder called documentaries on my Linux computer.
I have SSH access to a seedbox (also Linux).
How do I find out which documentaries I have on both computers?
On the seedbox it's a flat file structure. Some documentaries are files, some are folders which contain many files, but all are in the same folder.
For example:
data/lions_botswana.mp4
data/lions serengeti/S01E01.mkv
data/lions serengeti/S01E02.mkv
data/strosek_on_capitalism.mp4
data/something_random.mp4
Locally the structure is more organized:
documentaries/animals/lions_botswana.mp4
documentaries/animals/lions serengeti/S01E01.mkv
documentaries/animals/lions serengeti/S01E02.mkv
documentaries/economy/strosek_on_capitalism.mp4
documentaries/something_random.mp4
I am not looking for a command like diff; I am looking for a command like "same" (the opposite of diff), if such a command exists.
Based on the answer from Zumo de Vidrio, and my comment:
on one computer
cd directory1/; find | sort > filelist1
on the other
cd directory2/; find | sort > filelist2
copy them to one place and run:
comm -12 filelist1 filelist2
or as a one liner:
ssh user@host 'cd remotedir/; find|sort' | comm -12 - <(cd localdir/; find|sort)
Edit: With multiple folders this would look as follows
on one computer
cd remotedir/; find | sort > remotelist
on the other
cd localdir/subdir1/; find > locallist1
cd -;
cd localdir/subdir2/; find > locallist2
cd -;
#... and so on
sort locallist1 locallist2 > locallistall
copy them to one place and run:
comm -12 remotelist locallistall
or as a (now very long) one liner:
ssh user@host 'cd remotedir/; find|sort' | comm -12 - <({ cd localdir/subdir1/; find; cd -; cd localdir/subdir2/; find; cd -; cd localdir/subdir3/; find; } | sort)
Export list of remote files to local file by:
ssh user@seedbox 'find /path/to/data -type f -execdir echo {} ";"' > remote.txt
Note: On Linux you have to use an absolute path to avoid the leading ./, or use "$PWD"/data.
Then grep the result of find command:
find documentaries/ -type f | grep -wFf remote.txt
This will display only these local files which also exist on remote.
If you would like to generate similar list on local and compare two files, try:
find "$PWD"/documentaries/ -type f -execdir echo {} ';' > local.txt
grep -wFf remote.txt local.txt
However, the above methods aren't fully reliable, since a file with the same name could still have different content (for example, a different size). If the files had the same structure on both sides, you could use rsync to keep your files up to date.
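For instance, a dry-run sketch, assuming the same directory layout on both sides (the user and remote path are hypothetical): -n previews without copying, -a preserves attributes, and -i itemizes what would be transferred.

rsync -nai user@seedbox:/path/to/data/ documentaries/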
For a more reliable solution, you can use fdupes, which finds all files that exist in both directories by comparing file sizes and MD5 signatures.
Sample syntax:
fdupes -r documentaries/ data/
However, both directories need to be accessible locally, so you can use the sshfs tool to mount the remote directory locally. Then you can use fdupes to find all duplicate files. It also has an option to remove the duplicates (-d).
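For example, a sketch with a hypothetical mount point and remote path:

mkdir -p /tmp/seedbox-data
sshfs user@seedbox:/path/to/data /tmp/seedbox-data   # mount the remote folder
fdupes -r documentaries/ /tmp/seedbox-data/          # list files present in both trees
fusermount -u /tmp/seedbox-data                      # unmount when finished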
Copy the ls output of each computer to the same folder and then apply diff to them:
In your computer:
ls -R documentaries/ > documentaries_computer.txt
In seedbox:
ls -R documentaries/ > documentaries_seedbox.txt
Copy both files to the same location and execute:
diff documentaries_computer.txt documentaries_seedbox.txt
You can mount the remote folder using sshfs, then use diff -r to find the differences between them.
E.g.
sshfs user@seedbox-host:/path/to/documentaries documentaries/
diff -rs /local/path/documentaries/animals documentaries/ | grep identical
diff -rs /local/path/documentaries/economy documentaries/ | grep identical

How can I generate a folder with the last X files added?

So I have a huge folder full of subfolders with tons of files, and I add files to it all the time.
I need a subfolder in the root of that folder with symlinks to the last 10-20 files added, so that I can quickly find the things I recently added. This is located on a NAS, but I have a Linux box running Arch connected through NFS, so I assume the best way is to run a bash script with a find command followed by a loop of ln -sf, but I can't do it safely without help.
Something like this is required:
mkdir -p subfolder
find /dir/ -type f -printf '%T# %p\n' | sort -n | tail -n 10 | cut -d' ' -f2- | while IFS= read -r file ; do ln -s "$file" subfolder ; done
Which will create symlinks in subfolder pointing to the 10 most recently modified files in the directory tree rooted at /dir/
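If file names may contain newlines as well as spaces, a NUL-delimited variant (assuming GNU find, sort, tail, cut, and xargs) is a safer sketch:

mkdir -p subfolder
find /dir/ -type f -printf '%T@\t%p\0' |
    sort -z -n |
    tail -z -n 10 |
    cut -z -f2- |
    xargs -0 -I{} ln -s {} subfolder/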
You could just create a shell function like:
recent() { ls -lt ${1+"$@"} | head -n 20; }
which will give you a listing of the 20 most recent items in the specified directories, or the current directory if no arguments are given.
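For example:

recent          # the first 20 lines of 'ls -lt' for the current directory
recent /dir     # the same for /dir (path is hypothetical)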
