Bash script - iterate through folders and move into folders of 1000 - linux

I have 1.2 million files split out into folders, like so:
Everything
..........Folder 1
..................File 1
..................File 2
..................File 3
..................File 4
..................File 5 etc
..........Folder 2
..................File 1
..................File 2
..................File 3
..................File 4
..................File 5 etc
If I cd into Folder 1 I can run the following script to organize the files there into folders called 1, 2, 3, etc. of 1000 files each:
dir="${1-.}"
x="${2-1000}"
n=0
sub=0
while IFS= read -r file ; do
    if (( n % x == 0 )) ; then
        (( sub += 1 ))
        mkdir -p "$sub"
        n=0
    fi
    mv "$file" "$sub"
    (( n += 1 ))
done < <(find "$dir" -maxdepth 1 -type f)
However, I would really like to run it once on the Everything folder at the top level. From there it would consider the child folders and do the by-1000 sorting, so I could move everything out of Folder 1, Folder 2, etc. into folders of 1000 items each called 1, 2, 3, etc.
Any ideas?
Edit: Here's how I would like the files to end up (as per comments):
Everything
..........Folder1
.................file1 (these filenames can be anything; they shouldn't be renamed)
.................(every file in between, i.e. file2 through file999)
.................file1000
..........Folder2
.................file1001
.................(every file in between, i.e. file1002 through file1999)
.................file2000
Every single possible file that is in the original folder structure is grouped into folders of 1000 items under the top level.

Let's assume your script is called organize.sh, and the Everything folder contains only directories. Try the following:
cd Everything
for d in */; do
    pushd "$d"
    bash ~/temp/organize.sh
    popd
done
Update
To answer Tom's question in the comment: you only need one copy of organize.sh. Say you put it in ~/temp; then you can invoke it as shown above.
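If you would rather do one global pass over the whole tree (the layout from the edit, with a running count across all folders instead of restarting in each one), here is a minimal sketch of my own, not from the answer above. It assumes files sit exactly two levels below the top folder, and that numbered bucket folders 1, 2, ... don't already exist there:

```shell
#!/bin/bash
# Sketch: bucket every file two levels under "$1" into numbered
# subfolders of "$2" files each, counting across all source folders.
bucket_all() {
    local top=$1 per=$2 n=0 sub=0 file
    for file in "$top"/*/*; do
        [ -f "$file" ] || continue          # regular files only
        if (( n % per == 0 )); then
            sub=$((sub + 1))
            mkdir -p "$top/$sub"
            n=0
        fi
        mv "$file" "$top/$sub/"
        n=$((n + 1))
    done
}

# Throwaway demo: 2 folders x 5 files, bucketed 4 per folder.
demo=$(mktemp -d)
mkdir -p "$demo/Folder1" "$demo/Folder2"
for i in 1 2 3 4 5; do
    : > "$demo/Folder1/a$i"
    : > "$demo/Folder2/b$i"
done
bucket_all "$demo" 4
```

The glob is expanded once before the loop body runs, so the numbered folders created along the way are never re-scanned.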

Pseudo algorithm:
1) Run ls on all your directories and store the output in a file.
2) cd into each directory listed in your file.
3) Sort all your files.
4) cd ..
5) Repeat steps 2-4 in a for loop.
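The steps above can be sketched directly in bash. Here the per-directory action is passed in as a command (in the real case that would be `bash ~/temp/organize.sh`), and `dirs.txt` is the scratch file from step 1:

```shell
#!/bin/bash
# Literal translation of the pseudo algorithm: list directories into a
# file, then visit each one and run the given command there.
run_in_each_dir() {
    local top=$1; shift
    local d
    ls -d "$top"/*/ > dirs.txt          # step 1: store the directories
    while IFS= read -r d; do
        ( cd "$d" && "$@" )             # steps 2-4: cd in, act, come back
    done < dirs.txt                     # step 5: repeat for each directory
    rm -f dirs.txt
}

# Demo: drop a flag file inside each subdirectory.
demo=$(mktemp -d)
mkdir -p "$demo/Folder1" "$demo/Folder2"
run_in_each_dir "$demo" sh -c ': > done.flag'
```

Running the command in a subshell means the `cd ..` of step 4 happens for free when the subshell exits.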

Related

How to run bash file for (different directory) as input automatically

I have a bash file which takes 5 inputs.
Input1 = file1
Input2 = file2
Input3 = directory1
Input4 = func
Input5 = 50
Inputs 4 and 5 are always the same; they never change.
file1 and file2 are located inside directory1
directory1 is located inside a code directory
/code/directory1/file1
/code/directory1/file2
and there are many directories with the same structure directory(1-70) inside the code folder
/code/directory1/*
/code/directory2/*
/code/directory3/*
...
/code/directory70/*
In order to run the bash file, I have to run the command from terminal 70 times :<
Is there a way to automatically run all these folders at once?
UPDATE: the directories (1-70) each have a different name, e.g. bug1, test, 4-A, and so on. Even the files are different, e.g. bug1.c, hash.c:
/code/bug1/bug1.c
code/bug1/hash.c
Try this:
for dir in /code/*/
do
    files=( "$dir"* )
    <ShellScript>.sh "${files[0]}" "${files[1]}" "${dir%/}" func 50
done

Concatenate a string with an array for recursively copy file in bash

I have a concatenation problem between a string and an array.
I want to copy all the files contained in the directories stored in the array; my command is in a loop (to copy my files recursively):
yes | cp -rf "./$WORK_DIR/${array[$i]}/"* $DEST_DIR
My array :
array=("My folder" "...")
I have several folder names in my array (they have spaces in their names) that I would like to append to my $WORK_DIR so that cp can copy the files.
But I always have the following error
cp: cannot stat './WORKDIR/my': No such file or directory
cp: cannot stat 'folder/*': No such file or directory
This worked for me
#!/bin/bash
arr=("My folder" "This is a test")
i=0
while [[ ${i} -lt ${#arr[@]} ]]; do
    echo "${arr[${i}]}"
    cp -rfv ./source/"${arr[${i}]}"/* ./dest/.
    (( i++ ))
done
exit 0
I ran the script. It gave me the following output:
My folder
'./source/My folder/blah-folder' -> './dest/./blah-folder'
'./source/My folder/foo-folder' -> './dest/./foo-folder'
This is a test
'./source/This is a test/blah-this' -> './dest/./blah-this'
'./source/This is a test/foo-this' -> './dest/./foo-this'
Not sure of the exact difference, but hopefully this will help.
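The difference is the quoting: "${arr[${i}]}" keeps a folder name with spaces as a single word, while the * stays outside the quotes so it still globs. The original one-liner works the same way once its expansion is quoted; iterating over "${arr[@]}" also avoids the index arithmetic. A self-contained sketch (the work/dest directories are stand-ins, not the poster's real $WORK_DIR/$DEST_DIR):

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit

# Stand-in layout for the demo.
mkdir -p "work/My folder" "work/This is a test" dest
: > "work/My folder/blah-folder"
: > "work/This is a test/blah-this"

WORK_DIR=work
DEST_DIR=dest
arr=("My folder" "This is a test")

# The quotes keep each folder name (spaces included) as one word;
# the unquoted * still expands to the files inside it.
for name in "${arr[@]}"; do
    cp -rf "./$WORK_DIR/$name/"* "$DEST_DIR"
done
```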

How to get numbers to come after decimal point in alphabetical sorting order in Bash

I have this .sh script that goes through every folder in a parent folder and runs program in each. The code I used was the following:
for d in ./*/
do cp program "$d"
(cd "$d" ; ./program)
done
program, among other things, gets the name of each folder and writes it to a file data.dat, so that all folder names are listed there. These folders' names are numbers (decimal) that identify their contents. program writes the folder name to data.dat when it enters each folder, so that they will appear in the order that Bash goes through the folders.
I want them to be sorted, in data.dat, in alphabetical order, putting lower numbers before higher, regardless of being a 1-digit or 2-digit number. For example, I want 2.32 to come before 10.43 and not the other way around.
The problem, it seems, is that for Bash the . comes after numbers in the order.
How can I change it to come before numbers?
Thanks in advance!
EDIT:
program is in Fortran 77 and goes like this:
      program getData
      implicit none
      character counter*20, ac*4, bash*270, Xname*4, fname*15
      double precision Qwallloss, Qrad, Nrad, Qth, QreacSUM
      double precision Xch4in, Ych4in, length, porosity, Uin, RHOin
      double precision MFLR, Area, Xvalue
      integer I
      bash="printf '%s\n'"//' "${PWD##*/}" > RunNumber.txt'
      call system(bash) !this gets the folder name and writes it
                        !to RunNumber.txt
      open(21, form="FORMATTED", STATUS="OLD", FILE="RunNumber.txt")
      rewind(21)
      read(21,*) counter !brings the folder name into the program
      close(21)
(...)
      call system(' cp -rf ../PowerData.dat . ')
      open(27, form="FORMATTED", STATUS="OLD", ACCESS="APPEND", !the new row is appended to the existing file
     1     FILE="PowerData.dat")
      write(27,600) Counter, Xvalue, Nrad, Qrad, Qth, !writes a row of variables,
     1     Area, MFLR, Uin, RHOin, Xch4in, Ych4in     !starting with the folder name,
                                                      !to the Data file
      close(27)
      call system('cp -rf PowerData.dat ../')
      end program
I expect that your program may do a bit more in the future, which is why I used two loops.
for d in ./*/ ; do
    echo "$d" >> /tmp/tmpfile
done
for d in $(sort -n /tmp/tmpfile) ; do
    cp program "$d"
    (cd "$d" ; ./program)
done
There are more ways to do this; for example:
for d in $(ls | sort -n) ; do
(some will castigate me for parsing the output of ls) etcetera.
So if you do:
mkdir test
cd test
touch 100
touch 2.00
touch 50.1
ls will give you
100 2.00 50.1
ls | sort -n will give you
2.00
50.1
100
and, as a bonus, ls -v will give you
2.00 50.1 100
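Alternatively (my suggestion, not from the answer above), you can leave the loop untouched and numerically sort data.dat once, after all the runs have finished. To `sort -n`, `2.32` and `10.43` are numbers, so the position of the decimal point stops mattering:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit

# Hypothetical data.dat: one row per folder, folder name in column 1.
printf '%s\n' '10.43 1.0' '2.32 2.0' '50.1 0.5' > data.dat

# In-place numeric sort: -o lets sort write back to its own input safely,
# because sort reads all input before opening the output file.
sort -n -o data.dat data.dat
```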

LINUX: How can I recursively zip files in sub-folders?

I have the following directory structure:
/Data
- file 1
- file 2
/Folder1
- file 3
- file 4
/Folder2
- file 5
- file 6
/Folder3
- file 7
- file 8
In Linux I want to zip files (excluding folders) in every directory and create a 7z (or zip) archive in each folder resulting the following:
/Data
Data.7z (Note: this should contain only file1 & 2, not any sub directories)
/Folder1
Folder1.7z (this should contain only file3 & 4, not any sub directories)
/Folder2
Folder2.7z (this should contain only file5 & 6, no Folder3)
/Folder3
Folder3.7z (should contain only file7 & 8)
The following script works on the first directory but not on the sub-directories:
for i in */ ; do base=$(basename "$i") ; cd $base ; 7za a -t7z -r $base * ; .. ; cd .. ; done;
How can I achieve this? Thank you.
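No answer was recorded for this question, so here is a rough sketch of one approach. It walks every directory with find and archives only the regular files directly inside each one, naming the archive after the directory. tar is used as a stand-in so the example runs anywhere; with p7zip installed, the tar line would become `7za a -t7z "$base.7z" "${files[@]}"`:

```shell
#!/bin/bash
# For every directory under "$1", archive the regular files directly
# inside it into <dirname>.tar placed in that same directory.
archive_each_dir() {
    local top=$1 d base f
    find "$top" -type d | while IFS= read -r d; do
        base=$(basename "$d")
        (
            cd "$d" || exit
            files=()
            for f in ./*; do
                [ -f "$f" ] || continue            # files only, no sub-dirs
                [ "$f" = "./$base.tar" ] && continue
                files+=("$f")
            done
            [ ${#files[@]} -gt 0 ] && tar -cf "$base.tar" "${files[@]}"
        )
    done
}

# Demo tree shaped like the question's /Data layout.
demo=$(mktemp -d)
mkdir -p "$demo/Data/Folder1"
: > "$demo/Data/file 1"
: > "$demo/Data/file 2"
: > "$demo/Data/Folder1/file 3"
archive_each_dir "$demo/Data"
```

Collecting the file list with a glob instead of `*` on the 7za command line is what keeps sub-directories out of each archive.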

Find which file has the issue from the below Shell Script

Problem statement:
Below is a script that someone else wrote. He left the company, so I don't know whom to ask about it; that is why I am posting here to find a solution.
What this script does: it gzips the data from a particular folder (/data/ds/real/EXPORT_v1x0) for a particular date (20121017) and moves it to an HDFS directory (hdfs://ares-nn/apps/tech/ds/new/).
date=20121017
groups=(0 '1[0-3]' '1[^0-3]' '[^01]')
for shard in 0 1 2 3 4 5 6 7 8 9 10 11; do
for piece in 0 1 2 3; do
group=${groups[$piece]}
if ls -l /data/ds/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz; then
gzip -dc /data/ds/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz | \
hadoop jar /export/home/ds/lib/HadoopUtil.jar com.host.hadoop.platform.util.WriteToHDFS -z -u \
hdfs://ares-nn/apps/tech/ds/new/$date/EXPORT-part-$shard-$piece
sleep 15
fi
done
done
During the migration to HDFS I found that this file has a problem:
hdfs://ares-nn/apps/tech/ds/new/20121017/EXPORT-part-8-3
So, is there any way, by some permutation of the above script, to find out which files under /data/ds/real/EXPORT_v1x0 ultimately got converted into hdfs://ares-nn/apps/tech/ds/new/20121017/EXPORT-part-8-3, the file with the problem?
Any thoughts?
Update:-
Something like this below?
groups=(0 '1[0-3]' '1[^0-3]' '[^01]')
for shard in 0 1 2 3 4 5 6 7 8 9 10 11; do
for piece in 0 1 2 3; do
group=${groups[$piece]}
if ls -l /data/ds/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz; then
[ "$date/EXPORT-part-$shard-$piece" == "20121017/EXPORT-part-8-3" ] && {
echo /data/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz
}
fi
done
done
A few sample file names from the /data/real/EXPORT folder:
/data/real/EXPORT_v1x0_20121017_4_T_115600_115800.dat.gz
/data/real/EXPORT_v1x0_20121017_4_T_235600_235800.dat.gz
/data/real/EXPORT_v1x0_20121017_4_T_115800_120000.dat.gz
/data/real/EXPORT_v1x0_20121017_4_T_235800_000000.dat.gz
And some sample output I got after making the changes:
/data/real/EXPORT_v1x0_20121017_0_T_0*.dat.gz: No such file or directory
/data/real/EXPORT_v1x0_20121017_0_T_1[0-3]*.dat.gz: No such file or directory
/data/real/EXPORT_v1x0_20121017_0_T_1[^0-3]*.dat.gz: No such file or directory
/data/real/EXPORT_v1x0_20121017_0_T_[^01]*.dat.gz: No such file or directory
In this case replace the whole gzip pipeline with:
[ "$date/EXPORT-part-$shard-$piece" == "20121017/EXPORT-part-8-3" ] && {
echo /data/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz
}
That should do the trick.
Edit: remove sleep to speed up the loop!
