Need to divide 8tb directory into 4 directories - linux

I have a directory called BigDataDirectory which contains plenty of files adding up to 8 TB in total.
I am trying to upload it to our server and want to divide the folder into four parts, so that I have four folders of about 2 TB each.
I tried the split command, but it doesn't seem to be working:
nohup split -b 2T BigDataDirectory "Directory" &
Could you tell me just a simple way to divide my directory/folder into multiple parts?

Note that split divides the contents of a single file; it can't distribute the files of a directory, which is why your split command doesn't work here. If all your files are of a similar size, or if the per-directory total doesn't need to be very precise, you could do it with these commands.
The loop creates a new sub-directory every 250 files and moves the files into it. If you have 1000 files, they will be moved into 4 sub-directories named 1, 2, 3 and 4. If you have 1001 files, a 5th sub-directory will be created for the last file.
cd YOUR_BIG_DIR
n=250 # will change destination dir after 250 files
dir=1; mkdir -p $dir # create first destination sub-directory "1"
ls -1 | while read f; do ((c+=1)); echo mv "$f" $dir/; [ $((c % n)) -eq 0 ] && ((dir+=1)) && mkdir -vp $dir; done
If the output looks like what you want, remove echo from the command to really mv the files.
The same command, explained, and without the "echo" used for testing:
# list all files, 1 per line
ls -1 \
| while read f; do    # with each file "$f"
    ((c+=1))          # increase the file counter
    mv "$f" $dir/     # move the file to $dir/
    # if counter c is a multiple of n, increase the directory number
    # and create the new destination directory
    [ $((c % n)) -eq 0 ] && ((dir+=1)) && mkdir -v -p $dir
done
If you want sub-directories of a more exact size, you would need to script something more sophisticated, using stat -c %s (or similar) to get the size of each file.
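For illustration, here is a minimal sketch of such a size-based split. It is only a sketch: the 2 TiB target, the variable names, and the echo dry-run are my assumptions, not part of the original answer.
cd YOUR_BIG_DIR
target=$((2 * 1024**4))       # assumed target: 2 TiB per destination sub-directory
dir=1; size=0; mkdir -p $dir
for f in *; do
    [ -f "$f" ] || continue                 # skip anything that isn't a regular file
    s=$(stat -c %s "$f")                    # file size in bytes
    if (( size + s > target )); then
        ((dir+=1)); size=0; mkdir -p $dir   # start a new destination directory
    fi
    ((size+=s))
    echo mv "$f" $dir/                      # remove echo to really move the files
done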

Related

Shell script to copy one file at a time in a cron job

I have some csv files in location A like this
abc1.csv,
abc2.csv,
abc3.csv
I have a cron job which runs every 30 minutes, and in each execution I want to copy only 1 file (which shouldn't be repeated) and place it in location B.
I had thought of 2 ways of doing this:
1) I pick the first file in the list of files, copy it to location B, and delete it once copied. The problem with this is that I am not sure when the file will be copied completely, and if I delete it before the copy has finished it can be an issue.
2) I keep a temp folder. I copy the file from location A to location B and also keep it in the temp location. In the next iteration, when I pick a file from the list, I check whether it already exists in the temp location; if it does, I move on to the next file. I think this will be more time consuming.
Please suggest if there is any better way.
You can use this bash script for your use case:
source="/path/to/.csv/directory"
dest="/path/to/destination/directory"
cd $source
for file in *.csv
do
if [ ! -f $dest/"$file" ]
then
cp -v $file $dest
break
fi
done
You can make sure the already copied file is moved out of the way with:
cp abc1.csv destination/ && mv abc1.csv abc1.csv.done
(Here you can add your logic to pick up only *.csv files and ignore the *.done files that your script has already processed, or use any suffix you want.)
If the cp does not succeed, nothing after the && will be executed, so the file will not be moved.
You can also replace mv with rm to delete it:
cp abc1.csv destination/ && rm -f abc1.csv
Furthermore, you can add an error message to the above command in case you want to be informed when the cp fails:
cp abc1.csv destination/ && mv abc1.csv abc1.csv.done || echo "copy of file abc1.csv failed"
and get informed via cron's email output.
Finally, I took some ideas from both of the proposed solutions. Here is the final script:
source="/path/to/.csv/directory"
dest="/path/to/destination/directory"
cd $source
for file in *.csv
do
if [ ! -f $dest/"$file" ]
then
cp -v $file $dest || echo "copy of file $file failed"
rm -f $file
break
fi
done
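For reference, the crontab entry that runs such a script every 30 minutes could look like this (the script path is hypothetical):
*/30 * * * * /path/to/copy_one_csv.sh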

How to perform a cronjob only when a file is greater than a certain size?

The following script (credit to Romeo Ninov) selects the most recent directory and performs a cp operation:
dir=$(ls -tr1 /var/lib/test|tail -1)
cd /var/lib/test/$dir && cp *.zip /home/bobby/
Please see: How can I use a cronjob when another program makes the commands in the cronjob fail? for the previous question.
I would like to modify this so that the cp only happens if the .zip file is larger than a defined byte size e.g. 28,000 bytes. If the .zip file is smaller, nothing is copied.
As before, this would happen in /var/lib/test/**** (where **** goes from 0000 to FFFF and increments every day).
Thanks!
You can rewrite your script this way:
dir=$(ls -tr1 /var/lib/test | tail -1)
cd /var/lib/test/"$dir" || exit 1
for i in *.zip
do
    if [ "$(stat --printf="%s" "$i")" -gt 28000 ]
    then
        cp "$i" /home/bobby
    fi
done
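If you prefer a one-liner, the same size check can be sketched with find (my suggestion, not part of the original answer); -size +28000c matches files strictly larger than 28000 bytes:
find /var/lib/test/"$dir" -maxdepth 1 -name '*.zip' -size +28000c -exec cp {} /home/bobby \;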

copy and append files in ubuntu linux

I have two folders, each containing 351 text files, and I want to append the text from each file in one folder to the corresponding file in the other folder.
When I use the cat command I get an empty file as a result. What could be the problem?
My code is:
#!/bin/bash
DIR1=$(ls 2/)
DIR2=$(ls 3/)
for each $i in $DIR1; do
for each $j in $DIR2; do
if [[ $i == $j ]];then
sudo cat $i $j >> $j
fi
done
done
2/ and 3/ are the folders containing the data...
DIR1 and DIR2 contain the file names in directories 2 and 3 respectively.
Apart from possible problems with spaces or special characters in file names, you would have to use 2/$i and 3/$j. $i and $j alone would reference files with the same names in the current directory (the parent of 2 and 3). Note also that for each $i in ... is not valid bash syntax; it would be for i in ....
It's better not to parse the output of ls.
You don't need two nested loops.
#!/bin/bash
DIR1=2
DIR2=3
for source in "$DIR1"/*
do
    dest="$DIR2/$(basename "$source")"
    if [ -f "$dest" ]
    then
        sudo cat "$source" >> "$dest"
    fi
done
see also https://mywiki.wooledge.org/BashPitfalls#for_f_in_.24.28ls_.2A.mp3.29
Depending on your needs, it may be better to run the whole script with sudo instead of running sudo for every file. The version above executes only cat "$source" as root; the redirection >> "$dest" is still performed by the unprivileged shell. When running the whole script as root, the redirection is done as root as well.
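If it is only the append that needs root privileges, one common workaround (my suggestion, not part of the original answer) is to let tee do the writing as root while the rest of the script stays unprivileged:
sudo tee -a "$dest" < "$source" > /dev/null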

Extracting zip file and then cd into it with different filename

I am creating a bash script that extracts a tar file, cd's into the result, and then runs another script. So far this has been working pretty well with my code below; however, I ran into a case where the extracted folder is named differently than the .tar file, which causes an issue. So my question is: how should I handle cases where the extracted directory name is different from the .tar filename?
e.g., my_file.tar ---> after extraction ----> my_different_file_name
#!/bin/bash
fname=$1
echo "the file you are about to extract is $fname"
if [ -f "$fname" ]; then    # if the file exists
    tar -xvzf "$fname"      # extract it
    cd "${fname%.*}"        # `%.*` strips the extension from filename.tgz; cd into the result
    echo "${fname%.*}"
    echo "$PWD"
    loadIt                  # another script to load
fi
You could do something like:
topDir=$(tar -xvzf $fname | sed "s|/.*$||" | uniq)
[ $(wc -w <<< $topDir) == 1 ] || exit 1
echo topDir=$topDir
Explanation: the first command untars verbosely (it outputs every file it's untarring), strips everything from the first / onwards on each line, and pipes the result into uniq, so it returns the list of top-level directories in the tar file. The next line checks that there is exactly one entry in topDir, otherwise it exits.
At this point $topDir will be the directory you want to cd into.
Maybe you could do something like this:
cd $(tar -tf $fname | head -1)
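If the first entry listed happens to be a file inside the directory rather than the directory entry itself, trimming the line to its first path component is a bit more robust (my variant, not from the original answer):
cd "$(tar -tf "$fname" | head -1 | cut -d/ -f1)"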
If you don't mind moving the directory around after you extract it you can do something like this
# Create a temporary directory
$ tmpd=$(mktemp -d)
# Change to the temporary directory
$ pushd "$tmpd"
# Extract the tarball
$ tar -xf "$fname"
# Glob the directory name
$ d=(*)
# Error if we have more (or less) than one directory
$ [ "${#d[@]}" = 1 ] || exit 1
# Explicitly use just the first directory (optional since `$d` does the same thing)
$ d=${d[0]}
# Move the extracted directory to the previous directory
$ mv "$d" "$OLDPWD"
# Change back to the starting directory
$ popd
# Remove the (now empty) temporary directory
$ rmdir "$tmpd"
# Change into the extracted directory
$ cd "$d"
# Run 'loadIt'
$ loadIt

Bash script that creates a directory structure

I've been googling all night trying to find a way to create a script that builds a directory structure that looks something like this:
/
shared
shared/projects
shared/series
shared/movies
shared/movies/action
You get the point.
The file that the script reads from look like this:
shared backup
shared data
shared projects
shared projects series
shared projects movies
shared projects movies action
I want to create a script that reads each line in the file and does the following for each line:
If the directory exists, it changes into that directory and creates the structure from there; if
the directory doesn't exist, it creates it.
When all entries in the row have been processed, go back to the original directory and read the next line.
My system is Ubuntu 10.10.
So far I’ve done this, but it doesn’t work.
#!/bin/bash
pwd=$(pwd)
for structure in ${column[*]}
do
if [ $structure ]
then
cd $structure
else
mkdir $structure
fi
done
cd $pwd
You can use mkdir -p shared/projects/movies/action to create the whole tree: it will create shared, then shared/projects, then shared/projects/movies, and shared/projects/movies/action.
So basically you need a script that runs mkdir -p $dir where $dir is the leaf directory of your directory tree.
If struct.txt contains the directory structure that you mention, then just run:
sed '/^$/d;s/ /\//g' struct.txt | xargs mkdir -p
sed will remove blank lines and make the remaining lines look like directory paths.
xargs will take each line and pass it as a parameter to mkdir.
mkdir will make the directory and the -p flag will create any parent directories if needed.
mkdir has a flag -p that creates all the parent directories of the directory you're creating if needed. You can just read each line, turn it into a path (i.e. s/ /\//g) and call mkdir -p $path on each line, as sketched below.
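A minimal sketch of that loop, assuming the structure file is named struct.txt as in the previous answer:
while read -r line; do
    [ -z "$line" ] && continue     # skip blank lines
    mkdir -p "${line// /\/}"       # turn spaces into slashes, then create the path
done < struct.txt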
For my solution it was important that:
a) I can edit the directory structure directly in my bash script, so that I don't have to jump back and forth between two files
b) the code for the folders is as clear as possible, without redundant repetition of the same paths, so that I can change it easily
# Creates the folder structure defined in the folder structure section below
function createFolderStructure() {
    depth="1"
    while (( "$#" )); do
        # walk back up until we reach the depth given by the current token
        while (( $1 != $depth )); do
            cd ..
            (( depth-- ))
        done
        shift
        mkdir "$1"
        cd "$1"
        (( depth++ ))
        shift
    done
    # walk back up to the starting directory
    while (( 1 != $depth )); do
        cd ..
        (( depth-- ))
    done
}
# Folder Structure Section
read -r -d '' FOLDERSTRUCTURE << EOM
1 shared
2 projects
3 movies
4 action
2 series
2 backup
EOM
createFolderStructure $FOLDERSTRUCTURE
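With the FOLDERSTRUCTURE above, running the script creates shared/projects/movies/action, shared/series and shared/backup under the current directory.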
Git needs files to record directories. So I put a readme file in each directory and extended the script as follows:
# Creates the folder structure defined in the folder structure section below
function createFolderStructure() {
    depth="1"
    while (( "$#" )); do
        while (( $1 != $depth )); do
            cd ..
            (( depth-- ))
        done
        shift
        mkdir "$1"
        cd "$1"
        (( depth++ ))
        shift
        shift
        # collect the description between the two "-" markers into the readme
        out=""
        while [[ "$1" != "-" ]]; do
            out=$out" ""$1"
            shift
        done
        shift
        echo "$out" > README.md
    done
    while (( 1 != $depth )); do
        cd ..
        (( depth-- ))
    done
}
# If you like you can read in user defined values here and use them as variables in the folder structure section, e.g.
# echo -n "Enter month of films"
# read month
# ...
# 1 shared - Folder for shared stuff -
# 2 $month - Films from month $month -
# 3 projects - Folder for projects -
# ...
# Folder Structure Section
read -r -d '' FOLDERSTRUCTURE << EOM
1 shared - Folder for shared stuff -
2 projects - Folder for projects -
3 movies - Folder for movies -
4 action - Folder for action movies -
2 series - Folder for series -
2 backup - Backup folder -
EOM
createFolderStructure $FOLDERSTRUCTURE
1) Do something like this
find . -type d > folder_list.txt
to create a list of the folders you need to create.
2) Transfer the list to your destination
3) Recreate the structure in your new location:
cat folder_list.txt | xargs mkdir
Notice that you don't need the '-p' option in this case, since find lists parent directories before their children, though it wouldn't hurt either (and it silences the error for the leading '.' entry).
I use this alias in my .bash_profile for new projects:
alias project_setup="mkdir Sites Documents Applications Website_Graphics Mockups Logos Colors Requirements Wireframes"
If you want to make a nested folder structure, you could do something like:
alias shared_setup="mkdir shared shared/projects shared/series shared/movies shared/movies/action"
Assuming you wish to create a tree of folders / directories as below:
tmpdir
________|______
| | |
branches tags trunk
|
sources
____|_____
| |
includes docs
Also assuming that you have a variable that mentions the directory names.
DOMAIN_NAME=includes,docs
You may issue the command below:
$ eval "mkdir -p tmpdir/{trunk/sources/{${DOMAIN_NAME}},branches,tags}"
Note: this requires a Bash version that supports curly-brace expansion.
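For clarity, with DOMAIN_NAME=includes,docs the eval'd command expands to:
mkdir -p tmpdir/trunk/sources/includes tmpdir/trunk/sources/docs tmpdir/branches tmpdir/tags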
