Bash scripting wanting to find a size of a directory and if size is greater than x then do a task

Bash scripting wanting to find a size of a directory and if size is greater than x then do a task - linux

I have put the following together with a couple of other articles but it does not seem to be working. What I am trying to do eventually do is for it to check the directory size and then if the directory has new content above a certain total size it will then let me know.
#!/bin/bash
file=private/videos/tv
minimumsize=2
actualsize=$(du -m "$file" | cut -f 1)
if [ $actualsize -ge $minimumsize ]; then
echo "nothing here to see"
else
echo "time to sync"
fi
this is the output:
./sync.sh: line 5: [: too many arguments
time to sync
I am new to bash scripting so thank you in advance.

The error:
[: too many arguments
seems to indicate that either $actualsize or $minimumsize is expanding to more than one argument.
Change your script as follows:
#!/bin/bash
set -x # Add this line.
file=private/videos/tv
minimumsize=2
actualsize=$(du -m "$file" | cut -f 1)
echo "[$actualsize] [$minimumsize]" # Add this line.
if [ $actualsize -ge $minimumsize ]; then
echo "nothing here to see"
else
echo "time to sync"
fi
The set -x will echo commands before attempting to execute them, something which assists greatly with debugging.
The echo "[$actualsize] [$minimumsize]" will assist in trying to establish whether these variables are badly formatted or not, before the attempted comparison.
If you do that, you'll no doubt find that some arguments will result in a lot of output from the du -m command since it descends into subdirectories and gives you multiple lines of output.
If you want a single line of output for all the subdirectories aggregated, you have to use the -s flag as well:
actualsize=$(du -ms "$file" | cut -f 1)
If instead you don't want any of the subdirectories taken into account, you can take a slightly different approach, limiting the depth to one and tallying up all the sizes:
actualsize=$(find . -maxdepth 1 -type f -print0 | xargs -0 ls -al | awk '{s += $6} END {print int(s/1024/1024)}')

Related

Shell - iterate over content of file but do something only the first x lines

So guys,
I need your help trying to identify the fastest and the most "fault" tolerant solution to my problem.
I have a shell script which executes some functions, based on a txt file, in which I have a list of files.
The list can contain from 1 file to X files.
What I would like to do is iterate over the content of the file and execute my scripts for only 4 items out of the file.
Once the functions have been executed for these 4 files, go over to the next 4 .... and keep on doing so until all the files from the list have been "processed".
My code so far is as follows.
#!/bin/bash
number_of_files_in_folder=$(cat list.txt | wc -l)
max_number_of_files_to_process=4
Translated_files=/home/german_translated_files/
while IFS= read -r files
do
while [[ $number_of_files_in_folder -gt 0 ]]; do
i=1
while [[ $i -le $max_number_of_files_to_process ]]; do
my_first_function "$files" & # I execute my translation function for each file, as it can only perform 1 file per execution
find /home/german_translator/ -name '*.logs' -exec mv {} $Translated_files \; # As there will be several files generated, I have them copied to another folder
sed -i "/$files/d" list.txt # We remove the processed file from within our list.txt file.
my_second_function # Without parameters as it will process all the files copied at step 2.
done
# here, I want to have all the files processed and don't stop after the first iteration
done
done < list.txt
Unfortunately, as I am not quite good at shell scripting, I do not know how to structure it so that it won't waste any resources and mostly, to make sure that it "processes" everything from that file.
Do you have any advice on how to achieve what I am trying to achieve?

only 4 items out of the file. Once the functions have been executed for these 4 files, go over to the next 4
Seems to be quite easy with xargs.
your_function() {
echo "Do something with $1 $2 $3 $4"
}
export -f your_function
xargs -d '\n' -n 4 bash -c 'your_function "$#"' _ < list.txt
xargs -d '\n' for each line
-n 4 take for arguments
bash .... - run this command with 4 arguments
_ - the syntax is bash -c <script> $0 $1 $2 etc..., see man bash.
"$#" - forward arguments
export -f your_function - export your function to environment so child bash can pick it up.
I execute my translation function for each file
So you execute your translation function for each file, not for each 4 files. If the "translation function" is really for each file with no inter-file state, consider rather executing 4 processes in parallel with same code and just xargs -P 4.

If you have GNU Parallel it looks something like this:
doit() {
my_first_function "$1"
my_first_function "$2"
my_first_function "$3"
my_first_function "$4"
my_second_function "$1" "$2" "$3" "$4"
}
export -f doit
cat list.txt | parallel -n4 doit

List files greater than 100K in bash

I want to list the files recursively in the HOME directory. I'm trying to write my own script , so I should not use the command find or ls. My script is:
#!/bin/bash
minSize=102400;
printFiles() {
for x in "$1/"*; do
if [ -d "$x" ]; then
printFiles "$x";
else
size=$(wc -c "$x");
if [[ "$size" -gt "$minSize" ]]; then
echo "$size";
fi
fi
done
}
printFiles "/~";
So, the problem here is that when I run this script, the terminal throws Line 11: division by 0 and /home/gandalf/Videos/*: No such file or directory. I have not divided by any number, why I'm getting this error?. And the second one?
Alternatively, I can't use find or ls because I have to display the files one by one asking to the user if he want to see the next file or not. This is possible using the command find or ls or only can be done writing my own function?
Thanks.

size=$(wc -c "$x");
That's the line that is failing. When you run that wc command manually you should be able to see why:
$ wc -c /tmp/out
5 /tmp/out
The output contains not only the file size but also the file name. So you can't use $size with the -gt comparator on the next line. One way to fix that is to change the wc line to use cut (or awk, or sed, etc) to keep just the file size.
size=$(wc -c "$x" | cut -f1 -d " ")
A simpler alternative suggested by #mklement0:
size=$(wc -c < "$x")

How to extract only file name return from diff command?

I am trying to prepare a bash script for sync 2 directories. But I am not able to file name return from diff. everytime it converts to array.
Here is my code :
#!/bin/bash
DIRS1=`diff -r /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/ `
for DIR in $DIRS1
do
echo $DIR
done
And if I run this script I get out put something like this :
Only
in
/opt/lampp/htdocs/scripts/www/:
file1
diff
-r
"/opt/lampp/htdocs/scripts/dev/File
1.txt"
"/opt/lampp/htdocs/scripts/www/File
1.txt"
0a1
>
sa
das
Only
in
/opt/lampp/htdocs/scripts/www/:
File
1.txt~
Only
in
/opt/lampp/htdocs/scripts/www/:
file
2
-
second
Actually I just want to file name where I find the diffrence so I can take perticular action either copy/delete.
Thanks

I don't think diff produces output which can be parsed easily for your purposes. It's possible to solve your problem by iterating over the files in the two directories and running diff on them, using the return value from diff instead (and throwing the diff output away).
The code to do this is a bit long, but here it is:
DIR1=./one # set as required
DIR2=./two # set as required
# Process any files in $DIR1 only, or in both $DIR1 and $DIR2
find $DIR1 -type f -print0 | while read -d $'\0' -r file1; do
relative_path=${file1#${DIR1}/};
file2="$DIR2/$relative_path"
if [[ ! -f "$file2" ]]; then
echo "'$relative_path' in '$DIR1' only"
# Do more stuff here
elif diff -q "$file1" "$file2" >/dev/null; then
echo "'$relative_path' same in '$DIR1' and '$DIR2'"
# Do more stuff here
else
echo "'$relative_path' different between '$DIR1' and '$DIR2'"
# Do more stuff here
fi
done
# Process files in $DIR2 only
find $DIR2 -type f -print0 | while read -d $'\0' -r file2; do
relative_path=${file2#${DIR2}/};
file1="$DIR1/$relative_path"
if [[ ! -f "$file2" ]]; then
echo "'$relative_path' in '$DIR2 only'"
# Do more stuff here
fi
done
This code leverages some tricks to safely handle files which contain spaces, which would be very difficult to get working by parsing diff output. You can find more details on that topic here.
Of course this doesn't do anything regarding files which have the same contents but different names or are located in different directories.
I tested by populating two test directories as follows:
echo "dir one only" > "$DIR1/dir one only.txt"
echo "dir two only" > "$DIR2/dir two only.txt"
echo "in both, same" > $DIR1/"in both, same.txt"
echo "in both, same" > $DIR2/"in both, same.txt"
echo "in both, and different" > $DIR1/"in both, different.txt"
echo "in both, but different" > $DIR2/"in both, different.txt"
My output was:
'dir one only.txt' in './one' only
'in both, different.txt' different between './one' and './two'
'in both, same.txt' same in './one' and './two'

Use -q flag and avoid the for loop:
diff -rq /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/
If you only want the files that differs:
diff -rq /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/ |grep -Po '(?<=Files )\w+'|while read file; do
echo $file
done
-q --brief
Output only whether files differ.
But defitnitely you should check rsync: http://linux.die.net/man/1/rsync

creating Unix script to check for directories and subdirectories in

I'm completing the following for one of my assignments using Korn shell.
For each argument in the argument list (which becomes the current pathname):
Check whether the current pathname is a directory, and if so:
Initialize a variable maxsubdir with the null (empty) string, and
a maxentries variable to 0;
For each entry in the directory check if that entry represents a
directory and if so, find the numbers of entries in that
subdirectory with a pipe consisting of ls -l and wc, and save the
result in a variable named curentries.
Compare curentries with maxentries, and if curentries is greater,
update maxsubdir and maxentries. (--10 points)
When the for cycle for a directory is completed, display (with
echo) the directory name, maxsubdir and maxentries (with appropriate
explanatory text.)
If the pathname in a) is not a directory, display the pathname
and an explanatory text saying that the pathname does not represent
a directory.
Go to the next command line argument (pathname) and repeat 1-7
The execution of the script ends when all pathnames are processed (the while is completed )
This is the code I have for it so far (EDITED):
#!/bin/ksh
directoy=$1
while [ $# -ne 0 ]; do
if [ -d $1 ]; then
maxsubdir=
maxentries=0
for x in $1; do
echo "Checking if $1 represents a directory..\n"
curentries="ls -l | wc"
if [ $curentries > $maxentries ]; then
maxentries=$curentries
maxsubdir=$curentries
fi;
done
echo "The directory structure of $1 is … \n"
echo "Maximum sub directories: \n"
echo "$maxsubdir\n"
echo "Maximum directory entries: \n"
echo "$maxentries"
fi
done
Where do I need to insert the "shift" command since I Unix can only handle a limited number of arguments?
Is my syntax appropriate? Or do I have syntax errors on sort lines?
Script seems to run but does not produce output to screen? Perhaps it's endless?

Have a look here and see if this helps out. Explanations are in the code.
#!/bin/ksh
directory=$1
# check whether the entered path is a directory
if [ -d $1 ];then # yes, it's a directory
maxsubdir=null
maxentries=0
echo "$1 is a directory"
# you are only counting lines, add -l to wc
# also you have to not count the first line. it's returns the size
curentries=`ls -l $1 | wc -l`
echo ${curentries}
fi

You don't.
You do have some errors.
Or perhaps, it never reaches that code?
Your assignment says specifically to use a for loop, and you've implemented a while loop.
I'll get you started:
for directory in $*; do
cd "$directory"
curentries=$(ls -1 | wc -l)
for entry in $(ls -1); do
...
done
done

Linux Bash file Reading Lines and words

I apologize if this is a trivial question. I am learning how to use linux bash and this little task is giving me a headache...
So I need to write a script, let's call it count.sh. I want that: for each file in the working directory, prints the filename, the number of lines, and the number of words to the console:
test.txt 100 1023
someOtherfiles 10 233
So far, I know that the following gives me all the files names in the directory. And thanks for all who helped me, I get this working version:
for f in *; do
echo -n "$f"
cat "$f" | wc -wl
done
I would really appreciate your help! Thanks ahead!
P.s. If you know great resources (links for tutorials) for learning about script and you are willing to share it with me. I think I really need to know these basics. Thanks again!

If you must have the file name as the first field in your output, try this:
for f in *; do
if [ -f "$f" ]; then
echo -n "$f"
cat "$f" | wc -wl
fi
done

for f in *; do
if [[ -f $f ]]; then
echo "$f $(wc -wl < "$f")"
fi
done
[[ -f $f ]] processes only files (excludes subdirectories) and also handles the case where the directory is empty (in which case * is (by default) left unexpanded, i.e. assigned to $f as is).
echo "$f $(wc -wl < "$f")" uses command substitution ($( ... )) to directly include the output from the enclosed command in the output string passed to echo.
Note that the reason that < is used to direct the content of file $f to wc via stdin is that wc would otherwise append the name of the input file to its output (thanks, #R Sahu).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string