List files greater than 100K in bash - linux

I want to list the files recursively in the HOME directory. I'm trying to write my own script, so I shouldn't use the find or ls commands. My script is:
#!/bin/bash
minSize=102400;
printFiles() {
    for x in "$1/"*; do
        if [ -d "$x" ]; then
            printFiles "$x";
        else
            size=$(wc -c "$x");
            if [[ "$size" -gt "$minSize" ]]; then
                echo "$size";
            fi
        fi
    done
}
printFiles "/~";
So, the problem is that when I run this script, the terminal throws Line 11: division by 0 and /home/gandalf/Videos/*: No such file or directory. I haven't divided by any number, so why am I getting the first error? And what about the second one?
Additionally, I can't just use find or ls because I have to display the files one by one, asking the user whether they want to see the next file or not. Is this possible using find or ls, or can it only be done by writing my own function?
Thanks.

size=$(wc -c "$x");
That's the line that is failing. When you run that wc command manually you should be able to see why:
$ wc -c /tmp/out
5 /tmp/out
The output contains not only the file size but also the file name, so you can't use $size with the -gt comparator on the next line. (This is also where the strange division by 0 comes from: inside [[ ... -gt ... ]], bash evaluates each operand as an arithmetic expression, so a string like 5 /tmp/out is parsed as 5 divided by unset variables, i.e. a division by zero.) One way to fix that is to change the wc line to use cut (or awk, or sed, etc.) to keep just the file size.
size=$(wc -c "$x" | cut -f1 -d " ")
A simpler alternative, suggested by @mklement0:
size=$(wc -c < "$x")
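Putting the pieces together, here is a minimal sketch of a corrected script (untested). Note that "/~" is a literal, non-existent path, so the call uses "$HOME" instead; the empty-directory guard (which also explains the Videos/*: No such file or directory error) and the interactive prompt are assumptions layered on top of the original:
#!/bin/bash
minSize=102400

printFiles() {
    for x in "$1"/*; do
        # In an empty directory the glob stays literal ("dir/*"),
        # which is what caused the "No such file or directory" error; skip it.
        [ -e "$x" ] || continue
        if [ -d "$x" ]; then
            printFiles "$x"
        else
            size=$(wc -c < "$x")
            if [[ "$size" -gt "$minSize" ]]; then
                echo "$x: $size bytes"
                # show the files one by one, asking before continuing
                read -r -p "Show next file? [y/n] " answer
                [[ "$answer" == y* ]] || exit 0
            fi
        fi
    done
}

printFiles "$HOME"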

Related

how to move a file after grep command when there is no return result

I want to move a file after the grep command, but when I execute my script I notice that no results come back. Regardless of that, I want to move the file(s) to another directory.
this is what I've been doing:
for file in *.sup
do
    grep -iq "$file" '' /desktop/list/varlogs.txt || mv "$file" /desktop/first;
done
but I am getting this error:
mv: 0653-401 Cannot rename first /desktop/first/first
Suggestions would be very helpful.
I am not sure what the two single quotes are for in between ..."$file" '' /desktop.... With them there, grep also looks for $file in a file named '', so it will throw the error grep: : No such file or directory.
Also pay attention to the behavior change from adding the -q or --quiet flag: it affects grep's return value, and therefore whether the command after the || runs or not (see man grep for more).
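With the stray '' removed, the corrected version of the original one-liner (move the file when grep finds no match) would simply be:
grep -iq "$file" /desktop/list/varlogs.txt || mv "$file" /desktop/first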
I can't make out exactly what you are trying to do, but you can add a couple of statements to help figure out what is going on. You could run your script with bash -x ./myscript.sh to display everything as it runs, or add set -x before and set +x after the for loop in the script to show what is happening.
I added some debugging to your script and changed the || to an if/else statement to expose what is happening. Try this and see if you can find where things are going awry.
echo -e "============\nBEFORE:\n============"
echo -e "\n## The files in current dir '$(pwd)' are: ##\n$(ls)"
echo -e "\n## The files in '/desktop/first' are: ##\n$(ls /desktop/first)"
echo -e "\n## Looking for '.sup' files in '$(pwd)' ##"
for file in *.sup; do
    echo -e "\n## == look for '${file}' in '/desktop/list/varlogs.txt' == ##"
    # let's change this to an if/else
    # the || means try the left command for success, or try the right one
    # grep -iq "$file" '' /desktop/list/varlogs.txt || mv -v "$file" /desktop/first
    # based on `man grep`: EXIT STATUS
    #   Normally the exit status is 0 if a line is selected,
    #   1 if no lines were selected, and 2 if an error occurred.
    #   However, if -q or --quiet or --silent is used and a line
    #   is selected, the exit status is 0 even if an error occurred.
    # note that --ignore-case and --quiet are the long versions of -i and -q (combined: -iq)
    if grep --ignore-case --quiet "${file}" '' /desktop/list/varlogs.txt; then
        echo -e "\n'${file}' found in '/desktop/list/varlogs.txt'"
    else
        echo -e "\n'${file}' not found in '/desktop/list/varlogs.txt'"
        echo -e "\nmove '${file}' to '/desktop/first'"
        mv --verbose "${file}" /desktop/first
    fi
done
echo -e "\n============\nAFTER:\n============"
echo -e "\n## The files in current dir '$(pwd)' are: ##\n$(ls)"
echo -e "\n## The files in '/desktop/first' are: ##\n$(ls /desktop/first)"
|| means try the first command, and if it fails (i.e. returns non-zero), run the next command (see the quick illustration below). In your case, you appear to be checking /desktop/list/varlogs.txt to see whether each .sup file in the current directory is mentioned there and, if not, moving it to the /desktop/first/ directory; files that are found are left in the current directory (according to the logic you currently have).
mv --verbose explains what is being done
echo -e enables interpretation of backslash escapes
set -x shows the commands as they are run (debugging)
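A quick illustration of that || behavior:
true  || echo "not printed: the left command succeeded"
false || echo "printed: the left command failed"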
A suggestion to avoid repeated scans of /desktop/list/varlogs.txt and to remove duplicates:
mv $(grep -o -f <(ls -1 *.sup) /desktop/list/varlogs.txt | sort | uniq) /desktop/first
Test step 1 in the explanation below first, to list the files that are going to be moved.
Explanation
1. grep -o -f <(ls -1 *.sup) /desktop/list/varlogs.txt | sort | uniq
Lists all the files selected by ls -1 *.sup that are mentioned in /desktop/list/varlogs.txt, in a single scan.
-o prints only the matched filenames.
<(ls -1 *.sup) provides a temporary pattern file built from the output of ls -1 *.sup.
| sort | uniq then sorts the list and removes duplicates (each file can be moved only once).
2. mv <files-list-output-from-step-1> /desktop/first
Moves all the files found in step 1 to the directory /desktop/first.

Shell - iterate over content of file but do something only with the first x lines

So guys,
I need your help in identifying the fastest and most fault-tolerant solution to my problem.
I have a shell script which executes some functions, based on a txt file in which I have a list of files.
The list can contain from 1 file to X files.
What I would like to do is iterate over the content of the file and execute my functions for only 4 items from the file at a time.
Once the functions have been executed for those 4 files, move on to the next 4, and keep doing so until all the files from the list have been processed.
My code so far is as follows.
#!/bin/bash
number_of_files_in_folder=$(cat list.txt | wc -l)
max_number_of_files_to_process=4
Translated_files=/home/german_translated_files/

while IFS= read -r files
do
    while [[ $number_of_files_in_folder -gt 0 ]]; do
        i=1
        while [[ $i -le $max_number_of_files_to_process ]]; do
            my_first_function "$files" &  # I execute my translation function for each file, as it can only perform 1 file per execution
            find /home/german_translator/ -name '*.logs' -exec mv {} $Translated_files \;  # As there will be several files generated, I have them copied to another folder
            sed -i "/$files/d" list.txt  # We remove the processed file from within our list.txt file.
            my_second_function  # Without parameters as it will process all the files copied at step 2.
        done
        # here, I want to have all the files processed and not stop after the first iteration
    done
done < list.txt
Unfortunately, as I am not very good at shell scripting, I do not know how to structure it so that it won't waste resources and, most importantly, so that it processes everything from that file.
Do you have any advice on how to achieve what I am trying to achieve?
only 4 items out of the file. Once the functions have been executed for these 4 files, go over to the next 4
Seems to be quite easy with xargs.
your_function() {
    echo "Do something with $1 $2 $3 $4"
}
export -f your_function
xargs -d '\n' -n 4 bash -c 'your_function "$@"' _ < list.txt
xargs -d '\n' - treat each input line as a single argument
-n 4 - pass four arguments per invocation
bash ... - run this command with those 4 arguments
_ - fills the $0 slot; the syntax is bash -c <script> $0 $1 $2 etc., see man bash.
"$@" - forwards the arguments
export -f your_function - exports your function to the environment so the child bash can pick it up.
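For instance, with six (hypothetical) names in list.txt, your_function runs twice: once with four arguments and once with the remaining two:
printf '%s\n' a.txt b.txt c.txt d.txt e.txt f.txt > list.txt
xargs -d '\n' -n 4 bash -c 'your_function "$@"' _ < list.txt
# -> Do something with a.txt b.txt c.txt d.txt
# -> Do something with e.txt f.txt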
I execute my translation function for each file
So you execute your translation function for each file, not for each group of 4 files. If the "translation function" really is per-file, with no state shared between files, consider instead running 4 processes in parallel over the same code with just xargs -P 4.
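A minimal sketch of that per-file parallel variant (assuming my_first_function is exported, like your_function above):
export -f my_first_function
# up to 4 concurrent invocations, one file name per call
xargs -d '\n' -n 1 -P 4 bash -c 'my_first_function "$1"' _ < list.txt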
If you have GNU Parallel it looks something like this:
doit() {
    my_first_function "$1"
    my_first_function "$2"
    my_first_function "$3"
    my_first_function "$4"
    my_second_function "$1" "$2" "$3" "$4"
}
export -f doit
cat list.txt | parallel -n4 doit

> and < difference in Bash

I have to test whether pathname is a regular file and whether its length is greater than 50 bytes; for this reason I do this:
if [[ -f $path && `wc -c < $path` -gt 50 ]]; then ......
and it works, but, out of curiosity, I also tried doing it like this:
if [[ -f $path && `$path > wc -c` -gt 50 ]]; then ......
but it doesn't work and I don't understand why.
For this reason I am asking about the difference between the < and > operators in Bash.
< is "read from" -- redirecting input, while > is "write to" -- redirecting output. Both are followed by the name of the file to use. So
wc -c < $path
runs the wc command, reading from the file $path
$path > wc -c
runs the $path command, writing to the file wc
These operators are not commutative (the positions aren't swappable).
wc -c < $path means launch wc and use the file at $path as the input.
$path > wc -c means launch the executable at $path (which in your case isn't an executable) and send its output to a file named wc.
As you can see, the second one doesn't really make sense. Always make the executable the first operand (the command), and put the file you are reading from or writing to after the redirection operator.
< instructs the shell to take the contents of the file on the right side of the operator and provide them as input to the command on the left side.
> instructs the shell to take the output of the command on the left side and store it in the file named on the right side.
Accordingly, the command wc -c < $path is equivalent to cat $path | wc -c. $path > wc -c would mean "run the command $path with the argument -c and store the output in a file named wc."
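A quick illustration of both directions (using a hypothetical file notes.txt):
printf 'hello\n' > notes.txt   # > writes the output into notes.txt
wc -c < notes.txt              # < feeds notes.txt to wc; prints 6
cat notes.txt | wc -c          # equivalent to the previous line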

Bash scripting: find the size of a directory, and if the size is greater than x, then do a task

I have put the following together from a couple of other articles, but it does not seem to be working. What I am eventually trying to do is have it check the directory size, and if the directory has new content above a certain total size, let me know.
#!/bin/bash
file=private/videos/tv
minimumsize=2
actualsize=$(du -m "$file" | cut -f 1)
if [ $actualsize -ge $minimumsize ]; then
    echo "nothing here to see"
else
    echo "time to sync"
fi
this is the output:
./sync.sh: line 5: [: too many arguments
time to sync
I am new to bash scripting so thank you in advance.
The error:
[: too many arguments
seems to indicate that either $actualsize or $minimumsize is expanding to more than one argument.
Change your script as follows:
#!/bin/bash
set -x # Add this line.
file=private/videos/tv
minimumsize=2
actualsize=$(du -m "$file" | cut -f 1)
echo "[$actualsize] [$minimumsize]" # Add this line.
if [ $actualsize -ge $minimumsize ]; then
    echo "nothing here to see"
else
    echo "time to sync"
fi
The set -x will echo commands before attempting to execute them, something which assists greatly with debugging.
The echo "[$actualsize] [$minimumsize]" will assist in trying to establish whether these variables are badly formatted or not, before the attempted comparison.
If you do that, you'll no doubt find that some arguments will result in a lot of output from the du -m command since it descends into subdirectories and gives you multiple lines of output.
If you want a single line of output for all the subdirectories aggregated, you have to use the -s flag as well:
actualsize=$(du -ms "$file" | cut -f 1)
If instead you don't want any of the subdirectories taken into account, you can take a slightly different approach, limiting the depth to one and tallying up all the sizes:
actualsize=$(find "$file" -maxdepth 1 -type f -print0 | xargs -0 ls -l | awk '{s += $5} END {print int(s/1024/1024)}')
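A variant of the same idea that avoids parsing ls output altogether (a sketch assuming GNU stat and xargs; stat -c '%s' prints each file's size in bytes):
actualsize=$(find "$file" -maxdepth 1 -type f -print0 \
    | xargs -0 -r stat -c '%s' \
    | awk '{s += $1} END {print int(s/1024/1024)}')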

How can I list the path of the output of this script?

This is my command:
(ls -d */ ); echo -n $i; ls -R $i | grep "wp-config.php" ;
This is my current output:
/wp-config.php
It seems you want to find the path to a file called "wp-config.php".
Does the following help?
find $PWD -name 'wp-config.php'
Your script is kind of confusing: why does ls -d */ not show any output? What's the value of $i? Your actual problem seems to be that ls -R lists the contents of all subdirectories but doesn't give you full paths for their contents.
Well, find is the best tool for that, but you can simulate it in this case via a script like this:
#!/bin/bash
searchFor=wp-config.php
startDir=${1:-.}

lsSubDir() {
    local actDir="$1"
    for entry in $(ls "$actDir"); do
        if [ -d "$actDir/$entry" ]; then
            lsSubDir "$actDir/$entry"
        else
            [ "$entry" = "$searchFor" ] && echo "$actDir/$entry"
        fi
    done
}

lsSubDir "$startDir"
Save it in a file like findSimulator, make it executable, and call it with the directory to start searching from as its parameter.
Be warned: this script is not very efficient and may fail on large directory trees because of the recursion; the $(ls ...) loop also breaks on file names containing whitespace. I would strongly recommend the solution using find.
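For comparison, the find equivalent of the whole script above is a single line:
find "${1:-.}" -type f -name wp-config.php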
