bash script: calculate sum size of files - linux

I'm working on Linux and need to calculate the sum size of some files in a directory.
I've written a bash script named cal.sh as below:
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
echo $line
done<`ls -l | grep opencv | awk '{print $5}'`
However, when I executed this script ./cal.sh, I got an error:
./cal.sh: line 6: `ls -l | grep opencv | awk '{print $5}'`: ambiguous redirect
And if I execute it with sh cal.sh, it seems to work but I will get some weird message at the end of output:
25
31
385758: File name too long
Why does sh cal.sh seem to work? Where does File name too long come from?

Alternatively, you can do:
du -cb *opencv* | awk 'END{print $1}'
option -b will display each file in bytes and -c will print the total size.

Ultimately, as other answers will point out, it's not a good idea to parse the output of ls because it may vary between systems. But it's worth knowing why the script doesn't work.
The ambiguous redirect error is because you need quotes around your ls command i.e.:
while IFS='' read -r line || [[ -n "$line" ]]; do
echo $line
done < "`ls -l | grep opencv | awk '{print $5}'`"
But this still doesn't do what you want. The "<" operator is expecting a filename, which is being defined here as the output of the ls command. But you don't want to read a file, you want to read the output of ls. For that you can use the "<<<" operator, also known as a "here string" i.e.:
while IFS='' read -r line || [[ -n "$line" ]]; do
echo $line
done <<< "`ls -l | grep opencv | awk '{print $5}'`"
This works as expected, but has some drawbacks. When using a "here string" the command must first execute in full, then store the output of said command in a temporary variable. This can be a problem if the command takes long to execute or has a large output.
IMHO the best and most standard method of iterating a commands output line by line is the following:
ls -l | grep opencv | awk '{print $5} '| while read -r line ; do
echo "line: $line"
done

I would recommend against using that pipeline to get the sizes of the files you want - in general parsing ls is something that you should avoid. Instead, you can just use *opencv* to get the files and stat to print the size:
stat -c %s *opencv*
The format specifier %s prints the size of each file in bytes.
You can pipe this to awk to get the sum:
stat -c %s *opencv* | awk '{ sum += $0 } END { if (sum) print sum }'
The if is there to ensure that no input => no output.

Related

AWK don't use first value of a column

First of all, thank you for your help. I have a problem with awk and using the while read. I have a file separated in two columns that each column has 8 values. My script consist of selecting the second columnn and download 8 different files and decompress them. The problem is that the my script doesn't download the first value of the column.
This is my script
#!/bin/bash
cat $1 | while read line
do
echo "Downloading fasta files from NCBI..."
awk '{print $2}' | wget -i- 2>> log
gzip -d *.gz
done
This is the file I am using
Salmonella_enterica_subsp_enterica_Typhi https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/717/755/GCF_003717755.1_ASM371775v1/GCF_003717755.1_ASM371775v1_translated_cds.faa.gz
Salmonella_enterica_subsp_enterica_Paratyphi_A https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/818/115/GCF_000818115.1_ASM81811v1/GCF_000818115.1_ASM81811v1_translated_cds.faa.gz
Salmonella_enterica_subsp_enterica_Paratyphi_B https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/018/705/GCF_000018705.1_ASM1870v1/GCF_000018705.1_ASM1870v1_translated_cds.faa.gz
Salmonella_enterica_subsp_enterica_Infantis https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/011/182/555/GCA_011182555.2_ASM1118255v2/GCA_011182555.2_ASM1118255v2_translated_cds.faa.gz
Salmonella_enterica_subsp_enterica_Typhimurium_LT2 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/006/945/GCF_000006945.2_ASM694v2/GCF_000006945.2_ASM694v2_translated_cds.faa.gz
Salmonella_enterica_subsp_diarizonae https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/324/755/GCF_003324755.1_ASM332475v1/GCF_003324755.1_ASM332475v1_translated_cds.faa.gz
Salmonella_enterica_subsp_arizonae https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/635/675/GCA_900635675.1_31885_G02/GCA_900635675.1_31885_G02_translated_cds.faa.gz
Salmonella_bongori https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/006/113/225/GCF_006113225.1_ASM611322v2/GCF_006113225.1_ASM611322v2_translated_cds.faa.gz
The problem is not the download. Check the output of
#!/bin/bash
cat "$1" | while read line
do
awk '{print $2}'
done
This also prints only 7 of the 8 urls. When entering the loop, the read reads the first line into the variable line. However, you never use that variable, so the line is lost. Then awk reads the remaining 7 lines from stdin in one go. The loop only runs once.
You probably wanted to write
#!/bin/bash
cat "$1" | while read -r line
do
echo "Downloading fasta files from NCBI..."
echo "$line" | awk '{print $2}' | wget -i- 2>> log
gzip -d *.gz
done
But there is an easier and safer way:
awk '{print $2}' "$1" | wget -i- 2>> log
gzip -d *.gz
Since the command cut is made to select a column, why not simply issue:
#!/bin/bash
for url in $(cut -f2 "$1")
do
wget "$url" >> log
done
gzip -d *.gz

how to not escape space and backslashes in echo and while in bash?

I'm passing two positional args to a script to run, both args are a path, and while in the scenario analyzing the paths, the problem is sometimes there is some path like: m i sc . . . . .. . . it has dots and spaces, and sometimes even we have a backslash in dir names.
It is so tried to get arguments via two procedures, directly and via at sign.
SOURCE_ARG=$1
DESTINATION_ARG=$2
and
ARG_COUNT=0
for POSITIONAL_ARGUMENTS in "${#}"
do
((ARG_COUNT++))
ARGUMENT_ARRAY[$ARG_COUNT]=$POSITIONAL_ARGUMENTS
done
In the loop, I iterate through the result of commands that have forwarded to them.
while IFS= read -r dir
do
echo "${ARGUMENT_ARRAY[1]}"
echo "${dir}"
while IFS= read -r item
do
# do some stuff
done < <(ls -A "$dir"/)
done < <(du -hP "$SOURCE_ARG" | awk '{$1=""; print $0}' | grep -v "^.$" | sed "s/^ //g")
when i use echo "${ARGUMENT_ARRAY[1]}" i get the same path as i need to check but when using loop iteration varible as dir in here ->echo "${dir}" i got all the spaces escaped, since other commands for that path could not do their jobs.
What I'm Asking for is that how can I get the output of $dir within the loop and as like as echo "${ARGUMENT_ARRAY[1]}" that i mentioned above(input with all spaces and backslashes)
Thanks to #Barmar in comments.
The only reason that filenames are without escapes (i.e. you see directories with no special character or special characters have been escaped) is because du is printing the filenames with escapes, so $dir variable would have escaped once and special characters are no longer available for the other loop iteration in my problem.
Now that we know the problem was raised by using du in my script:
while IFS= read -r dir
# do sth
done < <(du -hP "$SOURCE_ARG" | awk '{$1=""; print $0}' | grep -v "^.$" | sed "s/^ //g")
We can change the du to find and the problem is solved:
while IFS= read -r dir
# do sth
done < <(find "$SOURCE_ARG" -type d –)
PS 1:
Another problem raised as I wanted to print the lines to check them if they are ok or not (i.e. while debugging application) was with echo.
So be sure to try printf "%s\n" "$dir" instead of echo, as some versions of echo process escape sequences.
echo "${dir}"
printf "%s\n" "$dir"
PS 2:
Also If a filename has more than one space in a row, The way I used awk, was collapsing them into a single space.
awk '{$1=""; print $0}' | grep -v "^.$" | sed "s/^ //g"

Getting error "cat: write error: Broken pipe" only when running bash script non-interactively

I wrote a bash script where I define a variable like this:
var=$(cat $file_path | head -n $var2 | tail -n 1 | cut -f1)
Where $file_path simply contains the path to a file and $var2 is an int, e.g., 1 or 2. The variable is therefore assigned the value of the first field of line number var2 of the file.
It works perfectly fine when I run this from the command line. However, when running the script containing this command, I get the error
cat: write error: Broken pipe
Any idea why that is?
There's no need to use cat, since head takes a filename argument.
var=$(head -n $var2 $file_path | tail -n 1 | cut -f1)
Actually, there's no need to use any of those commands.
var=$(awk -v line=$var2 'NR == line { print $1; exit }' $file_path)

Reformatting name / content pairs from grep in a bash script

I'm attempting to create a bash script that will grep a single file for two separate pieces of data, and print them to stdout.
So far this is what I have:
#!/bin/sh
cd /my/filePath/to/directory
APP=`grep -r --include "inputs.conf" "\[" | grep -oP '^[^\/]+'`
INPUT=`grep -r --include "inputs.conf" "\[" | grep -oP '\[[^\]]+'`
for i in $APP
do
{cd /opt/splunk/etc/deployment-apps
INPUT=`grep -r --include "inputs.conf" "\[" | grep -oP '\[[^\]]+'`
echo -n "$i | $INPUT"}
done
echo "";
exit
Which gives me an output printing the entire output of the first command (which is about 200 lines), then a |, then the other results from the second command. I was thinking I could create an array to do this, however I'm still learning bash.
This is an output example from the command without piping to grep:
TA-XA6x-Server/local/inputs.conf:[perfmon://Processor]
There are 200+ of these in a single execution, and I was looking to have the format be printed as something like this
app="TA-XA6x-Server/local/inputs.conf:" | input="[perfmon://Processor]"
There are essentially two pieces of information I'm attempting to stitch together:
the file path to the file
the contents of the file itself (the input)
Here is an example of the file path:
/opt/splunk/etc/deployment-apps/TA-XA6x-Server/local/inputs.conf
and this is an example of the inputs.conf file contents:
[perfmon://TCPv4]
The easy, mostly-working-ish approach is something like this:
#!/bin/bash
while IFS=: read -r name content; do
printf 'app="%s" | input="%s"\n' "$name" "$content"
done < <(grep -r --include "inputs.conf" "\[")
If you need to work reliably with all possible filenames (including names with colons or newlines) and have GNU grep available, consider the --null argument to grep and adjusting the read usage appropriately:
#!/bin/bash
while IFS= read -r -d '' name && IFS= read -r content; do
printf 'app="%s" | input="%s"\n' "$name" "$content"
done < <(grep -r --null --include "inputs.conf" "\[")

How to print the result of the first part of the pipe?

I have the following grep:
grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' /home/programs/hello_word.sh
Wich displays the string between PROGRAM( and ):
RECTONTER
Then, I need to know if these string extracted is contained in a file, so:
grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' /home/programs/hello_word.sh | xargs -I % grep -e % /home/leherad/pgm_currentdate
File content:
RECTONTER
CORASFE
RENTOASD
UBICARP
If its found, returns the line of /home/leherad/pgm_currentdate, but I want to print the line extracted in the first grep (RECTONTER). If not found, then wouldn't return nothing.
There is a simple way to do this, or I should not complicate and would be better build a script and save the first grep in a variable?
You can store it on a variable first:
read -r FIRST < <(exec grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' /home/programs/hello_word.sh) && grep -e "$FIRST" /home/leherad/pgm_currentdate
Update 01
#!/bin/bash
shopt -s nullglob
for FILE in /home/programs/*; do
read -r FIRST < <(exec grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' "$FILE") && grep -e "$FIRST" /home/leherad/pgm_currentdate && echo "$FIRST"
done
I think a straightforward way to solve this is to use a function.
Also, your grep pattern will match shell comments, which could cause unexpected behavior in your xargs command when there are more than one matches; you might want to take steps to only grab the first match. It's hard to say without actually seeing the input files, so I'm guessing this is either ok or comments are actually the expected place for your target pattern.
Anyway, here's my best guess at a function that would work for you.
get_program() {
local filename="$1"
local program="$( grep -m1 -Po '(?<=PROGRAM\()[^\)]+(?=\))' "$filename" )"
if grep -q -e "$program" /home/leherad/pgm_currentdate; then
echo $program
grep -e "$program" /home/leherad/pgm_currentdate
fi
}
get_program /home/programs/hello_word.sh

Resources