Script which subtracts two file sizes - linux

I would like to subtract the sizes of two files. I found the location of those files and then I used the command:
du -h /bin/ip | cut -d "K" -f1
I got 508 and I wanted to create a variable
x=$((du -h /bin/ip | cut -d "K" -f1))
but as the result I got
"-bash: du -h /bin/ip | cut -d 'K' -f1: division by 0 (error token is "bin/ip | cut -d 'K' -f1")"
What did I do wrong? How can I put this value in a variable?

What did I do wrong?
You used arithmetic expansion $(( ... )) instead of a command substitution $( ... ). As a result, the shell interpreted the / in /bin as division and bin as 0 (because there is no variable named bin), and tried to divide by 0.
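A quick way to see the same behaviour for yourself (assuming no variable named bin exists in your shell):
echo $(( bin ))      # prints 0: an unset name is treated as zero in an arithmetic context
echo $(( 1 / bin ))  # fails with "division by 0", just like the original command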
How can I put this value in a variable?
Use a command substitution:
x=$(du -h /bin/ip | cut -d "K" -f1)
But it would be way more reliable to use stat for collecting information about files:
x=$(stat -c %s /bin/ip)
To subtract two file sizes, you can again use command substitutions to get the sizes, but use arithmetic expansion to calculate the difference.
difference=$(( $(stat -c %s file1) - $(stat -c %s file2) ))
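Putting it together, a minimal sketch (the second file path is just an example, not from the question) that stores both sizes in variables and prints the difference in bytes:
#!/bin/bash
# stat -c %s prints the file size in bytes (GNU coreutils)
size1=$(stat -c %s /bin/ip)
size2=$(stat -c %s /bin/ls)
echo "difference: $(( size1 - size2 )) bytes"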

Perl to the rescue!
perl -le 'print((-s "file1.txt") - (-s "file2.txt"))'
-l adds a newline to print
-s returns a file size (see -x)
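If you want the result in a shell variable, as in the original question, the one-liner can be wrapped in a command substitution the same way (the file names are placeholders):
difference=$(perl -le 'print((-s "file1.txt") - (-s "file2.txt"))')
echo "$difference"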

Related

How to multiply a float value with a variable (a command's output is stored in this variable)?

res=$(echo `sed '1d'| cut -d ';' -f3 |sort -nrk3 | head -1`)
a=0.15
echo `expr $a \* $res`
I have this piece of code, and echo "$res" prints 50000. I want to multiply this result by a float number, say 0.10. How can I perform this operation in bash?
I've tried using the expr command but it isn't working, giving me the error expr: non-integer argument. Is there any other way to get the result 5000?
Suppose the input provided will be in the following format:
Empld;EmpName;Salary
1231;Tushar;20000
5671;Dick;35000
7712;Harry;50000
8712;Reenee;25000
4444;Bakul;50000
This works for me:
a=1.5
res=200
RES=$(echo "scale=4; $a*$res" | bc)
echo $RES
Updated after the question update: let's say the contents.txt file has this content.
Empld;EmpName;Salary
1231;Tushar;20000
5671;Dick;35000
7712;Harry;50000
8712;Reenee;25000
4444;Bakul;50000
In order to get 5000 as you mentioned, I created this script, stack_ans.sh:
#!/bin/sh
res=$(less /path/to/contents.txt| echo `sed '1d'| cut -d ';' -f3 |sort -nrk3 | head -1`)
#echo $res #returns 50000
a=0.10 #to multiply with 0.10 as you mentioned
RESULT=$(echo "scale=4; $a*$res" | bc)
echo $RESULT #returns 5000
Commands to set permissions and execute the script:
chmod 700 stack_ans.sh #to execute
./stack_ans.sh
The script returns 5000.
Note: if the script and the file which has all the entries are in the same directory, then you don't need to mention its complete path.
Note: this version does not use awk; the same thing can also be achieved with awk, as in this example: https://stackoverflow.com/a/41695682/13126651
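For completeness, a rough awk sketch of the same calculation (not the linked answer, just an equivalent assuming the contents.txt layout shown above): skip the header line, keep the largest third field, multiply by 0.10.
awk -F';' 'NR > 1 && $3+0 > max { max = $3+0 } END { print max * 0.10 }' /path/to/contents.txt
# prints 5000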

Using "for" to align paired end sequences

I have a folder with many paired-end files (1.1.fq 1.2.fq 2.1.fq 2.2.fq ...). I want to use a "for" loop to do the alignment for each pair (*.1.fq *.2.fq) and generate 2 outputs, *.stats.txt and *.sam.
I wrote the following command:
for x in *.fq ; do
~/Pedro_Dias/Mamão/Single_end/novocraft/novoalign -d cpapaya.novoIndex -f demultiplex-fq/$x *.1.fq demultiplex-fq/$x *.2.fq -x 3 -H -a -o SAM 2> demultiplex-sam/$x *.stats.txt > demultiplex-sam/$x *.sam;
done
The code return the error:
demultiplex-sam/demultiplex-fq/98.1.fq*.stats.txt:No file or directory
P.S. My files are in the demultiplex-fq folder and the output must go to the demultiplex-sam folder. I'm working in a folder that contains the demultiplex-fq and demultiplex-sam folders.
You should just loop over one file in the pairs. Then replace .1.fq with .2.fq to get the other file in that pair.
The wildcards need to include the directory name, and then you have to replace the directory when generating the output filenames.
for x in demultiplex-fq/*.1.fq
do
y=${x/.1.fq/.2.fq}
stats=${x/demultiplex-fq/demultiplex-sam}.stats.txt
sam=${x/demultiplex-fq/demultiplex-sam}.sam
~/Pedro_Dias/Mamão/Single_end/novocraft/novoalign -d cpapaya.novoIndex -f "$x" "$y" -x 3 -H -a -o SAM 2> "$stats" > "$sam"
done
You don't use wildcards in the command, you just use the $x and $y variables.
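For example, with one of the file names from the question, the parameter expansions produce:
x=demultiplex-fq/98.1.fq
echo "${x/.1.fq/.2.fq}"                      # demultiplex-fq/98.2.fq
echo "${x/demultiplex-fq/demultiplex-sam}"   # demultiplex-sam/98.1.fq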
I used this code and it works:
for x in $( ls -1 dem*/*.fq | rev | cut -d . -f 3- | rev | sort -u ) ; do ~/Pedro_Dias/Mamão/Single_end/novocraft/novoalign -d cpapaya.novoIndex -f $x.1.fq $x.2.fq -x 3 -H -a -o SAM 2> ./bams/$(echo $x | tr / _).stats.txt > ./bams/$(echo $x | tr / _).sam; done

Piping grep to cut

This line:
echo $(grep Uid /proc/1/status) | cut -d ' ' -f 2
Produces output:
0
This line:
grep Uid /proc/1/status | cut -d ' ' -f 2
Produces output:
Uid: 0 0 0 0
My goal was the first output. My question is: why does the second command not produce the output I expected? Why am I required to echo it?
One way to do this is to change the IFS (Internal Field Separator) variable in the bash shell:
IFSOLD="$IFS" # save the current IFS (Internal Field Separator)
IFS=$'\t'
grep 'Uid' /proc/1/status | cut -f 2
0 # Your result
IFS="$IFSOLD"
or the easy way
grep 'Uid' /proc/1/status | cut -d $'\t' -f 2
Note: by the way, tab is the default delimiter for cut, as pointed out [ here ]
Use awk
awk '/Uid/ { print $2; }' /proc/1/status
You should almost never need to write something like echo $(...) - it's almost equivalent to calling ... directly. Try echo "$(...)" (which you should always use) instead, and you'll see it behaves like ....
The reason is that when the $() command substitution is invoked without quotes, the resulting string is split by Bash into separate arguments before being passed to echo, and echo outputs each argument separated by a single space, regardless of the whitespace generated by the command substitution (in your case tabs).
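You can see the difference directly (the exact numbers depend on your system, but the shape of the output is the point):
echo $(grep Uid /proc/1/status)     # unquoted: split into words, re-joined with single spaces -> Uid: 0 0 0 0
echo "$(grep Uid /proc/1/status)"   # quoted: the original tabs are preserved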
As sjsam suggested, if you want to cut tab-delimited output, just specify tabs as the delimiter instead of spaces:
cut -d $'\t' -f 2
grep Uid /proc/1/status | sed -r 's/\s+/ /g' | awk '{print $3}'
Output
0

Bash: extract (percent) number of variable length from a string

I want to write a little progress bar using a bash script.
To generate the progress bar I have to extract the progress from a log file.
The content of such a file (here run.log) looks like this:
Time to finish 2d 15h, 42.5% completed, time steps left 231856
I'm now interested in isolating the 42.5%. The problem is that the length of this number is variable, as is its position in the line (e.g. 'time to finish' might contain only one value like 23h or 59min).
I tried it over the position via
echo "$(tail -1 run.log | awk '{print $6}'| sed -e 's/[%]//g')"
which fails for a short 'Time to finish', as well as via the %-sign
echo "$(tail -1 run.log | egrep -o '[0-9][0-9].[0-9]%')"
Here it works only for values >= 10%.
Any solution for a more flexible number extraction?
======================================================
Update: Here is now the full script for the progress bar:
#!/bin/bash
# extract % complete from run.log
perc="$(tail -1 run.log | grep -o '[^ ]*%')"
# convert perc to int
pint="${perc/.*}"
# number of # to plot
nums="$(echo "$pint /2" | bc)"
# output
echo -e ""
echo -e " completed: $perc"
echo -ne " "
for i in $(seq $nums); do echo -n '#'; done
echo -e ""
echo -e " |----.----|----.----|----.----|----.----|----.----|"
echo -e " 0% 20% 40% 60% 80% 100%"
echo -e ""
tail -1 run.log
echo -e ""
Thanks for your help, guys!
Based on your example,
grep -o '[^ ]*%'
should give what you want.
You can extract the % value with the command below:
tail -n 1 run.log | grep -o -P '[0-9]*(\.[0-9]*)?(?=%)'
Explanation:
grep options:
-o : Print only matching string.
-P : Use perl style regex
regex parts:
[0-9]* : Match any number, repeated any number of times.
(\.[0-9]*)? : Match decimal point, followed by any number of digits.
? at the end of it => optional. (this is to take care of numbers without fraction part.)
(?=%) :The regex before this must be followed by a % sign. (search for "positive look-ahead" for more details.)
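To try it against the sample line from the question:
echo "Time to finish 2d 15h, 42.5% completed, time steps left 231856" | grep -o -P '[0-9]*(\.[0-9]*)?(?=%)'
# 42.5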
You should be able to isolate the progress after the first comma (,) in your file, i.e. you want the characters between , and %.
There are many ways to achieve your goal. I would prefer using cut several times as it is easy to read.
cut -f1 -d'%' | cut -f2 -d',' | cut -f2 -d' '
After the first cut:
Time to finish 2d 15h, 42.5
After the second cut (note the leading space):
 42.5
And the last cut just removes that space, giving the final result:
42.5

Finding the longest word in a text file

I am trying to make a simple script that finds the longest word and its length in a text file using bash. I know this is simple and straightforward with awk, but I want to try this other method. Let's say I know that if a=wmememememe, I can find its length with echo ${#a} and print the word itself with echo ${a}. But I want to apply it in a loop like the one below:
for i in $(cat so.txt); do
where so.txt contains the words. I hope it makes sense.
A bash one-liner:
sed 's/ /\n/g' YOUR_FILENAME | sort | uniq | awk '{print length, $0}' | sort -nr | head -n 1
read file and split the words (via sed)
remove duplicates (via sort | uniq)
prefix each word with its length (awk)
sort the list by the word length
print the single word with greatest length.
Yes, this will be slower than some of the other solutions, but it also doesn't require remembering the semantics of bash for loops.
Normally, you'd want to use a while read loop instead of for i in $(cat), but since you want all the words to be split, in this case it would work out OK.
#!/bin/bash
longest=0
for word in $(<so.txt)
do
len=${#word}
if (( len > longest ))
then
longest=$len
longword=$word
fi
done
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"
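For reference, here is a sketch of the while read variant mentioned above; read -a splits each line of so.txt into an array of words, so it handles multiple words per line without relying on unquoted expansion:
#!/bin/bash
longest=0
while read -r -a words
do
    for word in "${words[@]}"
    do
        if (( ${#word} > longest ))
        then
            longest=${#word}
            longword=$word
        fi
    done
done < so.txt
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"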
Another solution:
for item in $(cat "$infile"); do
length[${#item}]=$item # use word length as index
done
maxword=${length[@]: -1} # select last array element
printf "longest word '%s', length %d" ${maxword} ${#maxword}
longest=""
for word in $(cat so.txt); do
if [ ${#word} -gt ${#longest} ]; then
longest=$word
fi
done
echo $longest
awk script:
#!/usr/bin/awk -f
# Initialize two variables
BEGIN {
maxlength=0;
maxword=0
}
# Loop through each word on the line
{
for(i=1;i<=NF;i++)
# Assign the maxlength variable if length of word found is greater. Also, assign
# the word to maxword variable.
if (length($i)>maxlength)
{
maxlength=length($i);
maxword=$i;
}
}
# Print out the maxword and the maxlength
END {
print maxword,maxlength;
}
Textfile:
[jaypal:~/Temp] cat textfile
AWK utility is a data_extraction and reporting tool that uses a data-driven scripting language
consisting of a set of actions to be taken against textual data (either in files or data streams)
for the purpose of producing formatted reports.
The language used by awk extensively uses the string datatype,
associative arrays (that is, arrays indexed by key strings), and regular expressions.
Test:
[jaypal:~/Temp] ./script.awk textfile
data_extraction 15
Relatively speedy bash function using no external utils:
# Usage: longcount < textfile
longcount ()
{
declare -a c;
while read x; do
c[${#x}]="$x";
done;
echo ${#c[@]} "${c[${#c[@]}]}"
}
Example:
longcount < /usr/share/dict/words
Output:
23 electroencephalograph's
Modified POSIX shell version of jimis' xargs-based answer; still very slow, takes two or three minutes:
tr "'" '_' < /usr/share/dict/words |
xargs -P$(nproc) -n1 -i sh -c 'set -- {} ; echo ${#1} "$1"' |
sort -n | tail | tr '_' "'"
Note the leading and trailing tr bit to get around GNU xargs' difficulty with single quotes.
for i in $(cat so.txt); do echo ${#i}; done | paste - so.txt | sort -n | tail -1
Slow because of the gazillion forks, but pure shell; does not require awk or special bash features:
$ cat /usr/share/dict/words | \
xargs -n1 -I '{}' -d '\n' sh -c 'echo `echo -n "{}" | wc -c` "{}"' | \
sort -n | tail
23 Pseudolamellibranchiata
23 pseudolamellibranchiate
23 scientificogeographical
23 thymolsulphonephthalein
23 transubstantiationalist
24 formaldehydesulphoxylate
24 pathologicopsychological
24 scientificophilosophical
24 tetraiodophenolphthalein
24 thyroparathyroidectomize
You can easily parallelize, e.g. to 4 CPUs by providing -P4 to xargs.
EDIT: modified to work with the single quotes that some dictionaries have. Now it requires GNU xargs because of -d argument.
EDIT2: for the fun of it, here is another version that handles all kinds of special characters, but requires the -0 option to xargs. I also added -P4 to compute on 4 cores:
cat /usr/share/dict/words | tr '\n' '\0' | \
xargs -0 -I {} -n1 -P4 sh -c 'echo ${#1} "$1"' wordcount {} | \
sort -n | tail
