How to print the results tab-separated using Linux commands? - linux

for i in `cat /home/htvam/muthu/mbovis/uk_fastq/trim_files/sub_list.txt`
do
for j in `cat /home/htvam/muthu/mbovis/uk_fastq/trim_files/fn_pos.txt`
do
echo "$i\t\c"
echo "$j\t\c"
echo "$(samtools view "$i"_dup_mapped_sorted.bam NC_000962.3:$j-$j | awk '{if($2<1023) print}' | wc -l)\t\c"
echo "$(samtools view "$i"_dup_mapped_sorted.bam NC_000962.3:$j-$j | awk '{if($2>1023) print}' | wc -l)"
done
done
sub_list.txt contains
ERR125598
ERR125599
fn_pos.txt contains
14401
62049
71336
4386228
4394265
4395387
4395804
It outputs results like:
But I need the results for the next sample "i" in the list to be printed after a tab instead of on a new line, like:

Related

bash count sequential files

I'm pretty new to bash scripting, so some of the syntax may not be optimal. Please point out any problems you see.
I have files in a directory named sequentially.
Example: prob01_01 prob01_03 prob01_07 prob02_01 prob02_03 ....
I am trying to have the script iterate through the current directory and count how many extensions each problem has, then print the pre-extension name followed by the count.
Sample output for above would be:
prob01 3
prob02 2
This is my code:
#!/bin/bash
temp=$(mktemp)
element=''
count=0
for i in *
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1"
else
echo $element $count >> temp
element=$current
count=1
fi
done
echo 'heres the temp:'
cat temp
rm 'temp'
The Problem:
Current output:
prob1 3
Desired output:
prob1 3
prob2 2
The last count isn't appended because it's not seeing a different element after it
My Guess on possible solutions:
Have the last append occur at the end of the for loop?
Your code has two problems.
The first problem is unrelated to your question: you create a temporary file whose name is stored in $temp, but then write to a file with the fixed name temp. Use "$temp" instead.
The second problem is that you only write a result when you see a new problem name, so the last one is never printed.
Fixing only these two problems results in:
results() {
if (( count == 0 )); then
return
fi
echo $element $count >> "${temp}"
}
temp=$(mktemp)
element=''
count=0
for i in prob*
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1" # Better is using ((count++))
else
results
element=$current
count=1
fi
done
results
echo 'heres the temp:'
cat "${temp}"
rm "${temp}"
You can do this without the script:
ls prob* | cut -d"_" -f1 | sort | uniq -c
If you want the output displayed as given, you need one more step:
ls prob* | cut -d"_" -f1 | sort | uniq -c | awk '{print $2 " " $1}'
You may use a printf + awk solution:
printf '%s\n' *_* | awk -F_ '{a[$1]++} END{for (i in a) print i, a[i]}'
prob01 3
prob02 2
We use printf to print each file name that contains at least one _.
We use awk with an associative array to count the files that share the same first _-delimited field.
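The order in which awk's for (i in a) loop visits keys is unspecified, so the lines above may not come out sorted. A minimal sketch that pipes the result through sort; count_prefixes is a hypothetical helper name, and taking the directory as an argument is an assumption added for illustration:

```shell
#!/bin/bash
# count_prefixes DIR: hypothetical helper, not part of the answer above.
# The function body is a subshell, so the cd does not affect the caller.
count_prefixes() (
    cd "$1" || exit 1
    # One file name per line, count by the text before the first "_",
    # then sort because awk's "for (i in a)" order is unspecified.
    printf '%s\n' *_* |
        awk -F_ '{ a[$1]++ } END { for (i in a) print i, a[i] }' |
        sort
)

# Usage:
#   count_prefixes .
```

If no file name in the directory contains an underscore, the glob stays unexpanded and the literal pattern is counted once, so a real script may want to check for that case first.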
I would do it like this:
$ ls | awk -F_ '{print $1}' | sort | uniq -c | awk '{print $2 " " $1}'
prob01 3
prob02 2

Increment variable when matched awk from tail

I'm monitoring a file that is actively being written to:
My current solution is:
ws_trans=0
sc_trans=0
tail -F /var/log/file.log | \
while read LINE
do
echo $LINE | grep -q -e "enterpriseID:"
if [ $? = 0 ]
then
((ws_trans++))
fi
echo $LINE | grep -q -e "sc_ID:"
if [ $? = 0 ]
then
((sc_trans++))
fi
printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
However, when I attempt this with AWK I don't get the expected output: $ws_trans and $sc_trans remain 0.
ws_trans=0
sc_trans=0
tail -F /var/log/file.log | \
while read LINE
do
echo $LINE | awk '/enterpriseID:/ {++ws_trans} END {print | ws_trans}'
echo $LINE | awk '/sc_ID:/ {++sc_trans} END {print | sc_trans}'
printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
I'm attempting this to reduce load. I understand that AWK doesn't share variables with bash, and it can get quite confusing, but the only reference I found is a non-tail application of AWK.
How can I assign the AWK Variable to the bash ws_trans and sc_trans? Is there a better solution? (There are other search terms being monitored.)
You need to pass the variables using the option -v, for example:
$ var=0
$ printf %d\\n {1..10} | awk -v awk_var=${var} '{++awk_var} {print awk_var}'
To set the variable "back" you could use declare, for example:
$ declare $(printf %d\\n {1..10} | awk -v awk_var=${var} '{++awk_var} END {print "var=" awk_var}')
$ echo $var
10
Your script could be rewritten like this:
ws_trans=0
sc_trans=0
tail -F /var/log/system.log |
while read LINE
do
declare $(echo $LINE | awk -v ws=${ws_trans} '/enterpriseID:/ {++ws} END {print "ws_trans="ws}')
declare $(echo $LINE | awk -v sc=${sc_trans} '/sc_ID:/ {++sc} END {print "sc_trans="sc}')
printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
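Since the goal is to reduce load, another option is to avoid starting awk once per line and instead let a single awk process read the whole stream and keep both counters itself. A minimal sketch under that assumption; count_trans is a hypothetical helper name:

```shell
#!/bin/bash
# count_trans: hypothetical helper. Reads log lines on stdin and keeps
# both counters inside one long-running awk process.
count_trans() {
    awk '
        /enterpriseID:/ { ws++ }
        /sc_ID:/        { sc++ }
        {
            # Redraw the status line in place after every input line.
            printf "\r WSTRANS: %d \t\t SCTRANS: %d", ws, sc
            fflush()   # flush so the display updates while tail keeps running
        }
        END { print "" }   # finish the status line when input ends
    '
}

# Usage (runs until interrupted):
#   tail -F /var/log/file.log | count_trans
```

The counters live only inside awk here, so this fits when they are just displayed. Additional search terms can be added as further pattern rules.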

Linux usernames /etc/passwd listing

I want to print the longest and shortest username found in /etc/passwd. If I run the code below, it works fine for the shortest (head -1) but doesn't run for the longest (sort -n | tail -1 | awk '{print $2}'). Can anyone help me figure out what's wrong?
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
sort -n |tail -1 | awk '{print $2}'
Here the issue is:
The pipe ends with the first command, sort -n | head -1 | awk '{print $2}'. That command receives its input through the pipe and produces output.
The second command is given no input, so it waits for input from STDIN, which is the keyboard. You can type the input and press Ctrl+D to obtain output.
Run the code as below to get the desired output:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |tail -1 | awk '{print $2}'
All you need is:
$ awk -F: '
NR==1 { min=max=$1 }
length($1) > length(max) { max=$1 }
length($1) < length(min) { min=$1 }
END { print min ORS max }
' /etc/passwd
No explicit loops or pipelines or multiple commands required.
The problem is that you have two pipelines when you really need one. You have grep | while read do ... done | sort | head | awk and then sort | tail | awk: the first sort has an input (the while loop), but the second sort doesn't. The script hangs because the second sort is reading from STDIN, that is, the terminal.
There are various ways to resolve this:
save the output of the while loop to a temporary file and use that as the input to both sort commands
repeat your while loop
use awk to do both the head and the tail
The first two involve iterating over the password file twice, which may be okay, depending on what you're ultimately trying to do. But a small awk script placed after the sort can print both the first and the last line, using an NR==1 rule and the END block.
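The awk option from the list above can be sketched as follows. This is only an illustration: shortest_longest is a hypothetical helper name, and the file is passed as an argument so the same code works on any passwd-style file.

```shell
#!/bin/bash
# shortest_longest FILE: hypothetical helper for a passwd-style FILE.
shortest_longest() {
    # Prefix each username with its length, sort numerically, then let a
    # second awk print the name on the first and on the last sorted line.
    awk -F: '{ print length($1), $1 }' "$1" |
        sort -n |
        awk 'NR == 1 { print $2 }
             { last = $2 }
             END { if (NR > 1) print last }'
}

# Usage:
#   shortest_longest /etc/passwd
```

The last name is saved in a variable on every line rather than read from $0 inside END, which keeps the sketch independent of how a given awk treats the current record after input ends.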
While you already have good answers, you can also use POSIX shell to accomplish your goal without any pipe at all, using the parameter expansion and string length provided by the shell itself (see: POSIX shell specification). For example, you could do the following:
#!/bin/sh
sl=32;ll=0;sn=;ln=; ## short len, long len, short name, long name
while read -r line; do ## read each line
u=${line%%:*} ## get user
len=${#u} ## get length
[ "$len" -lt "$sl" ] && { sl="$len"; sn="$u"; } ## if shorter, save len, name
[ "$len" -gt "$ll" ] && { ll="$len"; ln="$u"; } ## if longer, save len, name
done </etc/passwd
printf "shortest (%2d): %s\nlongest (%2d): %s\n" $sl "$sn" $ll "$ln"
Example Use/Output
$ sh cketcpw.sh
shortest ( 2): at
longest (17): systemd-bus-proxy
Using either pipe/head/tail/awk or the shell itself is fine. It's good to have alternatives.
(Note: if several users have names of the same length, this just picks the first. You can use a temp file if you want to save all names, and use -le and -ge for the comparison.)
If you want both the head and the tail from the same input, you may want something like sed -e 1b -e '$!d' after you sort the data to get the top and bottom lines using sed.
So your script would be:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n | sed -e 1b -e '$!d'
Alternatively, a shorter way:
cut -d":" -f1 /etc/passwd | awk '{ print length, $0 }' | sort -n | cut -d" " -f2- | sed -e 1b -e '$!d'

Array Length is 1 in Bash scripting

In the code below, the array length is 1.
Could anyone explain why? The grep output is displayed with each match on a new line, but when it is stored in the array, the array length is 1.
How can I display each line by reading the array?
#!/bin/bash
NUM=()
SHORT_TEXT=()
LONG_TEXT=()
#cat /tmp/dummy2 |
while read NUM
do
LONG_TEXT+=$(grep $NUM -A4 RtpLogShm.Msg | grep -vi abate | grep ^LG)
done < /tmp/dummy2
#cat /tmp/dummy1 |
while read LINE
do
NUM+=$(echo $LINE | awk -F':' '{print $1}')
SHORT_TEXT+=$(echo $LINE | awk -F':' '{print $2}')
done < /tmp/dummy1
printf "[%s]\n" "${LONG_TEXT[@]}"
In bash, the syntax for appending to an array is as follows (say we want to append an element stored in ${new_element} to an existing array ${array[@]}):
array=("${array[@]}" "${new_element}")
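The length is 1 in the question because += without parentheses appends text to element 0 instead of adding a new element. A small sketch of the difference, using made-up values; the mapfile line assumes bash 4 or later and shows one way to store each output line as its own element:

```shell
#!/bin/bash
# Without parentheses, += appends text to element 0.
joined=()
joined+="one"
joined+="two"
echo "${#joined[@]}"    # prints 1; element 0 is "onetwo"

# With parentheses, += adds a new element each time.
items=()
items+=("one")
items+=("two")
echo "${#items[@]}"     # prints 2

# mapfile (bash 4+) stores each line of command output as its own element.
mapfile -t lines < <(printf 'LG first\nLG second\n')
printf '[%s]\n' "${lines[@]}"
```

The form array+=("${new_element}") is equivalent to the assignment shown above and avoids re-expanding the whole array.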

awk - send sum to global variable

I have a line in a bash script that calculates the sum of unique IP requests to a certain page.
grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}' | sort | uniq -c | awk '{sum += 1; print } END { print " ", sum, "total"}'
I am trying to get the value of sum to a variable outside the awk statement so I can compare pages to each other. So far I have tried various combinations of something like this:
unique_sum=0
grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}' | sort | uniq -c | awk '{sum += 1; print ; $unique_sum=sum} END { print " ", sum, "total"}'
echo "${unique_sum}"
This results in an echo of "0". I've tried placing $unique_sum=sum in the END, various combinations of initializing the variable (awk -v unique_sum=0 ...), and placing the variable assignment outside of the quoted sections.
So far, my Google-fu is failing horribly as most people just send the whole of the output to a variable. In this example, many lines are printed (one for each IP) in addition to the total. Failing a way to capture the 'sum' variable, is there a way to capture that last line of output?
This is probably one of the most sophisticated things I've tried in awk so my confidence that I've done anything useful is pretty low. Any help will be greatly appreciated!
You can't assign a shell variable inside an awk program. In general, no child process can alter the environment of its parent. You have to have the awk program print out the calculated value, and then the shell can grab that value and assign it to a variable:
output=$( grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}' | sort | uniq -c | awk '{sum += 1; print } END {print sum}' )
unique_sum=$( sed -n '$p' <<< "$output" ) # grab the last line of the output
sed '$d' <<< "$output" # print the output except for the last line
echo " $unique_sum total"
That pipeline can be simplified quite a lot: awk can do what grep can do, so first
grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}'
is (longer, but only one process)
awk -F" - " -v date="$YESTERDAY" -v patt="$1" '$0 ~ date && $0 ~ patt {print $1}' "$ACCESSLOG"
And the last awk program just counts the lines, so it can be replaced with wc -l.
All together:
unique_output=$(
awk -F" - " -v date="$YESTERDAY" -v patt="$1" '
$0 ~ date && $0 ~ patt {print $1}
' "$ACCESSLOG" | sort | uniq -c
)
echo "$unique_output"
unique_sum=$( wc -l <<< "$unique_output" )
echo " $unique_sum total"
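One edge case: a here-string always ends with a newline, so wc -l <<< "" prints 1 and an empty result would be reported as one unique client. A hedged sketch that wraps the same pipeline in functions and counts only non-empty lines; unique_clients and count_lines are hypothetical helper names, and passing the patterns and log file as arguments is an assumption made so the example is self-contained:

```shell
#!/bin/bash
# unique_clients DATE PATTERN LOGFILE: hypothetical wrapper around the
# pipeline above. Prints one "count client" line per distinct client.
unique_clients() {
    awk -F" - " -v date="$1" -v patt="$2" '
        $0 ~ date && $0 ~ patt { print $1 }
    ' "$3" | sort | uniq -c
}

# count_lines STRING: number of non-empty lines in STRING. Prints 0 for an
# empty string, where wc -l on a here-string would print 1.
count_lines() {
    awk 'NF { n++ } END { print n + 0 }' <<< "$1"
}

# Usage:
#   unique_output=$(unique_clients "$YESTERDAY" "$1" "$ACCESSLOG")
#   echo "$unique_output"
#   unique_sum=$(count_lines "$unique_output")
#   echo " $unique_sum total"
```

Command substitution is what brings the value back into the calling shell: the functions only print, and the caller assigns.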
