ubuntu bash printing result with extra "" - linux

Why does my result for A have "" around it and only capture the first word, while my B is fine?
File: sample.txt
Amos Tan:Sunny Day:22.5:3:2
Jason Ong:Rainy Day:20.5:3:2
Bryan Sing:Cloudy Day:29.5:3:2
Code in terminal:
cat ./sample.txt | while read A B
do
title=`echo “$A” | cut -f 1 -d ":"`
echo "Found $title"
author=`echo “$B” | cut -f 2 -d ":"`
echo "Found $author
done
Results:
Found “Amos”
Found Sunny Day
Found “Jason”
Found Rainy Day
Found “Bryan”
Found Cloudy Day

This line is the problem:
cat ./sample.txt | while read A B
It reads the first word of each line into A and the rest of the line into B, so A never holds the whole first field. The stray quotes come from the curly quotes (“ ”) around $A and $B: those are literal characters, not shell quotes, so echo prints them. Use straight quotes and read the whole line instead:
while read -r line
do
title=$(echo "$line" | cut -f 1 -d ":")
echo "Found title=$title"
author=$(echo "$line" | cut -f 2 -d ":")
echo "Found author=$author"
done < ./sample.txt
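If the fields you want are always the first two colon-separated columns, you can also let read do the splitting and skip the cut subprocesses entirely (a sketch):
while IFS=: read -r title author _
do
echo "Found title=$title"
echo "Found author=$author"
done < ./sample.txt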
Or simply use awk:
awk -F : '{printf "title=%s, author=%s\n", $1, $2}' sample.txt
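With the sample file above, this prints:
title=Amos Tan, author=Sunny Day
title=Jason Ong, author=Rainy Day
title=Bryan Sing, author=Cloudy Day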

Related

copy a string from a line and paste it at the end of another line in a huge file, based on a pattern

I have the requirement below. I am running the condition in a loop and it is taking a long time. Is there a one-shot command that can process a 70 MB file without taking this long?
Requirement:
if a #pRECTYPE="SBSB" line contains a #pSBEL_MCTR_RSN="XXX" tag, then we need to copy that tag and append it to the end of the next #pRECTYPE="SBEL" record
File (note: there will be no blank lines in the file; I have added line breaks only to avoid line continuation):
#pRUKE=dfgt#pRECTYPE="SMDR", #pCONFIG="Y" XXXXXXX
#pRUKE=dfgt#pRECTYPE="SBSB", #pGWID="1234", #pSBEL_MCTR_RSN="KX28", #pSBSB_9000_COLL=""
#pRUKE=dfgt#pRECTYPE="KBSG", #pKBSG_UPDATE_CD="IN", XXXXXXXXXXX
#pRUKE=dfgt#pRECTYPE="SBEL", #pSBEL_EFF_DT="01/01/2017", #pCSPI_ID="JKOX0001", #pSBEL_FI="A"
#pRUKE=dfgt#pRECTYPE="SBEK", #pSBEK_UPDATE_CD="IN",XXXXXXXXXXXXXXXXXXX
#pRUKE=dfgt#pRECTYPE="DBCS", #pDBCS_UPDATE_CD="IN",XXXXXXXXXXXXXXXXXXXXXXXXXX
#pRUKE=dfgt#pRECTYPE="MEME", #pMEME_REL="18", #pMEEL_MCTR_RSN="KX28"
#pRUKE=dfgt#pRECTYPE="ATT0", #pATT0_UPDATE_CD="AP",XXXXXXXXX
#pRUKE=dfgt#pRECTYPE="SBSB", #pGWID="1234", #pSBEL_MCTR_RSN="KX28", #pSBSB_9000_COLL=""
#pRUKE=dfgt#pRECTYPE="KBSG", #pKBSG_UPDATE_CD="IN", XXXXXXXXXXX
Example:
Before:
#pRUKE=dfgt#pRECTYPE="SMDR", #pCONFIG="Y" XXXXXXX
#pRUKE=dfgt#pRECTYPE="SBSB", #pGWID="1234", #pSBEL_MCTR_RSN="KX28", #pSBSB_9000_COLL=""
#pRUKE=dfgt#pRECTYPE="KBSG", #pKBSG_UPDATE_CD="IN", XXXXXXXXXXX
#pRUKE=dfgt#pRECTYPE="SBEL", #pSBEL_EFF_DT="01/01/2017", #pCSPI_ID="JKOX0001", #pSBEL_FI="A"
After:
#pRUKE=dfgt#pRECTYPE="SMDR", #pCONFIG="Y" XXXXXXX
#pRUKE=dfgt#pRECTYPE="SBSB", #pGWID="1234", #pSBEL_MCTR_RSN="KX28", #pSBSB_9000_COLL=""
#pRUKE=dfgt#pRECTYPE="KBSG", #pKBSG_UPDATE_CD="IN", XXXXXXXXXXX
#pRUKE=dfgt#pRECTYPE="SBEL", #pSBEL_EFF_DT="01/01/2017", #pCSPI_ID="JKOX0001", #pSBEL_FI="A", #pSBEL_MCTR_RSN="KX28"
After SBSB, if there is no SBEL, then that SBSB can be ignored.
What I did is:
egrep -n "pRECTYPE=\"SBSB\"|pRECTYPE=\"SBEL\"" filename | sed '$!N;/pRECTYPE=\"SBEL\"/P;D' | awk -F\: '{print $1}' | awk 'NR%2{printf "%s,",$0;next;}1' > 4.txt;
This gives me the pairs of line numbers, e.g.:
2,4
17,19
Lines 9, 12 and 14 will be ignored.
while read line
do
echo "$line";
SBSB=`echo "$line" | awk -F, '{print $1}'`;
SBEL=`echo "$line" | awk -F, '{print $2}'`;
echo $SBSB;
echo $SBEL;
SBSB_Fetch=`sed -n "$SBSB p" $fil | grep -Eo '(#pSBEL_MCTR_RSN)=[^ ]+' | sed 's/,$//' | sed 's/^/, /g'`;
echo $SBSB_Fetch;
if [[ "$SBSB_Fetch" == "" ]];then
echo "blank";
s=blank;
else
echo "value";
sed -i "${SBEL}s/.*/&${SBSB_Fetch}/" $fil;
fi
done < 4.txt;
Since I am reading and updating each line, it takes a long time; is there any way to reduce the run time?
For a 70 MB file it currently takes 4.5 hours.
For performance, you need to really limit how many external tools you invoke inside a loop in a shell script.
This requires GNU awk:
gawk '
/#pRECTYPE="SBSB"/ {match($0, /#pSBEL_MCTR_RSN="[^"]*"/, m)}
/#pRECTYPE="SBEL"/ && isarray(m) {$0 = $0 ", " m[0]; delete m}
1
' file
This should be pretty quick: it invokes only one external command, uses no shell loops, and reads the input file only once.
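Running this over the Before sample above produces exactly the After block. If the file itself needs to be updated, GNU awk 4.1 or newer can also edit in place (same program; assumes your gawk is new enough):
gawk -i inplace '
/#pRECTYPE="SBSB"/ {match($0, /#pSBEL_MCTR_RSN="[^"]*"/, m)}
/#pRECTYPE="SBEL"/ && (0 in m) {$0 = $0 ", " m[0]; delete m}
1
' file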

bash count sequential files

I'm pretty new to bash scripting, so some of the syntax may not be optimal. Please do point it out if you see anything.
I have files in a directory named sequentially.
Example: prob01_01 prob01_03 prob01_07 prob02_01 prob02_03 ....
I am trying to have the script iterate through the current directory, count how many files each problem has, and then print the pre-extension name followed by the count.
Sample output for the above would be:
prob01 3
prob02 2
This is my code:
#!/bin/bash
temp=$(mktemp)
element=''
count=0
for i in *
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1"
else
echo $element $count >> temp
element=$current
count=1
fi
done
echo 'heres the temp:'
cat temp
rm 'temp'
The Problem:
Current output:
prob01 3
Desired output:
prob01 3
prob02 2
The last count isn't appended because the loop never sees a different element after it.
My Guess on possible solutions:
Have the last append occur at the end of the for loop?
Your code has two problems.
The first one does not answer your question: you make a temporary file and store its name in $temp, but you then write to a file with the fixed name temp. You should use "$temp", not the fixed name.
The second problem is that you only write a result when you see a new problem name, so the last one is never printed.
Fixing these two problems results in:
results() {
if (( count == 0 )); then
return
fi
echo $element $count >> "${temp}"
}
temp=$(mktemp)
element=''
count=0
for i in prob*
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1" # Better is using ((count++))
else
results
element=$current
count=1
fi
done
results
echo 'heres the temp:'
cat "${temp}"
rm "${temp}"
You can do without the script with
ls prob* | cut -d"_" -f1 | sort | uniq -c
When you want the output displayed as given, you need one more step:
ls prob* | cut -d"_" -f1 | sort | uniq -c | awk '{print $2 " " $1}'
You may use printf + awk solution:
printf '%s\n' *_* | awk -F_ '{a[$1]++} END{for (i in a) print i, a[i]}'
prob01 3
prob02 2
We use printf to print one line per file that has at least one _.
We use awk to count each file's first _-delimited element in an associative array, then print the totals at the end.
I would do it like this:
$ ls | awk -F_ '{print $1}' | sort | uniq -c | awk '{print $2 " " $1}'
prob01 3
prob02 2
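For completeness, the counting can also be done in bash itself with an associative array (a sketch; requires bash 4+, and the key order of the final loop is unspecified, so pipe through sort if needed):
#!/bin/bash
declare -A count
for i in prob*
do
prefix=${i%_*} # strip the _NN part
(( count[$prefix]++ )) # tally files per prefix
done
for p in "${!count[@]}"
do
echo "$p ${count[$p]}"
done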

Converting date format in bash

I have several files of the format backup_2016-26-10_16-30-00. Is it possible to rename them all with a bash script to the form backup_26-10-2016_16:30:00?
Kindly suggest a method for this.
Original file:
backup_2016-30-10_12-00-00
Expected output:
backup_30-10-2016_12:00:00
To perform only the name transformation, you can use awk:
echo 'backup_2016-30-10_12-00-00' |
awk -F'[_-]' '{ print $1 "_" $3 "-" $4 "-" $2 "_" $5 ":" $6 ":" $7 }'
As fedorqui points out in a comment, awk's printf function may be tidier in this case:
echo 'backup_2016-30-10_12-00-00' |
awk -F'[_-]' '{ printf "%s_%s-%s-%s_%s:%s:%s\n", $1,$3,$4,$2,$5,$6,$7 }'
That said, your specific Linux distro may come with a rename tool that allows you to do the same while performing actual file renaming.
With the Perl-based rename command:
$ touch backup_2016-30-10_12-00-00 backup_2016-26-10_16-30-00
$ rename -n 's/(\d{4})-([^_]+)_(\d+)-(\d+)-/$2-$1_$3:$4:/' backup*
rename(backup_2016-26-10_16-30-00, backup_26-10-2016_16:30:00)
rename(backup_2016-30-10_12-00-00, backup_30-10-2016_12:00:00)
Remove the -n option for the actual renaming.
rename is made for this task:
$ rename 's/_(\d{4})-(\d\d-\d\d)_(\d\d)-(\d\d)-(\d\d)$/_$2-$1_$3:$4:$5/' backup_2016-30-10_12-00-00
but I'm not sure it is simpler.
You can also use this script:
#!/bin/bash
fileName=$1
prefix=$(echo ${fileName} | cut -d _ -f1)
date=$(echo ${fileName} | cut -d _ -f2)
time=$(echo ${fileName} | cut -d _ -f3)
year=$(echo ${date} | cut -d - -f1)
day=$(echo ${date} | cut -d '-' -f2)
month=$(echo ${date} | cut -d '-' -f3)
formatedTime=$(echo $time | sed 's/-/:/g')
formatedDate=$day"-"$month"-"$year
formatedFileName=$prefix"_"$formatedDate"_"$formatedTime
echo $formatedFileName
E.g.:
user#host:/tmp$ ./test.sh backup_2016-30-10_12-00-00
backup_30-10-2016_12:00:00
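To actually rename the files rather than just print the new names, the script's output can drive mv in a loop (a sketch, assuming the script above is saved as test.sh):
for f in backup_*
do
mv -- "$f" "$(./test.sh "$f")"
done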

Split a string with two patterns

I have a string id=12345&data=23456
I want to print 12345 23456
Currently I only know how to split one of them at a time with awk:
echo 'id=12345&data=23456' | awk -F"id=" '{print substr($2,1,5)}'
and it's similar for data.
How can I combine those awk command to get the desired result?
Regex groups are one solution. Plain awk can't handle regex capture groups, but gawk can.
Example
echo "id=12345&data=23456" | gawk 'match($0, /^id=([^&]*)&data=(.*)$/, groups) {print groups[1] " " groups[2]}'
Output
12345 23456
There's no need for external processes. You can use the builtin read to extract the two numbers:
$ IFS="=&" read _ num1 _ num2 <<< "id=12345&data=23456"
$ printf "%s\n" "$num1" "$num2"
12345
23456
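The same is possible with parameter expansion alone, without touching IFS (a sketch):
$ s='id=12345&data=23456'
$ num1=${s#id=}; num1=${num1%%&*}
$ num2=${s##*data=}
$ echo "$num1 $num2"
12345 23456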
With awk:
echo "id=12345&data=23456" | awk -F[\&=] '{ print $2,$4}'
With grep and tr:
echo "id=12345&data=23456" | grep -o '[0-9]\+' | tr '\n' ' '
Note: the above command will add one extra space at the end.
Looks like a query string to me... I would suggest parsing it as such. For example, using PHP:
echo "id=12345&data=23456" | php -r 'parse_str(fgets(STDIN), $query); print_r($query);'
This gives the output:
Array
(
[id] => 12345
[data] => 23456
)
So to get the output you were looking for, you could go for:
$ echo "id=12345&data=23456" | php -r 'parse_str(fgets(STDIN), $query); echo $query["id"] . " " . $query["data"];'
12345 23456
For a quick and dirty alternative, you could use sed:
$ echo "id=12345&data=23456" | sed -r 's/id=([^&]+)&data=([^&]+)/\1 \2/'
12345 23456
This captures the part following id= up to the & and the part following &data= up to the next & (if there is one). The disadvantage of this approach is that it breaks if the two parts of the query string are in the opposite order but it might be good enough for your use case.
Alternative code:
echo "id=12345&data=23456" | tr -s '&' '\n' | cut -d '=' -f 2 | tr -s '\n' ' '
Depending what you're going to do with the output, one of these may be all you need:
$ echo 'id=12345&data=23456' | tr -c -s '[0-9]' ' '
12345 23456 $
$ echo 'id=12345&data=23456' | tr -s '[a-z=&]' ' '
12345 23456
$

Merge two text files specific position

I need to merge two files with a Bash script.
File_1.txt
TEXT01 TEXT02 TEXT03 TEXT04
TEXT05 TEXT06 TEXT07 TEXT08
TEXT09 TEXT10 TEXT11 TEXT12
File_2.txt
1993.0
1994.0
1995.0
Result.txt
TEXT01 TEXT02 1993.0 TEXT03 TEXT04
TEXT05 TEXT06 1994.0 TEXT07 TEXT08
TEXT09 TEXT10 1995.0 TEXT11 TEXT12
File_2.txt needs to be merged in at this specific position. I have tried different solutions with multiple do-while loops, but they have not been working so far.
awk '{
getline s3 < "file1"
printf "%s %s %s ",$1,$2,s3
for(i=3;i<=NF;i++){
printf "%s ",$i
}
print ""
}END{close("file1")}' file
Output:
$ more file
TEXT01 TEXT02 TEXT03 TEXT04
TEXT05 TEXT06 TEXT07 TEXT08
TEXT09 TEXT10 TEXT11 TEXT12
$ more file1
1993.0
1994.0
1995.0
$ ./shell.sh
TEXT01 TEXT02 1993.0 TEXT03 TEXT04
TEXT05 TEXT06 1994.0 TEXT07 TEXT08
TEXT09 TEXT10 1995.0 TEXT11 TEXT12
Why, use cut and paste, of course! Give this a try:
paste -d" " <(cut -d" " -f 1-2 File_1.txt) File_2.txt <(cut -d" " -f 3-4 File_1.txt)
This was inspired by Dennis Williamson's answer, so if you like it, give his answer a +1 too!
paste test1.txt test2.txt | awk '{print $1,$2,$5,$3,$4}'
This is a solution without awk. The interesting part is how to use file descriptors in bash.
#!/bin/bash
exec 5<test2.txt # open file descriptor 5
cat test1.txt | while read ln
do
read ln2 <&5
#change this three lines as you wish:
echo -n "$(echo $ln | cut -d ' ' -f 1-2) "
echo -n "$ln2 "
echo $ln | cut -d ' ' -f 3-4
done
exec 5>&- # Close fd 5
Since the question was tagged with 'sed', here's a variant of Vereb's answer using sed instead of awk:
paste File_1.txt File_2.txt | sed -r 's/( [^ ]* [^ ]*)\t(.*)/ \2\1/'
Or in pure sed ... :D
sed -r '/ /{H;d};G;s/^([^\n]*)\n*([^ ]* [^ ]*)/\2 \1/;P;s/^[^\n]*\n//;x;d' File_1.txt File_2.txt
Using perl, give file1 and file2 as arguments to:
#!/usr/local/bin/perl
open(TXT2, pop(@ARGV));
while (<>) {
chop($m = <TXT2>);
s/^((\w+\s+){2})/$1$m /;
print;
}
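Saved as, say, merge.pl (the name is just for illustration), it is invoked with both files as arguments:
$ perl merge.pl File_1.txt File_2.txt
TEXT01 TEXT02 1993.0 TEXT03 TEXT04
TEXT05 TEXT06 1994.0 TEXT07 TEXT08
TEXT09 TEXT10 1995.0 TEXT11 TEXT12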
