How to duplicate a string until another string in bash - Linux

I have a text file that is structured like below:
293.0 2305.3 1508.0
2466.3 1493.0
2669.5 1578.6
3497.2 1768.9
4265.5 2092.4
5940.8 2558.6
7308.7 3015.4
9377.7 3814.6
295.0 2331.4 1498.1
3617.0 1893.2
I'm still new to Linux. Is there any way to get it output as desired, like the example below?
293.0 2305.3 1508.0
293.0 2466.3 1493.0
293.0 2669.5 1578.6
293.0 3497.2 1768.9
293.0 4265.5 2092.4
293.0 5940.8 2558.6
293.0 7308.7 3015.4
293.0 9377.7 3814.6
295.0 2331.4 1498.1
295.0 3617.0 1893.2
So basically, I want the first value duplicated onto each following line until another first value appears.

With Barmar's idea:
If a row contains three columns, save the first column in a variable and print all three columns. If a row contains two columns, print the variable followed by columns one and two:
awk 'NF==3{c=$1; print $1,$2,$3}; NF==2{print c,$1,$2}' file
Output:
293.0 2305.3 1508.0
293.0 2466.3 1493.0
293.0 2669.5 1578.6
293.0 3497.2 1768.9
293.0 4265.5 2092.4
293.0 5940.8 2558.6
293.0 7308.7 3015.4
293.0 9377.7 3814.6
295.0 2331.4 1498.1
295.0 3617.0 1893.2
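If anchor rows might be followed by a varying number of short rows, a slightly more general sketch (same idea: rows with three fields reset the saved value, everything else gets it prepended) is:
awk 'NF==3 {c=$1; print; next} {print c, $0}' file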

You can also use the following command:
$ sed 's/ \+/|/g' dupl.in | awk 'BEGIN{FS="|"} {if ($1) buff=$1; printf "%.1f %.0f %.0f\n", buff, $2, $3}'
293.0 2305 1508
293.0 2466 1493
293.0 2670 1579
293.0 3497 1769
293.0 4266 2092
293.0 5941 2559
293.0 7309 3015
293.0 9378 3815
295.0 2331 1498
295.0 3617 1893

You can also use a pure bash approach, thanks to its array capabilities:
$ while read -a f; do \
if [ ! -z ${f[2]} ]; then \
c=${f[0]}; echo ${f[@]}; \
else \
echo $c ${f[@]}; \
fi; \
done <file
I have split the command across lines for readability.
Line by line (while read), I save the fields in an array f. If the third field is non-empty (if [ ! -z ... ]), I save the first column and print all fields; otherwise, I print c and the two other fields.
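For use in a script, a slightly hardened sketch of the same loop adds -r (so backslashes in the input are not mangled) and quotes every expansion:
while read -r -a f; do
    if [ -n "${f[2]:-}" ]; then   # three fields: remember the new anchor value
        c=${f[0]}
        echo "${f[@]}"
    else                          # two fields: prepend the saved anchor
        echo "$c" "${f[@]}"
    fi
done < file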

Related

Line counting - How to exclude a directory and images?

In order to count the lines of my repository, I typed the command below, and found out that images and PDFs are also included in the line count.
git ls-files | xargs wc -l
When someone asks you for the scale of the repository, would you include the images/PDFs?
If not, could someone help me answer the questions below?
How to exclude the files under the "/pdfs" directory?
How to exclude .jpg and .png files?
You can make use of cloc. It counts blank lines, comment lines, and physical lines of source code in many programming languages. cloc can take file, directory, and/or archive names as inputs. For instance, if you want to count the number of lines of code in your repository and exclude some directories while counting, you can specify those directories separated by commas, like this:
cloc --exclude-dir=imagedir,pdfdir your_repository
cloc will show you the report like this:
     387 text files.
     387 unique files.
      22 files ignored.

github.com/AlDanial/cloc v 1.88  T=0.97 s (376.5 files/s, 152866.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Go                             235          17216          11769          95308
InstallShield                    2            410              0          11178
XML                             41           1418            159           2738
Python                           5            516            523           1792
Bourne Shell                    21            266            283           1512
JSON                            19             24              0           1005
Markdown                        23            452              0            797
AsciiDoc                         4            119              0            312
Ruby                             4             44             31            238
YAML                             4              4              2            113
WiX source                       1             19             24            112
make                             3             16             25             68
DOS Batch                        2             13              2             38
WiX include                      1              0              0             28
Dockerfile                       1             13              9             17
-------------------------------------------------------------------------------
SUM:                           366          20530          12827         115256
-------------------------------------------------------------------------------
You can also use cloc with Git like this:
cloc $(git ls-files)
which is equivalent to
git ls-files | xargs cloc
cloc sounds like it does the job. You should remove space and tab from IFS if you use command substitution, though: IFS=$'\n' cloc $(git ls-files)
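cloc can also exclude files by extension, which addresses the .jpg/.png part of the question directly; recent cloc releases provide an --exclude-ext flag (check cloc --help for your version):
cloc --exclude-dir=pdfs --exclude-ext=jpg,png your_repository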
If you just want a word count or line count, you could bodge it together like this. It gives you the language too. Clone the repo, test each file for a text MIME type, count the lines, then delete the files.
#!/bin/sh -e
# Get dir name from URL + remove trailing slashes - works for _most_ urls
url=${1:? No URL given}
url=${url%/}; url=${url%/}
repo=${url##*/}
repo=${repo%.git}
dir=./$repo
# Clone repo in tmp
cd "${TMPDIR:-/tmp}"
[ -e "$dir" ] && { echo Exists: "$dir" >&2; exit 1; }
trap 'rm -rf "$dir"' EXIT INT
git clone "$url"
# Get column 1 width, for alignment
max_path_length=$(printf '%s\n' "$dir/"* | wc -L)
# Extract and print the data
printf '\n%s\n\n' "$repo text files details:"
for file in "$dir"/*; do
mime=$(file --brief --mime-type "$file")
type=${mime%%/*}
if [ "$type" = text ]; then
lines=$(grep -c . "$file") || true
lang=${mime##*/}
printf "%-${max_path_length}s %s\n" "${file#$dir}" "[$lang, $lines lines]"
total_lines=$((total_lines + lines))
fi
done
printf '\n%s\n\n' "${dir#./} total lines: $total_lines"
Example output:
$ git-wc 'git://git.savannah.gnu.org/sed.git'
Cloning into 'sed'...
remote: Counting objects: 6276, done.
remote: Compressing objects: 100% (1134/1134), done.
remote: Total 6276 (delta 4994), reused 6276 (delta 4994)
Receiving objects: 100% (6276/6276), 2.14 MiB | 495.00 KiB/s, done.
Resolving deltas: 100% (4994/4994), done.
sed text files details:
/AUTHORS [plain, 6 lines]
/BUGS [plain, 101 lines]
/COPYING [plain, 553 lines]
/ChangeLog-2014 [plain, 2586 lines]
/Makefile.am [x-makefile, 123 lines]
/NEWS [plain, 498 lines]
/README [plain, 12 lines]
/README-hacking [plain, 58 lines]
/THANKS.in [plain, 63 lines]
/basicdefs.h [x-c, 83 lines]
/bootstrap [x-shellscript, 930 lines]
/bootstrap.conf [plain, 121 lines]
/cfg.mk [plain, 343 lines]
/configure.ac [x-m4, 294 lines]
/init.cfg [plain, 163 lines]
/thanks-gen [x-perl, 12 lines]
sed total lines: 5946
If the repo is local, you can just adjust the input methods. I'm sure the idea is clear. I know cloning the whole repo may be the dumbest way to do something like this, but sometimes you just want to know a thing. Plus you can use bash/sh globs to filter, e.g. [[ "$file" == "$dir"/<exclude-dir>/* ]].
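If you would rather stick with the original git ls-files | xargs wc -l approach, Git's exclude pathspecs can filter out the unwanted paths before counting (a sketch; the ':!' pathspec magic requires a reasonably recent Git):
git ls-files -- ':!pdfs' ':!*.jpg' ':!*.png' | xargs wc -l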

How to make this awk script simpler and use it in a gnuplot script in loop form

I am attempting to plot a multicolumn file using a gnuplot script.
I am doing it like this:
plot "100.dat" u ($1-CONS):($2*$3) w l lt 4, \
     "200.dat" u ($1-CONS):($2*$3) w l lt 2, \
     "300.dat" u ($1-CONS):($2*$3) w l lt 1
where CONS is my variable defined at the top of the file.
My xrange is set to [-0.2:0.2], while the data extends beyond this range.
What I want to capture (in loop form, for multiple files) is:
the maximum value of each of the above three plots on both the negative and the positive side, and the corresponding column-1 value within my xrange for each maximum.
In a shell script I can do this easily, but I am having trouble defining it in my gnuplot script.
My shell script is below:
for i in 100.0000 200.0000 300.0000
do
  grep "$i" data.dat > $i.dat

  awk -v CONS="$CONS" '{print ($1-CONS), ($2*$3)}' $i.dat \
    | awk '$1 <= 0.2 && $1 >= 0.0' > $i.p2.dat
  awk 'BEGIN{min=1000000; max=0} {if ($2<min && $2!="") min=$2; if ($2>max && $2!="") max=$2} END{print min, max}' $i.p2.dat \
    | awk '{print $2}' > $i.p2Max.dat
  PMAX=$(cat $i.p2Max.dat)
  grep "$PMAX" $i.p2.dat | tail -n 1 >> MAX.dat

  awk -v CONS="$CONS" '{print ($1-CONS), ($2*$3)}' $i.dat \
    | awk '$1 <= 0.0 && $1 >= -0.2' > $i.mi.dat
  awk 'BEGIN{min=1000000; max=0} {if ($2<min && $2!="") min=$2; if ($2>max && $2!="") max=$2} END{print min, max}' $i.mi.dat \
    | awk '{print $2}' > $i.mi_Max.dat
  N_MAX=$(cat $i.mi_Max.dat)
  grep "$N_MAX" $i.mi.dat | tail -n 1 >> MAX.dat
done
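For reference, the chained awk and grep calls inside the loop can be collapsed into one awk pass per file (a sketch, assuming the same CONS shell variable and the [-0.2:0.2] window):
awk -v cons="$CONS" '{
    x = $1 - cons; y = $2 * $3
    if (x >= -0.2 && x <  0   && y > nmax) { nmax = y; nx = x }   # negative side
    if (x >=  0   && x <= 0.2 && y > pmax) { pmax = y; px = x }   # positive side
} END {
    print nx, nmax   # position and value of the negative-side maximum
    print px, pmax   # position and value of the positive-side maximum
}' "$i.dat" >> MAX.dat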
I am looking for a simple script that can be used in the gnuplot script in loop form, so that when I have multiple data files I can grab the maximum of column two (on both sides of zero) and store it, together with the corresponding column-1 value, separately for the negative and the positive range.
I would love to see this done in a loop so that I do not need to write all the lines repetitively.
Your description is a bit confusing to me. My understanding is the following: loop through several files and extract the maxima in the xranges [-0.2:0] and [0:0.2], respectively.
Test data:
100.dat
-0.17 0.447 0.287
-0.13 0.353 0.936
-0.09 0.476 0.309
-0.05 0.504 0.220
-0.01 0.340 0.564
0.03 0.096 0.947
0.07 0.564 0.885
0.11 0.312 0.957
0.15 0.058 0.347
0.19 0.016 0.923
0.23 0.835 0.461
200.dat
-0.17 0.608 0.875
-0.13 0.266 0.805
-0.09 0.948 0.696
-0.05 0.513 0.800
-0.01 0.736 0.392
0.03 0.318 0.312
0.07 0.708 0.534
0.11 0.246 0.975
0.15 0.198 0.914
0.19 0.174 0.318
0.23 0.727 0.341
300.dat
-0.17 0.527 0.658
-0.13 0.166 0.340
-0.09 0.695 0.031
-0.05 0.623 0.542
-0.01 0.996 0.674
0.03 0.816 0.365
0.07 0.286 0.433
0.11 0.069 0.381
0.15 0.719 0.621
0.19 0.516 0.701
0.23 0.248 0.659
Code:
### loop of files and extracting values
reset session
FILES = "100.dat 200.dat 300.dat"
Count = words(FILES)
CONS = 0.03
# get maxima
array NegMaxX[Count]
array NegMaxY[Count]
array PosMaxX[Count]
array PosMaxY[Count]
do for [i=1:Count] {
    stats [-0.2:0] word(FILES,i) u ($1-CONS):($2*$3) nooutput
    NegMaxX[i] = STATS_pos_max_y
    NegMaxY[i] = STATS_max_y
    stats [0:0.2] word(FILES,i) u ($1-CONS):($2*$3) nooutput
    PosMaxX[i] = STATS_pos_max_y
    PosMaxY[i] = STATS_max_y
}
set xrange[-0.2:0.2]
# set labels
do for [i=1:Count] {
    set label i*2-1 at NegMaxX[i], NegMaxY[i] sprintf("%.3f/%.3f",NegMaxX[i],NegMaxY[i])
    set label i*2   at PosMaxX[i], PosMaxY[i] sprintf("%.3f/%.3f",PosMaxX[i],PosMaxY[i])
}
plot for [i=1:Count] word(FILES,i) u ($1-CONS):($2*$3) w l lt i ti word(FILES,i)
### end of code
Result: a plot of the three curves, with each file's negative-side and positive-side maximum labeled at its position.

Convert text into time format using a bash script

I am new to shell scripting. I have a tab-separated file, e.g.:
0018803 01 1710 2050 002571
0018951 01 1934 2525 003277
0019362 02 2404 2415 002829
0019392 01 2621 2820 001924
0019542 01 2208 2413 003434
0019583 01 1815 2134 002971
Here, the 3rd and 4th columns represent the Start Time and End Time.
I want to convert these two columns into a proper time format so that I can get a 6th column holding the exact time difference between column 4 and column 3 in hours and minutes.
The column 6 results will be 3:40, 5:51, 00:11, 1:59, 2:05, and 3:19.
One way with awk:
$ cat test.awk
# create a function to split hour and minute
function f(h, x) {
h[0] = substr(x,1,2)+0
h[1] = substr(x,3,2)+0
}
{
f(start, $3);
f(end, $4);
span = end[1] - start[1] >= 0 \
? sprintf("%d:%02d", end[0]-start[0], end[1]-start[1]) \
: sprintf("%d:%02d", end[0]-start[0]-1, 60+end[1]-start[1]);
print $0 OFS span
}
Then run the awk file as follows:
$ awk -f test.awk input_file
Edit: per @glenn jackman's suggestion, the code can be simplified (refer to @Kamil Cuk's method):
function g(x) {
return substr(x,1,2)*60 + substr(x,3,2)
}
{
span = g($4) - g($3)
printf("%s%s%d:%02d\n", $0, OFS, int(span/60), span%60)
}
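For the first row, for example, g(2050) - g(1710) = (20*60 + 50) - (17*60 + 10) = 1250 - 1030 = 220 minutes, which prints as 3:40.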
A simple bash solution using arithmetic expansion:
while IFS='' read -r l; do
IFS=' ' read -r _ _ st et _ <<<"$l"
d=$(( (10#${et:0:2} * 60 + 10#${et:2:2}) - (10#${st:0:2} * 60 + 10#${st:2:2}) ))
printf "%s %02d:%02d\n" "$l" "$((d/60))" "$((d%60))"
done < input_file_path
will output:
0018803 01 1710 2050 002571 03:40
0018951 01 1934 2525 003277 05:51
0019362 02 2404 2415 002829 00:11
0019392 01 2621 2820 001924 01:59
0019542 01 2208 2413 003434 02:05
0019583 01 1815 2134 002971 03:19
Here is one in GNU awk using time functions: mktime to convert to epoch seconds, and strftime to convert the difference to the desired HH:MM format:
$ awk -v OFS="\t" '{
dt3="1970 01 01 " substr($3,1,2) " " substr($3,3,2) " 00"
dt4="1970 01 01 " substr($4,1,2) " " substr($4,3,2) " 00"
print $0,strftime("%H:%M",mktime(dt4)-mktime(dt3),1)  # the trailing 1 means UTC; thanks @glennjackman :)
}' file
Output ($6 only):
03:40
05:51
00:11
01:59
02:05
03:19
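Note that gawk's mktime follows the C library's mktime(3), which normalizes out-of-range fields, so pseudo-times like 2525 or 2820 in this data still yield the correct difference.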

find length of a fixed width file with a little twist

Hi wonderful people, my gurus, and all kind-hearted people.
I have a fixed-width file and I'm currently trying to find the length of rows that contain the wrong number of bytes. I tried a couple of awk commands, but they are not giving me the result I want. My fixed-width records contain 208 bytes, but there are a few rows that don't. I'm trying to discover those records that don't have 208 bytes.
This command gave me the length of the first record:
awk '{print length;exit}' file.text
Here I tried to print rows that contain 101 bytes, but it didn't work:
awk '{print length==101}' file.text
Any help or insights here would be highly appreciated.
With awk:
awk 'length() < 208' file
Well, length() gives you the number of characters, not bytes. These counts can differ in a Unicode context. You can use the LANG environment variable to force awk to count bytes:
LANG=C awk 'length() < 208' file
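A quick way to see the difference (a sketch, assuming GNU awk and a UTF-8 locale; é is one character but two bytes):
$ echo 'héllo' | awk '{print length($0)}'
5
$ echo 'héllo' | LANG=C awk '{print length($0)}'
6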
Perl to the rescue!
perl -lne 'print "$.:", length if length != 208' -- file.text
-n reads the input line by line
-l removes newlines from the input before processing it and adds them to print
The one-liner prints the line number ($.) and the length of each line whose length is not 208.
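An awk equivalent of that Perl one-liner (a sketch with the same semantics):
awk 'length($0) != 208 {print NR ":", length($0)}' file.text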
If you're using gawk, then it's no issue, even in typical UTF-8 locale mode:
length(s)           # chars native to the locale,
                    # typically that means # of UTF-8 chars
match(s, /$/) - 1   # raw bytes; this also works for pure-binary
                    # inputs, without triggering any error
                    # messages in gawk Unicode mode
Best illustrated by example :
0000000 3347498554 3381184647 3182945161 171608122
: Ɔ ** LJ ** Ȉ ** ɉ ** 㷽 ** ** : 210 : \n
072 306 206 307 207 310 210 311 211 343 267 275 072 210 072 012
: ? 86 ? 87 ? 88 ? 89 ? ? ? : 88 : nl
58 198 134 199 135 200 136 201 137 227 183 189 58 136 58 10
3a c6 86 c7 87 c8 88 c9 89 e3 b7 bd 3a 88 3a 0a
0000020
# gawk profile, created Sat Oct 29 20:32:49 2022
BEGIN {
1 __ = "\306\206\307\207\310" (_="\210") \
"\311\211\343\267\275"
1 print "",__,_
1 STDERR = "/dev/stderr"
1 print ( match(_, /$/) - 1, "_" ) > STDERR # *A
1 print ( length(__), match(__, /$/) - 1 ) > STDERR # *B
1 print ( (__~_), match(__, (_) ".*") ) > STDERR # *C
1 print ( RSTART, RLENGTH ) > STDERR # *D
}
1 | _ *A # of bytes of "_", because it was defined as 0x88 \210
5 | 11 *B # of chars of "__", and
# of bytes of it :
# 4 x 2-byte UC
# + 1 x 3-byte UC = 11
1 | 3 *C # does byte \210 exist among larger string (true/1),
# and which unicode character is 1st to
# contain \210 - the 3rd one, by original definition
3 | 3 *D # notice I also added a ".*" to the tail of this match() :
# if the left-side string being tested is valid UTF-8,
# then this will match all the way to the end of string,
# inclusive, in which you can deduce :
#
# "\210 first appeared in 3rd-to-last utf-8 character"
Combining that inferred understanding:
RLENGTH = "3 chars to the end, inclusive",
with knowledge of how many characters sit to its left:
RSTART - 1 = "2 chars before",
yields a total count of 3 + 2 = 5, affirming length()'s result.

Best way to divide in bash using pipes?

I'm just looking for an easy way to divide a number (or provide other math functions). Let's say I have the following command:
find . -name '*.mp4' | wc -l
How can I take the result of wc -l and divide it by 3?
The examples I've seen don't deal with redirected output/input.
Using bc:
$ bc -l <<< "scale=2;$(find . -name '*.mp4' | wc -l)/3"
2.33
In contrast, the bash shell only performs integer arithmetic.
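For example, bash's $(( )) truncates toward zero:
$ echo $(( 7 / 3 ))
2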
Awk is also very powerful:
$ find . -name '*.mp4' | wc -l | awk '{print $1/3}'
2.33333
You don't even need wc if using awk:
$ find . -name '*.mp4' | awk 'END {print NR/3}'
2.33333
Edit 2018-02-22: Adding shell connector
There is more than one way, depending on the precision required and the number of calculations to be done. See the shell connector further down!
Using bc (an arbitrary-precision calculator)
find . -type f -name '*.mp4' -printf \\n | wc -l | xargs printf "%d/3\n" | bc -l
6243.33333333333333333333
or
echo $(find . -name '*.mp4' -printf \\n | wc -l)/3|bc -l
6243.33333333333333333333
or using bash, result in integer only:
echo $(($(find . -name '*.mp4' -printf \\n| wc -l)/3))
6243
Using bash's built-in integer math
res=000$((($(find . -type f -name '*.mp4' -printf "1+")0)*1000/3))
printf -v res "%.2f" ${res:0:${#res}-3}.${res:${#res}-3}
echo $res
6243.33
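To unpack the trick: find prints 1+ once per file, so the arithmetic expansion sums the file count (18730 here, judging by the outputs above), multiplies it by 1000, and divides by 3, giving 6243333; the prepended zeros and the printf that splits off the last three digits turn that into the rounded 6243.33.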
Pure bash
With a recent 64-bit bash, you could even use @glennjackman's idea of using globstar, computing the pseudo floating point like this:
shopt -s globstar
files=(**/*.mp4)
shopt -u globstar
res=$((${#files[*]}000/3))
printf -v res "%.2f" ${res:0:${#res}-3}.${res:${#res}-3}
echo $res
6243.33
There is no fork, and $res contains a two-digit rounded floating-point value.
Note: beware of symlinks when using globstar and **!
Introducing shell connector
If you plan to do a lot of calculations, require high precision, and use bash, you could use a long-running bc subprocess:
mkfifo /tmp/mybcfifo
exec 5> >(exec bc -l >/tmp/mybcfifo)
exec 6</tmp/mybcfifo
rm /tmp/mybcfifo
Then:
echo >&5 '12/34'
read -u 6 result
echo $result
.35294117647058823529
This subprocess stays open and usable:
ps --sid $(ps ho sid $$) fw
PID TTY STAT TIME COMMAND
18027 pts/9 Ss 0:00 bash
18258 pts/9 S 0:00 \_ bc -l
18789 pts/9 R+ 0:00 \_ ps --sid 18027 fw
Computing $PI:
echo >&5 '4*a(1)'
read -u 6 PI
echo $PI
3.14159265358979323844
To terminate sub process:
exec 6<&-
exec 5>&-
A little demo of the best way to divide in bash using pipes!
Computing the range {1..156} / 42 (I will let you google for the answer to the ultimate question of life, the universe, and everything ;)
... and printing 13 results per line in order to reduce the output:
printf -v form "%s" "%5.3f "{,}{,}{,,};form+="%5.3f\n";
The regular way:
testBc(){
for ((i=1; i<157; i++)) ;do
echo $(bc -l <<<"$i/42");
done
}
Using the long-running bc subprocess:
testLongBc(){
mkfifo /tmp/mybcfifo;
exec 5> >(exec bc -l >/tmp/mybcfifo);
exec 6< /tmp/mybcfifo;
rm /tmp/mybcfifo;
for ((i=1; i<157; i++)) ;do
echo "$i/42" 1>&5;
read -u 6 result;
echo $result;
done;
exec 6>&-;
exec 5>&-
}
Let's time it without the long-running subprocess:
time printf "$form" $(testBc)
0.024 0.048 0.071 0.095 0.119 0.143 0.167 0.190 0.214 0.238 0.262 0.286 0.310
0.333 0.357 0.381 0.405 0.429 0.452 0.476 0.500 0.524 0.548 0.571 0.595 0.619
0.643 0.667 0.690 0.714 0.738 0.762 0.786 0.810 0.833 0.857 0.881 0.905 0.929
0.952 0.976 1.000 1.024 1.048 1.071 1.095 1.119 1.143 1.167 1.190 1.214 1.238
1.262 1.286 1.310 1.333 1.357 1.381 1.405 1.429 1.452 1.476 1.500 1.524 1.548
1.571 1.595 1.619 1.643 1.667 1.690 1.714 1.738 1.762 1.786 1.810 1.833 1.857
1.881 1.905 1.929 1.952 1.976 2.000 2.024 2.048 2.071 2.095 2.119 2.143 2.167
2.190 2.214 2.238 2.262 2.286 2.310 2.333 2.357 2.381 2.405 2.429 2.452 2.476
2.500 2.524 2.548 2.571 2.595 2.619 2.643 2.667 2.690 2.714 2.738 2.762 2.786
2.810 2.833 2.857 2.881 2.905 2.929 2.952 2.976 3.000 3.024 3.048 3.071 3.095
3.119 3.143 3.167 3.190 3.214 3.238 3.262 3.286 3.310 3.333 3.357 3.381 3.405
3.429 3.452 3.476 3.500 3.524 3.548 3.571 3.595 3.619 3.643 3.667 3.690 3.714
real 0m10.113s
user 0m0.900s
sys 0m1.290s
Wow! Ten seconds on my Raspberry Pi!
Then with:
time printf "$form" $(testLongBc)
0.024 0.048 0.071 0.095 0.119 0.143 0.167 0.190 0.214 0.238 0.262 0.286 0.310
0.333 0.357 0.381 0.405 0.429 0.452 0.476 0.500 0.524 0.548 0.571 0.595 0.619
0.643 0.667 0.690 0.714 0.738 0.762 0.786 0.810 0.833 0.857 0.881 0.905 0.929
0.952 0.976 1.000 1.024 1.048 1.071 1.095 1.119 1.143 1.167 1.190 1.214 1.238
1.262 1.286 1.310 1.333 1.357 1.381 1.405 1.429 1.452 1.476 1.500 1.524 1.548
1.571 1.595 1.619 1.643 1.667 1.690 1.714 1.738 1.762 1.786 1.810 1.833 1.857
1.881 1.905 1.929 1.952 1.976 2.000 2.024 2.048 2.071 2.095 2.119 2.143 2.167
2.190 2.214 2.238 2.262 2.286 2.310 2.333 2.357 2.381 2.405 2.429 2.452 2.476
2.500 2.524 2.548 2.571 2.595 2.619 2.643 2.667 2.690 2.714 2.738 2.762 2.786
2.810 2.833 2.857 2.881 2.905 2.929 2.952 2.976 3.000 3.024 3.048 3.071 3.095
3.119 3.143 3.167 3.190 3.214 3.238 3.262 3.286 3.310 3.333 3.357 3.381 3.405
3.429 3.452 3.476 3.500 3.524 3.548 3.571 3.595 3.619 3.643 3.667 3.690 3.714
real 0m0.670s
user 0m0.190s
sys 0m0.070s
Less than one second! The results are the same, but the execution time is very different!
My shell connector
I've published a connector function: Connector-bash on GitHub.com
and shell_connector.sh on my own site.
source shell_connector.sh
newConnector /usr/bin/bc -l 0 0
myBc 1764/42 result
echo $result
42.00000000000000000000
find . -name '*.mp4' | wc -l | xargs -I{} expr {} / 2
Best used if you have multiple outputs you'd like to pipe through xargs. Use {} as a placeholder for the expression term. Note that expr performs integer arithmetic only.
Depending on your bash version, you don't even need find for this simple task:
shopt -s nullglob globstar
files=( **/*.mp4 )
dc -e "3 k ${#files[#]} 3 / p"
This method will correctly handle the bizarre edgecase of filenames containing newlines.
