How to find the minimum string in a delimited string in Linux?

I have a delimited string stored in a variable.
date_str=2017-04-03,2017-04-04,2017-04-05
How do I get the minimum value out of this delimited string on Linux?
Expected output: 2017-04-03
Could someone help me do that?

Short gawk approach:
awk -v d="$date_str" 'BEGIN{split(d,a,","); asort(a); print a[1]}'
The output:
2017-04-03
split(d,a,",") - splits the date string into pieces separated by ","
asort(a) - sorts the array values (asort is a gawk extension)
a[1] - is the first item of the sorted array
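If gawk is not available, the same split-sort-take-first idea can be sketched with standard tools. This is only a sketch: min_of_list is a made-up helper name, and plain string sorting is enough only because every date is zero-padded YYYY-MM-DD.

```shell
# Hypothetical helper: print the smallest item of a comma-separated list.
# Assumes fixed-width, zero-padded items, so string order equals date order.
min_of_list() {
    printf '%s\n' "$1" |   # emit the list as one line
        tr ',' '\n' |      # one item per line
        LC_ALL=C sort |    # plain byte-wise sort
        head -n 1          # the first line is the minimum
}

date_str=2017-04-03,2017-04-04,2017-04-05
min_of_list "$date_str"
```

For dates in another format this ordering would not hold, and converting to epoch seconds first would be the safer route.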

Using awk:
$ awk -v d="$date_str" ' # set variable to awk var d
BEGIN {
n=split(d,a,",") # split to a on ,
for(i=1;i<=n;i++) # iterate thru a
if(m==""||a[i]<m) # compare to current min m
m=a[i]
print m # after everything print min m
}'
2017-04-03
Regarding comments:
$ printf '%s' "$date_str" | awk -v RS=, 'NR==1||$0<m{m=$0}END{print m}'
2017-04-03

Use this command:
echo "date_str = 2017-04-03,2017-04-04,2017-04-05" | grep -Po "[0-9][^,]+" | sort -n | head -n 1
result is:
2017-04-03

If you need sorting by date:
#!/bin/bash
STR='2017-04-03,2017-04-04,2017-04-05,2017-02-23,2017-04-25,2017-03-12,2016-08-25'
TS_ARR=()
IFS=',' read -r -a dates <<< "$STR"
for next_date in "${dates[@]}"; do
date_ts=$(date --date="${next_date}" +%s) # GNU date: seconds since the epoch
TS_ARR+=("$date_ts")
done
IFS=$'\n' SORT_TS=($(sort -n <<< "${TS_ARR[*]}")) # numeric sort, one timestamp per line
echo "sorted: $(date -d "@${SORT_TS[0]}" +%Y-%m-%d)"
This should print "sorted: 2016-08-25".

Here's a solution that converts the strings to seconds from epoch, sorts them, grabs the first one and converts it back to a string:
date_str="2017-04-03,2017-04-04,2017-04-05"
while IFS=, read -r -a arr || [[ -n $arr ]]; do
for str in "${arr[@]}"; do
date -d "$str" +%s
done
done <<<"$date_str" |
sort -n |
head -n 1 |
{ read -r earliest; date -d "@${earliest}" +%F ; }

Related

bash count sequential files

I'm pretty new to bash scripting, so some of the syntax may not be optimal. Please point it out if you see anything.
I have files in a directory named sequentially.
Example: prob01_01 prob01_03 prob01_07 prob02_01 prob02_03 ....
I am trying to have the script iterate through the current directory and count how many extensions each problem has, then print the pre-extension name followed by the count.
Sample output for above would be:
prob01 3
prob02 2
This is my code:
#!/bin/bash
temp=$(mktemp)
element=''
count=0
for i in *
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1"
else
echo $element $count >> temp
element=$current
count=1
fi
done
echo 'heres the temp:'
cat temp
rm 'temp'
The Problem:
Current output:
prob01 3
Desired output:
prob01 3
prob02 2
The last count isn't appended because the loop never sees a different element after it.
My Guess on possible solutions:
Have the last append occur at the end of the for loop?
Your code has two problems.
The first problem is not what you asked about: you create a temporary file whose name is stored in $temp, so you should use that file and not a file with the fixed name temp.
The second problem is that you only write a result when you see a new problem name, so the last one is never printed.
Fixing only these two problems results in:
results() {
if (( count == 0 )); then
return
fi
echo $element $count >> "${temp}"
}
temp=$(mktemp)
element=''
count=0
for i in prob*
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1" # Better is using ((count++))
else
results
element=$current
count=1
fi
done
results
echo 'heres the temp:'
cat "${temp}"
rm "${temp}"
You can do without the script:
ls prob* | cut -d"_" -f1 | sort | uniq -c
If you want the output displayed as given, you need one more step:
ls prob* | cut -d"_" -f1 | sort | uniq -c | awk '{print $2 " " $1}'
You may use a printf + awk solution:
printf '%s\n' *_* | awk -F_ '{a[$1]++} END{for (i in a) print i, a[i]}'
prob01 3
prob02 2
We use printf to print each file that has at least one _
We use awk to get a count of each file's first element delimited by _ by using an associative array.
I would do it like this:
$ ls | awk -F_ '{print $1}' | sort | uniq -c | awk '{print $2 " " $1}'
prob01 3
prob02 2
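For comparison, the counting can also stay inside bash. The following is a rough sketch, assuming bash 4 or later for associative arrays; count_prefixes is a made-up name, and the prefix is taken as everything before the last "_", as in the question's ${i%_*}.

```shell
#!/bin/bash
# Hypothetical helper: count how many of the given names share each prefix.
# Requires bash 4+ (associative arrays).
count_prefixes() {
    local -A counts=()
    local name prefix
    for name in "$@"; do
        prefix=${name%_*}                                 # strip the "_NN" part
        counts[$prefix]=$(( ${counts[$prefix]:-0} + 1 ))  # start at 0 if unseen
    done
    for prefix in "${!counts[@]}"; do
        printf '%s %s\n' "$prefix" "${counts[$prefix]}"
    done | sort    # associative arrays have no defined order
}

count_prefixes prob01_01 prob01_03 prob01_07 prob02_01 prob02_03
```

In the directory itself this would be called as count_prefixes prob*, which also avoids parsing the output of ls.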

Bash: Split a string by delimiter ignoring spaces

I have a string like
name1::1.1.1.1::ps -ax
I want to split the string based on delimiter :: using bash scripting.
The desired output should be an array of 3 elements
("name1" "1.1.1.1" "ps -ax")
without double quotes
I appreciate your help.
Assuming there are no :s in the array data, use bash pattern substitution to squeeze the :: to : while assigning the string to $array, then show the whole array, then just element #2:
a="name1::1.1.1.1::ps -ax"
IFS=: array=(${a//::/:}) ; echo ${array[@]} ; echo "${array[2]}"
Output:
name1 1.1.1.1 ps -ax
ps -ax
But what if there are :s in the array data? Specifically in the third field, (the command), and only in that field. Use read with dummy variables to absorb the extra :: separators:
a="name1::1.1.1.1::parallel echo ::: 1 2 3 ::: a b"
IFS=: read x a y b z <<< "$a"; array=("$x" "$y" "$z"); printf "%s\n" "${array[@]}"
Output:
name1
1.1.1.1
parallel echo ::: 1 2 3 ::: a b
The only safe possibility is to use a loop:
a='name1::1.1.1.1::ps -ax'
array=()
a+=:: # artificially append the separator
while [[ $a ]]; do
array+=( "${a%%::*}" )
a=${a#*::}
done
This will work with any symbol in a (spaces, glob characters, newlines, etc.)
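As a sketch, the loop can be wrapped in a function so the separator becomes a parameter. The names split_on and parts are invented for this example; quoting "$sep" inside the expansions keeps it from being interpreted as a glob pattern.

```shell
#!/bin/bash
# Hypothetical helper: split string $1 on the (possibly multi-character)
# separator $2 and store the pieces in the global array "parts".
split_on() {
    local rest=$1$2 sep=$2    # append the separator so the last field is terminated
    parts=()
    while [[ -n $rest ]]; do
        parts+=( "${rest%%"$sep"*}" )   # text before the first separator
        rest=${rest#*"$sep"}            # drop that field and its separator
    done
}

split_on 'name1::1.1.1.1::ps -ax' '::'
printf '<%s>\n' "${parts[@]}"
```

The angle brackets in the printf are only there to make the element boundaries visible.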
echo "name1::1.1.1.1::ps -ax" | awk -F"::" '{print $1 $2 $3}'
i=0
string="name1::1.1.1.1::ps -ax"
echo "$string" | awk 'BEGIN{FS="::";OFS="\n"}{$1=$1;print $0}'>tempFile
while read line;
do
arr["$i"]="$line"
i=$(expr $i + 1)
done<tempFile
echo "${arr[@]}"
echo "${arr[0]}"
echo "${arr[1]}"
echo "${arr[2]}"
Output:
sh-4.4$ ./script1.sh
name1 1.1.1.1 ps -ax
name1
1.1.1.1
ps -ax

How to increment version number using shell script?

I have a version number with three colon-separated fields of two digits each (xx:xx:xx). Can anyone please tell me how to increment it using a shell script?
Min Value
00:00:00
Max Value
99:99:99
Sample IO
10:23:56 -> 10:23:57
62:54:99 -> 62:55:00
87:99:99 -> 88:00:00
As a one liner using awk, assuming VERSION is a variable with the version in it:
echo $VERSION | awk 'BEGIN { FS=":" } { $3++; if ($3 > 99) { $3=0; $2++; if ($2 > 99) { $2=0; $1++ } } } { printf "%02d:%02d:%02d\n", $1, $2, $3 }'
Nothing fancy (other than Bash) needed:
$ ver=87:99:99
$ echo "$ver"
87:99:99
$ printf -v ver '%06d' $((10#${ver//:}+1))
$ ver=${ver%????}:${ver: -4:2}:${ver: -2:2}
$ echo "$ver"
88:00:00
We just use the parameter expansion ${ver//:} to remove the colons: we're then left with a usual decimal number, increment it and reformat it using printf; then use some more parameter expansions to group the digits.
This assumes that ver has already been thoroughly checked (with a regex or glob).
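One possible way to combine that check with the increment is sketched below. bump_version and its return codes are made up for this example, and refusing to go past 99:99:99 is an assumption based on the maximum stated in the question.

```shell
#!/bin/bash
# Hypothetical helper: validate an xx:xx:xx version, then print it incremented.
bump_version() {
    local ver=$1
    # Accept exactly three two-digit groups separated by colons
    [[ $ver =~ ^[0-9]{2}:[0-9]{2}:[0-9]{2}$ ]] || return 1
    # 99:99:99 is the stated maximum, so refuse to roll over (assumed policy)
    [[ $ver != 99:99:99 ]] || return 2
    # 10# forces base 10 so that digit strings such as 000008 are not read as octal
    printf -v ver '%06d' $(( 10#${ver//:/} + 1 ))
    printf '%s:%s:%s\n' "${ver:0:2}" "${ver:2:2}" "${ver:4:2}"
}

bump_version 87:99:99    # prints 88:00:00
```

A caller can tell malformed input (status 1) from overflow (status 2) by inspecting $?.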
It's easy; it just needs a little arithmetic and the bc command. Here is how:
#!/bin/bash
# read VERSION from $1 into VER
IFS=':' read -r -a VER <<< "$1"
# increment by 1
INCR=$(echo "ibase=10; ${VER[0]}*100*100+${VER[1]}*100+${VER[2]}+1"|bc)
# prepend zeros
INCR=$(printf "%06d" ${INCR})
# output the result
echo ${INCR:0:2}:${INCR:2:2}:${INCR:4:2}
If you need overflow checking, you can do it with the same kind of arithmetic as the INCR statement.
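This basically works. Note that fields such as 08 must be forced to base 10 (bash syntax) and the result padded with printf:
IN=43:99:99
F1=$(echo "$IN" | cut -d: -f1)
F2=$(echo "$IN" | cut -d: -f2)
F3=$(echo "$IN" | cut -d: -f3)
# 10# forces base 10 so that a field like 08 is not parsed as octal
F1=$(( 10#$F1 )); F2=$(( 10#$F2 )); F3=$(( 10#$F3 + 1 ))
if [ "$F3" -gt 99 ] ; then F3=0 ; F2=$(( F2 + 1 )) ; fi
if [ "$F2" -gt 99 ] ; then F2=0 ; F1=$(( F1 + 1 )) ; fi
OUT=$(printf '%02d:%02d:%02d' "$F1" "$F2" "$F3")
echo "$OUT"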
Try this one-liner:
awk '{gsub(/:/,"");$0++;gsub(/../,"&:");sub(/:$/,"")}7'
tests:
kent$ awk '{gsub(/:/,"");$0++;gsub(/../,"&:");sub(/:$/,"")}7' <<< "22:33:99"
22:34:00
kent$ awk '{gsub(/:/,"");$0++;gsub(/../,"&:");sub(/:$/,"")}7' <<< "22:99:99"
23:00:00
kent$ awk '{gsub(/:/,"");$0++;gsub(/../,"&:");sub(/:$/,"")}7' <<< "22:99:88"
22:99:89
Note, corner cases were not tested.

How to generate string elements that don't match a pattern?

If I have
days="1 2 3 4 5 6"
func() {
echo "lSecure1"
echo "lSecure"
echo "lSecure4"
echo "lSecure6"
echo "something else"
}
and do
func | egrep "lSecure[1-6]"
then I get
lSecure1
lSecure4
lSecure6
but what I would like is
lSecure2
lSecure3
lSecure5
that is, all the days that don't have an lSecure string.
Question
My current idea is to use awk to split the $days and then loop over all combinations.
Is there a better way?
Note that grep -v inverts the sense of a plain grep and does not solve the problem as it does not generate the required strings.
I usually use the -f flag of grep for similar purposes. The <( ... ) code generates a file with all possibilities; grep then selects only those not present in the output of func.
func | grep 'lSecure[1-6]' | grep -v -f- <( for i in $days ; do echo lSecure$i ; done )
Or, you may prefer it the other way round:
for i in $days ; do echo lSecure$i ; done | grep -vf <( func | grep 'lSecure[1-6]' )
F=$(func)
for f in $days; do
if ! echo $F | grep -q lSecure$f; then
echo lSecure$f
fi
done
An awk solution:
$ func | awk -v i="${days}" 'BEGIN{split(i,a," ")}{gsub(/lSecure/,"");
for(var in a)if(a[var] == $0){delete a[var];break}}
END{for(var in a) print "lSecure" a[var]}' | sort
We store the days in an awk array a. For each input line, we strip the lSecure prefix and, if what remains is in the array, delete that element. At the end, only the days that were not found remain in the array. The sort is just there to present them in order :)
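The same bookkeeping can be sketched in bash by recording the lines that were seen and then printing the candidates that were not. This is only an illustration: missing_days is a made-up name, and it assumes bash 4 or later for associative arrays.

```shell
#!/bin/bash
# Hypothetical helper: print lSecure<day> for every day in $1 that does not
# appear as a whole line "lSecure<day>" on standard input. Requires bash 4+.
missing_days() {
    local -A seen=()
    local line day
    while IFS= read -r line; do
        [[ -n $line ]] && seen["$line"]=1    # remember every non-empty line
    done
    for day in $1; do                        # unquoted on purpose: split on spaces
        [[ -n ${seen[lSecure$day]:-} ]] || echo "lSecure$day"
    done
}

days="1 2 3 4 5 6"
printf '%s\n' lSecure1 lSecure lSecure4 lSecure6 "something else" |
    missing_days "$days"
```

Because the output follows the order of $days, no extra sort is needed. In the original setting the input would come from func | missing_days "$days".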
I am not sure exactly what you are trying to achieve, but you might consider using uniq -u, which prints only the lines that are not repeated. For example, you can do this with it:
( echo "$days" | tr -s ' ' '\n'; func | grep -oP '(?<=lSecure)[1-6]' ) | sort | uniq -u
Output:
2
3
5

unix - breakdown of how many lines with number of character occurrences

Is there an inbuilt command to do this or has anyone had any luck with a script that does it?
I am looking to get counts of how many lines had how many occurrences of a specific character (sorted descending by the number of occurrences).
For example, with this sample file:
gkdjpgfdpgdp
fdkj
pgdppp
ppp
gfjkl
Suggested input (for the 'p' character)
bash/perl some_script_name "p" samplefile
Desired output:
occs count
4 1
3 2
0 2
Update:
How would you write a solution that works with a two-character string such as 'gd', not just a single character such as p?
$ sed 's/[^p]//g' input.txt | awk '{print length}' | sort -nr | uniq -c | awk 'BEGIN{print "occs", "count"}{print $2,$1}' | column -t
occs count
4 1
3 2
0 2
You could give the desired character as the field separator for awk, and do this:
awk -F 'p' '{ print NF-1 }' |
sort -k1nr |
uniq -c |
awk -v OFS="\t" 'BEGIN { print "occs", "count" } { print $2, $1 }'
For your sample data, it produces:
occs count
4 1
3 2
0 2
If you want to count occurrences of multi-character strings, just give the desired string as the separator, e.g., awk -F 'gd' ... or awk -F 'pp' ....
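Put together as a function, the pipeline might look like the sketch below. occ_histogram is an invented name. Two details go beyond the original: empty lines are counted as zero occurrences (awk reports NF as 0 for them), and a separator longer than one character is treated by awk as a regular expression, so any metacharacters in it would need escaping.

```shell
# Hypothetical helper: histogram of how often pattern $1 occurs per line of file $2.
occ_histogram() {
    # NF-1 is the number of separators on the line; guard empty lines (NF == 0)
    awk -F "$1" '{ print (NF ? NF - 1 : 0) }' "$2" |
        sort -nr |
        uniq -c |
        awk 'BEGIN { print "occs", "count" } { print $2, $1 }'
}

# Demo with the sample data from the question, written to a temporary file
sample=$(mktemp)
printf '%s\n' gkdjpgfdpgdp fdkj pgdppp ppp gfjkl > "$sample"
occ_histogram p "$sample"
rm -f "$sample"
```

Calling occ_histogram gd on the same file counts the two-character string instead.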
#!/usr/bin/env perl
use strict; use warnings;
my $seq = shift @ARGV;
die unless defined $seq;
my %freq;
while ( my $line = <> ) {
last unless $line =~ /\S/;
my $occurances = () = $line =~ /(\Q$seq\E)/g;
$freq{ $occurances } += 1;
}
for my $occurances ( sort { $b <=> $a} keys %freq ) {
print "$occurances:\t$freq{$occurances}\n";
}
If you want short, you can always use:
#!/usr/bin/env perl
$x=shift;/\S/&&++$f{$a=()=/(\Q$x\E)/g}while<>
;print"$_:\t$f{$_}\n"for sort{$b<=>$a}keys%f;
or, perl -e '$x=shift;/\S/&&++$f{$a=()=/(\Q$x\E)/g}while<>;print"$_:\t$f{$_}\n"for sort{$b<=>$a}keys%f' inputfile, but now I am getting silly.
Pure Bash:
declare -a count
while read ; do
cnt=${REPLY//[^p]/} # remove non-p characters
((count[${#cnt}]++)) # use length as array index
done < "$infile"
for idx in ${!count[*]} # iterate over existing indices
do echo -e "$idx ${count[idx]}"
done | sort -nr
Output as desired:
4 1
3 2
0 2
You can do it in one gawk process (well, with a sort coprocess):
gawk -F p -v OFS='\t' '
{ count[NF-1]++ }
END {
print "occs", "count"
coproc = "sort -rn"
for (n in count)
print n, count[n] |& coproc
close(coproc, "to")
while ((coproc |& getline) > 0)
print
close(coproc)
}
'
Shortest solution so far:
perl -nE'say tr/p//' | sort -nr | uniq -c |
awk 'BEGIN{print "occs","count"}{print $2,$1}' |
column -t
For multiple characters, use a regex pattern:
perl -ple'$_ = () = /pg/g' | sort -nr | uniq -c |
awk 'BEGIN{print "occs","count"}{print $2,$1}' |
column -t
This one handles overlapping matches (e.g. it finds 3 "pp" in "pppp" instead of 2):
perl -ple'$_ = () = /(?=pp)/g' | sort -nr | uniq -c |
awk 'BEGIN{print "occs","count"}{print $2,$1}' |
column -t
Original cryptic but short pure-Perl version:
perl -nE'
++$c{ () = /pg/g };
}{
say "occs\tcount";
say "$_\t$c{$_}" for sort { $b <=> $a } keys %c;
'

Resources