How to generate string elements that don't match a pattern?


If I have
days="1 2 3 4 5 6"
func() {
echo "lSecure1"
echo "lSecure"
echo "lSecure4"
echo "lSecure6"
echo "something else"
}
and do
func | egrep "lSecure[1-6]"
then I get
lSecure1
lSecure4
lSecure6
but what I would like is
lSecure2
lSecure3
lSecure5
which are all the days that don't have an lSecure string.
Question
My current idea is to use awk to split the $days and then loop over all combinations.
Is there a better way?
Note that grep -v inverts the sense of a plain grep and does not solve the problem as it does not generate the required strings.

I usually use the -f flag of grep for similar purposes. The <( ... ) code generates a file with all possibilities, and grep selects only those not present in the output of func.
func | grep 'lSecure[1-6]' | grep -v -f- <( for i in $days ; do echo lSecure$i ; done )
Or, you may prefer it the other way round:
for i in $days ; do echo lSecure$i ; done | grep -vf <( func | grep 'lSecure[1-6]' )
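One caveat with pattern files: each pattern is a regular expression that may match anywhere in a line, so the bare lSecure line printed by func would match every candidate if it were not filtered out first (which is what the grep 'lSecure[1-6]' step does). A minimal sketch of a variation, assuming whole-line, fixed-string matching is what is wanted (-x and -F). The helper name missing_days is made up for illustration, and the patterns are passed with -e as a newline-separated list rather than with -f, so the sketch does not rely on process substitution:

```shell
days="1 2 3 4 5 6"

func() {
    echo "lSecure1"
    echo "lSecure"
    echo "lSecure4"
    echo "lSecure6"
    echo "something else"
}

# missing_days (hypothetical helper): print each lSecure<day> candidate
# that does not appear as a complete line in the output of func.
#   -F  treat patterns as fixed strings, not regular expressions
#   -x  a pattern must match the whole line
#   -v  keep the lines that match none of the patterns
missing_days() {
    for d in $days; do
        echo "lSecure$d"
    done | grep -vxF -e "$(func)"
}

missing_days
```

With the sample func from the question this prints lSecure2, lSecure3 and lSecure5, one per line.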

F=$(func)
for f in $days; do
if ! echo "$F" | grep -qx "lSecure$f"; then
echo "lSecure$f"
fi
done

An awk solution:
$ func | awk -v i="${days}" 'BEGIN{split(i,a," ")}{gsub(/lSecure/,"");
for(var in a)if(a[var] == $0){delete a[var];break}}
END{for(var in a) print "lSecure" a[var]}' | sort
We store the days in an awk array a. For each input line we strip the lSecure prefix, and if what remains equals an element of the array, we delete that element. At the end, only the elements that were never found remain in the array. The sort is just there to present them in order :)
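The same idea can be sketched with the array keyed by the full string, which lets awk's in operator replace the inner loop and makes the final sort unnecessary. This is an illustrative variation, not the answer's code; missing_days_awk and the variable names inside it are hypothetical:

```shell
days="1 2 3 4 5 6"

func() {
    printf '%s\n' lSecure1 lSecure lSecure4 lSecure6 "something else"
}

# missing_days_awk (hypothetical helper): print the lSecure<day> strings
# that func never outputs, in the order the days were given.
missing_days_awk() {
    func | awk -v list="$days" '
        BEGIN {
            n = split(list, day, " ")
            for (i = 1; i <= n; i++) want["lSecure" day[i]] = 1
        }
        ($0 in want) { delete want[$0] }   # seen in the input: not missing
        END {
            for (i = 1; i <= n; i++) {     # walk the days in their given order
                key = "lSecure" day[i]
                if (key in want) print key
            }
        }
    '
}

missing_days_awk
```

Because the END block walks the original day list instead of using for (var in a), the output order does not depend on awk's internal array ordering.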

I am not sure exactly what you are trying to achieve, but you might consider using uniq -u, which prints only the lines that are not repeated in its (sorted) input. For example, you can do this with it:
( echo "$days" | tr -s ' ' '\n'; func | grep -oP '(?<=lSecure)[1-6]' ) | sort | uniq -u
Output:
2
3
5

Related

min and max of float array bash

I have an array of floats in bash:
eg:
array(1.0002, 1.00232, 1.3222, ....)
I want to find the maximum and the minimum element of the array using bash.
The problem with this is that I have float elements that are not quite supported in bash.
I have tried for example:
IFS=$'\n'
echo "${ar[*]}" | sort -nr | head -n1
but it does not work for floats.
What is the best way to do this ?

There are probably many ways and I don't claim the two following are "the best".
You could use a calculator that supports floats, like bc, for instance:
max="${array[0]}"
min="${array[0]}"
for v in "${array[@]}"; do
max=$(echo "if($v>$max) $v else $max" | bc)
min=$(echo "if($v<$min) $v else $min" | bc)
done
echo "max=$max"
echo "min=$min"
awk also supports floats, so the following would do the same:
printf '%s\n' "${array[@]}" | \
awk '$1>max||NR==1 {max=$1}
$1<min||NR==1 {min=$1}
END {print "max=" max; print "min=" min}'
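The awk approach can be wrapped in a small function that takes the numbers as arguments. This is a minimal sketch; minmax is a hypothetical name, and the + 0 on both sides is there to force a numeric rather than a string comparison:

```shell
# minmax (hypothetical helper): print the smallest and largest argument.
minmax() {
    [ "$#" -gt 0 ] || return 1          # nothing to compare
    printf '%s\n' "$@" | awk '
        NR == 1              { min = max = $1 }
        $1 + 0 < min + 0     { min = $1 }
        $1 + 0 > max + 0     { max = $1 }
        END                  { print "min=" min, "max=" max }
    '
}

minmax 1.0002 1.00232 1.3222
```

In bash, the array from the question would be passed as minmax "${array[@]}".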

$ array=(1.0002 1.00232 1.3222)
$ printf "%s\n" "${array[@]}" | sort -rn | head -n1
1.3222
$ printf "%s\n" "${array[@]}" | sort -rn | tail -n1
1.0002
# Tried with %s or %f

Linux usernames /etc/passwd listing

I want to print the longest and shortest username found in /etc/passwd. If I run the code below it works fine for the shortest (head -1), but doesn't work for the longest (sort -n | tail -1 | awk '{print $2}'). Can anyone help me figure out what's wrong?
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
sort -n |tail -1 | awk '{print $2}'

The issue is this:
The pipeline ends with the first sort -n |head -1 | awk '{print $2}' command. That command receives its input through the pipe and produces its output.
The second command is given no input, so it waits for input on STDIN, which is the keyboard. You can type the input there and press Ctrl+D to get the output.
Run the code like this to get the desired output:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |tail -1 | awk '{print $2}'

All you need is:
$ awk -F: '
NR==1 { min=max=$1 }
length($1) > length(max) { max=$1 }
length($1) < length(min) { min=$1 }
END { print min ORS max }
' /etc/passwd
No explicit loops or pipelines or multiple commands required.

The problem is that you have two pipelines when you really need one. You have grep | while read do ... done | sort | head | awk and sort | tail | awk: the first sort has an input (the while loop), the second sort doesn't. So the script hangs because your second sort has no piped input; or rather it does have an input, but it is STDIN.
There are various ways to resolve this:
save the output of the while loop to a temporary file and use that as the input to both sort commands
repeat your while loop
use awk to do both the head and tail
The first two involve iterating over the password file twice, which may be okay, depending on what you're ultimately trying to do. A small awk script, however, can give you both the first and last line in one pass, by way of an NR==1 rule and the END block.
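The first option in the list can be sketched without an actual temporary file by holding the sorted lines in a shell variable (a substitution made here for brevity). The helper name shortest_longest and its optional file argument are assumptions for illustration:

```shell
# shortest_longest (hypothetical helper): print the shortest, then the longest
# name in a passwd-style file (defaults to /etc/passwd).
shortest_longest() {
    file=${1:-/etc/passwd}
    # "length name" pairs sorted by length, kept in a variable so that
    # both head and tail can read the same data
    sorted=$(cut -d: -f1 "$file" | awk '{ print length($0), $0 }' | sort -n)
    printf '%s\n' "$sorted" | head -n 1 | cut -d' ' -f2-
    printf '%s\n' "$sorted" | tail -n 1 | cut -d' ' -f2-
}

shortest_longest
```

Each of the two final pipelines gets its own copy of the data on its standard input, so neither is left waiting on the keyboard.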

While you already have good answers, you can also use POSIX shell to accomplish your goal without any pipe at all, using the parameter expansion and string length provided by the shell itself (see: POSIX shell specification). For example, you could do the following:
#!/bin/sh
sl=32;ll=0;sn=;ln=; ## short len, long len, short name, long name
while read -r line; do ## read each line
u=${line%%:*} ## get user
len=${#u} ## get length
[ "$len" -lt "$sl" ] && { sl="$len"; sn="$u"; } ## if shorter, save len, name
[ "$len" -gt "$ll" ] && { ll="$len"; ln="$u"; } ## if longer, save len, name
done </etc/passwd
printf "shortest (%2d): %s\nlongest (%2d): %s\n" $sl "$sn" $ll "$ln"
Example Use/Output
$ sh cketcpw.sh
shortest ( 2): at
longest (17): systemd-bus-proxy
Using either pipe/head/tail/awk or the shell itself is fine. It's good to have alternatives.
(Note: if several users have names of the same length, this just picks the first. You can use a temp file if you want to save all names, and use -le and -ge for the comparison.)

If you want both the head and the tail from the same input, you may want something like sed -e 1b -e '$!d' after you sort the data to get the top and bottom lines using sed.
So your script would be:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n | sed -e 1b -e '$!d'
Alternatively, a shorter way:
cut -d":" -f1 /etc/passwd | awk '{ print length, $0 }' | sort -n | cut -d" " -f2- | sed -e 1b -e '$!d'
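How the sed expression picks out both ends can be seen on a small input. This is a minimal sketch with made-up sample lines; first_and_last is a hypothetical wrapper name:

```shell
# first_and_last (hypothetical helper): keep only the first and last line of stdin.
#   1b   on line 1, branch to the end of the script, so the line is auto-printed
#   $!d  on every other line that is not the last one, delete it
first_and_last() {
    sed -e 1b -e '$!d'
}

printf '%s\n' "2 at" "4 root" "17 systemd-bus-proxy" | first_and_last
```

On a one-line input the line is printed once, because the branch on line 1 skips the delete command.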

How do I append some text to a pipe without a temporary file

I am trying to get the max version number from a directory where I have several versions of one program.
For example, if the output of ls is
something01_1.sh
something02_0.1.2.sh
something02_0.1.sh
something02_1.1.sh
something02_1.2.sh
something02_2.0.sh
something02_2.1.sh
something02_2.3.sh
something02_3.1.2.sh
something.sh
I am getting the max version number with the following -
ls somedir | grep some_prefix | cut -d '_' -f2 | sort -t '.' -k1 -r | head -n 1
Now, if at the same time I want to compare it with the version number I already have on the system, what's the best way to do it?
In bash I got this working (if 2.5 is the current version):
(ls somedir | grep some_prefix | cut -d '_' -f2; echo 2.5) | sort -t '.' -k1 -r | head -n 1
Is there any other correct way to do it?
EDIT: In the above example some_prefix is something02.
EDIT: The actual problem here is
(ls smthing; echo more) | sort
Is this the best way to merge the output of two commands/programs for piping into a third?

I have found a solution. The best way seems to be process substitution:
cat <(ls smthing) <(echo more) | sort
For my version example:
cat <(ls somedir | grep some_prefix | cut -d '_' -f2) <(echo 2.5) | sort -t '.' -k1 -r | head -n 1
For the benefit of future readers: I recommend dropping the lure of the one-liner and using a glob, as chepner suggested.
An almost identical question was asked on Super User.
More info about process substitution.
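A brace group { ...; } is another way to feed the output of several commands into one pipe, and it is available in plain POSIX sh. A minimal sketch, assuming version strings have at most three numeric components; latest_version is a hypothetical helper, and the per-field numeric sort keys are swapped in for the question's -k1 -r so that, for example, 1.10 sorts after 1.9:

```shell
# latest_version (hypothetical helper): read candidate versions on stdin,
# add the current version given as $1, and print the highest of them all.
latest_version() {
    { cat; echo "$1"; } |
        sort -t. -k1,1n -k2,2n -k3,3n |   # compare each dotted field numerically
        tail -n 1
}

printf '%s\n' 0.1.2 0.1 1.1 1.2 2.0 2.1 2.3 3.1.2 | latest_version 2.5
```

With the version list from the question and 2.5 as the current version, this prints 3.1.2.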

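Is the following code more suitable to what you're looking for?
#!/bin/bash
# Highest version among the installed files (sort -V is a GNU extension)
highest_version=$(ls something* | sort -V | tail -n 1 | sed "s/something02_\|\.sh//g")
current_version=$(echo "$0" | sed "s/something02_\|\.sh//g")
# Inside [ ], > is a redirection, not a comparison, so compare with sort -V instead
newest=$(printf '%s\n' "$current_version" "$highest_version" | sort -V | tail -n 1)
if [ "$current_version" != "$highest_version" ] && [ "$newest" = "$highest_version" ]; then
echo "Uh oh! Looks like we need to update!";
fi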

You can try something like this :
#! /bin/bash
lastversion() { # prefix
local prefix="$1" a=0 b=0 c=0 r f vmax=0
for f in "$prefix"* ; do
test -f "$f" || continue
read a b c r <<< $(echo "${f#$prefix} 0 0 0" | tr -C '[0-9]' ' ')
v=$(((a*100+b)*100+c))
if ((v>vmax)); then vmax=$v; fi
done
echo $vmax
}
lastversion "something02"
It will print: 30102

sorting a "key/value pair" array in bash

How do I sort a "python dictionary-style" array, e.g. ( "A: 2" "B: 3" "C: 1" ), in bash by the value? I think this code snippet will make my question a bit clearer.
State="Total 4 0 1 1 2 0 0"
W=$(echo $State | awk '{print $3}')
C=$(echo $State | awk '{print $4}')
U=$(echo $State | awk '{print $5}')
M=$(echo $State | awk '{print $6}')
WCUM=( "Owner: $W;" "Claimed: $C;" "Unclaimed: $U;" "Matched: $M" )
echo ${WCUM[@]}
This will simply print the array: Owner: 0; Claimed: 1; Unclaimed: 1; Matched: 2
How do I sort the array (or the output), eliminating any pair with a "0" value, so that the result looks like this:
Matched: 2; Claimed: 1; Unclaimed: 1
Thanks in advance for any help or suggestions. Cheers!!

Quick and dirty idea would be (this just sorts the output, not the array):
echo ${WCUM[@]} | sed -e 's/; /;\n/g' | awk -F: '!/ 0;?/ {print $0}' | sort -t: -k 2 -r | xargs

echo -e ${WCUM[@]} | tr ';' '\n' | sort -r -k2 | egrep -v ": 0$"
Sorting and filtering are independent steps, so if you only want to filter out the 0 values, it is much easier.
Append a
| tr '\n' ';'
at the end to get it back onto a single line.
nonull=$(for n in ${!WCUM[@]}; do echo ${WCUM[n]} | egrep -v ": 0;"; done | tr -d "\n")
I don't see a good reason to end $W, $C and $U with a semicolon but not $M, so instead of adapting my code to this distinction I would eliminate this special case. If that is not possible, I would temporarily append a semicolon to $M and remove it at the end.
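Treating filtering and sorting as separate steps can be sketched as one small pipeline. This is an illustration under stated assumptions: the pairs are passed as separate arguments without trailing semicolons, and sort_pairs is a hypothetical helper name:

```shell
# sort_pairs (hypothetical helper): take "Label: value" arguments, drop the
# pairs whose value is 0, sort the rest by value (descending) and join them.
sort_pairs() {
    printf '%s\n' "$@" |
        awk -F': ' '$2 != 0' |      # filter: keep non-zero values only
        sort -t: -k2,2nr |          # sort: numeric, descending, on the value
        paste -sd';' - |            # join the lines with semicolons
        sed 's/;/; /g'              # add a space after each semicolon
}

sort_pairs "Owner: 0" "Claimed: 1" "Unclaimed: 1" "Matched: 2"
```

For the values in the question this prints Matched: 2; Claimed: 1; Unclaimed: 1.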

Another attempt, using some bash features; it still needs sort, which is crucial:
#! /bin/bash
State="Total 4 1 0 4 2 0 0"
string=$State
for i in 1 2 ; do # remove unnecessary fields
string=${string#* }
string=${string% *}
done
# Insert labels
string=Owner:${string/ /;Claimed:}
string=${string/ /;Unclaimed:}
string=${string/ /;Matched:}
# Remove zeros
string=(${string[@]//;/; })
string=(${string[@]/*:0;/})
string=${string[@]}
# Format
string=${string//;/$'\n'}
string=${string//:/: }
# Sort
string=$(sort -t: -nk2 <<< "$string")
string=${string//$'\n'/;}
echo "$string"

Finding the longest word in a text file

I am trying to write a simple bash script that finds the longest word in a text file and its length. I know this is simple and straightforward with awk, but I want to try this method. Say a=wmememememe: to find its length I can use echo ${#a}, and to print the word itself I would use echo ${a}. But I want to apply it to the following:
for i in `cat so.txt`; do
where so.txt contains words. I hope it makes sense.

bash one liner.
sed 's/ /\n/g' YOUR_FILENAME | sort | uniq | awk '{print length, $0}' | sort -nr | head -n 1
read file and split the words (via sed)
remove duplicates (via sort | uniq)
prefix each word with its length (awk)
sort the list by the word length
print the single word with the greatest length.
Yes, this will be slower than some of the above solutions, but it also doesn't require remembering the semantics of bash for loops.

Normally, you'd want to use a while read loop instead of for i in $(cat), but since you want all the words to be split, in this case it would work out OK.
#!/bin/bash
longest=0
for word in $(<so.txt)
do
len=${#word}
if (( len > longest ))
then
longest=$len
longword=$word
fi
done
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"
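For comparison, the while read form mentioned above can be sketched as follows. It is an illustration only: longest_word is a hypothetical helper, tr is used to put one word per line, and the loop is fed from a here-document so that it runs in the current shell and the variables survive after it ends:

```shell
# longest_word (hypothetical helper): print the longest word in the file
# named by $1, followed by its length.
longest_word() {
    longest=
    while IFS= read -r word; do
        if [ "${#word}" -gt "${#longest}" ]; then
            longest=$word
        fi
    done <<EOF
$(tr -s '[:space:]' '\n' < "$1")
EOF
    printf '%s %d\n' "$longest" "${#longest}"
}

# Made-up sample file, used only to make the sketch runnable
sample=$(mktemp)
printf 'the quick brown\nfox jumped over\n' > "$sample"
longest_word "$sample"
rm -f "$sample"
```

As in the answer above, a later word replaces the current one only if it is strictly longer, so the first of several equally long words wins.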

Another solution:
for item in $(cat "$infile"); do
length[${#item}]=$item # use word length as index
done
maxword=${length[@]: -1} # select last array element
printf "longest word '%s', length %d" "${maxword}" "${#maxword}"

longest=""
for word in $(cat so.txt); do
if [ ${#word} -gt ${#longest} ]; then
longest=$word
fi
done
echo $longest

awk script:
#!/usr/bin/awk -f
# Initialize two variables
BEGIN {
maxlength=0;
maxword=0
}
# Loop through each word on the line
{
for(i=1;i<=NF;i++)
# Assign the maxlength variable if length of word found is greater. Also, assign
# the word to maxword variable.
if (length($i)>maxlength)
{
maxlength=length($i);
maxword=$i;
}
}
# Print out the maxword and the maxlength
END {
print maxword,maxlength;
}
Textfile:
[jaypal:~/Temp] cat textfile
AWK utility is a data_extraction and reporting tool that uses a data-driven scripting language
consisting of a set of actions to be taken against textual data (either in files or data streams)
for the purpose of producing formatted reports.
The language used by awk extensively uses the string datatype,
associative arrays (that is, arrays indexed by key strings), and regular expressions.
Test:
[jaypal:~/Temp] ./script.awk textfile
data_extraction 15

Relatively speedy bash function using no external utils:
# Usage: longcount < textfile
longcount ()
{
declare -a c;
while read x; do
c[${#x}]="$x";
done;
echo ${#c[@]} "${c[${#c[@]}]}"
}
Example:
longcount < /usr/share/dict/words
Output:
23 electroencephalograph's
Modified POSIX shell version of jimis' xargs-based answer; still very slow, takes two or three minutes:
tr "'" '_' < /usr/share/dict/words |
xargs -P$(nproc) -n1 -i sh -c 'set -- {} ; echo ${#1} "$1"' |
sort -n | tail | tr '_' "'"
Note the leading and trailing tr bit to get around GNU xargs
difficulty with single quotes.

for i in $(cat so.txt); do echo ${#i}; done | paste - so.txt | sort -n | tail -1

Slow because of the gazillion of forks, but pure shell, does not require awk or special bash features:
$ cat /usr/share/dict/words | \
xargs -n1 -I '{}' -d '\n' sh -c 'echo `echo -n "{}" | wc -c` "{}"' | \
sort -n | tail
23 Pseudolamellibranchiata
23 pseudolamellibranchiate
23 scientificogeographical
23 thymolsulphonephthalein
23 transubstantiationalist
24 formaldehydesulphoxylate
24 pathologicopsychological
24 scientificophilosophical
24 tetraiodophenolphthalein
24 thyroparathyroidectomize
You can easily parallelize, e.g. to 4 CPUs by providing -P4 to xargs.
EDIT: modified to work with the single quotes that some dictionaries have. Now it requires GNU xargs because of -d argument.
EDIT2: for the fun of it, here is another version that handles all kinds of special characters, but requires the -0 option to xargs. I also added -P4 to compute on 4 cores:
cat /usr/share/dict/words | tr '\n' '\0' | \
xargs -0 -I {} -n1 -P4 sh -c 'echo ${#1} "$1"' wordcount {} | \
sort -n | tail
