JQ: Enumerate Object Stream

I've got a pipe full of objects and I am trying to add an accumulating count string to each object in a stream with jq, to get the following output:
{"count":"Num_000"}
{"count":"Num_001"}
{"count":"Num_002"}
{"count":"Num_003"}
{"count":"Num_004"}
{"count":"Num_005"}
{"count":"Num_006"}
{"count":"Num_007"}
{"count":"Num_008"}
{"count":"Num_009"}
Something like the following works, but I'm sure I don't need to rely on awk:
yes '{}' | head -n10 | jq -c '.count|="Num_ "' | awk '{printf("%s%03i%s\n",$1,NR-1,$2)}'
So far I have found one way to get the count into my objects, but it feels very wasteful since I slurp up all the objects:
yes '{}' | head -n10 | jq -c -s 'range(0;.|length) as $i|(.[$i]|.count|=$i)'
I'm going to keep playing with this, but I figured this was a chance for me to learn. Any ideas how I can do this more efficiently?
I've also figured out one hacky way to format the string, since I assume < 1000 objects in my stream:
yes '{}' | head -n20 | jq -c -s 'range(0;.|length) as $i|(.[$i]|.count|=(1000+$i|tostring|ltrimstr("1")|"Num_"+.))'

Using a recent version of jq (i.e. one with foreach and inputs), e.g. jq 1.5rc1, the task can be performed efficiently and quite elegantly along the following lines:
yes 1 | head -n10 |\
jq -c -n 'foreach inputs as $line (0; .+1; {"count": "Num_\(.)"})'
The key here is to use the -n option.
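If you also want the zero-based, zero-padded counter from the desired output, the same ltrimstr padding trick from the question can be folded in (a sketch; the padding assumes fewer than 1000 objects):
yes '{}' | head -n10 |
jq -cn 'foreach inputs as $line (-1; .+1; {"count": ("Num_" + (1000 + . | tostring | ltrimstr("1")))})'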

Using the -s (slurp) option, you could do the following:
yes '{}' | head -n10 | jq -s 'to_entries | map(.value.count = .key)[].value'
But, yes, as you said yourself, slurping is wasteful; and, even worse, it blocks the stream.
What you could do instead is, for each element, compact it so that it takes one line (piping it through jq -c '.'; your yes '{}' sample doesn't need it, but arbitrary objects coming from the pipeline might) and then iterate over the lines in your shell. In fish shell, but easily portable to anything else:
set j 0
for i in (yes '{}' | head -n 100000 | jq -c '.')
    set j (expr $j + 1)
    echo $i | jq --arg j $j '.count = ($j | tonumber)'
end
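For reference, a rough bash equivalent of the same per-object loop (a sketch; like the fish version it pays for one jq invocation per object, so it is slow for large streams):
j=0
yes '{}' | head -n 100000 | jq -c '.' | while read -r i; do
    echo "$i" | jq --arg j "$j" '.count = ($j | tonumber)'
    j=$((j + 1))
done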

Related

Required nested while loop in Linux: I have two variables, A=(a1;b2) and B=(A1;B2), and need output that pairs them up, like a1=A1 & b2=B2

while read line; do while read line1; do $line | grep $line1; done < <(echo "opt;Mem" | tr ';' '\n'); done < <(echo "df -k;free -b" | tr ';' '\n')
I have used this, but it compares each command's output against every entry of the second variable.
My requirement is to run a set of commands (separated by ;) and grep each one's output for the corresponding expected pattern (also separated by ;).
I am using below command:
while read line; do
    while read line1; do
        $line | grep $line1
        break
    done < <(echo "m;Mem" | tr ';' '\n')
done < <(echo "df -kh;free" | tr ';' '\n')
Output which I am looking for:
df -kh | grep m
free | grep Mem
You cannot use nested loops to iterate over two collections in parallel, since as you have noticed it leads to iterating over each item of one collection for each item of the other (n*n iterations when you only want n iterations).
You need to use a single loop in which you iterate over both collections at once. Making arrays out of your two strings will help you do that:
commands="df -kh
free"
patterns="m
Mem"
readarray -t commands_array <<<"$commands"
readarray -t patterns_array <<<"$patterns"
for ((index=0; index<${#commands_array[@]}; index++)); do
    echo "${commands_array[index]} | grep ${patterns_array[index]}"
done
If you need to handle ;-separated commands and patterns, you can use the following instead:
commands="df -kh;free"
patterns="m;Mem"
readarray -d';' -t commands_array < <(echo -n "$commands")
readarray -d';' -t patterns_array < <(echo -n "$patterns")
#[...]
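If you want to actually run each command instead of just echoing the pipeline, a minimal variation could look like this (eval is needed because each array element is a command string with arguments; the usual caveats about eval and untrusted input apply):
for ((index=0; index<${#commands_array[@]}; index++)); do
    # Run the command and filter its output with the matching pattern
    eval "${commands_array[index]}" | grep "${patterns_array[index]}"
done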

Using "$RANDOM" to generate a random string in Bash

I am trying to use the Bash variable $RANDOM to create a random string of 8 characters drawn from a variable that contains digits and letters, e.g., var="abcd1234ABCD".
How can I do that?
Use parameter expansion. ${#chars} is the number of possible characters and % is the modulo operator, so RANDOM % ${#chars} yields an offset between 0 and ${#chars} - 1; ${chars:offset:length} then selects length character(s) at that offset.
chars=abcd1234ABCD
for i in {1..8}; do
    echo -n "${chars:RANDOM%${#chars}:1}"
done
echo
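A minimal variant of the same idea that collects the characters into a variable instead of printing them one by one:
chars=abcd1234ABCD
str=
for ((i = 0; i < 8; i++)); do
    str+=${chars:RANDOM%${#chars}:1}   # append one random character
done
echo "$str"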
For those looking for a random alpha-numeric string in bash:
LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c 64
The same as a well-documented function:
function rand-str {
    # Return random alpha-numeric string of given LENGTH
    #
    # Usage: VALUE=$(rand-str $LENGTH)
    #    or: VALUE=$(rand-str)
    local DEFAULT_LENGTH=64
    local LENGTH=${1:-$DEFAULT_LENGTH}
    LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c $LENGTH
    # LC_ALL=C: required for Mac OS X - https://unix.stackexchange.com/a/363194/403075
    # -dc: delete complementary set == delete all except given set
}
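For example, to generate a 16-character token:
TOKEN=$(rand-str 16)
echo "$TOKEN"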
Another way to generate a 32-byte (for example) hexadecimal string:
xxd -l 32 -c 32 -p < /dev/random
add -u if you want uppercase characters instead.
OPTION 1 - No specific length, no openssl needed, only letters and numbers, slower than option 2
sed "s/[^a-zA-Z0-9]//g" <<< $(cat /dev/urandom | tr -dc 'a-zA-Z0-9!##$%*()-+' | fold -w 32 | head -n 1)
DEMO: x=100; while [ $x -gt 0 ]; do sed "s/[^a-zA-Z0-9]//g" <<< $(cat /dev/urandom | tr -dc 'a-zA-Z0-9!##$%*()-+' | fold -w 32 | head -n 1) <<< $(openssl rand -base64 17); x=$(($x-1)); done
Examples:
j0PYAlRI1r8zIoOSyBhh9MTtrhcI6d
nrCaiO35BWWQvHE66PjMLGVJPkZ6GBK
0WUHqiXgxLq0V0mBw2d7uafhZt2s
c1KyNeznHltcRrudYpLtDZIc1
edIUBRfttFHVM6Ru7h73StzDnG
OPTION 2 - No specific length, openssl needed, only letters and numbers, faster than option 1
openssl rand -base64 12 # just prints the string
rand=$(openssl rand -base64 12) # saves it to a var
sed "s/[^a-zA-Z0-9]//g" <<< $(openssl rand -base64 17) # keep only letters and numbers
# The last command can go to a var too.
DEMO: x=100; while [ $x -gt 0 ]; do sed "s/[^a-zA-Z0-9]//g" <<< $(openssl rand -base64 17); x=$(($x-1)); done
Examples:
9FbVwZZRQeZSARCH
9f8869EVaUS2jA7Y
V5TJ541atfSQQwNI
V7tgXaVzmBhciXxS
Other options, not necessarily related:
uuidgen or cat /proc/sys/kernel/random/uuid
After generating 1 billion UUIDs every second for the next 100 years,
the probability of creating just one duplicate would be about 50%. The
probability of one duplicate would be about 50% if every person on
earth owns 600 million UUIDs 😇 source
Not using $RANDOM, but worth mentioning.
Using shuf as a source of entropy (a.k.a. randomness), which in turn may use /dev/urandom as its own entropy source (as in shuf -i1-10 --random-source=/dev/urandom), seems like a solution that uses fewer resources:
$ shuf -er -n8 {A..Z} {a..z} {0..9} | paste -sd ""
tf8ZDZ4U
head -1 <(fold -w 20 <(tr -dc 'a-zA-Z0-9' < /dev/urandom))
This is safe to use in bash scripts if you have safety options turned on:
set -eou pipefail
It is a workaround for bash exit status 141 (SIGPIPE), which the plain pipe version below triggers:
tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 20 | head -1
A slightly obscure but short solution:
RANDSTR=$(mktemp XXXXX) && rm "$RANDSTR"
assuming you have write access to the current directory ;-)
mktemp is part of coreutils
UPDATE:
As Bazi pointed out in the comment, mktemp can be used without creating the file ;-) so the command can be even shorter.
RANDSTR=$(mktemp --dry-run XXXXX)
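On systems without GNU's long options (e.g. macOS), the short -u flag does the same dry run:
RANDSTR=$(mktemp -u XXXXX)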
Using a sparse array to shuffle characters:
#!/bin/bash
array=()
for i in {a..z} {A..Z} {0..9}; do
    array[$RANDOM]=$i
done
printf %s ${array[@]::8} $'\n'
(Or a lot of random strings)
#!/bin/bash
b=()
while ((${#b[@]} <= 32768)); do
    a=(); for i in {a..z} {A..Z} {0..9}; do a[$RANDOM]=$i; done; b+=(${a[@]})
done
tr -d ' ' <<< ${b[@]} | fold -w 8 | head -n 4096
An abbreviated safe pipe workaround based on Radu Gabriel's answer and tested with GNU bash version 4.4.20 and set -euxo pipefail:
head -c 20 <(tr -dc '[:alnum:]' < /dev/urandom)

How to generate string elements that don't match a pattern?

If I have
days="1 2 3 4 5 6"
func() {
    echo "lSecure1"
    echo "lSecure"
    echo "lSecure4"
    echo "lSecure6"
    echo "something else"
}
and do
func | egrep "lSecure[1-6]"
then I get
lSecure1
lSecure4
lSecure6
but what I would like is
lSecure2
lSecure3
lSecure5
which is all the days that don't have an lSecure string.
Question
My current idea is to use awk to split $days and then loop over all combinations.
Is there a better way?
Note that grep -v inverts the sense of a plain grep and does not solve the problem as it does not generate the required strings.
I usually use the -f flag of grep for similar purposes. The <( ... ) construct generates a file with all the possibilities; grep then selects only those not present in func's output.
func | grep 'lSecure[1-6]' | grep -v -f- <( for i in $days ; do echo lSecure$i ; done )
Or, you may prefer it the other way round:
for i in $days ; do echo lSecure$i ; done | grep -vf <( func | grep 'lSecure[1-6]' )
Another approach is a plain loop over $days:
F=$(func)
for f in $days; do
    if ! echo $F | grep -q lSecure$f; then
        echo lSecure$f
    fi
done
An awk solution:
$ func | awk -v i="${days}" 'BEGIN{split(i,a," ")}
    {gsub(/lSecure/,""); for(var in a) if(a[var] == $0){delete a[var]; break}}
    END{for(var in a) print "lSecure" a[var]}' | sort
We store the days in an awk array a; then, while reading each line, we strip the lSecure prefix to get the number and, if it is present in the array, delete it. At the end only the elements that were never matched remain in the array. The sort is just to present the output in sorted order :)
I am not sure exactly what you are trying to achieve, but you might consider using uniq -u, which prints only the lines that are not repeated. For example you can do this with it:
( echo "$days" | tr -s ' ' '\n'; func | grep -oP '(?<=lSecure)[1-6]' ) | sort | uniq -u
Output:
2
3
5
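For completeness (this is not from the original answers): since the task is a set difference, comm can compute it directly on sorted inputs; comm -13 prints the lines that appear only in the second input:
comm -13 <(func | grep 'lSecure[1-6]' | sort) <(for i in $days; do echo "lSecure$i"; done | sort)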

How do I append some text to a pipe without a temporary file?

I am trying to get the max version number from a directory where I have several versions of one program.
For example, if the output of ls is
something01_1.sh
something02_0.1.2.sh
something02_0.1.sh
something02_1.1.sh
something02_1.2.sh
something02_2.0.sh
something02_2.1.sh
something02_2.3.sh
something02_3.1.2.sh
something.sh
I am getting the max version number with the following:
ls somedir | grep some_prefix | cut -d '_' -f2 | sort -t '.' -k1 -r | head -n 1
Now if, at the same time, I want to check it against the version number I already have on the system, what's the best way to do it?
In bash I got this working (if 2.5 is the current version):
(ls somedir | grep some_prefix | cut -d '_' -f2; echo 2.5) | sort -t '.' -k1 -r | head -n 1
Is there any other correct way to do it?
EDIT: In the above example some_prefix is something02.
EDIT: The actual problem here is:
(ls smthing; echo more) | sort
Is this the best way to merge the output of two commands/programs for piping into a third?
I have found the solution. The best way, it seems, is to use process substitution:
cat <(ls smthing) <(echo more) | sort
for my version example
cat <(ls somedir | grep some_prefix | cut -d '_' -f2) <(echo 2.5) | sort -t '.' -k1 -r | head -n 1
For the benefit of future readers, I recommend: resist the lure of the one-liner and use a glob as chepner suggested.
An almost identical question was asked on Superuser.
More info about process substitution is in the bash manual.
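Worth noting: the (ls smthing; echo more) | sort form from the question already merges the two outputs without a temporary file; a brace group does the same while avoiding the extra subshell:
{ ls smthing; echo more; } | sort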
Is the following code closer to what you're looking for?
#!/bin/bash
highest_version=$(ls something* | sort -V | tail -1 | sed "s/something02_\|\.sh//g")
current_version=$(echo $0 | sed "s/something02_\|\.sh//g")
# Note: [[ ... < ... ]] compares strings lexically; use sort -V for strict version ordering
if [[ "$current_version" < "$highest_version" ]]; then
    echo "Uh oh! Looks like we need to update!"
fi
You can try something like this:
#!/bin/bash
lastversion() { # prefix
    local prefix="$1" a=0 b=0 c=0 r f v vmax=0
    for f in "$prefix"*; do
        test -f "$f" || continue
        read a b c r <<< $(echo "${f#$prefix} 0 0 0" | tr -c '0-9' ' ')
        v=$(((a*100+b)*100+c))
        if ((v>vmax)); then vmax=$v; fi
    done
    echo $vmax
}
lastversion "something02"
It will print: 30102
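To compare against a version you already have, you could encode it the same way the function does (a sketch; 2.5 becomes 20500 under the (a*100+b)*100+c scheme):
current=$(( (2*100 + 5)*100 + 0 ))   # 2.5 -> 20500
if (( $(lastversion "something02") > current )); then
    echo "newer version available"
fi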

Finding the longest word in a text file

I am trying to make a simple script that finds the longest word and its length in a text file using bash. I know this is simple and straightforward with awk, but I want to try another method. Let's say I have a=wmememememe; to find its length I can use echo ${#a}, and to print the word itself, echo ${a}. But I want to apply that to the loop below
for i in $(cat so.txt); do
where so.txt contains the words. I hope that makes sense.
A bash one-liner:
sed 's/ /\n/g' YOUR_FILENAME | sort | uniq | awk '{print length, $0}' | sort -nr | head -n 1
read the file and split the words (via sed)
remove duplicates (via sort | uniq)
prefix each word with its length (awk)
sort the list by word length
print the single word with the greatest length
Yes, this will be slower than some of the other solutions, but it also doesn't require remembering the semantics of bash for loops.
Normally, you'd want to use a while read loop instead of for i in $(cat), but since you want all the words to be split, in this case it would work out OK.
#!/bin/bash
longest=0
for word in $(<so.txt)
do
    len=${#word}
    if (( len > longest ))
    then
        longest=$len
        longword=$word
    fi
done
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"
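For comparison, here is a sketch of the while read form mentioned above, splitting each line into an array of words:
#!/bin/bash
longest=0
while read -ra words; do
    for word in "${words[@]}"; do
        if (( ${#word} > longest )); then
            longest=${#word}
            longword=$word
        fi
    done
done < so.txt
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"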
Another solution:
for item in $(cat "$infile"); do
    length[${#item}]=$item # use word length as index
done
maxword=${length[@]: -1} # select last array element
printf "longest word '%s', length %d" ${maxword} ${#maxword}
longest=""
for word in $(cat so.txt); do
if [ ${#word} -gt ${#longest} ]; then
longest=$word
fi
done
echo $longest
awk script:
#!/usr/bin/awk -f
# Initialize two variables
BEGIN {
    maxlength=0;
    maxword=0
}
# Loop through each word on the line
{
    for (i=1; i<=NF; i++)
        # Record the word and its length whenever a longer word is found.
        if (length($i) > maxlength)
        {
            maxlength = length($i);
            maxword = $i;
        }
}
# Print out the maxword and the maxlength
END {
    print maxword, maxlength;
}
Textfile:
[jaypal:~/Temp] cat textfile
AWK utility is a data_extraction and reporting tool that uses a data-driven scripting language
consisting of a set of actions to be taken against textual data (either in files or data streams)
for the purpose of producing formatted reports.
The language used by awk extensively uses the string datatype,
associative arrays (that is, arrays indexed by key strings), and regular expressions.
Test:
[jaypal:~/Temp] ./script.awk textfile
data_extraction 15
Relatively speedy bash function using no external utils:
# Usage: longcount < textfile
longcount ()
{
    declare -a c;
    while read x; do
        c[${#x}]="$x";   # index by word length; keeps one word per length
    done;
    # The element count equals the largest index as long as every length
    # up to the maximum occurs, which holds for a typical dictionary.
    echo ${#c[@]} "${c[${#c[@]}]}"
}
Example:
longcount < /usr/share/dict/words
Output:
23 electroencephalograph's
Modified POSIX shell version of jimis' xargs-based answer; still very slow, takes two or three minutes:
tr "'" '_' < /usr/share/dict/words |
xargs -P$(nproc) -n1 -i sh -c 'set -- {} ; echo ${#1} "$1"' |
sort -n | tail | tr '_' "'"
Note the leading and trailing tr bit to get around GNU xargs
difficulty with single quotes.
for i in $(cat so.txt); do echo ${#i}; done | paste - so.txt | sort -n | tail -1
Slow because of the gazillion forks, but pure shell; it does not require awk or special bash features:
$ cat /usr/share/dict/words | \
xargs -n1 -I '{}' -d '\n' sh -c 'echo `echo -n "{}" | wc -c` "{}"' | \
sort -n | tail
23 Pseudolamellibranchiata
23 pseudolamellibranchiate
23 scientificogeographical
23 thymolsulphonephthalein
23 transubstantiationalist
24 formaldehydesulphoxylate
24 pathologicopsychological
24 scientificophilosophical
24 tetraiodophenolphthalein
24 thyroparathyroidectomize
You can easily parallelize, e.g. to 4 CPUs by providing -P4 to xargs.
EDIT: modified to work with the single quotes that some dictionaries have. It now requires GNU xargs because of the -d argument.
EDIT2: for the fun of it, here is another version that handles all kinds of special characters, but requires the -0 option to xargs. I also added -P4 to compute on 4 cores:
cat /usr/share/dict/words | tr '\n' '\0' | \
xargs -0 -I {} -n1 -P4 sh -c 'echo ${#1} "$1"' wordcount {} | \
sort -n | tail
