Better way to pick a random entry from args? - linux

Was just wondering because I whipped this up last month.
#!/usr/bin/bash
# Collects all of the args, make sure to separate with ','
IN="$*"
# Takes everything before a ',' and places them each on a single line of tmp file
echo $IN | sed 's/,/\n/g' > /tmp/pick.a.random.word.or.phrase
# Obvious vars are obvious
WORDFILE="/tmp/pick.a.random.word.or.phrase"
# Pick only one of the vars
NUMWORDS=1
## Picks a random line from tmp file
#Number of lines in $WORDFILE
tL=`awk 'NF!=0 {++c} END {print c}' $WORDFILE`
# Expand random
RANDOM_CMD='od -vAn -N4 -tu4 /dev/urandom'
for i in `seq $NUMWORDS`
do
rnum=$((`${RANDOM_CMD}`%$tL+1))
sed -n "$rnum p" $WORDFILE | tr '\n' ' '
done
printf "\n"
rm /tmp/pick.a.random.word.or.phrase
Mainly I ask:
Do I need to have a tmp file?
Is there a way to do this in one line with another program?
How to condense as much as possible?

The command-line argument handling is, to my mind, bizarre. Why not just use normal command line arguments? That makes the problem trivial:
#!/usr/bin/bash
shuf -en1 "$#"
Of course, you could just use shuf -en1, which is only nine keystrokes:
$ shuf -en1 word another_word "random phrase"
another_word
$ shuf -en1 word another_word "random phrase"
word
$ shuf -en1 word another_word "random phrase"
another_word
$ shuf -en1 word another_word "random phrase"
random phrase
shuf command-line flags:
-e Shuffle command line arguments instead of lines in a file/stdin
-n1 Produce only the first random line (or argument in this case)
If you really insist on running the arguments together and then separating them with commas, you can use the following. As with your original, it will exhibit unexpected behaviour if some word in the arguments could be glob-expanded, so I really don't recommend it:
#!/usr/bin/bash
IFS=, read -ra args <<<"$*"
echo $(shuf -en1 "${args[@]}")
The first line combines the arguments and then splits the result at commas into the array args. (The -a option to read.) Since the string is split only at commas, spaces (such as those automatically inserted by the argument concatenation) are preserved; to remove the spaces, I word-split the result of shuf by not quoting the command substitution.
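For illustration, a hypothetical session (pick.sh is an assumed filename, and the output is random):
$ ./pick.sh first phrase, second phrase, third
second phrase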

You could use shuf to shorten your script and remove the temporary file.
#!/usr/bin/bash
# Collects all of the args, make sure to separate with ','
IN="$*"
# Takes everything before a ',' and places them in an array
words=($(echo $IN | sed 's/,/ /g'))
# Get a random index in the range 0 to (length of words array - 1)
index=$(shuf -i 0-$(( ${#words[@]} - 1 )) -n 1)
# Print the word at the random index
echo ${words[$index]}
If you don't want to use shuf, you could also use $RANDOM:
#!/usr/bin/bash
# Collects all of the args, make sure to separate with ','
IN="$*"
# Takes everything before a ',' and places them in an array
words=($(echo $IN | sed 's/,/ /g'))
# Print the word at a random index
echo ${words[$RANDOM % ${#words[@]}]}
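Worth noting: $RANDOM yields values in 0..32767, so $RANDOM % n is slightly biased when n does not divide 32768; for a handful of words the bias is negligible. A minimal sketch of the idea in isolation:
words=(red green blue)                   # hypothetical word list
echo "${words[$RANDOM % ${#words[@]}]}"  # picks one of the three, near-uniformly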

shuf in coreutils does exactly this, but with multiple command arguments instead of a single comma separated argument.
shuf -n1 -e arg1 arg2 ...
The -n1 option says to choose just one element. The -e option indicates that elements will be passed as arguments (as opposed to through standard input).
Your script then just needs to replace commas with spaces in $*. We can do this using bash parameter substitution:
#!/usr/bin/bash
shuf -n1 -e ${*//,/ }
This won't work with elements with embedded spaces.
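If you do need embedded spaces, a sketch combining the comma-splitting from the first answer with shuf, so each element travels as its own argument:
#!/usr/bin/bash
# split the comma-joined arguments into an array; spaces inside elements survive
IFS=, read -ra words <<<"$*"
shuf -n1 -e "${words[@]}"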

Isn't it as simple as generating a number at random between 1 and $# and simply echoing the corresponding argument? It depends on what you have; your comment about 'collect arguments; make sure to separate with commas' isn't clear, because the assignment does nothing with commas, and you don't show how you invoke your command.
I've simply cribbed the random number generation from the question: it works OK on my Mac, generating the values 42,405,691 and 1,817,261,076 on successive runs.
n=$(( $(od -vAn -N4 -tu4 /dev/urandom) % $# + 1 ))
eval echo "\${$n}"
You could even reduce that to a single line if you were really determined:
eval echo "\${$(( $(od -vAn -N4 -tu4 /dev/urandom) % $# + 1 ))}"
This use of eval is safe as it involves no user input. The script should check that it is provided at least one argument to prevent a division-by-zero error if $# is 0. The code does an absolute minimum of data movement — in contrast to solutions which shuffle the data in some way.
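For concreteness, a minimal sketch of that guard, prefixed to the two-line script (the usage message wording is mine):
#!/usr/bin/bash
if [ "$#" -eq 0 ]; then
    echo "usage: $0 word [word ...]" >&2
    exit 1
fi
n=$(( $(od -vAn -N4 -tu4 /dev/urandom) % $# + 1 ))
eval echo "\${$n}"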
If that's packaged in a script random_selection, then I can run:
$ bash random_selection Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Feb
$ bash random_selection Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Oct
$ bash random_selection Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Nov
$
If the total number of arguments is big enough that you run out of argument space, then you need to think again, but that restriction is present in the existing code.
The selection is marginally biased towards the earlier entries in the list; to remove that bias, you have to reject random numbers that are very near the maximum value in the range. For a random 32-bit unsigned value, if it is larger than $# * (0xFFFFFFFF / $#) (integer division), you should generate another random number.
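A sketch of that rejection loop, assuming bash and the same od-based generator (the variable names are mine):
max=$(( $# * (0xFFFFFFFF / $#) ))   # integer division; values below this are uniform mod $#
while r=$(od -vAn -N4 -tu4 /dev/urandom); (( r >= max )); do :; done
eval echo "\${$(( r % $# + 1 ))}"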

Related

shell script with xargs and command line argument [duplicate]

I'm trying to write a bash script that allows the user to pass a directory path using wildcards.
For example,
bash show_files.sh *
when executed within this directory
drw-r--r-- 2 root root 4.0K Sep 18 11:33 dir_a
-rw-r--r-- 1 root root 223 Sep 18 11:33 file_b.txt
-rw-rw-r-- 1 root root 106 Oct 18 15:48 file_c.sql
would output:
dir_a
file_b.txt
file_c.sql
The way it is right now, it outputs:
dir_a
contents of show_files.sh:
#!/bin/bash
dirs="$1"
for dir in $dirs
do
echo $dir
done
The parent shell, the one invoking bash show_files.sh *, expands the * for you.
In your script, you need to use:
for dir in "$#"
do
echo "$dir"
done
The double quotes ensure that multiple spaces etc in file names are handled correctly.
See also How to iterate over arguments in a bash shell script.
Potentially confusing addendum
If you're truly sure you want to get the script to expand the *, you have to make sure that * is passed to the script (enclosed in quotes, as in the other answers), and then make sure it is expanded at the right point in the processing (which is not trivial). At that point, I'd use an array.
names=( $@ )
for file in "${names[@]}"
do
echo "$file"
done
I don't often use $@ without the double quotes, but this is one time when it is more or less the correct thing to do. The tricky part is that it won't handle wild cards with spaces in them very well.
Consider:
$ > "double space.c"
$ > "double space.h"
$ echo double\ \ space.?
double space.c double space.h
$
That works fine. But try passing that as a wild-card to the script and ... well, let's just say it gets to be tricky at that point.
If you want to extract $2 separately, then you can use:
names=( $1 )
for file in "${names[#]}"
do
echo "$file"
done
# ... use $2 ...
Quote the wild-card:
bash show_files.sh '*'
or make your script accept a list of arguments, not just one:
for dir in "$#"
do
echo "$dir"
done
It's better to iterate directly over "$@" rather than assigning it to another variable, in order to preserve its special ability to hold elements that themselves contain whitespace.
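A quick illustration of the difference (hypothetical session; set -- just fakes the positional parameters):
$ set -- "two words" single
$ for a in "$@"; do printf '<%s>\n' "$a"; done
<two words>
<single>
$ args="$*"; for a in $args; do printf '<%s>\n' "$a"; done
<two>
<words>
<single>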

split file with output file with numeric suffix but without begin zero

Suppose I have a file temp.txt with 100 lines. I would like to split into 10 parts.
I use following command
split -a 1 -l 10 -d temp.txt temp_
But I got temp_0, temp_1, temp_2,...,temp_9. I want output like this temp_1,temp_2,..,temp_10.
From man split
I got
-d, --numeric-suffixes
use numeric suffixes instead of alphabetic
I tried to use
split -l 10 --suffix-length=1 --numeric-suffixes=1 Temp.txt temp_
It says split: option '--numeric-suffixes' doesn't allow an argument
Then, I tried to use
split -l 10 --suffix-length=1 --numeric-suffixes 1 Temp.txt temp_
It says
split: extra operand `temp_'
The output of split --version is
split (GNU coreutils) 8.4
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Torbjörn Granlund and Richard M. Stallman.
I tried to use split -a 1 -l 10 -d 1 Temp.txt temp_. But it shows the error split: extra operand `temp_'
-d doesn't have an argument. It should be written as you originally tried;
split -a 1 -l 10 -d Temp.txt temp_
But, forgetting the syntax variations for a moment;
you're asking it to split a 100 line file into 10 parts, with a suffix length of 1, starting at 1.
This scenario is erroneous, as it asks the command to process 100 lines while giving it fixed parameters that restrict it to only 90 lines (nine one-digit suffixes, 1 through 9, at 10 lines each).
If you're willing to extend your allowable suffix length to 2, then you will at least get a uniform two digit temp file starting at 01;
split -a 2 -l 10 --numeric-suffixes=1 Temp.txt temp_
Will create: temp_01 thru temp_10
You can actually omit the -a and -d arguments altogether;
split -l 10 --numeric-suffixes=1 Temp.txt temp_
Will also create: temp_01 thru temp_10
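A quick check (hypothetical session; note that the optional FROM argument to --numeric-suffixes needs a newer coreutils than the 8.4 in the question, I believe roughly 8.16 onwards):
$ seq 100 > Temp.txt
$ split -l 10 --numeric-suffixes=1 Temp.txt temp_
$ ls temp_*
temp_01  temp_02  temp_03  temp_04  temp_05  temp_06  temp_07  temp_08  temp_09  temp_10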
If for some reason this was a fixed and absolute requirement or a permanent solution (i.e. integrating to something else you have no control of), and it was always going to be an exactly 100 line file, then you could always do it in two passes;
head -n90 Temp.txt | split -a 1 -l 10 --numeric-suffixes=1 - temp_
tail -n10 Temp.txt | split -a 2 -l 10 --numeric-suffixes=10 - temp_
Then you would get temp_1 thru temp_10
Just to throw out a possible alternative, you can accomplish this task manually by running a couple of loops. The outer loop iterates over the file chunks and the inner loop iterates over the lines within the chunk.
{
    suf=1
    read -r; rc=$?
    while [[ $rc -eq 0 || -n "$REPLY" ]]; do
        line=0
        while [[ ($rc -eq 0 || -n "$REPLY") && line -lt 10 ]]; do
            printf '%s\n' "$REPLY"
            read -r; rc=$?
            let ++line
        done >temp_$suf
        let ++suf
    done
} <temp.txt
Notes:
The test $rc -eq 0 || -n "$REPLY" is necessary to continue processing if either we've not yet reached end-of-file (in which case $rc -eq 0 is true) or we have reached end-of-file but there was a non-empty final line in the input file (in which case -n "$REPLY" is true). It's good to try to support the case of a non-empty final line with no end-of-line delimiter, which sometimes happens. In this case read will return a failing status code but will still correctly set $REPLY to contain the non-empty final line content. I've tested the split utility and it correctly handles this case as well.
By calling read once prior to the outer loop and then once after each print, we ensure that we always test if the read was successful prior to printing the resulting line. A more naïve design might read and print in immediate succession with no check in between, which would be incorrect.
I've used the -r option of read to prevent backslash interpolation, which you probably don't want; I assume you want to preserve the contents of temp.txt verbatim.
Obviously there are tradeoffs in this solution. On the one hand, it demands a fair amount of complexity and code verbosity (13 lines the way I've written it). But the advantage is complete control over the behavior of the split operation; you can customize the script to your liking, such as dynamically changing the suffix based on the line number, using a prefix or infix or combination thereof, or even taking into account the contents of the individual file lines in $REPLY.
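A hypothetical test run of the block above (saved as split_loop.sh, an assumed filename):
$ seq 100 > temp.txt
$ bash split_loop.sh
$ ls temp_*
temp_1  temp_10  temp_2  temp_3  temp_4  temp_5  temp_6  temp_7  temp_8  temp_9
$ wc -l temp_1 temp_10
 10 temp_1
 10 temp_10
 20 total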

filtering output of who with grep and cut

I have this exercise:
Create a bash script that checks whether the user passed as a parameter is connected, and if he is, displays when he connected. Hints: use the who command, the grep filter and the cut command.
But I'm having some trouble solving it.
#!/bin/bash
who>who.txt;
then
grep $1 who.txt
for a in who.txt
do
echo "$a"
done
else
echo "$1 isnt connected"
fi
So first of all I want to keep only the line where I can find the user in a .txt file, and then I want to cut each part of the who output with a loop to keep only the date, but the problem is that I don't know how to cut here because it's separated with multiple spaces.
So I'm really stuck and I don't see how to do this. I'm a beginner with bash.
If I understand you simply want to check to see if a user is logged in, then that is what the users command is for. If you want to wrap it in a short script, then you could do something like the following:
#!/bin/bash
[ -z "$1" ] && { ## validate 1 argument given on command line
printf "error: insufficient input, usage: %s username.\n" "${0##*/}" >&2
exit 1
}
## check if that argument is among the logged in users
if users | grep -q "$1" ; then
printf " user: %s is logged in.\n" "$1"
else
printf " user: %s is NOT logged in.\n" "$1"
fi
Example/Use
$ bash chkuser.sh dog
user: dog is NOT logged in.
$ bash chkuser.sh david
user: david is logged in.
cut is a rather awkward tool for parsing who's output, unless you use fixed column positions. In delimiter mode, with -d ' ', each space makes a separate empty field. It's not like awk where fields are separated by a run of spaces.
who(1) output looks like this (and GNU who has no option to cut it down to just the username/time):
$ who
peter tty1 2015-11-13 18:53
john pts/13 2015-11-12 08:44 (10.0.0.1)
john pts/14 2015-11-12 08:44 (10.0.0.1)
john pts/15 2015-11-12 08:44 (10.0.0.1)
john pts/16 2015-11-12 08:44 (10.0.0.1)
peter pts/9 2015-11-14 16:09 (:0)
I didn't check what happens with very long usernames, whether they're truncated or whether they shift the rest of the line over. Parsing it with awk '{print $3, $4}' would feel much safer, since it would still work regardless of exact column position.
But since you need to use cut, let's assume that those exact column positions (time starting from 23 and running until 38) are constant across all systems where we want this script to work, and all terminal widths. (who doesn't appear to vary its output for $COLUMNS or the tty-driver column width (the columns field in stty -a output)).
Putting all that together:
#!/bin/sh
who | grep "^$1 " | cut -c 23-38
The regex on the grep command line will only match at the beginning of the line, and has to match a space following the username (to avoid substring matches). Then those lines that match are filtered through cut, to extract only the columns containing the timestamp.
With an empty cmdline arg, this will print the login time for every logged-in user. If the pattern doesn't match anything, the output will be empty. To explicitly detect this and print something else, capture the pipeline output with var=$(pipeline), and test if it's the empty-string or not.
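A sketch of that capture, reusing the pipeline above (the variable name is mine; the not-connected message echoes the asker's wording):
#!/bin/sh
when=$(who | grep "^$1 " | cut -c 23-38)
if [ -n "$when" ]; then
    printf '%s connected at: %s\n' "$1" "$when"
else
    printf '%s isnt connected\n' "$1"
fi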
This will print a time for every separate login from the same user. You could use grep's count-limit option (-m 1; see the man page) to stop after one match, but it might not be the most recent time. You might use sort -n | head -1 or something.
If you don't have to write a loop in the shell, don't. It's much better to write a pipeline that makes one pass over the data. The shell itself is slow, but as long as it doesn't have to parse every line of what you're dealing with, that doesn't matter.
Also note how I quoted the expansion of $1 with double quotes, to avoid the shell applying word splitting and glob expansion to it.
For more shell stuff, see the Wooledge Bash FAQ and guide. That's a good place to get started learning idioms that don't suck (i.e. don't break when you have filenames and directories with spaces in them, or filenames containing a ?, or lines with trailing spaces that you want to not munge...).

how to use variables with brace expansion [duplicate]

I have four files:
1.txt 2.txt 3.txt 4.txt
in linux shell, I could use :
ls {1..4}.txt to list all the four files
but if I set two variables : var1=1 and var2=4, how to list the four files?
that is:
var1=1
var2=4
ls {$var1..$var2}.txt # error
what is the correct code?
Using variables with the sequence-expression form ({<numFrom>..<numTo>}) of brace expansion only works in ksh and zsh, but, unfortunately, not in bash (and (mostly) strictly POSIX-features-only shells such as dash do not support brace expansion at all, so brace expansion should be avoided with /bin/sh altogether).
Given your symptoms, I assume you're using bash, where you can only use literals in sequence expressions (e.g., {1..3}); from the manual (emphasis mine):
Brace expansion is performed before any other expansions, and any characters special to other expansions are preserved in the result.
In other words: at the time a brace expression is evaluated, variable references have not been expanded (resolved) yet; interpreting literals such as $var1 and $var2 as numbers in the context of a sequence expression therefore fails, so the brace expression is considered invalid and as not expanded.
Note, however, that the variable references are expanded, namely at a later stage of overall expansion; in the case at hand the literal result is the single word '{1..4}' - an unexpanded brace expression with variable values expanded.
While the list form of brace expansion (e.g., {foo,bar}) is expanded the same way, later variable expansion is not an issue there, because no interpretation of the list elements is needed up front; e.g. {$var1,$var2} correctly results in the 2 words 1 and 4.
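To see the contrast concretely (a quick sketch; the comments show what each line prints):
var1=1 var2=4
echo {$var1,$var2}.txt   # list form: 1.txt 4.txt
echo {1..4}.txt          # literal sequence expression: 1.txt 2.txt 3.txt 4.txt
echo {$var1..$var2}.txt  # sequence with variables: the literal {1..4}.txt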
As for why variables cannot be used in sequence expressions: historically, the list form of brace expansion came first, and when the sequence-expression form was later introduced, the order of expansions was already fixed.
For a general overview of brace expansion, see this answer.
Workarounds
Note: The workarounds focus on numerical sequence expressions, as in the question; the eval-based workaround also demonstrates use of variables with the less common character sequence expressions, which produce ranges of English letters (e.g., {a..c} to produce a b c).
A seq-based workaround is possible, as demonstrated in Jameson's answer.
A small caveat is that seq is not a POSIX utility, but most modern Unix-like platforms have it.
To refine it a little, using seq's -f option to supply a printf-style format string, and demonstrating two-digit zero-padding:
seq -f '%02.f.txt' $var1 $var2 | xargs ls # '%02.f'==zero-pad to 2 digits, no decimal places
Note that to make it fully robust - in case the resulting words contain spaces or tabs - you'd need to employ embedded quoting:
seq -f '"%02.f a.txt"' $var1 $var2 | xargs ls
ls then sees 01 a.txt, 02 a.txt, ... with the argument boundaries correctly preserved.
If you want to robustly collect the resulting words in a Bash array first, e.g., ${words[@]}:
IFS=$'\n' read -d '' -ra words < <(seq -f '%02.f.txt' $var1 $var2)
ls "${words[@]}"
The following are pure Bash workarounds:
A limited workaround using Bash features only is to use eval:
var1=1 var2=4
# Safety check
(( 10#$var1 + 10#$var2 || 1 )) 2>/dev/null || { echo "Need decimal integers." >&2; exit 1; }
ls $(eval printf '%s ' "{$var1..$var2}.txt") # -> ls 1.txt 2.txt 3.txt 4.txt
You can apply a similar technique to a character sequence expression;
var1=a var2=c
# Safety check
[[ $var1 == [a-zA-Z] && $var2 == [a-zA-Z] ]] || { echo "Need single letters."; exit 1; }
ls $(eval printf '%s ' "{$var1..$var2}.txt") # -> ls a.txt b.txt c.txt
Note:
A check is performed up front to ensure that $var1 and $var2 contain decimal integers or single English letters, which then makes it safe to use eval. Generally, using eval with unchecked input is a security risk and use of eval is therefore best avoided.
Given that the output from eval must be passed unquoted to ls here, so that the shell splits it into individual arguments through word splitting, this only works if the resulting filenames contain no embedded spaces or other shell metacharacters.
A more robust, but more cumbersome, pure Bash workaround is to use an array to create the equivalent words:
var1=1 var2=4
# Emulate brace sequence expression using an array.
args=()
for (( i = var1; i <= var2; i++ )); do
args+=( "$i.txt" )
done
ls "${args[#]}"
This approach bears no security risk and also works with resulting filenames with embedded shell metacharacters, such as spaces.
Custom increments can be implemented by replacing i++ with, e.g., i+=2 to step in increments of 2.
Implementing zero-padding would require use of printf; e.g., as follows:
args+=( "$(printf '%02d.txt' "$i")" ) # -> '01.txt', '02.txt', ...
For that particular piece of syntax (a "sequence expression") you're out of luck, see Bash man page:
A sequence expression takes the form {x..y[..incr]}, where x and y are
either integers or single characters, and incr, an optional increment,
is an integer.
However, you could instead use the seq utility, which would have a similar effect -- and the approach would allow for the use of variables:
var1=1
var2=4
for i in `seq $var1 $var2`; do
ls ${i}.txt
done
Or, if calling ls four times instead of once bothers you, and/or you want it all on one line, something like:
for i in `seq $var1 $var2`; do echo ${i}.txt; done | xargs ls
From seq(1) man page:
seq [OPTION]... LAST
seq [OPTION]... FIRST LAST
seq [OPTION]... FIRST INCREMENT LAST

Add blank line after every result in grep

My grep command looks like this:
zgrep -B bb -A aa "pattern" *
I would like to have output as:
file1:line1
file1:line2
file1:line3
file1:pattern
file1:line4
file1:line5
file1:line6
</blank line>
file2:line1
file2:line2
file2:line3
file2:pattern
file2:line4
file2:line5
file2:line6
The problem is that it's hard to distinguish where the lines corresponding to the first match end and the lines corresponding to the second match start.
Note that although man grep says that "--" is added between contiguous groups of matches, that works only when multiple matches are found in the same file; in my search (as above) I am searching multiple files.
Also note that adding a blank line after every bb+aa+1 lines won't work, because a file may have fewer than bb lines before the pattern.
Pipe the grep output through:
awk -F: '{if(f!=$1)print ""; f=$1; print $0;}'
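With the original command it would look something like this (bb and aa standing for your context counts, as in the question):
zgrep -B bb -A aa "pattern" * | awk -F: '{if(f!=$1)print ""; f=$1; print $0;}'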
Pipe any output to:
sed G
Example:
ls | sed G
If you man sed you will see
G Appends a newline character followed by the contents of the hold space to the pattern space.
If you don't mind a -- in lieu of a </blank line>, add the -0 parameter to your grep/zgrep command. This should allow for the -- to appear even when searching multiple files. You can still use the -A and -B flags as desired.
You can also use the --group-separator parameter, with an empty value, so it'd just add a new-line.
some-stream | grep --group-separator=
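Applied to the original command it might look like this (assuming a GNU grep new enough to know --group-separator; zgrep should pass the option through):
zgrep -B bb -A aa --group-separator= "pattern" *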
I can't test it with the -A and -B parameters so I can't say for sure, but you could try using sed G as mentioned here on Unix StackEx. You'll lose coloring though, if that's important.
There is no option for this in grep and I don't think there is a way to do it with xargs or tr (I tried), but here is a for loop that will do it (for f in *; do grep -H "1" $f && echo; done):
[ 11:58 jon@hozbox.com ~/test ]$ for f in *; do grep -H "1" $f && echo; done
a:1

b:1

c:1

d:1
[ 11:58 jon@hozbox.com ~/test ]$ ll
-rw-r--r-- 1 jon people 2B Nov 25 11:58 a
-rw-r--r-- 1 jon people 2B Nov 25 11:58 b
-rw-r--r-- 1 jon people 2B Nov 25 11:58 c
-rw-r--r-- 1 jon people 2B Nov 25 11:58 d
The -H is to display file names for grep matches. Change the * to your own file glob/path expansion string if necessary.
Try with -C 2; when printing context, I see grep separating the matched groups in its output.
