Using "$RANDOM" to generate a random string in Bash - linux

I am trying to use the Bash variable $RANDOM to create a random string of 8 characters drawn from a variable that contains alphanumeric characters, e.g., var="abcd1234ABCD".
How can I do that?

Use parameter expansion. ${#chars} is the number of possible characters and % is the modulo operator, so RANDOM % ${#chars} yields an index between 0 and ${#chars} - 1. ${chars:offset:length} then selects the character at that offset (length 1 in our case).
chars=abcd1234ABCD
for i in {1..8} ; do
    echo -n "${chars:RANDOM%${#chars}:1}"
done
echo
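The same idea can be wrapped in a small reusable function. This is only a sketch; the function name rand_from_set and its two arguments are illustrative, not part of the original answer:
rand_from_set() {
    local chars=${1:-abcd1234ABCD}
    local length=${2:-8}
    local out='' i
    for ((i = 0; i < length; i++)); do
        # append one random character from the set on each iteration
        out+=${chars:RANDOM%${#chars}:1}
    done
    printf '%s\n' "$out"
}
rand_from_set abcd1234ABCD 8   # e.g. prints something like cB2dA41a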

For those looking for a random alpha-numeric string in bash:
LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c 64
The same as a well-documented function:
function rand-str {
    # Return random alpha-numeric string of given LENGTH
    #
    # Usage: VALUE=$(rand-str $LENGTH)
    #    or: VALUE=$(rand-str)

    local DEFAULT_LENGTH=64
    local LENGTH=${1:-$DEFAULT_LENGTH}

    LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c $LENGTH
    # LC_ALL=C: required for Mac OS X - https://unix.stackexchange.com/a/363194/403075
    # -dc: delete complementary set == delete all except given set
}
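For example, to generate a 16-character value and store it in a variable:
PASSWORD=$(rand-str 16)
echo "$PASSWORD"   # e.g. 3kZ7qXm2LpR9sTva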

Another way to generate a 32-byte (for example) hexadecimal string:
xxd -l 32 -c 32 -p < /dev/random
Add -u if you want uppercase hex digits instead.

OPTION 1 - No specific length, no openssl needed, only letters and numbers, slower than option 2
sed "s/[^a-zA-Z0-9]//g" <<< $(cat /dev/urandom | tr -dc 'a-zA-Z0-9!##$%*()-+' | fold -w 32 | head -n 1)
DEMO: x=100; while [ $x -gt 0 ]; do sed "s/[^a-zA-Z0-9]//g" <<< $(cat /dev/urandom | tr -dc 'a-zA-Z0-9!##$%*()-+' | fold -w 32 | head -n 1); x=$(($x-1)); done
Examples:
j0PYAlRI1r8zIoOSyBhh9MTtrhcI6d
nrCaiO35BWWQvHE66PjMLGVJPkZ6GBK
0WUHqiXgxLq0V0mBw2d7uafhZt2s
c1KyNeznHltcRrudYpLtDZIc1
edIUBRfttFHVM6Ru7h73StzDnG
OPTION 2 - No specific length, openssl needed, only letters and numbers, faster than option 1
openssl rand -base64 12 # just prints to stdout
rand=$(openssl rand -base64 12) # only saves to var
sed "s/[^a-zA-Z0-9]//g" <<< $(openssl rand -base64 17) # leave only letters and numbers
# The last command can go to a var too.
DEMO: x=100; while [ $x -gt 0 ]; do sed "s/[^a-zA-Z0-9]//g" <<< $(openssl rand -base64 17); x=$(($x-1)); done
Examples:
9FbVwZZRQeZSARCH
9f8869EVaUS2jA7Y
V5TJ541atfSQQwNI
V7tgXaVzmBhciXxS
Other options, not necessarily related:
uuidgen or cat /proc/sys/kernel/random/uuid
After generating 1 billion UUIDs every second for the next 100 years,
the probability of creating just one duplicate would be about 50%. The
probability of one duplicate would be about 50% if every person on
earth owns 600 million UUIDs 😇 source
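For example, stripping the dashes from a UUID leaves a 32-character hex string (a small sketch; whether the digits come out lowercase or uppercase depends on the implementation):
uuidgen | tr -d '-'                          # e.g. 3b9f0a1c5e2d4f6a8b7c9d0e1f2a3b4c
tr -d '-' < /proc/sys/kernel/random/uuid     # same idea via the kernel interface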

Not using $RANDOM, but worth mentioning.
Using shuf as a source of entropy (a.k.a. randomness), which in turn may use /dev/urandom as its own source of entropy (as in shuf -i1-10 --random-source=/dev/urandom), seems like a solution that uses fewer resources:
$ shuf -er -n8 {A..Z} {a..z} {0..9} | paste -sd ""
tf8ZDZ4U
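A variant with an adjustable length, joining the lines with tr instead of paste (a hedged sketch, not from the original answer):
len=16
shuf -er -n"$len" {A..Z} {a..z} {0..9} | tr -d '\n'; echo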

head -1 <(fold -w 20 <(tr -dc 'a-zA-Z0-9' < /dev/urandom))
This is safe to use in a bash script if you have safety options turned on:
set -eou pipefail
It works around bash exit status 141 (SIGPIPE), which the straightforward pipe version can trigger under those options:
tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 20 | head -1

A little bit obscure, but a short-to-write solution is
RANDSTR=$(mktemp XXXXX) && rm "$RANDSTR"
assuming you have write access to the current directory ;-)
mktemp is part of coreutils
UPDATE:
As Bazi pointed out in the comment, mktemp can be used without creating the file ;-) so the command can be even shorter.
RANDSTR=$(mktemp --dry-run XXXXX)
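The number of X characters in the template controls the length, so a longer template gives a longer string (a sketch assuming GNU mktemp):
RANDSTR=$(mktemp --dry-run XXXXXXXXXXXX)   # 12 random characters
echo "$RANDSTR"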

Using a sparse array to shuffle characters: assigning each character to array[$RANDOM] gives it a random index, and expanding the array returns its elements in index order, i.e. shuffled.
#!/bin/bash
array=()
for i in {a..z} {A..Z} {0..9}; do
array[$RANDOM]=$i
done
printf %s ${array[@]::8} $'\n'
(Or a lot of random strings)
#!/bin/bash
b=()
while ((${#b[@]} <= 32768)); do
a=(); for i in {a..z} {A..Z} {0..9}; do a[$RANDOM]=$i; done; b+=(${a[@]})
done
tr -d ' ' <<< ${b[@]} | fold -w 8 | head -n 4096

An abbreviated safe pipe workaround based on Radu Gabriel's answer and tested with GNU bash version 4.4.20 and set -euxo pipefail:
head -c 20 <(tr -dc '[:alnum:]' < /dev/urandom)

Related

How to create multiple files (100 files) that each contain 30 random characters

I need to create 100 files using a Linux script, each containing a 30-character random password, and the password should contain only small letters and capital letters.
#!/bin/bash
for n in {1..100}; do
dd if=/dev/urandom of=/mnt/mymnt/passwords/only_numbers_and_lettersNUM.txt$( printf %03d "$n" ).bin bs=1 count=$(( RANDOM + 1024 ))
head /dev/urandom | tr -dc A-Za-z0-9 | head -c 30 ; echo ''
done
Thanks in advance.
#!/bin/bash
for n in {1..100}; do
{ < /dev/urandom tr -dc A-Za-z | head -c${1:-30};echo; } > /mnt/mymnt/passwords/$n
done
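If you also want zero-padded file names like the printf %03d in the question's attempt, a hedged variant (the path and length are illustrative):
#!/bin/bash
for n in {1..100}; do
    < /dev/urandom tr -dc A-Za-z | head -c 30 > /mnt/mymnt/passwords/password_$(printf '%03d' "$n").txt
done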

How to generate 10 million random strings in bash

I need to make a big test file for a sorting algorithm. For that I need to generate 10 million random strings. How do I do that? I tried using cat on /dev/urandom but it keeps going for minutes and when I look in the file, there are only around 8 pages of strings. How do I generate 10 million strings in bash? Strings should be 10 characters long.
Using openssl:
#!/bin/bash
openssl rand -hex $(( 10000000 * 4 )) | \
while IFS= read -rn8 -d '' r; do
echo "$r"
done
This will not guarantee uniqueness, but gives you 10 million random lines in a file. Not too fast, but ran in under 30 sec on my machine:
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 10 | head -n 10000000 > file
Update: if you have shuf from GNU coreutils, you can use:
shuf -i 1-10000000 > file
Takes 2 sec on my computer. (Thanks rici!)
You can use awk to generate sequential numbers and shuffle them with shuf:
awk 'BEGIN{for(i=1;i<10000001;i++){print i}}' | shuf > big-file.txt
This takes ~ 5 sec on my computer
If they don't need to be unique, you can do:
$ awk -v n=10000000 'BEGIN{for (i=1; i<=n; i++) printf "%010d\n", int(rand()*n)}' >big_file
That runs in about 3 seconds on my iMac.
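Note that without srand() many awk implementations, including gawk, produce the same rand() sequence on every run; seeding first gives different output each time (a small tweak to the command above):
awk -v n=10000000 'BEGIN{srand(); for (i=1; i<=n; i++) printf "%010d\n", int(rand()*n)}' >big_file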
Don't generate it, download it. For example, nic.funet.fi has the file 100Mrnd (size 104857600) in its /dev directory (just funet below). 10M rows of 10 bytes each is 100M bytes, but since xxd converts from binary to hex (\x12 -> 12) we only need 50M bytes, so:
$ wget -S -O - ftp://funet/100Mrnd | head -c 50000000 | xxd -p | fold -w 10 > /dev/null
$ head -5 file
f961b3ef0e
dc0b5e3b80
513e7c37e1
36d2e4c7b0
0514e626e5
(replace funet with the domain name and path given and /dev/null with your desired filename.)

How to count the number of numbers/letters in a file?

I try to count the number of numbers and letters in my file in Bash.
I know that I can use wc -c file to count the number of characters but how can I fix it to only letters and secondly numbers?
Here's a way completely avoiding pipes, just using tr and the shell's way to give the length of a variable with ${#variable}:
$ cat file
123 sdf
231 (3)
huh? 564
242 wr =!
$ NUMBERS=$(tr -dc '[:digit:]' < file)
$ LETTERS=$(tr -dc '[:alpha:]' < file)
$ ALNUM=$(tr -dc '[:alnum:]' < file)
$ echo ${#NUMBERS} ${#LETTERS} ${#ALNUM}
13 8 21
To count the number of letters and numbers you can combine grep with wc. Since grep -o prints one match per line, count the matches with wc -l:
grep -o '[a-z]' myfile | wc -l
grep -o '[0-9]' myfile | wc -l
With a little bit of tweaking you can modify it to count numbers, alphabetic words, or alphanumeric words like this:
grep -oE '[a-z]+' myfile | wc -l
grep -oE '[0-9]+' myfile | wc -l
grep -oE '[[:alnum:]]+' myfile | wc -l
You can use sed to replace all characters that are not of the kind that you are looking for and then word count the characters of the result.
# 1h;1!H will place all lines into the buffer that way you can replace
# newline characters
sed -n '1h;1!H;${;g;s/[^a-zA-Z]//g;p;}' myfile | wc -c
It's easy enough to just do numbers as well.
sed -n '1h;1!H;${;g;s/[^0-9]//g;p;}' myfile | wc -c
Or why not both.
sed -n '1h;1!H;${;g;s/[^0-9a-zA-Z]//g;p;}' myfile | wc -c
There are a number of ways to approach analyzing the line, word, and character frequency of a text file in bash. Using the POSIX character classes (e.g. [:upper:] and so on) in bash tests, you can drill down to the frequency of each character type in a text file. Below is a simple script that reads from stdin, prints the normal wc output as its first line, and then outputs the number of upper, lower, digit, punct and whitespace characters.
#!/bin/bash
declare -i lines=0
declare -i words=0
declare -i chars=0
declare -i upper=0
declare -i lower=0
declare -i digit=0
declare -i punct=0
oifs="$IFS"
# Read line with new IFS, preserve whitespace
while IFS=$'\n' read -r line; do
# parse line into words with original IFS
IFS=$oifs
set -- $line
IFS=$'\n'
# Add up lines, words, chars, upper, lower, digit
lines=$((lines + 1))
words=$((words + $#))
chars=$((chars + ${#line} + 1))
for ((i = 0; i < ${#line}; i++)); do
[[ ${line:$((i)):1} =~ [[:upper:]] ]] && ((upper++))
[[ ${line:$((i)):1} =~ [[:lower:]] ]] && ((lower++))
[[ ${line:$((i)):1} =~ [[:digit:]] ]] && ((digit++))
[[ ${line:$((i)):1} =~ [[:punct:]] ]] && ((punct++))
done
done
echo " $lines $words $chars $file"
echo " upper: $upper, lower: $lower, digit: $digit, punct: $punct, \
whitespace: $((chars-upper-lower-digit-punct))"
Test Input
$ cat dat/captnjackn.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
(along with 2357 other pirates)
Example Use/Output
$ bash wcount3.sh <dat/captnjackn.txt
5 21 108
upper: 12, lower: 68, digit: 4, punct: 3, whitespace: 21
You can customize the script to give you as little or as much detail as you like. Let me know if you have any questions.
You can use tr to preserve only alphanumeric characters by combining the -c (complement) and -d (delete) flags. From there on, it's just a question of some piping:
$ cat myfile.txt | tr -cd '[:alnum:]' | wc -c

Finding the longest word in a text file

I am trying to make a simple script to find the longest word and its length in a text file using bash. I know that with awk it is simple and straightforward, but I want to try this method. Let's say I have a=wmememememe; to find the length I can use echo ${#a}, and for the word itself I would echo ${a}. But I want to apply it to the loop below
for i in `cat so.txt`; do
where so.txt contains words. I hope it makes sense.
Bash one-liner:
sed 's/ /\n/g' YOUR_FILENAME | sort | uniq | awk '{print length, $0}' | sort -nr | head -n 1
read the file and split the words (via sed)
remove duplicates (via sort | uniq)
prefix each word with its length (via awk)
sort the list by the word length
print the single word with the greatest length.
Yes, this will be slower than some of the other solutions, but it also doesn't require remembering the semantics of bash for loops.
Normally, you'd want to use a while read loop instead of for i in $(cat), but since you want all the words to be split, in this case it would work out OK.
#!/bin/bash
longest=0
for word in $(<so.txt)
do
len=${#word}
if (( len > longest ))
then
longest=$len
longword=$word
fi
done
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"
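For reference, the while read form mentioned above might look like this (a sketch; read -ra splits each line into words much as the for loop does):
#!/bin/bash
longest=0
while read -ra words; do
    for word in "${words[@]}"; do
        if (( ${#word} > longest )); then
            longest=${#word}
            longword=$word
        fi
    done
done < so.txt
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"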
Another solution:
for item in $(cat "$infile"); do
length[${#item}]=$item # use word length as index
done
maxword=${length[@]: -1} # select last array element
printf "longest word '%s', length %d" ${maxword} ${#maxword}
longest=""
for word in $(cat so.txt); do
if [ ${#word} -gt ${#longest} ]; then
longest=$word
fi
done
echo $longest
awk script:
#!/usr/bin/awk -f
# Initialize two variables
BEGIN {
maxlength=0;
maxword=0
}
# Loop through each word on the line
{
for(i=1;i<=NF;i++)
# Assign the maxlength variable if length of word found is greater. Also, assign
# the word to maxword variable.
if (length($i)>maxlength)
{
maxlength=length($i);
maxword=$i;
}
}
# Print out the maxword and the maxlength
END {
print maxword,maxlength;
}
Textfile:
[jaypal:~/Temp] cat textfile
AWK utility is a data_extraction and reporting tool that uses a data-driven scripting language
consisting of a set of actions to be taken against textual data (either in files or data streams)
for the purpose of producing formatted reports.
The language used by awk extensively uses the string datatype,
associative arrays (that is, arrays indexed by key strings), and regular expressions.
Test:
[jaypal:~/Temp] ./script.awk textfile
data_extraction 15
Relatively speedy bash function using no external utils:
# Usage: longcount < textfile
longcount ()
{
declare -a c;
while read x; do
c[${#x}]="$x";
done;
echo ${#c[@]} "${c[${#c[@]}]}"
}
Example:
longcount < /usr/share/dict/words
Output:
23 electroencephalograph's
Modified POSIX shell version of jimis' xargs-based answer; still very slow, takes two or three minutes:
tr "'" '_' < /usr/share/dict/words |
xargs -P$(nproc) -n1 -i sh -c 'set -- {} ; echo ${#1} "$1"' |
sort -n | tail | tr '_' "'"
Note the leading and trailing tr bit to get around GNU xargs
difficulty with single quotes.
for i in $(cat so.txt); do echo ${#i}; done | paste - so.txt | sort -n | tail -1
Slow because of the gazillion of forks, but pure shell, does not require awk or special bash features:
$ cat /usr/share/dict/words | \
xargs -n1 -I '{}' -d '\n' sh -c 'echo `echo -n "{}" | wc -c` "{}"' | \
sort -n | tail
23 Pseudolamellibranchiata
23 pseudolamellibranchiate
23 scientificogeographical
23 thymolsulphonephthalein
23 transubstantiationalist
24 formaldehydesulphoxylate
24 pathologicopsychological
24 scientificophilosophical
24 tetraiodophenolphthalein
24 thyroparathyroidectomize
You can easily parallelize, e.g. to 4 CPUs by providing -P4 to xargs.
EDIT: modified to work with the single quotes that some dictionaries have. Now it requires GNU xargs because of -d argument.
EDIT2: for the fun of it, here is another version that handles all kinds of special characters, but requires the -0 option to xargs. I also added -P4 to compute on 4 cores:
cat /usr/share/dict/words | tr '\n' '\0' | \
xargs -0 -I {} -n1 -P4 sh -c 'echo ${#1} "$1"' wordcount {} | \
sort -n | tail

Convert binary data to hexadecimal in a shell script

I want to convert binary data to hexadecimal, just that, with no fancy formatting. hexdump seems too clever, and it "overformats" for me. I want to take x bytes from /dev/random and pass them on as hexadecimal.
Preferably I'd like to use only standard Linux tools, so that I don't need to install it on every machine (there are many).
Perhaps use xxd:
% xxd -l 16 -p /dev/random
193f6c54814f0576bc27d51ab39081dc
Watch out!
hexdump and xxd give the results in a different endianness!
$ echo -n $'\x12\x34' | xxd -p
1234
$ echo -n $'\x12\x34' | hexdump -e '"%x"'
3412
Simply explained. Big-endian vs. little-endian :D
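To make hexdump match xxd here, ask it for one byte at a time instead of a word at a time:
$ echo -n $'\x12\x34' | hexdump -ve '/1 "%02x"'
1234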
With od (GNU systems):
$ echo abc | od -A n -v -t x1 | tr -d ' \n'
6162630a
With hexdump (BSD systems):
$ echo abc | hexdump -ve '/1 "%02x"'
6162630a
From Hex dump, od and hexdump:
"Depending on your system type, either or both of these two utilities will be available--BSD systems deprecate od for hexdump, GNU systems the reverse."
Perhaps you could write your own small tool in C, and compile it on-the-fly:
#include <stdio.h>
#include <unistd.h>

int main(void) {
    unsigned char data[1024];
    ssize_t numread, i;
    while ((numread = read(0, data, 1024)) > 0) {
        for (i = 0; i < numread; i++) {
            printf("%02x ", data[i]);
        }
    }
    return 0;
}
And then feed it from the standard input:
cat /bin/ls | ./a.out
You can even embed this small C program in a shell script using the heredoc syntax.
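A minimal sketch of that heredoc approach, assuming a C compiler is available as cc on the target machines:
tmpdir=$(mktemp -d)
cat > "$tmpdir/tohex.c" <<'EOF'
#include <stdio.h>
#include <unistd.h>

int main(void) {
    unsigned char data[1024];
    ssize_t numread, i;
    while ((numread = read(0, data, sizeof data)) > 0)
        for (i = 0; i < numread; i++)
            printf("%02x ", data[i]);
    return 0;
}
EOF
cc -o "$tmpdir/tohex" "$tmpdir/tohex.c"
head -c 16 /dev/urandom | "$tmpdir/tohex"; echo
rm -r "$tmpdir"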
All the solutions seem to be hard to remember or too complex. I find using printf the shortest one:
$ printf '%x\n' 256
100
But as noted in the comments, this is not what the author wants, so to be fair, below is the full answer.
... to use the above to output an actual binary data stream:
printf '%x\n' $(cat /dev/urandom | head -c 5 | od -An -vtu1)
What it does:
printf '%x\n' ... - prints a sequence of integers as hex, i.e. printf '%x,' 1 2 3 will print 1,2,3,
$(...) - this is a way to get the output of some shell command and process it
cat /dev/urandom - it outputs random binary data
head -c 5 - limits the binary data to 5 bytes
od -An -vtu1 - octal dump command; here it converts the binary bytes to unsigned decimal values
As a testcase ('a' is 61 hex, 'p' is 70 hex, ...):
$ printf '%x\n' $(echo "apple" | head -c 5 | od -An -vtu1)
61
70
70
6c
65
Or to test individual binary bytes, on input let’s give 61 decimal ('=' char) to produce binary data ('\\x%x' format does it). The above command will correctly output 3d (decimal 61):
$ printf '%x\n' $(echo -ne "$(printf '\\x%x' 61)" | head -c 5 | od -An -vtu1)
3d
If you need a large stream (no newlines) you can use tr and xxd (part of Vim) for byte-by-byte conversion.
head -c1024 /dev/urandom | xxd -p | tr -d $'\n'
Or you can use hexdump (POSIX) for word-by-word conversion.
head -c1024 /dev/urandom | hexdump -e '"%x"'
Note that the difference is endianness.
dd + hexdump will also work:
dd bs=1 count=1 if=/dev/urandom 2>/dev/null | hexdump -e '"%x"'
Sometimes perl5 works better for portability if you target more than one platform. It comes with every Linux distribution and Unix OS. You can often find it in container images where other tools like xxd or hexdump are not available. Here's how to do the same thing in Perl:
$ head -c8 /dev/urandom | perl -0777 -ne 'print unpack "H*"'
5c9ed169dabf33ab
$ echo -n $'\x01\x23\xff' | perl -0777 -ne 'print unpack "H*"'
0123ff
$ echo abc | perl -0777 -ne 'print unpack "H*"'
6162630a
Note that this uses slurp mode (-0777), which causes Perl to read the entire input into memory; that may be suboptimal when the input is large.
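If that is a concern, one alternative (a sketch, not from the original answer) is to let Perl process the input line by line; the concatenated output is the same hex stream:
$ head -c8 /dev/urandom | perl -ne 'print unpack "H*", $_'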
These three commands will print the same (0102030405060708090a0b0c):
n=12
echo "$a" | xxd -l "$n" -p
echo "$a" | od -N "$n" -An -tx1 | tr -d " \n" ; echo
echo "$a" | hexdump -n "$n" -e '/1 "%02x"'; echo
Given that n=12 and $a is the byte values from 1 to 26:
a="$(printf '%b' "$(printf '\\0%o' {1..26})")"
That could be used to get $n random byte values in each program:
xxd -l "$n" -p /dev/urandom
od -vN "$n" -An -tx1 /dev/urandom | tr -d " \n" ; echo
hexdump -vn "$n" -e '/1 "%02x"' /dev/urandom ; echo
