how to pick random lines from a txt file with bash - linux

I have a txt file with some lines such as:
a
b
c
f
e
f
1
2
3
4
5
6
Now I want to pick lines at random and print them to another txt file, for example:
f
6
e
1
and so on...
Could anybody help me?
I am new to bash scripting.

You could use shuf (a part of GNU coreutils).
shuf inputfile > outfile
For example:
$ seq 10 | shuf
7
5
8
3
9
4
10
1
6
2
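If you only need a few random lines rather than a full shuffle, shuf can also do the sampling itself; a small sketch, assuming GNU coreutils:
# print 4 lines drawn at random from inputfile (no repeats)
shuf -n 4 inputfile > outfile
There is also -o FILE to write the output file directly, and -r if you want sampling with replacement.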

sort has an option for that:
sort -R /your/file.txt
Explanation:
-R, --random-sort
sort by random hash of keys
Note that because -R sorts by a hash of the keys, identical lines end up next to each other, so files with duplicate lines are not uniformly shuffled.
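A quick run (the order is random, so yours will differ):
$ seq 5 | sort -R
4
1
3
5
2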

Iterate over the file, outputting each line with a certain probability (in this example, with roughly a 10% chance for each line):
while IFS= read -r line; do
if (( RANDOM % 10 == 0 )); then
echo "$line"
fi
done < file.txt
(I say "roughly", because the value of RANDOM ranges between 0 and 32767. As such, there are slightly more values that will produce a remainder of 0-7 than there are that will produce a remainder of 8 or 9 when divided by 10. Other probabilities are have similar problems; you can fine-tune the expression to be more precise, but I leave that as an exercise to the reader.)

For less fortunate systems without GNU utils, like BSD/OSX, you can use this code, which picks 10 lines at random (with replacement) from a 10-line file:
for ((i=0; i<10; i++)); do
n=$((RANDOM % 10 + 1))   # sed addresses are 1-based
sed "${n}q;d" file
done
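Another portable route (plain awk plus sort, no GNU tools assumed) is the decorate-sort-undecorate trick: prefix every line with a random number, sort on it, then strip it off. Unlike the loop above, it prints each line exactly once, in random order:
awk 'BEGIN{srand()} {printf "%f\t%s\n", rand(), $0}' file | sort -n | cut -f2-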

Related

How can I randomly partition a text file, by rows, into two files in Bash? For example, 70% and 30%

input:
5 a
5 b
5 c
4 d
6 t
1 f
7 h
5 i
6 j
5 k
output 1:
5 b
6 t
5 k
Output 2 contains the remaining values
You can use shuf to generate random permutations of the file and then use split to generate the two files:
shuf input | split -l $(( $(wc -l <input) * 70 / 100 ))
The default prefix for split is x, so after running the command you should have two files: xaa (70%), and xab (remaining 30%).
You can control the output files for the split command:
-a, --suffix-length=N generate suffixes of length N (default 2)
--additional-suffix=SUFFIX append an additional SUFFIX to file names.
-d, --numeric-suffixes[=FROM] use numeric suffixes instead of alphabetic.
FROM changes the start value (default 0).
So you can use this:
shuf input | split -a1 -d -l $(( $(wc -l <input) * 70 / 100 )) - output
Which will generate output0 (70%), and output1 (remaining 30%).
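If your split lacks those options, a hedged awk alternative that writes the same output0 and output1 files would be:
shuf input | awk -v n="$(wc -l <input)" 'NR <= n*70/100 {print > "output0"; next} {print > "output1"}'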

using -sort in linux

I want to sort the user's input with sort, inside a case statement (and a function), but I have never used this before. Do I have to use an array or something?
For example the user does:
bash test.sh 50 20 35 50
Normally in my script this would happen:
ping -c 1 "192.168.0.$i"
That results in
192.168.0.50
192.168.0.20
192.168.0.35
192.168.0.50
Now I want the last numbers sorted, and pinged from smallest to biggest, like this: 20 35 50. Also, if the same number appears twice, the script should only ping it once.
SortNumbers(){
}
...
case
-sort ) SortNumbers;;
esac
You can use this:
#!/bin/bash
array=($(printf '%s\n' "$@" | sort -nu))
echo "${array[@]}"
If you run test.sh 34 1 45 1 5 6 6 6, it will give output:
1 5 6 34 45
Now you can use the variable $array with a for loop like:
for i in "${array[@]}"; do
#do something with $i
done
Explanation:
The arguments of the script are piped to the command sort and the output is assigned to an array named array. The option -n is for numeric sort and -u for unique.
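You can see what the pipeline inside the array assignment does on its own:
$ printf '%s\n' 50 20 35 50 | sort -nu
20
35
50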
Assumed complete code for you (for clarification):
#!/bin/bash
array=($(printf '%s\n' "$@" | sort -nu))
for i in "${array[@]}"; do
ping -c 1 "192.168.0.$i"
done
Using a function:
sortNumbers(){
array=($(printf '%s\n' "$@" | sort -nu))
}
sortNumbers 43 1 2 8 2 4 98 45
echo "${array[@]}" ##this is just a sample use; you can put a for loop here
So you can declare an array array=("$@") at the beginning of your script, then call the sortNumbers function with the arguments (remember to exclude -sort from them) whenever you need them sorted (it will replace $array with sorted content). Put the for loop outside the function so it works on whatever is in $array, sorted or unsorted; that way you keep the choice of whether to sort.
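Putting this together with the -sort option from your case statement, a minimal sketch (the option handling is simplified; adapt it to your real parsing):
#!/bin/bash
sortNumbers(){
array=($(printf '%s\n' "$@" | sort -nu))
}
case $1 in
-sort ) shift; sortNumbers "$@";;  # drop -sort, sort the rest
* ) array=("$@");;                 # no option: keep the numbers as given
esac
for i in "${array[@]}"; do
ping -c 1 "192.168.0.$i"
done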
Try this:
#!/bin/bash
# 1. copy the scripts arguments into an array
array=("$@")
# 2. Set internal field separator to newline
IFS=$'\n'
# 3. pass the array contents to sort's stdin using here-string
sorted=($(sort <<<"${array[*]}"))
# 4. pass the output of sort to uniq utility using the same technique
uniq=($(uniq <<<"${sorted[*]}"))
# 5. print the final array
printf "%s\n" "${uniq[#]}"
lcd047's shorter version:
IFS=$'\n' sorted=($(sort -nu <<<"$*"))
set -- "${sorted[@]}"
printf "%s\n" "$@"
Run result:
$> bash test.sh 3 2 1 45 45 3 4 4 4 1 1 1 1
1
2
3
4
45

AWK--Comparing the value of two variables in two different files

I have two text files, A.txt and B.txt. Each line of A.txt contains a single number:
A.txt
100
222
398
B.txt
1 2 103 2
4 5 1026 74
7 8 209 55
10 11 122 78
What I am looking for is something like this:
for each line of A
search B;
if (the value of third column in a line of B - the value of the variable in A > 10)
print that line of B;
Any awk for doing that?
How about something like this?
I had some trouble understanding your question, but maybe this will give you some pointers.
#!/bin/bash
# Read interesting values from file2 (2.txt) into an array,
for line in $(awk '{print $3}' 2.txt)
do
arr+=($line)
done
# Linecounter,
linenr=0
# Loop through every line in file 1,
for val in $(cat 1.txt)
do
# Increment linecounter,
((linenr++))
# Loop through every element in the array (containing values from column 3 of file2)
for el in "${!arr[@]}";
do
# If that value - the value from file 1 is bigger than 10, print values
if [[ $((${arr[$el]} - $val )) -gt 10 ]]
then
sed -n "$(($el+1))p" 2.txt
# echo "Value ${arr[$el]} (on line $(($el+1)) from 2.txt) - $val (on line $linenr from 1.txt) equals $((${arr[$el]} - $val )) and is hence bigger than 10"
fi
done
done
Note:
This is a quick and dirty thing; there is room for improvement, but I think it'll do the job.
Use awk like this:
cat f1
1
4
9
16
cat f2
2 4 10 8
3 9 20 8
5 1 15 8
7 0 30 8
awk 'FNR==NR{a[NR]=$1;next} $3-a[FNR] < 10' f1 f2
2 4 10 8
5 1 15 8
UPDATE: Based on OP's edited question:
awk 'FNR==NR{a[NR]=$1;next} {for (i in a) if ($3-a[i] > 10) {print; next}}' A.txt B.txt
and see how simple the awk-based solution is compared to nested for loops. (The next ensures each line of B.txt is printed at most once, even when several values from A.txt match it.)
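With the A.txt and B.txt from the question, that gives:
$ awk 'FNR==NR{a[NR]=$1;next} {for (i in a) if ($3-a[i] > 10) {print; next}}' A.txt B.txt
4 5 1026 74
7 8 209 55
10 11 122 78
(103 is at most 3 above any value in A.txt, so the first line of B.txt is not printed.)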

How to extract every N columns and write into new files?

I've been struggling to write code for extracting every N columns from an input file and writing them into output files according to their extraction order.
(My real world case is to extract every 800 columns from a total 24005 columns file starting at column 6, so I need a loop)
In the simpler case below, I extract every 3 columns (fields) from an input file, starting at the 2nd column.
for example, if the input file looks like:
aa 1 2 3 4 5 6 7 8 9
bb 1 2 3 4 5 6 7 8 9
cc 1 2 3 4 5 6 7 8 9
dd 1 2 3 4 5 6 7 8 9
and I want the output to look like this:
output_file_1:
1 2 3
1 2 3
1 2 3
1 2 3
output_file_2:
4 5 6
4 5 6
4 5 6
4 5 6
output_file_3:
7 8 9
7 8 9
7 8 9
7 8 9
I tried this, but it doesn't work:
awk 'for(i=2;i<=10;i+a) {{printf "%s ",$i};a=3}' <inputfile>
It gave me a syntax error, and the more I fixed it, the more problems came out.
I also tried the Linux command cut, but with large files that approach seems unwieldy. And I wonder if cut could do a loop cut of every 3 fields just like the awk.
Can someone please help me with this and give a quick explanation? Thanks in advance.
Actions to be performed by awk on the input data must be enclosed in curly braces, so the reason the awk one-liner you tried results in a syntax error is that the for loop does not respect this rule. A syntactically correct version would be:
awk '{for(i=2;i<=10;i+a) {printf "%s ",$i};a=3}' <inputfile>
This is syntactically correct (almost; see the end of this post), but does not do what you think.
To separate the output by columns on different files, the best thing is to use awk redirection operator >. This will give you the desired output, given that your input files always has 10 columns:
awk '{ print $2,$3,$4 > "file_1"; print $5,$6,$7 > "file_2"; print $8,$9,$10 > "file_3"}' <inputfile>
mind the " " to specify the filenames.
EDITED: REAL WORLD CASE
If you have to loop along the columns because you have too many of them, you can still use awk (gawk), with two loops: one on the output files and one on the columns per file. This is a possible way:
#!/usr/bin/gawk -f
BEGIN{
CTOT = 24005 # total number of columns, you can use NF as well
DELTA = 800 # columns per file
START = 6 # first useful column
d = CTOT/DELTA # number of output files.
}
{
for ( i = 0 ; i < d ; i++)
{
for ( j = 0 ; j < DELTA ; j++)
{
printf("%f\t",$(START+j+i*DELTA)) > "file_out_"i
}
printf("\n") > "file_out_"i
}
}
I have tried this on the simple input file in your example. It works if CTOT can be divided evenly by DELTA. I assumed you have floats (%f); just change that to whatever you need.
Let me know.
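One hedged refinement for the question's actual numbers: 24005 is not a multiple of 800, but the usable columns (24005 - 6 + 1 = 24000) are, so you can compute d from the usable range and the last iteration will never read past NF:
BEGIN{
CTOT = 24005 # total number of columns
DELTA = 800 # columns per file
START = 6 # first useful column
d = int((CTOT - START + 1) / DELTA) # 24000 / 800 = 30 output files
}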
P.s. going back to your original one-liner, note that the loop is an infinite one, as i is never incremented: i+a must be replaced by i+=a, and a=3 must go inside the inner braces:
awk '{for(i=2;i<=10;i+=a) {printf "%s ",$i;a=3}}' <inputfile>
this evaluates a=3 on every iteration, which is a bit pointless. A better version would thus be:
awk '{for(i=2;i<=10;i+=3) {printf "%s ",$i}}' <inputfile>
Still, this will just print the 2nd, 5th and 8th column of your file, which is not what you wanted.
awk '{ print $2, $3, $4 >"output_file_1";
print $5, $6, $7 >"output_file_2";
print $8, $9, $10 >"output_file_3";
}' input_file
This makes one pass through the input file, which is preferable to multiple passes. Clearly, the code shown only deals with the fixed number of columns (and therefore a fixed number of output files). It can be modified, if necessary, to deal with variable numbers of columns and generating variable file names, etc.
(My real world case is to extract every 800 columns from a total 24005 columns file starting at column 6, so I need a loop)
In that case, you're correct; you need a loop. In fact, you need two loops:
awk 'BEGIN { gap = 800; start = 6; filebase = "output_file_"; }
{
for (i = start; i < start + gap; i++)
{
file = sprintf("%s%d", filebase, i);
for (j = i; j <= NF; j += gap)
printf("%s ", $j) > file;
printf "\n" > file;
}
}' input_file
I demonstrated this to my satisfaction with an input file with 25 columns (numbers 1-25 in the corresponding columns) and gap set to 8 and start set to 2. The output below is the resulting 8 files pasted horizontally.
2 10 18 3 11 19 4 12 20 5 13 21 6 14 22 7 15 23 8 16 24 9 17 25
2 10 18 3 11 19 4 12 20 5 13 21 6 14 22 7 15 23 8 16 24 9 17 25
2 10 18 3 11 19 4 12 20 5 13 21 6 14 22 7 15 23 8 16 24 9 17 25
2 10 18 3 11 19 4 12 20 5 13 21 6 14 22 7 15 23 8 16 24 9 17 25
With GNU awk:
$ awk -v d=3 '{for(i=2;i<NF;i+=d) print gensub("(([^ ]+ +){" i-1 "})(([^ ]+( +|$)){" d "}).*","\\3",""); print "----"}' file
1 2 3
4 5 6
7 8 9
----
1 2 3
4 5 6
7 8 9
----
1 2 3
4 5 6
7 8 9
----
1 2 3
4 5 6
7 8 9
----
Just redirect the output to files if desired:
$ awk -v d=3 '{sfx=0; for(i=2;i<NF;i+=d) print gensub("(([^ ]+ +){" i-1 "})(([^ ]+( +|$)){" d "}).*","\\3","") > ("output_file_" ++sfx)}' file
The idea is just to tell gensub() to skip the first few (i-1) fields then print the number of fields you want (d = 3) and ignore the rest (.*). If you're not printing exact multiples of the number of fields you'll need to massage how many fields get printed on the last loop iteration. Do the math...
Here's a version that'd work in any awk. It requires 2 loops and modifies the spaces between fields but it's probably easier to understand:
$ awk -v d=3 '{sfx=0; for(i=2;i<=NF;i+=d) {str=fs=""; for(j=i;j<i+d;j++) {str = str fs $j; fs=" "}; print str > ("output_file_" ++sfx)} }' file
I was successful using the following command line. :) It uses a for loop and pipes the awk program into awk's stdin using -f -. The awk program itself is created using bash variable math.
for i in 0 1 2; do
echo "{print \$$((i*3+2)) \" \" \$$((i*3+3)) \" \" \$$((i*3+4))}" \
| awk -f - t.file > "file$((i+1))"
done
Update: After the question was updated I tried to hack a script that creates the requested 800-column awk script dynamically (a version along the lines of Jonathan Leffler's answer) and pipe that to awk. Although the script looks good (to me), it produces an awk syntax error. The question is: is this too much for awk, or am I missing something? Would really appreciate feedback!
Update: Investigated this and found documentation that says awk has a lot of restrictions, and suggests using gawk (GNU's awk implementation) in these situations. I've done that, but I still get a syntax error. Feedback still appreciated!
#!/bin/bash
# Note! Although the script's output looks ok (for me)
# it produces an awk syntax error. is this just too much for awk?
# open pipe to stdin of awk
exec 3> >(gawk -f - test.file)
# verify output using cat
#exec 3> >(cat)
echo '{' >&3
# write dynamic script to awk
for i in {0..24005..800} ; do
echo -n " print " >&3
for (( j=$i; j <= $((i+800)); j++ )) ; do
echo -n "\$$j " >&3
if [ $j = 24005 ] ; then
break
fi
done
echo "> \"file$((i/800+1))\";" >&3
done
echo "}"

Generate 12 Digit HEX number in KSH

I need to generate 12 digit Hex numbers in KSH on Solaris
Thanks
#!/bin/ksh
# Solaris /bin/ksh (ksh88) has no {1..12} brace expansion, so use a counter
set -A hex 0 1 2 3 4 5 6 7 8 9 A B C D E F
i=0
while (( i < 12 ))
do
printf '%s' "${hex[$((RANDOM%16))]}"
(( i += 1 ))
done
echo
Start with this Python program, hex12.py.
hex12.py
#!/usr/bin/env python
# Python 2; under Python 3 you would need to encode the string before hashing.
import random
import hashlib
h = hashlib.sha1(str(random.random())).hexdigest()
print h[:12]
In your shell you can now use hex12.py to create 12 hex digits on standard out.
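For example, one run might print (the digits are random, so your output will differ):
$ python hex12.py
1b9e4c72a3f0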
Try this one:
DIGITS=`head -c 6 /dev/urandom | od -x | head -n 1 | sed -e 's/^0* //' -e 's/ //g'`
As RANDOM variable generates a 15 bit number (from 0 to 32767) you can concatenate several RANDOM values.
You will need a 48 bit number as 12 hex digits are 12 * 4 = 48 bits.
Either:
$ printf '%012x\n' $(( ((RANDOM<<15|RANDOM)<<15|RANDOM)<<3|RANDOM%8 ))
9142467b46d3
Or:
$ printf '%03x' $((RANDOM%4096)) $((RANDOM%4096)) $((RANDOM%4096)) $((RANDOM%4096)); echo
808878c21e19
