Bash searching for words in file with same characters [duplicate]
Is it possible to write a bash script that can read each line of a file and generate the permutations (without repetition) of that line's characters? Using awk/perl is fine.
File
----
ab
abc
Output
------
ab
ba
abc
acb
bac
bca
cab
cba
I know I am a little late to the game, but why not brace expansion? Note that plain brace expansion gives the Cartesian product of the listed sets rather than permutations of one word's characters, but it is a handy way to generate candidate strings (see the sketch after the examples below for how to connect it to the question).
For example:
echo {a..z}{0..9}
Outputs:
a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 g0 g1 g2 g3 g4 g5 g6 g7 g8 g9 h0 h1 h2 h3 h4 h5 h6 h7 h8 h9 i0 i1 i2 i3 i4 i5 i6 i7 i8 i9 j0 j1 j2 j3 j4 j5 j6 j7 j8 j9 k0 k1 k2 k3 k4 k5 k6 k7 k8 k9 l0 l1 l2 l3 l4 l5 l6 l7 l8 l9 m0 m1 m2 m3 m4 m5 m6 m7 m8 m9 n0 n1 n2 n3 n4 n5 n6 n7 n8 n9 o0 o1 o2 o3 o4 o5 o6 o7 o8 o9 p0 p1 p2 p3 p4 p5 p6 p7 p8 p9 q0 q1 q2 q3 q4 q5 q6 q7 q8 q9 r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 s0 s1 s2 s3 s4 s5 s6 s7 s8 s9 t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 u0 u1 u2 u3 u4 u5 u6 u7 u8 u9 v0 v1 v2 v3 v4 v5 v6 v7 v8 v9 w0 w1 w2 w3 w4 w5 w6 w7 w8 w9 x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 y0 y1 y2 y3 y4 y5 y6 y7 y8 y9 z0 z1 z2 z3 z4 z5 z6 z7 z8 z9
Another useful example:
for X in {a..z}{a..z}{0..9}{0..9}{0..9}
do
    echo $X
done
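To get actual permutations of an input word out of brace expansion, here is a minimal sketch, assuming each word's characters are distinct (word and list are illustrative names): build a comma-separated brace list from the word, repeat it once per character, expand, and filter out strings that reuse a character. The cryptic one-liner further down does essentially this.

word=abc
# Comma-separate the characters: "abc" -> "a,b,c"
list=$(sed 's/./&,/g; s/,$//' <<< "$word")
# Repeat the brace group once per character ({a,b,c}{a,b,c}{a,b,c}),
# expand it to every string of that length, one per line, and drop any
# string in which some character appears twice.
eval "printf '%s\n' $(printf "{$list}%.0s" $(seq 1 ${#word}))" | grep -v '\(.\).*\1'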
Pure bash (using local; faster, but it can't beat the awk answer below or the Python versions below):
perm() {
    local items="$1"
    local out="$2"
    local i
    # When no characters are left, out holds a complete permutation.
    [[ "$items" == "" ]] && echo "$out" && return
    for (( i=0; i<${#items}; i++ )) ; do
        # Recurse with the i-th character moved from items to the end of out.
        perm "${items:0:i}${items:i+1}" "$out${items:i:1}"
    done
}
while read -r line ; do perm "$line" ; done < File
Pure bash (using a subshell; much slower, since every recursive call forks a new process, but items and out no longer need to be local):
perm() {
    items="$1"
    out="$2"
    [[ "$items" == "" ]] && echo "$out" && return
    for (( i=0; i<${#items}; i++ )) ; do
        # The subshell keeps items/out isolated without declaring them local.
        ( perm "${items:0:i}${items:i+1}" "$out${items:i:1}" )
    done
}
while read -r line ; do perm "$line" ; done < File
Since the asker mentioned Perl is fine, I think Python 2.6+/3.x is fine, too:
python -c "from itertools import permutations as p ; print('\n'.join([''.join(item) for line in open('File') for item in p(line[:-1])]))"
For Python 2.5+/3.X:
#!/usr/bin/python2.5
# http://stackoverflow.com/questions/104420/how-to-generate-all-permutations-of-a-list-in-python/104436#104436
def all_perms(str):
    if len(str) <= 1:
        yield str
    else:
        for perm in all_perms(str[1:]):
            for i in range(len(perm)+1):
                # nb str[0:1] works in both string and list contexts
                yield perm[:i] + str[0:1] + perm[i:]

print('\n'.join([''.join(item) for line in open('File') for item in all_perms(line[:-1])]))
On my computer using a bigger test file:
First Python code
Python 2.6: 0.038s
Python 3.1: 0.052s
Second Python code
Python 2.5/2.6: 0.055s
Python 3.1: 0.072s
awk: 0.332s
Bash (local): 2.058s
Bash (subshell): 22+s
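Timings like these can be reproduced with the shell's time keyword; a sketch for the bash versions (File stands in for the bigger test file, which is not shown):

time while read -r line ; do perm "$line" ; done < File > /dev/null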
Using the crunch utility and bash (crunch's -p flag permutes the given string; the required min/max length arguments are ignored with -p, and the stderr redirect hides crunch's banner output):
while read a; do crunch 0 0 -p "$a"; done 2> /dev/null < File
Output:
ab
ba
abc
acb
bac
bca
cab
cba
A tutorial is here: https://pentestlab.blog/2012/07/12/creating-wordlists-with-crunch/
A faster version using awk:
function permute(s, st, i, j, n, tmp) {
    n = split(s, item, //)          # empty regex splits into single characters (gawk)
    if (st > n) { print s; return }
    for (i = st; i <= n; i++) {
        if (i != st) {
            # Swap the characters at positions st and i, then rebuild the string.
            tmp = item[st]; item[st] = item[i]; item[i] = tmp
            nextstr = item[1]
            for (j = 2; j <= n; j++) nextstr = nextstr delim item[j]
        } else {
            nextstr = s
        }
        permute(nextstr, st+1)
        n = split(s, item, //)      # restore item[] from the original string
    }
}
{ permute($0, 1) }
usage:
$ awk -f permute.awk file
See the Perl Cookbook for permutation examples. They're word/number oriented, but a simple split()/join() on your example above will suffice: split // turns each line into a list of characters, and join '' turns each permuted list back into a word.
Bash word-list/dictionary/permutation generator:
The following Bash code generates every 3-character string over 0-9, a-z, A-Z. That gives you (10+26+26)^3 = 238,328 words in the output.
It's not very scalable: as you can see, you have to add another for loop for every extra character. It would be much faster to write such a thing in assembly or C, using recursion to handle arbitrary lengths. The Bash code is only for demonstration (a recursive bash sketch follows the sample output below).
P.S.
You can populate the $list variable with list=$(cat input.txt).
#!/bin/bash
list=`echo {0..9} {a..z} {A..Z}`

for c1 in $list
do
    for c2 in $list
    do
        for c3 in $list
        do
            echo $c1$c2$c3
        done
    done
done
SAMPLE OUTPUT:
000
001
002
003
004
005
...
...
...
ZZU
ZZV
ZZW
ZZX
ZZY
ZZZ
[babil#quad[13:27:37][~]> wc -l t.out
238328 t.out
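As mentioned above, the fixed nesting can be replaced with recursion even in bash. A minimal sketch (gen, prefix, and depth are illustrative names; a compiled language would still be far faster):

#!/bin/bash
list=$(echo {0..9} {a..z} {A..Z})

# Recursively extend prefix until depth characters have been chosen.
gen() {
    local prefix="$1" depth="$2" c
    if (( depth == 0 )); then
        echo "$prefix"
        return
    fi
    for c in $list; do      # unquoted on purpose: split $list into its items
        gen "$prefix$c" $(( depth - 1 ))
    done
}

gen "" 3    # same (10+26+26)^3 = 238,328 words as the three nested loops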
$ ruby -ne '$_.chomp.chars.to_a.permutation{|x| puts x.join}' file # ver 1.9.1
Because you can never have enough cryptic Bash one-liners:
while read s;do p="$(echo "$s"|sed -e 's/./&,/g' -e 's/,$//')";eval "printf "%s\\\\n" "$(eval 'echo "$(printf "{'"$p"'}%.0s" {0..'"$((${#s}-1))"'})"')"|grep '\(.\)\1*.*\1' -v";echo;done <f
It is essentially the brace-expansion trick from above: sed builds a comma-separated character list from each input line, printf repeats the brace group once per character, the evals expand it to every string of that length, and grep -v '\(.\)\1*.*\1' throws away strings that use any character twice.
It's pretty fast - at least on my machine here:
$ time while read s;do p="$(echo "$s"|sed -e 's/./&,/g' -e 's/,$//')";eval "printf "%s\\\\n" "$(eval 'echo "$(printf "{'"$p"'}%.0s" {0..'"$((${#s}-1))"'})"')"|grep '\(.\)\1*.*\1' -v";echo;done <f >/dev/null
real 0m0.021s
user 0m0.000s
sys 0m0.004s
But be aware that this one will eat a lot of memory when you go beyond 8 characters...
Given a file named input:
a
b
c
d
If you want the output:
a b
a c
a d
b b
b c
b d
c c
c d
d d
You can try the following bash script:
lines=$(wc -l input | awk '{print $1}')
for ((i=1 ; i<=$lines ; i++)); do
    # x is the i-th line; tmp holds lines i through the last line.
    x=$(sed -n ''$i' p' input)
    sed -n ''$i',$ p' input > tmp
    # Pair x with every element from line i onward.
    for j in $(cat tmp) ; do
        echo $x $j
    done
done
How about this one?
lines="a b c"
for i in $lines; do
    echo $i >tmp        # note: tmp is never read; this line is leftover
    for j in $lines ; do
        echo $i $j
    done
done
It will print:
a a
a b
a c
b a
b b
b c
c a
c b
c c
Just a 4-line bash joke - permutations of 4 letters/names:
while read line
do
    # Split the 4 letters onto separate lines; sort|uniq|wc -l counts the
    # distinct letters plus one trailing empty line, so 5 means all 4 differ.
    [ $(sed -r "s/(.)/\1\n/g" <<<$line | sort | uniq | wc -l) -eq 5 ] && echo $line
done <<<$(echo -e {A..D}{A..D}{A..D}{A..D}"\n") | sed -r "s/A/Adams /;s/B/Barth /; s/C/Cecil /; s/D/Devon /;"
Adams Barth Cecil Devon
Adams Barth Devon Cecil
...
I like Bash! :-)
Related
Convert excel formula to linux bash script
I'm trying to convert the following Excel formula into a simple bash script:

=IF(B3*10>B5,1,IF(TRUNC(B5/B3/10)>3,3,TRUNC(B5/B3/10)))

Examples:

If B3="2" and B5="39"; the output ("VAL") should be 1
If B3="2" and B5="40"; the output ("VAL") should be 2
If B3="2" and B5="60"; the output ("VAL") should be 3

This is what I've tried, but the output is not correct:

if (( $B3 \* 10 > $B5 )); then
    if (( $B5 \/ $B3 \/ 10 > 3 )); then
        VAL="3"
    else
        VAL=`expr $B5 \/ $B3 \/ 10`
    fi
fi

Where is the error? :)
The main problem is that you're not setting VAL to 1 in the first "true" branch.

if (( B3 * 10 > B5 )); then
    VAL=1
elif (( B5 / B3 / 10 > 3 )); then
    VAL=3
else
    VAL=$((B5 / B3 / 10))
fi

bash lets you refer to variables without the $ inside an arithmetic expression.

Testing:

B3=2
for B5 in 10 39 40 60 80; do
    if (( B3 * 10 > B5 )); then
        branch="first"
        VAL=1
    elif (( B5 / B3 / 10 > 3 )); then
        branch="second"
        VAL=3
    else
        branch="third"
        VAL=$((B5 / B3 / 10))
    fi
    echo "$B3 $B5 $VAL $branch"
done

2 10 1 first
2 39 1 third
2 40 2 third
2 60 3 third
2 80 3 second
Combine all the columns of two files using bash
I have two files

A B C D E F
B D F A C E
D E F A B C

and

1 2 3 4 5 6
2 4 6 1 3 5
4 5 6 1 2 3

I want to have something like this:

A1 B2 C3 D4 E5 F6
B2 D4 F6 A1 C3 E5
D4 E5 F6 A1 B2 C3

I mean, combine both files, pasting the content of all columns. Thank you very much!
Here's a bash solution:

paste -d' ' file1 file2 \
| while read -a fields ; do
    (( width=${#fields[@]}/2 ))
    for ((i=0; i<width; ++i)) ; do
        printf '%s%s ' "${fields[i]}" "${fields[ i + width ]}"
    done
    printf '\n'
done

paste outputs the files side by side. read -a reads the columns into an array. In the for loop, we iterate over the array and print the corresponding values.
Could you please try the following; having some fun with a combination of xargs + paste here: xargs -n1 flattens each file to one item per line, paste -d'\0' glues corresponding items together, and xargs -n6 re-wraps the result six fields per line.

xargs -n6 < <(paste -d'\0' <(xargs -n1 < Input_file1) <(xargs -n1 < Input_file2))
BashScript: Read a file and process it
I have a file with this structure:

Text...
A B C
A1 57,624,609,830 20.99
A2 49,837,119,260 20.90
A3 839,812,303 20.88
A4 843,568,192 20.87
... 1,016,104,564 20.82
A29 1,364,178,406 16.62
A line of text
Blank
Text
Text
A B C
A1 57,624,609,830 20.99
A2 49,837,119,260 20.90
A3 839,812,303 20.88
A4 843,568,192 20.87
... 1,016,104,564 20.82
A29 1,364,178,406 16.62

and I want to get all the A1s with their values, then all the A2s with their values, and so on. What I'm doing so far is

cat myFile.csv | awk '{if (NR > 5 && NR <= 29) printf $1"\t"}' > tmp1.csv

to get the A1 A2 A3 ... in different cells in a new file, tmp1.csv, and then

cat myFile.csv | grep A1 | awk '{print $2}'

to get the values of A1 and copy-paste them into the A1 column of the tmp1 file. I tried

#!/bin/bash
input="myFile.csv"
while IFS= read -r line
do
    awk '{if (NR > 4 && NR <= 28) | grep A1 | awk print $2 }'
done < "$input"

but cannot make it produce the same result as

A1 A2 A3 A4 ...
57,624,609,830 49,837,119,260 839,812,303 839,812,303 ...
57,624,609,830 49,837,119,260 839,812,303 839,812,303 ...
...

in a file. In other words, it would be ideal for me to get from the 5th to the 28th line the $1 in different cells and their $2 in each column accordingly.

UPDATE

cat myFile.csv | awk '{if (NR > 5 && NR <= 29) printf $1"\t"}'

gives me the content of the lines I care about. How can I loop over the entire file, over all lines, to get all the contents? For instance, instead of NR>5 && NR<=29, have

x=1
NR>x+4 && NR<=x+28

and eventually get the content.
awk to the rescue!

$ awk '/A[0-9]+/' file | sed -r 's/^ +//g' | sort -k1.1,1.1 -k1.2n
A1 57,624,609,830 20.99
A1 57,624,609,830 20.99
A2 49,837,119,260 20.90
A2 49,837,119,260 20.90
A3 839,812,303 20.88
A3 839,812,303 20.88
A4 843,568,192 20.87
A4 843,568,192 20.87
A29 1,364,178,406 16.62
A29 1,364,178,406 16.62

or, if your sort supports version sort, that will work too. You can restrict the pattern match, perhaps by adding && NF==3.

If you need to transpose the layout, you can pipe the output of the first script to

$ ... | awk 'NR%2{h=h FS $1; r1=r1 FS $2} !(NR%2){r2=r2 FS $2} END{print h; print r1; print r2}' | column -t
A1              A2              A3           A4           A29
57,624,609,830  49,837,119,260  839,812,303  843,568,192  1,364,178,406
57,624,609,830  49,837,119,260  839,812,303  843,568,192  1,364,178,406

or combine both into a single script, especially if your records are already sorted.

UPDATE

Combined script, starting from the original input file:

$ awk '/A[0-9]+/ && NF==3{if (!a[$1]++) {h=h FS $1; r1=r1 FS $2} else {r2=r2 FS $2}} END{print h; print r1; print r2}' file | column -t
A1              A2              A3           A4           A29
57,624,609,830  49,837,119,260  839,812,303  843,568,192  1,364,178,406
57,624,609,830  49,837,119,260  839,812,303  843,568,192  1,364,178,406
In linux bash reverse file lines order but for blocks each 3 lines
I would like to reverse a file, but in this file I have records of 3 lines each:

a1
a2
a3
...
x1
x2
x3

and I would like to get this file:

x1
x2
x3
...
a1
a2
a3

I use Linux, so tail -r doesn't work for me.
You can do this all in awk, using an associative array:

BEGIN { j=1 }
++i>3 { i=1; ++j }
{ a[j,i]=$0 }
END {
    for(m=j;m>0;--m)
        for(n=1;n<=3;++n)
            print a[m,n]
}

Run it like this:

awk -f script.awk file.txt

or of course, if you prefer a one-liner, you can use this:

awk 'BEGIN{j=1}++i>3{i=1;++j}{a[j,i]=$0}END{for(m=j;m>0;--m)for(n=1;n<=3;++n)print a[m,n]}' file.txt

Explanation

This uses two counters: i, which runs from 1 to 3, and j, which counts the number of groups of 3 lines. All lines are stored in the associative array a and printed in reverse in the END block.

Testing it out:

$ cat file
a1
a2
a3
b1
b2
b3
x1
x2
x3
$ awk 'BEGIN{j=1}++i>3{i=1;++j}{a[j,i]=$0}END{for(m=j;m>0;--m)for(n=1;n<=3;++n)print a[m,n]}' file
x1
x2
x3
b1
b2
b3
a1
a2
a3
This is so ugly that I'm kinda ashamed to even post it... so I guess I'll delete it as soon as a more decent answer pops up. tac reverses all lines; awk then buffers every group of 3 reversed lines and prints it back in its original internal order:

tac /path/to/file | awk '{ a[(NR-1)%3]=$0; if (NR%3==0) { print a[2] "\n" a[1] "\n" a[0] }}'
With the file:

~$ cat f
1
2
3
4
5
6
7
8
9

with awk: store the first line in a, then prepend each subsequent line to a, and on every third line print a:

~$ awk '{a=$0"\n"a}NR%3==0{print a}NR%3==1{a=$0}' f
3
2
1
6
5
4
9
8
7

then use tac to reverse the groups again:

~$ awk '{a=$0"\n"a}NR%3==0{print a}NR%3==1{a=$0}' f | tac
7
8
9
4
5
6
1
2
3
Another way in awk:

awk '{a[i]=a[i+=(NR%3==1)]?a[i]"\n"$0:$0}END{for(i=NR/3;i>0;i--)print a[i]}' file

Input

a1
a2
a3
x1
x2
x3
b1
b2
b3

Output

b1
b2
b3
x1
x2
x3
a1
a2
a3
Here's a pure Bash (Bash≥4) possibility that should be okay for files that are not too large. We also assume that the number of lines in your file is a multiple of 3.

mapfile -t ary < /path/to/file
for ((i=3*(${#ary[@]}/3-1); i>=0; i-=3)); do
    printf '%s\n' "${ary[@]:i:3}"
done
Count unique elements in a file per line
Let's say I have a file with 5 elements on each line.

$ cat myfile.txt
e1 e2 e3 e4 e5
e1 e1 e2 e2 e1
e1 e1 e4 e4 e4

For each line I want to run the following command to count the unique elements on that line:

tr \\t \\n | sort -u | wc

I can't figure out the first part of the command - can somebody help me?

Disclaimer: The file really looks like shown below - but I do xargs -L 5 to get the output as shown in the first part.

e1
e2
e3
e4
e5
Given your input file:

$ cat file
e1 e2 e3 e4 e5
e1 e1 e2 e2 e1
e1 e1 e4 e4 e4

Unique elements in the file using awk:

$ awk '{for(i=1;i<=NF;i++) a[$i]} END{for (keys in a) print keys}' file
e1
e2
e3
e4
e5

Unique elements in the file using grep instead of tr:

$ grep -Eo '\w+' file | sort -u
e1
e2
e3
e4
e5

Unique elements per line in the file, using awk:

$ awk '{for(i=1;i<=NF;i++) a[$i]; print length(a); delete a}' file
5
2
2

awk solutions really are the way to go here, but using bash since you tagged it:

#!/bin/bash
while read line; do
    echo $line | grep -Eo '\w+' | sort -u | wc -l
done < file

Output:

5
2
2
You can use this:

perl -F -lane '$count{$_}++ for (@F);print scalar values %count;undef %count' your_file

Tested below:

> cat temp
e1 e2 e3 e4 e5
e1 e1 e2 e2 e1
e1 e1 e4 e4 e4
> perl -F -lane '$count{$_}++ for (@F);print scalar values %count;undef %count' temp
5
2
2
>
Here's a perl version if you fancy one:

perl -F'\s' -pane '%H=map{$_=>1}@F; $_=keys(%H)."\n"' myfile.txt