Count unique elements in a file per line - linux

Let's say I have a file with 5 elements on each line.
$ cat myfile.txt
e1 e2 e3 e4 e5
e1 e1 e2 e2 e1
e1 e1 e4 e4 e4
For each line I want to run the following command to count the unique elements on that line:
tr \\t \\n | sort -u | wc
I can't figure out the first part of the command - can somebody help me?
Disclaimer: The file actually looks like the listing below - I run xargs -L 5 on it to get the layout shown in the first part.
e1
e2
e3
e4
e5
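A minimal sketch of the full per-line pipeline (assuming whitespace-separated fields; the answers below give better alternatives):
while read -r line; do
    tr -s ' \t' '\n' <<< "$line" | sort -u | wc -l
done < myfile.txt
For the sample file above this prints 5, 2 and 2.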

Given your input file:
$ cat file
e1 e2 e3 e4 e5
e1 e1 e2 e2 e1
e1 e1 e4 e4 e4
Unique elements in the file using awk:
awk '{for(i=1;i<=NF;i++) a[$i]} END{for (key in a) print key}' file
e1
e2
e3
e4
e5
Unique elements in the file using grep instead of tr:
$ grep -Eo '\w+' file | sort -u
e1
e2
e3
e4
e5
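Since the original question asks for a count, append wc -l to get it:
$ grep -Eo '\w+' file | sort -u | wc -l
5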
Unique elements per line in the file:
Using awk:
$ awk '{for(i=1;i<=NF;i++) a[$i]; print length(a); delete a}' file
5
2
2
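Note that length(a) on an array is a GNU awk extension. If your awk lacks it, a portable sketch (my addition, not part of the original answer) counts while inserting and clears the array with split():
$ awk '{split("",a); c=0; for (i=1; i<=NF; i++) if (!($i in a)) {a[$i]; c++}; print c}' file
5
2
2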
awk solutions really are the way to go here, but here is a bash version since you tagged it:
#!/bin/bash
while read -r line; do
    echo "$line" | grep -Eo '\w+' | sort -u | wc -l
done < file
Output:
5
2
2

You can use this:
perl -F -lane '$count{$_}++ for (@F); print scalar values %count; undef %count' your_file
Tested below:
> cat temp
e1 e2 e3 e4 e5
e1 e1 e2 e2 e1
e1 e1 e4 e4 e4
> perl -F -lane '$count{$_}++ for (@F); print scalar values %count; undef %count' temp
5
2
2
>

Here's a perl version if you fancy one:
perl -F'\s' -pane '%H=map{$_=>1}@F; $_=keys(%H)."\n"' myfile.txt
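Run against the sample file, it prints the per-line counts:
$ perl -F'\s' -pane '%H=map{$_=>1}@F; $_=keys(%H)."\n"' myfile.txt
5
2
2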

Related

Combine all the columns of two files using bash

I have two files
A B C D E F
B D F A C E
D E F A B C
and
1 2 3 4 5 6
2 4 6 1 3 5
4 5 6 1 2 3
I want to have something like this:
A1 B2 C3 D4 E5 F6
B2 D4 F6 A1 C3 E5
D4 E5 F6 A1 B2 C3
I mean, combine both files pasting the content of all columns.
Thank you very much!
Here's a bash solution:
paste -d' ' file1 file2 \
| while read -a fields ; do
    (( width=${#fields[@]}/2 ))
    for ((i=0; i<width; ++i)) ; do
        printf '%s%s ' "${fields[i]}" "${fields[i+width]}"
    done
    printf '\n'
done
paste outputs the files side by side.
read -a reads the columns into an array.
In the for loop, we iterate over the first half of the array and print each element paired with its counterpart from the second half.
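An equivalent awk sketch (my addition, assuming both files have the same number of columns):
paste file1 file2 | awk '{n=NF/2; for (i=1; i<=n; i++) printf "%s%s%s", $i, $(i+n), (i<n ? OFS : ORS)}'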
Could you please try the following? It has some fun with a combination of xargs + paste: xargs -n1 flattens each file to one element per line, paste -d'\0' glues the corresponding elements together, and xargs -n6 re-wraps the result six elements per line.
xargs -n6 < <(paste -d'\0' <(xargs -n1 < Input_file1) <(xargs -n1 < Input_file2))

Bash searching for words in file with same characters [duplicate]

Is it possible to write a bash script that can read in each line from a file and generate permutations (without repetition) for each? Using awk / perl is fine.
File
----
ab
abc
Output
------
ab
ba
abc
acb
bac
bca
cab
cba
I know I am a little late to the game but why not brace expansion?
For example:
echo {a..z}{0..9}
Outputs:
a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 g0 g1 g2 g3 g4 g5 g6 g7 g8 g9 h0 h1 h2 h3 h4 h5 h6 h7 h8 h9 i0 i1 i2 i3 i4 i5 i6 i7 i8 i9 j0 j1 j2 j3 j4 j5 j6 j7 j8 j9 k0 k1 k2 k3 k4 k5 k6 k7 k8 k9 l0 l1 l2 l3 l4 l5 l6 l7 l8 l9 m0 m1 m2 m3 m4 m5 m6 m7 m8 m9 n0 n1 n2 n3 n4 n5 n6 n7 n8 n9 o0 o1 o2 o3 o4 o5 o6 o7 o8 o9 p0 p1 p2 p3 p4 p5 p6 p7 p8 p9 q0 q1 q2 q3 q4 q5 q6 q7 q8 q9 r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 s0 s1 s2 s3 s4 s5 s6 s7 s8 s9 t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 u0 u1 u2 u3 u4 u5 u6 u7 u8 u9 v0 v1 v2 v3 v4 v5 v6 v7 v8 v9 w0 w1 w2 w3 w4 w5 w6 w7 w8 w9 x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 y0 y1 y2 y3 y4 y5 y6 y7 y8 y9 z0 z1 z2 z3 z4 z5 z6 z7 z8 z9
Another useful example:
for X in {a..z}{a..z}{0..9}{0..9}{0..9}
do echo $X;
done
Pure bash (using local; faster, but it can't beat the awk answer below or the Python below):
perm() {
    local items="$1"
    local out="$2"
    local i
    [[ "$items" == "" ]] && echo "$out" && return
    for (( i=0; i<${#items}; i++ )) ; do
        perm "${items:0:i}${items:i+1}" "$out${items:i:1}"
    done
}
while read -r line ; do perm "$line" ; done < File
Pure bash (using subshell, much slower):
perm() {
    items="$1"
    out="$2"
    [[ "$items" == "" ]] && echo "$out" && return
    for (( i=0; i<${#items}; i++ )) ; do
        ( perm "${items:0:i}${items:i+1}" "$out${items:i:1}" )
    done
}
while read -r line ; do perm "$line" ; done < File
Since asker mentioned Perl is fine, I think Python 2.6+/3.X is fine, too:
python -c "from itertools import permutations as p ; print('\n'.join([''.join(item) for line in open('File') for item in p(line[:-1])]))"
For Python 2.5+/3.X:
#!/usr/bin/python2.5
# http://stackoverflow.com/questions/104420/how-to-generate-all-permutations-of-a-list-in-python/104436#104436
def all_perms(str):
    if len(str) <= 1:
        yield str
    else:
        for perm in all_perms(str[1:]):
            for i in range(len(perm)+1):
                # nb str[0:1] works in both string and list contexts
                yield perm[:i] + str[0:1] + perm[i:]

print('\n'.join([''.join(item) for line in open('File') for item in all_perms(line[:-1])]))
On my computer using a bigger test file:
First Python code
Python 2.6: 0.038s
Python 3.1: 0.052s
Second Python code
Python 2.5/2.6: 0.055s
Python 3.1: 0.072s
awk: 0.332s
Bash (local): 2.058s
Bash (subshell): 22+s
Using the crunch util, and bash:
while read a; do crunch 0 0 -p "$a"; done 2> /dev/null < File
Output:
ab
ba
abc
acb
bac
bca
cab
cba
Tutorial here https://pentestlab.blog/2012/07/12/creating-wordlists-with-crunch/
A faster version using awk
function permute(s, st, i, j, n, tmp) {
    n = split(s, item, //)
    if (st > n) { print s; return }
    for (i=st; i<=n; i++) {
        if (i != st) {
            tmp = item[st]; item[st] = item[i]; item[i] = tmp
            nextstr = item[1]
            for (j=2; j<=n; j++) nextstr = nextstr delim item[j]
        } else {
            nextstr = s
        }
        permute(nextstr, st+1)
        n = split(s, item, //)
    }
}
{ permute($0, 1) }
usage:
$ awk -f permute.awk file
See the Perl Cookbook for permutation examples. They're word/number oriented but a simple split()/join() on your above example will suffice.
Bash word-list/dictionary/permutation generator:
The following Bash code generates all 3-character strings over 0-9, a-z, A-Z, which gives you (10+26+26)^3 = 238,328 words in the output.
It doesn't scale well: as you can see, you need to add another for loop for each extra character in the combination. It would be much faster to write this in C or assembly, using recursion for speed; the Bash code is only a demonstration.
P.S.
You can populate $list variable with list=$(cat input.txt)
#!/bin/bash
list=$(echo {0..9} {a..z} {A..Z})
for c1 in $list; do
    for c2 in $list; do
        for c3 in $list; do
            echo "$c1$c2$c3"
        done
    done
done
SAMPLE OUTPUT:
000
001
002
003
004
005
...
...
...
ZZU
ZZV
ZZW
ZZX
ZZY
ZZZ
[babil@quad[13:27:37][~]> wc -l t.out
238328 t.out
$ ruby -ne '$_.chomp.chars.to_a.permutation{|x| puts x.join}' file # ver 1.9.1
Because you can never have enough cryptic Bash one-liners:
while read s;do p="$(echo "$s"|sed -e 's/./&,/g' -e 's/,$//')";eval "printf "%s\\\\n" "$(eval 'echo "$(printf "{'"$p"'}%.0s" {0..'"$((${#s}-1))"'})"')"|grep '\(.\)\1*.*\1' -v";echo;done <f
It's pretty fast - at least on my machine here:
$ time while read s;do p="$(echo "$s"|sed -e 's/./&,/g' -e 's/,$//')";eval "printf "%s\\\\n" "$(eval 'echo "$(printf "{'"$p"'}%.0s" {0..'"$((${#s}-1))"'})"')"|grep '\(.\)\1*.*\1' -v";echo;done <f >/dev/null
real 0m0.021s
user 0m0.000s
sys 0m0.004s
But be aware that this one will eat a lot of memory when you go beyond 8 characters...
Given a file named input:
a
b
c
d
If you want the output:
a b
a c
a d
b b
b c
b d
c c
c d
d d
You can try the following bash script:
lines=$(wc -l input | awk '{print $1}')
for ((i=1 ; i<=$lines ; i++)); do
x=$(sed -n ''$i' p' input)
sed -n ''$i',$ p' input > tmp
for j in $(cat tmp) ; do
echo $x $j
done
done
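A single-pass awk sketch of the same idea (my addition; note that, like the script above, it also emits the a a self-pair, which the desired output omits):
awk '{a[NR]=$0} END{for (i=1; i<=NR; i++) for (j=i; j<=NR; j++) print a[i], a[j]}' input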
How about this one:
lines="a b c"
for i in $lines; do
    echo $i > tmp
    for j in $lines; do
        echo $i $j
    done
done
It will print:
a a
a b
a c
b a
b b
b c
c a
c b
c c
Just a four-line bash joke - permutations of 4 letters/names:
while read line
do
[ $(sed -r "s/(.)/\1\n/g" <<<$line | sort | uniq | wc -l) -eq 5 ] && echo $line
done <<<$(echo -e {A..D}{A..D}{A..D}{A..D}"\n") | sed -r "s/A/Adams /;s/B/Barth /; s/C/Cecil /; s/D/Devon /;"
Adams Barth Cecil Devon
Adams Barth Devon Cecil
...
I like Bash! :-)

Shell Command 'join' not working

I am trying to join 2 simple sorted files, but for some strange reason it's not working.
f1.txt:
f1 abc
f2 mno
f3 pqr
f2.txt
abc a1
mno a2
pqr a3
Command:
join -t '\t' f1.txt f2.txt -1 2 -2 1 > f3.txt
FYI in f1, f2 the space is actually a tab.
I don't know why this is not working; f3.txt comes out empty.
Please provide any valuable insights.
Using join on the 2nd column of the 1st file and the 1st column of the 2nd file:
$ join -1 2 -2 1 file1 file2 > file3
$ cat file3
abc f1 a1
mno f2 a2
pqr f3 a3
Also, join by default delimits on blanks (spaces and tabs). The likely culprit in your command is that -t expects a single character, and in shell single quotes '\t' is two characters (a backslash and a t). The man page of join says the following about the -t flag:
-t CHAR
    use CHAR as input and output field separator.
Unless -t CHAR is given, leading blanks separate fields and are ignored.
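If you do want an explicit tab separator, bash's ANSI-C quoting ($'\t') hands join a real tab character:
$ join -t $'\t' -1 2 -2 1 f1.txt f2.txt > f3.txt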

Bash script: Read a file and process it

I have a file with this structure:
Text...
A B C
A1 57,624,609,830 20.99
A2 49,837,119,260 20.90
A3 839,812,303 20.88
A4 843,568,192 20.87
... 1,016,104,564 20.82
A29 1,364,178,406 16.62
A line of text
Blank
Text
Text
A B C
A1 57,624,609,830 20.99
A2 49,837,119,260 20.90
A3 839,812,303 20.88
A4 843,568,192 20.87
... 1,016,104,564 20.82
A29 1,364,178,406 16.62
and I want to get all the A1s with their values, then all the A2s with their values, and so on.
What I'm doing so far is
cat myFile.csv | awk '{if (NR > 5 && NR <= 29) printf $1"\t"}' > tmp1.csv
I get the A1 A2 A3... in different cells in a new file tmp1.csv
and then
cat myFile.csv | grep A1 | awk '{print $2}'
to get the values of A1, then copy-paste them into the A1 column of the tmp1 file.
I tried
#!/bin/bash
input="myFile.csv"
while IFS= read -r line
do
awk '{if (NR > 4 && NR <= 28) | grep A1 | awk print $2 }'
done < "$input"
but I cannot make it produce the same result as
A1 A2 A3 A4 ...
57,624,609,830 49,837,119,260 839,812,303 839,812,303 ...
57,624,609,830 49,837,119,260 839,812,303 839,812,303 ...
...
in a file. In other words, ideally I would take lines 5 through 28, put each $1 in its own cell, and list its $2 values in the column below it.
UPDATE
cat myFile.csv | awk '{if (NR > 5 && NR <= 29) printf $1"\t"}'
gives me the content of the lines I care about. How can I loop over the entire file to pick up all the blocks? For instance, instead of the fixed
NR>5 && NR<=29, use some x (say x=1) and
NR>x+4 && NR<=x+28, and eventually get all the content.
awk to the rescue!
$ awk '/A[0-9]+/' file | sed -r 's/^ +//g' | sort -k1.1,1.1 -k1.2n
A1 57,624,609,830 20.99
A1 57,624,609,830 20.99
A2 49,837,119,260 20.90
A2 49,837,119,260 20.90
A3 839,812,303 20.88
A3 839,812,303 20.88
A4 843,568,192 20.87
A4 843,568,192 20.87
A29 1,364,178,406 16.62
A29 1,364,178,406 16.62
Or, if your sort supports version sort (-V), that will work too. You can tighten the pattern match by adding && NF==3.
If you need to transpose the layout, you can pipe the output of the first script to
$ ... | awk 'NR%2{h=h FS $1; r1=r1 FS $2} !(NR%2){r2=r2 FS $2}
END{print h; print r1; print r2}' | column -t
A1 A2 A3 A4 A29
57,624,609,830 49,837,119,260 839,812,303 843,568,192 1,364,178,406
57,624,609,830 49,837,119,260 839,812,303 843,568,192 1,364,178,406
or combine both into a single script, especially if your records are already sorted.
UPDATE
Combined script starting from the original input file
$ awk '/A[0-9]+/ && NF==3{if (!a[$1]++) {h=h FS $1; r1=r1 FS $2} else {r2=r2 FS $2}}
END{print h; print r1; print r2}' file |
column -t
A1 A2 A3 A4 A29
57,624,609,830 49,837,119,260 839,812,303 843,568,192 1,364,178,406
57,624,609,830 49,837,119,260 839,812,303 843,568,192 1,364,178,406

In Linux bash, reverse file line order, but in blocks of 3 lines

I would like to reverse a file; however, in this file the records are 3 lines each:
a1
a2
a3
...
x1
x2
x3
and I would like to get this file:
x1
x2
x3
...
a1
a2
a3
I use Linux so tail -r doesn't work for me.
You can do this all in awk, using an associative array:
BEGIN { j=1 }
++i>3 { i=1; ++j }
{ a[j,i]=$0 }
END {
    for (m=j; m>0; --m)
        for (n=1; n<=3; ++n)
            print a[m,n]
}
Run it like this:
awk -f script.awk file.txt
or of course, if you prefer a one-liner, you can use this:
awk 'BEGIN{j=1}++i>3{i=1;++j}{a[j,i]=$0}END{for(m=j;m>0;--m)for(n=1;n<=3;++n)print a[m,n]}' file.txt
Explanation
This uses two counters: i which runs from 1 to 3 and j, which counts the number of groups of 3 lines. All lines are stored in the associative array a and printed in reverse in the END block.
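The same idea generalizes to any block size by passing it in with -v (a sketch I am adding, not part of the original answer):
awk -v n=3 'BEGIN{j=1} ++i>n{i=1;++j} {a[j,i]=$0} END{for(m=j;m>0;--m) for(k=1;k<=n;++k) print a[m,k]}' file.txt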
Testing it out
$ cat file
a1
a2
a3
b1
b2
b3
x1
x2
x3
$ awk 'BEGIN{j=1}++i>3{i=1;++j}{a[j,i]=$0}END{for(m=j;m>0;--m)for(n=1;n<=3;++n)print a[m,n]}' file
x1
x2
x3
b1
b2
b3
a1
a2
a3
This is so ugly that I'm kinda ashamed to even post it... so I guess I'll delete it as soon as a more decent answer pops up.
tac /path/to/file | awk '{ a[(NR-1)%3]=$0; if (NR%3==0) { print a[2] "\n" a[1] "\n" a[0] }}'
With the file:
~$ cat f
1
2
3
4
5
6
7
8
9
with awk: prepend each line to a; on every third line, print a; on a line that starts a new group (NR%3==1), re-initialise a to the current line:
~$ awk '{a=$0"\n"a}NR%3==0{print a}NR%3==1{a=$0}' f
3
2
1
6
5
4
9
8
7
then use tac to reverse again:
~$ awk '{a=$0"\n"a}NR%3==0{print a}NR%3==1{a=$0}' f | tac
7
8
9
4
5
6
1
2
3
Another way in awk, building one string per 3-line group (a new group starts when NR%3==1) and printing the groups in reverse:
awk '{a[i]=a[i+=(NR%3==1)]?a[i]"\n"$0:$0}END{for(i=NR/3;i>0;i--)print a[i]}' file
Input
a1
a2
a3
x1
x2
x3
b1
b2
b3
Output
b1
b2
b3
x1
x2
x3
a1
a2
a3
Here's a pure Bash (Bash≥4) possibility that should be okay for files that are not too large.
We also assume that the number of lines in your file is a multiple of 3.
mapfile -t ary < /path/to/file
for (( i=3*(${#ary[@]}/3-1); i>=0; i-=3 )); do
    printf '%s\n' "${ary[@]:i:3}"
done
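A sketch generalizing the block size to a variable (my addition; same assumption that the line count is a multiple of the block size):
block=3
mapfile -t ary < /path/to/file
for (( i=block*(${#ary[@]}/block-1); i>=0; i-=block )); do
    printf '%s\n' "${ary[@]:i:block}"
done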
