Uniq and counts - linux

I have a file with 2 columns. I need to use uniq on column 1 only, and print
both columns in the result as well as the count of occurrences
(as with -c).
Example input:
1 a
1 a
2 a
3 c
4 d
Desired output (count, then both columns):
2 1 a
1 2 a
1 3 c
1 4 d

echo '1 a
1 a
2 a
3 c
4 d' | uniq -c
outputs your 2nd block (modulo uniq's leading-space padding of the count column).

It's not clear to me what you mean by "use uniq on column 1 only." What do you want to happen if column 1 appears multiple times with different column 2 values? If this can happen, your question probably needs a little clarification. If this can't happen in your scenario, then the easiest solution is probably
uniq -c filename
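Bear in mind that uniq only collapses adjacent duplicate lines, so if the file isn't already grouped, sort it first:
sort filename | uniq -c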

If the data is in a file:
awk '{print $1}' filename.txt | uniq -c
(Note this prints only column 1 alongside its count and drops column 2.)
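If column 1 can repeat with different column 2 values and you want the count keyed on column 1 alone, here's a rough awk sketch (it keeps the first line seen for each key and preserves input order; file stands in for your filename):
awk '!seen[$1]++ { order[++n] = $1; line[$1] = $0 }   # first occurrence of a key: remember its line, in order
{ count[$1]++ }                                       # tally every occurrence of the key
END { for (i = 1; i <= n; i++) print count[order[i]], line[order[i]] }' file
For the example input this prints:
2 1 a
1 2 a
1 3 c
1 4 d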

Related

Recode column values in unix

1859115 2258379 24636 Yes 06S14028968 13 1 1 2
1859115 2258379 24636 Yes 06S14028968 13 1 1 2
1859116 2255037 21608 Yes 06S14028969 11 0 2 3
1859117 2268746 34027 Yes 06S14028970 10 0 2 1
Above is an example of my data set. I want to replace the values in the 7th column so that 1 is replaced by 2 and 0 is replaced by 1. The outcome I am expecting looks like the following.
1859115 2258379 24636 Yes 06S14028968 13 2 1 2
1859115 2258379 24636 Yes 06S14028968 13 2 1 2
1859116 2255037 21608 Yes 06S14028969 11 1 2 3
1859117 2268746 34027 Yes 06S14028970 10 1 2 1
I have tried using this approach
awk 'NR==1{$10="Pheno";print;next}\
$7 == "1" {$10="2"};\
$7 == "0" {$10="1"}1' old.txt |column -t > new.txt
and then removing the first row and extracting the columns of interest. But I need a more straightforward way.
This can be done simply by checking whether NF (the number of fields on the line) is greater than or equal to 7 and, if so, incrementing the 7th field by 1 before printing the line. Since the column holds only 0 or 1, adding 1 maps 0 to 1 and 1 to 2; the NF check avoids touching any line with fewer than 7 fields.
awk 'NF>=7{$7+=1} 1' Input_file
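If the recoding ever becomes something other than "add 1", a sketch that tests each value explicitly may be safer (old.txt and new.txt as in the question; the else prevents a 0 that was just turned into 1 from being bumped again to 2):
awk '{ if ($7 == 1) $7 = 2; else if ($7 == 0) $7 = 1 } 1' old.txt > new.txt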

unix join command to return all columns in one file

I have two files that I am joining on one column. After the join, I just want the output to be all of the columns, in the original order, from only one of the files. For example:
cat file1.tsv
1 a ant
2 b bat
3 c cat
8 d dog
9 e eel
cat file2.tsv
1 I
2 II
3 III
4 IV
5 V
join -1 1 -2 1 file1.tsv file2.tsv -t $'\t' -o 1.1,1.2,1.3
1 a ant
2 b bat
3 c cat
I know I can use the -o 1.1,1.2,... notation, but my file has over two dozen columns. Is there some wildcard that I can use to say -o 1.* or something?
I'm not aware of wildcards in the format string.
From your desired output I think that what you want may be achievable like so without having to specify all the enumerations:
grep -f <(awk '{print $1}' file2.tsv ) file1.tsv
1 a ant
2 b bat
3 c cat
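One caveat: grep -f treats each key as an unanchored substring pattern, so a short key like 1 could match anywhere in a line. A safer sketch (assuming space-separated fields as displayed) anchors each key to the start of the line:
grep -f <(awk '{print "^" $1 " "}' file2.tsv) file1.tsv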
Or as an awk-only solution:
awk 'NR==FNR { a[$1]; next } $1 in a' file2.tsv file1.tsv
1 a ant
2 b bat
3 c cat
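If you do want to stick with join, you can generate the -o enumeration rather than typing two dozen fields by hand. A sketch that builds the 1.1,1.2,...,1.N list from the first line of file1.tsv (assuming tab-separated fields, as in your join call):
fields=$(head -n 1 file1.tsv | awk -F'\t' '{ for (i = 1; i <= NF; i++) printf "%s1.%d", (i > 1 ? "," : ""), i }')
join -t $'\t' -o "$fields" file1.tsv file2.tsv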

Linux text file: subtract one column from another, or add a certain number to a column?

Given a file with these two columns,
1 2
2 3
1 4
1) column2 minus column1
the result will be
1
1
3
2) add 10 to column1
the result will be
11 2
12 3
11 4
Does anyone have ideas about how to achieve these two results, preferably with awk or sed?
Thanks in advance.
These two...
$ awk '{print $2-$1}' file
$ awk '{$1+=10}1' file
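And if you want to keep the original columns next to the computed difference instead of replacing them, a small variant:
$ awk '{print $0, $2-$1}' file
1 2 1
2 3 1
1 4 3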

How to add number of identical line next to the line itself? [duplicate]

This question already has answers here:
Find duplicate lines in a file and count how many times each line was duplicated?
I have a file file.txt which looks like this:
a
b
b
c
c
c
I want to know the command which takes file.txt as input and produces the output:
a 1
b 2
c 3
I think uniq is the command you are looking for. The output of uniq -c is a little different from your format, but this can be fixed easily.
$ uniq -c file.txt
1 a
2 b
3 c
If you want to count the occurrences, you can use uniq with -c.
If the file is not sorted, you have to run it through sort first:
$ sort file.txt | uniq -c
1 a
2 b
3 c
If you really need the line first followed by the count, swap the columns with awk
$ sort file.txt | uniq -c | awk '{ print $2 " " $1}'
a 1
b 2
c 3
You can use this awk:
awk '!seen[$0]++{ print $0, (++c) }' file
a 1
b 2
c 3
seen is an array indexed by the whole line; !seen[$0]++ is true only the first time a line is encountered, because the post-increment bumps the entry to 1 after the first test. In the action we print the record along with an incrementing counter.
Update: based on a comment below, if the intent is to get a repeat count in the 2nd column, then use this awk command instead:
awk '{ seen[$0]++ } END { for (i in seen) print i, seen[i] }' file
a 1
b 2
c 3
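Note that for (i in seen) iterates in an unspecified order, so on other data the lines may not come out in the order shown. A sketch that counts occurrences while preserving first-appearance order:
awk '!seen[$0] { order[++n] = $0 }   # record each distinct line once, in input order
{ seen[$0]++ }                       # count every occurrence
END { for (i = 1; i <= n; i++) print order[i], seen[order[i]] }' file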

Preserve original order if numeric value is equal in coreutils sort?

Consider this snippet:
echo '7 a
3 c
3 b
2 first
2 second
2 third
2 fourth
2 fifth
9 d
2 sixth' | sort -n -k 1
It gives an output of:
2 fifth
2 first
2 fourth
2 second
2 sixth
2 third
3 b
3 c
7 a
9 d
While the list is correctly ordered numerically by the first field, the original relative order of lines with equal keys has been shuffled. I would like to obtain:
2 first
2 second
2 third
2 fourth
2 fifth
2 sixth
3 c
3 b
7 a
9 d
Is this possible to do with sort? If not, what would be the easiest way to achieve this kind of sorting using shell tools?
Just add the -s (stable sort) flag; this disables the last-resort comparison.
echo '7 a
3 c
3 b
2 first
2 second
2 third
2 fourth
2 fifth
9 d
2 sixth' | sort -k 1,1n -s
2 first
2 second
2 third
2 fourth
2 fifth
2 sixth
3 c
3 b
7 a
9 d
Add line numbers with nl, sort with the original key as the primary key and the line number as the secondary key (e.g. sort -k2,2n -k1,1n), then cut the numbers off with cut. Or use sort -s. :p
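A minimal sketch of that nl/sort/cut (decorate-sort-undecorate) approach, assuming the data is in file; nl prefixes each line with a tab-separated number that cut strips off afterwards:
nl -ba file | sort -k2,2n -k1,1n | cut -f2-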
You're looking for a "stable" sort. Try the sort -s option (or better yet, check the man page on your system).
