Uniq and counts - linux

I have a file with 2 columns. I need to use uniq on column 1 only, and print
both columns in the result as well as the count of occurrences
(as with -c).
Example input:
1 a
1 a
2 a
3 c
4 d
Desired output (count, then both columns):
2 1 a
1 2 a
1 3 c
1 4 d

echo '1 a
1 a
2 a
3 c
4 d' | uniq -c
outputs your 2nd block (modulo uniq's leading-space padding of the count column).

It's not clear to me what you mean by "use uniq on column 1 only." What do you want to happen if column 1 appears multiple times with different column 2 values? If this can happen, your question probably needs a little clarification. If this can't happen in your scenario, then the easiest solution is probably
uniq -c filename
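Bear in mind that uniq only collapses adjacent duplicate lines, so if the file isn't already grouped, sort it first:
sort filename | uniq -c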

If the data is in a file:
awk '{print $1}' filename.txt | uniq -c
(Note this prints only column 1 alongside its count and drops column 2.)
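If column 1 can repeat with different column 2 values and you want the count keyed on column 1 alone, here's a rough awk sketch (it keeps the first line seen for each key and preserves input order; file stands in for your filename):
awk '!seen[$1]++ { order[++n] = $1; line[$1] = $0 }   # first occurrence of a key: remember its line, in order
{ count[$1]++ }                                       # tally every occurrence of the key
END { for (i = 1; i <= n; i++) print count[order[i]], line[order[i]] }' file
For the example input this prints:
2 1 a
1 2 a
1 3 c
1 4 d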

Related

Recode column values in unix

1859115 2258379 24636 Yes 06S14028968 13 1 1 2
1859115 2258379 24636 Yes 06S14028968 13 1 1 2
1859116 2255037 21608 Yes 06S14028969 11 0 2 3
1859117 2268746 34027 Yes 06S14028970 10 0 2 1
Above is an example of my data set. I want to replace the values in the 7th column so that 1 is replaced by 2 and 0 is replaced by 1. The outcome I am expecting looks like the following.
1859115 2258379 24636 Yes 06S14028968 13 2 1 2
1859115 2258379 24636 Yes 06S14028968 13 2 1 2
1859116 2255037 21608 Yes 06S14028969 11 1 2 3
1859117 2268746 34027 Yes 06S14028970 10 1 2 1
I have tried using this approach
awk 'NR==1{$10="Pheno";print;next}\
$7 == "1" {$10="2"};\
$7 == "0" {$10="1"}1' old.txt |column -t > new.txt
and then removing the first row and extracting the columns of interest. But I need a more straightforward way.
This can be done simply by checking whether NF (the number of fields on the line) is greater than or equal to 7 and, if so, incrementing the 7th field by 1 before printing the line. Since the column holds only 0 or 1, adding 1 maps 0 to 1 and 1 to 2; the NF check avoids touching any line with fewer than 7 fields.
awk 'NF>=7{$7+=1} 1' Input_file
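If the recoding ever becomes something other than "add 1", a sketch that tests each value explicitly may be safer (old.txt and new.txt as in the question; the else prevents a 0 that was just turned into 1 from being bumped again to 2):
awk '{ if ($7 == 1) $7 = 2; else if ($7 == 0) $7 = 1 } 1' old.txt > new.txt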

unix join command to return all columns in one file

I have two files that I am joining on one column. After the join, I just want the output to be all of the columns, in the original order, from only one of the files. For example:
cat file1.tsv
1 a ant
2 b bat
3 c cat
8 d dog
9 e eel
cat file2.tsv
1 I
2 II
3 III
4 IV
5 V
join -1 1 -2 1 file1.tsv file2.tsv -t $'\t' -o 1.1,1.2,1.3
1 a ant
2 b bat
3 c cat
I know I can use the -o 1.1,1.2,... notation, but my file has over two dozen columns. Is there some wildcard that I can use to say -o 1.* or something?
I'm not aware of wildcards in the format string.
From your desired output I think that what you want may be achievable like so without having to specify all the enumerations:
grep -f <(awk '{print $1}' file2.tsv ) file1.tsv
1 a ant
2 b bat
3 c cat
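One caveat: grep -f treats each key as an unanchored substring pattern, so a short key like 1 could match anywhere in a line. A safer sketch (assuming space-separated fields as displayed) anchors each key to the start of the line:
grep -f <(awk '{print "^" $1 " "}' file2.tsv) file1.tsv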
Or as an awk-only solution:
awk 'NR==FNR { a[$1]; next } $1 in a' file2.tsv file1.tsv
1 a ant
2 b bat
3 c cat
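If you do want to stick with join, you can generate the -o enumeration rather than typing two dozen fields by hand. A sketch that builds the 1.1,1.2,...,1.N list from the first line of file1.tsv (assuming tab-separated fields, as in your join call):
fields=$(head -n 1 file1.tsv | awk -F'\t' '{ for (i = 1; i <= NF; i++) printf "%s1.%d", (i > 1 ? "," : ""), i }')
join -t $'\t' -o "$fields" file1.tsv file2.tsv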

Linux text file: subtract one column from another, or add a certain number to a column?

Given a file with these two columns,
1 2
2 3
1 4
1) column2 minus column1
the result will be
1
1
3
2) add 10 to column1
the result will be
11 2
12 3
11 4
Does anyone have ideas about how to achieve these two results, preferably with awk or sed?
Thanks in advance.
These two...
$ awk '{print $2-$1}' file
$ awk '{$1+=10}1' file
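And if you want to keep the original columns next to the computed difference instead of replacing them, a small variant:
$ awk '{print $0, $2-$1}' file
1 2 1
2 3 1
1 4 3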

How to add number of identical line next to the line itself? [duplicate]

This question already has answers here:
Find duplicate lines in a file and count how many times each line was duplicated?
I have a file file.txt which looks like this:
a
b
b
c
c
c
I want to know the command which takes file.txt as input and produces the output:
a 1
b 2
c 3
I think uniq is the command you are looking for. The output of uniq -c is a little different from your format, but this can be fixed easily.
$ uniq -c file.txt
1 a
2 b
3 c
If you want to count the occurrences, you can use uniq with -c.
If the file is not sorted, you have to run it through sort first:
$ sort file.txt | uniq -c
1 a
2 b
3 c
If you really need the line first followed by the count, swap the columns with awk
$ sort file.txt | uniq -c | awk '{ print $2 " " $1}'
a 1
b 2
c 3
You can use this awk:
awk '!seen[$0]++{ print $0, (++c) }' file
a 1
b 2
c 3
seen is an array indexed by the whole line; !seen[$0]++ is true only the first time a line is encountered, because the post-increment bumps the entry to 1 after the first test. In the action we print the record along with an incrementing counter.
Update: based on a comment below, if the intent is to get a repeat count in the 2nd column, then use this awk command instead:
awk '{ seen[$0]++ } END { for (i in seen) print i, seen[i] }' file
a 1
b 2
c 3
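Note that for (i in seen) iterates in an unspecified order, so on other data the lines may not come out in the order shown. A sketch that counts occurrences while preserving first-appearance order:
awk '!seen[$0] { order[++n] = $0 }   # record each distinct line once, in input order
{ seen[$0]++ }                       # count every occurrence
END { for (i = 1; i <= n; i++) print order[i], seen[order[i]] }' file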

Preserve original order if numeric value is equal in coreutils sort?

Consider this snippet:
echo '7 a
3 c
3 b
2 first
2 second
2 third
2 fourth
2 fifth
9 d
2 sixth' | sort -n -k 1
It gives an output of:
2 fifth
2 first
2 fourth
2 second
2 sixth
2 third
3 b
3 c
7 a
9 d
While the list is correctly ordered numerically by the first field, the original relative order of lines with equal keys has been shuffled. I would like to obtain:
2 first
2 second
2 third
2 fourth
2 fifth
2 sixth
3 c
3 b
7 a
9 d
Is this possible to do with sort? If not, what would be the easiest way to achieve this kind of sorting using shell tools?
Just add the -s (stable sort) flag; this disables the last-resort comparison.
echo '7 a
3 c
3 b
2 first
2 second
2 third
2 fourth
2 fifth
9 d
2 sixth' | sort -k 1,1n -s
2 first
2 second
2 third
2 fourth
2 fifth
2 sixth
3 c
3 b
7 a
9 d
Add line numbers with nl, sort with the original key as the primary key and the line number as the secondary key (e.g. sort -k2,2n -k1,1n), then cut the numbers off with cut. Or use sort -s. :p
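A minimal sketch of that nl/sort/cut (decorate-sort-undecorate) approach, assuming the data is in file; nl prefixes each line with a tab-separated number that cut strips off afterwards:
nl -ba file | sort -k2,2n -k1,1n | cut -f2-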
You're looking for a "stable" sort. Try the sort -s option (or better yet, check the man page on your system).
