Strange uniq output on Mac/Linux

Here are the input file and the output. I thought repeated characters like c and g should not appear in the output?
$ uniq c.txt
a
g
b
g
c
v
c
$ cat c.txt
a
g
b
b
g
g
c
v
c
thanks in advance,
Lin

From the uniq man page:
Repeated lines in the input will not be detected if they are not
adjacent, so it may be necessary to sort the files first.
macbook:stackoverflow joeyoung$ cat c.txt
a
g
b
b
g
g
c
v
c
macbook:stackoverflow joeyoung$ uniq c.txt
a
g
b
g
c
v
c
macbook:stackoverflow joeyoung$ sort -u c.txt
a
b
c
g
v
macbook:stackoverflow joeyoung$ sort c.txt | uniq
a
b
c
g
v
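If the goal is to drop duplicates without reordering the lines, a common awk idiom removes them in one pass, keeping the first occurrence of each line:

```shell
# seen[] counts occurrences of each whole line ($0); the pattern
# !seen[$0]++ is true only the first time a line appears, so only
# first occurrences are printed, in their original order.
awk '!seen[$0]++' c.txt
```

On the c.txt above this prints a, g, b, c, v — sorted output from sort -u, original order from this idiom.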

Use awk with two files as a filter [closed]

I was wondering if this could be possible:
I have two files:
file a:
100005282 C
100016196 G
100011755 C
100012890 G
100016339 C
100013563 C
100015603 G
100008436 G
100004906 C
and file b:
rs10904494 100004906 A C
rs11591988 100005282 C T
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
rs9419478 100015603 G C
rs11253562 100016196 G T
rs4881551 100016339 C A
Matching the numbers in $1 of file a against $2 of file b, and comparing the letter in $2 of file a with the letter in $3 of file b for the same number, the result should be:
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
That is, showing only the rows where the letters don't match. Is it possible to do this with awk?
If you're having trouble with awk, perhaps using grep would be simpler, e.g.
cat file1.txt
100005282 C
100016196 G
100011755 C
100012890 G
100016339 C
100013563 C
100015603 G
100008436 G
100004906 C
cat file2.txt
rs10904494 100004906 A C
rs11591988 100005282 C T
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
rs9419478 100015603 G C
rs11253562 100016196 G T
rs4881551 100016339 C A
grep -vFwf file1.txt file2.txt
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
Otherwise, this should work for your use-case (note the extra parentheses: `in` binds looser than you might expect, so the membership test must be grouped before negating it):
awk 'NR==FNR {A[$1,$2]; next} !(($2,$3) in A)' file1.txt file2.txt
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
This seems like the logic you're looking for:
$ awk 'NR==FNR{a[$1]=$2; next} a[$2]!=$3' file1 file2
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
match file1 $1 with file2 $2 AND print when file1 $2 != file2 $3
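For larger files, join can do the key matching instead of an in-memory awk array. A sketch, assuming both files can be sorted and every number in file2 also appears in file1 (join silently drops unpaired lines):

```shell
# join needs both inputs sorted on the join key
# (file1 column 1, file2 column 2).
sort -k1,1 file1.txt > file1.sorted
sort -k2,2 file2.txt > file2.sorted

# Joined fields: key, file1-letter, rsid, file2-$3, file2-$4.
# Keep rows where file1's letter differs from file2's third column,
# printing them back in file2's original column order.
join -1 1 -2 2 file1.sorted file2.sorted |
  awk '$2 != $4 { print $3, $1, $4, $5 }'
```

On the sample data this prints the same five mismatching rows as the awk one-liner above.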

remove values from a file present in another file using bash

I have a tab separated file A containing several values per row:
A B C D E
F G H I
J K L M
N O P
Q R S T
U V
X Y Z
I want to remove from file A the elements contained in the following file B:
A D
J M
U V
resulting in a file C:
B C E
F G H I
K L
N O P
Q R S T
X Y Z
Is there a way of doing this using bash?
In case the entries do not contain any symbols special to sed (for instance ()[]/\.*?+), you can use the following command:
mapfile -t array < <(<B tr '\t' '\n')
(IFS='|'; sed -r "s/(${array[*]})\t?//g;/^$/d" A > C)
This command reads file B into an array. From the array a sed command is constructed. The sed command will filter out all entries and delete blank lines.
In your example, the constructed command ...
sed -r 's/(A|D|J|M|U|V)\t?//g;/^$/d' A > C
... generates the following file C (spaces are actually tabs)
B C E
F G H I
K L
N O P
Q R S T
X Y Z
awk solution:
awk 'NR == FNR{ pat = sprintf("%s%s|%s", (pat? pat "|":""), $1, $2); next }
{
gsub("^(" pat ")[[:space:]]*|[[:space:]]*(" pat ")", "");
if (NF) print
}' file_b file_a
The output:
B C E
F G H I
K L
N O P
Q R S T
X Y Z
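Another option that sidesteps regex escaping entirely is to load file B's values into an awk set and rebuild each line of A field by field. A sketch, assuming tab-separated input as in the question:

```shell
# Read every value in B into the 'del' set, then print each line of A
# with those fields removed; lines that become empty are skipped.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { for (i = 1; i <= NF; i++) del[$i]; next }
     {
       line = ""
       for (i = 1; i <= NF; i++)
         if (!($i in del)) line = line (line == "" ? "" : OFS) $i
       if (line != "") print line
     }' B A
```

Redirect the output (`... > C`) to produce the new file. Because the values are compared as exact strings rather than patterns, entries containing sed metacharacters need no special handling.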

Split a single row of data (dat file) into multiple columns

I want to split a row of data into multiple columns like
a.dat
A B C D E F G H I J K L M N O P Q R S T U
into
b.dat
A B C D E F G
H I J K L M N
O P Q R S T U
I have tried using the pr command:
pr -ts" " --columns 7 --across a.dat > b.dat
But it doesn't work; b.dat is identical to a.dat.
I like fold for these thingies:
$ fold -w 14 file
A B C D E F G
H I J K L M N
O P Q R S T U
With -w you set the width you desire to have.
Although xargs is more useful if you want to split based on number of fields instead of characters:
$ xargs -n 7 < file
A B C D E F G
H I J K L M N
O P Q R S T U
Regarding your attempt with pr: it arranges input lines into columns, and since a.dat contains only a single line, there is nothing for it to rearrange, so the line passes through unchanged.
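That said, pr can be made to work if the single line is first broken into one word per line; a sketch using GNU pr, with the a.dat/b.dat names from the question:

```shell
# Split the single input line into one word per line, then let pr lay
# the words out 7 columns across (row-major with --across), joined by a
# single space (-s" "); -t suppresses pr's page headers.
tr ' ' '\n' < a.dat | pr -ts" " --columns 7 --across > b.dat
```

This produces the same three rows as the fold and xargs answers above.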

Sed replace all matches after some patterns

I want to replace all B after '='.
echo A, B = A B B A B B B | sed 's/=\(.*\)B\(.*\)/=\1C\2/g'
The expected result should be
A, B = A C C A C C C
But I got this result:
A, B = A B B A B B C
Only the last match is replaced. How can I resolve this?
The greedy .* makes a single match span all the way to the last B, so the g flag never sees the earlier ones. Loop until no B after '=' remains:
sed ':loop; s/\(=.*\)B\(.*\)/\1C\2/; t loop'
Test:
$ echo A, B = A B B A B B B | sed ':loop; s/\(=.*\)B\(.*\)/\1C\2/; t loop'
A, B = A C C A C C C
Same kind of idea as @sat's answer, but anchored from the beginning of the string:
sed -e ':cycle' -e 's/\(.*=.*\)B/\1C/;t cycle'
This is POSIX compliant, so it should work with any sed.
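An awk take on the same idea avoids the loop entirely, assuming the line contains a single '=': with '=' as the field separator, everything after it lands in $2, so one gsub handles all the Bs while $1 stays untouched.

```shell
# Split on '='; gsub replaces every B in $2 (the text after '='),
# rebuilding $0 with OFS='='. The bare pattern 1 prints the result.
echo 'A, B = A B B A B B B' | awk -F'=' '{ gsub(/B/, "C", $2) } 1' OFS='='
# → A, B = A C C A C C C
```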

regex - append file contents after first match

Say I have the following kinds of files:
file1.txt:
a a c
b b c
c c c
d d c
e e c
a a c
b b c
c c c
d d c
e e c
file2.txt:
—————
—————
—————
How do I get the contents from file2.txt so that I end up with file1.txt that says:
a a c
b b c
c c c
—————
—————
—————
d d c
e e c
a a c
b b c
c c c
d d c
e e c
...inserting after the first line matching c c c, rather than hardcoding the insertion after line 3.
Using GNU sed (the 0,/regexp/ address range is a GNU extension, and the r command needs to sit on its own line):
sed '0,/c c c/ {
/c c c/r file2.txt
}' file1.txt
a a c
b b c
c c c
—————
—————
—————
d d c
e e c
a a c
b b c
c c c
d d c
e e c
A portable awk alternative: read file2 into a buffer, print every line of file1, and emit the buffer once, right after the first line matching c c c:
awk 'NR==FNR{buf = buf $0 RS; next} {print} /c c c/ && !done{ printf "%s", buf; done=1 }' file2.txt file1.txt