Say I have the following kinds of files:
file1.txt:
a a c
b b c
c c c
d d c
e e c
a a c
b b c
c c c
d d c
e e c
file2.txt:
—————
—————
—————
How do I insert the contents of file2.txt so that file1.txt ends up like this:
a a c
b b c
c c c
—————
—————
—————
d d c
e e c
a a c
b b c
c c c
d d c
e e c
...without hardcoding "after line 3"? The insertion point should be the first line that matches c c c.
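Using GNU sed. The 0,/c c c/ address (a GNU extension) limits the block to the lines up to and including the first match. The command has to span multiple lines because r takes the rest of the line as the file name, so the closing } needs a line of its own: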
sed '0,/c c c/ {
/c c c/r file2.txt
}' file1.txt
a a c
b b c
c c c
—————
—————
—————
d d c
e e c
a a c
b b c
c c c
d d c
e e c
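Or with awk, which reads file2.txt into a buffer and prints it once, right after the first line of file1.txt that matches:
awk 'NR==FNR{buf = buf $0 RS;next} {print} /c c c/ && !done{ printf "%s", buf; done=1 }' file2.txt file1.txt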
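Both commands above hardcode the pattern and the file names. A minimal sketch that wraps the awk approach in a reusable shell function; the name insert_after_first and its argument order are hypothetical, the pattern is treated as an awk regular expression (passed with -v, so backslash escapes in it are interpreted), and the NR==FNR test is swapped for FILENAME == ARGV[1] so that an empty insert file does not break it:

```shell
# Hypothetical helper: print TARGET with the contents of INSERT placed
# right after the first line of TARGET that matches the awk regex PATTERN.
# Usage: insert_after_first PATTERN INSERT TARGET
insert_after_first() {
    awk -v pat="$1" '
        FILENAME == ARGV[1] { buf = buf $0 RS; next }     # buffer INSERT
        { print }                                         # echo each TARGET line
        !done && $0 ~ pat { printf "%s", buf; done = 1 }  # insert only once
    ' "$2" "$3"
}

# Demo on the sample data from the question, in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'a a c' 'b b c' 'c c c' 'd d c' 'e e c' \
    'a a c' 'b b c' 'c c c' 'd d c' 'e e c' > "$dir/file1.txt"
printf '%s\n' '-----' '-----' '-----' > "$dir/file2.txt"
insert_after_first 'c c c' "$dir/file2.txt" "$dir/file1.txt"
rm -r "$dir"
```

Like the sed version, it only prints the result; redirect it to a temporary file and move that over file1.txt to change the file itself.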
I was wondering if this could be possible:
I have two files:
file a:
100005282 C
100016196 G
100011755 C
100012890 G
100016339 C
100013563 C
100015603 G
100008436 G
100004906 C
and file b:
rs10904494 100004906 A C
rs11591988 100005282 C T
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
rs9419478 100015603 G C
rs11253562 100016196 G T
rs4881551 100016339 C A
Matching $1 of file a against $2 of file b, and comparing the letter in $2 of file a with the letter in $3 of file b for the same number, the result should be:
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
That is, show only the lines whose letters don't match.
Is it possible to do this with awk?
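If you're having trouble with awk, grep may be simpler. Each line of file1.txt is used as a fixed-string (-F), whole-word (-w) pattern, and -v keeps only the lines of file2.txt that contain none of them. This relies on both files using the same separator between the number and the letter. For example: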
cat file1.txt
100005282 C
100016196 G
100011755 C
100012890 G
100016339 C
100013563 C
100015603 G
100008436 G
100004906 C
cat file2.txt
rs10904494 100004906 A C
rs11591988 100005282 C T
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
rs9419478 100015603 G C
rs11253562 100016196 G T
rs4881551 100016339 C A
grep -vFwf file1.txt file2.txt
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
Otherwise, this should work for your use case (it assumes the files are tab-separated; drop -F'\t' if the fields are separated by spaces):
awk -F'\t' 'NR==FNR {A[$1,$2]; next} !(($2,$3) in A)' file1.txt file2.txt
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
This seems like the logic you're looking for:
$ awk 'NR==FNR{a[$1]=$2; next} a[$2]!=$3' file1 file2
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
That is: match file1's $1 with file2's $2, and print the line when file1's $2 differs from file2's $3.
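The a[$2]!=$3 version also prints lines of file2 whose number does not appear in file1 at all, because the lookup then returns an empty string. If those lines should be left out, a minimal sketch that prints only genuine letter mismatches; the function name mismatches is hypothetical, and it splits on any run of blanks, so it handles both tab- and space-separated files:

```shell
# Hypothetical helper: print lines of FILE2 whose letter ($3) differs from
# FILE1's letter ($2) for the same number, skipping numbers FILE1 lacks.
# Usage: mismatches FILE1 FILE2
mismatches() {
    awk '
        FILENAME == ARGV[1] { letter[$1] = $2; next }  # FILE1: number -> letter
        ($2 in letter) && letter[$2] != $3             # FILE2: known number, other letter
    ' "$1" "$2"
}

# Demo on the sample data from the question, in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '100005282 C' '100016196 G' '100011755 C' '100012890 G' \
    '100016339 C' '100013563 C' '100015603 G' '100008436 G' '100004906 C' > "$dir/file1"
printf '%s\n' 'rs10904494 100004906 A C' 'rs11591988 100005282 C T' \
    'rs10904561 100008436 T G' 'rs7906287 100011755 A G' \
    'rs9419557 100012890 A G' 'rs9286070 100013563 T C' \
    'rs9419478 100015603 G C' 'rs11253562 100016196 G T' \
    'rs4881551 100016339 C A' > "$dir/file2"
mismatches "$dir/file1" "$dir/file2"
rm -r "$dir"
```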
I have a tab separated file A containing several values per row:
A B C D E
F G H I
J K L M
N O P
Q R S T
U V
X Y Z
I want to remove from file A the elements contained in the following file B:
A D
J M
U V
resulting in a file C:
B C E
F G H I
K L
N O P
Q R S T
X Y Z
Is there a way of doing this using bash?
If the entries contain no characters that are special to sed (for instance ()[]/\.*?+|), you can use the following commands:
mapfile -t array < <(<B tr '\t' '\n')
(IFS='|'; sed -r "s/(${array[*]})\t?//g;/^$/d" A > C)
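The first command reads the entries of file B into an array. The second joins them with | into a sed expression that deletes each entry together with a following tab, if there is one, and then deletes lines left empty. Note that an entry is also removed from inside a longer field (A would be stripped from AB), and removing the last field of a line (M in J K L M) leaves a trailing tab.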
In your example, the constructed command ...
sed -r 's/(A|D|J|M|U|V)\t?//g;/^$/d' A > C
... generates the following file C (spaces are actually tabs)
B C E
F G H I
K L
N O P
Q R S T
X Y Z
An awk solution:
awk 'NR == FNR{ pat = sprintf("%s%s|%s", (pat? pat "|":""), $1, $2); next }
{
gsub("^(" pat ")[[:space:]]*|[[:space:]]*(" pat ")", "");
if (NF) print
}' file_b file_a
The output:
B C E
F G H I
K L
N O P
Q R S T
X Y Z
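Both answers above build a regular expression from file B, so entries containing regex metacharacters would need escaping, and an entry such as A would also be stripped from inside a longer field such as AB. A minimal sketch that compares whole fields as plain strings instead, assuming both files are tab-separated as described; the function name remove_fields is hypothetical:

```shell
# Hypothetical helper: remove from FILE_A every tab-separated field that
# appears anywhere in FILE_B, comparing whole fields as literal strings.
# Lines left with no fields are dropped.
# Usage: remove_fields FILE_B FILE_A > FILE_C
remove_fields() {
    awk -F'\t' -v OFS='\t' '
        FILENAME == ARGV[1] { for (i = 1; i <= NF; i++) drop[$i]; next }
        {
            out = ""; n = 0
            for (i = 1; i <= NF; i++)
                if (!($i in drop))              # keep fields not listed in FILE_B
                    out = (n++ ? out OFS : "") $i
            if (n) print out                    # skip lines that became empty
        }
    ' "$1" "$2"
}

# Demo on the sample data from the question, in a scratch directory.
dir=$(mktemp -d)
printf 'A\tB\tC\tD\tE\nF\tG\tH\tI\nJ\tK\tL\tM\nN\tO\tP\nQ\tR\tS\tT\nU\tV\nX\tY\tZ\n' > "$dir/A"
printf 'A\tD\nJ\tM\nU\tV\n' > "$dir/B"
remove_fields "$dir/B" "$dir/A" > "$dir/C"
cat "$dir/C"
rm -r "$dir"
```

Because the output line is rebuilt from the kept fields, no leading or trailing tab is left behind when the first or last field is removed.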
I want to replace every B that comes after the '=' with C.
echo A, B = A B B A B B B | sed 's/=\(.*\)B\(.*\)/=\1C\2/g'
The expected result should be
A, B = A C C A C C C
But I got this result:
A, B = A B B A B B C
Only the last match gets replaced. How can I fix this?
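The greedy .* in the first group runs up to the last B, so the single match covers the rest of the line and the g flag has nothing left to replace. Instead, loop: replace one B per pass and branch back with t as long as a substitution was made (the ; after the label works in GNU sed but is not portable):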
sed ':loop; s/\(=.*\)B\(.*\)/\1C\2/; t loop'
Test:
$ echo A, B = A B B A B B B | sed ':loop; s/\(=.*\)B\(.*\)/\1C\2/; t loop'
A, B = A C C A C C C
Same idea as @sat's answer, but the match starts at the beginning of the line, so no trailing group is needed:
sed -e ':cycle' -e 's/\(.*=.*\)B/\1C/;t cycle'
This is POSIX-compliant, so it should work with any sed.
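Both loops rescan the line once for every B. A single-pass alternative (a swapped-in awk technique, not taken from either answer) splits each line at the first = and runs gsub only on the part after it; lines without = are printed unchanged. The function name replace_after_eq is hypothetical:

```shell
# Hypothetical helper: replace every B that follows the first '=' with C.
replace_after_eq() {
    awk '{
        i = index($0, "=")               # position of the first "=", 0 if none
        if (i) {
            tail = substr($0, i + 1)     # everything after the "="
            gsub(/B/, "C", tail)
            $0 = substr($0, 1, i) tail   # prefix keeps the "=" itself
        }
        print
    }'
}

echo 'A, B = A B B A B B B' | replace_after_eq
# prints: A, B = A C C A C C C
```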
Here are the input file and the output of uniq. Shouldn't repeated characters such as c and g be output only once?
$ uniq c.txt
a
g
b
g
c
v
c
$ cat c.txt
a
g
b
b
g
g
c
v
c
thanks in advance,
Lin
From the uniq man page:
Repeated lines in the input will not be detected if they are not
adjacent, so it may be necessary to sort the files first.
macbook:stackoverflow joeyoung$ cat c.txt
a
g
b
b
g
g
c
v
c
macbook:stackoverflow joeyoung$ uniq c.txt
a
g
b
g
c
v
c
macbook:stackoverflow joeyoung$ sort -u c.txt
a
b
c
g
v
macbook:stackoverflow joeyoung$ sort c.txt | uniq
a
b
c
g
v
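sort -u and sort | uniq both reorder the lines. If the order of first appearance should be kept, a common awk idiom (not from the man page) prints a line only the first time it is seen; the wrapper name dedupe is hypothetical:

```shell
# Print each distinct line once, in order of first appearance.
# seen[$0]++ evaluates to 0 (false) the first time a line occurs, so
# !seen[$0]++ is true exactly once per distinct line; no sorting needed.
dedupe() {
    awk '!seen[$0]++' "$@"
}

printf '%s\n' a g b b g g c v c | dedupe
# prints a, g, b, c, v (one per line)
```

The trade-off is memory: every distinct line is kept in the seen array, whereas uniq only ever compares adjacent lines.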
My COUNTIF function won't work for the letter "C". I checked for stray spaces with the LEN function and I'm super confused. Thanks for the help.
#of Accident Type
A 28
B 19
C =COUNTIF(A2:A101, "*C*")
D 17
E 9
F 9
Accidents
A
B
D
A
A
F
C
A
C
B
E
B
A
C
F
D
B
C
D
A
A
C
B
E
B
C
E
A
B
A
A
A
B
C
C
D
F
D
B
B
A
F
C
B
A
C
B
E
E
D
A
B
C
E
A
A
F
C
B
D
D
D
B
D
C
A
F
A
A
B
D
E
A
E
D
B
C
A
F
A
C
D
D
A
A
B
A
F
D
C
A
C
B
F
D
A
E
A
C
D
It seems to work fine for me, but I did not use the asterisks. Each cell (copied from the data example you gave) contains only the character, with no spaces...
It probably does not work because the C is written in Cyrillic (С, U+0421 CYRILLIC CAPITAL LETTER ES), which looks the same as the Latin C in most fonts. To check whether this is the case, type a Latin C as well and change the font to something fancy, e.g. Algerian; the two Cs will then look clearly different:
=COUNTIF(A2:A101,"*C*" )
For some reason it's working now... I changed the font and messed around with it.
Thanks again for the help!