I am looking for a simple command line to help me with the following task.
I have two files and I would like to print the values in Col2 that the two files have in common.
For instance, File1 is similar to the following 3-column, tab-separated example
File1
cat big 24
cat small 13
cat red 63
File2
dog big 34
chicken plays 39
fish red 294
Desired output:
big
red
I have tried commands using the comm syntax: comm /path/to/file1 /path/to/file2
However, it does not output anything, because the values in Col1 and Col3 will very rarely be in common.
Does anyone have a suggestion as to how this can be solved? Maybe awk is a better solution?
If you read the man page of comm, you will see it works with sorted files. But awk is flexible; you can control exactly what you want:
awk 'NR==FNR{a[$2]=1;next}a[$2]{print $2}' file1 file2
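(For completeness, comm can still be used here if you compare only the sorted Col2 values. A sketch of mine, not from the original answer, assuming bash process substitution and genuinely tab-separated files:)
comm -12 <(cut -f2 file1 | sort) <(cut -f2 file2 | sort)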
You could do it in a single pass with paste and awk (note that this compares the files line by line, so it only finds values that sit on the same line number in both files):
paste file1 file2 | awk '$2 == $5 { print $2 }'
Output:
big
red
I have created a simple script that prints the contents of a text file using the cat command. Now I want to print each line along with its number, but at the same time I need to ignore blank lines. The following format is desired:
1 George Jones Berlin 2564536877
2 Mike Dixon Paris 2794321976
I tried using
cat -n catalog.txt | grep -v '^$' catalog.txt
But I get the following results:
George Jones Berlin 2564536877
Mike Dixon Paris 2794321976
I have managed to get rid of the blank lines, but the line numbers are not printed. What am I doing wrong?
Here are the contents of catalog.txt:
George Jones Berlin 2564536877
Mike Dixon Paris 2794321976
Your solution doesn't work because you passed catalog.txt to grep as well, so grep reads the file directly and ignores the numbered output that cat -n pipes to it.
You can pipe grep's output to cat -n:
grep -v '^$' yourFile | cat -n
Example:
test.txt:
Hello
how
are
you
?
$ grep -v '^$' test.txt | cat -n
1 Hello
2 how
3 are
4 you
5 ?
The immediate fix is to drop the file name from the grep command, so that grep reads from stdin:
cat -n catalog.txt | grep -v '^$'
In your code, you supplied catalog.txt to grep, which made it read from the file and ignore its standard input. So you were grepping the file itself instead of the output of cat piped to grep's stdin.
To correctly ignore blank lines and then prepend line numbers, switch the order of grep and cat:
grep -v '^$' catalog.txt | cat -n
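As a small variant (my addition, not from the original answer), nl can do the numbering instead of cat -n:
grep -v '^$' catalog.txt | nl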
Another awk
$ awk 'NF{$0=FNR " " $0}NF' 48488182
1 George Jones Berlin 2564536877
3 Mike Dixon Paris 2794321976
The second line was blank in this case, so its number is skipped; this variant keeps the original line numbers.
A single, simple awk solution could help you here.
First solution:
awk 'NF{print FNR,$0}' Input_file
Second solution: The above prints the original line numbers, which count the empty lines too. If you want the numbering itself to skip empty lines, the following may help:
awk '!NF{FNR--;next} NF{print FNR,$0}' Input_file
Third solution: Using only grep, though the output will have a colon between the line number and the line.
grep -v '^$' Input_file | grep -n '.*'
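(If the colon bothers you, one small addition of mine: pipe through sed to replace the first colon on each line, which is always the one grep -n inserted.)
grep -v '^$' Input_file | grep -n '.*' | sed 's/:/ /'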
Explanation of the first solution:
NF: This condition checks whether NF (awk's built-in variable holding the number of fields in the current line) is non-zero. If the condition is TRUE, the action next to it runs.
{print FNR,$0}: Uses awk's print function to print FNR (awk's built-in variable holding the current line number) followed by $0, the current line.
This satisfies both of the OP's conditions: skipping empty lines and printing the line numbers along with the lines. I hope this helps.
I need to insert a file at a specific location in another file.
I have the line number where it should go. My first file is:
file1.txt:
I
am
Cookie
While the second one is
file2.txt:
a
black
dog
named
So, after the solution, file1.txt should look like this:
I
am
a
black
dog
named
Cookie
The solution should be compatible with the presence of characters like " and / in both files.
Any tool is ok as long as it's native (I mean, no new software installation).
Another option, apart from what RavinderSingh13 suggested, is sed.
To add the text of file2.txt into file1.txt after a specific line (here line 2; the r command reads the named file and appends it after each addressed line):
sed -i '2 r file2.txt' file1.txt
Output:
I
am
a
black
dog
named
Cookie
Further, to add the file after a matched pattern:
sed -i '/^YourPattern/ r file2.txt' file1.txt
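With the example data, that could look like this (matching the line am; my illustration, not from the thread):
sed -i '/^am$/ r file2.txt' file1.txt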
Could you please try the following and let me know if it helps you.
awk 'FNR==3{system("cat file2.txt")} 1' file1.txt
Output will be as follows.
I
am
a
black
dog
named
Cookie
Explanation: While reading file1.txt, check whether the current line number is 3; if so, use awk's system() function, which lets us call shell commands, to print file2.txt with the cat command. The trailing 1 then prints every line of file1.txt. This way the lines of file2.txt are spliced into the output of file1.txt.
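A variant sketch of mine (not from the original answer) that avoids spawning a shell by reading file2.txt directly with awk's getline:
awk 'FNR==3{while((getline line < "file2.txt") > 0) print line} 1' file1.txt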
How about
head -2 file1 && cat file2 && tail -1 file1
You can count the number of lines in file1, to decide the head and tail parameters, using
wc -l file1
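Putting that together, a sketch of mine that inserts file2 after line N of file1 and writes the result back (tail -n +K prints from line K onward):
N=2
{ head -n "$N" file1; cat file2; tail -n +"$((N+1))" file1; } > tmp && mv tmp file1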
http://www.somesite/play/episodes/xyz/fred-episode-110
http://www.somesite/play/episodes/abc/simon-episode-266
http://www.somesite/play/episodes/qwe/mum-episode-39
http://www.somesite/play/episodes/zxc/dad-episode-41
http://www.somesite/play/episodes/asd/bob-episode-57
I have many URLs saved in a txt file, as shown above.
I want to copy everything after the 6th slash onto its own line above each URL, with a script.
The text after the 6th slash is the title and is always different.
I need to pick out the title so I can play it.
So I need it to look like this:
fred-episode-110
http://www.somesite/play/episodes/xyz/fred-episode-110
simon-episode-266
http://www.somesite/play/episodes/abc/simon-episode-266
mum-episode-39
http://www.somesite/play/episodes/qwe/mum-episode-39
dad-episode-41
http://www.somesite/play/episodes/zxc/dad-episode-41
bob-episode-57
http://www.somesite/play/episodes/asd/bob-episode-57
I have sed, awk, and wget available. Can this be done?
Use this command:
awk -F/ '{print $7; print $0}'
E.g.:
awk -F/ '{print $7; print $0}' < file.txt > new-file.txt
Just to add to this: with
awk -F/ '{print $7; print $0}' < file.txt > new-file.txt
is there any way to remove all the hyphens from just the title and leave a space instead? Some of the titles have lots of hyphens, and that makes them a bit hard to read.
Change these:
simon-episode-2-playing-football-in-the-park
fred-episode-110-the-big-clash-tonight
bob-episode-57
to
simon episode 2 playing football in the park
fred episode 110 the big clash tonight
bob episode 57
Thanks for your expertise and time.
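No answer to that follow-up appears in the thread, but one hedged sketch: copy the 7th field into a variable, replace the hyphens with spaces using gsub(), and print it above the unchanged URL:
awk -F/ '{t=$7; gsub(/-/," ",t); print t; print $0}' < file.txt > new-file.txt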
I have two files like below:
file1
"Connect" CONNECT_ID="12"
"Connect" CONNECT_ID="11"
"Connect" CONNECT_ID="122"
"Connect" CONNECT_ID="109"
file2
"Quit" CONNECT_ID="12"
"Quit" CONNECT_ID="11"
The file contents are not exactly the same, but similar to the above, and each file has at least 100,000 records.
Now I want to get the result shown below into file1 (meaning the final result should end up in file1):
"Connect" CONNECT_ID="122"
"Connect" CONNECT_ID="109"
I have used a while loop something like below:
awk '{print $2}' file2 | sed "s/CONNECT_ID=//g" > sample.txt
while read actual; do
grep -w -v "$actual" file1 > file1_tmp
mv -f file1_tmp file1
done < sample.txt
I have adjusted my code here to match the example, so it may or may not run as-is.
My problem is that the loop takes more than an hour to complete.
So can anyone suggest how to achieve the same result in another way, e.g. with diff, comm, sed, awk, or any other Linux command that runs faster?
Mainly, I want to eliminate this big while loop.
Most UNIX tools are line-based, and since you don't have whole-line matches, grep, comm, and diff are out of the window. To extract field-based information like you want, awk is perfect:
$ awk 'NR==FNR{a[$2];next}!($2 in a)' file2 file1
"Connect" CONNECT_ID="122"
"Connect" CONNECT_ID="109"
To store the results back into file1 you'll need to redirect the output to a temporary file and then move that file onto file1, like so:
$ awk 'NR==FNR{a[$2];next}!($2 in a)' file2 file1 > tmp && mv tmp file1
Explanation:
The awk variable NR increments for every record read, that is each line in every file. The FNR variable increments for every record but gets reset for every file.
NR==FNR    # This condition is only true while reading the first file given, file2
a[$2]      # Add the second field of file2 to the array as a lookup table
next       # Get the next line of file2 (skips any following blocks)
!($2 in a) # We are now reading file1; if the second field is not in the lookup
           # array, execute the default block, i.e. print the line
To modify this command you just need to change the fields that are matched. In your real case, if you want to match field 1 of file2 against field 4 of file1, then you would do:
$ awk 'NR==FNR{a[$1];next}!($4 in a)' file2 file1
This might work for you (GNU sed):
sed -r 's|\S+\s+(\S+)|/\1/d|' file2 | sed -f - -i file1
The first sed turns each line of file2 into a delete command (e.g. /CONNECT_ID="12"/d for the first line), and the second sed runs that generated script against file1 in place.
The tool best suited to this job is join(1). It joins two files based on values in a given column of each file. Normally it just outputs the lines that match across the two files, but it also has a mode to output the lines from one of the files that do not match the other file.
join requires that the files be sorted on the field(s) you are joining on, so either pre-sort the files or use process substitution (a bash feature, as in the example below) to do it all in one command line:
$ join -j 2 -v 1 -o "1.1 1.2" <(sort -k2,2 file1) <(sort -k2,2 file2)
"Connect" CONNECT_ID="122"
"Connect" CONNECT_ID="109"
-j 2 says to join the files on the second field for both files.
-v 1 says to only output lines from file 1 whose join field does not match any line in file 2
-o "1.1 1.2" says to order the output with the first field of file 1 (1.1) followed by the second field of file 1 (1.2). Without this, join will output the join column first followed by the remaining columns.
You may want to analyze file2 first and add every ID that appears to a cache (e.g. an in-memory set), then scan file1 line by line and check whether each ID is in the cache.
Python code like this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re

p = re.compile(r'CONNECT_ID="(.*)"')

# collect the IDs that appear in file2
quit_ids = set()
for line in open('file2'):
    m = p.search(line)
    if m:
        quit_ids.add(m.group(1))

# keep only the lines of file1 whose ID is not in the set
output = open('output_file', 'w')
for line in open('file1'):
    m = p.search(line)
    if m and m.group(1) not in quit_ids:
        output.write(line)
output.close()
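Then move the result into place, since the requirement was for the final result to end up in file1:
mv output_file file1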
The main bottleneck is not really the while loop, but the fact that you rewrite the output file thousands of times.
In your particular case, you might be able to get away with just this:
cut -f2 file2 | grep -Fwvf - file1 >tmp
mv tmp file1
(I don't think the -w option to grep is useful here, but since you had it in your example, I retained it.)
This presupposes that file2 is tab-delimited; if not, the awk '{ print $2 }' file2 you had there is fine.
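For example, with space-delimited input, that combination would look like this (same caveats as above):
awk '{print $2}' file2 | grep -Fwvf - file1 > tmp && mv tmp file1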
I have two text files, of different sizes, which I would like to merge into one file, but with the content mixed randomly; this is to create some realistic data for some unit tests. One text file contains the true cases, while the other the false.
I would like to use standard Unix tools to create the merged output. How can I do this?
Random sort using GNU sort's -R option:
$ sort -R file1 file2 -o file3
My version of sort does not support -R either. So here is an alternative using awk, by inserting a random number in front of each line, sorting on those numbers, and then stripping the numbers off again:
awk '{print int(rand()*1000), $0}' file1 file2 | sort -n | awk '{$1="";print $0}'
(Note that without an srand() call awk produces the same sequence on every run, and the second awk leaves a leading space on each line; the next answer avoids both issues.)
This adds a random number to the beginning of each line with awk, sorts based on that number, and then removes it. This will even work if you have duplicates (as pointed out by choroba) and is slightly more cross-platform.
awk 'BEGIN { srand() } { print rand(), $0 }' file1 file2 |
sort -n |
cut -f2- -d" "
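If GNU coreutils' shuf is available (an assumption; it wasn't mentioned in the thread), it can do the shuffle directly:
cat file1 file2 | shuf > file3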