Join two files including unmatched lines in Shell - linux

File1.log
207.46.13.90 37556
157.55.39.51 34268
40.77.167.109 21824
157.55.39.253 19683
File2.log
207.46.13.90 62343
157.55.39.51 58451
157.55.39.200 37675
40.77.167.109 21824
The expected output is below:
Output.log
207.46.13.90 37556 62343
157.55.39.51 34268 58451
157.55.39.200 ----- 37675
40.77.167.109 21824 21824
157.55.39.253 19683 -----
I tried the 'join' command below, but it skips the unmatched lines:
join --nocheck-order File1.log File2.log
The output looks like this (not as expected):
207.46.13.90 37556 62343
157.55.39.51 34268 58451
40.77.167.109 21824 21824
Could someone please help with the proper command for the desired output? Thanks in advance.

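Could you please try the following.
awk '
FNR==NR{            # first file (File2.log): store its value by IP
  a[$1]=$2
  next
}
($1 in a){          # IP present in both files
  print $0,a[$1]
  b[$1]
  next
}
{                   # IP only in File1.log
  print $1,$2,"-----"
}
END{                # IPs only in File2.log (for-in order is unspecified)
  for(i in a){
    if(!(i in b)){
      print i,"-----",a[i]
    }
  }
}
' File2.log File1.log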
Output will be as follows.
207.46.13.90 37556 62343
157.55.39.51 34268 58451
40.77.167.109 21824 21824
157.55.39.253 19683 -----
157.55.39.200 ----- 37675
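If the row order of the expected output matters, a variant along the following lines may help. It is only a sketch: it assumes each IP appears at most once per file, and the function name full_join is made up for illustration. It reads File1.log first, prints rows in File2.log order, and then appends the File1-only rows in their original order, so it does not rely on awk's unspecified for-in ordering.

```shell
# Recreate the sample inputs from the question.
printf '%s\n' '207.46.13.90 37556' '157.55.39.51 34268' \
    '40.77.167.109 21824' '157.55.39.253 19683' > File1.log
printf '%s\n' '207.46.13.90 62343' '157.55.39.51 58451' \
    '157.55.39.200 37675' '40.77.167.109 21824' > File2.log

# full_join is a hypothetical helper name; $1 and $2 are the two input files.
full_join() {
    awk '
    FNR == NR {                 # first file: remember value and line order
        val[$1] = $2
        order[++n] = $1
        next
    }
    {                           # second file: print rows in its own order
        print $1, (($1 in val) ? val[$1] : "-----"), $2
        seen[$1]
    }
    END {                       # keys found only in the first file
        for (i = 1; i <= n; i++)
            if (!(order[i] in seen))
                print order[i], val[order[i]], "-----"
    }' "$1" "$2"
}

full_join File1.log File2.log
```

With the sample data this prints the five rows in the same order as Output.log in the question.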

The following is just enough if you don't care about sorting order of the output:
join -a1 -a2 -e----- -oauto <(sort file1.log) <(sort file2.log) |
column -t -s' ' -o' '
with recreation of the input files:
cat <<EOF >file1.log
207.46.13.90 37556
157.55.39.51 34268
40.77.167.109 21824
157.55.39.253 19683
EOF
cat <<EOF >file2.log
207.46.13.90 62343
157.55.39.51 58451
157.55.39.200 37675
40.77.167.109 21824
EOF
outputs:
157.55.39.200 ----- 37675
157.55.39.253 19683 -----
157.55.39.51 34268 58451
207.46.13.90 37556 62343
40.77.167.109 21824 21824
join joins on the first column by default. The -a1 -a2 options make it print the unmatched lines from both inputs. The -e----- option fills missing columns with dashes. The -oauto option derives the output format from the columns of the inputs. Because the join field is the first column, we don't need to specify -k1 to sort, but sort -s -k1 could speed things up. To match the expected output, I also piped to column.
You can sort the output by ports by piping it to, for example, sort -rnk2,3.
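The -o auto form is a GNU extension. As a rough sketch of a more portable variant, the output format can be spelled out as -o 0,1.2,2.2 (join field, then the second field of each file). Sorting on the join field only and pinning the C locale for both sort and join is an extra precaution so the two tools agree on ordering. The names file1.sorted, file2.sorted and joined.log are placeholders.

```shell
# Recreate the sample inputs (same data as file1.log / file2.log in the answer).
printf '%s\n' '207.46.13.90 37556' '157.55.39.51 34268' \
    '40.77.167.109 21824' '157.55.39.253 19683' > file1.log
printf '%s\n' '207.46.13.90 62343' '157.55.39.51 58451' \
    '157.55.39.200 37675' '40.77.167.109 21824' > file2.log

# Sort on the join field only, in the same locale that join will use.
LC_ALL=C sort -k1,1 file1.log > file1.sorted
LC_ALL=C sort -k1,1 file2.log > file2.sorted

# -a 1 -a 2 keep unpaired lines, -e fills missing fields,
# -o 0,1.2,2.2 selects: join field, file1 field 2, file2 field 2.
LC_ALL=C join -a 1 -a 2 -e '-----' -o 0,1.2,2.2 \
    file1.sorted file2.sorted > joined.log
cat joined.log
```

Temporary files are used here instead of process substitution so the sketch does not depend on bash.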

Related

Linux: how do I turn a text file into a variable that is a list of strings?

I used the split command to generate the files: fileaa .. fileaz and fileba ... filebd. I have written these names to the file filenames.list.txt that looks like this:
fileaa fileab fileac filead fileae fileaf fileag fileah fileai fileaj fileak fileal fileam filean fileao fileap fileaq filear fileas fileat fileau fileav fileaw fileax fileay fileaz fileba filebb filebc filebd
I want to write this list from the text file into a variable in the following script:
file='fileaa fileab fileac filead fileae fileaf fileag fileah fileai fileaj fileak fileal fileam filean fileao fileap fileaq filear fileas fileat fileau fileav fileaw fileax fileay fileaz fileba filebb filebc filebd'
for k in {1..30}
do
cat header$k.txt $file > run_mash$k.sh
done
The final result that I want is the following
cat header1.txt fileaa > run_mash1.sh
cat header2.txt fileab > run_mash2.sh
.
.
cat header26.txt fileaz > run_mash26.sh
cat header27.txt fileba > run_mash27.sh
.
.
cat header30.txt filebd > run_mash30.sh
I got it working:
File list: /tmp/listfiles.txt
~ > cat /tmp/listfiles.txt
fileaa fileab fileac filead fileae fileaf fileag fileah fileai fileaj fileak fileal fileam filean fileao fileap fileaq filear fileas fileat fileau fileav fileaw fileax fileay fileaz fileba filebb filebc filebd
Script: /tmp/script.sh
~ > cat /tmp/script.sh
#!/bin/bash
file=`cat /tmp/listfiles.txt`
IDX=1
for i in $file
do
echo "cat header${IDX}.txt $i > run_mash${IDX}.sh"
((IDX++))
done
Example execution:
~ > bash /tmp/script.sh
cat header1.txt fileaa > run_mash1.sh
cat header2.txt fileab > run_mash2.sh
cat header3.txt fileac > run_mash3.sh
cat header4.txt filead > run_mash4.sh
cat header5.txt fileae > run_mash5.sh
cat header6.txt fileaf > run_mash6.sh
cat header7.txt fileag > run_mash7.sh
cat header8.txt fileah > run_mash8.sh
cat header9.txt fileai > run_mash9.sh
cat header10.txt fileaj > run_mash10.sh
cat header11.txt fileak > run_mash11.sh
cat header12.txt fileal > run_mash12.sh
cat header13.txt fileam > run_mash13.sh
cat header14.txt filean > run_mash14.sh
cat header15.txt fileao > run_mash15.sh
cat header16.txt fileap > run_mash16.sh
cat header17.txt fileaq > run_mash17.sh
cat header18.txt filear > run_mash18.sh
cat header19.txt fileas > run_mash19.sh
cat header20.txt fileat > run_mash20.sh
cat header21.txt fileau > run_mash21.sh
cat header22.txt fileav > run_mash22.sh
cat header23.txt fileaw > run_mash23.sh
cat header24.txt fileax > run_mash24.sh
cat header25.txt fileay > run_mash25.sh
cat header26.txt fileaz > run_mash26.sh
cat header27.txt fileba > run_mash27.sh
cat header28.txt filebb > run_mash28.sh
cat header29.txt filebc > run_mash29.sh
cat header30.txt filebd > run_mash30.sh
Just remove the echo and the surrounding double quotes inside the script to make it execute all the cat commands.
Regards.
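Since the question asks for a list of strings, a bash array may be a closer fit than a plain string variable. The following is a minimal sketch, assuming bash and that all names sit on the first line of filenames.list.txt; the sample setup uses three made-up files instead of thirty so it can run on its own.

```shell
# Sample setup so the sketch is self-contained (three names instead of thirty).
printf 'fileaa fileab fileac\n' > filenames.list.txt
for n in 1 2 3; do printf 'header %s\n' "$n" > "header$n.txt"; done
printf 'aa\n' > fileaa
printf 'ab\n' > fileab
printf 'ac\n' > fileac

# Read the whitespace-separated names on the first line into a bash array.
read -r -a files < filenames.list.txt

# Array indices start at 0, while header/run_mash numbers start at 1.
for i in "${!files[@]}"; do
    k=$((i + 1))
    cat "header$k.txt" "${files[i]}" > "run_mash$k.sh"
done
```

Looping over the array indices keeps the counter and the file name in step without a separate IDX variable.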

How to forcefully copy a file from Hdfs to linux file system?

The -copyFromLocal command has a -f option that forcefully copies data from the local file system to HDFS. I tried the same -f option with -copyToLocal, but it didn't work. Can anyone please guide me on that?
Thanks,
Karthik
In this Hadoop version there is no -f option for -copyToLocal, as the usage output shows:
$ hadoop fs -help
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
Please refer to the Hadoop HDFS commands documentation for more information.
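Without a -f option, one possible workaround is to delete the local target before copying. The sketch below is an assumption about how that could be wrapped, not a Hadoop feature: copy_to_local_force is a hypothetical helper name, the paths in the usage comment are placeholders, and it only handles a single file target, not a directory.

```shell
# copy_to_local_force is a hypothetical helper, not a Hadoop command.
# Usage: copy_to_local_force <hdfs-src> <local-file>
copy_to_local_force() {
    src=$1
    dst=$2
    # Delete a stale local copy first; rm -f stays quiet if it does not exist.
    rm -f -- "$dst" &&
        hadoop fs -copyToLocal "$src" "$dst"
}

# Example call (both paths are placeholders):
# copy_to_local_force /path/in/hdfs/data.txt /tmp/data.txt
```

The copy only runs if the removal succeeded, so a permissions problem on the local side shows up before HDFS is contacted.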

How to duplicate string until another string in bash

I have a text file that is structured like below:
293.0 2305.3 1508.0
2466.3 1493.0
2669.5 1578.6
3497.2 1768.9
4265.5 2092.4
5940.8 2558.6
7308.7 3015.4
9377.7 3814.6
295.0 2331.4 1498.1
3617.0 1893.2
I'm still new to Linux. Is there any way to produce output like the example below?
293.0 2305.3 1508.0
293.0 2466.3 1493.0
293.0 2669.5 1578.6
293.0 3497.2 1768.9
293.0 4265.5 2092.4
293.0 5940.8 2558.6
293.0 7308.7 3015.4
293.0 9377.7 3814.6
295.0 2331.4 1498.1
295.0 3617.0 1893.2
So basically, I want the first column's value to be repeated until a new value appears.
With Barmar's idea:
If row contains three columns, save first column to a variable and print all three columns. If row contains two columns, print variable and column one and two:
awk 'NF==3{c=$1; print $1,$2,$3}; NF==2{print c,$1,$2}' file
Output:
293.0 2305.3 1508.0
293.0 2466.3 1493.0
293.0 2669.5 1578.6
293.0 3497.2 1768.9
293.0 4265.5 2092.4
293.0 5940.8 2558.6
293.0 7308.7 3015.4
293.0 9377.7 3814.6
295.0 2331.4 1498.1
295.0 3617.0 1893.2
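You can also use the following command. Note that the %1.f format rounds the last two columns to whole numbers; use %.1f instead to keep one decimal place: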
$ sed 's/ \+/|/g' dupl.in | awk 'BEGIN{FS="|"}{if($1){buff=$1; printf "%.1f ", buff;} else {printf "%.1f ", buff;} printf "%1.f %1.f \n",$2, $3}'
293.0 2305 1508
293.0 2466 1493
293.0 2670 1579
293.0 3497 1769
293.0 4266 2092
293.0 5941 2559
293.0 7309 3015
293.0 9378 3815
295.0 2331 1498
295.0 3617 1893
You can also use pure bash, thanks to its array support:
$ while read -r -a f; do \
if [ -n "${f[2]}" ]; then \
c=${f[0]}; echo "${f[@]}"; \
else \
echo "$c" "${f[@]}"; \
fi; \
done <file
I have split the command across lines for readability.
Line by line (while read), I save the fields in an array "f". If the third field is not empty (if [ -n ... ]), I save the first column and print all fields; otherwise, I print c and the two other fields.

How to print most frequent lines of file in linux terminal?

I have a file with lines of the form:
<host>\t<ip>\n
and I need to print the 5 most frequent IPs. How can I do that?
For example, if I needed to print the 3 most frequent IPs from this file:
host1 192.168.0.26
host2 192.168.0.26
host3 192.168.0.23
host4 192.168.0.24
host5 192.168.0.26
host6 192.168.0.26
host7 192.168.0.25
host8 192.168.0.26
host9 192.168.0.26
host18 192.168.0.22
host22 192.168.0.22
host24 192.168.0.23
I would print:
192.168.0.26
192.168.0.22
192.168.0.23
The following should work. It prints each IP prefixed by its count, most frequent first. Note that it returns 5 lines, even if there are 10 IPs with the same frequency.
cut -f2 file | sort | uniq -c | sort -rn | head -n5
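To print only the IPs, as in the question's example, the count column can be dropped at the end. The following is a sketch under a few assumptions: top_ips and hosts.tsv are made-up names, the input is tab-separated, and ties are broken by the IP text in ascending order so the result is repeatable.

```shell
# top_ips is a hypothetical helper: $1 = tab-separated file, $2 = how many IPs.
# Pipeline: take the IP column, count duplicates, order by count (descending)
# and then by IP text (ascending) for ties, keep the top N, drop the counts.
top_ips() {
    cut -f2 "$1" |
        sort |
        uniq -c |
        sort -k1,1nr -k2,2 |
        head -n "$2" |
        awk '{ print $2 }'
}

# Sample data from the question, written as host<TAB>ip lines.
printf '%s\t%s\n' \
    host1 192.168.0.26 host2 192.168.0.26 host3 192.168.0.23 \
    host4 192.168.0.24 host5 192.168.0.26 host6 192.168.0.26 \
    host7 192.168.0.25 host8 192.168.0.26 host9 192.168.0.26 \
    host18 192.168.0.22 host22 192.168.0.22 host24 192.168.0.23 > hosts.tsv

top_ips hosts.tsv 3
```

On the sample data this prints 192.168.0.26, 192.168.0.22 and 192.168.0.23, one per line.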

View "cvs diff" output in two columns with vim

I have "cvs diff" output (for all files in project) in unified diff format.
Format could be like this:
Index: somefile.cpp
===================================================================
RCS file: /CVS_repo/SomeProject/somefile.cpp,v
retrieving revision 1.19
diff -r1.19 somefile.cpp
31c31
< return "Read line four times";
---
> return "Read line five times";
36c36
< return "Make a bad thing";
---
> return "Make a good thing";
Index: otherfile.cpp
===================================================================
RCS file: /CVS_repo/SomeProject/otherfile.cpp,v
retrieving revision 1.19
< ........
---
> ........
or even like this:
Index: somefile.cpp
===================================================================
RCS file: /CVS_repo/SomeProject/somefile.cpp,v
retrieving revision 1.19
diff -u -r1.19 somefile.cpp
--- somefile.cpp 13 Mar 2013 08:45:18 -0000 1.19
+++ somefile.cpp 26 Mar 2013 08:10:33 -0000
@@ -28,12 +28,12 @@
//---------------------------------------------------------------------------
extern "C" char *FuncGetSomeText()
{
- return "Read line four times";
+ return "Read line five times";
}
//---------------------------------------------------------------------------
extern "C" char *FuncGetAwesomeText()
{
- return "Make a bad thing";
+ return "Make a good thing";
}
//---------------------------------------------------------------------------
Index: otherfile.cpp
===================================================================
RCS file: /CVS_repo/SomeProject/otherfile.cpp,v
retrieving revision 1.19
diff -u -r1.19 otherfile.cpp
--- otherfile.cpp 13 Mar 2013 08:45:18 -0000 1.19
+++ otherfile.cpp 26 Mar 2013 08:10:33 -0000
@@ -28,12 +28,12 @@
//---------------------------------------------------------------------------
extern "C" char *Func()
{
- .......
+ .......
}
//---------------------------------------------------------------------------
Is there any way to view this text side-by-side with vim?
Or maybe it's possible to change default diff tool in cvs to vimdiff?
Some sed magic helped me:
\cvs -n diff -u > ~diff.tmp; vim -O <(sed -r -e 's/^\+[^\+]+.*$//g;s/^\+$//g' ~diff.tmp) <(sed -r -e 's/^-[^-]+.*$//g;s/^-$//g' ~diff.tmp) +'set scb | set nowrap | wincmd w | set scb | set nowrap'
It is not a perfect solution, but it is better than nothing. Here is what this script does:
\cvs -n diff -u > ~diff.tmp;
Writes the CVS diff output in unified format (-u option) to the temp file ~diff.tmp. The '\' character prevents any alias of the "cvs" command from being used.
(sed -r -e 's/^\+[^\+]+.*$//g;s/^\+$//g' ~diff.tmp)
(sed -r -e 's/^-[^-]+.*$//g;s/^-$//g' ~diff.tmp)
These commands output the text from ~diff.tmp. The first replaces lines beginning with '+' with empty lines, and the second does the same for lines beginning with '-'.
vim -O <(sed...) <(sed...) +'set scb | set nowrap | wincmd w | set scb | set nowrap'
Opens two windows (-O option), one for each sed output. The command after '+' sets scrollbind and nowrap for the first window, then switches to the second window (with 'wincmd w') and does the same there.
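To see what the two sed filters produce before involving vim, they can be tried on a small unified diff. This is only an illustrative sketch: sample.diff, old_side.txt and new_side.txt are made-up names, the diff content is invented, and sed -E is used in place of -r.

```shell
# A tiny unified diff to experiment with (contents are made up for illustration).
cat > sample.diff <<'EOF'
--- a.txt
+++ a.txt
@@ -1,3 +1,3 @@
 keep
-old line
+new line
 end
EOF

# Left pane: blank out added lines but keep the "+++" header.
sed -E 's/^\+[^+]+.*$//; s/^\+$//' sample.diff > old_side.txt
# Right pane: blank out removed lines but keep the "---" header.
sed -E 's/^-[^-]+.*$//; s/^-$//' sample.diff > new_side.txt

# Both files keep the same number of lines, so scrollbind stays aligned.
wc -l old_side.txt new_side.txt
```

Because matching lines are blanked rather than deleted, each change sits on the same line number in both files, which is what lets the two scroll-bound windows stay in step.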
