how to write a bash script for matching files [duplicate] - linux

I would like to write a script to match two files. I have an input file that changes all the time and a second file that acts as a database.
Input file1:
1
3
5
7
9
Database matched file1:
A B C D E F
1 0.27776079 0.302853938 1.52415756 2.751714059 1.363932416 2.286189771
2 0.332465 0.777918524 0.705056607 0.484138872 0.443787105 0.848742839
3 0.941768856 0.19125 0.573714912 0.5040488 0.526207725 1.554118026
4 1.717348092 0.19642752 0.315945 0.1331712 0.28427498 0.30113875
5 0.802253697 0.3768849 0.426688 0.27693 0.591697038 0.3832675
6 0.2752232 0.570078 0.3847095 0.659548575 0.327469824 0.3346875
7 0.153272 0.36594447 0.19125 0.526602427 0.44771265 0.31136
8 0.637448551 0.735756919 1.284158594 0.464060016 0.259459816 0.887975536
9 0.397221469 0.20808 0.268226 0.710250679 0.493069267 0.47672443
10 0.196928 0.492713856 0.22302 0.783853054 0.303534 1.736908487
11 0.510789888 0.14948712 0.26432 0.684485438 0.683017627 0.614033957
desired output file1:
A B C D E F
1 0.27776079 0.302853938 1.52415756 2.751714059 1.363932416 2.286189771
3 0.941768856 0.19125 0.573714912 0.5040488 0.526207725 1.554118026
5 0.802253697 0.3768849 0.426688 0.27693 0.591697038 0.3832675
7 0.153272 0.36594447 0.19125 0.526602427 0.44771265 0.31136
9 0.397221469 0.20808 0.268226 0.710250679 0.493069267 0.47672443
I would like to extract the matched lines from the database.
head -1 database1.txt > output1.txt
grep -wf inputfile1.txt database1.txt >> output1.txt
head -1 database1.txt > output2.txt
grep -wf inputfile2.txt database1.txt >> output2.txt
head -1 database2.txt > output3.txt
grep -wf inputfile3.txt database2.txt >> output3.txt
I tried editing these commands in nano, but I have to change the file names by hand every time.

You can use the join command to join the 2 files on 1st column:
$ cat file1
1
3
5
7
9
$ cat file2
A B C D E F
1 0.27776079 0.302853938 1.52415756 2.751714059 1.363932416 2.286189771
2 0.332465 0.777918524 0.705056607 0.484138872 0.443787105 0.848742839
3 0.941768856 0.19125 0.573714912 0.5040488 0.526207725 1.554118026
4 1.717348092 0.19642752 0.315945 0.1331712 0.28427498 0.30113875
5 0.802253697 0.3768849 0.426688 0.27693 0.591697038 0.3832675
6 0.2752232 0.570078 0.3847095 0.659548575 0.327469824 0.3346875
7 0.153272 0.36594447 0.19125 0.526602427 0.44771265 0.31136
8 0.637448551 0.735756919 1.284158594 0.464060016 0.259459816 0.887975536
9 0.397221469 0.20808 0.268226 0.710250679 0.493069267 0.47672443
10 0.196928 0.492713856 0.22302 0.783853054 0.303534 1.736908487
11 0.510789888 0.14948712 0.26432 0.684485438 0.683017627 0.614033957
$ sed -n '1p' file2 && join --nocheck-order file1 <(sed -n '1!p' file2)
A B C D E F
1 0.27776079 0.302853938 1.52415756 2.751714059 1.363932416 2.286189771
3 0.941768856 0.19125 0.573714912 0.5040488 0.526207725 1.554118026
5 0.802253697 0.3768849 0.426688 0.27693 0.591697038 0.3832675
7 0.153272 0.36594447 0.19125 0.526602427 0.44771265 0.31136
9 0.397221469 0.20808 0.268226 0.710250679 0.493069267 0.47672443
$
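Since the question asks for something that does not need editing for every file pair, here is a minimal sketch that wraps the lookup in a shell function. The name match_rows and its argument order are made up for this example. It assumes the key file is non-empty, and that the database has exactly one header line and keeps its key in the first column. Unlike grep -wf, which matches a key anywhere on the line (so the key 1 would also match a row containing 0.1), this compares the first column only:

```shell
# match_rows KEYS_FILE DATABASE_FILE   (hypothetical helper name)
# Prints the database header plus every row whose first column
# appears in KEYS_FILE. Assumes KEYS_FILE is not empty.
match_rows() {
    awk '
        NR == FNR { keys[$1]; next }   # first file: remember each key
        FNR == 1 || ($1 in keys)       # second file: header and matching rows
    ' "$1" "$2"
}
```

It could then be called once per pair, for example match_rows inputfile1.txt database1.txt > output1.txt.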

Related

Use awk command to get information below a pattern

I have a file with a wide range of information, and I want to extract some data from it. I will post only the relevant part here. I want to extract the IQ and JQ values as well as the J_ij [meV] value, which is two lines below them. I read the question "How to print 5 consecutive lines after a pattern in file using awk", where a pattern is used to extract the information below it, and I was thinking of doing something similar. My initial idea was:
awk '/IQ =/ { print $6,$12 } /IQ =/ {for(i=2; i<=2; i++){ getline; print $11 }}' input.text > output.txt
The loop does not appear to work. The input looks like this:
IT IQ JT JQ N1 N2 N3 DRX DRY DRZ DR J_ij [mRy] J_ij [meV]
IT = 1 IQ = **1** JT = 1 JQ = **1**
->Q = ( -0.250, 0.722, 0.203) ->Q = ( -0.250, 0.722, 0.203)
1 1 1 1 0 0 0 0.000 0.000 0.000 0.000 0.000000000 **0.000000000**
IT = 1 IQ = **1** JT = 6 JQ = **6**
->Q = ( -0.250, 0.722, 0.203) ->Q = ( 0.000, 1.443, 0.609)
1 1 6 6 -1 0 -1 -0.250 -0.144 -0.406 0.498 0.135692822 **1.846194885**
IT = 1 IQ = **1** JT = 8 JQ = **8**
->Q = ( -0.250, 0.722, 0.203) ->Q = ( 0.000, 0.577, 0.609)
1 1 8 8 0 0 -1 0.250 -0.144 -0.406 0.498 0.017676555 **0.240501782**
My expected output is:
IQ JQ J_ij [meV]
1 1 0.000000000
1 6 1.846194885
1 8 0.240501782
The values come from the bold text (** **); the first line is only a header.
Could you please try the following? It was written and tested with the shown examples.
awk '
BEGIN{
  print "IQ JQ J_ij [meV]"
}
FNR>1 && /IQ =/{
  value=$6 OFS $12
  found=1
  next
}
found && NF && !/ ->Q/{
  if(value){
    print value OFS $NF
  }
  value=found=""
}' Input_file
Output will be as follows.
IQ JQ J_ij [meV]
1 1 0.000000000
1 6 1.846194885
1 8 0.240501782
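As a side note on why the original attempt printed the wrong thing: the loop for(i=2; i<=2; i++) runs exactly once, so only one getline is executed and $11 is taken from the "->Q" line rather than from the numeric row. A minimal sketch that stays close to that idea is below. The function name extract_jij is made up, and it assumes the real file has no ** markers and that the numeric row always sits exactly two lines below the "IQ =" line:

```shell
# extract_jij FILE   (hypothetical helper name)
# For every "IT = ... IQ = ... JQ = ..." line, print IQ, JQ and the
# last field of the row two lines further down.
extract_jij() {
    awk '
        BEGIN { print "IQ JQ J_ij [meV]" }
        /IQ =/ {
            iq = $6; jq = $12     # values taken from the "IQ =" line
            getline               # skip the "->Q = ..." line
            getline               # now on the numeric row
            print iq, jq, $NF     # last field is J_ij [meV]
        }
    ' "$1"
}
```

It would be called as extract_jij input.text > output.txt. The answer above is more tolerant of blank lines, because it does not count lines.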

Difference between obj-m+=[modules_list] and obj-m=[modules_list]

What's the difference between obj-m=md1.o md2.o and obj-m+=md1.o md2.o in a kernel makefile?
You can try to infer it with an example:
obj-m= a.o b.o
obj-m1= 1 2 3
obj-m= c.o d.o
obj-m1+= 4 5 6
ALL :
	@echo $(obj-m) -- $(obj-m1)
Its output is
c.o d.o -- 1 2 3 4 5 6
So: = assigns (overwriting), and += appends.

How to get a nice output when calling jconsole?

I've recently started to learn J.
I find it useful when learning a new language to be able to quickly
map a bit of source code to an output and store it for later reference in Emacs org-mode.
But I'm having trouble with the cryptic jconsole when I want to do the evaluation.
For instance jconsole --help doesn't work.
And man jconsole brings up something about a Java tool. Same applies to googling.
I have for instance this bit of code from the tutorial saved in temp.ijs:
m =. i. 3 4
1 { m
23 23 23 23 (1}) m
Now when I run jconsole < temp.ijs, the output is:
4 5 6 7
0 1 2 3
23 23 23 23
8 9 10 11
Ideally, I'd like the output to be:
4 5 6 7
0 1 2 3
23 23 23 23
8 9 10 11
Again, ideally I'd like to have this without changing the source code at all,
i.e. just by passing some flag to jconsole.
Is there a way to do this?
I'm currently going with solving the problem on Emacs side, instead of on jconsole side.
I intersperse the source code with echo'':
(defun org-babel-expand-body:J (body params)
  "Expand BODY according to PARAMS, return the expanded body."
  (mapconcat #'identity (split-string body "\n") "\necho''\n"))
Execute it like this:
(j-strip-whitespace
 (org-babel-eval
  (format "jconsole < %s" tmp-script-file) ""))
And post-process assuming that only first row of each array is misaligned
(that has been my experience so far). Here's the result:
#+begin_src J
m =. i. 3 4
1 { m
23 23 23 23 (1}) m
#+end_src
#+RESULTS:
: 4 5 6 7
:
: 0 1 2 3
: 23 23 23 23
: 8 9 10 11
And here's the post-processing code:
(defun whitespacep (str)
  (string-match "^ *$" str))
(defun match-second-space (s)
  (and (string-match "^ *[^ ]+\\( \\)" s)
       (match-beginning 1)))
(defun strip-leading-ws (s)
  (and (string-match "^ *\\([^ ].*\\)" s)
       (match-string 1 s)))
(defun j-print-block (x)
  (if (= 1 (length x))
      (strip-leading-ws (car x))
    ;; assume only first row misaligned
    (let ((n1 (match-second-space (car x)))
          (n2 (match-second-space (cadr x))))
      (setcar
       x
       (if (and n1 n2)
           (substring (car x) (- n1 n2))
         (strip-leading-ws (car x))))
      (mapconcat #'identity x "\n"))))
(defun j-strip-whitespace (str)
  (let ((strs (split-string str "\n" t))
        out cur s)
    (while (setq s (pop strs))
      (if (whitespacep s)
          (progn (push (nreverse cur) out)
                 ;; reset the current block (explicit nil; a bare
                 ;; (setq cur) is rejected by recent Emacs versions)
                 (setq cur nil))
        (push s cur)))
    (mapconcat #'j-print-block
               (delq nil (nreverse out))
               "\n\n")))
You need to use echo for explicit output rather than relying on the implicit output that jconsole normally produces in its REPL mode.
Create the script, which I'm calling "tst2.js" below, and place the following code in it:
#!/Applications/j64/bin/jconsole
9!:7'+++++++++|-'
m =. i. 3 4
echo 1 { m
echo ''
echo 23 23 23 23 (1}) m
exit''
Of course, if your path to jconsole is different, then update the "shebang" line to be the actual path for your system.
Next, make sure the script is executable:
$ chmod +x tst2.js
or whatever you called your script.
Next, invoke it:
$ ./tst2.js
4 5 6 7
0 1 2 3
23 23 23 23
8 9 10 11
Note that the above output is identical to the output generated when you are in the interactive jconsole.
The problem is with loose declarations. Every time you give the console a command, it replies with the answer. You should format your code in a verb and have it echo what you need.
foo =: 3 : 0
m =. i. 3 4
echo ''
echo 1 { m
echo ''
echo 23 23 23 23 (1}) m
''
)
foo''
It can also be nameless and self executing if you're in a hurry:
3 : 0 ''
m =. i. 3 4
echo ''
echo 1 { m
echo ''
echo 23 23 23 23 (1}) m
''
)

Find all users who have more than N processes and echo them in the shell

I'm writing a script in ksh. I need to find all users who have more than N processes and echo them in the shell.
N is read as input by the ksh script.
I know that I should use ps -elf, but how do I parse it, find the users with more than N processes, and create an array of them? I have a little trouble with arrays in ksh. Please help. Maybe a simple solution without creating an array would work for me instead.
s162103#helios:/home/s162103$ ps -elf
0 S s153308 4804 1 0 40 20 ? 17666 ? 11:03:08 ? 0:00 /usr/lib/gnome-settings-daemon --oa
0 S root 6546 1327 0 40 20 ? 3584 ? 11:14:06 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPor
0 S webservd 15646 485 0 40 20 ? 2823 ? п╪п╟я─я ? 0:23 /opt/csw/sbin/nginx
0 S s153246 6746 6741 0 40 20 ? 18103 ? 11:14:21 ? 0:00 iiim-panel --disable-crash-dialog
0 S s153246 23512 1 0 40 20 ? 17903 ? 09:34:08 ? 0:00 /usr/bin/metacity --sm-client-id=de
0 S root 933 861 0 40 20 ? 5234 ? 10:26:59 ? 0:00 dtgreet -display :14
...
When I run the following command, I get:
s162103#helios:/home/s162103$ ps -elf | awk '{a[$3]++;}END{for(i in a)if (a[i]>N)print i, a[i];}' N=1
root 118
/usr/sadm/lib/smc/bin/smcboot 3
/usr/lib/autofs/automountd 2
/opt/SUNWut/lib/utsessiond 2
nasty 31
dima 22
/opt/oracle/product/Oracle_WT1/ohs/ 7
/usr/lib/ssh/sshd 5
/usr/bin/bash 11
But /usr/sadm/lib/smc/bin/smcboot is not a user.
It is the last field of the ps -elf output, not the user.
Something like this (assuming the 3rd field of your ps output is the user ID):
ps -elf |
awk '{ a[$3]++ }
     END {
         for (i in a)
             if (a[i] > N)
                 print i, a[i]
     }' N=3
The minimal ps command you want to use here is ps -eo user=. This will just print the username for each process and nothing more. The rest can be done with awk:
ps -eo user= |
awk -v max=3 '{ n[$1]++ }
    END {
        for (user in n)
            if (n[user] > max)
                print n[user], user
    }'
I recommend putting the count in the first column for readability.
read number
# -o user= prints only the user name of each process, so -l and -f are not needed
ps -eo user= | sort | uniq -c | while read count user
do
    if (( count > number ))
    then
        echo "$user"
    fi
done
That is the best solution, and it works!
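For reuse, the counting step could also be pulled into a small function that takes the threshold as an argument. This is only a sketch: the name users_over is made up, and it assumes user names contain no whitespace and arrive one per line on standard input, as ps -eo user= prints them:

```shell
# users_over N   (hypothetical helper name)
# Reads one user name per line on stdin and prints the names
# that occur more than N times.
users_over() {
    sort | uniq -c | awk -v max="$1" '$1 > max { print $2 }'
}
```

A call would look like ps -eo user= | users_over "$number". Since the question asked for an array, in ksh the result can be stored with set -A busy $(ps -eo user= | users_over "$number"), where busy is just an example array name.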

How can I close a netcat connection after a certain character is returned in the response?

We have a very simple tcp messaging script that cats some text to a server port which returns and displays a response.
The part of the script we care about looks something like this:
cat someFile | netcat somehost 1234
The response the server returns is 'complete' once we get a certain character code (specifically &001C) returned.
How can I close the connection when I receive this special character?
(Note: The server won't close the connection for me. While I currently just CTRL+C the script when I can tell it's done, I wish to be able to send many of these messages, one after the other.)
(Note: netcat -w x isn't good enough because I wish to push these messages through as fast as possible)
Create a bash script called client.sh:
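#!/bin/bash
# Send the request to the server (under nc -c, stdout is the socket).
cat someFile
# Relay each response line to fd 3 until the 0x1C terminator shows up.
while IFS= read -r FOO; do
    printf '%s\n' "$FOO" >&3
    if [[ $FOO =~ $'\x1c' ]]; then
        break
    fi
done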
Then invoke netcat from your main script like so:
3>&1 nc -c ./client.sh somehost 1234
(You'll need bash version 3 for the regexp matching).
This assumes that the server is sending data in lines - if not you'll have to tweak client.sh so that it reads and echoes a character at a time.
How about this?
Client side:
awk -v RS=$'\x1c' 'NR==1;{exit 0;}' < /dev/tcp/host-ip/port
Testing:
# server side test script
while true; do ascii -hd; done | { netcat -l 12345; echo closed...;}
# Generate 'some' data for testing & pipe to netcat.
# After netcat connection closes, echo will print 'closed...'
# Client side:
awk -v RS=J 'NR==1; {exit;}' < /dev/tcp/localhost/12345
# Changed end character to 'J' for testing.
# Didn't wish to write a server side script to generate 0x1C.
Client side produces:
0 NUL 16 DLE 32 48 0 64 # 80 P 96 ` 112 p
1 SOH 17 DC1 33 ! 49 1 65 A 81 Q 97 a 113 q
2 STX 18 DC2 34 " 50 2 66 B 82 R 98 b 114 r
3 ETX 19 DC3 35 # 51 3 67 C 83 S 99 c 115 s
4 EOT 20 DC4 36 $ 52 4 68 D 84 T 100 d 116 t
5 ENQ 21 NAK 37 % 53 5 69 E 85 U 101 e 117 u
6 ACK 22 SYN 38 & 54 6 70 F 86 V 102 f 118 v
7 BEL 23 ETB 39 ' 55 7 71 G 87 W 103 g 119 w
8 BS 24 CAN 40 ( 56 8 72 H 88 X 104 h 120 x
9 HT 25 EM 41 ) 57 9 73 I 89 Y 105 i 121 y
10 LF 26 SUB 42 * 58 : 74
After 'J' appears, server side closes & prints 'closed...', ensuring that the connection has indeed closed.
Try:
(cat somefile; sleep $timeout) | nc somehost 1234 | sed -e '{s/\x1c.*//;T skip;q;:skip}'
This requires GNU sed.
How it works:
{
    s/\x1c.*//; # search for \x1c; if we find it, kill it and the rest of the line
    T skip;     # goto label skip if the last s/// failed
    q;          # quit, printing current pattern buffer
    :skip       # label skip
}
Note that this assumes there'll be a newline after \x1c - sed won't see it otherwise, as sed operates line-by-line.
Maybe have a look at Ncat as well:
"Ncat is the culmination of many key features from various Netcat incarnations such as Netcat 1.x, Netcat6, SOcat, Cryptcat, GNU Netcat, etc. Ncat has a host of new features such as "Connection Brokering", TCP/UDP Redirection, SOCKS4 client and server supprt, ability to "Chain" Ncat processes, HTTP CONNECT proxying (and proxy chaining), SSL connect/listen support, IP address/connection filtering, plus much more."
http://nmap-ncat.sourceforge.net
This worked best for me. Just read the output with a while loop and check each line for the 0x1C character with an if statement.
while IFS= read -r i; do
    if [[ $i == *$'\x1c'* ]]; then   # Read until a line contains 0x1C, then exit
        break
    fi
    echo "$i"
done < <(cat someFile | netcat somehost 1234)
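Several of the answers above depend on the server sending a newline around the terminator. If it does not, one option in bash is to let read use 0x1C itself as the delimiter. This is a minimal sketch, not a tested replacement for the scripts above: the name read_until_fs is made up, it needs bash (read -d is not POSIX), and it assumes the response contains no NUL bytes and is small enough to hold in a variable:

```shell
#!/bin/bash
# read_until_fs   (hypothetical helper name)
# Copies stdin to stdout up to, but not including, the first 0x1C byte.
read_until_fs() {
    local msg
    # -d makes 0x1C the delimiter instead of newline; -r keeps backslashes as-is.
    # read returns non-zero if input ends before the delimiter is seen,
    # but msg still holds whatever was read, so it is printed either way.
    IFS= read -r -d $'\x1c' msg
    printf '%s\n' "$msg"
}
```

Usage would be along the lines of read_until_fs < <(cat someFile | netcat somehost 1234). Once the function returns, the pipe is closed on the reading side, so netcat should stop the next time it tries to write to it.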
