Extracting checksum from single line - Linux

I'm trying to get the MD5 sum of a specific Java package by going through: http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html
However, the entire table is on a single line in the HTML source, which makes it a little trickier.

Since you have tagged your question with sed, grep, etc., I assume you'll be doing this from Linux, so you can use a Perl one-liner:
perl -MLWP::Simple -e "$\ = $/; print for get('http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html') =~ m|<td>([a-f0-9]{32})</td>|g;"
This first downloads the HTML, then extracts the 32-character hex sums from the <td> tags with a regex ($\ = $/ makes each print append a newline). Pretty simple, yet powerful!
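Judging from the checksum table dumped later in this thread, the first few lines of output would be:
80e14facc0aa784f44d8f142025dd020
a2965bc7591a257da8c09772f15f6195
457fb449a4486860ec5bde6c28ce8ec4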

Here is what you asked for, using curl + GNU grep:
curl -s http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html|grep -Po '(?<=<td>)[a-f0-9]{32}'
Explanation
The curl command fetches the HTML to stdout and pipes it to grep.
In grep -Po '(?<=<td>)[a-f0-9]{32}', the (?<=<td>) part is a positive lookbehind assertion, so only the MD5 sums themselves are printed (-P enables Perl-compatible regexes, -o prints just the matching part). The same lookbehind syntax is supported in Java as well.
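If you want the sum for one specific file, you can anchor on the filename instead (a sketch: the filename</td><td>sum layout is assumed from the page's single-line HTML, and \K drops the already-matched prefix):
curl -s http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html | grep -Po 'jdk-7u51-linux-x64\.tar\.gz</td><td>\K[a-f0-9]{32}'
764f96c4b078b80adaa5983e75470ff2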
For your new request, I recommend lynx (a text-based web browser). If you have it installed, run this command:
lynx -dump http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html |grep jdk-7u51-solaris-sparc.tar.Z
jdk-7u51-solaris-sparc.tar.Z eb2ebfe3217d306f0ee549edc1875a93
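To get just the sum from that line, you could add awk (a sketch based on the two-column output above):
lynx -dump http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html | awk '/jdk-7u51-solaris-sparc\.tar\.Z/ {print $2}'
eb2ebfe3217d306f0ee549edc1875a93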
Explanation
1) lynx is a text-based web browser; here are its homepage and some introductions:
http://lynx.isc.org/lynx2.8.7/index.html
http://en.wikipedia.org/wiki/Lynx_(web_browser)
http://en.wikipedia.org/wiki/Text-based_web_browser
2) lynx with the -dump option renders the page as plain text while preserving the layout; I use it as an html2txt tool. Here is the relevant part of the dump for your reference:
Java SE Binaries Checksum
Checksum for Java SE 7u51 binaries
Filename MD5 Checksum
jdk-7u51-linux-arm-vfp-hflt.tar.gz 80e14facc0aa784f44d8f142025dd020
jdk-7u51-linux-arm-vfp-sflt.tar.gz a2965bc7591a257da8c09772f15f6195
jdk-7u51-linux-i586.rpm 457fb449a4486860ec5bde6c28ce8ec4
jdk-7u51-linux-i586.tar.gz 909d353c1caf6b3b54cc20767a7778ef
jdk-7u51-linux-x64.rpm c523e7339d925c1e6c5994813f7c9e86
jdk-7u51-linux-x64.tar.gz 764f96c4b078b80adaa5983e75470ff2
jdk-7u51-macosx-x64.dmg 73e9cc08d590021706e117c81bc9a4a9
jdk-7u51-solaris-i586.tar.Z 9127418718bec67a4146c5dc1da15155
jdk-7u51-solaris-i586.tar.gz cd914ce06ff537a3acb249d23baf6244
jdk-7u51-solaris-x64.tar.Z 5ee1d6b0d607f80ac0e376485d70e9e4
jdk-7u51-solaris-x64.tar.gz 6e00698dc72b707580f11c4e0288ab2b
jdk-7u51-solaris-sparc.tar.Z eb2ebfe3217d306f0ee549edc1875a93
jdk-7u51-solaris-sparc.tar.gz 60bdb8a9b19db80848d8b6c27466276b
jdk-7u51-solaris-sparcv9.tar.Z 9da60e11238b288a5339688acd64abe0
jdk-7u51-solaris-sparcv9.tar.gz 1cb3c5e8cdcad6c9bfaffc3874187786
jdk-7u51-windows-i586.exe 121b2a740e18bc00b0e13f4537e5f1bc
jdk-7u51-windows-x64.exe d1367410be659f1b47e554e7bd011ea0
jre-7u51-linux-i586.rpm 28d0ee36020023904e64afeebc9555cc
jre-7u51-linux-i586.tar.gz f133f125ca93acef3f70d1912cc2f4b0
jre-7u51-linux-x64.rpm d914baffa3cb378a6054969d7d9bbbd0
jre-7u51-linux-x64.tar.gz 1f6a93cc5ef5f66bb01bc39fd731cd9f
jre-7u51-macosx-x64.dmg b66f5af9e3607dc5727f752a9d28b7fd
jre-7u51-macosx-x64.tar.gz cbd57817ea302be8b2c44968e130bb9b
jre-7u51-solaris-i586.tar.gz 61c5daacea83dc1b267e84bf21e22645
jre-7u51-solaris-x64.tar.gz f03c4d69124f0595db32e20f2aa517f2
jre-7u51-solaris-sparc.tar.gz f9b459dabd97428e95275e259422d6a7
jre-7u51-solaris-sparcv9.tar.gz 32cb98b794bc01ca79f1b6e51fe09c9c
jre-7u51-windows-i586-iftw.exe 5e8cb14f5264af82f66008306e56eaa8
jre-7u51-windows-i586.exe 1af9e2aa8264b023404a76d3fb6751fe
jre-7u51-windows-i586.tar.gz 3921c19528d180902939b9f4c9ac92f1
jre-7u51-windows-x64.exe b0f3a9c0f4c2c66127223ba3644b54f6
jre-7u51-windows-x64.tar.gz 1931de2341f22408be9d6639205675c9
server-jre-7u51-linux-x64.tar.gz c5a034f4222bac326101799bcb20509c
server-jre-7u51-solaris-i586.tar.gz 955d2884960124e93699008236d736fe
server-jre-7u51-solaris-x64.tar.gz b858f9326986cfc7f7cceb4b166c0bfa
server-jre-7u51-solaris-sparc.tar.gz 04c708b162e6210b546b0eef188d4adb
server-jre-7u51-solaris-sparcv9.tar.gz 7ae0e51f5836289d71ad614326c5e9c8
server-jre-7u51-windows-x64.tar.gz 4d9855b5b54cbae9d04318eae9b8e11e
Use the md5sum command line utility on Linux to verify the integrity of
the downloaded file.
Use the md5 command line utility on Mac OS X to verify the integrity of
the downloaded file
See the following articles for guidance on how to verify these
checksums on other platforms:
* Microsoft Windows: [29]Availability and description of the File
Checksum Integrity Verifier utility

Finally went with sed:
$ curl -s http://www.oracle.com/technetwork/java/javase/downloads/javase8-binaries-checksum-2133161.html | sed -nr 's|.*>jdk-8-linux-x64.tar.gz</td><td>(<*[^<]*).*|\1|p'
7e9e5e5229c6603a4d8476050bbd98b1
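To verify a downloaded copy against that sum in one go, md5sum -c can read a "sum  filename" line from stdin (a sketch; assumes the tarball is in the current directory):
sum=$(curl -s http://www.oracle.com/technetwork/java/javase/downloads/javase8-binaries-checksum-2133161.html | sed -nr 's|.*>jdk-8-linux-x64.tar.gz</td><td>(<*[^<]*).*|\1|p')
echo "$sum  jdk-8-linux-x64.tar.gz" | md5sum -c -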

Related

How to execute svn command along with grep on windows?

Trying to execute an svn command on a Windows machine and capture its output.
Code:
import subprocess
cmd = "svn log -l1 https://repo/path/trunk | grep ^r | awk '{print \$3}'"
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
This fails with:
'grep' is not recognized as an internal or external command,
operable program or batch file.
I do understand that grep is not a Windows utility.
Is this only possible on Linux? Can the same be done on Windows? Is my code right?
On Windows your command will look something like the following:
svn log -l1 https://repo/path/trunk | find "string_to_find"
You need to use the find utility on Windows to get the same effect as grep. For example:
svn --version | find "ra"
* ra_svn : Module for accessing a repository using the svn network protocol.
* ra_local : Module for accessing a repository on local disk.
* ra_serf : Module for accessing a repository via WebDAV protocol using serf.
Use svn log --search FOO instead of grepping the command's output.
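For example (assuming Subversion 1.8 or newer, which introduced --search; the author name here is just an illustration):
svn log -l1 --search tripleee https://repo/path/trunk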
grep and awk are certainly available for Windows as well, but there is really no need to install them -- the code is easy to replace with native Python.
import subprocess

p = subprocess.run(["svn", "log", "-l1", "https://repo/path/trunk"],
                   capture_output=True, text=True)
for line in p.stdout.splitlines():
    # grep ^r
    if line.startswith('r'):
        # awk '{ print $3 }'
        print(line.split()[2])
Because we don't need a pipeline, and just run a single static command, we can avoid shell=True.
Because we don't want to do the necessary plumbing (which you forgot anyway) for Popen(), we prefer subprocess.run(). With capture_output=True we conveniently get its output in the resulting object's stdout attribute; because we expect text output, we pass text=True (in older Python versions you might need to switch to the old, slightly misleading synonym universal_newlines=True).
I guess the intent is to search for the committer in each revision's output, but this will incorrectly grab the third token on any line which starts with an r (so if you have a commit message like "refactored to use Python native code", the code will extract use from that). A better approach altogether is to request machine-readable output from svn and parse that (but it's unfortunately rather clunky XML, so there's another not entirely trivial rabbit hole for you; see the sketch after the sample log message below). Perhaps as a middle ground, implement a more specific pattern for finding those lines -- maybe look for a specific number of fields, and static strings where you know to expect them.
if line.startswith('r'):
    fields = line.split()
    # a header line like the sample below splits into 14 whitespace-separated fields
    if len(fields) == 14 and fields[1] == '|' and fields[3] == '|':
        print(fields[2])
You could also craft a regular expression to look for a date stamp in the third |-separated field, and the number of changed lines in the fourth.
For the record, a complete commit message from Subversion looks like
------------------------------------------------------------------------
r16110 | tripleee | 2020-10-09 10:41:13 +0300 (Fri, 09 Oct 2020) | 4 lines
refactored to use native Python instead of grep + awk
(which is a useless use of grep anyway; see http://www.iki.fi/era/unix/award.html#grep)
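For completeness, here is a minimal sketch of the machine-readable route mentioned above (assumes xmllint from libxml2 is available):
svn log -l1 --xml https://repo/path/trunk | xmllint --xpath 'string(//logentry/author)' -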

sed behavior: Pattern space and Address range [duplicate]

I noticed something a bit odd while fooling around with sed. If you try to remove multiple line intervals (by number) from a file, but any interval specified later in the list is fully contained within an interval earlier in the list, then an additional single line is removed after the specified (larger) interval.
seq 10 > foo.txt
sed '2,7d;3,6d' foo.txt
1
9
10
This behaviour was behind an annoying bug for me, since in my script I generated the interval endpoints on the fly, and in some cases the intervals produced were redundant. I can clean this up, but I can't think of a good reason why sed would behave this way on purpose.
Since this question was highlighted as needing an answer in the Stack Overflow Weekly Newsletter email for 2015-02-24, I'm converting the comments above (which provide the answer) into a formal answer. Unattributed comments here were made by me in essentially equivalent form.
Thank you for a concise, complete question. The result is interesting. I can reproduce it with your script. Intriguingly, sed '3,6d;2,7d' foo.txt (with the delete operations in the reverse order) produces the expected answer with 8 included in the output. That makes it look like it might be a reportable bug in (GNU) sed, especially as BSD sed (on Mac OS X 10.10.2 Yosemite) works correctly with the operations in either order. I tested using 'sed (GNU sed) 4.2.2' from an Ubuntu 14.04 derivative.
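For illustration, the reversed-order command and its (correct) output on the same input:
$ sed '3,6d;2,7d' foo.txt
1
8
9
10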
More data points for you/them. Both of these include 8 in the output:
sed -e '/2/,/7/d' -e '/3/,/6/d' foo.txt
sed -e '2,7d' -e '/3/,/6/d' foo.txt
By contrast, this does not:
sed -e '/2/,/7/d' -e '3,6d' foo.txt
The latter surprised me (even accepting the basic bug).
Beats me. I thought given some of sed's arcane constructs that you might be missing the batman symbol or something from the middle of your command but sed -e '2,7d' -e '3,6d' foo.txt behaves the same way and swapping the order produces the expected results (GNU sed 4.2.2 on Cygwin). /bin/sed on Solaris always produces the expected result and interestingly so does GNU sed 3.02. Ed Morton
More data: it only seems to happen with sed 4.2.2 if the 2nd range is a subset of the first: sed '2,5d;2,5d' shows the bug, sed '2,5d;1,5d' and sed '2,5d;2,6d' do not. glenn jackman
The GNU sed home page says "Please send bug reports to bug-sed at gnu.org" (with an @ in place of ' at '). You've got a good reproduction; be explicit about the output you expect vs the output you get (they'll get the point, but it's best to make sure they can't misunderstand). Point out that the reverse ordering of the commands works as expected, and give the various other commands as examples of working or not working. (You could even give this Q&A URL as a cross-reference, but make sure that the bug report is self-contained so that it can be understood even if no-one follows the URL.)
You can also point to BSD sed (and the Solaris version, and the older GNU 3.02 sed) as behaving as expected. With the old version GNU sed working, it means this is arguably a regression. […After a little experimentation…] The breakage occurred in the 4.1 release; the 4.0.9 release is OK. (I also checked 4.1.5 and 4.2.1; both are broken.) That will help the maintainers if they want to find the trouble by looking at what changed.
The OP noted:
Thanks everyone for comments and additional tests. I'll submit a bug report to GNU sed and post their response. santayana

programmatically access IME

Is there a way to access a Japanese or Chinese IME either from the command line or from Python? I have Linux/OSX/Win8 boxes, so whichever system exposes the most easily accessible API is fine.
I'm experimenting with building a Japanese kana-kanji conversion algorithm and would like to establish a baseline using existing tools. I also have some collections of kana I would like to process.
Preferably I would like something along the lines of
$ ime JP "きしゃのきしゃがきしゃできしゃした"
貴社の記者が汽車で帰社した
I've looked at anthy, mozc and dbus on Linux but can't find any way to interact with them via the terminal or from a script (such as Python).
Anthy provides a CLI tool
Personally, I prefer Google's IME / mozc for better results, but perhaps this helps.
The source for anthy (sourceforge, file anthy-9100h.tar.gz) includes a simple CLI program for testing. Download the source file, extract it, and run
./configure && make
Then enter the test directory, which contains the anthy binary. By default, it reads from test.txt and uses EUC-JP encoding.
Simple test:
Input file test.txt
*にほんごにゅうりょく
*もももすももももものうち。
Run (using iconv to convert the output to UTF-8):
./anthy --all | iconv -f EUC-JP -t UTF-8
Output:
1:(にほんごにゅうりょく)
|にほんご|にゅうりょく
にほんご(日本語:(1,1000,N,72089)2500,001 ,にほんご:(N,0,-)2 ,ニホンゴ:(N,0,-)1 ,):
にゅうりょく(入力:(1,1000,N,62394)2500,001 ,にゅうりょく:(N,0,-)2 ,ニュウリョク:(N,0,-)1 ,):
2:(もももすももももものうち。)
|ももも|すももも|もものうち|。
ももも(桃も:(,1000,Ny,72089)225,279 ,ももも:(N,1000,Ny,72089)220,773 ,モモも:(,1000,Ny,72089)205,004 ,腿も:(,1000,Ny,72089)204,722 ,股も:(,1000,Ny,72089)146,431 ,モモモ:(N,0,-)1 ,):
すももも(すももも:(N,1000,Ny,72089)202,751 ,スモモも:(,1000,Ny,72089)168,959 ,李も:(,1000,Ny,72089)168,677 ,スモモモ:(N,0,-)1 ,):
もものうち(桃のうち:(,1000,N,655)2,047 ,もものうち:(N,1000,N,655)2,006 ,モモのうち:(,1000,N,655)1,863 ,腿のうち:(,1000,N,655)1,861 ,股のうち:(,1000,N,655)1,331 ,モモノウチ:(N,0,-)1 ,):
。(。:(1N,100,N,70203)57,040 ,.:(1,100,N,70203)52,653 ,.:(1,100,N,70203)3,840 ,):
You can uncomment some printf statements in the source files test/main.c and src-main/context.c to make the output more readable/parsable, e.g.:
1 にほんごにゅうりょく
にほんご 日本語
にゅうりょく 入力
2 もももすももももものうち。
ももも 桃も
すももも すももも
もものうち 桃のうち
。 。
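To try the kana from your question, you could feed it in the same way (a sketch: anthy reads EUC-JP from test.txt by default, so the input must be converted as well):
$ echo '*きしゃのきしゃがきしゃできしゃした' | iconv -f UTF-8 -t EUC-JP > test.txt
$ ./anthy --all | iconv -f EUC-JP -t UTF-8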

Trying to 'grep' links from downloaded html pages in bash shell environment without cut, sed, tr commands (only e/grep)

In a Linux shell, I am trying to extract links to JPG files from a downloaded HTML file. So far I have only got to this point:
grep 'http://[:print:]*.jpg' 'www_page.html'
I don't want to use auxiliary commands like 'tr', 'cut', 'sed', etc. ... 'lynx' is okay!
Using grep alone without massaging the file is doable but not recommended as many have pointed out in the comments.
If you can loosen up your requirements a bit, then you can use HTML Tidy to massage the downloaded HTML file so that each HTML element is on its own line, which lets the regular expression stay as simple as you wanted. Something like this:
$ tidy file.html|grep -o 'http://[[:print:]]*\.jpg'
Note the use of the -o option to grep to print only the matching part of the input.

Converting a PCAP trace to NetFlow format

I would like to convert some PCAP traces to Netflow format for further analysis with netflow tools. Is there any way to do that?
Specifically, I want to use "flow-export" tool in order to extract some fields of interest from a netflow trace as follows:
$ flow-export -f2 -mUNIX_SECS,SYSUPTIME,DPKTS,DOCTETS < mynetflow.trace
In this case, the mynetflow.trace file is taken by converting a PCAP file using the following commands:
$ nfcapd -p 12345 -l ./
$ softflowd -n localhost:12345 -r mytrace.pcap
This generates a NetFlow trace, but it cannot be used by flow-export correctly, since it is not in the right format. I also tried to pipe the output of the following command into flow-export:
$ flow-import -V1 -z0 -f0 <mynetflow.trace | flow-export -f2 -mUNIX_SECS,SYSUPTIME,DPKTS,DOCTETS
but the output of the first command generated zero timestamps.
Any ideas?
I took a look at the flow-export documentation and there are some acknowledged bugs in the pcap implementation. I'm not sure if they have been fixed yet.
Depending on the content of your capture, you have a couple of other options:
If you captured straight-up traffic from a link and you want to turn it into NetFlow format, you can download a free NetFlow exporter tool that reads PCAP here:
FlowTraq Free Exporter
or here:
NProbe
If you captured NetFlow traffic in transit (say UDP/2055), then you can replay it with a tool like tcpreplay, available in any Linux distribution.
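For example (a sketch; assumes the flow collector is reachable via interface eth0):
$ tcpreplay -i eth0 mytrace.pcap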
If you are using a Linux environment, you can use the argus package. Install argus using apt or your distribution's package manager, and then use Argus' ra client to get the binetflow format.
Here is the command (with mytrace.pcap standing in for your capture file):
argus -F /mnt/argus.conf -r mytrace.pcap -w - | ra -F /mnt/ra.conf -Z b -n > mytrace.binetflow
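If argus is not installed yet, on Debian/Ubuntu the packages would be something like this (package names assumed from Debian's split into server and client packages):
$ sudo apt-get install argus-server argus-client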
