Converting a PCAP trace to NetFlow format - Linux

I would like to convert some PCAP traces to NetFlow format for further analysis with NetFlow tools. Is there any way to do that?
Specifically, I want to use the flow-export tool to extract some fields of interest from a NetFlow trace, as follows:
$ flow-export -f2 -mUNIX_SECS,SYSUPTIME,DPKTS,DOCTETS < mynetflow.trace
In this case, the mynetflow.trace file is produced by converting a PCAP file using the following commands:
$ nfcapd -p 12345 -l ./
$ softflowd -n localhost:12345 -r mytrace.pcap
This generates a NetFlow trace, but it cannot be used by flow-export correctly, since it is not in the right format. I also tried piping the output of the following command into flow-export:
$ flow-import -V1 -z0 -f0 <mynetflow.trace | flow-export -f2 -mUNIX_SECS,SYSUPTIME,DPKTS,DOCTETS
but the output of the first command generated zero timestamps.
Any ideas?

I took a look at the flow-export documentation and there are some acknowledged bugs in the pcap implementation. Not sure if they are fixed yet.
Depending on the content of your capture, you have a couple of other options:
If you captured straight-up traffic from a link and you want to turn that into NetFlow format, you can download a free NetFlow exporter tool that reads PCAP here:
FlowTraq Free Exporter
or here:
NProbe
If you captured NetFlow traffic in transit (say UDP/2055), then you can replay it with a tool like tcpreplay, available in any Linux distribution, as sketched below.
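A minimal sketch of that replay setup, assuming the capture holds NetFlow datagrams and that a collector such as nfcapd listens on this host (the interface, port, and file names are placeholders):
$ nfcapd -p 2055 -l ./flows &                       # collector listening on UDP/2055
$ sudo tcpreplay --intf1=lo netflow_capture.pcap    # resend the captured datagrams
Since tcpreplay resends packets exactly as captured, the datagrams must still be addressed to a host/port your collector answers on; otherwise rewrite them first (e.g. with tcprewrite).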

If you are using a Linux environment, you can use the argus package. Install argus with apt or your distribution's package manager, then pair it with Argus' ra client to produce the binetflow format.
Here is the command:
argus -F /mnt/argus.conf -r mytrace.pcap -w - | ra -F /mnt/ra.conf -Z b -n > mytrace.binetflow
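Once you have the binetflow file, ra's -s option selects which columns to print, which is handy for extracting fields of interest. A sketch (see the ra man page for the exact field names your build supports):
$ ra -r mytrace.binetflow -s stime dur saddr daddr pkts bytes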


How to execute svn command along with grep on Windows?

Trying to execute an svn command on a Windows machine and capture its output.
Code:
import subprocess
cmd = "svn log -l1 https://repo/path/trunk | grep ^r | awk '{print \$3}'"
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
'grep' is not recognized as an internal or external command,
operable program or batch file.
I do understand that 'grep' is not a Windows utility.
Is this only possible to execute on Linux? Can we execute the same on Windows? Is my code right?
For Windows, your command will look something like the following:
svn log -l1 https://repo/path/trunk | find "string_to_find"
You need to use the find utility on Windows to get the same effect as grep.
svn --version | find "ra"
* ra_svn : Module for accessing a repository using the svn network protocol.
* ra_local : Module for accessing a repository on local disk.
* ra_serf : Module for accessing a repository via WebDAV protocol using serf.
Use svn log --search FOO instead of grep-ing the command's output.
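For example (a sketch; --search requires Subversion 1.8 or later, and the search term is a placeholder):
$ svn log -l1 --search "tripleee" https://repo/path/trunk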
grep and awk are certainly available for Windows as well, but there is really no need to install them -- the code is easy to replace with native Python.
import subprocess

p = subprocess.run(["svn", "log", "-l1", "https://repo/path/trunk"],
                   capture_output=True, text=True)
for line in p.stdout.splitlines():
    # grep ^r
    if line.startswith('r'):
        # awk '{ print $3 }'
        print(line.split()[2])
Because we don't need a pipeline, and just run a single static command, we can avoid shell=True.
Because we don't want to do the necessary plumbing (which you forgot anyway) for Popen(), we prefer subprocess.run(). With capture_output=True we conveniently get the output in the resulting object's stdout attribute; because we expect text output, we pass text=True (in older Python versions you might need the old, slightly misleading synonym universal_newlines=True).
I guess the intent is to search for the committer in each revision's output, but this will incorrectly grab the third token on any line which starts with an r (so if you have a commit message like "refactored to use Python native code", the code will extract use from that). A better approach altogether is to request machine-readable output from svn and parse that (but it's unfortunately rather clunky XML, so there's another not entirely trivial rabbit hole for you). Perhaps, as a middle ground, implement a more specific pattern for finding those lines -- maybe look for a specific number of fields, and static strings where you know to expect them.
if line.startswith('r'):
    fields = line.split()
    if len(fields) == 14 and fields[1] == '|' and fields[3] == '|':
        print(fields[2])
You could also craft a regular expression to look for a date stamp in the third |-separated field, and the number of changed lines in the fourth.
For the record, a complete commit message from Subversion looks like
------------------------------------------------------------------------
r16110 | tripleee | 2020-10-09 10:41:13 +0300 (Fri, 09 Oct 2020) | 4 lines
refactored to use native Python instead of grep + awk
(which is a useless use of grep anyway; see http://www.iki.fi/era/unix/award.html#grep)

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time, although I am in the process of getting a computer to do this. Is it possible to unzip only parts of the file to begin testing my scripts?
I am trying to extract a specific SNP at a position in a subset of the samples. I have tried using bcftools to no avail. (If anyone can identify what went wrong with that, I would also really appreciate it. I created an empty file for the output, 722g.990.SNP.INDEL.chrAll.vcf.bgz, but it returns the following error.)
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.bgz
The output type "722g.990.SNP.INDEL.chrAll.vcf.bgz" not recognised
I am planning on trying awk, but need to unzip the file first. Is it possible to partially unzip it so I can try this?
Double check your command line for bcftools view.
The error message 'The output type "something" not recognised' is printed by bcftools when you specify an invalid value for the -O (upper-case O) command line option, like this: -O something. Based on the error message you are getting, it seems that you might have put the file name there.
Check that you don't have your input and output file names the wrong way around in your command. Note that the -o (lower-case o) command line option specifies the output file name, and the file name at the end of the command line is the input file name.
Also, you write that you created an empty file for the output. You don't need to do that; bcftools will create the output file.
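Putting this together, a corrected command would look something like the sketch below. It assumes the bgzipped VCF is the input (listed last), that subset.vcf.gz is the desired output name, and that the input has been indexed, which the -r region option requires:
$ bcftools index -t 722g.990.SNP.INDEL.chrAll.vcf.gz    # one-time: build the index
$ bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -O z -o subset.vcf.gz 722g.990.SNP.INDEL.chrAll.vcf.gz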
I don't have that much experience with bcftools, but generically, if you want to use awk to manipulate a gzipped file, you can pipe into it so the file is only decompressed as needed. You can also pipe the result directly through gzip so it too is compressed, e.g.
gzip -cd largeFile.vcf.gz | awk '{ <some awk> }' | gzip -c > newfile.txt.gz
Note that zcat is equivalent to gzip -cd: -c writes to standard output and -d decompresses.
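For instance, to peek at the VCF header and the first records without writing anything to disk:
$ zcat largeFile.vcf.gz | head -n 50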
As a side note, if you are trying to perform operations on just part of a large file, you may also find the excellent tool less useful. It can be used to view your large file, loading only the needed parts. The -S option is particularly useful for wide formats with many columns, as it stops line wrapping; -N shows line numbers.
less -S largefile.vcf.gz
Quit the view with q; g takes you to the top of the file.

Programmatically access IME

Is there a way to access a Japanese or Chinese IME either from the command line or from Python? I have Linux/OS X/Win8 boxes, so whichever system exposes the most easily accessible API is fine.
I'm experimenting with building a Japanese kana-kanji conversion algorithm and would like to establish a baseline using existing tools. I also have some collections of kana I would like to process.
Preferably I would like something along the lines of
$ ime JP "きしゃのきしゃがきしゃできしゃした"
貴社の記者が汽車で帰社した
I've looked at anthy, mozc, and dbus on Linux but can't find any way to interact with them via the terminal or scripting (such as Python).
Anthy provides a CLI tool.
Personally, I prefer Google's IME / Mozc for better results, but perhaps this helps.
The source for anthy (SourceForge, file anthy-9100h.tar.gz) includes a simple CLI program for testing. Download the source file, extract it, and run:
./configure && make
Then enter the directory test, which contains the binary anthy. By default, it reads from test.txt and uses EUC-JP encoding.
Simple test:
Input file test.txt:
*にほんごにゅうりょく
*もももすももももものうち。
Run (using iconv to convert the output to UTF-8):
./anthy --all | iconv -f EUC-JP -t UTF-8
Output:
1:(にほんごにゅうりょく)
|にほんご|にゅうりょく
にほんご(日本語:(1,1000,N,72089)2500,001 ,にほんご:(N,0,-)2 ,ニホンゴ:(N,0,-)1 ,):
にゅうりょく(入力:(1,1000,N,62394)2500,001 ,にゅうりょく:(N,0,-)2 ,ニュウリョク:(N,0,-)1 ,):
2:(もももすももももものうち。)
|ももも|すももも|もものうち|。
ももも(桃も:(,1000,Ny,72089)225,279 ,ももも:(N,1000,Ny,72089)220,773 ,モモも:(,1000,Ny,72089)205,004 ,腿も:(,1000,Ny,72089)204,722 ,股も:(,1000,Ny,72089)146,431 ,モモモ:(N,0,-)1 ,):
すももも(すももも:(N,1000,Ny,72089)202,751 ,スモモも:(,1000,Ny,72089)168,959 ,李も:(,1000,Ny,72089)168,677 ,スモモモ:(N,0,-)1 ,):
もものうち(桃のうち:(,1000,N,655)2,047 ,もものうち:(N,1000,N,655)2,006 ,モモのうち:(,1000,N,655)1,863 ,腿のうち:(,1000,N,655)1,861 ,股のうち:(,1000,N,655)1,331 ,モモノウチ:(N,0,-)1 ,):
。(。:(1N,100,N,70203)57,040 ,.:(1,100,N,70203)52,653 ,.:(1,100,N,70203)3,840 ,):
You can uncomment some printf statements in the source files test/main.c and src-main/context.c to make the output more readable/parsable, e.g.:
1 にほんごにゅうりょく
にほんご 日本語
にゅうりょく 入力
2 もももすももももものうち。
ももも 桃も
すももも すももも
もものうち 桃のうち
。 。

Extracting checksum from single line

I'm trying to get the MD5 of the specified Java package by going through: http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html
However, that entire table is just a one-liner in the HTML code, so that makes it a little trickier.
Since you have tagged your question with sed, grep, etc., I am assuming you'll do it from Linux, so you can use a Perl one-liner for this.
perl -MLWP::Simple -e "$\ = $/; print for get('http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html') =~ m|<td>([a-f0-9]{32})</td>|g;"
This first downloads the HTML, then parses the hashes out of the <td> tags using a regex. Pretty simple, yet powerful!
Here is what you asked for, using curl + GNU grep:
curl -s http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html|grep -Po '(?<=<td>)[a-f0-9]{32}'
Explanation
The curl command fetches the HTML to stdout and pipes it into the grep command.
grep -Po '(?<=<td>)[a-f0-9]{32}' uses a positive look-behind assertion to match only the MD5 sums. (The same look-behind syntax is supported in Java as well.)
For your new request, I recommend using lynx (a text-based web browser). If you have it ready, run this command:
lynx -dump http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html |grep jdk-7u51-solaris-sparc.tar.Z
jdk-7u51-solaris-sparc.tar.Z eb2ebfe3217d306f0ee549edc1875a93
Explanation:
1) lynx is a text-based web browser; here are its homepage and related introductions:
http://lynx.isc.org/lynx2.8.7/index.html
http://en.wikipedia.org/wiki/Lynx_(web_browser)
http://en.wikipedia.org/wiki/Text-based_web_browser
2) lynx with the -dump option takes a snapshot of the webpage, preserving its formatting. I used it as an HTML-to-text tool. Here is the sample output for your reference:
Java SE Binaries Checksum
Checksum for Java SE 7u51 binaries
Filename MD5 Checksum
jdk-7u51-linux-arm-vfp-hflt.tar.gz 80e14facc0aa784f44d8f142025dd020
jdk-7u51-linux-arm-vfp-sflt.tar.gz a2965bc7591a257da8c09772f15f6195
jdk-7u51-linux-i586.rpm 457fb449a4486860ec5bde6c28ce8ec4
jdk-7u51-linux-i586.tar.gz 909d353c1caf6b3b54cc20767a7778ef
jdk-7u51-linux-x64.rpm c523e7339d925c1e6c5994813f7c9e86
jdk-7u51-linux-x64.tar.gz 764f96c4b078b80adaa5983e75470ff2
jdk-7u51-macosx-x64.dmg 73e9cc08d590021706e117c81bc9a4a9
jdk-7u51-solaris-i586.tar.Z 9127418718bec67a4146c5dc1da15155
jdk-7u51-solaris-i586.tar.gz cd914ce06ff537a3acb249d23baf6244
jdk-7u51-solaris-x64.tar.Z 5ee1d6b0d607f80ac0e376485d70e9e4
jdk-7u51-solaris-x64.tar.gz 6e00698dc72b707580f11c4e0288ab2b
jdk-7u51-solaris-sparc.tar.Z eb2ebfe3217d306f0ee549edc1875a93
jdk-7u51-solaris-sparc.tar.gz 60bdb8a9b19db80848d8b6c27466276b
jdk-7u51-solaris-sparcv9.tar.Z 9da60e11238b288a5339688acd64abe0
jdk-7u51-solaris-sparcv9.tar.gz 1cb3c5e8cdcad6c9bfaffc3874187786
jdk-7u51-windows-i586.exe 121b2a740e18bc00b0e13f4537e5f1bc
jdk-7u51-windows-x64.exe d1367410be659f1b47e554e7bd011ea0
jre-7u51-linux-i586.rpm 28d0ee36020023904e64afeebc9555cc
jre-7u51-linux-i586.tar.gz f133f125ca93acef3f70d1912cc2f4b0
jre-7u51-linux-x64.rpm d914baffa3cb378a6054969d7d9bbbd0
jre-7u51-linux-x64.tar.gz 1f6a93cc5ef5f66bb01bc39fd731cd9f
jre-7u51-macosx-x64.dmg b66f5af9e3607dc5727f752a9d28b7fd
jre-7u51-macosx-x64.tar.gz cbd57817ea302be8b2c44968e130bb9b
jre-7u51-solaris-i586.tar.gz 61c5daacea83dc1b267e84bf21e22645
jre-7u51-solaris-x64.tar.gz f03c4d69124f0595db32e20f2aa517f2
jre-7u51-solaris-sparc.tar.gz f9b459dabd97428e95275e259422d6a7
jre-7u51-solaris-sparcv9.tar.gz 32cb98b794bc01ca79f1b6e51fe09c9c
jre-7u51-windows-i586-iftw.exe 5e8cb14f5264af82f66008306e56eaa8
jre-7u51-windows-i586.exe 1af9e2aa8264b023404a76d3fb6751fe
jre-7u51-windows-i586.tar.gz 3921c19528d180902939b9f4c9ac92f1
jre-7u51-windows-x64.exe b0f3a9c0f4c2c66127223ba3644b54f6
jre-7u51-windows-x64.tar.gz 1931de2341f22408be9d6639205675c9
server-jre-7u51-linux-x64.tar.gz c5a034f4222bac326101799bcb20509c
server-jre-7u51-solaris-i586.tar.gz 955d2884960124e93699008236d736fe
server-jre-7u51-solaris-x64.tar.gz b858f9326986cfc7f7cceb4b166c0bfa
server-jre-7u51-solaris-sparc.tar.gz 04c708b162e6210b546b0eef188d4adb
server-jre-7u51-solaris-sparcv9.tar.gz 7ae0e51f5836289d71ad614326c5e9c8
server-jre-7u51-windows-x64.tar.gz 4d9855b5b54cbae9d04318eae9b8e11e
Use the md5sum command line utility on Linux to verify the integrity of
the downloaded file.
Use the md5 command line utility on Mac OS X to verify the integrity of
the downloaded file
See the following articles for guidance on how to verify these
checksums on other platforms:
* Microsoft Windows: [29]Availability and description of the File
Checksum Integrity Verifier utility
Finally went with sed:
$ curl -s http://www.oracle.com/technetwork/java/javase/downloads/javase8-binaries-checksum-2133161.html | sed -nr 's|.*>jdk-8-linux-x64.tar.gz</td><td>(<*[^<]*).*|\1|p'
7e9e5e5229c6603a4d8476050bbd98b1

Read and parse perf.data

I am recording performance counters from Linux using the command perf record.
I want to use the resulting perf.data as input to other programs. How should I read and parse the data in perf.data? Is there a way to transform it into a .txt or .csv file?
There is a built-in perf.data parser and printer in the perf tool itself: the script subcommand.
To convert a perf.data file:
perf script > perf.data.txt
To convert the output of perf record stored in another file (perf record -o filename.data), use the -i option:
perf script -i filename.data > filename.data.txt
perf script is documented in man perf-script, available online at http://man7.org/linux/man-pages/man1/perf-script.1.html:
perf-script - Read perf.data (created by perf record) and display trace output
This command reads the input file and displays the trace recorded. Use 'perf script' to see a detailed trace of the workload that was recorded.
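If you want something closer to CSV, perf script can restrict its output to selected fields with -F, which standard tools can then reshape. A sketch (which fields are usable depends on what you recorded):
$ perf script -i perf.data -F comm,pid,time,event | awk -v OFS=',' '{$1=$1; print}' > perf.csv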
perf data convert --to-json landed in April.
https://man7.org/linux/man-pages/man1/perf-data.1.html
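Usage is along these lines (a sketch based on the man page; JSON output requires a reasonably recent perf build):
$ perf data convert --to-json perf.data.json -i perf.data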
The quipper sub-project of https://github.com/google/perf_data_converter can parse perf.data files.
If what you have is Nagios service-check performance data rather than a perf.data file, an example command definition that redirects that data to a text file for later processing by another application is shown below:
define command{
command_name store-service-perfdata
command_line /bin/echo -e "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$" >> /usr/local/nagios/var/service-perfdata.dat
}
