I'm trying to execute an svn command on a Windows machine and capture its output.
Code:
import subprocess
cmd = "svn log -l1 https://repo/path/trunk | grep ^r | awk '{print \$3}'"
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
Running it produces this error:

'grep' is not recognized as an internal or external command,
operable program or batch file.

I do understand that grep is not a Windows utility.
Is this limited to Linux only?
Can we execute the same on Windows?
Is my code right?
On Windows your command will look something like the following:
svn log -l1 https://repo/path/trunk | find "string_to_find"
You need to use the find utility on Windows to get the same effect as grep. For example:
svn --version | find "ra"
* ra_svn : Module for accessing a repository using the svn network protocol.
* ra_local : Module for accessing a repository on local disk.
* ra_serf : Module for accessing a repository via WebDAV protocol using serf.
Use svn log --search FOO instead of grepping the command's output.
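A minimal sketch of that from Python (note that --search requires Subversion 1.8 or later; FOO here is a placeholder for whatever string you are looking for):

import subprocess

# Sketch: let svn do the filtering itself instead of piping through grep.
# "FOO" is a placeholder search string; --search needs Subversion 1.8+.
p = subprocess.run(["svn", "log", "-l1", "--search", "FOO",
                    "https://repo/path/trunk"],
                   capture_output=True, text=True)
print(p.stdout)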
grep and awk are certainly available for Windows as well, but there is really no need to install them -- the code is easy to replace with native Python.
import subprocess

p = subprocess.run(["svn", "log", "-l1", "https://repo/path/trunk"],
                   capture_output=True, text=True)
for line in p.stdout.splitlines():
    # grep ^r
    if line.startswith('r'):
        # awk '{ print $3 }'
        print(line.split()[2])
Because we don't need a pipeline and just run a single static command, we can avoid shell=True.
Because we don't want to do the necessary plumbing (which you forgot anyway) for Popen(), we prefer subprocess.run(). With capture_output=True we conveniently get its output in the resulting object's stdout attribute; because we expect text output, we pass text=True (in older Python versions you might need to switch to the old, slightly misleading synonym universal_newlines=True).
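For comparison, here is a rough sketch of the plumbing Popen() would have needed to capture the same output (communicate() is the part the original code was missing):

import subprocess

p = subprocess.Popen(["svn", "log", "-l1", "https://repo/path/trunk"],
                     stdout=subprocess.PIPE, text=True)
# communicate() waits for the command to finish and collects its stdout
stdout, _ = p.communicate()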
I guess the intent is to search for the committer in each revision's output, but this will incorrectly grab the third token on any line which starts with an r (so if you have a commit message like "refactored to use Python native code" the code will extract use from that). A better approach altogether is to request machine-readable output from svn and parse that (but it's unfortunately rather clunky XML, so that's another not entirely trivial rabbit hole for you). Perhaps as middle ground, implement a more specific pattern for finding those lines -- maybe look for a specific number of fields, and static strings where you know where to expect them.
if line.startswith('r'):
    fields = line.split()
    # the header line in the sample below splits into 14 whitespace-separated fields
    if len(fields) == 14 and fields[1] == '|' and fields[3] == '|':
        print(fields[2])
You could also craft a regular expression to look for a date stamp in the third |-separated field, and the number of log message lines in the fourth.
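A rough sketch of such a pattern, keyed to the header line shown in the sample below (it reuses p.stdout from the code above):

import re

# Sketch: match the entire revision header line and capture the author.
header = re.compile(
    r'^r\d+ \| (\S+) \| '                    # revision | author |
    r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} '  # date stamp
    r'[+-]\d{4} \([^)]+\) \| \d+ lines?$')   # zone (weekday) | N lines
for line in p.stdout.splitlines():
    m = header.match(line)
    if m:
        print(m.group(1))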
For the record, a complete commit message from Subversion looks like
------------------------------------------------------------------------
r16110 | tripleee | 2020-10-09 10:41:13 +0300 (Fri, 09 Oct 2020) | 4 lines
refactored to use native Python instead of grep + awk
(which is a useless use of grep anyway; see http://www.iki.fi/era/unix/award.html#grep)
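For completeness, a minimal sketch of the XML approach mentioned above (svn log --xml wraps each revision in a logentry element whose author child holds the committer):

import subprocess
import xml.etree.ElementTree as ET

# Sketch: request machine-readable XML output and extract the committer.
p = subprocess.run(["svn", "log", "-l1", "--xml", "https://repo/path/trunk"],
                   capture_output=True, text=True, check=True)
root = ET.fromstring(p.stdout)
for logentry in root.findall("logentry"):
    print(logentry.findtext("author"))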
I noticed something a bit odd while fooling around with sed. If you try to remove multiple line intervals (by number) from a file, but any interval specified later in the list is fully contained within an interval earlier in the list, then an additional single line is removed after the specified (larger) interval.
seq 10 > foo.txt
sed '2,7d;3,6d' foo.txt
1
9
10
This behaviour was behind an annoying bug for me, since in my script I generated the interval endpoints on the fly, and in some cases the intervals produced were redundant. I can clean this up, but I can't think of a good reason why sed would behave this way on purpose.
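For what it's worth, cleaning it up amounts to merging nested or overlapping intervals before generating the sed commands; a minimal Python sketch:

def merge_intervals(intervals):
    """Merge overlapping, nested, or adjacent (start, end) line ranges."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)  # absorb into previous range
        else:
            merged.append([start, end])
    return merged

print(merge_intervals([(2, 7), (3, 6)]))  # [[2, 7]] -- a single 2,7d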
Since this question was highlighted as needing an answer in the Stack Overflow Weekly Newsletter email for 2015-02-24, I'm converting the comments above (which provide the answer) into a formal answer. Unattributed comments here were made by me in essentially equivalent form.
Thank you for a concise, complete question. The result is interesting. I can reproduce it with your script. Intriguingly, sed '3,6d;2,7d' foo.txt (with the delete operations in the reverse order) produces the expected answer with 8 included in the output. That makes it look like it might be a reportable bug in (GNU) sed, especially as BSD sed (on Mac OS X 10.10.2 Yosemite) works correctly with the operations in either order. I tested using 'sed (GNU sed) 4.2.2' from an Ubuntu 14.04 derivative.
More data points for you/them. Both of these include 8 in the output:
sed -e '/2/,/7/d' -e '/3/,/6/d' foo.txt
sed -e '2,7d' -e '/3/,/6/d' foo.txt
By contrast, this does not:
sed -e '/2/,/7/d' -e '3,6d' foo.txt
The latter surprised me (even accepting the basic bug).
Beats me. I thought, given some of sed's arcane constructs, that you might be missing the batman symbol or something from the middle of your command, but sed -e '2,7d' -e '3,6d' foo.txt behaves the same way, and swapping the order produces the expected results (GNU sed 4.2.2 on Cygwin). /bin/sed on Solaris always produces the expected result, and interestingly so does GNU sed 3.02. Ed Morton
More data: it only seems to happen with sed 4.2.2 if the 2nd range is a subset of the first: sed '2,5d;2,5d' shows the bug, sed '2,5d;1,5d' and sed '2,5d;2,6d' do not. glenn jackman
The GNU sed home page says "Please send bug reports to bug-sed at gnu.org" (except it has an # in place of ' at '). You've got a good reproduction; be explicit about the output you expect vs the output you get (they'll get the point, but it's best to make sure they can't misunderstand). Point out that the reverse ordering of the commands works as expected, and give the various other commands as examples of working or not working. (You could even give this Q&A URL as a cross-reference, but make sure that the bug report is self-contained so that it can be understood even if no-one follows the URL.)
You can also point to BSD sed (and the Solaris version, and the older GNU 3.02 sed) as behaving as expected. With the old version GNU sed working, it means this is arguably a regression. […After a little experimentation…] The breakage occurred in the 4.1 release; the 4.0.9 release is OK. (I also checked 4.1.5 and 4.2.1; both are broken.) That will help the maintainers if they want to find the trouble by looking at what changed.
The OP noted:
Thanks everyone for comments and additional tests. I'll submit a bug report to GNU sed and post their response. santayana
Is there a way to access a Japanese or Chinese IME either from the command line or from Python? I have Linux/OS X/Win8 boxes, so whichever system exposes the most easily accessible API is fine.
I'm experimenting with building a Japanese kana-kanji conversion algorithm and would like to establish a baseline using existing tools. I also have some collections of kana I would like to process.
Preferably I would like something along the lines of
$ ime JP "きしゃのきしゃがきしゃできしゃした"
貴社の記者が汽車で帰社した
I've looked at anthy, mozc and dbus on Linux but can't find any way to interact with them via the terminal or scripting (such as Python).
Anthy provides a CLI tool.
Personally, I prefer google's IME / mozc for better results, but perhaps this helps.
The source for anthy (sourceforge, file anthy-9100h.tar.gz) includes a simple CLI program for testing. Download the source file, extract it, and run
./configure && make
Enter the directory test, which contains the binary anthy. By default, it reads from test.txt and uses EUC-JP encoding.
Simple test:
Input file test.txt
*にほんごにゅうりょく
*もももすももももものうち。
Run (using iconv to convert the output to UTF-8):
./anthy --all | iconv -f EUC-JP -t UTF-8
Output:
1:(にほんごにゅうりょく)
|にほんご|にゅうりょく
にほんご(日本語:(1,1000,N,72089)2500,001 ,にほんご:(N,0,-)2 ,ニホンゴ:(N,0,-)1 ,):
にゅうりょく(入力:(1,1000,N,62394)2500,001 ,にゅうりょく:(N,0,-)2 ,ニュウリョク:(N,0,-)1 ,):
2:(もももすももももものうち。)
|ももも|すももも|もものうち|。
ももも(桃も:(,1000,Ny,72089)225,279 ,ももも:(N,1000,Ny,72089)220,773 ,モモも:(,1000,Ny,72089)205,004 ,腿も:(,1000,Ny,72089)204,722 ,股も:(,1000,Ny,72089)146,431 ,モモモ:(N,0,-)1 ,):
すももも(すももも:(N,1000,Ny,72089)202,751 ,スモモも:(,1000,Ny,72089)168,959 ,李も:(,1000,Ny,72089)168,677 ,スモモモ:(N,0,-)1 ,):
もものうち(桃のうち:(,1000,N,655)2,047 ,もものうち:(N,1000,N,655)2,006 ,モモのうち:(,1000,N,655)1,863 ,腿のうち:(,1000,N,655)1,861 ,股のうち:(,1000,N,655)1,331 ,モモノウチ:(N,0,-)1 ,):
。(。:(1N,100,N,70203)57,040 ,.:(1,100,N,70203)52,653 ,.:(1,100,N,70203)3,840 ,):
You can uncomment some printf statements in the source files test/main.c and src-main/context.c to make the output more readable/parsable, e.g.:
1 にほんごにゅうりょく
にほんご 日本語
にゅうりょく 入力
2 もももすももももものうち。
ももも 桃も
すももも すももも
もものうち 桃のうち
。 。
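If you would rather drive this from Python, here is a minimal sketch (assuming the anthy test binary sits in the current directory and reads test.txt by default, as described above):

import subprocess
from pathlib import Path

# Sketch: hand one kana sentence to the anthy test binary and decode its
# EUC-JP output; the "*" prefix marks an input line, as in test.txt above.
Path("test.txt").write_text("*きしゃのきしゃがきしゃできしゃした\n",
                            encoding="euc_jp")
out = subprocess.run(["./anthy", "--all"], capture_output=True)
print(out.stdout.decode("euc_jp"))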
In the Linux shell, I am trying to extract links to JPG files from a downloaded HTML file. So far I have only got to this point:
grep 'http://[:print:]*.jpg' 'www_page.html'
I don't want to use auxiliary commands like 'tr', 'cut', 'sed', etc. 'lynx' is okay!
Using grep alone without massaging the file is doable, but not recommended, as many have pointed out in the comments.
If you can loosen up your requirements a bit, you can use HTML Tidy to massage the downloaded HTML file so that each HTML element is on its own line; then the regular expression can stay as simple as you wanted, something like this:
$ tidy file.html | grep -o 'http://[[:print:]]*\.jpg'
Note the use of the -o option to grep, which prints only the matching part of the input.
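If a short script is acceptable, parsing the HTML properly avoids the regular-expression fragility altogether; a minimal Python sketch (file name taken from the question):

from html.parser import HTMLParser

# Sketch: collect .jpg URLs from src/href attributes instead of
# pattern-matching raw HTML.
class JpgCollector(HTMLParser):
    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.lower().endswith(".jpg"):
                print(value)

with open("www_page.html", encoding="utf-8", errors="replace") as f:
    JpgCollector().feed(f.read())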
I would like to convert some PCAP traces to NetFlow format for further analysis with NetFlow tools. Is there any way to do that?
Specifically, I want to use the "flow-export" tool in order to extract some fields of interest from a NetFlow trace, as follows:
$ flow-export -f2 -mUNIX_SECS,SYSUPTIME,DPKTS,DOCTETS < mynetflow.trace
In this case, the mynetflow.trace file is produced by converting a PCAP file using the following commands:
$ nfcapd -p 12345 -l ./
$ softflowd -n localhost:12345 -r mytrace.pcap
This generates a NetFlow trace, but it cannot be used by flow-export correctly, since it is not in the right format. I also tried piping the output of the following command to flow-export:
$ flow-import -V1 -z0 -f0 <mynetflow.trace | flow-export -f2 -mUNIX_SECS,SYSUPTIME,DPKTS,DOCTETS
but the output of the first command generated zero timestamps.
Any ideas?
I took a look at the flow-export documentation, and there are some acknowledged bugs in the pcap implementation. I'm not sure whether they have been fixed yet.
Depending on the content of your capture, you have a couple of other options:
If you captured straight-up traffic from a link and you want to turn it into NetFlow format, you can download a free NetFlow exporter tool that reads PCAP here:
FlowTraq Free Exporter
or here:
NProbe
If you captured NetFlow traffic in transit (say UDP/2055), then you can replay it with a tool like 'tcpreplay', available in any Linux distribution.
If you are using a Linux environment, you can use the argus Linux package. Just install argus using apt or your distribution's package manager, and then you can use this with Argus' ra client to get the binetflow format.
Here is the command (using mytrace.pcap from the question as the input capture):
argus -F /mnt/argus.conf -r mytrace.pcap -w - | ra -F /mnt/ra.conf -Z b -n > mytrace.binetflow
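If you are converting many capture files, a small Python wrapper around the same pipeline might look like this (a sketch; the config paths are the ones above, and the file list is a hypothetical example):

import subprocess

# Sketch: run the argus | ra pipeline for each capture file, writing
# foo.pcap -> foo.binetflow. shell=True because of the pipe and redirect.
for f in ["mytrace.pcap"]:
    out = f.split(".")[0] + ".binetflow"
    cmd = ("argus -F /mnt/argus.conf -r " + f + " -w - | "
           "ra -F /mnt/ra.conf -Z b -n > " + out)
    subprocess.run(cmd, shell=True, check=True)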