How to execute svn command along with grep on windows? - python-3.x

I'm trying to execute an svn command on a Windows machine and capture its output.
Code:
import subprocess
cmd = "svn log -l1 https://repo/path/trunk | grep ^r | awk '{print \$3}'"
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
Running this produces the error:
'grep' is not recognized as an internal or external command,
operable program or batch file.
I do understand that 'grep' is not a Windows utility.
Is this limited to Linux only, or can we execute the same on Windows? Is my code right?

On Windows your command will look something like the following:
svn log -l1 https://repo/path/trunk | find "string_to_find"
You need to use the find utility on Windows to get the same effect as grep. For example:
svn --version | find "ra"
* ra_svn : Module for accessing a repository using the svn network protocol.
* ra_local : Module for accessing a repository on local disk.
* ra_serf : Module for accessing a repository via WebDAV protocol using serf.
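If you want to run such a pipeline from Python, here is a minimal sketch (assuming cmd.exe resolves the pipe when shell=True; "r" is just a placeholder search string):
import subprocess

# find is cmd.exe's substring filter; its search string must be double-quoted.
cmd = 'svn log -l1 https://repo/path/trunk | find "r"'
p = subprocess.run(cmd, shell=True, capture_output=True, text=True)
print(p.stdout)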

Use svn log --search FOO (available since Subversion 1.8) instead of grepping the command's output.
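A hedged example of combining it with the question's command (FOO stands in for whatever string you are looking for):
svn log -l1 --search FOO https://repo/path/trunk
--search matches against the revision's author, date, log message, and list of changed paths, so no external filter is needed.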

grep and awk are certainly available for Windows as well, but there is really no need to install them -- the code is easy to replace with native Python.
import subprocess

p = subprocess.run(["svn", "log", "-l1", "https://repo/path/trunk"],
                   capture_output=True, text=True)
for line in p.stdout.splitlines():
    # grep ^r
    if line.startswith('r'):
        # awk '{ print $3 }'
        print(line.split()[2])
Because we don't need a pipeline and just run a single static command, we can avoid shell=True.
Because we don't want to do the necessary plumbing (which you forgot anyway) for Popen(), we prefer subprocess.run(). With capture_output=True we conveniently get the command's output in the resulting object's stdout attribute; because we expect text output, we pass text=True (in older Python versions you might need the old, slightly misleading synonym universal_newlines=True).
I guess the intent is to search for the committer in each revision's output, but this will incorrectly grab the third token on any line which starts with an r (so if you have a commit message like "refactored to use Python native code" the code will extract use from that). A better approach altogether is to request machine-readable output from svn and parse that (but it's unfortunately rather clunky XML, so that's another not entirely trivial rabbit hole for you). Perhaps, as a middle ground, implement a more specific pattern for finding those lines -- maybe look for a specific number of fields, and for static strings where you know to expect them:
if line.startswith('r'):
    fields = line.split()
    # the sample header line below splits into 14 whitespace-separated tokens
    if len(fields) == 14 and fields[1] == '|' and fields[3] == '|':
        print(fields[2])
You could also craft a regular expression to look for a date stamp in the third |-separated field, and the number of changed lines in the fourth.
For the record, a complete commit message from Subversion looks like
------------------------------------------------------------------------
r16110 | tripleee | 2020-10-09 10:41:13 +0300 (Fri, 09 Oct 2020) | 4 lines
refactored to use native Python instead of grep + awk
(which is a useless use of grep anyway; see http://www.iki.fi/era/unix/award.html#grep)
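A hedged sketch of that regular-expression approach, reusing p from the subprocess.run() example above (the pattern is inferred from the sample header line, so adjust it to your actual svn output):
import re

# Matches e.g. "r16110 | tripleee | 2020-10-09 10:41:13 +0300 (Fri, 09 Oct 2020) | 4 lines"
header = re.compile(
    r'^r\d+ \| (\S+) \| \d{4}-\d{2}-\d{2} [\d:]+ [+-]\d{4} \([^)]+\) \| \d+ lines?$')
for line in p.stdout.splitlines():
    m = header.match(line)
    if m:
        # group(1) is the committer, e.g. "tripleee"
        print(m.group(1))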

Related

"Cat" into multiple files using brace expansion

I am quite new to bash and trying to type some text into multiple files with a single command using brace expansion.
I tried: cat > file_{1..100} to write into 100 files some text that I will type in the terminal. I get the following error:
bash: file_{1..100}: ambiguous redirect
I also tried: cat > "file_{1..100}" but that creates a single file named: file_{1..100}.
I tried: cat > `file_{1..100}` but that gives the error:
file_1: command not found
How can I achieve this using brace expansion? Maybe there are other ways using other utilities and/or pipelines. But I want to know if that is possible using only simple brace expansion or not.
You can't do this with cat alone. It only writes its output to its standard output, and that single file descriptor can only be associated with a single file.
You can however do it with tee file_{1..100}.
You may wish to consider using tee file_{01..100} instead, so that the filenames are zero-padded to all have the same width: file_001, file_002, ... This has the advantage that lexicographic order will agree with numerical order, and so ls, *, etc, will process them in numerical order. Without this, you have the situation that file_2 comes after file_10 in lexicographic order.
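For example, a quick demonstration with three files:
echo "some text" | tee file_{01..03} >/dev/null
This creates file_01, file_02 and file_03, each containing the line "some text"; the >/dev/null just suppresses tee's copy to standard output.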
A redirection target can only be a single file (or pipe), not multiple files.
If you want to redirect output to multiple files, use tee:
cat | tee file_{1..100}
Don't forget to check man tee; for example, if you want to append to the files, you should add the -a option (tee -a file_{1..100}).
This writes the string or text into file{1..4}; note that a plain echo "..." > file{1..4} redirection fails with the same "ambiguous redirect" error, so pipe through tee here too:
echo "hello you just knew me by kruz" | tee file{1..4}
Use this to remove them afterwards:
rm file*

Is it possible to display a file's contents and delete that file in the same command?

I'm trying to display the output of an AWS lambda that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
This works for cases where:
• it is possible to control the name of the temporary text file, and
• the file is not used by other code.
Then it is possible to pass "/dev/stdout" as the name of the output file.
Regarding portability: see the Stack Exchange question "how portable ... /dev/stdout".
POSIX Issue 7 says they are extensions.
Base Definitions, Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example, /dev/stdin, /dev/stdout, and /dev/stderr)
Using the mandatorily supported /dev/tty instead would force output to the "current" terminal, making it impossible to pipe the output of the whole command into a different program (or a log file), or to use the program when there is no connected terminal (cron jobs, or other automation tools).
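A hedged example, assuming the producing command lets you name its output file (the function name here is made up):
aws lambda invoke --function-name my-function /dev/stdout
This prints the response payload directly instead of leaving an output.json behind, though whatever the CLI itself writes to standard output may interleave with it.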
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient, as it would require removing characters from the beginning of the file each time you read a line. Current filesystems are pretty good at truncating data at the end of a file, but not at the beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
    head -1 output.json
    sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d, it copies the whole content of the file except the first line into a temporary file, resulting in approximately 0.5*n² lines written in total, where n is the number of lines in your file (for a 10,000-line file, that is on the order of 50 million lines written).
In theory you could avoid this by doing something like this:
while [ -s output.json ]
do
    line=$(head -1 output.json)
    printf -- '%s\n' "$line"
    # collapse the first line (its length plus the trailing newline) out of the file
    fallocate -c -o 0 -l $((${#line}+1)) output.json
done
But this does not account for variable newline characters (namely DOS-formatted newlines) and fallocate does not always work on xfs, among other issues.
Since you are trying to consume a file alongside its creation without leaving a trace of its existence on disk, you are essentially asking for a pipe functionality. In my opinion you should look into how your output.json file is produced and hopefully you can pipe it to a script of your own.
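If shortening the combined command is the main goal, a tiny shell function is another option; a minimal sketch (catrm is a made-up name):
catrm() { cat -- "$1" && rm -- "$1"; }
Then the original command becomes ... && catrm output.json.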

Handling expanded git commands with python subprocess module

I'm trying to retrieve and work with data from historical versions of files in a git repo. I'd like to have something like a dictionary that holds <hash>, <time of commit>, <value retrieved from contents of a file revision>, <commit message> for each entry.
I figured the data I retrieve from each file revision, and any calculations done with them, would be best handled using python. And the subprocess module appeared to be the best fit to integrate my git commands.
Below I show how I'm defining a function getval(key, filename) that I had hoped would output <SHA-1 hash>:<Value> to the console, but I would like to have a dict with more info: also <time> and <commit message>.
I help operate an ion accelerator, where we store 'savesets'--or values relevant to a given accelerator tune--using git. Of the values in these files, are things like charge(Q) and mass(A). Ultimately, I want to retrieve both values, get the ratio (Q/A), and display a list of file revision hashes sorted by the charge:mass ratio of the ion we delivered with the settings in that file's revision.
Sample of file (for 56Fe17+):
# Date: 2018-12-21 01:49:16.888
PV,SELECTED,TIMESTAMP,STATUS,SEVERITY,VALUE_TYPE,VALUE,READBACK,READBACK_VALUE,DELTA,READ_ONLY
REA_EXP:LINE,0,1544047322.881066957,NO_ALARM,NONE,enum,"JENSA~[UDF;AT-TPC;GPL;JENSA]",,"---",,true
REA_BTS19:BEAM:OPTICSFILE,0,1541798820.065952460,NO_ALARM,NONE,string,"BTS19_test3.data",,"---",,true
REA_BTS19:BEAM:A_BOOK,0,1545322510.562031883,NO_ALARM,NONE,double,"56.0",,"---",,true
REA_BTS19:BEAM:Z_BOOK,0,1545322567.544226340,NO_ALARM,NONE,double,"26.0",,"---",,true
REA_BTS19:BEAM:Q_BOOK,0,1545322512.701768974,NO_ALARM,NONE,double,"17.0",,"---",,true
So far -- and with the help of others here -- I've figured out a git one-liner that greps the revision history of a given file for a key (a string) and uses sed and awk to output <hash>:<val associated with the key>.
Git Oneliner I'm Starting with:
git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) -- ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp | sed 's/:/,/' | awk -F, '{print $1 ":" $8}'
Oneliner's Output
e78f73fe6f90e93d5b3ccf90975b0e540d12ce09:"56.0"
4b94745bd0a6594bb42a774c95b5fc0847ef2d82:"56.0"
f2d5e263deac1d9112be791b39f4ce1b1b34e55d:"56.0"
c03800de52143ddb2abfab51fcc665ff5470e363:"56.0"
4a3a564a6d87bc6ff5f3dc7fec7670aeecfe6a79:"58.0"
d591941e51c4eab1237ce726a2a49448114b8f26:"58.0"
a9c8f5cdf224ff4fd94514c33888796760afd792:"58.0"
2f221492beea1663216dcfb27da89343817b11fd:"58.0"
I've also started playing with the subprocess Python module, but I'm struggling to figure out how to handle my more complicated git commands. Generally, I'll want to be able to pass a key and a file... something like getval(key, filename).
When my cmd string was ['git', 'grep', str, '$(git rev-list --all)', '--', pathspec], it returned errors stating that '$(git rev-list --all)' was ambiguous. Thinking it wasn't being expanded, I added a separate process to execute the nested command, but I'm not sure I'm doing this correctly.
My Python file (gitfun.py), which I'm currently running the function from:
import sys, os
import subprocess

def getval(str, pathspec, repoDir='/mnt/d/stash.projects/rea'):
    p1 = subprocess.Popen(["git", "rev-list", "--all"], stdout=subprocess.PIPE)
    output, err = p1.communicate()
    cmd = ['git', 'grep', str, output, '--', pathspec]
    p2 = subprocess.Popen(cmd, cwd=repoDir)
    p2.wait()

cwd = '/mnt/d/stash.projects/rea'
filename = 'ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp'
os.chdir(cwd)
getval('BTS19:BEAM:A_BOOK', filename)
Currently it is returning 'file name too long', so (even though I'm not convinced it really is too long) I tried setting core.longpaths to true in my git config, but this had no effect. This is again why I suspect I'm not handling my replacement of the $(git rev-list --all) expansion correctly.
For this code, I expect something that looks like this:
522628b8d3db01ac330240b28935933b0448649c:ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1545240215.743201855,NO_ALARM,NONE,double,"58.0",,"---",,true
2557c599d2dc67d80ffc5b9be3f79899e0c15a10:ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1545240215.743201855,NO_ALARM,NONE,double,"58.0",,"---",,true
7fc97ec2aa76f32265196c42dbcd289c49f0ad93:ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1545240215.743201855,NO_ALARM,NONE,double,"58.0",,"---",,true
...
But I ultimately want an output to console that looks identical to the git one-liner above, or better yet, a dict that I can print to console or do other things with.
Remember that your shell tokenizes the command line using white space.
When you run git rev-list --all, you get output like:
2a4be2748fad885f88163a5b9b1b438fe3cb2ece
c1a30c743eb810fbefe1dc314277931fa33842b3
b2e5c75131e94a3543e5dcf9fb641ccd553906b4
95718f7e128a8b36ca93d6589328cc5b739668b1
87a9ada188a8cd1c13e48c21f093be7027d61eca
When you substitute that into your git grep command...
git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) -- \
ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp
...each line is a separate argument. That is, if the output of git rev-list --all was exactly what I've shown above, then your one-liner would be tokenized into the following arguments, which I have listed one per line for clarity:
git
grep
BTS19:BEAM:A_BOOK
2a4be2748fad885f88163a5b9b1b438fe3cb2ece
c1a30c743eb810fbefe1dc314277931fa33842b3
b2e5c75131e94a3543e5dcf9fb641ccd553906b4
95718f7e128a8b36ca93d6589328cc5b739668b1
87a9ada188a8cd1c13e48c21f093be7027d61eca
--
ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp
But you're not doing this in your Python code! You're passing the entire output of git rev-list --all as a single argument. That means the command you're trying to execute has a fixed number (6) of arguments:
git
grep
BTS19:BEAM:A_BOOK
2a4be2748fad885f88163a5b9b1b438fe3cb2ece c1a30c743eb810fbefe1dc314277931fa33842b3 b2e5c75131e94a3543e5dcf9fb641ccd553906b4 95718f7e128a8b36ca93d6589328cc5b739668b1 87a9ada188a8cd1c13e48c21f093be7027d61eca
--
ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp
All those revisions are getting bundled together in a single argument, which is where the "filename too long" error comes from. You need to split that output into multiple arguments just like the shell does:
p1 = subprocess.Popen(["git", "rev-list", "--all"],
                      stdout=subprocess.PIPE, text=True)
output, err = p1.communicate()
# text=True makes communicate() return str, so splitlines() yields
# str elements suitable for the argument list (Python 3)
cmd = ['git', 'grep', str] + output.splitlines() + ['--', pathspec]
p2 = subprocess.Popen(cmd, cwd=repoDir)
p2.wait()
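To get from there to the dict the question asks for, one hedged sketch (getvals is a made-up name; the parsing assumes git grep's default <rev>:<path>:<line> output and the comma-separated snapshot format shown in the question):
import subprocess

def getvals(key, pathspec, repoDir='/mnt/d/stash.projects/rea'):
    # Every revision in the repository, one hash per list element.
    revs = subprocess.run(["git", "rev-list", "--all"],
                          cwd=repoDir, capture_output=True, text=True,
                          check=True).stdout.splitlines()
    # git grep prints <rev>:<path>:<matching line> for each hit.
    hits = subprocess.run(["git", "grep", key] + revs + ["--", pathspec],
                          cwd=repoDir, capture_output=True,
                          text=True).stdout.splitlines()
    result = {}
    for hit in hits:
        rev, _, rest = hit.partition(":")
        # VALUE is the 7th comma-separated field of the matched line
        # (the first field still contains the path and the PV name).
        value = rest.split(",")[6].strip('"')
        # %ct is the commit timestamp, %s the commit message subject.
        meta = subprocess.run(["git", "show", "-s", "--format=%ct|%s", rev],
                              cwd=repoDir, capture_output=True,
                              text=True).stdout.strip()
        ctime, _, message = meta.partition("|")
        result[rev] = {"time": ctime, "value": value, "message": message}
    return result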

Usage of AWK in Linux

Please explain the line below, used in shell scripts:
awk -F\| -v src=$storekey 'src==$41' $SRC_Path >> $DST_Path
Thanks!
Ok, first: ${variable} is a shell variable, so those would be defined higher up in your script, e.g.
storekey="1234"
You can try this in your shell (a Linux or other command-line terminal); type:
$ storekey="foo"
$ echo $storekey
Most of the confusion in your question comes from these variables; if you replace each variable on the command line with its value, you can test the pieces out to find out what they are doing.
In essence awk is a stream-parsing tool: if you had a file of, say, 10 columns with a known delimiter such as "," or "|", you could ask awk for a specific column to be printed or tested against. That is what is happening here, but it is obscured by the presence of the custom shell variables.
To break the command down: awk reads the file $SRC_Path using "|" as the field separator (-F\|), copies the shell variable $storekey into the awk variable src (-v src=$storekey), and selects every line whose 41st field equals src (the pattern src==$41 with no action block defaults to printing the whole line). The selected lines are appended (>>) to the file $DST_Path.
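A minimal, hypothetical demonstration of the same pattern with 3 fields instead of 41:
$ printf 'a|x|1\nb|y|2\n' | awk -F\| -v src=y 'src==$2'
b|y|2
Only the line whose second |-separated field equals the value of src is printed.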
If you could share more of the shell script, I could provide a more in-depth answer.
By the way, you can find out more using the commands
man awk
info awk
from your command line; however, these are a bit arcane for those not so familiar with *nix variants.

egrep command with piped variable in ssh throwing No Such File or Directory error

Ok, here I am again, struggling with ssh. I'm trying to retrieve some data from a remote log file based on tokens. I'm trying to pass multiple tokens to an egrep command via ssh:
IFS=$'\n'
commentsArray=($(ssh $sourceUser@$sourceHost "$(egrep "$v" /$INSTALL_DIR/$PROP_BUNDLE.log)"))
echo ${commentsArray[0]}
echo ${commentsArray[1]}
commax=${#commentsArray[@]}
echo $commax
where $v is something like the following, but its length is dynamic, meaning it can contain many file names separated by pipes:
UserComments/propagateBundle-2013-10-22--07:05:37.jar|UserComments/propagateBundle-2013-10-22--07:03:57.jar
The output which I get is:
oracle@172.18.12.42's password:
bash: UserComments/propagateBundle-2013-10-22--07:03:57.jar/New: No such file or directory
bash: line 1: UserComments/propagateBundle-2013-10-22--07:05:37.jar/nouserinput: No such file or directory
0
One thing worth noting is that my log file data has spaces in it. So, in the code piece I've given, the actual comments which I want to extract start after the jar file name, like: UserComments/propagateBundle-2013-10-22--07:03:57.jar/
The actual comment is 'New Life Starts here', but the logs show that we actually only get as far as 'New' before it breaks at the space. I tried setting IFS, but to no avail. Probably I need to set it on the remote side, but I don't know how I should do that.
Any help?
Your command is trying to run egrep "$v" /$INSTALL_DIR/$PROP_BUNDLE.log on the local machine, and to pass the result of that as the command to run via SSH.
I suspect that you meant for that command to be run on the remote machine. Remove the inner $() to get that to happen (and fix the quoting):
commentsArray=($(ssh $sourceUser@$sourceHost "egrep '$v' '/$INSTALL_DIR/$PROP_BUNDLE.log'"))
You should use fgrep to avoid regex special interpretation of your input (note, though, that fgrep matches the | characters literally, so this only applies if $v is one fixed string rather than several pipe-separated alternatives), and the inner $(...) has to go here as well:
commentsArray=($(ssh $sourceUser@$sourceHost "fgrep '$v' '/$INSTALL_DIR/$PROP_BUNDLE.log'"))
