Can't call curl from python3 - python-3.x

I am trying to call this curl command from Python 3. From bash, it works fine:
curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.1103/PhysRevLett.117.126802
yielding the expected result:
#article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J. K. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H. W.}, year={2016}, month={Sep}}
in python3, I am doing:
import subprocess
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
try:
    subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
except ExplicitException:
    print("DOI is not available")
    self.Messages.on_warn_clicked("DOI is not given",
                                  "Search google instead")
which is giving error:
<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>
What's going wrong here?

You have 3 problems here:
1. Don't quote your arguments in subprocess. It already does that for you when necessary, since you pass a list of arguments and not the unsplit command line (good practice, keep doing it, but drop the unnecessary quoting).
2. subprocess.call does not let you capture or parse the output in Python, which is a problem because of number 3.
3. Your site randomly answers with junk HTML (a Java stack trace). This explains why you're getting different output in Python, but you can get it in bash as well.
Problem #1
subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
should be
subprocess.call(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi])
Otherwise the quote characters are passed to curl as part of the header itself: your Accept: xxx argument arrives with literal quotes around it, which curl does not expect.
Demo of the non-working quoted version:
import subprocess,os
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
#### this is wrong because of the quoting ####
p = subprocess.Popen(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi],stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
[output,error] = p.communicate()
print(output)
result:
b' some stats then ... <html><body><h1>400 Bad request</h1>\nYour browser sent an invalid request.\n</body></html>\n\r\n'
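To see why the extra quotes matter, it can help to look at what the shell actually hands to curl. The sketch below uses shlex.split from the standard library, which splits a command line roughly the way a POSIX shell does; the variable names are only illustrative.

```python
import shlex

# The command line exactly as typed in bash.
command_line = (
    'curl -LH "Accept: text/bibliography; style=bibtex" '
    'http://dx.doi.org/10.1103/PhysRevLett.117.126802'
)

# shlex.split mimics POSIX shell word splitting: the double quotes group
# the header into one argument and are then removed.
args = shlex.split(command_line)
for arg in args:
    print(repr(arg))

# The header as written in the question: the quotes are part of the string,
# so curl would receive them as part of the header.
quoted_header = '"Accept: text/bibliography; style=bibtex"'
print(args[2] == quoted_header)
```

The third element of args is the header without surrounding quotes, which is the form to put in the list passed to subprocess.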
Problems #2 and #3
I have implemented a retry mechanism which parses the output and retries until correct output is found:
import subprocess,os,sys
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
while True:
    p = subprocess.Popen(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi],stdout=subprocess.PIPE)
    [output,error] = p.communicate()
    output = output.decode("latin-1")
    if "java.util.concurrent.FutureTask.run" in output:
        # site crashed when responding: junk HTML output: retry
        sys.stderr.write("Wrong answer: retrying\n")
    else:
        print(output)
        break
result:
Wrong answer: retrying <==== here the site threw a big HTML exception output
#article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J.âK. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H.âW.}, year={2016}, month={Sep}}
So it works, it's just a site problem, but with my python wrapper you are able to re-submit the request until it yields the proper answer.
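The retry loop above runs forever if the site never recovers. A minimal sketch of a bounded variant follows, assuming the same error marker as above; the function names, the max_attempts and delay parameters, and the injectable fetch argument are hypothetical additions for illustration, not part of the original answer.

```python
import subprocess
import time

# Marker that the answer above looks for in the junk HTML response.
ERROR_MARKER = "java.util.concurrent.FutureTask.run"


def fetch_with_curl(doi_url):
    """Run curl once and return its standard output as text."""
    result = subprocess.run(
        ["curl", "-sLH", "Accept: text/bibliography; style=bibtex", doi_url],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if curl itself fails
    )
    return result.stdout.decode("utf-8", errors="replace")


def fetch_bibtex(doi_url, fetch=fetch_with_curl, max_attempts=5, delay=1.0):
    """Call fetch until the output no longer contains the error marker.

    Gives up with RuntimeError after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        output = fetch(doi_url)
        if ERROR_MARKER not in output:
            return output
        if attempt < max_attempts:
            time.sleep(delay)  # short pause before the next try
    raise RuntimeError("no valid answer after %d attempts" % max_attempts)
```

Calling fetch_bibtex("http://dx.doi.org/10.1103/PhysRevLett.117.126802") would run curl up to five times. Passing a different fetch callable makes the retry logic testable without network access.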

Related

Trend Micro deepsecurity delete a computer: HTTP Status 400 – Bad Request

I am trying to delete a computer from TMDS (Trend Micro Deep Security) with the Python script below.
The script was copied from TMDS and slightly altered: I added a line that reads the file after it is opened.
tmds_del_comp_list.txt => contains the computername to delete. eg: computername.domain.domain
creds.py contains the api_key.
The IP address and port have been changed for obvious reasons.
To confirm, the deepsecurity module has been installed in the same directory.
Directories deepsecurity, deep_security_api.egg-info, build and pycache are present.
# DEL operation, /api/computers/{computerID}
from __future__ import print_function
import sys, warnings
import deepsecurity
from deepsecurity.rest import ApiException
import creds
# Setup
if not sys.warnoptions:
    warnings.simplefilter("ignore")
configuration = deepsecurity.Configuration()
configuration.host = 'https://ipaddress:1234/api/computers/{computerID}'
# Authentication
configuration.api_key['api-secret-key'] = creds.api_key
# Initialization
# Set Any Required Values
# api_version = 'v4'
reed = open('tmds_del_comp_list.txt', mode='r' )
computer_id = reed.read()
api_instance = deepsecurity.ComputersApi(deepsecurity.ApiClient(configuration))
api_version = 'v1'
try:
    api_instance.delete_computer(computer_id, api_version)
except ApiException as e:
    print("An exception occurred when calling ComputersApi.delete_computer: %s\n" % e)
error received:
An exception occurred when calling ComputersApi.delete_computer: (400)
Reason:
HTTP response headers: HTTPHeaderDict({'Content-Type': 'text/html;charset=utf-8', 'Content-Language': 'en', 'Content-Length': '435', 'Date': 'Thu, 06 Oct 2022 09:24:22 GMT', 'Connection': 'close'})
HTTP response body: <!doctype html><html lang="en"><head><title>HTTP Status 400 – Bad Request</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 400 – Bad Request</h1></body></html>
Before this error I received different errors, which I corrected. I had a .env file instead of creds.py.
I tested whether my tmds_del_comp_list.txt file was actually being read. It was not, which is why I added the line with the read function (when I did print(reed), nothing useful came up).
The api_version was wrong. From the documentation I understood that TMDS version 20 corresponds to version v4 in the API. After changing it to v1, that error went away. When double-checking the version in the browser at https://ipaddress:4119/rest/apiVersion I get 4. I'm a bit baffled by this.
'https://ipaddress:1234/api/computers/{computerID}'
I find the URL weird. The {computerID} part does not correspond to any variable, and I do not see how it works together with the rest of the code, unless api_instance.delete_computer substitutes computer_id for {computerID}. Nothing indicates whether that guess is correct.
api_instance.delete_computer(computer_id, api_version)
Googling does not really bring any relevant information up.
I'm a beginner with python, api's and deepsecurity.
Any leads, pointing to the obvious and constructive help/comments/etc are very welcome.
Edit1: Looking back at all the available docs, I see that "computerID" should be an integer. In our organisation a computer is identified by a VM name + domain name, not by a number.
Maybe there's a number connected to every VM reporting to TMDS, so I tried to Get/List all computers to see what IDs they have.
I could not find a purely numeric ID that way.
This is probably not the issue.
Path parameters:
computerID (required): integer <int32>, pattern \d+
The ID number of the computer to delete.
Example: 1
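Since the parameter description above asks for an integer matching \d+, one thing worth checking is what the script actually reads from tmds_del_comp_list.txt: read() returns the whole file as one string, including any trailing newline. The sketch below is only an illustration of validating such input before it is passed on; parse_computer_ids is a hypothetical helper, not part of the deepsecurity SDK.

```python
import re


def parse_computer_ids(text):
    """Turn file contents (one entry per line) into a list of integer IDs.

    Blank lines are skipped. Entries that are not purely numeric raise
    ValueError, because the API description asks for an integer (\\d+).
    """
    ids = []
    for raw_line in text.splitlines():
        entry = raw_line.strip()  # drop surrounding spaces and the newline
        if not entry:
            continue
        if not re.fullmatch(r"\d+", entry):
            raise ValueError("not a numeric computer ID: %r" % entry)
        ids.append(int(entry))
    return ids


print(parse_computer_ids("1\n 42 \n\n"))
```

With a file that contains computername.domain.domain, this helper raises ValueError instead of sending the host name to the API.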
Edit2: When clicking the down arrow next to GET /computers at <https://automation.deepsecurity.trendmicro.com/article/12_0/api-reference/tag/Computers#operation/listComputers>
I get to see this link https://automation.deepsecurity.trendmicro.com/#operation/searchComputers/computers which I presume points to the correct endpoint, with #operation to be replaced by GET. When doing so I get a 404 response. I also got a 404 when changing the endpoint to /get/computers.
Conclusion: the endpoint is probably correct, because when it is not, I do get an error that the URL is wrong.
error:
An exception occurred when calling ComputersApi.list_computers: (404)
Reason:
HTTP response headers: HTTPHeaderDict({'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1;mode=block', 'Set-Cookie': 'JSESSIONID=codewithlettersandnumbers; Path=/; Secure; HttpOnly; SameSite=Lax', 'Content-Type': 'text/html;charset=ISO-8859-1', 'Content-Length': '105', 'Date': 'Thu, 06 Oct 2022 14:56:58 GMT'})
HTTP response body:
<html>
<head>
<meta http-equiv="REFRESH" content="0;url=/SignIn.screen">
</head>
<body>
</body>
</html>
Edit3: Testing a simple GET /computers with Postman gave me a clue that the actual key was wrong. I corrected that and got a 200 response. I corrected the key in my Python script as well, but I still get the same 400 error.

SED style Multi address in Python?

I have an app that parses multiple Cisco show tech files. These files contain the output of multiple router commands in a structured way. Here is a snippet of a show tech output:
`show clock`
20:20:50.771 UTC Wed Sep 07 2022
Time source is NTP
`show callhome`
callhome disabled
Callhome Information:
<SNIPET>
`show module`
Mod Ports Module-Type Model Status
--- ----- ------------------------------------- --------------------- ---------
1 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
2 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
3 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
4 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
21 0 Fabric Module N9K-C9504-FM-R ok
22 0 Fabric Module N9K-C9504-FM-R ok
23 0 Fabric Module N9K-C9504-FM-R ok
<SNIPET>
My app currently uses both SED and Python scripts to parse these files. I use SED to search the show tech file for a specific command output, and once I find it, I stop SED. This way I don't need to read the whole file (these files can get very big). This is a snippet of my SED script:
sed -E -n '/`show running-config`|`show running`|`show running config`/{
p
:loop
n
p
/`show/q
b loop
}' $1/$file
As you can see, I am using a multi-address range in SED. My question specifically is: how can I achieve something similar in Python? I have tried multiple combinations of flags (DOTALL and MULTILINE), but I can't get the result I'm expecting. For example, I can get a match for the command I'm looking for, but the Python regex won't stop until the end of the file after the first match.
I am looking for something like this
sed -n '/`show clock`/,/`show/p'
I would like the regex match to stop parsing the file and print the results immediately after seeing `show again. I hope that makes sense, and thank you all for reading and for your help.
You can use nested loops.
import re
def process_file(filename):
    with open(filename) as f:
        for line in f:
            if re.search(r'`show running-config`|`show running`|`show running config`', line):
                # lines already end with a newline, so don't add another one
                print(line, end='')
                for line1 in f:
                    print(line1, end='')
                    if re.search(r'`show', line1):
                        return
The inner for loop will start from the next line after the one processed by the outer loop.
You can also do it with a single loop using a flag variable.
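import re
def process_file(filename):
    in_show = False
    with open(filename) as f:
        for line in f:
            if in_show:
                # inside the range: print until the next `show line
                print(line, end='')
                if re.search(r'`show', line):
                    return
            elif re.search(r'`show running-config`|`show running`|`show running config`', line):
                # start of the range: print the matching line itself
                in_show = True
                print(line, end='')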
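The same idea can be sketched as a generator that works on any iterable of lines, which makes it easy to try on a small in-memory sample before pointing it at a large file. The name sed_range and its parameters are hypothetical. Unlike a plain sed range, it stops after the first range, as the q command does in the script from the question.

```python
import re


def sed_range(lines, start_pattern, end_pattern):
    """Yield lines from the first start match through the next end match.

    Stops reading as soon as the end pattern is seen, so the rest of the
    input is never consumed.
    """
    start = re.compile(start_pattern)
    end = re.compile(end_pattern)
    in_range = False
    for line in lines:
        if in_range:
            yield line
            if end.search(line):
                return
        elif start.search(line):
            in_range = True
            yield line


sample = [
    "`show clock`\n",
    "20:20:50.771 UTC Wed Sep 07 2022\n",
    "Time source is NTP\n",
    "`show callhome`\n",
    "callhome disabled\n",
]

for matched in sed_range(sample, r"`show clock`", r"`show"):
    print(matched, end="")
```

Because an open file is itself an iterable of lines, sed_range(open(path), ...) reads only as far as the closing `show line.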

How to profile a vim plugin written in python

Vim offers the :profile command, which is really handy. But it is limited to Vim script -- when it comes to plugins implemented in python it isn't that helpful.
Currently I'm trying to understand what is causing a large delay on Denite. As it doesn't happen in vanilla Vim, but only on some specific conditions which I'm not sure how to reproduce, I couldn't find which setting/plugin is interfering.
So I turned to profiling, and this is what I got from :profile:
FUNCTION denite#vim#_start()
Defined: ~/.vim/bundle/denite.nvim/autoload/denite/vim.vim line 33
Called 1 time
Total time: 5.343388
Self time: 4.571928
count total (s) self (s)
1 0.000006 python3 << EOF
def _temporary_scope():
    nvim = denite.rplugin.Neovim(vim)
    try:
        buffer_name = nvim.eval('a:context')['buffer_name']
        if nvim.eval('a:context')['buffer_name'] not in denite__uis:
            denite__uis[buffer_name] = denite.ui.default.Default(nvim)
        denite__uis[buffer_name].start(
            denite.rplugin.reform_bytes(nvim.eval('a:sources')),
            denite.rplugin.reform_bytes(nvim.eval('a:context')),
        )
    except Exception as e:
        import traceback
        for line in traceback.format_exc().splitlines():
            denite.util.error(nvim, line)
        denite.util.error(nvim, 'Please execute :messages command.')
_temporary_scope()
if _temporary_scope in dir():
    del _temporary_scope
EOF
1 0.000017 return []
(...)
FUNCTIONS SORTED ON TOTAL TIME
count total (s) self (s) function
1 5.446612 0.010563 denite#helper#call_denite()
1 5.396337 0.000189 denite#start()
1 5.396148 0.000195 <SNR>237_start()
1 5.343388 4.571928 denite#vim#_start()
(...)
I tried to use the python profiler directly by wrapping the main line:
import cProfile
cProfile.run(_temporary_scope(), '/path/to/log/file')
, but no luck -- just a bunch of errors from cProfile. Perhaps it is because the way python is started from Vim, as it is hinted here that it only works on the main thread.
I guess there should be an easier way of doing this.
The python profiler does work by enclosing the whole code,
cProfile.run("""
(...)
""", '/path/to/log/file')
, but it is not that helpful. Maybe it is all that is possible.
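One detail that may explain the errors from the first attempt: cProfile.run expects a string of code, so cProfile.run(_temporary_scope(), ...) calls the function first and hands its return value to the profiler. An alternative to wrapping everything in a string is a cProfile.Profile object with runcall. The sketch below is standalone and runs outside Vim; slow_function and the log path are hypothetical stand-ins.

```python
import cProfile
import os
import pstats
import tempfile


def slow_function():
    """Hypothetical stand-in for the code under investigation."""
    return sum(i * i for i in range(100000))


# Assumed location for the profile data; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "vim_python_profile.prof")

profiler = cProfile.Profile()
# runcall profiles a single call and returns the function's return value.
result = profiler.runcall(slow_function)
profiler.dump_stats(log_path)

# Load the saved data and show the five most expensive entries.
stats = pstats.Stats(log_path)
stats.sort_stats("cumulative").print_stats(5)
```

Inside the python3 << EOF block, the same pattern would presumably be profiler.runcall(_temporary_scope) in place of the direct call, though that has not been tried here.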

Beautiful Soup findAll() doesn't find the first one

I'm working on a coreference-resolution system based on neural networks for my Bachelor's thesis, and I have a problem when I read the corpus.
The corpus is already preprocessed, and I only need to read it to do my work. I use Beautiful Soup 4 to read the XML files of each document, which contain the data I need.
The files look like this:
<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE markables SYSTEM "markables.dtd">
<markables xmlns="www.eml.org/NameSpaces/markable">
<markable id="markable_102" span="word_390" grammatical_role="vc" coref_set="empty" visual="none" rel_type="none" np_form="indefnp" type="" entity="NO" nb="UNK" def="INDEF" sentenceid="19" lemmata="premia" pos="nn" head_pos="word_390" wikipedia="" mmax_level="markable"/>
<markable id="markable_15" span="word_48..word_49" grammatical_role="vc" coref_set="empty" visual="none" rel_type="none" np_form="defnp" type="" entity="NO" nb="SG" def="DEF" sentenceid="3" lemmata="Grozni hegoalde" pos="nnp nn" head_pos="word_48" wikipedia="Grozny" mmax_level="markable"/>
<markable id="markable_101" span="word_389" grammatical_role="sbj" coref_set="set_21" coref_type="named entities" visual="none" rel_type="coreferential" sub_type="exact repetition" np_form="ne_o" type="enamex" entity="LOC" nb="SG" def="DEF" sentenceid="19" lemmata="Mosku" pos="nnp" head_pos="word_389" wikipedia="" mmax_level="markable"/>
...
I need to extract all the spans here, so I try to do it with this code (Python 3):
...
from bs4 import BeautifulSoup
...
file1 = markables+filename+"_markable_level.xml"
xml1 = open(file1) #markable
soup1 = BeautifulSoup(xml1, "html5lib") #markable
...
...
for markable in soup1.findAll('markable'):
    try:
        span = markable.contents[1]['span']
        print(span)
        spanA = span.split("..")[0]
        spanB = span.split("..")[-1]
...
(I omitted most of the code, as it is 500 lines long.)
python3 aurreprozesaketaSTM.py
train
--- 28.329787254333496 seconds ---
&&&&&&&&&&&&&&&&&&&&&&&&& egun.06-1-p0002500.2000-06-01.europa
word_48..word_49
word_389
word_385..word_386
word_48..word_52
...
If you compare the XML file with the output, you can see that word_390 is missing.
I get almost all the data that I need, then preprocess everything, build the system with neural networks, and finally get scores and so on.
But as I lose the first word of each document, my system's accuracy is a bit lower than it should be.
Can anyone help me with this? Any idea where the problem is?
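You are parsing XML with html5lib. It is an HTML parser and is not supported for parsing XML. Use lxml's XML parser instead, for example BeautifulSoup(xml1, "xml"). From the Beautiful Soup documentation:
lxml’s XML parser ... The only currently supported XML parser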
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
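As an illustration of what a real XML parser returns for this file, here is a minimal sketch using xml.etree.ElementTree from the standard library. This is a swapped-in alternative to the lxml parser recommended above, chosen only because it needs no extra install. The sample is a trimmed version of the file from the question, and extract_spans is a hypothetical helper. With an XML parser each markable is its own self-closed element, so the span is read from the element itself rather than through contents[1].

```python
import xml.etree.ElementTree as ET

# Trimmed sample of the file from the question (most attributes dropped).
SAMPLE = b"""<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE markables SYSTEM "markables.dtd">
<markables xmlns="www.eml.org/NameSpaces/markable">
<markable id="markable_102" span="word_390"/>
<markable id="markable_15" span="word_48..word_49"/>
<markable id="markable_101" span="word_389"/>
</markables>
"""

# Default namespace declared on the root element of the file.
NAMESPACE = "www.eml.org/NameSpaces/markable"


def extract_spans(xml_bytes):
    """Return (first_word, last_word) for every markable element."""
    root = ET.fromstring(xml_bytes)
    spans = []
    # ElementTree spells namespaced tags as {namespace}tag.
    for markable in root.iter("{%s}markable" % NAMESPACE):
        span = markable.get("span")
        parts = span.split("..")
        spans.append((parts[0], parts[-1]))
    return spans


for first, last in extract_spans(SAMPLE):
    print(first, last)
```

The first markable (word_390) is included in the result, which is the element missing from the html5lib output.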

Minimal self-compiling to .pdf Rmarkdown file

I need to compose a simple R Markdown file, with text, code and the results of the executed code included in the resulting PDF file. I would prefer the source file to be executable and self-sufficient, avoiding the need for a makefile.
This is the best I have been able to achieve, and it is far from good:
#!/usr/bin/env Rscript
library(knitr)
pandoc('hw_ch4.rmd', format='latex')
# TODO: how to NOT print the above commands to the resulting .pdf?
# TODO: how to avoid putting everything from here on in ""s?
# TODO: how to avoid mentioning the file name above?
# TODO: how to render special symbols, such as tilde, miu, sigma?
# Unicode character (U+3BC) not set up for use with LaTeX.
# See the inputenc package documentation for explanation.
# nano hw_ch4.rmd && ./hw_ch4.rmd && evince hw_ch4.pdf
"
4E1. In the model definition below, which line is the likelihood?
A: y_i is the likelihood, based on the expectation and deviation.
4M1. For the model definition below, simulate observed heights from the prior (not the posterior).
A:
```{r}
points <- 10
rnorm(points, mean=rnorm(points, 0, 10), sd=runif(points, 0, 10))
```
4M3. Translate the map model formula below into a mathematical model definition.
A:
```{r}
flist <- alist(
y tilda dnorm( mu , sigma ),
miu tilda dnorm( 0 , 10 ),
sigma tilda dunif( 0 , 10 )
)
```
"
Result:
What I eventually came to use is the following header. At first it sounded neat, but later I realized:
+ it is indeed easy to compile in one step
- it is code duplication
- mixing an executable script and presentation data in one file is a security risk.
Code:
#!/usr/bin/env Rscript
#<!---
library(rmarkdown)
argv <- commandArgs(trailingOnly=FALSE)
fname <- sub("--file=", "", argv[grep("--file=", argv)])
render(fname, output_format="pdf_document")
quit(status=0)
#-->
---
title:
author:
date: "compiled on: `r Sys.time()`"
---
The quit() line is supposed to guarantee that the rest of the file is treated as data. The <!--- and --> comments are to render the executable code as comments in the data interpretation. They are, in turn, hidden by the #s from the shell.
