Python requests hangs when script launched through crontab

I have a Python script that downloads data in JSON format over HTTP using the requests module. If I run the script from the command line, the HTTP connection succeeds and the data is downloaded without any issues. But when I launch the same script as a crontab job, the HTTP connection times out after a while. Could anyone please tell me what is going on here? As a workaround, I am currently downloading the data in a bash script first and then running the Python script from within that bash script. But this is nonsense! Thank you so much!
Using: Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
P.S.: I haven't found any posts regarding this issue. If there is already an answer for this on some other post, then please accept my apologies.
This is an excerpt from my code. It times out when running requests.get(url):
try:
    response = requests.get(url)
    messages = response.json()["Messages"]
except requests.exceptions.Timeout:
    logging.critical("TIMEOUT received when connecting to HTTP server.")
except requests.exceptions.ConnectionError:
    logging.critical("CONNECTION ERROR received when connecting to HTTP server.")

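I just found the answer to my question. Cron runs jobs with a minimal environment, so the proxy settings from my interactive shell were presumably not available to the script. I defined the proxy explicitly and passed it to requests like this:
HTTP_PROXY = "http://your_proxy:proxy_port"
PROXY_DICT = {"http": HTTP_PROXY}
response = requests.get(url, proxies=PROXY_DICT)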
Reference:
Proxies with Python 'Requests' module
Thank you all for your understanding. I guess I should have done a thorough search before posting. Sorry.
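The answer above hard-codes the proxy. As a minimal sketch, assuming the proxy may or may not be present in the environment the job runs in, the proxies dict could be built from the usual proxy variables with a hard-coded fallback. The helper names proxies_from_env and fetch_json and the 30-second timeout are illustrative assumptions, not part of the original post:

```python
import os


def proxies_from_env(env=None, fallback=None):
    """Build a requests-style proxies dict from environment variables.

    `fallback` is a hypothetical hard-coded proxy URL, used when the
    environment (for example under cron) defines no proxy at all.
    """
    env = os.environ if env is None else env
    proxies = {}
    for scheme in ("http", "https"):
        # Proxy variables are found in both upper and lower case in practice.
        value = (
            env.get(f"{scheme.upper()}_PROXY")
            or env.get(f"{scheme}_proxy")
            or fallback
        )
        if value:
            proxies[scheme] = value
    return proxies


def fetch_json(url, fallback_proxy=None, timeout=30):
    """Hypothetical wrapper around the call from the question."""
    import requests  # third-party; the same library the question uses

    # An explicit timeout turns an indefinite hang into a Timeout exception.
    response = requests.get(
        url, proxies=proxies_from_env(fallback=fallback_proxy), timeout=timeout
    )
    response.raise_for_status()
    return response.json()
```

Logging the result of proxies_from_env() at the start of the cron job would also show directly whether the job sees the same proxy settings as the interactive shell.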

Related

pysmb not working on linux ubuntu server 22.04 LTS

I have a Telegram bot in which I implemented support for receiving files from a file server, in my case SMB. The problem is that on my local machine everything works fine! Keep in mind that my local machine runs Ubuntu 20.04, while the server runs Ubuntu 22.04 LTS. This is the only difference I haven't been able to check. I did check the following points.
On the server the Python version was 3.10.4, on the local machine 3.9. I then installed the latest version of Python locally, and the project still works great there.
On the server, trying to connect to SMB raised this error:
  File "bot/venv/lib/python3.10/site-packages/smb/utils/md4.py", line 251, in int_array2str
    nstr = nstr + str(chr(i))
TypeError: 'U32' object cannot be interpreted as an integer
I then found this problem discussed on the Python Discuss forum and tried the fix suggested there. That error disappeared, but another, more direct one appeared. Now it is just an AssertionError, and it points to the line with the connection attempt.
This is my code with connection:
conn = SMBConnection(
    SMB_LOGIN, SMB_PASSWORD,
    LOCAL_NAME, SMB_REMOTE_NAME, use_ntlm_v2=True
)
assert conn.connect(SMB_IP_ADDRESS, 139)
There doesn't seem to be anything remarkable here.
I also tried other ports: 445, 135 and 137.
445:
ConnectionResetError: [Errno 104] Connection reset by peer
135 and 137:
timeout expired
A simple connection via smbclient from the terminal works, too.
I don't understand what exactly the problem is. The only thing left for me to check is a different Linux version, but I don't believe it will start working after that either.
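One way to narrow this down is to separate plain network reachability from the SMB handshake. The following is a diagnostic sketch only, not a fix: it uses the standard socket module to report whether the server can open a TCP connection to the SMB ports at all. The helper names tcp_port_open and probe_smb_ports and the 3-second timeout are assumptions made for illustration:

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_smb_ports(host, ports=(139, 445), timeout=3.0):
    """Map each port to whether a TCP connection succeeded."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Calling probe_smb_ports(SMB_IP_ADDRESS) on the server shows which ports are reachable from there. If 445 is reachable, it may also be worth checking the is_direct_tcp argument of pysmb's SMBConnection, which the pysmb documentation describes as the switch between NetBIOS over TCP/IP (port 139) and direct hosting of SMB over TCP/IP (port 445).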

Python script which access GitLab works on Windows but returns 'Project Not Found' on Windows Subsystem for Linux (WSL) - Used python requests

I have a Python script that sends a GET request to GitLab and stores the data from the response in an Excel file using the tablib library.
This script works fine on Windows when I execute it with python3.
I tried to execute the same script in the Windows Subsystem for Linux (WSL), which I have enabled, and there the script fails.
The output when I execute with python3 script.py in WSL is the following:
RESPONSE {"message":"404 Project Not Found"}
When I execute from Windows using python .\gitlab.py where python is python3:
RESPONSE [{"id":567,"iid":22}, {"id":10,"iid":3}]
I think the problem could be related to the GET API call, because in WSL it returns Project Not Found.
I executed the same request with curl in WSL to see whether the problem affects Linux in general, but I got back the expected response instead of the not-found response. This was the request:
curl -X GET 'https://URL/api/v4/projects/server%2Fproducts%2FPROJECT/issues?per_page=100' -H 'Content-Type: application/json' -H 'PRIVATE-TOKEN: TOKEN' --insecure
Why does the request fail from Python under WSL when curl can execute the same GET request there? Should I enable or disable something in the request, perhaps?
This is the request I am doing in my python script:
import json
import os
import requests
from urllib3.exceptions import InsecureRequestWarning
def get_items():
    url = "https://URL/api/v4/projects/server%2Fproducts%2FPROJECT/issues"
    payload = {}
    querystring = {"state": "closed", "per_page": "100"}
    headers = {
        'Content-Type': "application/json",
        'PRIVATE-TOKEN': os.environ.get("GITLAB_KEY")  # environment variable added in Windows
    }
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    response = requests.request(
        "GET", url, headers=headers, data=payload, params=querystring, verify=False)
    print("RESPONSE " + response.text)
    return json.loads(response.text)
UPDATE:
I have tried using the project id as well instead of the path but it didn't work

REF: https://docs.gitlab.com/ee/api/projects.html#get-single-project
GET /projects/:id
Change this:
url = "https://URL/api/v4/projects/server%2Fproducts%2FPROJECT/issues"
To:
projectId = 1234  # or whatever your project id is ... Project Page, Settings -> General
url = "https://URL/api/v4/projects/" + str(projectId) + "/issues"

Based on an answer to the post I made on Reddit, I found the problem.
In the Python script I am using an environment variable, which is not accessible in that way ( os.environ.get("GITLAB_KEY") ) from WSL.
For now, I have replaced it with the hard-coded value just to confirm that this really was the issue. The script now works as expected.
Now that I know what the problem was, I will find a way to access the environment variable again.
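Because os.environ.get returns None when a variable is unset, the missing token only showed up indirectly as a 404 from the API. A minimal sketch of failing early instead, assuming the variable name GITLAB_KEY from the question. The helper names require_env and gitlab_headers are made up for illustration:

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or fail with a clear message."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set in this environment"
        )
    return value


def gitlab_headers(env=None):
    """Build the request headers; GITLAB_KEY is the name used in the question."""
    return {"PRIVATE-TOKEN": require_env("GITLAB_KEY", env)}
```

With this in place, running the script in WSL without the variable stops with a RuntimeError that names GITLAB_KEY, rather than sending the request without a usable token.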

How can I stop and start Windows services using subprocess with admin permissions?

I am building a Python tool to update an application. To do so I have to stop the Apache service, do some update-related work, and then start it again after the update is finished.
I'm currently using Python 3.7.2 on Windows 10.
I have tried to build a working process using these questions as a reference:
Run process as admin with subprocess.run in python
Windows can't find the file on subprocess.call()
Python subprocess call returns "command not found", Terminal executes correctly
def stopApache():
    processName = config["apacheProcess"]
    #stopstr = f'stop-service "{processName}"'
    # the line above should be used once I'm finished, but for testing
    # purposes I'm going with the one below
    stopstr = 'stop-service "Apache2.4(x64)"'
    print(stopstr)
    try:
        subprocess.run(stopstr, shell=True)
        #subprocess.run(stopstr)
        # the commented line here is for explanatory purposes, and also to
        # show where I started.
    except:
        print('subprocess failed', sys.exc_info())
        stopExecution()
From what I have gathered so far, the shell=True option is a must, since Python does not check PATH.
Given what I'm trying to do, I expected the service to be stopped. In reality, the console error looks like this:
stopstr = get-service "Apache2.4(x64)"
Der Befehl "get-service" ist entweder falsch geschrieben oder
konnte nicht gefunden werden.
Which roughly translates to: the command "get-service" is either misspelled or could not be found.
If I run the same command directly in PowerShell, I get a similar error. If I open the shell as admin and run it again, everything works fine.
But if I use the very same admin shell to run the Python code, I am back to square one.
What am I missing here? Apparently there is some issue with permissions, but I cannot wrap my head around it.

The command for stopping a service in MS Windows is net stop [servicename]
So change stopstr to 'net stop "Apache2.4(x64)"'
You will need to run your script as admin.
shell=True runs commands against cmd, not PowerShell.
PowerShell is different from the regular command line. So to get stop-service to work you'd have to pass it to an instance of PowerShell.
stopstr = 'stop-service \\"Apache2.4(x64)\\"'
subprocess.run(['powershell', stopstr], shell=True)
As you noted in a comment, it's necessary to escape the " around the service name: first Python unescapes them, then PowerShell uses them.
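A variation on the PowerShell approach, as a sketch rather than a tested recipe: passing an argument list without shell=True, and quoting the service name with PowerShell single quotes, avoids the double-quote escaping. The helper names powershell_service_command and run_service_command are assumptions, and the script still has to run from an elevated (admin) process:

```python
import subprocess


def powershell_service_command(verb, service_name):
    """Build an argument list that runs a *-Service cmdlet through PowerShell."""
    if verb not in ("Stop", "Start", "Restart"):
        raise ValueError(f"unsupported verb: {verb}")
    # Single quotes delimit a literal string in PowerShell; a literal ' is doubled.
    quoted = "'" + service_name.replace("'", "''") + "'"
    return ["powershell", "-NoProfile", "-Command", f"{verb}-Service -Name {quoted}"]


def run_service_command(verb, service_name):
    """Run the cmdlet; check=True raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(
        powershell_service_command(verb, service_name),
        capture_output=True,  # available from Python 3.7
        text=True,
        check=True,
    )
```

For the service from the question this would be called as run_service_command("Stop", "Apache2.4(x64)") before the update and run_service_command("Start", "Apache2.4(x64)") afterwards.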

HTTP Error 403 Forbidden - when downloading nltk data [duplicate]

This question already has answers here:
Getting 405 error while trying to download nltk data
(2 answers)
Closed 5 years ago.
I am having a problem accessing the NLTK data. I tried nltk.download(), and the GUI came up with an HTTP Error 403: Forbidden error. I also tried to install from the command line, as described here:
python -m nltk.downloader all
and got this error:
C:\Python36\lib\runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
[nltk_data] Error loading all: HTTP Error 403: Forbidden.
I have also gone through How do I download NLTK data? and Failed loading english.pickle with nltk.data.load.

The problem comes from the NLTK download server. If you look at the GUI's config, it points to this link:
https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
If you access this link in the browser, you get this as a message :
Error 403 Forbidden.
Forbidden.
Guru Mediation:
Details: cache-lcy1125-LCY 1501134862 2002107460
Varnish cache server
So, I was going to file an issue on github, but someone else already did that here : https://github.com/nltk/nltk/issues/1791
A workaround was suggested here: https://github.com/nltk/nltk/issues/1787.
Based on the discussion on github:
It seems like the Github is down/blocking access to the raw content on the repo.
The suggested workaround is to manually download as follows:
PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ $PATH_TO_NLTK_DATA
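On machines without wget and unzip (for example the Windows setup from the question), the same steps can be sketched with the Python standard library. This is an illustrative equivalent of the shell commands above, not something taken from the GitHub discussion. The function names unpack_archive and download_nltk_data are assumptions, and the download only works while the archive URL itself is reachable:

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# Archive URL taken from the workaround above.
ARCHIVE_URL = "https://github.com/nltk/nltk_data/archive/gh-pages.zip"


def unpack_archive(zip_path, target_dir):
    """Extract the zip and move its single top-level folder to target_dir."""
    target = Path(target_dir)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # Assumes the archive unpacks into exactly one top-level folder,
        # such as nltk_data-gh-pages/; anything else raises ValueError here.
        (top_level,) = [p for p in Path(tmp).iterdir() if p.is_dir()]
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(top_level), str(target))
    return target


def download_nltk_data(target_dir, url=ARCHIVE_URL):
    """Download the archive to a temporary file, then unpack it."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "gh-pages.zip"
        urllib.request.urlretrieve(url, str(zip_path))
        return unpack_archive(zip_path, target_dir)
```

As with mv, if the target directory already exists, the extracted folder is moved inside it rather than replacing it.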
People also suggested using an alternative index as follows:
python -m nltk.downloader -u https://pastebin.com/raw/D3TBY4Mj punkt

Go to /nltk/downloader.py
And change the default url:
DEFAULT_URL = 'http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml'
to
DEFAULT_URL = 'http://nltk.github.com/nltk_data/'

For me the best solution is:
PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ $PATH_TO_NLTK_DATA
The alternative solution does not work for me:
python -m nltk.downloader -u https://pastebin.com/raw/D3TBY4Mj punkt

Decent error output in browser for wsgi/web.py

I'm using web.py 0.3 / apache2 / mod_wsgi and the cgitb module doesn't seem to work out of the box (I still just get 'internal server error' from web.py and the usual output goes to apache's error_log). The web.py install guide suggested a workaround which didn't work for me - I could probably hack it into working, but is there something better (perhaps designed for web.py or wsgi) that I should use instead?

Set web.config.debug = True before creating your app. That enables the debug error page, which contains the stack trace of the exception along with the values of the locals.

When debugging apache2 and web.py, it's usually good to catch errors in the apache error log. When you get an internal server error, for instance, it means nothing was returned for whatever reason by your app.
On Linux, I just watch the error log in a separate terminal...
tail -f /var/log/apache2/error_log
or
tail -f /var/log/httpd/error_log
or something depending on your distribution. If there's a typo or error message or what not, you'll get the typical python stack trace in your error log even if you get an internal server error in your browser.

Lack of cgitb was really slowing me down, too. This did it for me:
try:
    Output += TroublesomeScript(etc)
except:
    import traceback
    Output += str(traceback.format_exc())
You can beautify the output if you like but this should give you the information you need for debugging. You can also just output sys.exc_info(), but the traceback module seems to be recommended.
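The same idea can be applied once for the whole application instead of around each troublesome call. The following is a minimal sketch of a generic WSGI wrapper that returns the traceback as plain text. It is not a web.py feature, the name show_tracebacks is made up for illustration, and it is meant for debugging only, since it exposes internals to the browser:

```python
import sys
import traceback


def show_tracebacks(app):
    """Wrap a WSGI app so unhandled exceptions come back as a plain-text traceback."""
    def wrapper(environ, start_response):
        result = None
        try:
            result = app(environ, start_response)
            # Materialize the body so errors raised while iterating are caught too.
            return list(result)
        except Exception:
            body = traceback.format_exc().encode("utf-8")
            headers = [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ]
            # Passing exc_info lets the server replace headers set before the failure.
            start_response("500 Internal Server Error", headers, sys.exc_info())
            return [body]
        finally:
            # WSGI requires close() to be called on the result if it exists.
            if hasattr(result, "close"):
                result.close()
    return wrapper
```

Under mod_wsgi this would presumably be used as application = show_tracebacks(app.wsgifunc()), assuming app is the web.py application object.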
