Python requests_html render runs forever on certain URLs

I am trying to write a simple script that, given an arbitrary URL, returns the title tag of that website. Because many of the URLs I want to resolve need JavaScript enabled, I need something like requests_html's render function. However, I have hit an issue with the library where rendering the example URL below never terminates. I tried the timeout argument of the render call and it did not work. Can anyone help me figure out how to get this to time out properly, or suggest some other workaround to make sure it doesn't get stuck?
This is my current code that does not terminate (it gets stuck on the render call):
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('http://shan-shui-inf.lingdong.works/')
# render with JS
r.html.render(sleep = 1, keep_page=True)
# Also does not work: r.html.render(sleep = 1, keep_page=True, timeout = 3)
title = r.html.find('title', first=True).full_text
I have already tried solutions like "Timeout on a function call" and "Python timeout decorator", which, strangely enough, still did not time out.
NOTE: I am using Python 3.7.4 64-bit on Windows 10.

I would suggest putting r.session.close() at the end. This worked for me.

OK, I'm quite late here.
This is what I did:
pip install -U pyppeteer
(pip installed version 0.2.6 for me)
Then it worked somehow.
(Unrelated)
If you want the Chromium browser to appear on screen, you'll need to change line 714 of requests_html.py (somewhere in site-packages):
headless=True -> headless=False
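One possible workaround for the original question, offered as a sketch rather than a confirmed fix: since the in-process timeout approaches mentioned in the question did not help, the render step can be run in a separate Python process and the deadline enforced from outside with subprocess. CHILD_SCRIPT and run_with_timeout are hypothetical names, not part of requests_html.

```python
import subprocess
import sys

# Child script: fetches the page, renders it, and prints the title.
# It runs only inside the child process, so requests_html is needed there only.
CHILD_SCRIPT = """
import sys
from requests_html import HTMLSession
session = HTMLSession()
r = session.get(sys.argv[1])
r.html.render(sleep=1)
print(r.html.find('title', first=True).full_text)
"""


def run_with_timeout(script, args, timeout):
    """Run `script` in a fresh interpreter; return its stdout, or None on timeout or failure."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", script, *args],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising
    return done.stdout.strip() if done.returncode == 0 else None
```

For example, run_with_timeout(CHILD_SCRIPT, ['http://shan-shui-inf.lingdong.works/'], timeout=30) would return None if the render does not finish within 30 seconds (an arbitrary limit). Chromium processes started by the child may outlive it; this sketch does not clean them up.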

Related

Running console window in background for GUI using tkinter on Windows 10

So I have this GUI that I made with tkinter and everything works well. It connects to servers and sends commands for both Linux and Windows. I went ahead and used pyinstaller to create a windowed GUI without a console, and when I try to use a specific function for sending Windows commands, it fails. If I create the GUI with a console that pops up before the GUI, it works like a charm. What I'm trying to figure out is how to get my GUI to work with the console being invisible to the user.
The part of my code that has the issue revolves around subprocess. To spare you all from the 400+ lines of code I wrote, I'm providing the specific code that has issues. Here is the snippet:
def rcmd_in(server):
    import subprocess as sp
    for i in command_list:
        result = sp.run(['C:/"Path to executable"/rcmd.exe', '\\\\' + server, i],
                        universal_newlines=True, stdout=sp.PIPE, stderr=sp.STDOUT)
        print(result.stdout)
The argument 'server' is passed from another function that calls to 'rcmd_in' and 'command_list' is a mutable list created in the root of the code, accessible for all functions.
Now, I have done my due diligence. I scoured multiple searches and came up with an edit to my code that makes an attempt to run my code with that console invisible, found using info from this link: recipe-subprocess. Here is what the edit looks like:
def rcmd_in(server):
    import subprocess as sp
    import os, os.path
    si = sp.STARTUPINFO()
    si.dwFlags |= sp.STARTF_USESHOWWINDOW
    for i in command_list:
        result = sp.run(['C:/"Path to executable"/rcmd.exe', '\\\\' + server, i],
                        universal_newlines=True, stdin=sp.PIPE, stdout=sp.PIPE,
                        stderr=sp.STDOUT, startupinfo=si, env=os.environ)
        print(result.stdout)
The problem I have now is that when it runs, an error of "Error:8 - Internal error -109" pops up. Let me add that I tried using the functions 'call()', 'Popen()', and others, but only 'run()' seems to work.
I've reached a point where my brain hurts and I could use some help. Any suggestions? As always, I am forever grateful for anyone's help. Thanks in advance!

I figured it out and it only took me 5 days! :D
Looks like the reason the function would fail falls on how Windows handles stdin. I found a post that helped me edit my code to work with pyinstaller -w (--noconsole). Here is the updated code:
def rcmd_in(server):
    import subprocess as sp
    si = sp.STARTUPINFO()
    si.dwFlags |= sp.STARTF_USESHOWWINDOW
    for i in command_list:
        result = sp.Popen(['C:/"Path to executable"/rcmd.exe', '\\\\' + server, i],
                          universal_newlines=True, stdin=sp.PIPE, stdout=sp.PIPE,
                          stderr=sp.PIPE, startupinfo=si)
        print(result.stdout.read())
Note the change of function from 'run()' to 'Popen()'. The 'run()' function would not work with the print statement at the end. Also, for those of you who are curious, the 'si' variable I created prevents 'subprocess' from opening a console when it is run from a GUI. I hope this will be useful to someone struggling with this. Cheers
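A variation on the working code above, as a hedged sketch: the helper below (run_hidden is a hypothetical name) redirects all three standard streams, on the assumption that this matters because a pyinstaller -w build has no console to inherit them from. It uses communicate() instead of stdout.read() so that the child is also waited on. It has not been tested against rcmd.exe.

```python
import subprocess as sp
import sys


def run_hidden(cmd):
    """Run cmd with all standard streams redirected; return its combined output."""
    kwargs = {}
    if sys.platform == "win32":
        # Same idea as the `si` variable above: ask Windows not to show a console window.
        si = sp.STARTUPINFO()
        si.dwFlags |= sp.STARTF_USESHOWWINDOW
        si.wShowWindow = sp.SW_HIDE
        kwargs["startupinfo"] = si
    proc = sp.Popen(cmd, universal_newlines=True, stdin=sp.DEVNULL,
                    stdout=sp.PIPE, stderr=sp.STDOUT, **kwargs)
    output, _ = proc.communicate()  # reads to EOF and waits for the child to exit
    return output
```

Inside the loop from the answer, the call would look like print(run_hidden(['C:/"Path to executable"/rcmd.exe', '\\\\' + server, i])).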

OpenCV(imread) operation stuck in elastic beanstalk

I'm trying to read a PNG file and output the numpy matrix of the image in the terminal using the imread function of OpenCV on the server, like this:
import cv2
from flask import Flask
import os
# app object assumed; its definition was not shown in the original post
application = Flask(__name__)
@application.route('/readImage', methods=['POST'])
def handleHTTPPostRequest():
    imagePath = f'{os.getcwd()}/input.png'
    print('image path is', imagePath)
    print(cv2.__version__)
    im = cv2.imread(imagePath, cv2.IMREAD_COLOR)
    print(im)
    return 'success'
This gives the expected output on my local machine (Ubuntu 18.04) no matter how many times I execute it. I moved this to Elastic Beanstalk (CentOS) with the necessary setup. The request runs fine (it gives the proper logs along with success) the very first time I make a POST call.
But when I make the POST call a second time, it only outputs the first two logs (image path and cv2 version) and is stuck there for a while. After some time, it shows this error:
End of script output before headers: application.py
I added one more line just before cv2.imread to make sure that the file exists:
print('does the file exists',os.path.isfile(imagePath) )
This returns True every time. I have restarted the server multiple times; it looks like it only works the very first time, and cv2.imread() is stuck after the first POST call. What am I missing?

When you print from a request handler, Flask tries to do something sensible, but print really isn't what you want to be doing, as it risks throwing the HTTP request/response bookkeeping off.
A fully-supported way of getting diagnostic info out of a handler is to use the logging module. It will require a small bit of configuration. See http://flask.pocoo.org/docs/1.0/logging/
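As a minimal sketch of that suggestion, using only the standard logging module: the logger name readImage and the helper log_image_path are made up for illustration, and the format string is just one reasonable choice.

```python
import logging
import os

# Configure the root logger once, at import time; by default it writes to stderr.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("readImage")  # hypothetical logger name


def log_image_path(filename="input.png"):
    """Log the diagnostics that the question printed, and return the path."""
    path = os.path.join(os.getcwd(), filename)
    logger.info("image path is %s", path)
    logger.info("does the file exist: %s", os.path.isfile(path))
    return path
```

The handler in the question could call log_image_path() in place of its print calls; where the messages end up depends on how the server is configured.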

To anyone facing this issue, I have found a solution. Add this to your ebextensions config file:
container_commands:
  AddGlobalWSGIGroupAccess:
    command: "if ! grep -q 'WSGIApplicationGroup %{GLOBAL}' ../wsgi.conf ; then echo 'WSGIApplicationGroup %{GLOBAL}' >> ../wsgi.conf; fi;"

Saikiran's final solution worked for me. I was getting this issue when I tried calling methods from the opencv-python library. I'm running Ubuntu 18.04 locally and it works fine there. However, like Saikiran's original post, when deployed to Elastic Beanstalk the first request works and then the second one does not. For my EB environment, I'm using a Python3.6-based Amazon Linux server.

pyldavis Unable to view the graph

I am trying to visually depict my topics in Python using pyLDAvis. However, I am unable to view the graph. Do we have to view the graph in the browser, or will it pop up upon execution? Below is my code:
import pyLDAvis
import pyLDAvis.gensim as gensimvis
print('Pyldavis ....')
vis_data = gensimvis.prepare(ldamodel, doc_term_matrix, dictionary)
pyLDAvis.display(vis_data)
The program keeps running indefinitely after executing the above commands. Where should I view my graph? Or where will it be stored? Is it integrated only with the IPython notebook? Kindly guide me through this.
P.S. My Python version is 3.5.

This does not work:
pyLDAvis.display(vis_data)
This will work for you:
pyLDAvis.show(vis_data)

I'm facing the same problem now.
EDIT:
My script looks as follows:
first part:
import pyLDAvis
import pyLDAvis.sklearn
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
print('start script')
tf_vectorizer = CountVectorizer(strip_accents = 'unicode',
                                stop_words = 'english',
                                lowercase = True,
                                token_pattern = r'\b[a-zA-Z]{3,}\b',
                                max_df = 0.5,
                                min_df = 10)
dtm_tf = tf_vectorizer.fit_transform(docs_raw)
lda_tf = LatentDirichletAllocation(n_topics=20, learning_method='online')
print('fit')
lda_tf.fit(dtm_tf)
second part:
print('prepare')
vis_data = pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer)
print('display')
pyLDAvis.display(vis_data)
The problem is in the line "vis_data = (...)". If I run the script, it prints 'prepare' and keeps on running after that without printing anything else (so it never reaches the line "print('display')").
The funny thing is, when I run the whole script it gets stuck on that line, but when I run the first part, go to my console, and execute just the single line "vis_data = pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer)", it finishes in a couple of seconds.
As for the graph, I saved it as HTML ("simple") and use the HTML file to view the graph.

I ran into the same problem (I use PyCharm as my IDE). The problem is that pyLDAvis is developed for IPython (see the docs, https://media.readthedocs.org/pdf/pyldavis/latest/pyldavis.pdf, page 3).
My fix/workaround:
make a dict of lda_tf, dtm_tf, tf_vectorizer (e.g., pyLDAviz_dict)
dump the dict to a file (e.g., mydata_pyLDAviz.pkl)
read the pkl file into a notebook (I did get some deprecation messages from pyLDAvis, but that had no effect on the end result)
play around with pyLDAvis in the notebook
if you're happy with the view, dump it into HTML
The cause is (most likely) that pyLDAvis expects continuous user interaction (including a user-initiated "exit"). However, I would rather dump data from a smart IDE and read it into Jupyter than develop/code in a Jupyter notebook. That's pretty much like going back to before-emacs times.
From experience, this approach works quite nicely for other plotting routines.
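The dump-and-reload steps above can be sketched with the standard pickle module. The helper names dump_for_notebook and load_in_notebook are hypothetical; the dict keys and file name simply follow the answer.

```python
import pickle


def dump_for_notebook(objects, path):
    """Write a dict of fitted objects to `path` so a notebook can load them."""
    with open(path, "wb") as fh:
        pickle.dump(objects, fh)


def load_in_notebook(path):
    """Read the dict back; only unpickle files you created yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# In the IDE script:
#   dump_for_notebook({"lda_tf": lda_tf, "dtm_tf": dtm_tf,
#                      "tf_vectorizer": tf_vectorizer}, "mydata_pyLDAviz.pkl")
# In the notebook:
#   d = load_in_notebook("mydata_pyLDAviz.pkl")
#   vis_data = pyLDAvis.sklearn.prepare(d["lda_tf"], d["dtm_tf"], d["tf_vectorizer"])
```

Unpickling fitted scikit-learn objects generally assumes the notebook runs the same library versions as the script that wrote the file.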

If you get a module error for pyLDAvis.gensim, then try this one instead:
import pyLDAvis.gensim_models
You get the error because the module was renamed in a newer version.

Playing a sound in a ipython notebook

I would like to be able to play a sound file in an IPython notebook.
My aim is to be able to listen to the results of different treatments applied to a sound directly from within the notebook.
Is this possible? If yes, what is the best solution to do so?

The previous answer is pretty old. You can use IPython.display.Audio now. Like this:
import IPython
IPython.display.Audio("my_audio_file.mp3")
Note that you can also process any type of audio content and pass it to this function as a numpy array, together with the rate argument giving the sampling rate.
If you want to display multiple audio files, use the following:
IPython.display.display(IPython.display.Audio("my_audio_file.mp3"))
IPython.display.display(IPython.display.Audio("my_audio_file.mp3"))
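To try the Audio call without an existing sound file, a short test tone can be written with the standard library first. This is only a sketch: write_test_tone is a hypothetical helper, and the 440 Hz frequency, half-scale amplitude, and 16-bit mono format are arbitrary choices.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=440.0, seconds=1.0, rate=44100):
    """Write a mono 16-bit sine tone to `path` and return the number of frames."""
    n_frames = int(seconds * rate)
    frames = bytearray()
    for i in range(n_frames):
        # Half of full scale to leave some headroom.
        sample = int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return n_frames
```

In a notebook cell, calling write_test_tone("tone.wav") and then IPython.display.Audio("tone.wav") should show a player for the generated file.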

A small example that might be relevant: http://nbviewer.ipython.org/5507501/the%20sound%20of%20hydrogen.ipynb
It should be possible to avoid going through external files by base64 encoding, as for PNG/JPG.

The code:
import IPython
IPython.display.Audio("my_audio_file.mp3")
may give an "Invalid Source" error in IE11; try other browsers, where it should work fine.

The other available answers add an HTML element, which I disliked, so I created ringbell. It lets you play a custom sound like this:
from ringbell import RingBell
RingBell(
    sample = "path/to/sample.wav",
    minimum_execution_time = 0,
    verbose = True
)
It also gives you a one-liner that plays a bell when a cell execution takes more than 1 minute (or a custom amount of time, for that matter) or fails with an exception:
import ringbell.auto
You can install this package from PyPI:
pip install ringbell

If the sound you are looking for could also be text-to-speech, I would like to mention that every time I start some long process in the background, I also queue the execution of a cell like this:
from IPython.display import clear_output, display, HTML, Javascript
display(Javascript("""
var msg = new SpeechSynthesisUtterance();
msg.text = "Process completed!";
window.speechSynthesis.speak(msg);
"""))
You can change the text you want to hear with msg.text.

python 3.3 basic error

I have Python 3.3 installed.
I am using the example from their site:
import urllib.request
response = urllib.request.urlopen('http://python.org/')
html = response.read()
The only thing that happens when I run it is that I get this:
======RESTART=========
I know I am a rookie, but I figured the example from Python's own website should work.
It doesn't. What am I doing wrong? Eventually I want to run the script from the website below, but I think urllib is not going to work as it is used on that site. Can someone tell me if the code will work with Python 3.3?
http://flowingdata.com/2007/07/09/grabbing-weather-underground-data-with-beautifulsoup/

I think I see what's probably going on. You're likely using IDLE, and when it starts a new run of a program, it prints the
======RESTART=========
line to tell you that a fresh program is starting. That means that all the variables currently defined are reset and/or deleted, as appropriate.
Since your program didn't print any output, you didn't see anything.
The two lines I suggested adding were just tests to figure out what was going on, they're not needed in general. [Unless the window itself is automatically closing, which it shouldn't.] But as a rule, if you want to see output, you'll have to print what you're interested in.
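To make that concrete, here is a small sketch of the question's code with the result actually printed. The helper name fetch_preview and the 200-character limit are illustrative choices, not part of the original example.

```python
import urllib.request


def fetch_preview(url, max_chars=200):
    """Return the first max_chars characters of the response body."""
    with urllib.request.urlopen(url) as response:
        html = response.read()  # bytes
    # Decode leniently so an unexpected encoding does not raise.
    return html.decode("utf-8", errors="replace")[:max_chars]
```

Running print(fetch_preview('http://python.org/')) in IDLE should then show the start of the page after the RESTART line.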

Your example works for me. However, I suggest using requests instead of urllib2.
To simplify the example you linked to, it would look like:
from bs4 import BeautifulSoup
import requests
resp = requests.get("http://www.wunderground.com/history/airport/KBUF/2007/12/16/DailyHistory.html")
soup = BeautifulSoup(resp.text, "html.parser")
