I have a list of URLs. I want to get their content asynchronously every 10 seconds.
urls = [
    'http://www.python.org',
    'http://stackoverflow.com',
    'http://www.twistedmatrix.com',
    'http://www.google.com',
    'http://launchpad.net',
    'http://github.com',
    'http://bitbucket.org',
]
waiting = [client.getPage(url) for url in urls]
defer.gatherResults(waiting).addCallback(saveResults)
reactor.run()
How do I do this? This code only lets me fetch the URLs' content once; calling it again raises error.ReactorNotRestartable()
Thanks :)
This is definitely possible with Twisted.
First off, although this is somewhat unrelated to your question, don't use getPage. It's a very limited API with poor security defaults for HTTPS. Instead, use treq.
Now, onto your main question.
The important thing to understand about reactor.run() is that it doesn't mean "run this code here". It means "run the whole program". When reactor.run() exits, it's time for your program to exit.
Lucky for you, Twisted has a nice built-in way to do things on a regular schedule: LoopingCall.
Here's a working example, using treq and LoopingCall:
urls = [
    'http://www.python.org',
    'http://stackoverflow.com',
    'http://www.twistedmatrix.com',
    'http://www.google.com',
    'http://launchpad.net',
    'http://github.com',
    'http://bitbucket.org',
]

from twisted.internet.task import LoopingCall
from twisted.internet.defer import gatherResults
from treq import get, content

def fetchWebPages():
    return (gatherResults([get(url).addCallback(content) for url in urls])
            .addCallback(saveResults))

def saveResults(responses):
    print("total: {} bytes"
          .format(sum(len(response) for response in responses)))

repeatedly = LoopingCall(fetchWebPages)
repeatedly.start(10.0)

from twisted.internet import reactor
reactor.run()
As a bonus, this handles the case where fetchWebPages takes longer than 10 seconds: because it returns a Deferred, LoopingCall waits for that Deferred to fire before scheduling the next run, rather than letting outstanding requests pile up or drifting later and later as the requests slow down.
The Telegram bot I'm making can run a function that takes a few minutes to process, and I'd like to be able to keep using the bot while that function is running.
I'm using aiogram, asyncio and I tried using Python threading to make this possible.
The code I currently have is:
import asyncio
from queue import Queue
from threading import Thread
import time
import logging
from aiogram import Bot, types
from aiogram.types.message import ContentType
from aiogram.contrib.middlewares.logging import LoggingMiddleware
from aiogram.contrib.fsm_storage.memory import MemoryStorage
from aiogram.dispatcher import Dispatcher, FSMContext
from aiogram.utils.executor import start_webhook
from aiogram.types import InputFile
...
loop = asyncio.get_event_loop()
bot = Bot(token=BOT_TOKEN, loop=loop)
dp = Dispatcher(bot, storage=MemoryStorage())
dp.middleware.setup(LoggingMiddleware())
task_queue = Queue()
...
async def send_result(id):
    logging.warning("entered send_result function")
    image_res = InputFile(path_or_bytesio="images/result/res.jpg")
    await bot.send_photo(id, image_res, FINISHED_MESSAGE)

def queue_processing():
    while True:
        if not task_queue.empty():
            task = task_queue.get()
            if task["type"] == "nst":
                nst.run(task["style"], task["content"])
                send_fut = asyncio.run_coroutine_threadsafe(send_result(task['id']), loop)
                send_fut.result()
            task_queue.task_done()
        time.sleep(2)
if __name__ == "__main__":
executor_images = Thread(target=queue_processing, args=())
executor_images.start()
start_webhook(
dispatcher=dp,
webhook_path=WEBHOOK_PATH,
skip_updates=False,
on_startup=on_startup,
host=WEBAPP_HOST,
port=WEBAPP_PORT,
)
So I'm trying to set up a separate thread running a loop that processes a queue of slow tasks, which would let me keep chatting with the bot in the meantime and would send the result message (an image) to the chat once a task is finished.
However, this doesn't work. My friend came up with this solution for a similar task about a year ago, and it works in his bot, but it doesn't seem to work in mine.
Judging by the logs, it never even enters the send_result function, because the warning never comes through. The second thread itself does work properly, though: the result image is saved at its assigned path by the time nst.run finishes.
I tried A LOT of different things and I'm very puzzled as to why this solution doesn't work for me when it works in another bot. For example, I tried asyncio.create_task instead of asyncio.run_coroutine_threadsafe, but to no avail.
To my understanding, you don't need to pass a loop to aiogram's Bot or Dispatcher anymore, but in that case I don't know how to send a task to the main thread from the second one.
Versions I'm using: aiogram 2.18, asyncio 3.4.3, Python 3.9.10.
Solved: the issue was that you can't access the bot's loop directly (via bot.loop or dp.loop), even if you pass your own asyncio loop to the bot or the dispatcher.
So the solution was to grab the main thread's loop with asyncio.get_event_loop() (which returns the currently running loop, if there is one) from inside one of the message handlers, where the loop is guaranteed to be running, and to pass it to asyncio.run_coroutine_threadsafe (I used the "task" dictionary for that): asyncio.run_coroutine_threadsafe(send_result(task['id']), task['loop']).
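A minimal sketch of that fix, reusing the question's names (the handler itself is hypothetical, and the style/content values stay elided as in the question):

@dp.message_handler(commands=["nst"])  # hypothetical handler
async def handle_nst(message: types.Message):
    task_queue.put({
        "type": "nst",
        "id": message.chat.id,
        "style": ...,    # elided in the question
        "content": ...,  # elided in the question
        "loop": asyncio.get_event_loop(),  # the running loop, captured on the main thread
    })

The worker thread then calls asyncio.run_coroutine_threadsafe(send_result(task['id']), task['loop']) instead of relying on the module-level loop variable.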
I'm building an application intended to do bulk-job processing of data within another piece of software. To control the other software automatically I'm using pyautoit, and everything works fine except for application errors caused by the external software, which occur from time to time.
To handle those cases, I built a watchdog:
It starts the script with the bulk job within a subprocess
process = subprocess.Popen(['python', job_script, src_path], stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE, shell=True)
It listens to the system event using winevt.EventLog module
EventLog.Subscribe('System', 'Event/System[Level<=2]', handle_event)
In case an error occurs, it shuts everything down and restarts the script.
OK, if a system error event occurs, this event should be handled in a way that notifies the subprocess. That notification should then trigger the following action within the subprocess:
Within the subprocess there's an object controlling everything and continuously collecting generated data. In order not to have to start the whole job from the beginning after restarting the script, this object has to be dumped using pickle (which isn't the problem here!).
Listening to the system event from inside the subprocess didn't work; it results in a continuous loop when calling subprocess.Popen().
So, my question is: how can I either subscribe to system events from inside a child process, or communicate between the parent and child process, i.e. send a message like "hey, an error occurred", listen for it within the subprocess, and then create the dump?
I'm really sorry that I'm not allowed to post any code in this case, but I hope (and actually think) my description is understandable. My question is just about which module best accomplishes this.
I'd be really happy if somebody could point me in the right direction...
Br,
Mic
I believe the best answer may lie here: https://docs.python.org/3/library/subprocess.html#subprocess.Popen.stdin
These attributes should allow for proper communication between the different processes fairly easily, and without any other dependencies.
Note that Popen.communicate() may suit better if other processes may cause issues.
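For completeness, a minimal sketch of the communicate() route; it assumes a hypothetical one_shot.py that reads a single line, replies, and exits (communicate() waits for the child to terminate, so it doesn't fit a long-running child like Doc2.py below):

from subprocess import Popen, PIPE
import sys

# one_shot.py (hypothetical) reads one line from stdin, prints a reply, and exits.
p = Popen([sys.executable, 'one_shot.py'], stdin=PIPE, stdout=PIPE)
out, err = p.communicate(input=b'init\r\n')  # sends stdin, closes it, waits for exit
print(out)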
EDIT to add example scripts:
main.py
from subprocess import Popen, PIPE, STDOUT
import sys

def check_output(p):
    out = p.stdout.readline()
    return out

def send_data(p, data):
    p.stdin.write(bytes(f'{data}\r\n', 'utf8'))  # auto newline
    p.stdin.flush()

def initiate(p):
    #p.stdin.write(bytes('init\r\n', 'utf8'))  # function to send first communication
    #p.stdin.flush()
    send_data(p, 'init')
    return check_output(p)

def test(p, data):
    send_data(p, data)
    return check_output(p)

def main():
    exe_name = 'Doc2.py'
    p = Popen([sys.executable, exe_name], stdout=PIPE, stderr=STDOUT, stdin=PIPE)
    print(initiate(p))
    print(test(p, 'test'))
    print(test(p, 'test2'))  # testing responses
    print(test(p, 'test3'))

if __name__ == '__main__':
    main()
Doc2.py
import sys, time, random

def recv_data():
    return sys.stdin.readline()

def send_data(data):
    print(data, flush=True)  # flush so the parent's readline doesn't block on buffering

while 1:
    d = recv_data()
    #print(f'd: {d}')
    if d.strip() == 'test':
        send_data('return')
    elif d.strip() == 'init':
        send_data('Acknowledge')
    else:
        send_data('Failed')
This is the best method I could come up with for cross-process communication. Also make sure all requests and responses don't contain newlines, or the code will break.
I have a flask app that returns a JSON response. However, I want it to call that function every 30 seconds without clicking the refresh button on the browser. Here is what I did
Using apscheduler. This code is in application.py:
from apscheduler.schedulers.background import BackgroundScheduler

def create_app(config_filname):
    con = redis.StrictRedis(host="localhost", port=6379, charset="utf-8", decode_responses=True, db=0)
    application = Flask(__name__)
    CORS(application)
    sched = BackgroundScheduler()

    @application.route('/users')
    @cross_origin()
    @sched.scheduled_job('interval', seconds=20)
    def get_users():
        # Some code...
        return jsonify(users)

    sched.start()
    return application
Then in my wsgi.py
from application import create_app
application = create_app('application.cfg')
with application.app_context():
if __name__ == "__main__":
application.run()
When I run this application, I get the JSON output, but it does not refresh; instead, after 20 seconds it throws
RuntimeError: Working outside of application context.
This typically means that you attempted to use functionality that needed
to interface with the current application object in some way. To solve
this, set up an application context with app.app_context(). See the
documentation for more information.
What am I doing wrong? I would appreciate any advice.
Apologies if this is in a way subverting the question, but if you want the users to be sent every 30 seconds, this probably shouldn't be done in the backend. The backend should only send out data when a request is made. For the data to be sent at regular intervals, the frontend needs to be configured to make requests at regular intervals.
Personally I'd recommend doing this with a combination of iframes and JavaScript, as described in this Stack Overflow question:
Auto Refresh IFrame HTML
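As a rough sketch of that idea (the route layout and markup here are illustrative, not taken from the question), the server only answers requests, and the browser re-requests every 30 seconds:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/users')
def get_users():
    users = []  # filled from Redis, as in the question
    return jsonify(users)

@app.route('/')
def index():
    # The meta-refresh makes the browser reload (and thus re-request the
    # iframe's content) every 30 seconds; the iframe/JavaScript approach
    # from the linked answer works the same way.
    return '<meta http-equiv="refresh" content="30"><iframe src="/users"></iframe>'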
Lastly, when it comes to your actual code, it seems like there is an error here:
if __name__ == "__main__":
application.run()
The "application.run()" line should be indented as it is inside the if statement
I've been stuck on this same issue for just short of a week now:
the program should add widgets based on an HTTP request. However, that request may take some time depending on the user's internet connection, so I decided to thread the request and add a spinner to indicate that something is being done.
Here lies the issue. Some piece of code:
@mainthread
def add_w(self, parent, widget):
    parent.add_widget(widget)

def add_course():
    # HTTP request I mentioned
    course = course_manager.get_course(textfield_text)

    courses_stack_layout = constructor_screen.ids.added_courses_stack_layout
    course_information_widget = CourseInformation(coursename_label=course.name)
    self.add_w(courses_stack_layout, course_information_widget)

    constructor_screen.ids.spinner.active = False
add_course is being called from a thread, and spinner.active is set to True before calling this function. Here's the result, sometimes: a messed-up graphical interface.
I also tried solving this with Clock.schedule_once and Clock.schedule_interval plus a queue. The results were the same: sometimes it works, sometimes it doesn't. The spinner does spin while the request is running, which is great.
Quite frankly, I would've never thought that implementing a spinner would be so hard.
How do I implement that spinner? Is there an alternative to threading, or an alternative to urllib for making the request?
edit: any feedback on how I should've posted this so I can get more help? Is it too long? Could I have been clearer?
The problem here was simply that widgets must also be created within the main thread.
Creating another function marked with @mainthread and calling it from the threaded one solved the issue.
Thanks for those who contributed.
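A minimal sketch of that fix, reusing the question's names (illustrative only, assuming these are methods of the same class as in the question):

from kivy.clock import mainthread

@mainthread
def add_course_widget(self, coursename):
    # Runs on Kivy's main thread: the widget is both created and added here.
    widget = CourseInformation(coursename_label=coursename)
    constructor_screen.ids.added_courses_stack_layout.add_widget(widget)
    constructor_screen.ids.spinner.active = False

def add_course(self):
    # Still runs in the worker thread; only the slow request happens here.
    course = course_manager.get_course(textfield_text)
    self.add_course_widget(course.name)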
Let us assume I serve data to colleagues in-office with a small Flask app, and let us also assume that it is a project I am not explicitly 'paid to do' so I don't have all the time in the world to write code.
It has occurred to me in my experimentation with pet projects at home that instead of decorating every last route with @app.route('/some/local/page') I can do the following:
from flask import Flask, render_template, url_for, redirect, abort
from collections import OrderedDict

goodURLS = OrderedDict([('/index', 'Home'),   ## can be passed to the template
                        ('/about', 'About'),  ## to create the navigation bar
                        ('/foo', 'Foo'),
                        ('/bar', 'Bar'),      ## hence the use of OrderedDict
                        ('/eggs', 'Eggs'),    ## to have a set order for that navibar
                        ('/spam', 'Spam')])

app = Flask(__name__)

@app.route('/<destination>')
def goThere(destination):
    availableRoutes = goodURLS.keys()
    if "/" + destination in availableRoutes:
        return render_template('/%s.html' % destination, goodURLS=goodURLS)
    else:
        abort(404)

@app.errorhandler(404)
def notFound(e):
    return render_template('/notFound.html'), 404
Now all I need to do is update my one list, and my navigation bar and route-handling function stay in lock-step.
Alternatively, I've written a method that determines the viable file locations by using os.walk in conjunction with file.endswith('.aGivenFileExtension') to locate every file I mean to make accessible. The user's request can then be compared against the list this function returns (which obviously changes the serveTheUser() function).
from os import path, walk

def fileFinder(directory, extension=".html"):
    """Returns a list of files with a given file extension at a given path.

    By default .html files are returned.
    """
    foundFilesList = []
    if path.exists(directory):
        for p, d, files in walk(directory):
            for file in files:
                if file.endswith(extension):
                    foundFilesList.append(file)
    return foundFilesList
goodRoutes = fileFinder('./templates/someFolderWithGoodRoutes/')
The question is, Is This Bad?
There are many aspects of Flask I'm just not using (mainly because I haven't needed to know about them yet), so maybe this is actually limiting, or redundant compared with a built-in feature of Flask. Does my lack of explicitly decorating each route rob me of a great feature of Flask?
Additionally, is either of these methods more or less safe than the other? I really don't know much about web security. Like I said, right now this is all in-office stuff: the security of my data is assured by our IT professional, and there are no incoming requests from outside the office. But in a real-world setting, would either of these be detrimental? In particular, if I am using the backend to os.walk a location on the server's local disk, I'm not asking to have it abused by some ne'er-do-well, am I?
EDIT: I've offered this as a bounty, because if it is not a safe or constructive practice I'd like to avoid using it for things I'd want to push to Heroku or, in general, serve publicly for family, etc. It just seems like decorating every viable route with app.route is a waste of time.
There isn't anything really wrong with your solution, in my opinion. The problem is that with this kind of setup the things you can do are pretty limited.
I'm not sure if you simplified your code to show here, but if all you are doing in your view function is to gather some data and then select one of a few templates to render it then you might as well render the whole thing in a single page and maybe use a Javascript tab control to divide it up in sections on the client.
If each template requires different data, then the logic that obtains and processes the data for each template will have to be in your view function, and that is going to look pretty messy because you'll have a long chain of if statements to handle each template. Between that and separate view functions per template I think the latter will be quicker, even more so if you also consider the maintenance effort.
Update: based on the conversation in the comments, I stand by my answer, with some minor reservations.
I think your solution works and has no major problems. I don't see a security risk because you are validating the input that comes from the client before you use it.
You are just using Flask to serve files that can be considered static if you ignore the navigation bar at the top. You should consider compiling the Flask app into a set of static files using an extension like Frozen-Flask, then you just host the compiled files with a regular web server. And when you need to add/remove routes you can modify the Flask app and compile it again.
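A minimal sketch of the Frozen-Flask idea (the import path for your app is a placeholder, and the URL generator assumes the goodURLS dict from the question):

# freeze.py -- hypothetical build script
from flask_frozen import Freezer
from yourapp import app, goodURLS  # wherever the Flask app above lives

freezer = Freezer(app)

@freezer.register_generator
def goThere():
    # /<destination> is dynamic, so enumerate the pages for the freezer.
    for url in goodURLS:
        yield {'destination': url.lstrip('/')}

if __name__ == '__main__':
    freezer.freeze()  # writes the static pages into the build/ directory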
Another thought is that your Flask app structure will not scale well if you need to add server-side logic. Right now you don't have any logic in the server, everything is handled by jQuery in the browser, so having a single view function works just fine. If at some point you need to add server logic for these pages then you will find that this structure isn't convenient.
I hope this helps.
I assume, based on your code, that every route has a corresponding template file of the same name (destination maps to destination.html) and that the goodURLS menu bar is changed manually. An easier method would be to try to render the template on request and return your 404 page if it doesn't exist.
from jinja2 import TemplateNotFound
from werkzeug.utils import secure_filename
....

@app.route('/<destination>')
def goThere(destination):
    destTemplate = secure_filename("%s.html" % destination)
    try:
        return render_template(destTemplate, goodURLS=goodURLS)
    except TemplateNotFound:
        abort(404)

@app.errorhandler(404)
def notFound(e):
    return render_template('/notFound.html'), 404
This is adapted from the answer to the Stack Overflow question "How do I create a 404 page?".
Edit: Updated to use Werkzeug's secure_filename to clean the user input, so a crafted destination can't point outside the templates directory.