I'd like to write a Telegram bot which can provide Scrapy stats on request.
My try mostly works, the only issue is that forcefully closing the spider (obviously) doesn't stop the bot.
So I have two questions :
is my general approach the correct one?
is it possible to close the bot even on forceful shutdown of the spider?
Here is the relevant class:
class TelegramBot(object):
telegram_token = telegram_credentials.token
#classmethod
def from_crawler(cls, crawler):
return cls(crawler)
def __init__(self, crawler):
self.crawler = crawler
cs = crawler.signals
cs.connect(self._spider_closed, signal=signals.spider_closed)
"""Start the bot."""
# Create the Updater and pass it your bot's token.
# Make sure to set use_context=True to use the new context based callbacks
# Post version 12 this will no longer be necessary
self.updater = Updater(self.telegram_token, use_context=True)
# Get the dispatcher to register handlers
dp = self.updater.dispatcher
# on different commands - answer in Telegram
dp.add_handler(CommandHandler("stats", self.stats))
# Start the Bot
self.updater.start_polling()
def _spider_closed(self, spider, reason):
# Stop the Bot
self.updater.stop()
def stats(self, update, context):
# Send a message with the stats
msg = (
"Spider "
+ self.crawler.spider.name
+ " stats: "
+ str(self.crawler.stats.get_stats())
)
update.message.reply_text(msg)
Here you can find my full code inside the Scrapy tutorial quotes spider https://github.com/jtommi/scrapy_telegram-bot_example/blob/master/tutorial/tutorial/telegram-bot.py
My code is a combination of
latencies extension from the "Learning Scrapy" book https://github.com/scalingexcellence/scrapybook/blob/master/ch09/properties/properties/latencies.py
echobot example from the python-telegram-bot library https://github.com/python-telegram-bot/python-telegram-bot/blob/master/examples/echobot.py
official scrapy documentation on stats collection https://docs.scrapy.org/en/latest/topics/stats.html
Calling the updater.stop() will definitely stop the bot. From the docs of python-telegram-bot,
"""Stops the polling/webhook thread, the dispatcher and the job queue."""
Check if updater.stop() is called after spider closes. Bot might not stop immediately, but eventually will stop.
Related
I am trying to use my telegram bot with Django. I want the code to keep running in the background. I am Using the apps.py to do this but there's one problem when the bot starts as it's an infinite loop, the Django server is never started.
Apps.py:
from django.apps import AppConfig
import os
class BotConfig(AppConfig):
default_auto_field = 'django.db.models.BigAutoField'
name = 'bot'
def ready(self):
from . import jobs
if os.environ.get('RUN_MAIN', None) != 'true':
jobs.StartBot()
Jobs.py:
def StartBot():
updater = Updater("API KEY")
dp = updater.dispatcher
dp.add_handler(ChatMemberHandler(GetStatus, ChatMemberHandler.CHAT_MEMBER))
updater.start_polling(allowed_updates=Update.ALL_TYPES)
updater.idle()
What's the best way to run my bot in the background? while making sure that the Django server runs normally. I tried Django background tasks but it's not compatible with Django 4.0.
The purpose of Updater.idle is to keep the main thread alive because start_polling only starts some background threads that don't block the main thread. If you want to run other stuff in the main thread, skip updater.idle() and instead call Updater.stop manually when the program should shut down.
Disclaimer: I'm currently the maintainer of python-telegram-bot
The telegram bot I'm making can execute a function that takes a few minutes to process and I'd like to be able to continue to use the bot while it's processing the function.
I'm using aiogram, asyncio and I tried using Python threading to make this possible.
The code I currently have is:
import asyncio
from queue import Queue
from threading import Thread
import time
import logging
from aiogram import Bot, types
from aiogram.types.message import ContentType
from aiogram.contrib.middlewares.logging import LoggingMiddleware
from aiogram.contrib.fsm_storage.memory import MemoryStorage
from aiogram.dispatcher import Dispatcher, FSMContext
from aiogram.utils.executor import start_webhook
from aiogram.types import InputFile
...
loop = asyncio.get_event_loop()
bot = Bot(token=BOT_TOKEN, loop=loop)
dp = Dispatcher(bot, storage=MemoryStorage())
dp.middleware.setup(LoggingMiddleware())
task_queue = Queue()
...
async def send_result(id):
logging.warning("entered send_result function")
image_res = InputFile(path_or_bytesio="images/result/res.jpg")
await bot.send_photo(id, image_res, FINISHED_MESSAGE)
def queue_processing():
while True:
if not task_queue.empty():
task = task_queue.get()
if task["type"] == "nst":
nst.run(task["style"], task["content"])
send_fut = asyncio.run_coroutine_threadsafe(send_result(task['id']), loop)
send_fut.result()
task_queue.task_done()
time.sleep(2)
if __name__ == "__main__":
executor_images = Thread(target=queue_processing, args=())
executor_images.start()
start_webhook(
dispatcher=dp,
webhook_path=WEBHOOK_PATH,
skip_updates=False,
on_startup=on_startup,
host=WEBAPP_HOST,
port=WEBAPP_PORT,
)
So I'm trying to setup a separate thread that's running a loop that is processing a queue of slow tasks thus allowing to continue chatting with the bot in the meantime and which would send the result message (image) to the chat after it's finished with a task.
However, this doesn't work. My friend came up with this solution while doing a similar task about a year ago, and it does work in his bot, but it doesn't seem to work in mine.
Judging by logs, it never even enters the send_result function, because the warning never comes through. The second thread does work properly though and the result image is saved and is located in its assigned path by the time nst.run finishes working.
I tried A LOT of different things and I'm very puzzled why this solution doesn't work for me because it does work with another bot. For example, I tried using asyncio.create_task instead of asyncio.run_coroutine_threadsafe, but to no avail.
To my understanding, you don't need to pass a loop to aiogram's Bot or Dispatcher anymore, but in that case I don't know how to send a task to the main thread from the second one.
Versions I'm using: aiogram 2.18, asyncio 3.4.3, Python 3.9.10.
Solved, the issue was that you can't access the bot's loop directly (with bot.loop or dp.loop) even if you pass your own asyncio loop to the bot or the dispatcher.
So the solution was to access the main thread's loop by using asyncio.get_event_loop() (which returns currently running loop, if there's one) from within one of the message handlers, because the loop is running at this point, and pass it to asyncio.run_coroutine_threadsafe (I used the "task" dictionary for that) like this: asyncio.run_coroutine_threadsafe(send_result(task['id']), task['loop']).
I relatively new in that Python field (I have notions of C++) and I would want to create a bot that auto execute ban as soon as it detects russian bots with cyrillic characters in their nicks entering in the group.
I copy-pasted several parts of different examples and I don't know if I'm doing well. Here is the code:
import re
import logging
from telegram import ReplyKeyboardMarkup, ReplyKeyboardRemove, Update
from telegram.ext import (
Updater,
CommandHandler,
MessageHandler,
Filters,
ConversationHandler,
CallbackContext,
)
updater = Updater("<MY-BOT-TOKEN>")
# Get the dispatcher to register handlers
dispatcher = updater.dispatcher
# on different commands - answer in Telegram
dispatcher.add_handler(CommandHandler("start", start))
dispatcher.add_handler(CommandHandler("help", help_command))
# on noncommand i.e message - echo the message on Telegram
dispatcher.add_handler(MessageHandler(Filters.text & ~Filters.command, echo))
def has_cyrillic(text):
return bool(re.search('[а-яА-Я]', text))
def add_group(update: Update, context: CallbackContext):
for member in update.message.new_chat_members:
if has_cyrillic(member.full_name):
--- SUPPOSEDLY, AUTO BAN WOULD BE HERE ---
update.message.reply_text("{member.full_name} has been banned")
else:
update.message.reply_text("{member.full_name} just joined the group")
add_group_handle = MessageHandler(Filters.status_update.new_chat_members, add_group)
# Start the Bot
updater.start_polling()
# Run the bot until you press Ctrl-C or the process receives SIGINT,
# SIGTERM or SIGABRT. This should be used most of the time, since
# start_polling() is non-blocking and will stop the bot gracefully.
updater.idle()
I'm using the newest version of Python and python-telegram-bot.
Thanks in advance!
I am using CherryPy to speak to an authentication server. The script runs fine if all the inputted information is fine. But if they make an mistake typing their ID the internal HTTP error screen fires ok, but the server keeps running and nothing else in the script will run until the CherryPy engine is closed so I have to manually kill the script. Is there some code I can put in the index along the lines of
if timer >10 and connections == 0:
close cherrypy (< I have a method for this already)
Im mostly a data mangler, so not used to web servers. Googling shows lost of hits for closing CherryPy when there are too many connections but not when there have been no connections for a specified (short) time. I realise the point of a web server is usually to hang around waiting for connections, so this may be an odd case. All the same, any help welcome.
Interesting use case, you can use the CherryPy plugins infrastrcuture to do something like that, take a look at this ActivityMonitor plugin implementation, it shutdowns the server if is not handling anything and haven't seen any request in a specified amount of time (in this case 10 seconds).
Maybe you have to adjust the logic on how to shut it down or do anything else in the _verify method.
If you want to read a bit more about the publish/subscribe architecture take a look at the CherryPy Docs.
import time
import threading
import cherrypy
from cherrypy.process.plugins import Monitor
class ActivityMonitor(Monitor):
def __init__(self, bus, wait_time, monitor_time=None):
"""
bus: cherrypy.engine
wait_time: Seconds since last request that we consider to be active.
monitor_time: Seconds that we'll wait before verifying the activity.
If is not defined, wait half the `wait_time`.
"""
if monitor_time is None:
# if monitor time is not defined, then verify half
# the wait time since the last request
monitor_time = wait_time / 2
super().__init__(
bus, self._verify, monitor_time, self.__class__.__name__
)
# use a lock to make sure the thread that triggers the before_request
# and after_request does not collide with the monitor method (_verify)
self._active_request_lock = threading.Lock()
self._active_requests = 0
self._wait_time = wait_time
self._last_request_ts = time.time()
def _verify(self):
# verify that we don't have any active requests and
# shutdown the server in case we haven't seen any activity
# since self._last_request_ts + self._wait_time
with self._active_request_lock:
if (not self._active_requests and
self._last_request_ts + self._wait_time < time.time()):
self.bus.exit() # shutdown the engine
def before_request(self):
with self._active_request_lock:
self._active_requests += 1
def after_request(self):
with self._active_request_lock:
self._active_requests -= 1
# update the last time a request was served
self._last_request_ts = time.time()
class Root:
#cherrypy.expose
def index(self):
return "Hello user: current time {:.0f}".format(time.time())
def main():
# here is how to use the plugin:
ActivityMonitor(cherrypy.engine, wait_time=10, monitor_time=5).subscribe()
cherrypy.quickstart(Root())
if __name__ == '__main__':
main()
I'm coding my first telegram bot, but now I have to serve multiple user at the same time.
This code it's just a little part, but it should help me to use multithread with selenium
class MessageCounter(telepot.helper.ChatHandler):
def __init__(self, *args, **kwargs):
super(MessageCounter, self).__init__(*args, **kwargs)
def on_chat_message(self, msg):
content_type, chat_type, chat_id = telepot.glance(msg)
chat_id = str(chat_id)
browser = browserSelenium.start_browser(chat_id)
userIsLogged = igLogin.checkAlreadyLoggedIn(browser, chat_id)
print(userIsLogged)
TOKEN = "***"
bot = telepot.DelegatorBot(TOKEN, [
pave_event_space()(
per_chat_id(), create_open, MessageCounter, timeout=10),
])
MessageLoop(bot).run_as_thread()
while 1:
time.sleep(10)
when the bot recive any message it starts a selenium session calling this function:
def start_browser(chat_id):
global browser
try:
browser.get('https://www.google.com')
#igLogin.checkAlreadyLoggedIn(browser)
#links = telegram.getLinks(24)
#instagramLikes(browser, links)
except Exception as e:
print("type error: " + str(e))
print('No such session! starting webDivers!')
sleep(3)
# CLIENT CONNECTION !!
chrome_options = Options()
chrome_options.add_argument('user-data-dir=/home/ale/botTelegram/users/'+ chat_id +'/cookies')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--lang=en')
print("Starting WebDrivers")
browser = webdriver.Chrome(options=chrome_options)
start_browser(chat_id)
return browser
and then this one check if the user is logged:
def checkAlreadyLoggedIn(browser, chat_id):
browser.get('https://www.instagram.com/instagram/')
try:
WebDriverWait(browser, 5).until(EC.element_to_be_clickable(
(By.XPATH, instagramClicks.buttonGoToProfile))).click()
print('User already Logged')
return True
except:
print('User not Logged')
userLogged = login(browser, chat_id)
return userLogged
and if the user is not logged it try to log the user in whit username and password
so, basically, when I write at the bot with one account everithing works fine, but if I write to the bot from two different account it opens two browser, but it controll just one.
What I mean it's that for example, one window remain over the google page, and then the other one recive two times the comand, so, even when it has to write the username, it writes the username two times
How can I interract with multiple sessions?
WebDriver is not thread-safe. Having said that, if you can serialise access to the underlying driver instance, you can share a reference in more than one thread. This is not advisable. But you can always instantiate one WebDriver instance for each thread.
Ideally the issue of thread-safety isn't in your code but in the actual browser bindings. They all assume there will only be one command at a time (e.g. like a real user). But on the other hand you can always instantiate one WebDriver instance for each thread which will launch multiple browsing tabs/windows. Till this point it seems your program is perfect.
Now, different threads can be run on same Webdriver, but then the results of the tests would not be what you expect. The reason behind is, when you use multi-threading to run different tests on different tabs/windows a little bit of thread safety coding is required or else the actions you will perform like click() or send_keys() will go to the opened tab/window that is currently having the focus regardless of the thread you expect to be running. Which essentially means all the test will run simultaneously on the same tab/window that has focus but not on the intended tab/window.
Reference
You can find a relevant detailed discussion in:
Chrome crashes after several hours while multiprocessing using Selenium through Python