Queue and threads reading from a file: customizing the number of worker threads - python-3.x

I am planning to write a Python script that reads URLs from a file and checks the status code of these URLs using requests. To speed up the process, my intention is to use multiple threads at the same time.
import threading
import queue

q = queue.Queue()

def CheckUrl():
    while True:
        project = q.get()
        # Do the URL checking here
        q.task_done()

threading.Thread(target=CheckUrl, daemon=True).start()

file = open("TextFile.txt", "r")
while True:
    next_line = file.readline()
    q.put(next_line)
    if not next_line:
        break
file.close()

print('project requests sent\n', end='')
q.join()
print('projects completed')
My problem: right now the code reads the whole file at once and, if I understand correctly, makes as many threads as there are lines in the text file. I would like to do something like read 20 lines at the same time, check the status codes of those 20 URLs, and move on to the next lines as soon as one or more checks are done.
is there something like
threading.Thread(target=CheckUrl, daemon=True, THREADSATSAMETIME=20).start()

It seems I have to stick with this one:
def threads_run():
    for i in range(20):  # create 20 threads
        threading.Thread(target=CheckUrl, daemon=True).start()
threads_run()
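That last snippet is indeed the standard fixed-size worker-pool pattern. A minimal runnable sketch of it (the requests-based check and the None sentinel shutdown are my additions, not from the original post):

import queue
import threading
import requests

q = queue.Queue()

def check_url():
    while True:
        url = q.get()
        if url is None:          # sentinel: no more work, let the thread exit
            q.task_done()
            break
        try:
            status = requests.get(url, timeout=10).status_code
            print(url, status)
        except requests.RequestException as exc:
            print(url, "failed:", exc)
        q.task_done()

threads = [threading.Thread(target=check_url, daemon=True) for _ in range(20)]
for t in threads:
    t.start()

with open("TextFile.txt") as f:
    for line in f:
        line = line.strip()
        if line:                 # skip blank lines
            q.put(line)

for _ in range(20):
    q.put(None)                  # one sentinel per worker
q.join()
print('projects completed')

Only 20 URLs are ever in flight at once; each worker picks up the next line from the queue as soon as it finishes its current one.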

Related

Using contextlib.redirect_stdout in an async function redirects output of other tasks

I want to redirect the output of a few lines in my code that I don't have control over, since their output is not relevant. I've been able to use contextlib.redirect_stdout(io.StringIO()) in a synchronous function to successfully redirect the lines I want, but I can't do it in an async function.
This is what I have so far
import asyncio
import contextlib
import sys

async def long_function(val: int, semaphore: asyncio.Semaphore, file_out):
    # Only let two tasks start at a time
    await semaphore.acquire()
    print(f"{val}: Starting")
    # Redirect stdout of ONLY the lines within this context manager
    with contextlib.redirect_stdout(file_out):
        await asyncio.sleep(3)  # long-running task that prints output I can't control, but is not useful to me
        print(f"{val}: Finished redirect")
    # stdout is restored automatically when the with-block exits
    print(f"{val}: Done")
    semaphore.release()

async def main():
    # I want to limit the number of concurrent tasks to 2
    semaphore: asyncio.Semaphore = asyncio.Semaphore(2)
    # Create a list of tasks to perform
    file_out = open("file.txt", "w")
    tasks = []
    for i in range(0, 9):
        tasks.append(long_function(i, semaphore, file_out))
    # Gather/run the tasks
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(main())
When running this, however, the output of other tasks is also placed into "file.txt". I only want the "Finished redirect" lines to go into the file.
I see the following in the Python docs
Note that the global side effect on sys.stdout means that this context manager is not suitable for use in library code and most threaded applications. It also has no effect on the output of subprocesses. However, it is still a useful approach for many utility scripts.
Is there any other way to go about this, or do I just have to live with the output as-is?
Thanks for any help!
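One workaround (my own sketch, not from the thread): redirect_stdout flips the global sys.stdout, and awaiting inside the with-block lets other tasks print while it is flipped. Because asyncio runs each task in its own contextvars context, you can instead install a stdout proxy once and give each task its own target:

import asyncio
import contextvars
import sys

_real_stdout = sys.stdout
# Per-task stdout target; defaults to the real stdout.
_stdout_target = contextvars.ContextVar("stdout_target", default=_real_stdout)

class TaskLocalStdout:
    # Delegates writes to whatever target the current task has set.
    def write(self, data):
        return _stdout_target.get().write(data)
    def flush(self):
        _stdout_target.get().flush()

sys.stdout = TaskLocalStdout()

async def long_function(val, semaphore, file_out):
    async with semaphore:
        print(f"{val}: Starting")               # real stdout
        token = _stdout_target.set(file_out)    # redirect only this task
        try:
            await asyncio.sleep(3)              # the noisy call would print here
            print(f"{val}: Finished redirect")  # goes to file_out
        finally:
            _stdout_target.reset(token)
        print(f"{val}: Done")                   # real stdout again

async def main():
    semaphore = asyncio.Semaphore(2)
    with open("file.txt", "w") as file_out:
        await asyncio.gather(*(long_function(i, semaphore, file_out) for i in range(9)))

if __name__ == "__main__":
    asyncio.run(main())

Each task created by asyncio.gather gets a copy of the current context (PEP 567), so setting the ContextVar inside one task does not affect the others.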

using multiprocessing in windows with multithreading python

I have a script that works like this: a list of elements and a function:
def fct(elm):
    ...  # do work
After that I start the threads (3 of them); at the end of every thread I print the name of the element, like this:
jobs = Queue()

def do_stuff(q):
    while not q.empty():
        value = q.get()
        fct(value)
        q.task_done()

for i in lines:
    jobs.put(i)
for i in range(3):
    worker = threading.Thread(target=do_stuff, args=(jobs,))
    worker.start()
jobs.join()
What I want to do: whenever a thread is done (a file is saved), start another process that reads the file and applies another function, fct2.
Note: I'm using Windows.
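A minimal sketch of one way to do this (the fct/fct2 bodies are stand-ins, since the original post doesn't show them): worker threads pull elements from the queue and, as soon as a file is saved, hand its path to a process pool. The __main__ guard is mandatory on Windows because child processes re-import the module:

import multiprocessing
import queue
import threading

def fct(elm):
    # Stand-in for the real thread work: save a file, return its path.
    path = "out_%s.txt" % elm
    with open(path, "w") as f:
        f.write(str(elm))
    return path

def fct2(path):
    # Stand-in for the process work: read the saved file and post-process it.
    with open(path) as f:
        return f.read().upper()

def do_stuff(jobs, pool, pending):
    while True:
        try:
            value = jobs.get_nowait()
        except queue.Empty:
            return
        saved = fct(value)
        # Hand the finished file to the pool the moment it is saved.
        pending.append(pool.apply_async(fct2, (saved,)))

if __name__ == '__main__':  # required on Windows: children re-import this module
    jobs = queue.Queue()
    for line in ["a", "b", "c", "d"]:
        jobs.put(line)
    pending = []
    with multiprocessing.Pool(processes=3) as pool:
        workers = [threading.Thread(target=do_stuff, args=(jobs, pool, pending))
                   for _ in range(3)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print([r.get() for r in pending])  # wait for every fct2 to finish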

How to return data from a separate process (like queue) without closing it so you can send it more data later

The reason I am trying to do this is that Windows is terribly slow at opening and closing processes all the time, which is terribly inefficient and defeats the purpose of multiprocessing. What I want is to start, say, 10 processes that each perform a specific operation on some data and can return the data (with a queue or something) without closing, so you can send them more data to operate on and return. Basically a hive of processes holding functions that are always open, ready to process and return data, without dying on the return, so you don't have to keep opening new ones. I guess I could just run Linux (Debian CrunchBang), but I want to make it Windows-efficient as well.
import multiprocessing

def operator_dies(data, q):
    for i in range(len(data)):
        if data % i == 0:
            return q.put(0)  # Function and process die here
    return q.put(1)  # And here

def operator_lives(data, q):
    for i in range(len(data)):
        if data % i == 0:
            q.put(0)  # I want it to stay open, but send data back with q.put()
    q.put(1)  # Same

def initialize(data):
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=operator_dies, args=(data, q))
    p.daemon = True
    p.start()
    # Multiprocessing.Process will open more processes.
    # Is there another multiprocessing function that can send new data as
    # arguments in the function of an already open process ID and reset its loop?
    # I cannot simply do: q.put().processid from here to refresh
    # a variable for arguments in an open process, can I?

if __name__ == '__main__':
    initialize(data)
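What the question describes is essentially a process pool: pay the startup cost once, then keep feeding the same processes work. multiprocessing.Pool implements exactly this, but a minimal hand-rolled sketch of the pattern (the doubling computation is a stand-in) looks like this:

import multiprocessing

def worker(in_q, out_q):
    # Lives until it receives the None sentinel; no process churn.
    for data in iter(in_q.get, None):
        out_q.put(data * 2)  # stand-in computation

if __name__ == '__main__':
    in_q = multiprocessing.Queue()
    out_q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(in_q, out_q), daemon=True)
             for _ in range(10)]
    for p in procs:
        p.start()            # pay the Windows process-startup cost once
    for item in range(5):
        in_q.put(item)       # keep sending work to the same processes
    print(sorted(out_q.get() for _ in range(5)))
    for p in procs:
        in_q.put(None)       # one sentinel per worker shuts it down
    for p in procs:
        p.join()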

Memory efficient massive http requests

I need to make an unlimited number of HTTP requests to a web API, one after another, efficiently and quite fast. (I need it for a utility, so it should keep working no matter how long I use it; it should also be usable on a web server, with people using it at the same time.)
Right now I'm using threading with a queue, but after a while I'm getting errors like:
'can't start a new thread'
'MemoryError'
Or it may work for a bit, but pretty slowly.
this is a part of my code:
concurrent = 25
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=receiveJson)
    t.daemon = True
    t.start()
for url in get_urls():
    q.put(url.strip())
q.join()
*get_urls() is a simple function that returns a list of URLs (unknown length).
This is my receiveJson (the thread target):
def receiveJson():
    while True:
        url = q.get()
        res = requests.get(url).json()
        q.task_done()
The problem comes from your threads never ending: notice that there is no exit condition in your receiveJson function. The simplest way to signal that it should end is usually to enqueue None:
def receiveJson():
    while True:
        url = q.get()
        if url is None:  # Exit condition allows thread to complete
            q.task_done()
            break
        res = requests.get(url).json()
        q.task_done()
and then you can change the other code as follows:
concurrent = 25
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=receiveJson)
    t.daemon = True
    t.start()
for url in get_urls():
    q.put(url.strip())
for i in range(concurrent):
    q.put(None)  # Add a None for each thread to be able to get and complete
q.join()
There are other ways of doing this, but this is how to do it with the least amount of change to your code. If this happens often, it might be worth looking into the concurrent.futures.ThreadPoolExecutor class to avoid the cost of opening threads repeatedly.
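For reference, a minimal sketch of that ThreadPoolExecutor approach (the fetch helper and the placeholder get_urls/URL are mine, not from the answer):

import concurrent.futures
import requests

def get_urls():
    # Stand-in for the question's get_urls()
    return ["https://httpbin.org/get"] * 3

def fetch(url):
    # Runs in one of the pool's reusable worker threads.
    return requests.get(url, timeout=10).json()

with concurrent.futures.ThreadPoolExecutor(max_workers=25) as pool:
    for result in pool.map(fetch, (u.strip() for u in get_urls())):
        print(result["url"])  # consume each JSON result as it arrives

One caveat: Executor.map submits every URL to the pool up front, so for a truly unbounded stream of URLs you would still want to feed it in batches.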

Keep GIF animation running while doing calculations

I am trying to improve the user experience by showing a load mask above the active QMainWindow/QDialog when performing tasks that take some time. I have managed to get it working as I want, except for a moving GIF while performing the task. If I leave the load mask on after the task is complete, the GIF starts moving as it should.
My class for the load mask:
from PyQt4 import QtGui, QtCore
from dlgLoading_view import Ui_dlgLoading

class dlgLoading(QtGui.QDialog, Ui_dlgLoading):
    def __init__(self, parent):
        QtGui.QDialog.__init__(self, parent)
        self.setupUi(self)
        self.setWindowFlags(QtCore.Qt.WindowFlags(QtCore.Qt.FramelessWindowHint))
        self.setGeometry(0, 0, parent.frameGeometry().width(), parent.frameGeometry().height())
        self.setStyleSheet("background-color: rgba(255, 255, 255, 100);")
        movie = QtGui.QMovie("loader.gif")
        self.lblLoader.setMovie(movie)
        movie.start()

    def showEvent(self, event):
        QtGui.qApp.processEvents()
        super(dlgLoading, self).showEvent(event)

    def setMessage(self, message):
        self.lblMessage.setText(message)
The Ui_dlgLoading contains two labels and some vertical spacers: lblLoader (will contain the gif) and lblMessage (will contain a message if needed)
I create the load mask with this code:
loadmask = dlgLoading(self)
loadmask.setMessage('Reading data... Please wait')
loadmask.show()
I figured I needed some multithreading/multiprocessing, but I can't for the life of me figure out how to do it. I read somewhere that you can't tamper with the GUI's thread, so I would need to move the heavy task to another thread instead, but I'm still blank.
As a simple example, let's say I am trying to load a huge file into memory:
file = open(dataFilename, 'r')
self.dataRaw = file.read()
file.close()
Around that I would create and close my load mask dialog. How do I start the file read without halting the GIF animation?
The GUI is for running some heavy external exe files, so it should work with that too.
I ended up doing this:
class runthread(threading.Thread):
    def __init__(self, commandline, cwd):
        self.stdout = None
        self.stderr = None
        self.commandline = commandline
        self.cwd = cwd
        self.finished = False
        threading.Thread.__init__(self)

    def run(self):
        subprocess.call(self.commandline, cwd=self.cwd)
        self.finished = True

class command:
    def __init__(self):
        ...

    def run(self):
        ...
        thread = runthread("\"%s\" \"%s\"" % (os.path.join(self.__caller.exefolder, "%s.exe" % self.__cmdtype), self.__name), self.__caller.exeWorkdir)
        thread.start()
        count = 0
        sleeptime = 0.5
        maxcount = 60.0 / sleeptime
        while True:
            time.sleep(sleeptime)
            QtWidgets.qApp.processEvents()
            count += 1
            if thread.finished:
                break
            if count >= maxcount:
                results = QtWidgets.QMessageBox.question(self.__caller, "Continue?", "The process is taking longer than expected. Do you want to continue?", QtWidgets.QMessageBox.Yes | QtWidgets.QMessageBox.No)
                if results == QtWidgets.QMessageBox.Yes:
                    count = 0
                else:
                    QtWidgets.QMessageBox.warning(self.__caller, "Process stopped", "The process was stopped")
                    return False
It doesn't directly answer my question, but it worked for me, so I'm posting the answer in case others want to do something similar.
I call a process (in this case Python's subprocess.call) through a thread and track when the process has actually finished. A continuous loop periodically checks whether the process is done and updates the GUI (processEvents is what triggers the GIF to update). To avoid an infinite loop, I offer the user an option to exit after some time.
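For the original file-reading case, the more conventional fix is to move the blocking work into a QThread and let a signal deliver the result back to the GUI thread, so the event loop (and the QMovie) never stops. A minimal sketch, using QtWidgets-era naming as in the answer above (FileReader and on_data_ready are hypothetical names):

from PyQt5 import QtCore

class FileReader(QtCore.QThread):
    finished_reading = QtCore.pyqtSignal(str)  # delivered on the GUI thread

    def __init__(self, filename, parent=None):
        super(FileReader, self).__init__(parent)
        self.filename = filename

    def run(self):
        # Runs in the worker thread; the GUI event loop keeps spinning,
        # so the QMovie keeps animating.
        with open(self.filename, 'r') as f:
            data = f.read()
        self.finished_reading.emit(data)

# In the widget that shows the load mask:
#     self.loadmask = dlgLoading(self)
#     self.loadmask.setMessage('Reading data... Please wait')
#     self.loadmask.show()
#     self.reader = FileReader(dataFilename)
#     self.reader.finished_reading.connect(self.on_data_ready)  # hypothetical slot
#     self.reader.start()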
