Why does calling datetime hang a thread?

I am attempting to use concurrent.futures.ThreadPoolExecutor for the first time. One of my threads (level_monitor) consistently hangs on a call to datetime.now().strftime(), and also on another hardware-specific function. For now I am assuming it is the same fundamental problem in both cases.
I've created a minimal reproducible example.
from concurrent.futures import ThreadPoolExecutor
import socket
from time import sleep

status = 'TRY AGAIN\n'

def get_level():
    print('starting get_level()')
    while True:
        sleep(2)
        now = datetime.now().strftime('%d-%b-%Y %H:%M:%S')
        print('get_level woken...')

# report status when requested
def serve_level():
    print('starting serve_level()')
    si = socket.socket()
    port = 12345
    si.bind(('127.0.0.1', port))
    si.listen()
    print('socket is listening')
    while True:
        ci, addr = si.accept()
        print('accepted client connection from ', addr)
        with ci:
            req = ci.recv(1024)
            print(req)
            str = status.encode('utf-8')
            ci.send(str)
        ci.close()

if __name__ == '__main__':
    nthreads = 5
    with ThreadPoolExecutor(nthreads) as executor:
        level_monitor = executor.submit(get_level)
        server = executor.submit(serve_level)
When I run it I see the serve_level thread works fine. I can talk to that thread using telnet. I can see the level_monitor thread starts too, but then it hangs before print('get_level woken...'). If I comment out the call to datetime then the thread behaves as expected.
I am sure that when I find out why, I will have learned a lot.
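A note on what may be happening here: the example never imports datetime, and a function submitted to a ThreadPoolExecutor does not print a traceback when it raises; the exception is captured and stored on the Future, so the worker simply looks hung. A minimal sketch of how to surface it, reusing the __main__ block above:

if __name__ == '__main__':
    nthreads = 5
    with ThreadPoolExecutor(nthreads) as executor:
        level_monitor = executor.submit(get_level)
        server = executor.submit(serve_level)
        # exception() blocks until the worker finishes, then returns
        # whatever it raised (or None); here it would report the
        # NameError, since datetime is never imported above
        print(level_monitor.exception())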

Related

Can I stop waiting for threads to finish if one of them produced results?

I'm making GET requests to a few hundred different API endpoints on different servers. One of these endpoints has some information that I want to fetch and return.
After any of these requests returns something to me, I want to terminate the other threads and exit. Some requests are almost instant, some can take up to 20 seconds to finish.
If I happen to find the info in 2 seconds, I don't want to wait 20 seconds before I can resume work.
Currently I'm doing things like this:
threads = list()
for s in silos:  # here I create all the requests
    t = Thread(target=process_request, args=(my, args, here))
    t.name = "{} - {}".format(some, name)
    threads.append(t)
Then I do:
print("Threads: {}".format(len(threads))) # 100 - 250 of them
[ t.start() for t in threads ]
[ t.join() for t in threads ]
process_request() simply makes the get request and stores the result inside a dict if the status_code == 200.
I'm using the requests and threading modules.
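For reference, a minimal sketch of what such a process_request might look like (the url parameter and the shared results dict are assumptions for illustration, not the actual code):

import requests

results = {}  # shared dict the worker threads write into (assumed)

def process_request(url):
    # make the GET request and record the body only on success
    resp = requests.get(url, timeout=20)
    if resp.status_code == 200:
        results[url] = resp.text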
If you use a multiprocessing pool, you can terminate the pool as soon as the first response arrives:
import multiprocessing as mp
import time

pool = None

def make_get_request(inputs):
    print('Making get request with inputs ' + str(inputs))
    time.sleep(2)
    return 'dummy response for inputs ' + str(inputs)

def log_response(response):
    print("Got response = " + response)
    pool.terminate()

def main():
    global pool
    pool = mp.Pool()
    for i in range(10):
        pool.apply_async(make_get_request, args=(i,), callback=log_response)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()
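An alternative sketch using concurrent.futures, which stays closer to the thread-based code in the question. One caveat: unlike pool.terminate(), Python threads cannot be killed, so Future.cancel() only stops tasks that have not started yet, and any in-flight requests simply finish in the background:

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def first_response(fetch, urls):
    # fetch is the worker function; urls stand in for the silos above
    executor = ThreadPoolExecutor(max_workers=50)
    futures = [executor.submit(fetch, url) for url in urls]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for future in pending:
        future.cancel()  # only cancels tasks that have not started running
    executor.shutdown(wait=False)  # do not block on in-flight requests
    return next(iter(done)).result()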

socket and multiprocessing blocking

I am trying to write a program that uses sockets to talk to other machines in the cluster. The problem I am having is that I cannot seem to implement a server without blocking on accept(), even though I have tried to use threading and multiprocessing.
server.py
import socket
import traceback
import sys, base64
from threading import Thread
import time

def server(host="", port=65098):
    soc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    soc.bind((host, port))
    soc.listen(5)
    clients = []
    time.sleep(1)
    while 1:
        try:
            clientsocket, clientaddr = soc.accept()
            client = Thread(target=handler, args=(clientsocket, clientaddr))
            client.start()  # hand the client off to its own thread
            clients.append(client)
        except KeyboardInterrupt:
            soc.close()
            for cl in clients:
                if not cl.is_alive():
                    cl.join()

def handler(clientsocket, clientaddr):
    while True:
        data = clientsocket.recv(1024).decode('utf8')
        if data.startswith('---'):
            if data[:10] == '---test---':
                ping(clientsocket, data[10:])
                break
    clientsocket.close()

def ping(clientsocket, data):
    name = base64.b64decode(data)
    clientsocket.send(base64.b64encode(b'hi' + name))
Then I move on to test.py, where I try to exercise the non-blocking server with a test.
test.py
import multiprocessing
from lib import server
import pdb

thread1 = multiprocessing.Process(target=server.server())
pdb.set_trace()
thread1.daemon = True
thread1.start()
# pdb.set_trace()

import socket
soc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
pdb.set_trace()
soc.connect(('localhost', 65098))
message = b'This is our message. It is very long but will only be transmitted in chunks of 16 at a time'
soc.sendall(message)
When I run test.py, the program blocks. When I Ctrl+C it, it seems the program is not advancing past line 5 of test.py and is blocking even though I have yet to call thread1.start; as you can see, I have a pdb trace there that is never reached. I didn't think it should matter with threading, but server.py in the stack seems to be on line 18, where it calls soc.accept().
This is strange behavior to me. Any ideas?
For some reason, being inside a module does not allow threading to run it without the second thread blocking the first. I was able to get around this by creating a function in test.py called tcpserv; all it does is call server.server(), instead of calling it directly as the multiprocessing target. test.py now looks like this and drops into the debugger as I expect.
import multiprocessing
from lib import server
import pdb

def tcpserv():
    server.server()

thread1 = multiprocessing.Process(target=tcpserv)
thread1.start()
pdb.set_trace()
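For what it's worth, the wrapper likely works for a simpler reason than the module boundary: target=server.server() calls server() immediately in the parent process, blocking in accept() before Process.start() is ever reached, and would pass its return value as the target. Passing the function object itself should behave the same as the tcpserv wrapper:

thread1 = multiprocessing.Process(target=server.server)  # no parentheses: pass the function, don't call it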

Handling a lot of concurrent connections in Python 3 asyncio

I am trying to improve the performance of my application. It is a Python 3.6 asyncio.Protocol based TCP server (SSL wrapped) handling a lot of requests.
It works fine and the performance is acceptable when only one connection is active, but as soon as another connection is opened, the client side of the application slows down. This is really noticeable once there are 10-15 client connections.
Is there a way to properly handle requests in parallel or should I resort to running multiple server instances?
Edit: added code
main.py
if __name__ == '__main__':
    import package.server
    server = package.server.TCPServer()
    server.join()
package.server
import multiprocessing, asyncio, uvloop

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
from package.connection import Connection

class TCPServer(multiprocessing.Process):
    name = 'tcpserver'

    def __init__(self, discord_queue=None):
        multiprocessing.Process.__init__(self)
        self.daemon = True
        # some setup in here
        self.start()

    def run(self):
        loop = uvloop.new_event_loop()
        self.loop = loop
        # db setup, etc
        server = loop.create_server(Connection, HOST, PORT, ssl=SSL_CONTEXT)
        loop.run_until_complete(server)
        loop.run_forever()
package.connection
import asyncio, hashlib, os
from time import sleep, time as timestamp

class Connection(asyncio.Protocol):
    connections = {}

    def setup(self, peer):
        self.peer = peer
        self.ip, self.port = self.peer[0], self.peer[1]
        self.buffer = []

    @property
    def connection_id(self):
        if not hasattr(self, '_connection_id'):
            self._connection_id = hashlib.md5('{}{}{}'.format(self.ip, self.port, timestamp()).encode('utf-8')).hexdigest()
        return self._connection_id

    def connection_lost(self, exception):
        del Connection.connections[self.connection_id]

    def connection_made(self, transport):
        self.transport = transport
        self.setup(transport.get_extra_info('peername'))
        Connection.connections[self.connection_id] = self

    def data_received(self, data):
        # processing; average server-side execution time is around 30 ms
        sleep(0.030)
        self.transport.write(os.urandom(64))
The application runs on Debian 9.9 and is started via systemd
To "benchmark" I use this script:
import os, socket
from multiprocessing import Pool
from time import time as timestamp

def foobar(i):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('127.0.0.1', 60000))
    while True:
        ms = timestamp() * 1000
        s.send(os.urandom(128))
        s.recv(1024 * 2)
        print(i, timestamp() * 1000 - ms)

if __name__ == '__main__':
    instances = 4
    with Pool(instances) as p:
        print(p.map(foobar, range(0, instances)))
To answer my own question: I went with a solution that spawns multiple instances listening on base_port + x, with an nginx TCP load balancer in front of them.
The individual TCPServer instances are still spawned as their own processes; they communicate among themselves via a separate UDP connection and with the main process via a multiprocessing.Queue.
While this does not "fix" the problem, it provides a somewhat scalable solution for my very specific problem.
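For reference, the usual single-process alternative is to keep the event loop responsive by pushing the ~30 ms of blocking work onto a thread pool with run_in_executor. A minimal sketch against the Connection class above (process_blocking is a hypothetical stand-in for the real processing):

import asyncio, os
from time import sleep

class Connection(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        loop = asyncio.get_event_loop()
        # run the blocking work on the default ThreadPoolExecutor so the
        # event loop can keep serving other connections in the meantime
        future = loop.run_in_executor(None, self.process_blocking, data)
        future.add_done_callback(lambda f: self.transport.write(f.result()))

    def process_blocking(self, data):
        sleep(0.030)  # stand-in for the real ~30 ms of processing
        return os.urandom(64)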

aiohttp slowness with threading

I copied the code from How to run an aiohttp server in a thread?. It runs fine, so I added a one-second sleep. When I launch 10 requests at the same time, the average response time is 9 seconds. Why is that? Shouldn't all requests come back in a little over 1 second?
import asyncio
import threading
from aiohttp import web
import time

loop = asyncio.get_event_loop()

def say_hello(request):
    time.sleep(1)
    return web.Response(text='Hello, world')

app = web.Application(debug=True)
app.add_routes([web.get('/', say_hello)])
handler = app.make_handler()
server = loop.create_server(handler, host='127.0.0.1', port=8080)

def aiohttp_server():
    loop.run_until_complete(server)
    loop.run_forever()

t = threading.Thread(target=aiohttp_server)
t.start()
Use asyncio.sleep instead. Your setup runs coroutines that sleep a hard 1 second without yielding to the event loop, so if you launch a bunch of them, each one's second is served serially.
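For the toy example, that means making the handler a coroutine; a minimal sketch:

import asyncio
from aiohttp import web

async def say_hello(request):
    await asyncio.sleep(1)  # suspends this request and lets the loop serve others
    return web.Response(text='Hello, world')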
You are starting the server in a second thread, but all of the requests are served from the same thread. The call to time.sleep blocks this thread and does not yield to the event loop so that the requests are effectively processed serially.
If you genuinely want to use sleep for a delay in the response you could use asyncio.sleep instead, which yields to the event loop.
However, I expect you are using it as a placeholder for another blocking function. In that case you need to run it in a thread other than the main server's. The example below shows how to do this using run_in_executor and asyncio.wait.
import asyncio
from aiohttp import web
from concurrent.futures import ThreadPoolExecutor
import time

def blocking_func(seconds: int) -> int:
    time.sleep(seconds)
    return seconds

async def view_page(request: web.Request):
    seconds = int(request.query.get("seconds", 5))
    executor = request.app["executor"]
    loop = asyncio.get_event_loop()
    task = loop.run_in_executor(executor, blocking_func, seconds)
    completed, pending = await asyncio.wait([task])
    result = task.result()
    return web.Response(text=f"Waited {result} second(s).")

def create_app():
    app = web.Application()
    app.add_routes([web.get("/", view_page)])
    executor = ThreadPoolExecutor(max_workers=3)
    app["executor"] = executor
    return app

if __name__ == "__main__":
    app = create_app()
    web.run_app(app)
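To sanity-check the behaviour, a small client like the sketch below (hypothetical; it assumes the server is running on web.run_app's default port 8080) should show requests overlapping. With max_workers=3 on the server's executor, ten concurrent one-second requests should finish in about ceil(10 / 3) = 4 seconds rather than 10:

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen
import time

def fetch(i):
    start = time.time()
    urlopen('http://127.0.0.1:8080/?seconds=1').read()
    return round(time.time() - start, 2)

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=10) as ex:
        print(list(ex.map(fetch, range(10))))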

Python Tweepy streaming with multitasking

In Python 2.7, I am successfully using the following code to listen to a direct message stream on an account:
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy import API
from tweepy.streaming import StreamListener

# These values are appropriately filled in the code
consumer_key = '######'
consumer_secret = '######'
access_token = '######'
access_token_secret = '######'

class StdOutListener(StreamListener):
    def __init__(self):
        self.tweetCount = 0

    def on_connect(self):
        print("Connection established!!")

    def on_disconnect(self, notice):
        print("Connection lost!! : ", notice)

    def on_data(self, status):
        print("Entered on_data()")
        print(status, flush=True)
        return True

    # I can add code here to execute when a message is received,
    # such as slicing the message and activating something else
    def on_direct_message(self, status):
        print("Entered on_direct_message()")
        try:
            print(status, flush=True)
            return True
        except BaseException as e:
            print("Failed on_direct_message()", str(e))

    def on_error(self, status):
        print(status)

def main():
    try:
        auth = OAuthHandler(consumer_key, consumer_secret)
        auth.secure = True
        auth.set_access_token(access_token, access_token_secret)
        api = API(auth)
        # If the authentication was successful, you should
        # see the name of the account print out
        print(api.me().name)
        stream = Stream(auth, StdOutListener())
        stream.userstream()
    except BaseException as e:
        print("Error in main()", e)

if __name__ == '__main__':
    main()
This is great, and I can also execute code when I receive a message, but the jobs I'm adding to a work queue need to be able to stop after a certain amount of time. I'm using the popular pattern of recording start = time.time() and subtracting it from the current time to determine the elapsed time, but this streaming code does not loop to check the time. It just waits for a new message, so the clock is never checked, so to speak.
My question is this: How can I get streaming to occur and still track time elapsed? Do I need to use multithreading as described in this article? http://www.tutorialspoint.com/python/python_multithreading.htm
I am new to Python and having fun playing around with hardware attached to a Raspberry Pi. I have learned so much from Stackoverflow, thank you all :)
I'm not sure exactly how you want to decide when to stop, but you can pass a timeout argument to the stream to give up after a certain delay.
stream = Stream(auth, StdOutListener(), timeout=30)
That will call your listener's on_timeout() method. If it returns True, streaming continues; otherwise, it stops.
Between the stream's timeout argument and your listener's on_timeout(), you should be able to decide when to stop streaming.
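Putting the two together, a minimal sketch of a listener that gives up after the timeout (this assumes the old tweepy 3.x StreamListener API used in the question):

class TimeoutListener(StreamListener):
    def on_timeout(self):
        print("Stream timed out, stopping")
        return False  # anything but True stops the stream, as described above

stream = Stream(auth, TimeoutListener(), timeout=30)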
I found I was able to get the multithreading code working the way I wanted. Unlike the Tutorialspoint tutorial, which gives an example of launching multiple instances of the same code with varying timing parameters, I was able to get two different blocks of code to run, each in its own thread:
One block of code constantly adds 10 to a global variable (var).
Another block checks when 5 seconds have elapsed, then prints var's value.
This demonstrates two different tasks executing and sharing data using Python multithreading.
See the code below.
import threading
import time

exitFlag = 0
var = 10

class myThread1(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter

    def run(self):
        # var counting block begins here
        print "addemup starting"
        global var
        while (var < 100000):
            if var > 90000:
                var = 0
            var = var + 10

class myThread2(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter

    def run(self):
        # time checking block begins here and prints var every 5 secs
        print "checkem starting"
        global var
        start = time.time()
        elapsed = time.time() - start
        while (elapsed < 10):
            elapsed = time.time() - start
            if elapsed > 5:
                print "var = ", var
                start = time.time()
                elapsed = time.time() - start

# Create new threads
thread1 = myThread1(1, "Thread-1", 1)
thread2 = myThread2(2, "Thread-2", 2)

# Start new threads
thread1.start()
thread2.start()

print "Exiting Main Thread"
My next task will be breaking my Twitter streaming out into its own thread and passing received direct messages as variables to a task-queueing program, while hopefully the first thread continues to listen for more direct messages.
