Tornado thread blocking task - multithreading

Can someone help me solve this problem? Here is my code:
import threading
import time

from tornado.ioloop import IOLoop
from tornado.web import Application, RequestHandler, asynchronous

class Handler(RequestHandler):
    @asynchronous
    def get(self):
        res = 'result '
        _t = threading.Thread(target=self._thread, args=(res,))
        print _t, time.time()
        _t.start()

    def _thread(self, response):
        time.sleep(5)
        IOLoop.instance().add_callback(callback=lambda: self.print_response(response))

    def print_response(self, _response):
        self.write(_response)
        self.finish()

application = Application([
    (r'/', Handler),
])

if __name__ == '__main__':
    application.listen(8889)
    IOLoop.instance().start()
In a browser, I open localhost:8889 in one tab and localhost:8889 in another: "result" is not printed in the second tab until the first one has finished, about 5 seconds later.
I thought I had created two threads working in parallel, each using add_callback to hand its result back to the main loop when finished, so the second tab should show its result shortly after the first one. Why doesn't it?
If I copy the Handler class to a Handler1 class and add the route (r'/1', Handler1), then try again with localhost:8889 and localhost:8889/1, it works fine.
Can anyone explain this problem and how to solve it?
Thank you!

It's not Tornado, it's the browser. Browsers avoid making multiple simultaneous requests for the same URL, so the second request isn't sent until the first one has finished. If you use two different browsers (or two different URLs), you'll see that it works.
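One way to confirm this from outside a browser is to fire two requests from separate threads; a minimal sketch, assuming the server above is running on port 8889 and that the third-party requests library is installed:
import threading
import time

import requests  # pip install requests

def fetch(tag):
    start = time.time()
    r = requests.get('http://localhost:8889/')
    # Both threads should report ~5s, not 5s and then 10s.
    print('%s got %r after %.1fs' % (tag, r.text, time.time() - start))

for tag in ('first', 'second'):
    threading.Thread(target=fetch, args=(tag,)).start()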

Related

Proper way to start a Trio server that manages multiple TCP Connections

I recently finished a project using a mix of Django and Twisted and realized it's overkill for what I need, which is basically just a way for my servers to communicate via TCP sockets. I turned to Trio, and so far I like what I see, as it's much more direct (for what I need). That said, I just wanted to be sure I'm doing this the right way.
I followed the tutorial, which taught the basics, but I need a server that can handle multiple clients at once. To this end, I came up with the following code:
import trio
from itertools import count

PORT = 12345
BUFSIZE = 16384
CONNECTION_COUNTER = count()

class ServerProtocol:
    def __init__(self, server_stream):
        self.ident = next(CONNECTION_COUNTER)
        self.stream = server_stream

    async def listen(self):
        while True:
            data = await self.stream.receive_some(BUFSIZE)
            if data:
                print('{} Received\t {}'.format(self.ident, data))
                # Process data here

class Server:
    def __init__(self):
        self.protocols = []

    async def receive_connection(self, server_stream):
        sp: ServerProtocol = ServerProtocol(server_stream)
        self.protocols.append(sp)
        await sp.listen()

async def main():
    await trio.serve_tcp(Server().receive_connection, PORT)

trio.run(main)
My issue here seems to be that each ServerProtocol runs its listen loop on every cycle instead of waiting for data to become available.
I get the feeling I'm using Trio wrong. In which case, are there Trio best practices that I'm missing?
Your overall structure looks fine to me. The issue that jumps out at me is:
while True:
    data = await self.stream.receive_some(BUFSIZE)
    if data:
        print('{} Received\t {}'.format(self.ident, data))
        # Process data here
The guarantee that receive_some makes is: if the other side has closed the connection already, then it immediately returns an empty byte-string. Otherwise, it waits until there is some data to return, and then returns it as a non-empty byte-string.
So your code should work fine... until the other end closes the connection. Then it starts busy-looping: it keeps checking for data, gets an empty byte-string back (data = b""), so the if data: ... block doesn't run, and it immediately loops around to do it again.
One way to fix this would be (the last three lines are new):
while True:
    data = await self.stream.receive_some(BUFSIZE)
    if data:
        print('{} Received\t {}'.format(self.ident, data))
        # Process data here
    else:
        # Other side has gone away
        break
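As an aside, newer Trio releases (an assumption about the version in use) make receive streams async-iterable, which folds the EOF check into the loop itself:
async def listen(self):
    # Each iteration yields a non-empty chunk (of a size Trio chooses,
    # rather than BUFSIZE), and the loop ends automatically at EOF.
    async for data in self.stream:
        print('{} Received\t {}'.format(self.ident, data))
        # Process data here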

Why is this queue.join call blocking indefinitely?

I'm playing about with a personal project in Python 3.6 and I've run into the following issue, which results in the my_queue.join() call blocking indefinitely. Note this isn't my actual code but a minimal example demonstrating the issue.
import threading
import queue

def foo(stop_event, my_queue):
    while not stop_event.is_set():
        try:
            item = my_queue.get(timeout=0.1)
            print(item)  # Actual logic goes here
        except queue.Empty:
            pass
    print('DONE')

stop_event = threading.Event()
my_queue = queue.Queue()

thread = threading.Thread(target=foo, args=(stop_event, my_queue))
thread.start()

my_queue.put(1)
my_queue.put(2)
my_queue.put(3)
print('ALL PUT')

my_queue.join()
print('ALL PROCESSED')

stop_event.set()
print('ALL COMPLETE')
I get the following output (it's actually been consistent, but I understand that the output order may differ due to threading):
ALL PUT
1
2
3
No matter how long I wait, I never see ALL PROCESSED printed to the console. So why is my_queue.join() blocking indefinitely when all the items have been processed?
From the docs:
The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls task_done() to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, join() unblocks.
You're never calling my_queue.task_done() inside your foo function. The foo function should be something like the worker example from the docs:
def worker():
    while True:
        item = q.get()
        if item is None:
            break
        do_work(item)
        q.task_done()
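Applied to the code in the question, the smallest change is to call task_done() once each item's processing is finished (a minimal sketch; only the foo body changes):
def foo(stop_event, my_queue):
    while not stop_event.is_set():
        try:
            item = my_queue.get(timeout=0.1)
            print(item)  # Actual logic goes here
            my_queue.task_done()  # Lets my_queue.join() see this item as finished
        except queue.Empty:
            pass
    print('DONE')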

Transfer of data between Python files

I need some help, please, to explain why the following does not work.
Environment: Python 3.4, Gtk 3.0, limited experience of Python.
File selectcontact.py contains code to select one of a number of records and pass its key back to its parent process for use in one of at least three other actions.
Code snippet from the parent class:
….
self.cindex = 0
….
def editcontact_clicked(self, menuitem):
    import selectcontact
    selectcontact.SelectContactGUI(self)
    print('From Manage ', self.cindex)
    if self.cindex > 0:
        import editcontact
        editcontact.EditContactGUI(self.db, self.cindex)
….
Code snippet from selectcontact:
class SelectContactGUI:
    def __init__(self, parent_class):
        self.builder = Gtk.Builder()
        self.builder.add_from_file(UI_FILE)
        self.builder.connect_signals(self)
        self.parent_class = parent_class
        self.db = parent_class.db
        self.cursor = self.db.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor)
        self.contact_store = self.builder.get_object('contact_store')
        self.window = self.builder.get_object('window1')
        self.window.show_all()

    def select_contact_path(self, path):
        self.builder.get_object('treeview_selection1').select_path(path)

    def contact_treerow_changed(self, treeview):
        selection = self.builder.get_object('treeview_selection1')
        model, path = selection.get_selected()
        if path is not None:
            self.parent_class.cindex = model[path][0]
            print('From select ', self.parent_class.cindex)
            self.window.destroy()
….
window1 is declared as “modal”, so I was expecting the call to selectcontact to act as a subroutine, with editcontact not being called until control passed back to the parent. The parent_class part works, because contact_store is correctly populated. However, the transfer back to the parent appears not to work, and the two print statements occur in the wrong order:
From Manage 0
From select 2
Comments gratefully received.
Graeme
"Modal" refers to windows only. That is, a modal window prevents accessing the parent window.
It has little to do with what code is running. I am not familiar with this particular windowing framework, but any I have worked with has had a separate thread for GUI and at least one for processing, to keep the GUI responsive, and message loops running in all active windows, not just the one currently with the focus. The modal dialog has no control over what code in other threads are executed when.
You should be able to break into the debugger and see what threads are running and what is running in each thread at any given time.
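In other words, constructing SelectContactGUI returns immediately, so any follow-up work has to be driven by a callback rather than by straight-line code. A minimal sketch of that approach, based on the snippets above (the connect call and the _on_select_closed name are illustrative, not from the original code):
def editcontact_clicked(self, menuitem):
    import selectcontact
    gui = selectcontact.SelectContactGUI(self)
    # Run the follow-up only once the selection window is destroyed,
    # instead of immediately after the constructor returns.
    gui.window.connect('destroy', self._on_select_closed)

def _on_select_closed(self, window):
    print('From Manage ', self.cindex)
    if self.cindex > 0:
        import editcontact
        editcontact.EditContactGUI(self.db, self.cindex)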

Scrapy / Python - executing several yields

In my parse method, I'd like to call 3 methods from a SpiderClass that I inherit from.
At first, I'd like to parse the XPaths, then clean the data, then assign the data to an item instance and hand it over to the pipeline.
I'll try it with just a little code and ask about the principles: cleanData and assignProductValues are never called. Why?
def parse(self, response):
    for href in response.xpath("//a[@class='product--title']/@href"):
        url = href.extract()
        yield scrapy.Request(url, callback=super(MyclassSpider, self).scrapeProduct)
        yield scrapy.Request(url, callback=super(MyclassSpider, self).cleanData)
        yield scrapy.Request(url, callback=super(MyclassSpider, self).assignProductValues)
I understand that I create a generator when using yield, but I don't understand why the second and third yields are not executed after the first one, or how I can get them to run.
--
Then I tried another way: I don't want to make three requests to the website, just one, and then work with the data.
def parse(self, response):
    for href in response.xpath("//a[@class='product--title']/@href"):
        url = href.extract()
        item = MyItem()
        response = scrapy.Request(url, meta={'item': item}, callback=super(MyclassSpider, self).scrapeProduct)
        super(MyclassSpider, self).cleanData(response)
        super(MyclassSpider, self).assignProductValues(response)
        yield response
What happens here is that scrapeProduct gets called, and that might take a while (I've got a 5-second delay).
But then cleanData and assignProductValues are called right away, about 30 times (as many times as the for loop iterates).
How can I execute the three methods one by one with only one request to the website?
I guess that after you yield the first request, the other two are getting filtered out by the dupefilter. Check your log. If you don't want them to be filtered, pass dont_filter=True to the Request objects.
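If you want one request per URL instead, a common pattern is to download the page once and chain the three steps inside that single callback. A sketch assuming the method names from the question (handle_product and the exact signatures are illustrative, since the originals aren't shown):
def parse(self, response):
    for href in response.xpath("//a[@class='product--title']/@href"):
        url = href.extract()
        # One request per URL; all three steps run in its callback.
        yield scrapy.Request(url, callback=self.handle_product)

def handle_product(self, response):
    # Inherited methods can be called directly on self.
    item = MyItem()
    self.scrapeProduct(response, item)
    self.cleanData(item)
    self.assignProductValues(item)
    yield item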

Python Threading Issue, Is this Right?

I am attempting to make a few thousand DNS queries. I have written my script to use python-adns and have attempted to add threading and queues to ensure the script runs optimally and efficiently.
However, I can only achieve mediocre results. The responses are choppy/intermittent; they start and stop, and often pause for 10 to 20 seconds.
tlock = threading.Lock()  # printing to screen

def async_dns(i):
    s = adns.init()
    for i in names:
        tlock.acquire()
        q.put(s.synchronous(i, adns.rr.NS)[0])
        response = q.get()
        q.task_done()
        if response == 0:
            dot_net.append("Y")
            print(i + ", is Y")
        elif response == 300:
            dot_net.append("N")
            print(i + ", is N")
        tlock.release()

q = queue.Queue()
threads = []

for i in range(100):
    t = threading.Thread(target=async_dns, args=(i,))
    threads.append(t)
    t.start()

print(threads)
I have spent countless hours on this and would appreciate some input from experienced Pythonistas. Is this a networking issue? Can this bottleneck/these intermittent responses be solved by switching servers?
Thanks.
Without answers to the questions I asked in the comments above, I'm not sure how well I can diagnose the issue you're seeing, but here are some thoughts:
It looks like each thread is processing all names instead of just a portion of them.
Your Queue seems to be doing nothing at all.
Your lock seems to guarantee that you actually only do one query at a time (defeating the purpose of having multiple threads).
Rather than trying to fix up this code, might I suggest using multiprocessing.pool.ThreadPool instead? Below is a full working example. (You could use adns instead of socket if you want; I just couldn't easily get it installed, so I stuck with the built-in socket.)
In my testing, I also sometimes see pauses; my assumption is that I'm getting throttled somewhere.
import itertools
from multiprocessing.pool import ThreadPool
import socket
import string

def is_available(host):
    print('Testing {}'.format(host))
    try:
        socket.gethostbyname(host)
        return False
    except socket.gaierror:
        return True

# Test the first 1000 three-letter .com hosts
hosts = [''.join(tla) + '.com'
         for tla in itertools.permutations(string.ascii_lowercase, 3)][:1000]

with ThreadPool(100) as p:
    results = p.map(is_available, hosts)

for host, available in zip(hosts, results):
    print('{} is {}'.format(host, 'available' if available else 'not available'))
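If you do want adns rather than the stdlib resolver, only the worker function needs to change. A sketch based solely on the calls visible in the question (I haven't verified the adns API beyond that, so treat the status-code semantics as assumptions):
import adns

def is_available(host):
    # One resolver per call keeps the sketch simple; the question's
    # code creates one per thread instead.
    resolver = adns.init()
    # synchronous() returns a tuple whose first element is a status code;
    # the question treats 0 as "resolves" and 300 as "no such domain".
    status = resolver.synchronous(host, adns.rr.NS)[0]
    return status != 0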
