Nim counting over multiple threads - nim-lang

I'm currently trying to count how many times I have requested a website. In Python I would just use a global variable, but I have no idea how to write this in Nim.
import httpclient

proc threadMain(a: int) {.thread.} =
  var client = newHttpClient()
  while true:
    try:
      var r = client.getContent("URL")
      echo "sent"
      #Count here
    except:
      echo "error"

var thread: array[0..10, Thread[int]]
for i in 0..10:
  thread[i].createThread(threadMain, i)
thread.joinThreads()

An almost identical example is explained in the "Nim in Action" book, page 174.
First of all, if you used a global in Python, you had to use a lock or risk a race condition. The thing is not different in Nim: first create a global, and guard it using a lock.
import locks
var counterLock: Lock
initLock(counterLock)
var counter {.guard: counterLock.} = 0
Now use a withLock where you need to update the counter:
withLock counterLock:
  counter.inc
The chapter of the book on parallelism and concurrency is very good. You should check it, as it also explains concurrency (your code is a case where concurrency is a better fit than threading) and how to use Channels to pass data between threads, for example.
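For comparison, here is roughly what the same lock-guarded counter looks like in Python, since the question mentions it. This is only a minimal sketch: the names counter, counter_lock, record_request and run_workers are made up for illustration, and the HTTP request itself is left out.

```python
import threading

counter = 0                      # shared global, like `counter` in the Nim snippet
counter_lock = threading.Lock()  # plays the role of `counterLock`


def record_request():
    """Increment the shared counter while holding the lock."""
    global counter
    with counter_lock:
        counter += 1


def run_workers(thread_count, requests_per_thread):
    """Start the workers, wait for all of them, and return the final count."""
    def worker():
        for _ in range(requests_per_thread):
            # A real worker would send the HTTP request here first.
            record_request()

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

The `with counter_lock:` block corresponds to `withLock counterLock:` in Nim: only one thread at a time can be inside it.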

Related

How to check if a similar scheduled job exists in python-rq?

Below is the function called to schedule a job on server start.
But somehow the scheduled job is getting called again and again, which causes too many calls to that function.
Is this happening because the function is called multiple times, or because of something else? Suggestions please.
def redis_schedule():
    with current_app.app_context():
        redis_url = current_app.config["REDIS_URL"]
        with Connection(redis.from_url(redis_url)):
            q = Queue("notification")
            from ..tasks.notification import send_notifs
            task = q.enqueue_in(timedelta(minutes=5), send_notifs)
Refer to https://python-rq.org/docs/job_registries/
I needed to read scheduled_job_registry and retrieve the job ids.
Currently the logic below works for me, as I only have a single scheduled job.
But with multiple jobs, I will need to loop over these job ids to find out whether the right job exists.
def redis_schedule():
    with current_app.app_context():
        redis_url = current_app.config["REDIS_URL"]
        with Connection(redis.from_url(redis_url)):
            q = Queue("notification")
            if len(q.scheduled_job_registry.get_job_ids()) == 0:
                from ..tasks.notification import send_notifs
                task = q.enqueue_in(timedelta(seconds=30), send_notifs)
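For the multiple-jobs case, the loop over job ids could look something like the sketch below. It assumes each fetched job exposes a func_name attribute holding the dotted path of the scheduled function (I believe rq's Job does, but check your version). The helper name has_scheduled_job, the stand-in job objects and the module paths are all hypothetical, and fetching the real Job objects for each id from q.scheduled_job_registry.get_job_ids() is left to the caller.

```python
from types import SimpleNamespace


def has_scheduled_job(jobs, func_name):
    """Return True if any job in `jobs` was created for `func_name`."""
    return any(job.func_name == func_name for job in jobs)


# Stand-in objects for illustration only. In the real application these
# would be the rq jobs looked up from the scheduled job registry.
example_jobs = [
    SimpleNamespace(id="a1", func_name="app.tasks.notification.send_notifs"),
    SimpleNamespace(id="b2", func_name="app.tasks.cleanup.purge"),
]
```

With this, redis_schedule() would only call enqueue_in when has_scheduled_job(...) returns False for the function it is about to schedule.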

locks needed for multithreaded python scraping?

I have a list of zipcodes that I want to pull business listings for using the yelp fusion api. Each zipcode will have to make at least one api call ( often much more) and so, I want to be able to keep track of my api usage as the daily limit is 25000. I have defined each zipcode as an instance of user defined Locale class. This locale class has a class variable Locale.pulls, which acts as a global counter for the number of pulls.
I want to multithread this using the multiprocessing module but I am not sure if I need to use locks and, if so, how I would do that. The concern is race conditions, as I need to be sure each thread sees the current number of pulls, stored in the Locale.pulls class variable in the pseudo code below.
import multiprocessing.dummy as mt

class Locale():
    pulls = 0
    MAX_PULLS = 20000

    def __init__(self, x, y):
        # initialize the instance with arguments needed to complete the API call
        pass

    def pull(self):
        if Locale.pulls > Locale.MAX_PULLS:
            return None
        else:
            # make the request, store the returned data and increment the counter
            self.data = self.call_yelp()
            Locale.pulls += 1

def main():
    # zipcodes below is a list of arguments needed to initialize each zipcode as a Locale class object
    pool = mt.Pool(len(zipcodes) // 100)  # let each thread work on 100 zipcodes
    data = pool.map(Locale, zipcodes)
A simple solution would be to check that len(zipcodes) < MAX_PULLS before running the map().
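If a zipcode can need more than one call, that up-front check is not enough, and the check-then-increment in pull() has to happen as one step under a lock. A minimal sketch of that idea follows. PullBudget, try_acquire and pull_all are names invented here, and fetch is a stand-in for the real Yelp call.

```python
import threading
from multiprocessing.dummy import Pool  # thread-based pool, as in the question


class PullBudget:
    """Counts API pulls and refuses new ones once the limit is reached."""

    def __init__(self, max_pulls):
        self.max_pulls = max_pulls
        self.pulls = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Atomically check the limit and reserve one pull if available."""
        with self._lock:
            if self.pulls >= self.max_pulls:
                return False
            self.pulls += 1
            return True


def pull_all(zipcodes, budget, fetch, workers=4):
    """Map `fetch` over zipcodes, returning None where the budget ran out."""
    def pull(zipcode):
        if not budget.try_acquire():
            return None
        return fetch(zipcode)

    with Pool(workers) as pool:
        return pool.map(pull, zipcodes)
```

Because the comparison and the increment sit inside the same `with self._lock:` block, two threads can never both see the last free slot.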

Python 3.5 asyncio execute coroutine on event loop from synchronous code in different thread

I am hoping someone can help me here.
I have an object that has the ability to have attributes that return coroutine objects. This works beautifully, however I have a situation where I need to get the results of the coroutine object from synchronous code in a separate thread, while the event loop is currently running. The code I came up with is:
def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
    """
    Get an attribute synchronously and safely.

    Note:
        This does nothing special if an attribute is synchronous. It only
        really has a use for asynchronous attributes. It processes
        asynchronous attributes synchronously, blocking everything until
        the attribute is processed. This helps when running SQL code that
        cannot run asynchronously in coroutines.

    Args:
        key (str): The Config object's attribute name, as a string.
        default (Any): The value to use if the Config object does not have
            the given attribute. Defaults to None.

    Returns:
        Any: The value of the Config object's attribute, or the default
            value if the Config object does not have the given attribute.
    """
    ret = self.get(key, default)
    if asyncio.iscoroutine(ret):
        if loop.is_running():
            loop2 = asyncio.new_event_loop()
            try:
                ret = loop2.run_until_complete(ret)
            finally:
                loop2.close()
        else:
            ret = loop.run_until_complete(ret)
    return ret
What I am looking for is a safe way to synchronously get the results of a coroutine object in a multithreaded environment. self.get() can return a coroutine object, for attributes I have set to provide them. The issues I have found are: If the event loop is running or not. After searching for a few hours on stack overflow and a few other sites, my (broken) solution is above. If the loop is running, I make a new event loop and run my coroutine in the new event loop. This works, except that the code hangs forever on the ret = loop2.run_until_complete(ret) line.
Right now, I have the following scenarios with results:
1. results of self.get() is not a coroutine
   Returns results. [Good]
2. results of self.get() is a coroutine & event loop is not running (basically in same thread as the event loop)
   Returns results. [Good]
3. results of self.get() is a coroutine & event loop is running (basically in a different thread than the event loop)
   Hangs forever waiting for results. [Bad]
Does anyone know how I can go about fixing the bad result so I can get the value I need? Thanks.
I hope I made some sense here.
I do have a good, and valid reason to be using threads; specifically I am using SQLAlchemy which is not async and I punt the SQLAlchemy code to a ThreadPoolExecutor to handle it safely. However, I need to be able to query these asynchronous attributes from within these threads for the SQLAlchemy code to get certain configuration values safely. And no, I won't switch away from SQLAlchemy to another system just in order to accomplish what I need, so please do not offer alternatives to it. The project is too far along to switch something so fundamental to it.
I tried using asyncio.run_coroutine_threadsafe() and loop.call_soon_threadsafe() and both failed. So far, this has gotten the farthest on making it work, I feel like I am just missing something obvious.
When I get a chance, I will write some code that provides an example of the problem.
Ok, I implemented an example case, and it worked the way I would expect. So it is likely my problem is elsewhere in the code. Leaving this open and will change the question to fit my real problem if I need.
Does anyone have any possible ideas as to why a concurrent.futures.Future from asyncio.run_coroutine_threadsafe() would hang forever rather than return a result?
My example code that does not duplicate my error, unfortunately, is below:
import asyncio
import typing

loop = asyncio.get_event_loop()

class ConfigSimpleAttr:
    __slots__ = ('value', '_is_async')

    def __init__(
        self,
        value: typing.Any,
        is_async: bool=False
    ):
        self.value = value
        self._is_async = is_async

    async def _get_async(self):
        return self.value

    def __get__(self, inst, cls):
        if self._is_async and loop.is_running():
            return self._get_async()
        else:
            return self.value

class BaseConfig:
    __slots__ = ()
    attr1 = ConfigSimpleAttr(10, True)
    attr2 = ConfigSimpleAttr(20, True)

    def get(self, key: str, default: typing.Any=None) -> typing.Any:
        return getattr(self, key, default)

    def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
        ret = self.get(key, default)
        if asyncio.iscoroutine(ret):
            if loop.is_running():
                fut = asyncio.run_coroutine_threadsafe(ret, loop)
                print(fut, fut.running())
                ret = fut.result()
            else:
                ret = loop.run_until_complete(ret)
        return ret

config = BaseConfig()

def example_func():
    return config.get_sync('attr1')

async def main():
    a1 = await loop.run_in_executor(None, example_func)
    a2 = await config.attr2
    val = a1 + a2
    print('{a1} + {a2} = {val}'.format(a1=a1, a2=a2, val=val))
    return val

loop.run_until_complete(main())
This is the stripped down version of exactly what my code is doing, and the example works, even if my actual application doesn't. I am stuck as far as where to look for answers. Suggestions are welcome as to where to try to track down my "stuck forever" problem, even if my code above doesn't actually duplicate the problem.
It is very unlikely that you need to run several event loops at the same time, so this part looks quite wrong:
if loop.is_running():
    loop2 = asyncio.new_event_loop()
    try:
        ret = loop2.run_until_complete(ret)
    finally:
        loop2.close()
else:
    ret = loop.run_until_complete(ret)
Even testing whether the loop is running or not doesn't seem to be the right approach. It's probably better to pass the (only) running loop explicitly to get_sync and schedule the coroutine using run_coroutine_threadsafe:
def get_sync(self, key, loop, default=None):
    ret = self.get(key, default)
    if not asyncio.iscoroutine(ret):
        return ret
    future = asyncio.run_coroutine_threadsafe(ret, loop)
    return future.result()
EDIT: Hanging problems can be related to tasks being scheduled in the wrong loop (e.g. forgetting about the optional loop argument when calling a coroutine). This kind of problem should be easier to debug with the PR 303 (now merged): a RuntimeError is raised instead when the loop and the future don't match. So you might want to run your tests with the latest version of asyncio.
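To make the pattern concrete, here is a small self-contained sketch of a worker thread asking the one running loop to execute a coroutine. It assumes Python 3.7+ (asyncio.run and asyncio.get_running_loop, which did not exist in 3.5). fetch_setting, blocking_worker and the settings values are invented for illustration and stand in for the asynchronous config attributes.

```python
import asyncio


async def fetch_setting(name):
    """Stand-in for an asynchronous config attribute."""
    await asyncio.sleep(0)
    return {"attr1": 10, "attr2": 20}[name]


def blocking_worker(loop, name):
    """Runs in an executor thread and waits for the loop to produce a value."""
    future = asyncio.run_coroutine_threadsafe(fetch_setting(name), loop)
    # The timeout turns a silent hang into a visible error.
    return future.result(timeout=5)


async def main():
    loop = asyncio.get_running_loop()
    # The loop stays free while we await the executor, so it can run the
    # coroutine that the worker thread schedules on it.
    a1 = await loop.run_in_executor(None, blocking_worker, loop, "attr1")
    a2 = await fetch_setting("attr2")
    return a1 + a2
```

Note that future.result() must only be called from a thread other than the loop's own thread. Calling it from code running on the loop blocks the loop, and the scheduled coroutine can then never run.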
Ok, I got my code working, by taking a different approach to it. The problem was tied with using something that had file IO, which I was converting into a coroutine using loop.run_in_executor() on the file IO components. Then, I was trying to use this in a sync function being called from another thread, processed using another loop.run_in_executor() on that function. This is a very important routine in my code (called probably a million times or more during the execution of my short-running code), and I made a decision that my logic was just getting too complicated. So... I uncomplicated it. Now, if I want to use the file IO components asynchronously, I explicitly use my "get_async()" method, otherwise, I use my attribute through normal attribute access.
By removing the complexity of my logic, it made the code cleaner, easier to understand, and even more importantly, it actually works. While I am not 100% certain that I know the root cause of the issue (I believe it has something to do with a thread processing an attribute, which then in turn starts another thread that tries to read the attribute before it is processed, which caused something like a race condition and halting my code, but I could never duplicate the error outside of my application unfortunately to completely prove it out), I was able to get past it and continue with my development efforts.

What can be slowing down my program when i use multithreading?

I'm writing a program that downloads data from a website (eve-central.com). It returns xml when I send a GET request with some parameters. The problem is that I need to make about 7080 of such requests because i can't specify the typeid parameter more than once.
def get_data_eve_central(typeids, system, hours, minq=1, thread_count=1):
    import xmltodict, urllib3
    pool = urllib3.HTTPConnectionPool('api.eve-central.com')
    for typeid in typeids:
        r = pool.request('GET', '/api/quicklook', fields={'typeid': typeid, 'usesystem': system, 'sethours': hours, 'setminQ': minq})
        answer = xmltodict.parse(r.data)
It was really slow when I just connected to the website and made all the requests, so I decided to make it use multiple threads at a time (I read that if the process involves a lot of waiting (I/O, HTTP requests), it can be sped up a lot with multithreading). I rewrote it using multiple threads, but it somehow isn't any faster (a bit slower in fact). Here's the code rewritten using multithreading:
def get_data_eve_central(all_typeids, system, hours, minq=1, thread_count=1):
    import threading, time
    import xmltodict, urllib3
    if thread_count > len(all_typeids): raise NameError('TooManyThreads')

    def requester(typeids):
        pool = urllib3.HTTPConnectionPool('api.eve-central.com')
        for typeid in typeids:
            r = pool.request('GET', '/api/quicklook', fields={'typeid': typeid, 'usesystem': system, 'sethours': hours, 'setminQ': minq})
            answer = xmltodict.parse(r.data)['evec_api']['quicklook']
            answers.append(answer)

    def chunkify(items, quantity):
        chunk_len = len(items) // quantity
        rest_count = len(items) % quantity
        chunks = []
        for i in range(quantity):
            chunk = items[:chunk_len]
            items = items[chunk_len:]
            if rest_count and items:
                chunk.append(items.pop(0))
                rest_count -= 1
            chunks.append(chunk)
        return chunks

    t = time.clock()
    threads = []
    answers = []
    for typeids in chunkify(all_typeids, thread_count):
        threads.append(threading.Thread(target=requester, args=[typeids]))
        threads[-1].start()
        threads[-1].join()
    print(time.clock()-t)
    return answers
What I do is I divide all typeids into as many chunks as the quantity of threads i want to use and create a thread for each chunk to process it. The question is: what can slow it down? (I apologise for my bad english)
Python has a Global Interpreter Lock (GIL), and it may be your problem: because of it, only one thread runs Python code at a time, so Python threads cannot execute in a genuinely parallel way. You may think about switching to another language, or staying with Python but using process-based parallelism to solve your task. Here is a nice presentation: Inside the Python GIL
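One more thing that may be worth checking, independently of the GIL: if join() is called inside the same loop right after start(), each thread has to finish before the next one is even created, so the requests still run one after another. A minimal sketch of starting all threads first and joining them afterwards is below. fetch_all is a made-up name and fetch is a stand-in for the real HTTP request.

```python
import threading


def fetch_all(chunks, fetch):
    """Process every chunk in its own thread and collect all answers."""
    results = []
    results_lock = threading.Lock()

    def requester(chunk):
        for item in chunk:
            answer = fetch(item)
            with results_lock:
                results.append(answer)

    threads = [threading.Thread(target=requester, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()  # start every thread first
    for t in threads:
        t.join()   # only then wait for all of them
    return results
```

The answers arrive in whatever order the threads finish, so sort them or key them by typeid if the order matters.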

how to use wxpython threading to prevent blocking main loop

I'm working on a school project to develop a customized media player in Python. The problem is that when I use time.sleep(duration), it blocks the main loop of my GUI, preventing it from updating. I've consulted my supervisor and was told to use multi-threading, but I have no idea how to use threading. Would anyone advise me on how to implement threading in my scenario below?
Code:
def load_playlist(self, event):
    # raw strings, so that "\t" in the paths is not read as a tab
    playlist = [r"D:\Videos\test1.mp4", r"D:\Videos\test2.avi"]
    for path in playlist:
        #calculate each media file duration
        ffmpeg_command = ['C:\\MPlayer-rtm-svn-31170\\ffmpeg.exe', '-i' , path]
        pipe = subprocess.Popen(ffmpeg_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        results = pipe.communicate()
        #Regular expression to get the duration
        length_regexp = r'Duration: (\d{2}):(\d{2}):(\d{2})\.\d+,'
        re_length = re.compile(length_regexp)
        # find the matches using the regexp that to compare with the buffer/string
        matches = re_length.search(str(results))
        #print matches
        hour = matches.group(1)
        minute = matches.group(2)
        second = matches.group(3)
        #Converting to second
        hour_to_second = int(hour) * 60 * 60
        minute_to_second = int(minute) * 60
        second_to_second = int(second)
        num_second = hour_to_second + minute_to_second + second_to_second
        print(num_second)
        #Play the media file
        trackPath = '"%s"' % path.replace("\\", "/")
        self.mplayer.Loadfile(trackPath)
        #Sleep for the duration of second(s) for the video before jumping to another video
        time.sleep(num_second) #THIS IS THE PROBLEM#
You'll probably want to take a look at the wxPython wiki which has several examples of using threads, Queues and other fun things:
http://wiki.wxpython.org/LongRunningTasks
http://wiki.wxpython.org/Non-Blocking%20Gui
I also wrote a tutorial on the subject here: http://www.blog.pythonlibrary.org/2010/05/22/wxpython-and-threads/
The main thing to keep in mind is that when you use threads, you cannot call your wx methods directly (i.e. myWidget.SetValue, etc). Instead, you need to use one of the wxPython threadsafe methods: wx.CallAfter, wx.CallLater or wx.PostEvent
You would start a new thread like any other multithreading example:
from threading import Thread
# in caller code, start a new thread
Thread(target=load_playlist).start()
However, you have to make sure that calls into wx go through wx's inter-thread communication. You cannot just call wx code from this new thread; it will segfault. So, use wx.CallAfter:
# in load_playlist, you have to synchronize your wx calls
wx.CallAfter(self.mplayer.Loadfile, trackPath)
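Putting the two pieces together, the body of the worker thread could be shaped roughly like this. It is only a sketch: play_playlist and its parameters are invented names, the durations are assumed to be computed beforehand (for example with the ffmpeg code from the question), and call_after is passed in so the function can be tried without wx. In the real application you would pass wx.CallAfter.

```python
import time


def play_playlist(playlist, durations, load_file, call_after, sleep=time.sleep):
    """Worker-thread body: schedule each track on the GUI thread, then wait."""
    for path in playlist:
        # The GUI call is handed over to the main loop, not executed here.
        call_after(load_file, path)
        # Blocking in this thread does not freeze the GUI.
        sleep(durations[path])


# Hypothetical usage inside the wx application:
#   Thread(target=play_playlist,
#          args=(playlist, durations, self.mplayer.Loadfile, wx.CallAfter)).start()
```

The time.sleep() that used to block the main loop now only blocks the worker thread, while every wx call still runs on the GUI thread.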
