Create locust tasks dynamically? - python-3.x

Basically I need to spawn 30 users and give them 50 different tasks, and I need the tasks to run in parallel. So I am trying to generate the 50 tasks like this:
class UserSimulation(HttpUser):
    host = os.environ['BASE_URL']
    # time in seconds the simulated user will wait before proceeding to the next task
    wait_time = between(1, 2)

    for item_id in range(1, 51):
        @task(1)
        def view_items_with_different_item_ids(self, item_id=item_id):
            self.client.get(
                url=f"/my-url/item24_00{item_id}",
                verify=False,
                auth=(os.environ['USERNAME'], os.environ['PASSWORD'])
            )
For obvious reasons this approach doesn't let me create 50 tasks dynamically, since only the last one gets saved. Any ideas for a workaround?

To do it the way you're trying, try creating the functions programmatically. I don't know how to apply the decorator to functions created that way, but if nothing else you can add them to the Locust task list as you create them: create a task list tasks = [] and then tasks.append(view_items_with_different_item_ids_1) as each function is created.
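For example, here is a rough sketch of that idea, assuming a Locust version where HttpUser.tasks accepts a plain list of callables (each callable receives the user instance), so the decorator isn't needed at all:

import os
from locust import HttpUser, between

def make_view_task(item_id):
    # factory: builds one task function bound to a specific item_id
    def view_item(user):
        user.client.get(
            url=f"/my-url/item24_00{item_id}",
            verify=False,
            auth=(os.environ['USERNAME'], os.environ['PASSWORD']),
        )
    # give each generated function a distinct name (useful for debugging)
    view_item.__name__ = f"view_items_with_different_item_ids_{item_id}"
    return view_item

class UserSimulation(HttpUser):
    host = os.environ['BASE_URL']
    wait_time = between(1, 2)
    # register all 50 generated functions as the task list
    tasks = [make_view_task(item_id) for item_id in range(1, 51)]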
However, I'm not positive this is necessary, based on your description of what you want. If you only need 30 users to make that call 50 times, you only need one task and you can loop through the calls inside it.
class UserSimulation(HttpUser):
    host = os.environ['BASE_URL']
    # time in seconds the simulated user will wait before proceeding to the next task
    wait_time = between(1, 2)

    @task(1)
    def view_items_with_different_item_ids(self):
        for item_id in range(1, 51):
            self.client.get(
                url=f"/my-url/item24_00{item_id}",
                verify=False,
                auth=(os.environ['USERNAME'], os.environ['PASSWORD'])
            )
If you need a random number instead of a sequential one but need to make sure each one is called once:
import random

item_ids = list(range(1, 51))
random.shuffle(item_ids)

@task(1)
def view_items_with_different_item_ids(self):
    for item_id in item_ids:
        self.client.get(
            url=f"/my-url/item24_00{item_id}",
            verify=False,
            auth=(os.environ['USERNAME'], os.environ['PASSWORD'])
        )
If you just want to pull a random number all the time and don't care about repeats:
import random

item_ids = list(range(1, 51))

@task(1)
def view_items_with_different_item_ids(self):
    random_id = random.choice(item_ids)
    self.client.get(
        url=f"/my-url/item24_00{random_id}",
        verify=False,
        auth=(os.environ['USERNAME'], os.environ['PASSWORD'])
    )

Related

Best way to keep creating threads on variable list argument

I have an event that I am listening to every minute that returns a list; it could be empty, have 1 element, or more. For each element in that list, I'd like to run a function that monitors an event on that element every minute for 10 minutes.
For that I wrote this script:
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
import Client

client = Client()

def handle_event(event):
    for i in range(10):
        client.get_info(event)
        sleep(60)

async def main():
    while True:
        entires = client.get_new_entry()
        if len(entires) > 0:
            with ThreadPoolExecutor(max_workers=len(entires)) as executor:
                executor.map(handle_event, entires)
        await asyncio.sleep(60)

if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    loop.run_until_complete(main())
However, instead of continuing to monitor for new entries, it blocks while the previous entries are still being monitored.
Any idea how I could do that, please?
First let me explain why your program doesn't work the way you want it to: It's because you use the ThreadPoolExecutor as a context manager, which will not close until all the threads started by the call to map are finished. So main() waits there, and the next iteration of the loop can't happen until all the work is finished.
There are ways around this. Since you are using asyncio already, one approach is to move the creation of the Executor to a separate task. Each iteration of the main loop starts one copy of this task, which runs as long as it takes to finish. It's an async def function, so many copies of this task can run concurrently.
I changed a few things in your code. Instead of Client I just used some simple print statements. I pass a list of integers, of random length, to handle_event. I increment a counter each time through the while True: loop, and add 10 times the counter to every integer in the list. This makes it easy to see how old calls continue for a time, mixing with new calls. I also shortened your time delays. All of these changes were for convenience and are not important.
The important change is to move ThreadPoolExecutor creation into a task. To make it cooperate with other tasks, it must contain an await expression, and for that reason I use executor.submit rather than executor.map. submit returns a concurrent.futures.Future, which provides a convenient way to await the completion of all the calls. executor.map, on the other hand, returns an iterator; I couldn't think of any good way to convert it to an awaitable object.
To convert a concurrent.futures.Future to an asyncio.Future, an awaitable, there is a function asyncio.wrap_future. When all the futures are complete, I exit from the ThreadPoolExecutor context manager. That will be very fast since all of the Executor's work is finished, so it does not block other tasks.
import random
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio

def handle_event(event):
    for i in range(10):
        print("Still here", event)
        sleep(2)

async def process_entires(counter, entires):
    print("Counter", counter, "Entires", entires)
    x = [counter * 10 + a for a in entires]
    with ThreadPoolExecutor(max_workers=len(entires)) as executor:
        futs = []
        for z in x:
            futs.append(executor.submit(handle_event, z))
        await asyncio.gather(*(asyncio.wrap_future(f) for f in futs))

async def main():
    counter = 0
    while True:
        entires = [0, 1, 2, 3, 4][:random.randrange(5)]
        if len(entires) > 0:
            counter += 1
            asyncio.create_task(process_entires(counter, entires))
        await asyncio.sleep(3)

if __name__ == "__main__":
    asyncio.run(main())

Where to set the locks in apply pandas with multithreading?

I am trying to asynchronously read and write from a pandas df with an apply function. For this purpose I am using the multiprocessing.dummy package. Since I am reading and writing simultaneously (multithreaded) on my df, I am using multiprocessing.Lock() so that no more than one thread can edit the df at a given time. However, I am a bit confused about where I should be adding lock.acquire() and lock.release() with an apply function in pandas. I have tried doing it as below, but it seems that this makes the entire process synchronous, which defeats the whole purpose of multithreading.
self._lock.acquire()
to_df[col_name] = to_df.apply(lambda row: getattr(Object(row['col_1'],
                                                          row['col_2'],
                                                          row['col_3']),
                                                  someattribute), axis=1)
self._lock.release()
Note: In my case I have to use getattr. someattribute is simply a @property of Object. Object takes 3 arguments, which come from columns col_1, col_2 and col_3 of my df.
There are 2 possible solutions: 1) locks, 2) queues. The code below is just a skeleton; it may contain typos/errors and cannot be used as is.
First. Locks where they are actually needed:
lock = threading.Lock()  # one lock shared by all worker threads

def method_to_process_url(df):
    lock.acquire()
    url = df.loc[some_idx, some_col]
    lock.release()

    info = process_url(url)

    lock.acquire()
    # add info to df
    lock.release()
Second. Queues instead of locks:
def method_to_process_url(url_queue, info_queue):
    while True:
        url = url_queue.get()
        info = process_url(url)
        info_queue.put(info)

url_queue = queue.Queue()
# add all urls to process to the url_queue
info_queue = queue.Queue()

# working_thread_1
threading.Thread(
    target=method_to_process_url,
    kwargs={'url_queue': url_queue, 'info_queue': info_queue},
    daemon=True).start()
# more working threads

counter = 0
while counter < amount_of_urls:
    info = info_queue.get()
    # add info to df
    counter += 1
In the second case you could even start a separate thread for every url without a url_queue (reasonable if the number of urls is on the order of thousands or less). The counter is a simple way to stop the program once all urls are processed.
I would use the second approach if you ask me. It is more flexible in my opinion.
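For illustration, here is a more complete sketch of the queue approach adapted to a DataFrame. The process_url() below is just a stand-in for whatever slow per-row work you do (your getattr(Object(...), someattribute) call); only the main thread writes to the df, so no lock is needed:

import queue
import threading
import pandas as pd

def process_url(url):
    # placeholder for the slow per-row work (network call, Object(...) lookup, ...)
    return len(url)

def worker(url_queue, info_queue):
    while True:
        idx, url = url_queue.get()
        info_queue.put((idx, process_url(url)))
        url_queue.task_done()

df = pd.DataFrame({'url': ['http://a', 'http://b', 'http://c']})

url_queue = queue.Queue()
info_queue = queue.Queue()
for idx, url in df['url'].items():
    url_queue.put((idx, url))

for _ in range(4):  # number of worker threads
    threading.Thread(target=worker, args=(url_queue, info_queue), daemon=True).start()

# only the main thread touches the DataFrame, so no locking is required
for _ in range(len(df)):
    idx, info = info_queue.get()
    df.loc[idx, 'info'] = info

print(df)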

Python 2.7: Is it possible to make a timer without threading.Timer?

So, basically I want to make a timer, but for efficiency reasons I don't want to use threading.Timer.
Python threads are not very efficient, and it seems better not to use them. I read up on this and confirmed that using threads is slow; e.g. when a single piece of work was divided into N parts and run in threads, it was slower.
However, right now I am using a thread for this:
class Works(object):
    def __init__(self):
        self.symbol_dict = config.ws_api.get("ASSET_ABBR_LIST")
        self.dict = {}
        self.ohlcv1m = []

    def on_open(self, ws):
        ws.send(json.dumps(config.ws_api.get("SUBSCRIPTION_DICT")))
Every time I get a message from the websocket server, I store it in self.dict:
    def on_message(self, ws, message):
        message = json.loads(message)
        if len(message) > 2:
            ticker = message[2]
            pair = self.symbol_dict[(ticker[0])]
            baseVolume = ticker[5]
            timestmap = time.time()
            try:
                type(self.dict[pair])
            except KeyError as e:
                self.dict[pair] = []
            self.dict[pair].append({
                'pair': pair,
                'baseVolume': baseVolume,
            })

    def run(self):
        websocket.enableTrace(True)
        ws = websocket.WebSocketApp(
            url=config.ws_api.get("WEBSOCK_HOST"),
            on_message=self.on_message,
            on_open=self.on_open
        )
        ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})
Once every 60 seconds this runs: it calculates over self.dict, saves the result into self.ohlcv1m, and sends it to the db. Then self.dict and self.ohlcv1m are initialized again to store the next minute of data from the server:
    def every60s(self):
        threading.Timer(60, self.every60s).start()
        for symbol in self.dict:
            tickerLists = self.dict[symbol]
            self.ohlcv1m.append({
                "V": sum([float(ticker['baseVolume']) for ticker in tickerLists])
            })
        # self.ohlcv1m will go to the database every 1m
        self.ohlcv1m = []  # init again
        self.dict = {}     # init again

if __name__ == "__main__":
    work = Works()
    t1 = threading.Thread(target=work.run)
    t1.daemon = True
    t1.start()
    work.every60s()
(sorry for the indention)
I connect to the socket by running run_forever() and get real-time data.
Every 60s I need to check and calculate the data.
Is there any way to run something every 60s without a thread in Python 2.7?
I would really appreciate any advice. Thank you.
The answer comes down to whether you need the code to run exactly every 60 seconds, or whether you can just wait 60 seconds between runs (i.e. if the logic takes 5 seconds, it'll run every 65 seconds).
If you're happy with just a 60 second gap between runs, you could do
import time

while True:
    every60s()
    time.sleep(60)
If you're really set on not using threads but having it start every 60 seconds regardless of the last poll time, you could time the last execution and subtract that from 60 seconds to get the sleep time.
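A minimal sketch of that idea (measuring how long the call took and sleeping only for the remainder of the 60-second window) could look like this:

import time

INTERVAL = 60

while True:
    start = time.time()
    every60s()
    elapsed = time.time() - start
    # sleep only for whatever is left of the 60-second window
    time.sleep(max(0, INTERVAL - elapsed))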
However, really, with the code you've got there you're not going to run into any of the issues with Python threads you might have read about. Those issues come in when you've got multiple threads all running at the same time and all CPU bound, which doesn't seem to be the case here unless there's some very slow, CPU intensive work that's not in your provided code.

How can I control traffic, when I use "for" in locust?

When I use "for" in locust, I do not know why the "req/s" is too high.
class UserBehavior(TaskSet):
    @task(1)
    def start_congche(self):
        filename = 'D:\测试\项目\精励评分\从车评分/阳光压力测试数据.csv'
        with open(filename) as f:
            reader = csv.DictReader(f)
            for test in reader:
                self.client.post("/DataPreFillServer/DataPreFillProductService", first + test["vin"] + vincode + test["vehicle_code"] + vehicleCode + end)

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    host = "http://10.10.6.12:8080"
    min_wait = 1000
    max_wait = 1000
But, if I do not use "for", everything is ok....
class UserBehavior(TaskSet):
    @task(1)
    def start_congche(self):
        self.client.post("/DataPreFillServer/DataPreFillProductService", first + vincode + vehicleCode + end)
Use a queue: fill it once with every row from the CSV, and have each task execution take exactly one value from the queue.
user_data_queue = queue.Queue()

filename = 'XXXXXXXX.csv'
with open(filename) as f:
    reader = csv.DictReader(f)
    for test in reader:
        data = {
            "vin": test["vin"],
            "vehicle_code": test["vehicle_code"],
        }
        user_data_queue.put_nowait(data)

Then, inside the task:

try:
    data = user_data_queue.get_nowait()
except queue.Empty:
    exit(0)
# build the payload from data["vin"] and data["vehicle_code"], as in your first version
self.client.post("/DataPreFillServer/DataPreFillProductService", payload)
I believe the reason you are seeing 20 requests/second with the first approach is that the for loop executes multiple POST requests each time one of the five Locust users runs the task. Depending on how large the file is (let's say it has 20 rows), each user fires roughly 20 POSTs, likely in parallel, and then the test ends.
Take a look at your start and end times: the first test finishes in about 8 seconds, while the other one takes around 30.
In the second test, each of the five Locust users executes a single POST request per task, and has to keep going back, one request at a time, until the 100 requests are satisfied.
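Putting the queue idea together with your test, a rough sketch (using the same old-style HttpLocust/TaskSet API as your code; first, vincode, vehicleCode and end are placeholders for the string fragments you already have) might look like:

import csv
import queue
from locust import HttpLocust, TaskSet, task

# placeholders: replace with your actual payload fragments
first = vincode = vehicleCode = end = ""

user_data_queue = queue.Queue()
with open('XXXXXXXX.csv') as f:
    for row in csv.DictReader(f):
        user_data_queue.put_nowait({
            "vin": row["vin"],
            "vehicle_code": row["vehicle_code"],
        })

class UserBehavior(TaskSet):
    @task(1)
    def start_congche(self):
        try:
            data = user_data_queue.get_nowait()  # exactly one row per task execution
        except queue.Empty:
            return  # no test data left; this user stops posting
        payload = first + data["vin"] + vincode + data["vehicle_code"] + vehicleCode + end
        self.client.post("/DataPreFillServer/DataPreFillProductService", payload)

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    host = "http://10.10.6.12:8080"
    min_wait = 1000
    max_wait = 1000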

locks needed for multithreaded python scraping?

I have a list of zipcodes that I want to pull business listings for using the Yelp Fusion API. Each zipcode will require at least one API call (often many more), so I want to be able to keep track of my API usage, as the daily limit is 25000. I have defined each zipcode as an instance of a user-defined Locale class. This Locale class has a class variable Locale.pulls, which acts as a global counter for the number of pulls.
I want to multithread this using the multiprocessing.dummy module, but I am not sure whether I need to use locks and, if so, how I would do so. The concern is race conditions, since I need to be sure each thread sees the current number of pulls, stored in the Locale.pulls class variable in the pseudo code below.
import multiprocessing.dummy as mt

class Locale():
    pulls = 0
    MAX_PULLS = 20000

    def __init__(self, x, y):
        # initialize the instance with the arguments needed to complete the API call
        pass

    def pull(self):
        if Locale.pulls > Locale.MAX_PULLS:
            return None
        else:
            # make the request, store the returned data and increment the counter
            self.data = self.call_yelp()
            Locale.pulls += 1

def main():
    # zipcodes is a list of arguments needed to initialize each zipcode as a Locale object
    pool = mt.Pool(len(zipcodes) // 100)  # let each thread work on 100 zipcodes
    data = pool.map(Locale, zipcodes)
A simple solution would be to check that len(zipcodes) < MAX_PULLS before running the map().
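If the counter can actually reach the limit mid-run, the increment itself also needs protection, because Locale.pulls += 1 is not atomic across threads. A minimal sketch of that, assuming a hypothetical call_yelp() and guarding the class-level counter with a shared threading.Lock:

import threading
import multiprocessing.dummy as mt

class Locale:
    pulls = 0
    MAX_PULLS = 20000
    _lock = threading.Lock()  # shared by all instances/threads

    def __init__(self, zipcode):
        self.zipcode = zipcode
        self.data = None

    def call_yelp(self):
        # placeholder for the real Yelp Fusion request
        return {"zip": self.zipcode}

    def pull(self):
        # reserve a pull slot atomically before making the request
        with Locale._lock:
            if Locale.pulls >= Locale.MAX_PULLS:
                return None
            Locale.pulls += 1
        self.data = self.call_yelp()
        return self.data

def main(zipcodes):
    pool = mt.Pool(10)  # e.g. 10 worker threads
    locales = [Locale(z) for z in zipcodes]
    pool.map(Locale.pull, locales)
    return locales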
