How can I control traffic when I use "for" in Locust? - python-3.x

When I use a "for" loop in Locust, the "req/s" is much higher than I expect, and I do not know why.
import csv

from locust import HttpLocust, TaskSet, task


class UserBehavior(TaskSet):
    @task(1)
    def start_congche(self):
        filename = 'D:\测试\项目\精励评分\从车评分/阳光压力测试数据.csv'
        with open(filename) as f:
            reader = csv.DictReader(f)
            for test in reader:
                # first, vincode, vehicleCode and end are request fragments defined elsewhere in the script
                self.client.post("/DataPreFillServer/DataPreFillProductService",
                                 first + test["vin"] + vincode + test["vehicle_code"] + vehicleCode + end)


class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    host = "http://10.10.6.12:8080"
    min_wait = 1000
    max_wait = 1000
But if I do not use the "for" loop, everything is fine:
class UserBehavior(TaskSet):
    @task(1)
    def start_congche(self):
        self.client.post("/DataPreFillServer/DataPreFillProductService", first + vincode + vehicleCode + end)

Update: use a queue instead, and take one value from the queue for each request.
user_data_queue = queue.Queue()

filename = 'XXXXXXXX.csv'
with open(filename) as f:
    reader = csv.DictReader(f)
    for test in reader:
        data = {
            "vin": test["vin"],
            "vehicle_code": test["vehicle_code"],
        }
        user_data_queue.put_nowait(data)

Then, inside the task, each request takes one item from the queue:

try:
    data = self.locust.user_data_queue.get_nowait()
except queue.Empty:
    exit(0)
# the payload is built from data["vin"] and data["vehicle_code"], as in the first example
self.client.post("/DataPreFillServer/DataPreFillProductService", payload)

I believe the reason you are seeing 20 requests/second in the first approach is that the for loop executes multiple POST requests every time one of the five Locust users hits the system. If the file has, say, 20 rows, then each user fires 20 requests, likely in parallel with the others, and then the test ends.
Take a look at your start and end times: the first test finishes in ~8 seconds, while the other one takes around 30.
In the second test, each of the five Locust users executes a single POST request per task, then comes back and executes the next one, one at a time, until the 100 requests are satisfied.
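For reference, here is a minimal sketch of the queue-based version assembled into one locustfile. The CSV path and the first/vincode/vehicleCode/end request fragments are placeholders standing in for the values from the question:

import csv
import queue

from locust import HttpLocust, TaskSet, task

# Placeholder request-body fragments; the real values come from the original script.
first, vincode, vehicleCode, end = "<first>", "<vincode>", "<vehicleCode>", "<end>"

# Fill the queue once, before the test starts.
test_data_queue = queue.Queue()
with open('XXXXXXXX.csv') as f:  # placeholder path
    for row in csv.DictReader(f):
        test_data_queue.put_nowait({
            "vin": row["vin"],
            "vehicle_code": row["vehicle_code"],
        })


class UserBehavior(TaskSet):
    @task(1)
    def start_congche(self):
        try:
            data = self.locust.user_data_queue.get_nowait()
        except queue.Empty:
            exit(0)  # stop once the test data is exhausted, as in the question
        payload = first + data["vin"] + vincode + data["vehicle_code"] + vehicleCode + end
        self.client.post("/DataPreFillServer/DataPreFillProductService", payload)


class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    host = "http://10.10.6.12:8080"
    min_wait = 1000
    max_wait = 1000
    user_data_queue = test_data_queue  # tasks reach the queue via self.locust.user_data_queue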

Related

Create locust tasks dynamically?

Basically, I need to spawn 30 users and give them 50 different tasks that run in parallel. So I am trying to generate the 50 tasks like this:
import os

from locust import HttpUser, between, task


class UserSimulation(HttpUser):
    host = os.environ['BASE_URL']
    # time in seconds the simulated user will wait before proceeding to the next task
    wait_time = between(1, 2)

    for item_id in range(1, 51):
        @task(1)
        def view_items_with_different_item_ids(self, item_id=item_id):
            self.client.get(
                url=f"/my-url/item24_00{item_id}",
                verify=False,
                auth=(os.environ['USERNAME'], os.environ['PASSWORD'])
            )
For obvious reasons this approach doesn't let me create 50 tasks dynamically, as only the last one gets saved. Any ideas for a workaround?
To do it the way you're trying, create the different functions programmatically. I don't know how to use the decorators with those, but if nothing else you can add the functions to the Locust task list when you create them: create a task list tasks = [], then tasks.append(view_items_with_different_item_ids_1) as each function is created.
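A minimal sketch of that idea, using Locust's tasks list of callables; the make_task helper and the generated function names are illustrative, not from the question:

import os

from locust import HttpUser, between


def make_task(item_id):
    # Build one task function per item_id; the closure keeps its own id.
    def view_item(user):
        user.client.get(
            url=f"/my-url/item24_00{item_id}",
            verify=False,
            auth=(os.environ['USERNAME'], os.environ['PASSWORD'])
        )
    view_item.__name__ = f"view_item_{item_id}"
    return view_item


class UserSimulation(HttpUser):
    host = os.environ['BASE_URL']
    wait_time = between(1, 2)
    # 50 separate tasks, generated programmatically instead of via the decorator
    tasks = [make_task(item_id) for item_id in range(1, 51)]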
However, I'm not positive this is necessary, based on your description of what you want. If you only need 30 users to make that call 50 times each, you only need the one task and you can loop through the calls inside it.
class UserSimulation(HttpUser):
    host = os.environ['BASE_URL']
    # time in seconds the simulated user will wait before proceeding to the next task
    wait_time = between(1, 2)

    @task(1)
    def view_items_with_different_item_ids(self):
        for item_id in range(1, 51):
            self.client.get(
                url=f"/my-url/item24_00{item_id}",
                verify=False,
                auth=(os.environ['USERNAME'], os.environ['PASSWORD'])
            )
If you need a random number instead of a sequential one but need to make sure each one is called once:
import random

item_ids = list(range(1, 51))
random.shuffle(item_ids)

@task(1)
def view_items_with_different_item_ids(self):
    for item_id in item_ids:
        self.client.get(
            url=f"/my-url/item24_00{item_id}",
            verify=False,
            auth=(os.environ['USERNAME'], os.environ['PASSWORD'])
        )
If you just want to pull a random number all the time and don't care about repeats:
import random

item_ids = list(range(1, 51))

@task(1)
def view_items_with_different_item_ids(self):
    random_id = random.choice(item_ids)
    self.client.get(
        url=f"/my-url/item24_00{random_id}",
        verify=False,
        auth=(os.environ['USERNAME'], os.environ['PASSWORD'])
    )

Memory efficient massive http requests

I need to make an unlimited number of HTTP requests to a web API, one after another, efficiently and quite fast. (I need it for a utility, so it should work no matter how many times I use it, and it should also be usable on a web server where people use it at the same time.)
Right now I'm using threading with a queue, but after a while I start getting errors like:
'can't start new thread'
'MemoryError'
or it works for a bit, but pretty slowly.
This is part of my code:
from queue import Queue
from threading import Thread

concurrent = 25
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=receiveJson)
    t.daemon = True
    t.start()

for url in get_urls():
    q.put(url.strip())
q.join()
get_urls() is a simple function that returns a list of URLs (of unknown length).
This is my receiveJson (the thread target):
import requests

def receiveJson():
    while True:
        url = q.get()
        res = requests.get(url).json()
        q.task_done()
The problem comes from your threads never ending: notice that there is no exit condition in your receiveJson function, so the worker threads stay alive forever. The simplest way to signal that a thread should end is usually to enqueue None:
def receiveJson():
    while True:
        url = q.get()
        if url is None:  # Exit condition allows thread to complete
            q.task_done()
            break
        res = requests.get(url).json()
        q.task_done()
and then you can change the other code as follows:
concurrent = 25
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=receiveJson)
    t.daemon = True
    t.start()

for url in get_urls():
    q.put(url.strip())

for i in range(concurrent):
    q.put(None)  # Add a None for each thread to be able to get and complete
q.join()
There are other ways of doing this, but this is the one that requires the least change to your code. If you run this often, it might be worth looking into the concurrent.futures.ThreadPoolExecutor class to avoid the cost of creating threads over and over.
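For comparison, here is a minimal sketch of the same workload on a ThreadPoolExecutor, which reuses a fixed set of threads; get_urls() is assumed to exist as in the question:

from concurrent.futures import ThreadPoolExecutor

import requests


def fetch_json(url):
    # Each worker fetches one URL and returns the decoded JSON.
    return requests.get(url).json()


def run():
    urls = [url.strip() for url in get_urls()]  # get_urls() from the question
    # At most 25 threads are created and reused for every URL.
    with ThreadPoolExecutor(max_workers=25) as executor:
        return list(executor.map(fetch_json, urls))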

Python 2.7: Is it possible to make a timer without threading.Timer?

So, basically I want to make a timer, but I don't want to use threading.Timer, for efficiency reasons.
I have read that Python threads are not very efficient and are better avoided; in the material I checked, for example, a single piece of work split across N threads ran slower than the same work done in one thread.
However, at the moment I need a thread for this.
import json, ssl, threading, time
import websocket  # websocket-client package
import config     # project-local settings module


class Works(object):
    def __init__(self):
        self.symbol_dict = config.ws_api.get("ASSET_ABBR_LIST")
        self.dict = {}
        self.ohlcv1m = []

    def on_open(self, ws):
        ws.send(json.dumps(config.ws_api.get("SUBSCRIPTION_DICT")))
Every time I get a message from the websocket server, I store it in self.dict:
    def on_message(self, ws, message):
        message = json.loads(message)
        if len(message) > 2:
            ticker = message[2]
            pair = self.symbol_dict[(ticker[0])]
            baseVolume = ticker[5]
            timestamp = time.time()
            try:
                type(self.dict[pair])
            except KeyError as e:
                self.dict[pair] = []
            self.dict[pair].append({
                'pair': pair,
                'baseVolume': baseVolume,
            })
    def run(self):
        websocket.enableTrace(True)
        ws = websocket.WebSocketApp(
            url=config.ws_api.get("WEBSOCK_HOST"),
            on_message=self.on_message,
            on_open=self.on_open
        )
        ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})
Once every 60 seconds the following runs: it aggregates self.dict into self.ohlcv1m, which will eventually be sent to the db, and then self.dict and self.ohlcv1m are reset so they can collect the next minute of data from the server:
    def every60s(self):
        threading.Timer(60, self.every60s).start()
        for symbol in self.dict:
            tickerLists = self.dict[symbol]
            self.ohlcv1m.append({
                "V": sum([float(ticker['baseVolume']) for ticker in tickerLists])
            })
        # self.ohlcv1m will go to the database every 1m
        self.ohlcv1m = []  # init again
        self.dict = {}     # init again
if __name__ == "__main__":
    work = Works()
    t1 = threading.Thread(target=work.run)
    t1.daemon = True
    t1.start()
    work.every60s()
I connect to the socket by running run_forever() and receive real-time data.
Every 60 seconds I need to check and aggregate that data.
Is there any way to do the 60-second timer without a thread in Python 2.7?
I would really appreciate any advice. Thank you.
The answer comes down to whether you need the code to run exactly every 60 seconds, or whether you can simply wait 60 seconds between runs (i.e. if the logic takes 5 seconds, it'll run every 65 seconds).
If you're happy with just a 60-second gap between runs, you could do:
import time

while True:
    every60s()
    time.sleep(60)
If you're really set on not using threads but having it start every 60 seconds regardless of the last poll time, you could time the last execution and subtract that from 60 seconds to get the sleep time.
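A minimal sketch of that idea, timing each run and sleeping only for the remainder of the interval (every60s() is the aggregation routine from the question):

import time

INTERVAL = 60  # target seconds between the start of consecutive runs

while True:
    started = time.time()
    every60s()  # the aggregation routine from the question
    elapsed = time.time() - started
    # Sleep only for whatever is left of the interval; don't sleep if the work ran long.
    time.sleep(max(0, INTERVAL - elapsed))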
However, really, with the code you've got there you're not going to run into any of the issues with Python threads you might have read about. Those issues come in when you've got multiple threads all running at the same time and all CPU bound, which doesn't seem to be the case here unless there's some very slow, CPU intensive work that's not in your provided code.

Thread objects not freed from memory

I wrote a continuous script that collects some data from the internet every few seconds, keeps it in memory for a while, periodically stores it all to db and then deletes it. To keep everything running smoothly I use threads to collect the data from several sources at the same time. To minimize db operations and to avoid conflict with other db processes, I only write every now and then.
The memory from the deleted variables is never released, and memory use eventually grows so large that the script crashes (as shown by tracemalloc and pympler). I guess I'm handling the data coming out of the threads wrong, but I don't know how else I could do it. Minimal example below.
Addition: I don't think I can use a queue because in reality multiple functions are threaded from this point, modifying different local variables.
import threading
import time
import tracemalloc
import pympler.muppy, pympler.summary
import gc

tracemalloc.start()


def a():
    # collect data
    collection.update({int(time.time()): list(range(1, 1000))})
    return


collection = {}
threads = []
start = time.time()
cycle = 0

while time.time() < start + 60:
    cycle += 1
    t = threading.Thread(target=a)
    threads.append(t)
    t.start()
    time.sleep(1)

    for t in threads:
        if t.is_alive() == False:
            t.join()

    # periodically delete data
    delete = []
    for key, val in collection.items():
        if key < time.time() - 10:
            delete.append(key)
    for delet in delete:
        print('DELETING:', delet)
        del collection[delet]

    gc.collect()

    print('CYCLE:', cycle, 'THREADS:', threading.active_count(), 'COLLECTION:', len(collection))
    print(tracemalloc.get_traced_memory())
    all_objects = pympler.muppy.get_objects()
    sum1 = pympler.summary.summarize(all_objects)
    pympler.summary.print_(sum1)

Cassandra Pycassa connection pool, how to use properly?

In order to make Cassandra inserts go faster I'm using multithreading. It's working OK, but if I add more threads it doesn't make any difference, so I think I'm not generating more connections. I think maybe I should be using pool.execute(f, *args, **kwargs), but I don't know how to use it; the documentation is quite scanty. Here's my code so far:
import connect_to_ks_bp
from connect_to_ks_bp import ks_refs
import time
import pycassa
from datetime import datetime
import json
import threadpool

pool = threadpool.ThreadPool(20)

count = 1
bench = open("benchCassp20_100000.txt", "w")


def process_tasks(lines):
    # let threadpool format your requests into a list
    requests = threadpool.makeRequests(insert_into_cfs, lines)
    # insert the requests into the threadpool
    for req in requests:
        pool.putRequest(req)
    pool.wait()


def read(file):
    """read data from json and insert into keyspace"""
    json_data = open(file)
    lines = []
    for line in json_data:
        lines.append(line)
    print len(lines)
    process_tasks(lines)


def insert_into_cfs(line):
    global count
    count += 1
    if count > 5000:
        bench.write(str(datetime.now()) + "\n")
        count = 1
    #print count
    #print kspool.checkedout()
    """
    user_tweet_cf = pycassa.ColumnFamily(kspool, 'UserTweet')
    user_name_cf = pycassa.ColumnFamily(kspool, 'UserName')
    tweet_cf = pycassa.ColumnFamily(kspool, 'Tweet')
    user_follower_cf = pycassa.ColumnFamily(kspool, 'UserFollower')
    """
    tweet_data = json.loads(line)
    """Format the tweet time as an epoch seconds int value"""
    tweet_time = time.strptime(tweet_data['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
    tweet_time = int(time.mktime(tweet_time))
    new_user_tweet(tweet_data['from_user_id'], tweet_time, tweet_data['id'])
    new_user_name(tweet_data['from_user_id'], tweet_data['from_user_name'])
    new_tweet(tweet_data['id'], tweet_data['text'], tweet_data['to_user_id'])
    if tweet_data['to_user_id'] != 0:
        new_user_follower(tweet_data['from_user_id'], tweet_data['to_user_id'])


"""The 4 functions below carry out the inserts into specific column families"""
def new_user_tweet(from_user_id, tweet_time, id):
    ks_refs.user_tweet_cf.insert(from_user_id, {(tweet_time): id})


def new_user_name(from_user_id, user_name):
    ks_refs.user_name_cf.insert(from_user_id, {'username': user_name})


def new_tweet(id, text, to_user_id):
    ks_refs.tweet_cf.insert(id, {
        'text': text,
        'to_user_id': to_user_id
    })


def new_user_follower(from_user_id, to_user_id):
    ks_refs.user_follower_cf.insert(from_user_id, {to_user_id: 0})


if __name__ == '__main__':
    read('tweets.json')
This is just another file:
import pycassa
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

"""This is a static class I set up to hold the global database connection stuff;
I only want to connect once, and then the various insert functions will use these fields a lot."""
class ks_refs():
    pool = ConnectionPool('TweetsKS', use_threadlocal=True, max_overflow=-1)

    @classmethod
    def cf_connect(cls, column_family):
        cf = pycassa.ColumnFamily(cls.pool, column_family)
        return cf

ks_refs.user_name_cfo = ks_refs.cf_connect('UserName')
ks_refs.user_tweet_cfo = ks_refs.cf_connect('UserTweet')
ks_refs.tweet_cfo = ks_refs.cf_connect('Tweet')
ks_refs.user_follower_cfo = ks_refs.cf_connect('UserFollower')

# trying out a batch mutator which is supposed to increase performance
ks_refs.user_name_cf = ks_refs.user_name_cfo.batch(queue_size=10000)
ks_refs.user_tweet_cf = ks_refs.user_tweet_cfo.batch(queue_size=10000)
ks_refs.tweet_cf = ks_refs.tweet_cfo.batch(queue_size=10000)
ks_refs.user_follower_cf = ks_refs.user_follower_cfo.batch(queue_size=10000)
A few thoughts:
Batch sizes of 10,000 are way too large. Try 100.
Make your ConnectionPool size at least as large as the number of threads using it, via the pool_size parameter (the default is 5). Pool overflow should only be used when the number of active threads varies over time, not when you have a fixed number of threads, because overflow results in a lot of unnecessary opening and closing of connections, which is a fairly expensive process. A sketch of both changes follows.
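A minimal sketch of those two adjustments, assuming the 20 worker threads from the question and showing only one column family:

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

NUM_THREADS = 20  # matches threadpool.ThreadPool(20) in the question

# One connection per worker thread, no overflow churn
pool = ConnectionPool('TweetsKS', pool_size=NUM_THREADS, use_threadlocal=True)

user_name_cfo = ColumnFamily(pool, 'UserName')
# Much smaller batch: flush every 100 mutations instead of 10,000
user_name_cf = user_name_cfo.batch(queue_size=100)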
After you've resolved those issues, look into these:
I'm not familiar with the threadpool library that you're using. Make sure that if you take the Cassandra insertions out of the picture, you still see an increase in performance as you increase the number of threads.
Python itself has a limit on how many threads are useful, due to the GIL. It shouldn't normally max out at 20, but it might if you're doing something CPU intensive or something that requires a lot of Python interpretation; the test described in my previous point will cover this as well. It may be the case that you should consider using the multiprocessing module, but you would need some code changes to handle that (namely, not sharing ConnectionPools, CFs, or hardly anything else between processes); a rough sketch follows.
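A rough sketch of the multiprocessing variant, under the assumption that each worker process opens its own ConnectionPool; only one of the column families is shown, and the pool size of 4 is illustrative:

import json
import multiprocessing

import pycassa
from pycassa.pool import ConnectionPool

_pool = None
_user_name_cf = None


def init_worker():
    # Each process opens its own ConnectionPool; nothing pycassa-related is shared between processes.
    global _pool, _user_name_cf
    _pool = ConnectionPool('TweetsKS', pool_size=1, use_threadlocal=True)
    _user_name_cf = pycassa.ColumnFamily(_pool, 'UserName')


def insert_line(line):
    tweet_data = json.loads(line)
    _user_name_cf.insert(tweet_data['from_user_id'],
                         {'username': tweet_data['from_user_name']})


if __name__ == '__main__':
    with open('tweets.json') as f:
        lines = f.readlines()
    workers = multiprocessing.Pool(processes=4, initializer=init_worker)
    workers.map(insert_line, lines)
    workers.close()
    workers.join()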
