As you probably already know, Python 3 is effectively single-threaded and runs on a single processor, which seems to go well with TinyDB (a JSON database written in pure Python) as well as Bottle (a pure-Python web server).
For something in pre-production or early production with low traffic (fewer than 100 people a week), what do you think of the idea of running a Bottle website on the built-in Python HTTP server with TinyDB as the database?
The two things I was wondering about are:
a) Data isolation (or concurrency): since everything is single-threaded, the interpreter will queue the CRUD operations one after the other, so there won't be any concurrent access. But given the low traffic, should I even care?
b) Wait time: while 10 people are queued up wanting access to the same table held in RAM, the requests are processed one by one and people will have to wait. Will that wait be humanly noticeable, given that Python handles such operations in milliseconds? I also don't really know how to load-test 50 people connecting to the website at the same time and requesting the same resources.
I am open to any feedback, let me know.
If you are going to have such low traffic plus very fast RAM-only operations, then it seems like it might be a worthy option, and it is also easily testable:
import bottle

a = bottle.Bottle()

@a.get('/')
def root():
    return {'cheese': '🧀'}

if __name__ == '__main__':
    a.run()
and our test_file:
import time
import requests
from concurrent.futures import ThreadPoolExecutor

def test(index):
    # Hit the root endpoint and raise if the server returns an error.
    requests.get('http://localhost:8080/').raise_for_status()

pool = ThreadPoolExecutor(max_workers=10)

for i in (1, 10, 50, 100, 1000):
    t = time.time()
    # Note: map() returns a lazy iterator, so this mainly measures how
    # quickly the i requests are handed off to the pool.
    pool.map(test, range(i))
    print(i, 'took', time.time() - t)

print('🥳')
On my Mac this is the output:
1 took 0.00046515464782714844
10 took 0.003888845443725586
50 took 0.0003077983856201172
100 took 0.0006000995635986328
1000 took 0.0058138370513916016
🥳
which is indeed not noticeable.
That said, every I/O-, CPU-, or persistence-heavy thing that gets added later will break these assumptions, so for the price of a small overhead it might be better to use a bigger, concurrent database and the like.
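For reference, a minimal sketch of the single-process Bottle + TinyDB setup being discussed, to show where a bigger database would later slot in; the db.json path, the users table, and the routes are made-up placeholders:

import bottle
from tinydb import TinyDB, Query

app = bottle.Bottle()
db = TinyDB('db.json')         # hypothetical on-disk JSON file
users = db.table('users')      # hypothetical table name

@app.get('/users/<name>')
def get_user(name):
    # The built-in server handles one request at a time,
    # so these reads and writes are naturally serialized.
    User = Query()
    return {'result': users.search(User.name == name)}

@app.post('/users/<name>')
def add_user(name):
    users.insert({'name': name})
    return {'inserted': name}

if __name__ == '__main__':
    app.run()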
😀
Related
I have an optimization algorithm written in node.js that uses cpu time (measured with performance.now()) as a heuristic.
However, I noticed that occasionally some trivial lines of code would cost much more than usual.
So I wrote a test program:
const _ = require('lodash');
const { performance } = require('perf_hooks'); // global in recent Node, explicit here

const timings = [];
while (true) {
  const start = performance.now();
  // can add any trivial line of code here, or just nothing
  const end = performance.now();
  const dur = end - start;
  if (dur > 1) {
    // Bail out with full diagnostics the first time a "trivial" iteration
    // takes longer than 1 ms.
    throw [
      "dur > 1",
      {
        start,
        end,
        dur,
        timings,
        avg: _.mean(timings),
        max: _.max(timings),
        min: _.min(timings),
        last: timings.slice(-10),
      },
    ];
  }
  timings.push(dur);
}
The measurements showed an average of 0.00003ms and a peak >1ms (with the second highest <1ms but same order of magnitude).
The possible reasons I can think of are:
the average timing isn't the actual time for executing the code (some compiler optimization)
performance.now isn't accurate somehow
cpu scheduling related - process wasn't running normally but still counted in performance.now
occasionally node is doing something extra behind the scenes (GC etc)
something happening on the hardware/os level - caching / page faults etc
Is any of these a likely reason, or is it something else?
Whichever the cause is, is there a way to make a more accurate measurement for the algorithm to use?
The outliers are currently causing the algorithm to misbehave, and without knowing how to resolve this issue the best option seems to be using the moving-average cost as a heuristic, which has its own downsides.
Thanks in advance!
------- Edit
I appreciate that performance.now() will never be perfectly accurate, but I was a bit surprised that the error could span 3-4 orders of magnitude (as opposed to 2 orders of magnitude, or ideally 1).
Would anyone have any idea/pointers as to how performance.now() works and thus what's likely the major contributor to the error range?
It'd be nice to know if the cause is due to something node/v8 doesn't have control over (hardware/os level) vs something it does have control over (a node bug/options/gc related), so I can decide whether there's a way to reduce the error range before considering other tradeoffs with using an alternative heuristic.
------- Edit 2
Thanks to @jfriend00 I now realize performance.now() doesn't measure the actual CPU time the node process executed, but just the time since the process started.
The question now is
if there's an existing way to get actual CPU time
is this a feature request for node/v8
unless the node process doesn't have enough information from the OS to provide this
You're unlikely to be able to accurately measure the time for one trivial line of code. In fact, the overhead of executing performance.now() is probably many times higher than the time to execute one trivial line of code. You have to be careful that what you're measuring takes substantially longer to execute than the uncertainty or overhead of the measurement itself. Measuring very small execution times is not going to be an accurate endeavor.
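As an illustration of that point, the usual workaround is to time a large batch of iterations and divide, so the overhead of the two performance.now() calls is amortized away; the iteration count and the Math.sqrt() placeholder below are arbitrary:

const { performance } = require('perf_hooks'); // also a global in recent Node versions

const ITERATIONS = 1_000_000;

let sink = 0; // accumulate so the work isn't optimized away
const start = performance.now();
for (let i = 0; i < ITERATIONS; i++) {
  sink += Math.sqrt(i); // stand-in for the trivial work being measured
}
const total = performance.now() - start;

// Per-iteration cost = total time / iteration count, so the timer
// overhead contributes almost nothing to the result.
console.log(`~${(total / ITERATIONS).toFixed(6)} ms per iteration (sink=${sink})`);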
1, 3 and 5 in your list are also all possibilities. You aren't guaranteed that your code gets a dedicated CPU core that is never interrupted to service some other thread in the system. On my Windows system, even when my nodejs app is the only "app" running, there are hundreds of other threads devoted to various OS services that may or may not request some time to run while my nodejs app is running, and they eventually get some time slice of the CPU core my nodejs app was using.
And, as best I know, performance.now() is just getting a high resolution timer from the OS that's relative to some epoch time. It has no idea when your thread is and isn't running on a CPU core and wouldn't have any way to adjust for that. It just gets a high resolution timestamp which you can compare to some other high resolution timestamp. The time elapsed is not CPU time for your thread. It's just clock time elapsed.
Is any of these a likely reason, or is it something else?
Yes, they all sound likely.
is there a way to make a more accurate measurement for the algorithm to use?
No, sub-millisecond time measurements are generally not reliable, and almost never a good idea. (Doesn't matter whether a timing API promises micro/nanosecond precision or whatever; chances are that (1) it doesn't hold up in practice, and (2) trying to rely on it creates more problems than it solves. You've just found an example of that.)
Even measuring milliseconds is fraught with peril. I once investigated a case of surprising performance, where it turned out that on that particular combination of hardware and OS, after 16ms of full load the CPU ~tripled its clock rate, which of course had nothing to do with the code that appeared to behave weirdly.
EDIT to reply to edited question:
The question now is
if there's an existing way to get actual CPU time
No.
is this a feature request for node/v8
No, because...
unless the node process doesn't have enough information from the OS to provide this
...yes.
I'm trying to solve the following scenario in Node.js in a performant manner.
I have about 100 MB worth of JSON files which I need to process, and the time complexity of processing each entry is about O(sweet_jesus(n)). In real time it takes roughly 4-5 seconds per entry.
The only silver lining is that I can run the processing of each entry completely independently (about 900 entries in total); they are unrelated to each other.
My first choice was to go for worker_threads with node-worker-threads-pool:
import fs from 'fs';
import path from 'path';
import _ from 'lodash';
import moment from 'moment';
import workerPool from 'node-worker-threads-pool';

function generateShortEvaluationsByWorkers() {
  const pool = new workerPool.StaticPool({
    size: 10,
    task: path.resolve('src/simulator/evaluationGenerator.js')
  });
  let simulationEvaluations = [];
  const promises = [];

  fs.readdirSync(path.resolve(`results/companies`)).forEach(file => {
    const rawData = fs.readFileSync(path.resolve(`results/companies/${file}`));
    const company = JSON.parse(rawData);
    console.log(new Date(), ": company parsed, sending it for processing:", file);
    promises.push(pool.exec(company).then(result => {
      simulationEvaluations.push(result);
    }));
  });

  Promise.all(promises).then(() => {
    fs.writeFileSync(
      path.resolve(`results/bundles/simulationEvaluations.json`),
      JSON.stringify(simulationEvaluations, null, 2)
    );
    pool.destroy();
  });
}
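The referenced evaluationGenerator.js isn't shown; for context, a task file for node-worker-threads-pool's StaticPool is typically a small worker_threads script along these lines, where the actual evaluation logic is just a placeholder:

// src/simulator/evaluationGenerator.js (hypothetical sketch)
import { parentPort } from 'worker_threads';

parentPort.on('message', (company) => {
  // Placeholder for the real CPU-heavy evaluation of one company.
  const evaluation = { company: company.name, score: 0 };

  // Reply to the pool; this is what resolves the pool.exec() promise.
  parentPort.postMessage(evaluation);
});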
The above code runs beautifully; it shows that the I/O of reading all the files and feeding them to the pool takes about 5-6 seconds...
But after that there is absolutely no difference whatsoever compared to running the whole thing in a single thread. The logs do show that the individual files are no longer processed in order as before, so I guess some threading is happening in the background, but the total time does not change one bit. It takes about an hour either way.
Also, my hyper-threaded Intel 8750 with 6 cores (12 logical) shows 86% utilization going to the node process, so my alleged 10 separate threads don't even manage to utilize one full core. - EDIT: I was wrong, it does make a huge difference; I wrote down the times incorrectly...
After this I crank the thread pool size up to 100 and slice the number of files down to 100. And that's where the freaky stuff starts to happen. First, all my CPU cores go brrrr and my laptop properly melts through the table, as one would expect. The OS gives zero responsiveness; everything is a slideshow.
The first 20 or so files get processed within the same second, after which the processing of individual files goes to ~3 seconds each (neatly one after the other, one message 3-5 seconds after the previous). The last 10 or so files get processed within the same second again.
Why do 10 threads not make any difference compared to 1 thread?
Shouldn't I see the files being processed in clusters, where the cluster size is comparable to the number of logical cores, instead of timestamps one after the other?
Is there a way to "leave" a core to process something else, while calculations still go to Neptune with all the other cores?
EDIT: I won't delete this, maybe somebody will learn from it :)
So to answer my own questions:
It does make a difference; at that point I couldn't measure, couldn't write things down, and couldn't read my CPU meter correctly either... totally my fault.
This one I still don't fully get, but after a few runs I suspect that when you start a whole buttload of threads, you make the whole system hang so much just from the strain of starting them all that by the time it's able to spew out the first log, it's already done with a bunch of the calculations.
Yeah, this is also kind of obvious: do not use so many threads that the thread management itself makes the OS throw a fit.
In the end I got the best results with 11 threads btw.
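A common starting point for the pool size, rather than guessing 10 or 100, is one worker per logical core and tuning from there; a sketch assuming the same StaticPool setup as above:

import os from 'os';
import path from 'path';
import workerPool from 'node-worker-threads-pool';

// One worker per logical core keeps the cores busy without drowning the
// OS in thread-management overhead; use os.cpus().length - 1 to leave a
// core free for the rest of the system.
const pool = new workerPool.StaticPool({
  size: os.cpus().length,
  task: path.resolve('src/simulator/evaluationGenerator.js'),
});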
I’m new to Python and I'm struggling to understand some things in multiprocessing/threading. I want to speed up a function and have been trying different approaches from the multiprocessing module, but I can’t get it to run any faster. It’s possible it won’t run any faster, but I wanted to be sure this is the case before giving up. This isn’t a full description, but the most time-consuming activities are:
- repeatedly generating random data (10,000 rows and 10 columns),
- using a pre-fit model to predict an outcome for each row, and
- comparing each predicted value to an initial value.
It performs this multiple times depending on how many of the predicted values equal the initial value, updating the parameters of the distribution each time. The output of the function is a single numeric value.
I want to loop over several of these initial values and end up with a list of the output values. I was hoping to get multiple iterations to run concurrently (but I’m open to anything that could make it faster). I’ve been ignorantly attempting pool.apply, starmap and Process but haven’t seen a change in time.
My questions are:
Based on the description of what I’m doing, is my program I/O or CPU bound? (Is it possible to tell from that? Is this even the right question to be asking?)
Should I be using multithreading or multiprocessing?
How can I determine if the iterations are running concurrently or not?
Given you didn't mention anything about drives, I'm going to assume it's not very I/O-bound (although that's still possible). Are you using multiple threads/processes yet? If not, that's definitely your issue.
I'd probably look at Python's threading library and, because of the loop to create data, maybe the thread pool library. You just need all of your workers running that random-generation function at the same time.
EDIT: I forgot to mention. If you open Task Manager/System Monitor, you should be able to see load per CPU/Thread. If only one is maxed at any given time, you aren't concurrent.
Example: I wrote a quick example to help with the thread pool. Your 10,000-item list with 10 columns was not even noticeable on my i7. I increased the columns to 10,000 and it used 4 GB of RAM and probably 30 seconds of 100% CPU at 3.4 GHz.
from multiprocessing import Pool, Array
import random

def thread_function(_):
    """Return a list of 10,000 random integers."""
    l = []
    for _ in range(10000):
        l.append(random.randint(0, 10000))
    return l

if __name__ == '__main__':
    rand_list = Array('i', range(10000))
    with Pool() as pool:
        # Each of the 10,000 elements triggers one call to thread_function.
        rand_list = pool.map(thread_function, rand_list)
    print(len(rand_list))
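Since the workload described above (generating random data and running model predictions) sounds CPU-bound, here is a rough sketch of how the loop over initial values could be spread across processes; simulate and initial_values are stand-ins for the real function and inputs:

from concurrent.futures import ProcessPoolExecutor

import numpy as np

def simulate(initial_value):
    """Stand-in for the real routine: generate data, predict, compare."""
    rng = np.random.default_rng()
    data = rng.normal(size=(10_000, 10))  # 10,000 rows, 10 columns
    # ... predict with the pre-fit model and compare to initial_value ...
    return float(initial_value + data.mean())  # placeholder numeric output

if __name__ == '__main__':
    initial_values = [0.1, 0.5, 1.0, 2.0]  # hypothetical inputs
    # Separate processes sidestep the GIL, so CPU-bound iterations can
    # run in parallel across cores.
    with ProcessPoolExecutor() as executor:
        outputs = list(executor.map(simulate, initial_values))
    print(outputs)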
So I have a Python 3.7 program that uses the threading library to run tasks in parallel:
import threading
import time

def myFunc(stName, ndName, ltName):
    ##logic here

names = open('names.txt').read().splitlines()  ## more than 30k names

for i in names:
    processThread = threading.Thread(target=myFunc, args=(i, name2nd, lName,))
    processThread.start()
    time.sleep(0.4)
I have to open multiple windows to complete the tasks with different inputs, but eventually I ran into a very laggy situation where I can't even browse my OS X machine. I tried to use the multiprocessing library to solve the issue, but unfortunately multiprocessing doesn't seem to work correctly on OS X.
Can anyone advise?
This behavior is to be expected. If myFunc is a CPU-intensive task that takes time, you are potentially starting up to 30k threads doing this task, which will use all the machine's resources.
Another potential issue with your code is that threads are expensive in terms of memory (each thread can reserve on the order of 8 MB for its stack). Creating 30k threads could then claim up to 240 GB of memory, which your machine probably doesn't have, and would lead to an out-of-memory error.
Finally, another issue with that code is that your main routine starts up all those threads but never waits for (joins) any of them, so it has no control over when, or in what order, the last started threads finish.
I would recommend using a ThreadPoolExecutor to solve all those issues:
from concurrent.futures import ThreadPoolExecutor

def myFunc(stName, ndName, ltName):
    ##logic here

names = open('names.txt').read().splitlines()  ## more than 30k names

num_workers = 8
with ThreadPoolExecutor(max_workers=num_workers) as executor:
    for i in names:
        # Schedule one call of myFunc per name; the pool's worker
        # threads pick the calls up as they become free.
        executor.submit(myFunc, i, name2nd, lName)
You can play with num_workers to find a balance between the amount of resources used by this program and the execution speed that fits you.
I'm looking at picking up stock data using URLs from Nasdaq. For 4000 stocks I am thinking of doing each in its own thread, so 4000 URL threads. Has anyone tried this? Does it overload the Windows stack?
I suggest using concurrent.futures.ThreadPoolExecutor() with the default max workers, as too many threads will cause giant overhead.
Max workers defaults to (processor count) * 5, which is good in your case I believe.
A client using asyncio is also an option but it's quite a bit more complicated.
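For illustration, a minimal sketch of that thread-pool approach; the ticker list and URL template below are placeholders for the real Nasdaq endpoint:

from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical tickers and URL template; substitute the real endpoint.
tickers = ['AAPL', 'MSFT', 'GOOG']
URL_TEMPLATE = 'https://example.com/quote/{}'

def fetch(ticker):
    """Download the data for one ticker and return the response body."""
    response = requests.get(URL_TEMPLATE.format(ticker), timeout=10)
    response.raise_for_status()
    return response.text

# The default max_workers keeps the thread count modest even if the
# ticker list grows to 4000 symbols.
with ThreadPoolExecutor() as executor:
    results = list(executor.map(fetch, tickers))

print(len(results), 'responses downloaded')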