I wrote a test to measure the lookup speed of Set in Node.js (v8.4).
const size = 5000000;
const lookups = 1000000;
const set = new Set();
for (let i = 0; i < size; i++) {
  set.add(i);
}
const samples = [];
for (let i = 0; i < lookups; i++) {
  samples.push(Math.floor(Math.random() * size));
}
const start = Date.now();
for (const key of samples) {
  set.has(key);
}
console.log(`size: ${size}, time: ${Date.now() - start}`);
After running it with size = 5000, 50000, 500000, and 5000000, the result is surprising to me:
size: 5000, time: 29
size: 50000, time: 41
size: 500000, time: 81
size: 5000000, time: 130
I expected the time it takes to be relatively constant, but it increases substantially as the number of items in the Set grows. Isn't the lookup supposed to be O(1)? What am I missing here?
Update 1:
After reading some comments and answers, I understand the point everyone is trying to make. Maybe my question should be 'What is causing the increase in time?'. In a hash map implementation, with the same number of lookups, the only reason I can see for an increase in lookup time is more key collisions.
Update 2:
After more research, here is what I found:
V8 uses an ordered hash table for both its Set and Map implementations.
According to this link, there is a performance impact on lookup time for an ordered hash map, while an unordered hash map's performance stays constant.
However, V8's ordered hash table implementation is based on this, and that doesn't seem to add any overhead to the lookup time as the number of items increases.
Regardless of whether the JS Set implementation is actually O(1) or not (I'm not sure it is), you should not expect O(1) operations to run in identical time across calls. Big-O describes how the cost of an operation scales, not its actual throughput speed.
To demonstrate this, consider the use case of sorting an array of numbers. You can sort using array.sort which I believe is O(n * log(n)) in Node.js. You can also create a (bad, but amusing) O(n) implementation using timeouts (ignore complexity of adding to the array, etc):
// input data
let array = [
  681, 762, 198, 347, 340,
  73, 989, 967, 409, 752,
  660, 914, 711, 153, 691,
  35, 112, 907, 970, 67
];
// buffer for the sorted output
let sorted = [];
// O(n) sorting algorithm: each number is pushed after a delay proportional to its value
array.forEach(function (num) {
  setTimeout(sorted.push.bind(sorted, num), num);
});
// ensure sort finished
setTimeout(function () {
  console.log(sorted);
}, 2000);
Of course, the first implementation is faster, but in terms of complexity, the second one is "better". The point is that you should only use big-O to estimate how an operation scales; it does not guarantee any specific amount of time. If you called the O(n) sort above with an array of 20 numbers (the same length) that contained only two-digit numbers, the execution time would be very different.
Stupid example, but it should hopefully support the point I'm trying to make :)
Caching and memory locality. V8's implementation of Set lookup has O(1) theoretical complexity, but real hardware has its own constraints and characteristics. Specifically, not every memory access has the same speed. Theoretical complexity analysis is only concerned with the number of operations, not the speed of each operation.
Update for updated question:
This answers your updated question! When you make many requests to a small Set, it will be likely that the CPU has cached the relevant chunks of memory, making many of the lookups faster than they would be if the data had to be retrieved from memory. There don't have to be more collisions for this effect to happen; it is simply the case that accessing a small memory region repeatedly is faster than spreading out the same number of accesses over a large memory region.
In fact, you can measure the same effect (with smaller magnitude) with an array:
const size = 5000000;
const lookups = 1000000;
const array = new Array(size);
for (let i = 0; i < size; i++) {
  array[i] = 1;
}
const start = Date.now();
var result = 0;
for (var i = 0; i < lookups; i++) {
  var sample = Math.floor(Math.random() * size);
  result += array[sample];
}
const end = Date.now();
console.log(`size: ${size}, time: ${end - start}`);
A million lookups of random indices on a 5,000-element array will be faster than a million lookups of random indices on a 5,000,000 element array.
The reason is that for a smaller data structure, there's a greater likelihood that the random accesses will read elements that are already in the CPU's cache.
In theory you could be right and a Set could have O(1) lookup, but the ECMAScript specification of Set is very specific about the algorithm (see the ECMAScript definition): it includes a loop over all elements.
Have a look at the various HashSet implementations you can find, for example here; there might be one with O(1) .has speed.
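To make that concrete, here is a deliberately naive sketch of what a bucketed hash set with (amortized) O(1) .has could look like for non-negative integer keys. The class name, bucket count, and modulo hash are arbitrary choices for the example, and a real implementation would resize as the load factor grows:
// Naive bucketed hash set sketch for non-negative integer keys (illustrative only).
// With a fixed bucket count, .has is only O(1) on average while buckets stay small;
// a real implementation would grow the bucket array as items are added.
class NaiveHashSet {
  constructor(bucketCount = 8192) {
    this.bucketCount = bucketCount;
    this.buckets = new Array(bucketCount);
  }
  _index(key) {
    return key % this.bucketCount; // trivial hash for integer keys
  }
  add(key) {
    const i = this._index(key);
    if (!this.buckets[i]) this.buckets[i] = [];
    if (!this.buckets[i].includes(key)) this.buckets[i].push(key);
  }
  has(key) {
    const bucket = this.buckets[this._index(key)];
    return bucket ? bucket.includes(key) : false;
  }
}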
How much data can be processed using params, saved and session declarations?
How do these declarations affect performance and memory allocation/consumption (stack usage, data copy, etc.)?
What methods can be used in the case of static/dynamic arrays with 10k-100k elements?
Params
An untyped param is expanded like a macro any time it is referenced, so resource consumption depends on its use. If you have a param with a large amount of data, then it usually means that the value is a compile-time list ([...]) with many elements, and you use a #foreach loop to process it. A #foreach loop is always unrolled, which gives long compile times and large generated code.
If a param is typed in a template, then that template evaluates the param once and stores a copy in heap-allocated memory. The data is shared between all instances of the device. Cost should be negligible.
Session
Data is heap-stored, one copy per device instance.
Saved
Pretty much like session data, but adds a presumably negligible per-module cost for attribute registration.
There are two more variants of data:
Constant C tables
header %{ const int data[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; %}
extern const int data;
Creates one super-cheap module-local instance.
Independent startup memoized method
independent startup memoized method data() -> (const int *) {
    int *ret = new int[10];
    for (local int i = 0; i < 10; i++) {
        ret[i] = i;
    }
    return ret;
}
The data will be heap-allocated, initialized once, and shared across instances. Initialization is done by code, which saves size if it's easy to express the data programmatically, but can be cumbersome if it's just a table of irregular data.
I tried a problem at HackerEarth: https://www.hackerearth.com/practice/data-structures/stacks/basics-of-stacks/practice-problems/algorithm/fight-for-laddus/description/
The speed seems fine; however, the memory usage exceeds the 256 MB limit by nearly 2.8 times.
In Java and Python the memory usage is about 5 times lower, but the time is nearly twice as long.
What can be done to optimize the memory usage of the Node.js implementation?
Here is the Node.js implementation:
// Sample code to perform I/O:
process.stdin.resume();
process.stdin.setEncoding("utf-8");

var stdin_input = "";
process.stdin.on("data", function (input) {
  stdin_input += input; // Reading input from STDIN
});
process.stdin.on("end", function () {
  main(stdin_input);
});

function main(input) {
  let arr = input.split("\n");
  let testCases = parseInt(arr[0], 10);
  arr.splice(0, 1);
  finalStr = "";
  while (testCases > 0) {
    let inputArray = (arr[arr.length - testCases * 2 + 1]).split(" ");
    let inputArrayLength = inputArray.length;
    testCases = testCases - 1;
    frequencyObject = {};
    for (let i = 0; i < inputArrayLength; ++i) {
      if (!frequencyObject[inputArray[i]]) {
        frequencyObject[inputArray[i]] = 0;
      }
      ++frequencyObject[inputArray[i]];
    }
    let finalArray = [];
    finalArray[inputArrayLength - 1] = -1;
    let stack = [];
    stack.push(inputArrayLength - 1);
    for (let i = inputArrayLength - 2; i >= 0; i--) {
      let stackLength = stack.length;
      while (stackLength > 0 && frequencyObject[inputArray[stack[stackLength - 1]]] <= frequencyObject[inputArray[i]]) {
        stack.pop();
        stackLength--;
      }
      if (stackLength > 0) {
        finalArray[i] = inputArray[stack[stackLength - 1]];
      } else {
        finalArray[i] = -1;
      }
      stack.push(i);
    }
    console.log(finalArray.join(" ") + "\n");
  }
}
What can be done to optimize the memory usage of the Node.js implementation?
Here are some things to consider:
Don't buffer any more input data than you need to before you process it or output it.
Try to avoid making copies of data. Use the data in place if possible. Remember that all string operations create a new string that is likely a copy of the original data. And, many array operations like .map(), .filter(), etc... create new copies of the original array.
Keep in mind that garbage collection is delayed and is typically done during idle time. So, for example, modifying strings in a loop may create a lot of temporary objects that all must exist at once, even though most or all of them will be garbage collected when the loop is done. This leads to high peak memory usage.
Buffering
The first thing I notice is that you read the entire input file into memory before you process any of it. Right away for large input files, you're going to use a lot of memory. Instead, what you want to do is read enough of a chunk to get the next testCase and then process it.
FYI, this incremental reading/processing will make the code significantly more complicated to write (I've written an implementation myself) because you have to handle partially read lines, but it will hold down memory use a bunch and that's what you asked for.
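A rough sketch of that idea, using Node's readline module to avoid hand-rolling the partial-line handling. The processTestCase function is a hypothetical stand-in for the per-test-case logic you already have, and the layout assumed is the same as in the original code (a test-case count line, then a count line and a values line per test case):
// Sketch: stream stdin line by line instead of buffering the whole input first.
const readline = require("readline");

const rl = readline.createInterface({ input: process.stdin, terminal: false });

let lineNo = 0;
rl.on("line", (line) => {
  lineNo++;
  if (lineNo === 1) return;          // total number of test cases; nothing to keep
  if (lineNo % 2 === 0) return;      // per-test-case element count; nothing to keep
  processTestCase(line.split(" "));  // hypothetical: your existing stack-based logic
});

function processTestCase(inputArray) {
  // ... compute the answer for this test case and write it out immediately ...
}
This way only the current line (plus whatever small state you keep) is in memory at any time.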
Copies of Data
After reading the entire input file into memory, you immediately make a copy of it all with this:
let arr = input.split("\n");
So, now you've more than doubled the amount of memory the input data is taking up. Instead of just one string for all the input, you now still have all of that in memory, but you've now broken it up into hundreds of other strings (each with a little overhead of its own for a new string and of course a copy of each line).
Modifying Strings in a Loop
When you're creating your final result which you call finalStr, you're doing this over and over again:
finalStr = finalStr + finalArray.join(" ") + "\n"
This is going to create tons and tons of incremental strings that will likely all end up in memory at once, because garbage collection probably won't run until the loop is over. As an example, if you had 100 lines of output that were each 100 characters long, so the total output (not counting line termination characters) was 100 x 100 = 10,000 characters, then constructing it in a loop like this would create temporary strings of 100, 200, 300, 400, ... 10,000 characters, which would consume 5000 (avg length) * 100 (number of temporary strings) = 500,000 characters. That's 50x the total output size consumed in temporary string objects.
So, not only does this create tons of incremental strings, each one larger than the previous one (since you're adding onto it), it also builds your entire output in memory before writing any of it out to stdout.
Instead, you can incrementally output each line to stdout as you construct it. This will put the worst-case memory usage at probably about 2x the output size, whereas you're at 50x or worse.
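Inside the per-test-case loop, that would look roughly like this (a sketch; the point is just that each result line is written and released immediately rather than appended to finalStr):
// Emit the line as soon as it is built.
process.stdout.write(finalArray.join(" ") + "\n");
// ...instead of accumulating it: finalStr = finalStr + finalArray.join(" ") + "\n";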
If I create a first setInterval that pushes an item into my array every millisecond, and I then have another setInterval (every second) that copies this array and resets the original one:
Can I be sure that I won't lose data, given that the first interval writes every millisecond and the second interval resets the array every second?
Here is a jsfiddle http://jsfiddle.net/t1d20usr/
var data = [];
var i = 0;
var interval = setInterval(function() {
  data.push(i);
  i++;
  if (i == 10000) {
    clearInterval(interval);
  }
}, 1);

setInterval(function() {
  var recentData = data;
  // I want to be sure that this will not erase something set between
  // the assignment of recentData and the reset of this array.
  data = [];
  $('.container').append(recentData.join(','));
}, 1000);
It works great, but due to the logic, I wonder if sometimes I could lose data.
Why am I doing this? Because I get a lot of requests from different clients (socket emits) and I want to broadcast their requests to the other clients only once every second, instead of broadcasting on each emit from each client, which is overkill. This is similar to how multiplayer game servers work. (My jsfiddle and the intervals are just an example to simulate requests, I don't actually do it like that! Eventually I will get emits at different intervals and will broadcast them every 30ms or something.)
This is robust. Why? Javascript is single threaded. Each interval function runs to completion before the next one starts.
You might consider thinking about this as a queue. Your 1ms interval function puts elements onto the queue, and your 1s function takes all the queued elements off the queue in one go.
Your technique of replacing the data array with an empty one works well. You won't get any duplicates that way.
If you wanted to consume one item from your data array, you could use
const item = data.length > 0 ? data.shift() : null
but that does a lot of shuffling of the elements of the array. Use your favorite search engine to find higher-performance queue implementations.
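As a starting point, here is a minimal sketch of one common approach: keep a head index instead of calling shift, and compact the backing array only occasionally. The class name and the compaction threshold are arbitrary choices for the example:
// Sketch: a queue that avoids shift()'s element moving by tracking a head index.
class SimpleQueue {
  constructor() {
    this.items = [];
    this.head = 0;
  }
  enqueue(item) {
    this.items.push(item);
  }
  dequeue() {
    if (this.head >= this.items.length) return null;
    const item = this.items[this.head++];
    // Occasionally drop the consumed prefix so memory doesn't grow forever.
    if (this.head > 1024 && this.head * 2 >= this.items.length) {
      this.items = this.items.slice(this.head);
      this.head = 0;
    }
    return item;
  }
  get length() {
    return this.items.length - this.head;
  }
}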
The code is safe. As O. Jones said, the single-threaded nature of JavaScript ensures that you will not lose any element of the array.
But I'd like to add a consideration: don't expect that, when the 1s interval is reached, the length of data is 1000 (1s = 1000ms). The length of the data array will always be different and often much lower than 1000.
Especially under stress (a 1ms interval is a bit stressful), JavaScript can't keep up with the load in less than 1ms per tick.
Look at this modified version of your fiddle.
On my machine, this logs a length in the range of 248 to 253 elements pushed every 1s.
var data = [];
var i = 0;
var interval = setInterval(function() {
  data.push(i);
  i++;
  if (i == 10000) {
    clearInterval(interval);
  }
}, 1);

setInterval(function() {
  console.log(data.length);
  var recentData = data;
  data = [];
  $('.container').append(recentData.join(',') + '<span class="iteration">/</span>');
}, 1000);
Why is the speed of Node.js array shift/push operations not linear in the size of the array? There is a dramatic knee at 87370 that completely crushes the system.
Try this, first with 87369 elements in q, then with 87370. (Or, on a 64-bit system, try 85983 and 85984.) For me, the former runs in 0.05 seconds; the latter in 80 seconds: 1600 times slower. (Observed on 32-bit Debian Linux with node v0.10.29.)
q = [];
// preload the queue with some data
for (i = 0; i < 87369; i++) q.push({});
// fetch oldest waiting item and push new item
for (i = 0; i < 100000; i++) {
  q.shift();
  q.push({});
  if (i % 10000 === 0) process.stdout.write(".");
}
64-bit Debian Linux with v0.10.29 starts to crawl at 85984 and runs in 0.06 / 56 seconds. Node v0.11.13 has similar breakpoints, but at different array sizes.
Shift is a very slow operation for arrays, as you need to move all the elements, but V8 is able to use a trick to perform it fast when the array contents fit in a page (1 MB).
Empty arrays start with 4 slots, and as you keep pushing, V8 resizes the array using the formula 1.5 * (old length + 1) + 16.
var j = 4;
while (j < 87369) {
  j = (j + 1) + Math.floor(j / 2) + 16;
  console.log(j);
}
Prints:
23
51
93
156
251
393
606
926
1406
2126
3206
4826
7256
10901
16368
24569
36870
55322
83000
124517
So your array's backing store actually ends up at 124517 slots, which makes it too large for the fast path.
You can actually preallocate your array to just the right size, and it should be able to do the fast shift again:
var q = new Array(87369); // Fits in a page so fast shift is possible
// preload the queue with some data
for (i=0; i<87369; i++) q[i] = {};
If you need something larger than that, use a data structure that is designed for it.
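For example, a linked-list based queue keeps push/shift cost independent of how many items it holds. This is only a sketch under that assumption, not a benchmarked implementation:
// Sketch: linked-list queue with O(1) push/shift regardless of size.
function LinkedQueue() {
  this.head = null;
  this.tail = null;
}
LinkedQueue.prototype.push = function (value) {
  var node = { value: value, next: null };
  if (this.tail) this.tail.next = node;
  else this.head = node;
  this.tail = node;
};
LinkedQueue.prototype.shift = function () {
  if (!this.head) return undefined;
  var value = this.head.value;
  this.head = this.head.next;
  if (!this.head) this.tail = null;
  return value;
};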
I started digging into the v8 sources, but I still don't understand it.
I instrumented deps/v8/src/builtins.cc:MoveElements (called from Builtin_ArrayShift, which implements the shift with a memmove), and it clearly shows the slowdown: only 1000 shifts per second because each one takes 1ms:
AR: at 1417982255.050970: MoveElements sec = 0.000809
AR: at 1417982255.052314: MoveElements sec = 0.001341
AR: at 1417982255.053542: MoveElements sec = 0.001224
AR: at 1417982255.054360: MoveElements sec = 0.000815
AR: at 1417982255.055684: MoveElements sec = 0.001321
AR: at 1417982255.056501: MoveElements sec = 0.000814
of which the memmove is 0.000040 seconds; the bulk is heap->RecordWrites (deps/v8/src/heap-inl.h):
void Heap::RecordWrites(Address address, int start, int len) {
  if (!InNewSpace(address)) {
    for (int i = 0; i < len; i++) {
      store_buffer_.Mark(address + start + i * kPointerSize);
    }
  }
}
which is (store-buffer-inl.h)
void StoreBuffer::Mark(Address addr) {
  ASSERT(!heap_->cell_space()->Contains(addr));
  ASSERT(!heap_->code_space()->Contains(addr));
  Address* top = reinterpret_cast<Address*>(heap_->store_buffer_top());
  *top++ = addr;
  heap_->public_set_store_buffer_top(top);
  if ((reinterpret_cast<uintptr_t>(top) & kStoreBufferOverflowBit) != 0) {
    ASSERT(top == limit_);
    Compact();
  } else {
    ASSERT(top < limit_);
  }
}
When the code is running slow, there are runs of shift/push ops followed by runs of 5-6 calls to Compact() for every MoveElements. When it's running fast, MoveElements is only called a handful of times at the end, with just a single compaction when it finishes.
I'm guessing memory compaction might be thrashing, but it's not falling into place for me yet.
Edit: forget that last edit about output buffering artifacts, I was filtering duplicates.
This bug had been reported to Google, who closed it without studying the issue.
https://code.google.com/p/v8/issues/detail?id=3059
When shifting out and calling tasks (functions) from a queue (array)
the GC(?) is stalling for an inordinate length of time.
114467 shifts is OK
114468 shifts is problematic, symptoms occur
The response:
The GC has nothing to do with this, and nothing is stalling either.
Array.shift() is an expensive operation, as it requires all array
elements to be moved. For most areas of the heap, V8 has implemented a
special trick to hide this cost: it simply bumps the pointer to the
beginning of the object by one, effectively cutting off the first
element. However, when an array is so large that it must be placed in
"large object space", this trick cannot be applied as object starts
must be aligned, so on every .shift() operation all elements must
actually be moved in memory.
I'm not sure there's a whole lot we can do about this. If you want a
"Queue" object in JavaScript with guaranteed O(1) complexity for
.enqueue() and .dequeue() operations, you may want to implement your
own.
Edit: I just caught the subtle "all elements must be moved" part -- is RecordWrites not GC but an actual element copy then? The memmove of the array contents is 0.04 milliseconds. The RecordWrites loop is 96% of the 1.1 ms runtime.
Edit: if "aligned" means the first object must be at first address, that's what memmove does. What is RecordWrites?
I have a piece of code as follows, and I want to improve its time complexity.
This runs in a thread, and I can have up to a maximum of 2000 threads executing this function at the same time.
On top of that, I wait for file descriptors that are ready from a pollset. MAX_RTP_SESSIONS is also huge (5000 or more), so it is a big for loop and I can see performance getting affected. [When MAX_RTP_SESSIONS is reduced to just 500, I see a huge improvement in performance.]
But I will have to use 2000 threads and also 5000 sessions. I wish I could find a way to change the time complexity from O(n^2) to at least O(n) or better. Any ideas are really appreciated!
//..
retval = epoll_wait(epfd, pollset, EPOLL_MAX_EVENTS, mSecTimeout);
//..
sem_wait(&sem_sessions);
for (i = 0; i < retval; i++) {
  for (j = 0; j < MAX_RTP_SESSIONS; j++) {
    if ((g_rtp_sessions[j].destroy == FALSE) &&
        (g_rtp_sessions[j].used != FALSE) &&
        (g_rtp_sessions[j].p_rtp->rtp_socket->fd == pollset[i].data.fd))
    {
      if (0 < rtp_recv_data(....)) {
        rtp_update(...);
      }
    }
  }
}
sem_post(&sem_sessions);
//..
Sort the pollset; then you can do a binary search on it, which should lead to an O(n log n) algorithm.