Deno on multi-core machines - multithreading

In Node.js there is the cluster module to utilize all available cores on the machine, which is pretty great, especially when used with the node module pm2. I am pretty stoked about some features of Deno, but I have wondered how best to run it on a multi-core machine.
I understand that there are Workers, which work great for a specific task, but for normal web requests it seems like the performance of multi-core machines is somewhat wasted. What is the best strategy to get maximum availability and utilization of my hardware in Deno?
I am a bit worried that if you only have a single process going on and there is some CPU-intensive task, for whatever reason, it will "block" all other incoming requests. In Node.js the cluster module would solve this, since another process would handle the request, but I am unsure how to handle this in Deno.
I think you could run several Deno instances on different ports and then have some kind of load balancer in front of them, but that seems like quite a complex setup in comparison. I also get that you could use a service like Deno Deploy or whatever, but I already have hardware that I want to run it on.
What are the alternatives for me?
Thanks in advance for your sage advice and better wisdom.

In Deno, like in a web browser, you should be able to use Web Workers to utilize 100% of a multi-core CPU.
In a cluster you need a "manager" node (which can be a worker itself too, as needed/appropriate). In a similar fashion, the Web Worker API can be used to create as many dedicated workers as desired. This means the main thread should never block, since it can delegate any task that might block to its workers. Tasks that won't block (e.g. simple database or other I/O-bound calls) can be done directly on the main thread as normal.
Deno also supports navigator.hardwareConcurrency so you can query about available hardware and determine the number of desired workers accordingly. You might not need to define any limits though. Spawning a new dedicated worker from the same source as a previously spawned dedicated worker may be fast enough to do so on demand. Even so there may be value in reusing dedicated workers rather than spawning a new one for every request.
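For instance, here is a minimal sketch of sizing and reusing a small pool of dedicated workers from navigator.hardwareConcurrency. The worker.ts file, the task/result message shape, and the one-in-flight-task-per-worker simplification are assumptions of mine, not part of the answer above.
// Reserve one core for the main thread; reuse the workers instead of spawning per request.
const workerCount = Math.max(1, navigator.hardwareConcurrency - 1);
const workers = Array.from(
  { length: workerCount },
  () => new Worker(new URL("./worker.ts", import.meta.url).href, { type: "module" }),
);
let next = 0;
function runOnWorker<T>(task: unknown): Promise<T> {
  // Naive round-robin dispatch; assumes each worker replies with exactly one message per task.
  const worker = workers[next];
  next = (next + 1) % workers.length;
  return new Promise((resolve) => {
    worker.onmessage = (e) => resolve(e.data as T);
    worker.postMessage(task);
  });
}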
With Transferable Objects, large data sets can be made available to/from workers without copying the data. This, along with messaging, makes it pretty straightforward to delegate tasks while avoiding performance bottlenecks from copying large data sets.
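As a rough sketch of the transfer idea (the Float64Array payload, the worker.ts file, and the message shape are assumptions):
const data = new Float64Array(1_000_000);
const worker = new Worker(new URL("./worker.ts", import.meta.url).href, {
  type: "module",
});
// Listing data.buffer as a transferable moves ownership to the worker;
// the buffer is detached (unusable) on this thread afterwards.
worker.postMessage({ kind: "process", buffer: data.buffer }, [data.buffer]);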
Depending on your use cases you might also use a library like Comlink "that removes the mental barrier of thinking about postMessage and hides the fact that you are working with workers."
e.g.
main.ts
import { serve } from "https://deno.land/std@0.133.0/http/server.ts";
import ComlinkRequestHandler from "./ComlinkRequestHandler.ts";
serve(async function handler(request) {
const worker = new Worker(new URL("./worker.ts", import.meta.url).href, {
type: "module",
});
const handler = ComlinkRequestHandler.wrap(worker);
return await handler(request);
});
worker.ts
/// <reference no-default-lib="true"/>
/// <reference lib="deno.worker" />
import ComlinkRequestHandler from "./ComlinkRequestHandler.ts";
ComlinkRequestHandler.expose(async (request) => {
const body = await request.text();
return new Response(`Hello to ${request.url}\n\nReceived:\n\n${body}\n`);
});
ComlinkRequestHandler.ts
import * as Comlink from "https://cdn.skypack.dev/comlink@4.3.1?dts";
interface RequestMessage extends Omit<RequestInit, "body" | "signal"> {
url: string;
headers: Record<string, string>;
hasBody: boolean;
}
interface ResponseMessage extends ResponseInit {
headers: Record<string, string>;
hasBody: boolean;
}
export default class ComlinkRequestHandler {
#handler: (request: Request) => Promise<Response>;
#responseBodyReader: ReadableStreamDefaultReader<Uint8Array> | undefined;
static expose(handler: (request: Request) => Promise<Response>) {
Comlink.expose(new ComlinkRequestHandler(handler));
}
static wrap(worker: Worker) {
const { handleRequest, nextResponseBodyChunk } =
Comlink.wrap<ComlinkRequestHandler>(worker);
return async (request: Request): Promise<Response> => {
const requestBodyReader = request.body?.getReader();
const requestMessage: RequestMessage = {
url: request.url,
hasBody: requestBodyReader !== undefined,
cache: request.cache,
credentials: request.credentials,
headers: Object.fromEntries(request.headers.entries()),
integrity: request.integrity,
keepalive: request.keepalive,
method: request.method,
mode: request.mode,
redirect: request.redirect,
referrer: request.referrer,
referrerPolicy: request.referrerPolicy,
};
const nextRequestBodyChunk = Comlink.proxy(async () => {
if (requestBodyReader === undefined) return undefined;
const { value } = await requestBodyReader.read();
return value;
});
const { hasBody: responseHasBody, ...responseInit } = await handleRequest(
requestMessage,
nextRequestBodyChunk
);
const responseBodyInit: BodyInit | null = responseHasBody
? new ReadableStream({
start(controller) {
async function push() {
const value = await nextResponseBodyChunk();
if (value === undefined) {
controller.close();
return;
}
controller.enqueue(value);
push();
}
push();
},
})
: null;
return new Response(responseBodyInit, responseInit);
};
}
constructor(handler: (request: Request) => Promise<Response>) {
this.#handler = handler;
}
async handleRequest(
{ url, hasBody, ...init }: RequestMessage,
nextRequestBodyChunk: () => Promise<Uint8Array | undefined>
): Promise<ResponseMessage> {
const request = new Request(
url,
hasBody
? {
...init,
body: new ReadableStream({
start(controller) {
async function push() {
const value = await nextRequestBodyChunk();
if (value === undefined) {
controller.close();
return;
}
controller.enqueue(value);
push();
}
push();
},
}),
}
: init
);
const response = await this.#handler(request);
this.#responseBodyReader = response.body?.getReader();
return {
hasBody: this.#responseBodyReader !== undefined,
headers: Object.fromEntries(response.headers.entries()),
status: response.status,
statusText: response.statusText,
};
}
async nextResponseBodyChunk(): Promise<Uint8Array | undefined> {
if (this.#responseBodyReader === undefined) return undefined;
const { value } = await this.#responseBodyReader.read();
return value;
}
}
Example usage:
% deno run --allow-net --allow-read main.ts
% curl -X POST --data '{"answer":42}' http://localhost:8000/foo/bar
Hello to http://localhost:8000/foo/bar
Received:
{"answer":42}
There's probably a better way to do this (e.g. via Comlink.transferHandlers and registering transfer handlers for Request, Response, and/or ReadableStream) but the idea is the same and will handle even large request or response payloads as the bodies are streamed via messaging.
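For reference, a hedged sketch of what registering such a transfer handler could look like; the handler name "HEADERS" and the choice to serialize only Headers are illustrative assumptions, not a drop-in replacement for the code above.
import * as Comlink from "https://cdn.skypack.dev/comlink@4.3.1?dts";
// Teach Comlink how to move Headers across the worker boundary.
Comlink.transferHandlers.set("HEADERS", {
  canHandle: (value: unknown): value is Headers => value instanceof Headers,
  // serialize returns [plain value, list of transferables]
  serialize: (headers: Headers): [Record<string, string>, Transferable[]] => [
    Object.fromEntries(headers.entries()),
    [],
  ],
  deserialize: (entries: Record<string, string>) => new Headers(entries),
});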

It all depends on what workload you would like to push to the threads. If you are happy with the performance of the built-in Deno HTTP server running on the main thread, but you need to leverage multithreading to create the responses more efficiently, then it's simple as of Deno v1.29.4.
The HTTP server will give you an async iterator, server, like:
import { serve } from "https://deno.land/std/http/server.ts";
const server = serve({ port: 8000 });
Then you may use the built-in pooledMap functionality like:
import { pooledMap } from "https://deno.land/std@0.173.0/async/pool.ts";
const ress = pooledMap( window.navigator.hardwareConcurrency - 1
, server
, req => new Promise(v => v(respondWith(req)))
);
for await (const res of ress) {
// respond with res
}
Where respondWith is just a function which handles the received request and generates the response object. If respondWith is already an async function, then you don't even need to wrap it in a promise.
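For illustration, a minimal stand-in for respondWith (the fetch-style Request/Response types here are an assumption; adapt it to whatever the server iterator actually yields):
async function respondWith(req: Request): Promise<Response> {
  const body = await req.text().catch(() => "");
  return new Response(`Handled ${req.url} (${body.length} bytes received)\n`);
}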
However, in case you would like to run multiple Deno HTTP servers on separate threads, that's also possible, but you need a load balancer like GoBetween at the head. In this case you should instantiate multiple Deno HTTP servers on separate threads and receive their requests at the main thread as separate async iterators. To achieve this, per thread you can do the following.
At the worker side, i.e. ./servers/server_800X.ts:
import { serve } from "https://deno.land/std/http/server.ts";
const server = serve({ port: 800X });
console.log("Listening on http://localhost:800X/");
for await (const req of server) {
postMessage({ type: "request", req });
}
and at the main thread you can convert the corresponding worker HTTP server into an async iterator like:
async function* server_800X() {
  while (true) {
    // Resolve with the next request message posted by the worker.
    // (A production version would buffer messages that arrive between reads.)
    const req = await new Promise((resolve) => {
      worker_800X.onmessage = (event) => {
        if (event.data.type === "request") resolve(event.data.req);
      };
    });
    yield req;
  }
}
for await (const req of server_800X()) {
// Handle the request here in the main thread
}
You should also be able to multiplex either the HTTP request (req) or the res async iterators into a single stream by using the MuxAsyncIterator functionality and then process them with pooledMap. So if you have 2 HTTP servers working on server_8000.ts and server_8001.ts, then you can multiplex them into a single async iterator like:
import { MuxAsyncIterator } from "https://deno.land/std@0.173.0/async/mod.ts";
const muxedServer = new MuxAsyncIterator<Request>();
muxedServer.add(server_8000);
muxedServer.add(server_8001);
for await (const req of muxedServer) {
// respond accordingly (*)
}
Obviously you should also be able to spawn new threads to process requests received from the muxedServer by utilizing pooledMap as shown above.
(*) In case you choose to use a load balancer and multiple Deno HTTP servers, then you should assign special headers to the requests at the load balancer, designating the server ID that each request has been diverted to. This way, by inspecting this special header, you can decide from which server to respond for any particular request.
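For instance, a minimal sketch, assuming the load balancer sets a header named x-upstream-id (the header name is an assumption, not part of the setup above):
for await (const req of muxedServer) {
  // "x-upstream-id" is a hypothetical header set by the load balancer
  const upstreamId = req.headers.get("x-upstream-id");
  // use upstreamId to decide which server/worker should send the response
}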

Related

Why does my AWS Lambda function randomly fail when using private ElastiCache network calls as well as external API calls?

I am trying to write a caching function that returns cached ElastiCache data or makes an API call to retrieve that data. However, the Lambda function seems to be very unreliable and often times out.
It seems that having Redis calls as well as public API calls causes the issue. I can confirm that I have set up AWS correctly, with a subnet with an internet gateway and a private subnet with a NAT gateway. The function works, but only 10% of the time. The remaining times, execution stops right before making the API call.
I have also noticed that the API calls fail after creating the Redis client. If I make the external API call prior to making the Redis check, the function seems a lot more reliable and doesn't time out.
Not sure what to do. Is it best practice to separate these 2 tasks or am I doing something wrong?
let data = null;
module.exports.handler = async (event) => {
//context.callbackWaitsForEmptyEventLoop = false;
let client;
try {
client = new Redis(
6379,
"redis://---.---.ng.0001.use1.cache.amazonaws.com"
);
client.get(event.token, async (err, result) => {
if (err) {
console.error(err);
} else {
data = result;
await client.quit();
}
});
if (data && new Date().getTime() / 1000 - eval(data).timestamp < 30) {
res.send(`({
"address": "${token}",
"price": "${eval(data).price}",
"timestamp": "${eval(data).timestamp}"
})`);
} else {
getPrice(event); //fetch api data
}
There are a lot of misunderstandings in your code. I'll try to guide you to fix it and understand how to do it correctly.
You are mixing asynchronous and synchronous code in your function.
You should use JSON.parse instead of eval to parse the data, because eval allows arbitrary code to be executed in your function.
You're using res.send to return the response to the client instead of the callback. Remember that res.send only exists in Express; you're using a Lambda, and to return the result to the client you need to use the callback function.
To help you with this task, I have completely rewritten your code to resolve these misunderstandings.
const Redis = require('ioredis');
module.exports.handler = async (event, context, callback) => {
// prefer to use lambda env instead of put directly in the code
const client = new Redis(
"REDIS_PORT_ENV",
"REDIS_HOST_ENV"
);
const data = await client.get(event.token);
client.quit();
const parsedData = JSON.parse(data);
if (parsedData && new Date().getTime() / 1000 - parsedData.timestamp < 30) {
callback(null, {
address: event.token,
price: parsedData.price,
timestamp: parsedData.timestamp
});
} else {
const dataFromApi = await getPrice(event);
callback(null, dataFromApi);
}
};
There is another usage with Lambdas where you return an object instead of passing an object to the callback, but I think you get the idea and understand your mistakes.
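For illustration, a hedged sketch of that style (getPrice is the question's function; everything else here, such as reading the Redis host and port from environment variables, is an assumption):
const Redis = require("ioredis");
module.exports.handler = async (event) => {
  const client = new Redis(Number(process.env.REDIS_PORT), process.env.REDIS_HOST);
  const data = await client.get(event.token);
  await client.quit();
  const parsedData = data ? JSON.parse(data) : null;
  if (parsedData && new Date().getTime() / 1000 - parsedData.timestamp < 30) {
    // Returning the object resolves the invocation; no callback needed.
    return {
      address: event.token,
      price: parsedData.price,
      timestamp: parsedData.timestamp,
    };
  }
  return getPrice(event);
};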
Follow the docs about the correct usage of Lambda:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/using-lambda-functions.html
To understand more about async and sync in JavaScript:
https://www.freecodecamp.org/news/synchronous-vs-asynchronous-in-javascript/
JSON.parse x eval: JSON.parse vs. eval()

Can a SharedArrayBuffer be picked up by garbage collection in Node?

I'm trying to build a Node application using worker threads, divided into three parts.
The primary thread that delegates tasks
A dedicated worker thread that updates shared data
A pool of worker threads that run calculations on shared data
The shared data is in the form of several SharedArrayBuffer objects operating like a pseudo-database. I would like to be able to update the data without needing to pause calculations, and I'm ok with a few tasks using slightly stale data. The flow I've come up with is:
Primary thread passes data to update thread
Update thread creates a whole new SharedArrayBuffer and populates it with updated data.
Update thread returns a pointer to the new buffer back to primary thread.
Primary thread caches the latest pointer in a variable, overwriting its previous value, and passes it to each worker thread with each task.
Worker threads don't retain these pointers at all after executing their operations.
The problem is, this seems to create a memory leak in the resident state stack when I run a prototype that frequently makes updates and swaps out the shared buffers. Garbage collection appears to make a couple of passes removing the discarded buffers, but then it climbs continuously until the application slows and eventually hangs or crashes.
How can I guarantee that a SharedArrayBuffer will get picked up by garbage collection when I'm done with it, or is it even possible? I've seen hints to the effect that as long as all references to it are removed from all threads it will eventually get picked up, but not a clear answer.
I'm using the threads.js library to abstract the worker thread operations. Here's a summary of my prototype:
app.ts:
import { ModuleThread, Pool, spawn, Worker } from "threads";
import { WriterModule } from "./workers/writer-worker";
import { CalculateModule } from "./workers/calculate-worker";
class App {
calculatePool = Pool<ModuleThread<CalculateModule>>
(() => spawn(new Worker('./workers/calculate-worker')), { size: 6 });
writerThread: ModuleThread<WriterModule>;
sharedBuffer: SharedArrayBuffer;
dataView: DataView;
constructor() {
this.sharedBuffer = new SharedArrayBuffer(1000000);
this.dataView = new DataView(this.sharedBuffer);
}
async start(): Promise<void> {
this.writerThread = await spawn<WriterModule>(new Worker('./workers/writer-worker'));
await this.writerThread.init(this.sharedBuffer);
await this.update();
// Arbitrary delay between updates
setInterval(() => this.update(), 5000);
while (true) {
// Arbitrary delay between tasks
await new Promise<void>(resolve => setTimeout(() => resolve(), 250));
this.calculate();
}
}
async update(): Promise<void> {
const updates: any[] = [];
// generates updates
this.sharedBuffer = await this.writerThread.update(updates);
this.dataView = new DataView(this.sharedBuffer);
}
async calculate(): Promise<void> {
const task = this.calculatePool.queue(async (calc) => calc.calculate(this.sharedBuffer));
const sum: number = await task;
// Use result
}
}
const app = new App();
app.start();
writer-worker.ts:
import { expose } from "threads";
let sharedBuffer: SharedArrayBuffer;
const writerModule = {
async init(startingBuffer: SharedArrayBuffer): Promise<void> {
sharedBuffer = startingBuffer;
},
async update(data: any[]): Promise<SharedArrayBuffer> {
// Arbitrary update time
await new Promise<void>(resolve => setTimeout(() => resolve(), 500));
const newSharedBuffer = new SharedArrayBuffer(1000000);
// Copy some values from the old buffer over, perform some mutations, etc.
sharedBuffer = newSharedBuffer;
return sharedBuffer;
},
}
export type WriterModule = typeof writerModule;
expose(writerModule);
calculate-worker.ts
import { expose } from "threads";
const calculateModule = {
async calculate(sharedBuffer: SharedArrayBuffer): Promise<number> {
const view = new DataView(sharedBuffer);
// Arbitrary calculation time
await new Promise<void>(resolve => setTimeout(() => resolve(), 100));
// Run arbitrary calculation
return sum;
}
}
export type CalculateModule = typeof calculateModule;
expose(calculateModule);

Multiple delays in Javascript/Nodejs Promise

I'm working on a proxy that caches files and I'm trying to add some logic that prevents multiple clients from downloading the same files before the proxy has a chance to cache them.
Basically, the logic I'm trying to implement is the following:
Client 1 requests a file. The proxy checks if the file is cached. If it's not, it requests it from the server, caches it, then sends it to the client.
Client 2 requests the same file after client 1 requested it, but before the proxy has a chance to cache it. So the proxy will tell client 2 to wait a few seconds because there is already a download in progress.
A better approach would probably be to give client 2 a "try again later" message, but let's just say that's currently not an option.
I'm using Nodejs with the anyproxy library. According to the documentation, delayed responses are possible by using promises.
However, I don't really see a way to achieve what I want using Promises. From what I can tell, I could do something like this:
module.exports = {
*beforeSendRequest(requestDetail) {
if(thereIsADownloadInProgressFor(requestDetail.url)) {
return new Promise((resolve, reject) => {
setTimeout(() => { // delay
resolve({ response: responseDetail.response });
}, 10000);
});
}
}
};
But that would mean simply waiting for a maximum amount of time and hoping the download finishes by then.
And I don't want that.
I would prefer to be able to do something like this (but with Promises, somehow):
module.exports = {
*beforeSendRequest(requestDetail) {
if(thereIsADownloadInProgressFor(requestDetail.url)) {
var i = 0;
for(i = 0 ; i < 10 ; i++) {
JustSleep(1000);
if(!thereIsADownloadInProgressFor(requestDetail.url))
return { response: responseDetail.response };
}
}
}
};
Is there any way I can achieve this with Promises in Nodejs?
Thanks!
You can use a Map to cache your file downloads.
The mapping in Map would be url -> Promise { file }
// Map { url => Promise { file } }
const cache = new Map()
const thereIsADownloadInProgressFor = url => cache.has(url)
const getCachedFilePromise = url => cache.get(url)
const downloadFile = async url => {/* download file code here */}
const setAndReturnCachedFilePromise = url => {
const filePromise = downloadFile(url)
cache.set(url, filePromise)
return filePromise
}
module.exports = {
beforeSendRequest(requestDetail) {
if(thereIsADownloadInProgressFor(requestDetail.url)) {
return getCachedFilePromise(requestDetail.url).then(file => ({ response: file }))
} else {
return setAndReturnCachedFilePromise(requestDetail.url).then(file => ({ response: file }))
}
}
};
You don't need to send a try-again response; simply serve the same data to both requests. All you need to do is store the requests somewhere in the caching system and trigger all of them when the fetching is done.
Here's a cache implementation that does only a single fetch for multiple requests. No delays and no try-laters:
export class Cache {
constructor() {
this.resultCache = {}; // this object is the cache storage
}
async get(key, cachedFunction) {
let cached = this.resultCache[key];
if (cached === undefined) { // No cache so fetch data
this.resultCache[key] = {
pending: [] // This is the magic, store further
// requests in this pending array.
// This way pending requests are directly
// linked to this cache data
}
try {
let result = await cachedFunction(); // Wait for result
// Once we get result we need to resolve all pending
// promises. Loop through the pending array and
// resolve them. See code below for how we store pending
// requests.. it will make sense:
this.resultCache[key].pending
.forEach(waiter => waiter.resolve(result));
// Store the result of the cache so later we don't
// have to fetch it again:
this.resultCache[key] = {
data: result
}
// Return result to original promise:
return result;
// Note: yes, this means pending promises will get triggered
// before the original promise is resolved but normally
// this does not matter. You will need to modify the
// logic if you want promises to resolve in original order
}
catch (err) { // Error when fetching result
// We still need to trigger all pending promises to tell
// them about the error. Only we reject them instead of
// resolving them:
if (this.resultCache[key]) {
this.resultCache[key].pending
.forEach((waiter: any) => waiter.reject(err));
}
throw err;
}
}
else if (cached.data === undefined && cached.pending !== undefined) {
// Here's the condition where there was a previous request for
// the same data. Instead of fetching the data again we store
// this request in the existing pending array.
let wait = new Promise((resolve, reject) => {
// This is the "waiter" object above. It is basically
// the resolve and reject functions of this promise:
cached.pending.push({
resolve: resolve,
reject: reject
});
});
return await wait; // await response from original request.
// The code above will cause this to return.
}
else {
// Return cached data as normal
return cached.data;
}
}
}
The code may look a bit complicated but it is actually quite simple. First we need a way to store the cached data. Normally I'd just use a regular object for this:
{ key : result }
Where the cached data is stored in the result. But we also need to store additional metadata such as pending requests for the same result. So we need to modify our cache storage:
{ key : {
data: result,
pending: [ array of requests ]
}
}
All this is invisible and transparent to code using this Cache class.
Usage:
const cache = new Cache();
// Illustrated with w3c fetch API but you may use anything:
cache.get( URL , () => fetch(URL) )
Note that wrapping the fetch in an anonymous function is important because we want the Cache.get() function to conditionally call the fetch, so that multiple fetches aren't issued. It also gives the Cache class flexibility to handle any kind of asynchronous operation.
Here's another example for caching a setTimeout. It's not very useful but it illustrates the flexibility of the API:
cache.get( 'example' , () => {
return new Promise((resolve, reject) => {
setTimeout(resolve, 1000);
});
});
Note that, for the sake of clarity, the Cache class above does not have any invalidation or expiry logic, but it's fairly easy to add. For example, if you want the cache to expire after some time you can just store the timestamp along with the other cache data:
{ key : {
data: result,
timestamp: timestamp,
pending: [ array of requests ]
}
}
Then in the "no-cache" logic simply detect the expiry time:
if (cached === undefined || (cached.timestamp + timeout) < now) ...
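As a small, self-contained sketch of that check (the field names mirror the storage shape above; ttlMs is an assumed parameter, not part of the original class):
interface CacheEntry {
  data?: unknown;
  timestamp?: number;
  pending?: Array<{ resolve: (value: unknown) => void; reject: (err: unknown) => void }>;
}
function needsRefetch(entry: CacheEntry | undefined, ttlMs: number): boolean {
  if (entry === undefined) return true; // nothing cached yet
  if (entry.timestamp === undefined) return false; // a fetch is already pending
  return entry.timestamp + ttlMs < Date.now(); // cached, but possibly expired
}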

How to stop async code from running Node.JS

I'm creating a program where I constantly run and stop async code, but I need a good way to stop the code.
Currently, I have tried two methods:
Method 1:
When a method is running, and another method is called to stop the first method, I start an infinite loop to stop that code from running and then remove the method from the queue (array).
I'm 100% sure that this is the worst way to accomplish it, and it is very buggy.
Code:
class test{
async Start(){
const response = await request(options);
if(stopped){
while(true){
await timeout(10)
}
}
}
}
Code 2:
var tests = [];
Start(){
const test = new test();
tests.push(test)
test.Start();
}
Stop(){
tests.forEach((t, i) => { t.stopped = true; });
tests = [];
}
Method 2:
I load the different methods into Workers, and when I need to stop the code, I just terminate the Worker.
It always takes a lot of time (about 1 second) to create the Worker, so it is not the best way either, since I need the code to run without 1-2 second pauses.
Code:
const Worker = require("tiny-worker");
const code = new Worker(path.resolve(__dirname, "./Code/Code.js"))
Stopping:
code.terminate()
Is there any other way that I can stop async code?
The program makes requests using the Node.js request-promise module, so the program is waiting on requests; it's hard to stop the code without one of the two methods.
Is there any other way that I can stop async code?
Keep in mind the basics of how Node.js works. I think there is some misunderstanding here.
It executes the current function in the current context; if it encounters an async operation, the event loop will schedule its execution somewhere in the future. There is no way to remove that scheduled execution.
More info on event loop here.
In general, to manage this kind of situation you should use flags or semaphores.
The program makes requests using the Node.js request-promise module, so the program is waiting on requests; it's hard to stop the code
If you need to hard "stop the code" you can do something like
function stop() {
process.exit()
}
But if I'm getting it right, you're launching requests every x amount of time, and at some point you need to stop sending the requests without managing the responses.
You can't de-schedule the response-management portion, but you can add some logic to it so that (when it eventually runs) it checks whether the "request loop" has been stopped.
let loop_is_stopped = false
let sending_loop = null
async function sendRequest() {
const response = await request(options) // "wait here"
// following lines are scheduled after the request promise is resolved
if (loop_is_stopped) {
return
}
// do something with the response
}
function start() {
sending_loop = setInterval(sendRequest, 1000)
}
function stop() {
loop_is_stopped = true
clearInterval(sending_loop)
}
module.exports = { start, stop }
We can use Promise.all without killing the whole app (process.exit()); here is my example (you can use another trigger for calling controller.abort()):
const controller = new AbortController();
class Workflow {
static async startTask() {
await new Promise((res) => setTimeout(() => {
res(console.log('RESOLVE'))
}, 3000))
}
}
class ScheduleTask {
static async start() {
return await Promise.all([
new Promise((_res, rej) => { if (controller.signal.aborted) return rej('YAY') }),
Workflow.startTask()
])
}
}
setTimeout(() => {
controller.abort()
console.log("ABORTED!!!");
}, 1500)
const run = async () => {
try {
await ScheduleTask.start()
console.log("DONE")
} catch (err) {
console.log("ERROR", err.name)
}
}
run()
// ABORTED!!!
// RESOLVE
"DONE" will never be shown.
res will still be completed (note the "RESOLVE" log).
Maybe it would be better to run your code as a script with its own process.pid, and when we need to interrupt this functionality we can kill that process by its pid from another place in your code with process.kill.
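A minimal sketch of that idea ("./long-task.js" is an assumed file name):
import { fork } from "node:child_process";
// Running the work in its own process gives it its own pid.
const child = fork("./long-task.js");
// Later, when the work should stop, kill that process.
child.kill("SIGTERM"); // equivalent to process.kill(child.pid, "SIGTERM")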

node.js process out of memory in http.request loop

In my node.js server I can't figure out why it runs out of memory. My node.js server makes a remote HTTP request for each HTTP request it receives, so I've tried to replicate the problem with the sample script below, which also runs out of memory.
This only happens if the iterations in the for loop are very high.
From my point of view, the problem is related to the fact that node.js is queueing the remote HTTP requests. How can I avoid this?
This is the sample script:
(function() {
var http, i, mypost, post_data;
http = require('http');
post_data = 'signature=XXX%7CPSFA%7Cxxxxx_value%7CMyclass%7CMysubclass%7CMxxxxx&schedule=schedule_name_6569&company=XXXX';
mypost = function(post_data, cb) {
var post_options, req;
post_options = {
host: 'myhost.com',
port: 8000,
path: '/set_xxxx',
method: 'POST',
headers: {
'Content-Length': post_data.length
}
};
req = http.request(post_options, function(res) {
var res_data;
res.setEncoding('utf-8');
res_data = '';
res.on('data', function(chunk) {
return res_data += chunk;
});
return res.on('end', function() {
return cb();
});
});
req.on('error', function(e) {
return console.debug('TM problem with request: ' + e.message);
});
req.write(post_data);
return req.end();
};
for (i = 1; i <= 1000000; i++) {
mypost(post_data, function() {});
}
}).call(this);
$ node -v
v0.4.9
$ node sample.js
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory
Thanks in advance,
gulden PT
Constraining the flow of requests into the server
It's possible to prevent overload of the built-in Server and its HTTP/HTTPS variants by setting the maxConnections property on the instance. Setting this property will cause node to stop accept()ing connections and force the operating system to drop requests when the listen() backlog is full and the application is already handling maxConnections requests.
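For example, a minimal sketch of capping concurrent connections on the built-in server (the limit of 100 and port 8000 are arbitrary):
import http from "node:http";
const server = http.createServer((req, res) => {
  res.end("ok\n");
});
// Stop accept()ing once 100 sockets are open; further connections queue in the
// listen() backlog and are dropped by the OS when the backlog fills.
server.maxConnections = 100;
server.listen(8000);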
Throttling outgoing requests
Sometimes, it's necessary to throttle outgoing requests, as in the example script from the question.
Using node directly or using a generic pool
As the question demonstrates, unchecked use of the node network subsystem directly can result in out of memory errors. Something like node-pool makes the active pool management attractive, but it doesn't solve the fundamental problem of unconstrained queuing. The reason for this is that node-pool doesn't provide any feedback about the state of the client pool.
UPDATE: As of v1.0.7 node-pool includes a patch inspired by this post to add a boolean return value to acquire(). The code in the following section is no longer necessary and the example with the streams pattern is working code with node-pool.
Cracking open the abstraction
As demonstrated by Andrey Sidorov, a solution can be reached by tracking the queue size explicitly and mingling the queuing code with the requesting code:
var useExplicitThrottling = function () {
var active = 0
var remaining = 10
var queueRequests = function () {
while(active < 2 && --remaining >= 0) {
active++;
pool.acquire(function (err, client) {
if (err) {
console.log("Error acquiring from pool")
if (--active < 2) queueRequests()
return
}
console.log("Handling request with client " + client)
setTimeout(function () {
pool.release(client)
if(--active < 2) {
queueRequests()
}
}, 1000)
})
}
}
queueRequests(10)
console.log("Finished!")
}
Borrowing the streams pattern
The streams pattern is a solution which is idiomatic in node. Streams have a write operation which returns false when the stream cannot buffer more data. The same pattern can be applied to a pool object with acquire() returning false when the maximum number of clients have been acquired. A drain event is emitted when the number of active clients drops below the maximum. The pool abstraction is closed again and it's possible to omit explicit references to the pool size.
var useStreams = function () {
var queueRequests = function (remaining) {
var full = false
pool.once('drain', function() {
if (remaining) queueRequests(remaining)
})
while(!full && --remaining >= 0) {
console.log("Sending request...")
full = !pool.acquire(function (err, client) {
if (err) {
console.log("Error acquiring from pool")
return
}
console.log("Handling request with client " + client)
setTimeout(pool.release, 1000, client)
})
}
}
queueRequests(10)
console.log("Finished!")
}
Fibers
An alternative solution can be obtained by providing a blocking abstraction on top of the queue. The fibers module exposes coroutines that are implemented in C++. By using fibers, it's possible to block an execution context without blocking the node event loop. While I find this approach to be quite elegant, it is often overlooked in the node community because of a curious aversion to all things synchronous-looking. Notice that, excluding the callcc utility, the actual loop logic is wonderfully concise.
/* This is the call-with-current-continuation found in Scheme and other
* Lisps. It captures the current call context and passes a callback to
* resume it as an argument to the function. Here, I've modified it to fit
* JavaScript and node.js paradigms by making it a method on Function
* objects and using function (err, result) style callbacks.
*/
Function.prototype.callcc = function(context /* args... */) {
var that = this,
caller = Fiber.current,
fiber = Fiber(function () {
that.apply(context, Array.prototype.slice.call(arguments, 1).concat(
function (err, result) {
if (err)
caller.throwInto(err)
else
caller.run(result)
}
))
})
process.nextTick(fiber.run.bind(fiber))
return Fiber.yield()
}
var useFibers = function () {
var remaining = 10
while(--remaining >= 0) {
console.log("Sending request...")
try {
client = pool.acquire.callcc(this)
console.log("Handling request with client " + client);
setTimeout(pool.release, 1000, client)
} catch (x) {
console.log("Error acquiring from pool")
}
}
console.log("Finished!")
}
Conclusion
There are a number of correct ways to approach the problem. However, for library authors or applications that require a single pool to be shared in many contexts it is best to properly encapsulate the pool. Doing so helps prevent errors and produces cleaner, more modular code. Preventing unconstrained queuing then becomes an evented dance or a coroutine pattern. I hope this answer dispels a lot of FUD and confusion around blocking-style code and asynchronous behavior and encourages you to write code which makes you happy.
Yes, you're trying to queue 1,000,000 requests before even starting them. This version keeps a limited number of requests (100) in flight:
function do_1000000_req( cb )
{
num_active = 0;
num_finished = 0;
num_sheduled = 0;
function shedule()
{
while (num_active < 100 && num_sheduled < 1000000) {
num_active++;
num_sheduled++;
mypost(post_data, function() {
num_active--;
num_finished++;
if (num_finished == 1000000)
{
cb();
return;
} else if (num_sheduled < 1000000)
shedule();
});
}
}
shedule();
}
do_1000000_req( function() {
console.log('done!');
});
The node-pool module can help you. For more details, see this post (in French): http://blog.touv.fr/2011/08/http-request-loop-in-nodejs.html
