Using the following code in Node.js:
const path = require('path');
const { fork } = require('child_process');

const thread = fork(path.join(__dirname, '/thread.js'));

thread.on('message', (results) => {
  console.log('RES', results.length);
  if (results.error) {
    res.send({ error: true, message: results.message });
    return;
  }
  res.send(results.data);
});

thread.on('error', (err) => {
  console.log(err);
});

thread.send({ data: JSON.stringify(dataToProcess) });

thread.on('exit', () => {
  if (thread) {
    thread.kill();
    return;
  }
});
When sending larger (~1.5 MB) messages from the child to the parent, nothing gets sent. Smaller messages are sent without issue. Is there some hard limit? If so, can it be increased?
Testing different payload sizes now:
1 MB - fails to send
0.5 MB - fails
0.2 MB - fails
0.15 MB - OK
0.1 MB - OK
On Windows it's OK... Linux seems to have a limit.
I wasn't able to figure out the limit itself, but I did change from fork to worker threads and it seems to send larger payloads. I don't know the full repercussions yet, but I'll continue to monitor it.
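For anyone curious, here is a minimal sketch of what the worker-thread version might look like, assuming "worker threads" here means Node's built-in worker_threads module and that thread.js is adapted to use parentPort. The names mirror my original snippet; the rest is illustrative only:

const path = require('path');
const { Worker } = require('worker_threads');

// parent side: same shape as the fork() version above
const worker = new Worker(path.join(__dirname, 'thread.js'));

worker.on('message', (results) => {
  if (results.error) {
    res.send({ error: true, message: results.message });
    return;
  }
  res.send(results.data);
});

worker.on('error', (err) => console.log(err));

// postMessage uses the structured clone algorithm, so the object can be
// sent as-is without JSON.stringify
worker.postMessage({ data: dataToProcess });

// thread.js (sketch): use parentPort instead of process.send/process.on
const { parentPort } = require('worker_threads');
parentPort.on('message', ({ data }) => {
  // ... heavy processing ...
  parentPort.postMessage({ data: processedResult }); // processedResult is illustrative
});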
Related
I have a Node.js service that consumes messages from Kafka and processes them through various transformation steps. During processing, the service uses Redis and Mongo for storage and caching. At the end, it sends the transformed message to another destination via UDP packets.
On startup it begins consuming messages from Kafka, but after a while it crashes with the unhandled error ERR_CANNOT_SEND: unable to send data.
Restarting the application resolves the issue temporarily.
I initially thought it might have to do with the forwarding through UDP sockets, but the forwarding destinations are reachable from the consumer.
I'd appreciate any help here. I'm kinda stuck.
Consumer code:
// assumes kafka-node's ConsumerGroup
const { ConsumerGroup } = require('kafka-node');

const readFromKafka = ({ host, topic, source }, transformationService) => {
  const logger = createChildLogger(`kafka-consumer-${topic}`);
  const options = {
    // connect directly to kafka broker (instantiates a KafkaClient)
    kafkaHost: host,
    groupId: `${topic}-group`,
    protocol: ['roundrobin'], // and so on the other kafka config.
  };
  logger.info(`starting kafka consumer on ${host} for ${topic}`);
  const consumer = new ConsumerGroup(options, [topic]);
  consumer.on('error', (err) => logger.error(err));
  consumer.on('message', async ({ value, offset }) => {
    logger.info(`received ${topic}`, value);
    if (value) {
      const final = await transformationService([
        JSON.parse(Buffer.from(value, 'binary').toString()),
      ]);
      logger.info('Message received', { instanceID: final[0].instanceId, trace: final[1] });
    } else {
      logger.error(`invalid message: ${topic} ${value}`);
    }
    return;
  });
  consumer.on('rebalanced', () => {
    logger.info('consumer is rebalancing');
  });
  return consumer;
};
Consumer Service startup and error handling code:
// init is the async function used to initialise the cache and other config and components.
const init = async () => {
  // initialize cache, configs.
};

// startConsumer is the async function that connects to Kafka, and adds a callback for the
// onMessage listener which processes the message through the transformation service.
const startConsumer = async ({ ...config }) => {
  // calls to fetch info like topic, transformationService etc.
  // readFromKafka function defn pasted above
  readFromKafka({ topicConfig }, transformationService);
};

init()
  .then(startConsumer)
  .catch((err) => {
    logger.error(err);
  });
Forwarding code through UDP sockets:
The following code throws the unhandled error intermittently; it works for the first few thousand messages and then suddenly crashes.
const dgram = require('dgram');

const udpSender = (msg, destinations) => {
  return Object.values(destinations)
    .map(({ id, host, port }) => {
      return new Promise((resolve) => {
        dgram.createSocket('udp4').send(msg, 0, msg.length, port, host, (err) => {
          resolve({
            id,
            timestamp: Date.now(),
            logs: err || 'Sent successfully',
          });
        });
      });
    });
};
Based on our comment exchange, I believe the issue is just that you're running out of resources.
Throughout the lifetime of your app, every time you send a message you open up a brand new socket. However, you're not doing any cleanup after sending that message, and so that socket stays open indefinitely. Your open sockets then continue to pile up, consuming resources, until you eventually run out of... something. Perhaps memory, perhaps ports, perhaps something else, but ultimately your app crashes.
Luckily, the solution isn't too convoluted: just reuse existing sockets. In fact, you can just reuse one socket for the entirety of the application if you wanted, as internally socket.send handles queueing for you, so no need to do any smart hand-offs. However, if you wanted a little more concurrency, here's a quick implementation of a round-robin queue where we've created a pool of 10 sockets in advance which we just grab from whenever we want to send a message:
const MAX_CONCURRENT_SOCKETS = 10;
var rrIndex = 0;

const rrSocketPool = (() => {
  var arr = [];
  for (let i = 0; i < MAX_CONCURRENT_SOCKETS; i++) {
    let sock = dgram.createSocket('udp4');
    arr.push(sock);
  }
  return arr;
})();

const udpSender = (msg, destinations) => {
  return Object.values(destinations)
    .map(({ id, host, port }) => {
      return new Promise((resolve) => {
        // grab the next socket from the pool, round-robin
        var sock = rrSocketPool[rrIndex];
        rrIndex = (rrIndex + 1) % MAX_CONCURRENT_SOCKETS;
        sock.send(msg, 0, msg.length, port, host, (err) => {
          resolve({
            id,
            timestamp: Date.now(),
            logs: err || 'Sent successfully',
          });
        });
      });
    });
};
Be aware that this implementation is still naïve for a few reasons, mostly because there's still no error handling on the sockets themselves, only on their .send method. You should look at the docs for more info about catching events such as error events, especially if this is a production server that's supposed to run indefinitely. Basically, the error handling you've put inside your .send callback will only fire if an error occurs in a call to .send. If, between sends, while your sockets are idle, some system-level error outside of your control breaks a socket, that socket may emit an error event, which will go unhandled (like what's happening in your current implementation, with the intermittent errors you see prior to the fatal one). At that point the socket may be permanently unusable, meaning it should be replaced/reinstated or otherwise dealt with (or alternatively, just force the app to restart and call it a day, like I do :-) ).
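To illustrate that last point, here's a hedged sketch (not a drop-in fix, just one way to do it) of attaching an error handler to each pooled socket and reinstating a broken socket in place:

const dgram = require('dgram');

const MAX_CONCURRENT_SOCKETS = 10;
const rrSocketPool = [];

// create a socket whose 'error' handler replaces it in the pool,
// so an idle-socket failure no longer goes unhandled
function createPooledSocket(slot) {
  const sock = dgram.createSocket('udp4');
  sock.on('error', (err) => {
    console.error(`socket ${slot} errored, replacing it`, err);
    sock.close();
    rrSocketPool[slot] = createPooledSocket(slot);
  });
  return sock;
}

for (let i = 0; i < MAX_CONCURRENT_SOCKETS; i++) {
  rrSocketPool.push(createPooledSocket(i));
}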
twilioClient.chat.services(service_SID)
  .channels
  .each(channels => console.log(channels.sid));
From the above code, how can I check whether the request succeeded or not?
What I tried is:
twilioClient.chat.services(service_SID)
  .channels
  .each(channels => console.log(channels.sid))
  .then(function (err, docs) {
    if (err) {
      //console.log('error ' + err);
      return res.status(500).send('Problem in retrieving channels');
    }
    res.status(200).json({
      message: 'Channels retrieved successfully',
      docs: docs
    });
  })
I need something like this so I can know the response. Do I need to use a Promise? I don't know about Promises yet. Can someone please provide an example or a tutorial?
Twilio developer evangelist here.
When using the each function to iterate over the remote resource, it doesn't return a Promise, so you can't chain .then onto it; each just works through the resources as they are returned. However, you can provide a function to each that will be called once the request is complete or if there is an error. You pass that function as the done option in the second argument. Here is how you would do that:
twilioClient.chat.services(service_SID)
  .channels
  .each((channel) => console.log(channel.sid), {
    done: (error) => {
      if (error) {
        console.error("There was an error loading the channels.", error);
      } else {
        console.log("All the channels were successfully loaded.");
      }
    }
  });
If you are looking to load the channels in one go, then each might not be the right function for you. You can also use list which returns the list of channels rather than a channel at a time. For example:
twilioClient.chat.services(service_SID)
  .channels
  .list({ limit: 50 }, (error, channels) => {
    if (error) {
      console.error("There was an error loading the channels.", error);
    } else {
      console.log("Here are your channels: ", channels);
    }
  });
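If your end goal is to send an HTTP response, like in the code you tried, a rough sketch of wiring list into your Express handler could look like this (assuming req/res come from your route; the docs field is just one way to shape the payload):

twilioClient.chat.services(service_SID)
  .channels
  .list({ limit: 50 }, (error, channels) => {
    if (error) {
      return res.status(500).send('Problem in retrieving channels');
    }
    res.status(200).json({
      message: 'Channels retrieved successfully',
      docs: channels.map((channel) => channel.sid) // e.g. just the SIDs
    });
  });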
Let me know if that helps at all.
I am working on a crawler. I have a list of URLs that need to be requested. There will be several hundred requests in flight at the same time if I don't throttle them. I am afraid that would saturate my bandwidth or generate too much traffic to the target website. What should I do?
Here is what I am doing:
const request = require('request');

urlList.forEach((url, index) => {
  console.log('Fetching ' + url);
  request(url, function(error, response, body) {
    // do sth for body
  });
});
I want each request to be made only after the previous one has completed.
You can use something like the Bluebird Promise library, e.g. this snippet:
const Promise = require("bluebird");
const axios = require("axios");

// Axios wrapper for error handling
const axios_wrapper = (options) => {
  return axios(options)
    .then((r) => {
      return Promise.resolve({
        data: r.data,
        error: null,
      });
    })
    .catch((e) => {
      return Promise.resolve({
        data: null,
        error: e.response ? e.response.data : e,
      });
    });
};

Promise.map(
  urls,
  (k) => {
    return axios_wrapper({
      method: "GET",
      url: k,
    });
  },
  { concurrency: 1 } // Here 1 represents how many requests you want to run in parallel
)
  .then((r) => {
    console.log(r);
    // Here r will be an array of objects like {data: ..., error: null}, where if the
    // request was successful it will have data present, otherwise error will be non-null
  })
  .catch((e) => {
    console.error(e);
  });
The things you need to watch out for are:
Whether the target site has rate limiting and will block your access if you request too much too fast.
How many simultaneous requests the target site can handle without degrading its performance.
How much bandwidth your server has on its end of things.
How many simultaneous requests your own server can have in flight and process without causing excess memory usage or a pegged CPU.
In general, the scheme for managing all this is to create a way to tune how many requests you launch. There are many different ways to control this by number of simultaneous requests, number of requests per second, amount of data used, etc...
The simplest way to start would be to just control how many simultaneous requests you make. That can be done like this:
function runRequests(arrayOfData, maxInFlight, fn) {
  return new Promise((resolve, reject) => {
    let index = 0;
    let inFlight = 0;

    function next() {
      while (inFlight < maxInFlight && index < arrayOfData.length) {
        ++inFlight;
        fn(arrayOfData[index++]).then(result => {
          --inFlight;
          next();
        }).catch(err => {
          --inFlight;
          console.log(err);
          // purposely eat the error and let the rest of the processing continue
          // if you want to stop further processing, you can call reject() here
          next();
        });
      }
      if (inFlight === 0) {
        // all done
        resolve();
      }
    }
    next();
  });
}
And, then you would use that like this:
const rp = require('request-promise');

// run the whole urlList, no more than 10 at a time
runRequests(urlList, 10, function(url) {
  return rp(url).then(function(data) {
    // process fetched data here for one url
  }).catch(function(err) {
    console.log(url, err);
  });
}).then(function() {
  // all requests done here
});
This can be made as sophisticated as you want by adding a time element to it (no more than N requests per second) or even a bandwidth element to it.
"I want each request to be made only after the previous one has completed."
That's a very slow way to do things. If you really want that, then you can just pass a 1 for the maxInFlight parameter to the above function, but typically, things would work a lot faster and not cause problems by allowing somewhere between 5 and 50 simultaneous requests. Only testing would tell you where the sweet spot is for your particular target sites and your particular server infrastructure and amount of processing you need to do on the results.
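For completeness, a strictly sequential run with the runRequests helper above would just be:

// one request at a time; each starts only after the previous one completes
runRequests(urlList, 1, function(url) {
  return rp(url).then(function(data) {
    // process fetched data here for one url
  });
}).then(function() {
  // all requests done here
});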
You can use the setTimeout function to spread the requests out within the loop. For that you need to know the maximum time it takes to process a request.
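A minimal sketch of that setTimeout idea (note the assumption baked in: REQUEST_GAP_MS must be longer than the slowest request, which is exactly the weakness of this approach compared to the pooled version above):

const REQUEST_GAP_MS = 2000; // illustrative value

urlList.forEach((url, index) => {
  setTimeout(() => {
    console.log('Fetching ' + url);
    request(url, function(error, response, body) {
      // do sth for body
    });
  }, index * REQUEST_GAP_MS);
});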
I have millions of rows in my Cassandra DB that I want to stream to the client in a zip file (I don't want a potentially huge zip file in memory). I am using the stream() function from the Cassandra Node.js driver, piping to a Transform stream which extracts the one field from each row that I care about and appends a newline, and piping that to archiver, which pipes to the Express Response object. This seems to work fine, but I can't figure out how to properly handle errors during streaming.

I have to set the appropriate headers/status before streaming for the client, but if there is an error during the streaming, on the dbStream for example, I want to clean up all of the pipes and reset the response status to something like 404. But if I try to reset the status after the headers are set and the streaming starts, I get "Can't set headers after they are sent". I've looked all over and can't find how to properly handle errors in Node when piping/streaming to the Response object. How can the client tell whether valid data was actually streamed if I can't send a proper response code on error? Can anyone help?
const { Transform } = require('stream');
const archiver = require('archiver');

function streamNamesToWriteStream(query, res, options) {
  return new Promise((resolve, reject) => {
    let success = true;
    const dbStream = db.client.stream(query);
    const rowTransformer = new Transform({
      objectMode: true,
      transform(row, encoding, callback) {
        try {
          const vote = row.name + '\n';
          callback(null, vote);
        } catch (err) {
          callback(null, err.message + '\n');
        }
      }
    });

    // Handle res events
    res.on('error', (err) => {
      logger.error(`res ${res} error`);
      return reject(err);
    });

    dbStream.on('error', function(err) {
      res.status(404).send(); // Can't set headers after they are sent.
      logger.debug(`dbStream error: ${err}`);
      success = false;
      //res.end();
      //return reject(err);
    });

    res.writeHead(200, {
      'Content-Type': 'application/zip',
      'Content-disposition': 'attachment; filename=myFile.zip'
    });

    const archive = archiver.create('zip');
    archive.on('error', function(err) { throw err; });
    archive.on('end', function(err) {
      logger.debug(`Archive done`);
      //res.status(404).end()
    });

    archive.pipe(res, {
      //end:false
    });

    archive.append(dbStream.pipe(rowTransformer), { name: 'file1.txt' });
    archive.finalize();
  });
}
Obviously it's too late to change the headers, so there's going to have to be application logic to detect a problem. Here are some ideas I have:
Write an unambiguous sentinel of some kind at the end of the stream when an error occurs. The consumer of the zip file will then need to look for that value to check for a problem.
Perhaps more simply, have the consumer execute a verification on the integrity of the zip archive. Presumably if the stream fails the zip will be corrupted.
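A hedged sketch of that second idea, replacing the dbStream error handler in the code above: abort the archive and cut the connection, so the client ends up with a truncated zip that fails an integrity check. (Appending a sentinel ERROR.txt entry instead would also work, but only if archive.finalize() is deferred until the data stream finishes, since archiver won't accept new entries after finalize.)

dbStream.on('error', function(err) {
  logger.debug(`dbStream error: ${err}`);
  success = false;
  archive.abort(); // stop archiver from finalizing normally
  res.destroy();   // terminate the response; the partial zip won't validate
});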
I've asked some general questions around this topic before (node and blocking). This time the question is a little more specific.
Let's say I've got a Node/Express app which has a handler that accepts HTTP requests (doesn't matter what; say they're simple GETs).
And it has a separate handler which reads messages off of a RabbitMQ queue, as they arrive, and then does a read from Mongo (Mongo is on a different machine), followed by a write.
If Mongo was "very" busy, would/could that cause the HTTP handler to appear unavailable?
I'm using the Mongo native driver. I would think that while the Mongo driver waits for a response from the server, Node would happily keep accepting and handling HTTP requests, but I don't know for sure.
In a related scenario, swap-out a busy Mongo for a handler that reads a Rabbit message and PUTs a record into a "very" busy ElasticSearch. Will that cause issues with the HTTP handler?
I'd go straight to testing it, but that's a little tricky, and it gets expensive to test every time I'm unsure of the theory. So I thought I'd ask.
Here's a (simplified) example of the code:
// HTTP handler...
app.post('/eventcapture/event', (req: express.Request, res: express.Response) => {
  var evt: eventDS.IEvent = ('TypeID' in req.body) ? req.body : JSON.parse(req.body);

  // create an id
  evt._id = uuid.v4();

  bus.Publish(evt)
    .then((success) => {
      res.jsonp(200, { success: true });
    })
    .catch((failReason: Error) => {
      console.error('[ERROR] - Failure writing event: %s,%s', failReason.name, failReason.message);
      logError(failReason, evt);
      res.jsonp(500, { success: false, reason: failReason });
    });
});
// We generically define additional handlers in an array, and then kick them off with a loop.
// Here we have one handler which reads an event, goes to mongo to get additional data which
// it adds into the event before publishing it back out. And a second handler which will catch
// these "augmented" events and push them into Mongo
var processes = [
  {
    enabled: true,
    name: 'augmenter',
    inType: 'EventCapture:RawEvent',
    handler: (event: eventDS.IEvent) => {
      console.log('[LOG] - augment event: %s', event._id);
      Profile.FindOne({ _id: event.User.ProfileID })
        .then((profile) => {
          if (profile) {
            console.log('[LOG] - found Profile: %s', profile._id);
            event.User.Email = profile.PersonalDetail.Email;
            // other values also...
            // change the TypeID for publishing
            event.TypeID = 'EventCapture:AugmentedEvent';
            return event;
          }
          else throw new Error(util.format('unable to find profile: %s', event.User.ProfileID));
        })
        .then((augmentedEvent) => bus.Publish(augmentedEvent)) // publish the event back out
        .catch((failReason: Error) => {
          console.error('[ERROR] - failure publishing augmented event: %s, %s, %s', event._id, failReason.name, failReason.message);
          logError(failReason, event);
        });
    }
  },
  {
    enabled: true,
    name: 'mongo',
    inType: 'EventCapture:AugmentedEvent',
    handler: (event: eventDS.IEvent) => {
      console.log('[LOG] - push to mongo: %s', event.User.ProfileID);
      Event.Save(event, { safe: true })
        .then((success) => console.log('[LOG] - pushed to mongo: %s', event._id))
        .catch((failReason: Error) => {
          console.error('[ERROR] - failure pushing to mongo: %s, %s', event._id, failReason);
          logError(failReason, event);
        });
    }
  }
];

processes.forEach((process, idx, allProcesses) => {
  if (process.enabled) {
    bus.Subscribe(process.name, process.inType, process.handler);
  }
});
No. This is the awesomeness of async programming. Node can do other things while it waits for mongodb to get back to it. You can assume that popular node modules like mongodb write things in an async fashion.
Here's a video that goes into a lot of detail about the event loop: http://vimeo.com/96425312?utm_source=nodeweekly&utm_medium=email
At the end of the day, things like the Mongo driver are written using Node's low-level I/O and network libraries. These libraries enforce async flow. The author of a package would have to go out of her way to make it sync.
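To see this for yourself without involving Mongo at all, here's a small hedged experiment (made-up routes, not from the code above): while one handler awaits a slow asynchronous operation, another keeps responding instantly, because the waiting happens off the event loop.

const express = require('express');
const app = express();

// stands in for a slow Mongo/Elasticsearch call
const slowAsyncWork = () => new Promise((resolve) => setTimeout(resolve, 5000));

app.get('/slow', async (req, res) => {
  await slowAsyncWork(); // this handler is suspended, but the event loop is free
  res.send('slow done');
});

app.get('/fast', (req, res) => {
  res.send('fast done'); // responds immediately, even while /slow is pending
});

app.listen(3000);
// curl /slow in one terminal, then hit /fast repeatedly: /fast keeps answering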