nodejs - writing strings to socket takes much time - node.js

I heard that "writing strings to a socket takes more time in Node.js because the core modules do not copy the data directly to the socket; they make an intermediate copy in memory first". I heard Ryan Dahl himself say this in an interview; I will post the link once I find it.
Please correct me if my understanding of any of this is wrong, thanks.
My question is: can we avoid this intermediate copy by modifying code in Node's core modules? I have experienced a 5-6 second lag in my server when it copies large strings to 150+ sockets.
I am trying to minimize the amount of data to broadcast, but on the other hand, can we optimize this copying of strings to the socket?
As per the comments, adding more details.
An example of what I am doing:
I am broadcasting a leaderboard of n (>100) users [all in one room]. It is in JSON format: "leaderboard" is an array of players, and each player object contains name, email, profile_pic_url, score, and rank.
User information is fetched from Redis, and then the rank is calculated. Then this leaderboard is broadcast to the room.
The above operation happens every 2 seconds, so after the first successful broadcast I can see the lag.
Adding the code. I am using:
socket.io for accepting the connections
redis store
room feature of socket.io
code -
io.sockets.in(RoomID).emit(StateName, LeaderboardObject);

can we skip this intermediate copying issue by modifying any code in core modules of the node ?
No.
You're using Socket.IO where the bulk of the work happens in JavaScript, not in a compiled extension. Even if you did find a way to get around the buffer copying, you wouldn't be able to use it in this case.
I suggest posting a separate question, asking about the actual speed problem you are having and ways to optimize your code.
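Since the per-socket copy itself can't be avoided, shrinking the payload is usually the more fruitful direction. Below is a minimal sketch of a delta broadcast for the leaderboard shape the question describes (the function name and the sample data are illustrative, not from the original code): send only the players whose score or rank changed since the previous tick, instead of the full list every 2 seconds.

```javascript
// Compute the subset of players whose score or rank changed since
// the previous broadcast; only this delta needs to go over the wire.
function diffLeaderboard(prev, curr) {
  const prevByEmail = new Map(prev.map(p => [p.email, p]));
  return curr.filter(p => {
    const old = prevByEmail.get(p.email);
    return !old || old.score !== p.score || old.rank !== p.rank;
  });
}

const prev = [
  { email: 'a@x.com', name: 'A', score: 10, rank: 1 },
  { email: 'b@x.com', name: 'B', score: 8, rank: 2 },
];
const curr = [
  { email: 'a@x.com', name: 'A', score: 10, rank: 1 },
  { email: 'b@x.com', name: 'B', score: 9, rank: 2 },
];

const delta = diffLeaderboard(prev, curr);
// io.sockets.in(RoomID).emit(StateName, delta);  // emit only the changes
```

Clients then merge the delta into their local copy; a full snapshot can still be sent when a socket first joins the room.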

Related

In socket.io, how can I send a message from an array of messages to the respective room in an array of rooms, without any loop?

socket.io: I am using Node.js in my PHP application. I have an array of connected rooms, say rooms = {room1, room2, room3}, and an array of respective messages, say messages = {message1, message2, message3}.
Until now I have been emitting my messages as below:
io.sockets.in('room1').emit('message', 'message1');
This works fine, but I am worried that once the number of rooms increases, looping over all of them will cause a performance hit as well as big delays.
Is there any way I can directly send an array of messages to an array of rooms, like the following?
io.sockets.in(rooms).emit('message', messages);
which should eventually send message1 to room1, and so on.
Thank you!
I don't know of a built-in method to do this. However, I also don't think it's necessary for you yet. Here's why I think you should just stick to using a loop for now.
How do you think your suggested calls would work? At some point, somebody's code is going to have to create a loop to touch every room and every message. A 3rd party library might have optimized their loop (or it might be really buggy and slow, just depends), but it's still going to have to loop.
In a lot of cases it is better to get your features working first and then go back to evaluate and fix slow points in your code. Read here for a lot more guidance on this topic. Note that I'm not advising you to blindly code without thinking ahead, just to not sweat the small stuff. I especially appreciate the way the second answer in the link suggests appropriate considerations for each stage of development.
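The loop in question is a one-liner. Here is a sketch with `io` stubbed out so the room/message pairing can be shown standalone (in the real application `io` would be the socket.io server object):

```javascript
// `io` is the socket.io server in the question; stubbed here so the
// pairing logic can run without a live server.
const sent = [];
const io = {
  sockets: {
    in(room) {
      return { emit: (event, msg) => sent.push([room, event, msg]) };
    }
  }
};

const rooms = ['room1', 'room2', 'room3'];
const messages = ['message1', 'message2', 'message3'];

// The plain loop: pair each room with its message by index.
rooms.forEach((room, i) => io.sockets.in(room).emit('message', messages[i]));
```

Any library that accepted `io.sockets.in(rooms).emit('message', messages)` would have to run essentially this same loop internally, so there is nothing to gain until profiling says otherwise.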

Streaming output from program to an arbitrary number of programs under Linux?

How should I stream the output from one program to an undefined number of programs in such a fashion that the data isn't buffered anywhere and that the application where the stream originates from doesn't block even if there's nothing reading the stream, but the programs reading the stream do block if there's no output from the first-mentioned program?
I've been trying to Google around for a while now, but all I find is methods where the program does block if nothing is reading the stream.
How should I stream the output from one program to an undefined number of programs in such a fashion that the data isn't buffered anywhere and that the application where the stream originates from doesn't block even if there's nothing reading the stream
Your requirements as stated cannot possibly be satisfied without some form of buffer.
The most straightforward option is to write the output to a file and let consumers read that file.
Another option is a ring buffer in the form of a memory-mapped file. As the capacity of a ring buffer is normally fixed, there needs to be a policy for dealing with slow consumers. The options are: block the producer; terminate the slow consumer; or let the slow consumer somehow recover when it has missed data.
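As a sketch of the ring-buffer idea with a "slow consumers miss data" policy (a real implementation would live in a memory-mapped file shared between processes; a plain in-process array is used here purely to illustrate the policy):

```javascript
// Fixed-capacity ring buffer with a "drop oldest" policy: the producer
// never blocks, and a consumer that falls behind simply misses data.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }
  push(item) {
    // At capacity: overwrite the oldest entry instead of blocking.
    if (this.items.length === this.capacity) this.items.shift();
    this.items.push(item);
  }
  drain() {
    const out = this.items;
    this.items = [];
    return out;
  }
}

const rb = new RingBuffer(3);
[1, 2, 3, 4, 5].forEach(n => rb.push(n));
// Entries 1 and 2 were overwritten; a consumer now sees only 3, 4, 5.
```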
Many years ago I wrote something like what you describe for an audio stream processing app (http://hewgill.com/nwr/). It's on github as splitter.cpp and has a small man page.
The splitter program currently does not support dynamically changing the set of output programs. The output programs are fixed when the command is started.
Without knowing exactly what sort of data you are talking about (how large it is, what format it is in, etc.), it is hard to give a concrete answer. Say, for example, you want a "ticker-tape" application that sends out information on share purchases at the stock exchange: you could quite easily have a server that accepts a socket from each application, starts a thread, and sends the relevant data as it arrives from the stock market feed. I'm not aware of any "multiplexer" that exists today (but Greg's splitter may be a starting point). If you use (for example) XML to package the data, a client that joins mid-stream might receive the second half of a packet, detect that it is incomplete, and simply throw it away.
If, on the other hand, you are sending out high detail live update weather maps for the whole country, the data is probably large enough that you don't want to wait for a full new one to arrive, so you need some sort of lock'n'load protocol that sets the current updated map, and then sends that one out until (say) 1 minute later you have a new one. Again, it's not that complex to write some code to do this, but it's quite a different set of code to the "ticker tape" solution above, because the packet of data is larger, and getting "half a packet" is quite wasteful and completely useless.
If you are streaming live video from the 2016 Olympics in Brazil, then you probably want yet another solution, as timing is everything with video: you need the client to buffer, pick up key-frames, throw away "stale" frames, etc., and the server will have to be different too.

Nodejs - How to maintain a global datastructure

So I have a backend implementation in Node.js whose main state is a global array of JSON objects. The objects are populated by user requests (POSTs), so the size of the global array grows in proportion to the number of users. The JSON objects inside the array are not identical. This is a really bad architecture to begin with, but I just went with what I knew and decided to learn on the fly.
I'm running this on an AWS micro instance with 6GB of RAM.
How to purge this global array before it explodes?
Options that I have thought of:
At a periodic interval, write the global array to a file and purge it. The disadvantage here is that if any clients are in the middle of a transaction, that transaction state is lost.
Restart the server every day and write the global array to a file at that time. Same disadvantage as above.
Follow 1 or 2, and for every incoming request, if the global array is empty, look for the corresponding JSON object in the file. This seems absolutely absurd.
Somehow I can't think of any other solution without completely rewriting the Node.js application. Can you think of any? I will greatly appreciate any discussion on this.
I see that you are using memory as storage. If that is the case and your code is synchronous (you don't seem to use a database, so it might be), then solution 1 is actually correct. JavaScript is single-threaded, which means that while one piece of code is running, no other can run. There is no concurrency in JavaScript; it is only an illusion, because Node.js is so fast.
So your cleanup code won't fire until the transaction is over. This of course assumes that your code is synchronous (and from what I see, it might be).
But there are still plenty of reasons not to do this. The most important is that you are reinventing the wheel! Let a database do the hard work for you. Using a proper database will save you a lot of trouble in the future. There are many possibilities: MySQL, PostgreSQL, MongoDB (my favourite), CouchDB and many others. At this point it shouldn't matter which one; just pick one.
I would suggest that you start saving your JSON to a non-relational DB like http://www.couchbase.com/.
Couchbase is extremely easy to setup and use even in a cluster. It uses a simple key-value design so saving data is as simple as:
couchbaseClient.set("someKey", "yourJSON")
then to retrieve your data:
data = couchbaseClient.get("someKey")
The system is also extremely fast and is used by OMGPOP for Draw Something. http://blog.couchbase.com/preparing-massive-growth-revisited

Is there a Node.js data store that doesn't need to be installed for a small project that works on both *NIX and Windows?

I have a small project that I was using node-dirty for, but it's not really for production use and I've had way too many surprises with it, so I would like to switch. I was looking at using SQLite, but compiling a client for it seems troublesome. Is there something like node-dirty (i.e. a pure Node.js implementation of a data store), but better suited to a small project that doesn't have more than a few hundred records? I've faced the following problems with node-dirty that I would expect an alternative data store not to have:
Saving a Date object makes it come out as a string when reloading the data (but during execution it remains a Date object). I'm fine with having to serialize the Date object myself, as long as I get out the same thing it lets me put into it.
Iterating over the data and deleting an entry inside the same forEach loop makes the iteration stop early.
My client reports deleted data re-appearing, and I've seen this intermittently too; I have no idea why.
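The iterate-and-delete issue above is easy to reproduce with a plain array, since `Array.prototype.forEach` does not expect the collection to shrink mid-iteration (node-dirty's forEach presumably behaves similarly); the usual workaround is to decide what to delete on a snapshot first, then delete afterwards:

```javascript
// Deleting during forEach skips elements: when 'b' is removed, every
// later element shifts left and 'c' is never visited.
const items = ['a', 'b', 'c', 'd'];
const visited = [];
items.forEach((item, i) => {
  visited.push(item);
  if (item === 'b') items.splice(i, 1);
});
// visited is ['a', 'b', 'd'] — 'c' was skipped

// Workaround: collect deletions on a snapshot, then apply them.
const items2 = ['a', 'b', 'c', 'd'];
const toDelete = items2.filter(x => x === 'b');
toDelete.forEach(x => items2.splice(items2.indexOf(x), 1));
// items2 is ['a', 'c', 'd'] — nothing skipped
```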
How much data do you have? For some projects it's reasonable to just have things in memory and persist them by dumping a JSON file with all the data.
Just use npm to install a NoSQL module like redis or mongodb.

Returning LOTS of items from a MongoDB via Node.js

I'm returning A LOT (500k+) of documents from a MongoDB collection in Node.js. It's not for display on a website, but rather for some number crunching. If I grab ALL of those documents, the system freezes. Is there a better way to grab them all?
I'm thinking pagination might work?
Edit: This is already outside the main node.js server event loop, so "the system freezes" does not mean "incoming requests are not being processed"
After learning more about your situation, I have some ideas:
Do as much as you can in a Map/Reduce function in Mongo - perhaps if you throw less data at Node that might be the solution.
Perhaps this much data is eating all the memory on your system. Your "freeze" could be V8 stopping everything to do a garbage collection (see this SO question). You could use the V8 flag --trace-gc to log GCs and test this hypothesis (thanks to another SO answer about V8 and garbage collection).
Pagination, as you suggested, may help. Perhaps even split your data further into worker queues (create one worker task with references to records 1-10, another with references to records 11-20, etc.), depending on your calculation.
Perhaps pre-process your data, i.e. somehow return much smaller data for each record, or avoid using an ORM for this particular calculation if you're using one now. Making sure each record has only the data you need in it means less data to transfer and less memory your app needs.
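The "split into worker tasks" idea above reduces, at its core, to a plain chunking step over record references (the chunk size and ids here are illustrative):

```javascript
// Split an array of record ids into fixed-size batches, so each worker
// task handles one slice instead of all 500k documents at once.
function chunk(ids, size) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

const ids = Array.from({ length: 25 }, (_, i) => i + 1);
const batches = chunk(ids, 10);
// 25 ids with size 10 yields 3 batches of 10, 10, and 5 ids
```

Each batch can then be handed to a worker process or queue job that fetches only its own slice from Mongo.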
I would put your big fetch+process task on a worker queue, background process, or forking mechanism (there are a lot of different options here).
That way you do your calculations outside of your main event loop and keep that free to process other requests. While you should be doing your Mongo lookup in a callback, the calculations themselves may take up time, thus "freezing" node - you're not giving it a break to process other requests.
Since you don't need them all at the same time (that's what I deduced from your asking about pagination), perhaps it's better to split those 500k documents into smaller chunks to be processed on the next tick?
You could also use something like Kue to queue the chunks and process them later (thus not doing everything at the same time).
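A minimal sketch of the next-tick idea, using `setImmediate` so the event loop gets a turn between chunks (the chunk size and the summing work function are illustrative stand-ins for the real number crunching):

```javascript
// Process `items` in fixed-size chunks, yielding to the event loop
// between chunks so other requests can be handled in the gaps.
function processInChunks(items, chunkSize, workFn, done) {
  let i = 0;
  function next() {
    const end = Math.min(i + chunkSize, items.length);
    for (; i < end; i++) workFn(items[i]);
    if (i < items.length) setImmediate(next); // yield, then continue
    else done();
  }
  next();
}

let sum = 0;
processInChunks([...Array(100).keys()], 25, n => { sum += n; }, () => {
  // All 100 items processed across 4 chunks and 3 event-loop turns.
});
```

The same shape works for the Mongo case: each chunk holds a page of documents, and incoming HTTP requests are served in the gaps between chunks instead of being starved by one long synchronous loop.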
