Serving Images over Websockets with NodeJS & SocketIO

I am trying to develop a very simple image server with NodeJS & SocketIO. A project I am working on requires me to load several hundred images on page-load (customer requirement). Currently, an HTTP request is made for each image via the HTML "img" tag. With the reduced latency and overall efficiency of websockets compared to HTTP or Ajax, I was hoping to improve performance by sending the images over websockets instead.
Unfortunately, reading images from the server's file-system with NodeJS and sending them over websockets with SocketIO has been significantly slower than the traditional HTTP requests served over Apache. Below is my server code:
var express = require('express'),
    app = express(),
    http = require('http'),
    fs = require('fs'),
    mime = require('mime'),
    server = http.createServer(app),
    io = require('socket.io').listen(server);

server.listen(151);

io.sockets.on('connection', function (socket) {
    socket.emit('connected');
    socket.on('getImageData', function (file, callback) {
        var path = 'c:/restricted_dir/' + file;
        fs.readFile(path, function (err, data) {
            if (!err) {
                // build a data URI so the client can drop it straight
                // into an img tag's src attribute
                var prefix = 'data:' + mime.lookup(path) + ';base64,';
                var base64Image = prefix + data.toString('base64');
                // reply through the acknowledgement callback
                callback(base64Image);
            }
        });
    });
});
I have also tried buffering with "createReadStream", but I saw no significant speed improvements with this. I should also note that it is desirable to receive the image data as a Base64-encoded dataURI so I can simply throw that into the "src" attribute of the "img" tag. I understand Base64 means roughly a 30% increase in the data's size, but even when using binary image data, it still takes about 10 times longer than HTTP.
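For reference, the client side consumes this roughly as follows (a sketch assuming the acknowledgement callback used in the server code above; the file name and element ID are just illustrative):
var socket = io.connect('http://localhost:151');
socket.on('connected', function () {
    // ask the server for one image and drop the data URI into an img tag
    socket.emit('getImageData', 'photo.jpg', function (dataUri) {
        document.getElementById('img1').src = dataUri;
    });
});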
EDIT:
I suppose the real question here is, "are websockets really the best way to serve static files?" After further thought and additional reading, I strongly suspect the issue here is related to parallel processing. Since NodeJS operates on a single thread, maybe it is not the best solution for serving all these static image files? Does anyone have any thoughts on this?

Browsers usually open multiple connections to the same server to perform requests in parallel, and can also perform multiple requests per single connection, whereas you only have one websocket connection.
Also, the combo fs.readFile()/Base64-encode/socket.emit() introduces significant overhead, whereas a regular httpd can use system calls like sendfile() and doesn't even have to touch the file contents before they are sent to the client.
The single-threaded nature of Node isn't an issue here, because Node can do I/O (which is what you're doing, minus the Base64-encoding) really well.
So I would say that websockets aren't very suitable for static file serving :)
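If you still want to serve the images from the same Node app over plain HTTP, a minimal sketch using Express's static middleware would be the following (the mount point and cache age are just illustrative):
var express = require('express');
var app = express();

// express.static streams files from disk and sets Content-Type and
// caching headers for you, and the browser fetches them in parallel
app.use('/images', express.static('c:/restricted_dir', { maxAge: '1d' }));

app.listen(151);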

Related

Node app that fetches, processes, and formats data for consumption by a frontend app on another server

I currently have a frontend-only app that fetches 5-6 different JSON feeds, grabs some necessary data from each of them, and then renders a page based on said data. I'd like to move the data fetching / processing part of the app to a server-side node application which outputs one simple JSON file which the frontend app can fetch and easily render.
There are two noteworthy complications for this project:
1) The new backend app will have to live on a different server than its frontend counterpart
2) Some of the feeds change fairly often, so I'll need the backend processing to constantly check for changes (every 5-10 seconds). Currently, with the frontend-only app, the browser fetches the latest versions of the feeds on load. I'd like to replicate this behavior as closely as possible.
My thought process for solving this took me in two directions:
The first is to set up an express application that uses setTimeout to constantly check for new data to process. This data is then sent as the response to a simple GET request:
const express = require('express');
let app = express();
const port = 3000; // the original left `port` undefined

let processedData = {};
const getData = () => {...} // returns a promise that fetches and processes data

/* use an immediately invoked function with setTimeout to fetch the data
 * when the program starts and then once every 5 seconds after that */
(function refreshData() {
    getData() // getData returns a promise, so it has to be called
        .then((data) => {
            processedData = data;
        })
        .catch((err) => console.error(err));
    setTimeout(refreshData, 5000);
})();

app.get('/', (req, res) => {
    res.send(processedData);
});

app.listen(port, () => {
    console.log(`Started on port ${port}`);
});
I would then make a simple GET request from the client (after properly adjusting CORS headers) to get the JSON object.
My questions about this approach are pretty generic: Is this even a good solution to this problem? Will this drive up hosting costs based on processing / client GET requests? Is setTimeout a good way to have a task run repeatedly on the server?
The other solution I'm considering would deal with setting up an AWS Lambda that writes the resulting JSON to an S3 bucket. It looks like the minimum interval for scheduling an AWS Lambda function is 1 minute, though. I imagine I could set up 3 or 4 identical Lambda functions and offset them by 10-15 seconds, but that seems so hacky that it makes me physically uncomfortable.
Any suggestions / pointers / solutions would be greatly appreciated. I am not yet a super experienced backend developer, so please ELI5 wherever you deem fit.
A few pointers.
Use cron tasks for periodic processing of data. This is far preferable, especially if you are formatting a lot of data.
Don't set up multiple Lambda functions for the same task. It's going to be messy to maintain all those functions.
After processing / fetching the feed, you can store the JSON file on your own server or in S3. Note that if it's S3, you are paying for, and waiting on, a network operation. You can read the file from your express app and just send the response back to your clients.
Depending on the file size and your load on the server, you might want to add a caching server so that you can cache the response until new JSON data is available.
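For the cron suggestion, a minimal sketch using the node-cron package (which accepts a six-field expression with a seconds slot; the output file name is illustrative, and getData() is the promise-returning function from the question):
const cron = require('node-cron');
const fs = require('fs');

// run every 5 seconds: fetch, process, and persist the combined JSON
cron.schedule('*/5 * * * * *', () => {
  getData()
    .then(data => fs.promises.writeFile('latest.json', JSON.stringify(data)))
    .catch(err => console.error('refresh failed:', err));
});
Your GET route can then just read and return latest.json, so a failed fetch never leaves clients with an empty response.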

What is the most efficient way of sending files between NodeJS servers?

Introduction
Say that on the same local network we have two Node.js servers set up with Express: Server A for the API and Server F for the form.
Server A is an API server: it takes the request and saves it to a MongoDB database (files are stored as Buffers and their details as other fields).
Server F serves up a form, handles the form post and sends the form's data to Server A.
What is the most efficient way to send files between two NodeJS servers where the receiving server is Express API? Where does the file size matter?
1. HTTP Way
If the files I'm sending are PDF files (that won't exceed 50 MB), is it efficient to send the whole contents as a string over HTTP?
Algorithm is as follows:
Server F handles the file request using https://www.npmjs.com/package/multer and saves the file
then Server F reads this file and makes an HTTP request via https://github.com/request/request along with some details on the file
Server A receives this request and turns the file contents from string to Buffer and saves a record in MongoDB along with the file details.
In this algorithm, both Server A (when storing into MongoDB) and Server F (when sending it over to Server A) have read the file into memory, and the request between the two servers was about the same size as the file. (Are 50 MB requests alright?)
However, one thing to consider is that, with this method, I would be using the ExpressJS style of API for the whole process, and it would be consistent with the rest of the app, where the /list and /details requests are also defined in the routes. I like consistency.
2. Socket.IO Way
In contrast to this algorithm, I've explored the https://github.com/nkzawa/socket.io-stream way, which broke away from the consistency of the HTTP API on Server A (as the handlers for socket.io events are defined not in the routes but in the file that has var server = http.createServer(app);).
Server F handles the form data as such in routes/some_route.js:
var fs = require('fs');
var io = require('socket.io-client');
var ss = require('socket.io-stream');

router.post('/', multer({dest: './uploads/'}).single('file'), function (req, res) {
    var api_request = {};
    api_request.name = req.body.name;
    //add other fields to api_request ...
    var has_file = req.hasOwnProperty('file');
    var transaction_sent = false;
    var socket = io.connect('http://localhost:3000');
    socket.on('connect', function () {
        console.log("socket connected to 3000");
        if (transaction_sent === false) {
            var stream = ss.createStream();
            ss(socket).emit('transaction new', stream, api_request);
            if (has_file) {
                var filename = req.file.destination + req.file.filename;
                console.log('sending with file: ', filename);
                fs.createReadStream(filename).pipe(stream);
            }
            if (!has_file) {
                console.log('sending without file.');
            }
            transaction_sent = true;
            //get the response via socket
            socket.on('transaction new sent', function (data) {
                console.log('response from 3000:', data);
                //there might be a better way to close the socket. But this works.
                socket.close();
                console.log('Closed socket to 3000');
            });
        }
    });
});
I said I'd be dealing with PDF files that are < 50 MB. However, if I use this program to send larger files in the future, is socket.io a better way to handle 1 GB files, since it uses streams?
This method does send the file and the details across but I'm new to this library and don't know if it should be used for this purpose or if there is a better way of utilizing it.
Final thoughts
What alternative methods should I explore?
Should I send the file over SCP and make an HTTP request with the file details, including where I've sent it, thus separating the protocols of file transfer and API requests?
Should I always use streams, because they don't store the whole file in memory? (that's how they work, right?)
This https://github.com/liamks/Delivery.js ?
References:
File/Data transfer between two node.js servers (this got me to try the socket-stream way).
transfer files between two node.js servers over http (for the HTTP way).
There are plenty of ways to achieve this, but not so many to do it right!
Socket.IO and websockets are efficient when you use them with a browser, but since you don't, there is no need for them.
The first method you can try is the built-in net module of Node.js: basically it makes a TCP connection between the servers and passes the data over it.
You should also keep in mind that you need to send chunks of data, not the entire file at once; the socket.write method of the net module seems to be a good fit for your case. Check it out: https://nodejs.org/api/net.html
But depending on the size of your files and the concurrency, memory consumption can be quite large.
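To make that concrete, here is a minimal sketch of the net approach (ports, hostnames and file names are illustrative); pipe handles the chunking and backpressure for you:
const net = require('net');
const fs = require('fs');

// receiving server (Server A): write incoming chunks straight to disk
net.createServer(socket => {
  socket.pipe(fs.createWriteStream('received.pdf'));
  socket.on('end', () => console.log('file received'));
}).listen(4000);

// sending server (Server F): stream the file into the TCP socket
const client = net.connect(4000, 'server-a.local', () => {
  fs.createReadStream('upload.pdf').pipe(client); // ends the socket when done
});
Neither side ever holds the whole file in memory, which addresses the concern above.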
If you are running Linux on both servers, you could even send the files directly with a simple command called scp:
nohup scp -rpC /var/www/httpdocs/* remote_user@remote_domain.com:/var/www/httpdocs &
You can even do this from Windows to Linux or the other way around.
http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
The scp client for Windows is pscp.exe.
Hope this helps!

nodejs multithread for the same resource

I'm quite new to nodejs and I'm doing some experiments.
What I get from them (and I hope I'm wrong!) is that nodejs can't serve many concurrent requests for the same resource without putting them in sequence.
Consider the following code (I use the Express framework in the following example):
var express = require('express');
var app = express();

app.get('/otherURL', function (req, res) {
    res.send('otherURL!');
});

app.get('/slowfasturl', function (req, res) {
    var test = Math.round(Math.random());
    if (test === 0) {
        // scenario 2: reply after 10 seconds
        setTimeout(function () {
            res.send('slow!');
        }, 10000);
    } else {
        // scenario 1: reply immediately
        res.send('fast!');
    }
});

app.listen(3000, function () {
    console.log('app listening on port 3000!');
});
The piece of code above exposes two endpoints:
http://127.0.0.1:3000/otherurl , which just replies with the simple text "otherURL!"
http://127.0.0.1:3000/slowfasturl , which randomly follows one of the two behaviors below:
scenario 1 : reply immediately with "fast!" simple text
or
scenario 2 : reply after 10 seconds with "slow!" simple text
My test:
I've opened several windows of Chrome calling the slowfasturl URL at the same time, and I've noticed that the first request that falls into "scenario 2" blocks all the other requests fired subsequently (from other windows of Chrome), independently of whether these fall into "scenario 1" (and so would return "fast!") or "scenario 2" (and so would return "slow!"). The requests are blocked at least until the first one (the one falling into "scenario 2") has completed.
How do you explain this behavior? Are all the requests made to the same resource served in sequence?
I experience a different behavior if, while the request that fell into "scenario 2" is waiting for the response, a second request is made to another resource (e.g. the otherurl URL explained above). In this case the second request completes immediately, without waiting for the first one.
thank you
Davide
As far as I remember, the requests are blocked browser-side.
Your browser is preventing those parallel requests, but your server can process them. Try different browsers or curl and it should work.
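For example, a quick non-browser check (a sketch to run in a separate Node process against the server above) shows that the responses interleave instead of queueing:
var http = require('http');

// fire 5 requests at once; 'fast!' responses come back immediately
// even while a 'slow!' one is still pending
for (var n = 0; n < 5; n++) {
    (function (i) {
        http.get('http://127.0.0.1:3000/slowfasturl', function (res) {
            var body = '';
            res.on('data', function (chunk) { body += chunk; });
            res.on('end', function () { console.log('request', i, '->', body); });
        });
    })(n);
}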
The behavior you observe can only be explained by sequencing that the browser does. Node does not service requests in sequence; instead it works on an event-driven model, leveraging the libuv framework.
I have run your test case with a non-browser client and confirmed that the requests do not influence each other.
To gain further evidence, I suggest the following:
Isolate the problem scope. Remove express (the http abstraction) and use either the http module (the base HTTP implementation) or even the net module (TCP).
Use a non-browser client. I suggest ab (if you are on Linux), the Apache benchmarking tool, built specifically for web server performance measurement.
I used
ab -t 60 -c 100 http://127.0.0.1:3000/slowfasturl
to collect data for 60 seconds, for 100 concurrent clients.
Make it more deterministic by replacing Math.random with a counter, and toggling between a huge timeout and a small timeout (see the sketch below).
Check the results to see the rate and pattern of slow and fast responses.
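A sketch of that deterministic variant (a drop-in replacement for the slowfasturl route above):
var count = 0;
app.get('/slowfasturl', function (req, res) {
    var slow = (count++ % 2) === 0; // alternate instead of Math.random
    setTimeout(function () {
        res.send(slow ? 'slow!' : 'fast!');
    }, slow ? 10000 : 0);
});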
Hope this helps.
Davide: This question needs an elaboration, so I'm adding this as another answer rather than a comment, which has space constraints.
If you are hinting at the current node model as a problem:
Traditional languages (and runtimes) caused code to be run in sequence. Threads were used to scale this, but that has side effects such as:
i) shared data access needs sequencing, ii) I/O operations block. Node is the result of a careful integration between three entities, libuv (multiplexer), v8 (executor), and node (orchestrator), to address those issues. This model ensures improved performance and scalability under web and cloud deployments. So there is no problem with this approach.
If you are hinting at further improvements to manage stressful CPU-bound operations in node where there will be waiting periods, then yes, leveraging the multiple cores and introducing more threads to share the CPU-intensive workload would be the right way.
Hope this helps.

Why should I use Restify?

I had a requirement to build a REST API in node.js and was looking for a lighter-weight framework than express.js, one that avoids the unwanted features and would act like a custom-built framework for building REST APIs. Restify, from its intro, is recommended for exactly this case.
Reading Why use restify and not express? seemed like restify is a good choice.
But the surprise came when I tried them both under load.
I made a sample REST API on Restify and flooded it with 1000 requests per second. To my surprise, the route stopped responding after a while. The same app built on express.js handled them all.
I am currently applying the load to the API via:
var http = require('http');
var querystring = require('querystring');

// `target` holds the host/port/path of the API under test
var FnPush = setInterval(function () {
    for (var i = 0; i < 1000; i++)
        SendMsg(makeMsg(i));
}, 1000);

function SendMsg(msg) {
    var post_data = querystring.stringify(msg);
    var post_options = {
        host: target.host,
        port: target.port,
        path: target.path,
        agent: false,
        method: 'POST',
        headers: {
            'Content-Type': 'application/x-www-form-urlencoded',
            'Content-Length': post_data.length,
            "connection": "close"
        }
    };
    var post_req = http.request(post_options, function (res) {});
    post_req.write(post_data);
    post_req.on('error', function (e) {
    });
    post_req.end();
}
Do the results I have got seem sensible? And if so, is express more efficient than restify in this scenario? Or is there an error in the way I tested them?
Updated in response to comments:
Behavior of restify:
When fed a load of more than 1000 requests per second, it stopped processing within about a second, receiving up to 1015 requests and then doing nothing; i.e. the counter I implemented for counting incoming requests stopped incrementing after 1015.
Even when fed a load of 100 requests per second, it received up to 1015 and went non-responsive after that.
Corrigendum: this information is now wrong, keep scrolling!
There was an issue with the script causing the Restify test to be conducted on an unintended route. This caused the connection to be kept alive, improving performance due to reduced overhead.
This is 2015 and I think the situation has changed a lot since then. Raygun.io has posted a recent benchmark comparing hapi, express and restify.
It says:
We also identified that Restify keeps connections alive which removes the overhead of creating a connection each time when getting called from the same client. To be fair, we have also tested Restify with the configuration flag of closing the connection. You’ll see a substantial decrease in throughput in that scenario for obvious reasons.
Looks like Restify is a winner here for easier service deployments. Especially if you’re building a service that receives lots of requests from the same clients and want to move quickly. You of course get a lot more bang for buck than naked Node since you have features like DTrace support baked in.
This is 2017 and the latest performance test by Raygun.io compares hapi, express, restify and Koa.
It shows that Koa is faster than the other frameworks, but as this question is about express and restify: Express is faster than restify.
And it is written in the post:
"This shows that indeed Restify is slower than reported in my initial test."
According to the Node Knockout description:
restify is a node.js module purpose built to create REST web services in Node. restify makes lots of the hard problems of building such a service, like versioning, error handling and content-negotiation easier. It also provides built in DTrace probes that you get for free to quickly find out where your application’s performance problems lie. Lastly, it provides a robust client API that handles retry/backoff for you on failed connections, along with some other niceties.
Performance issues and bugs can probably be fixed. Maybe that description will be adequate motivation.
I ran into a similar problem benchmarking multiple frameworks on OS X via ab. Some of the stacks died, consistently, after around the 1000th request.
I bumped the limit significantly, and the problem disappeared.
You can check what your maxfiles limit is set to with ulimit (or launchctl limit, OS X only) and see what the maximum is.
Hope that helps.
In 2021, benchmarking done by Fastify (https://www.fastify.io/benchmarks/) indicates that Restify is now slightly faster than Express.
The code used to run the benchmark can be found here https://github.com/fastify/benchmarks/.
I was confused between express, restify and perfectAPI, and even tried developing a module in all of them. The main requirement was to make a REST API. I tested the requests per second made on all the frameworks myself, and express gave better results than the others. Though in some cases restify outshines express, express seems to win the race. I give a thumbs up to express. And yes, I also came across locomotive.js, an MVC framework built on top of express. If anyone is looking for a complete MVC app using express and jade, go for locomotive.

Differences between socket.io and websockets

What are the differences between socket.io and websockets in node.js?
Are they both server push technologies?
The only differences I noticed were:
socket.io allowed me to send/emit messages by specifying an event name.
In the case of socket.io, a message from the server reaches all clients, but to do the same with plain websockets I was forced to keep an array of all connections and loop through it to send messages to all clients.
Also,
I wonder why web inspectors (like Chrome DevTools/Firebug/Fiddler) are unable to catch these messages (from socket.io/websocket) from the server?
Please clarify this.
Misconceptions
There are a few common misconceptions regarding WebSocket and Socket.IO:
The first misconception is that using Socket.IO is significantly easier than using WebSocket, which doesn't seem to be the case. See the examples below.
The second misconception is that WebSocket is not widely supported in browsers. See below for more info.
The third misconception is that Socket.IO downgrades the connection as a fallback on older browsers. It actually assumes that the browser is old and starts an AJAX connection to the server, which gets upgraded later, on browsers that support WebSocket, after some traffic has been exchanged. See below for details.
My experiment
I wrote an npm module to demonstrate the difference between WebSocket and Socket.IO:
https://www.npmjs.com/package/websocket-vs-socket.io
https://github.com/rsp/node-websocket-vs-socket.io
It is a simple example of server-side and client-side code - the client connects to the server using either WebSocket or Socket.IO and the server sends three messages in 1s intervals, which are added to the DOM by the client.
Server-side
Compare the server-side example of using WebSocket and Socket.IO to do the same in an Express.js app:
WebSocket Server
WebSocket server example using Express.js:
var path = require('path');
var app = require('express')();
var ws = require('express-ws')(app);

app.get('/', (req, res) => {
  console.error('express connection');
  res.sendFile(path.join(__dirname, 'ws.html'));
});
app.ws('/', (s, req) => {
  console.error('websocket connection');
  for (var t = 0; t < 3; t++)
    setTimeout(() => s.send('message from server', () => {}), 1000 * t);
});
app.listen(3001, () => console.error('listening on http://localhost:3001/'));

console.error('websocket example');
Source: https://github.com/rsp/node-websocket-vs-socket.io/blob/master/ws.js
Socket.IO Server
Socket.IO server example using Express.js:
var path = require('path');
var app = require('express')();
var http = require('http').Server(app);
var io = require('socket.io')(http);

app.get('/', (req, res) => {
  console.error('express connection');
  res.sendFile(path.join(__dirname, 'si.html'));
});
io.on('connection', s => {
  console.error('socket.io connection');
  for (var t = 0; t < 3; t++)
    setTimeout(() => s.emit('message', 'message from server'), 1000 * t);
});
http.listen(3002, () => console.error('listening on http://localhost:3002/'));

console.error('socket.io example');
Source: https://github.com/rsp/node-websocket-vs-socket.io/blob/master/si.js
Client-side
Compare the client-side example of using WebSocket and Socket.IO to do the same in the browser:
WebSocket Client
WebSocket client example using vanilla JavaScript:
var l = document.getElementById('l');
var log = function (m) {
  var i = document.createElement('li');
  i.innerText = new Date().toISOString() + ' ' + m;
  l.appendChild(i);
};
log('opening websocket connection');
var s = new WebSocket('ws://' + window.location.host + '/');
s.addEventListener('error', function (m) { log("error"); });
s.addEventListener('open', function (m) { log("websocket connection open"); });
s.addEventListener('message', function (m) { log(m.data); });
Source: https://github.com/rsp/node-websocket-vs-socket.io/blob/master/ws.html
Socket.IO Client
Socket.IO client example using vanilla JavaScript:
var l = document.getElementById('l');
var log = function (m) {
  var i = document.createElement('li');
  i.innerText = new Date().toISOString() + ' ' + m;
  l.appendChild(i);
};
log('opening socket.io connection');
var s = io();
s.on('connect_error', function (m) { log("error"); });
s.on('connect', function (m) { log("socket.io connection open"); });
s.on('message', function (m) { log(m); });
Source: https://github.com/rsp/node-websocket-vs-socket.io/blob/master/si.html
Network traffic
To see the difference in network traffic you can run my test. Here are the results that I got:
WebSocket Results
2 requests, 1.50 KB, 0.05 s
From those 2 requests:
HTML page itself
connection upgrade to WebSocket
(The connection upgrade request is visible on the developer tools with a 101 Switching Protocols response.)
Socket.IO Results
6 requests, 181.56 KB, 0.25 s
From those 6 requests:
the HTML page itself
Socket.IO's JavaScript (180 kilobytes)
first long polling AJAX request
second long polling AJAX request
third long polling AJAX request
connection upgrade to WebSocket
Screenshots
[Screenshot: WebSocket results on localhost]
[Screenshot: Socket.IO results on localhost]
Test yourself
Quick start:
# Install:
npm i -g websocket-vs-socket.io
# Run the server:
websocket-vs-socket.io
Open http://localhost:3001/ in your browser, open developer tools with Shift+Ctrl+I, open the Network tab and reload the page with Ctrl+R to see the network traffic for the WebSocket version.
Open http://localhost:3002/ in your browser, open developer tools with Shift+Ctrl+I, open the Network tab and reload the page with Ctrl+R to see the network traffic for the Socket.IO version.
To uninstall:
# Uninstall:
npm rm -g websocket-vs-socket.io
Browser compatibility
As of June 2016 WebSocket works on everything except Opera Mini, including IE higher than 9.
This is the browser compatibility of WebSocket on Can I Use as of June 2016:
[Screenshot: Can I Use WebSocket support table, June 2016]
See http://caniuse.com/websockets for up-to-date info.
Its advantages are that it simplifies the usage of WebSockets as you described in #2, and probably more importantly, it provides fail-overs to other protocols in the event that WebSockets are not supported by the browser or server. I would avoid using WebSockets directly unless you are very familiar with the environments in which they don't work and you are capable of working around those limitations.
This is a good read on both WebSockets and Socket.IO.
http://davidwalsh.name/websocket
tl;dr
Comparing them is like comparing restaurant food (maybe expensive sometimes, and maybe not 100% what you want) with homemade food, where you have to gather and grow every one of the ingredients on your own.
Maybe if you just want to eat an apple, the latter is better. But if you want something complicated and you're alone, it's really not worth cooking and making all the ingredients by yourself.
I've worked with both of these. Here is my experience.
SocketIO
Has autoconnect
Has namespaces
Has rooms
Has a subscription service
Has a pre-designed communication protocol
(talking about the protocol to subscribe, unsubscribe or send a message to a specific room; with websockets you must design all of these yourself - see the sketch after this list)
Has good logging support
Has integration with services such as redis
Has a fallback in case WS is not supported (well, that's a rarer and rarer circumstance though)
It's a library, which means it's actually helping your cause in every way. WebSocket is a protocol, not a library, and a protocol that SocketIO uses anyway.
The whole architecture is supported and designed by someone who is not you, thus you don't have to spend time designing and implementing anything from the above, and you can go straight to coding business rules.
Has a community because it's a library (you can't have a community for HTTP or Websockets :P They're just standards/protocols)
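As a taste of what you get for free, rooms plus a subscription protocol in Socket.IO is just a few lines (a sketch; the event names are whatever you choose):
io.on('connection', function (socket) {
    socket.on('subscribe', function (room) { socket.join(room); });
    socket.on('unsubscribe', function (room) { socket.leave(room); });
    // broadcast to everyone currently in the room
    socket.on('chat', function (room, msg) { io.to(room).emit('chat', msg); });
});
With raw websockets you would maintain the room membership lists and the dispatch loop yourself.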
Websockets
You have absolute control; depending on who you are, this can be very good or very bad
It's as light as it gets (remember, it's a protocol, not a library)
You design your own architecture & protocol
Has no autoconnect; you implement it yourself if you want it
Has no subscription service; you design it
Has no logging; you implement it
Has no fallback support
Has no rooms or namespaces; if you want such concepts, you implement them yourself
Has no support for anything; you will be the one who implements everything
You first have to focus on the technical parts and on designing everything that goes to and from your websockets
You have to debug your designs first, and this is going to take you a long time
Obviously, you can see I sound biased toward SocketIO. I would love to say so, but I'm really, really not.
I'm really battling not to use SocketIO. I don't wanna use it. I like designing my own stuff and solving my own problems myself.
But if you want to have a business and not just a 1000-line project, and you're going to choose websockets, you're going to have to implement every single thing yourself. You have to debug everything. You have to make your own subscription service. Your own protocol. Your own everything. And you have to make sure everything is quite sophisticated. And you'll make A LOT of mistakes along the way. You'll spend tons of time designing and debugging everything. I did and still do. I'm using websockets, and the reason I'm here is that they're unbearable for one guy trying to solve business rules for his startup while instead dealing with websocket design jargon.
Choosing websockets for a big application isn't an easy option if you're a one-guy army or a small team trying to implement complex features. I've written more code with websockets than I ever wrote with SocketIO, for things ten times simpler than what I did with SocketIO.
All I have to say is ... Choose SocketIO if you want a finished product and design. (unless you want something very simple in functionality)
I'm going to provide an argument against using socket.io.
I think using socket.io solely because it has fallbacks isn't a good idea. Let IE8 RIP.
In the past there have been many cases where new versions of NodeJS have broken socket.io. You can check these lists for examples... https://github.com/socketio/socket.io/issues?q=install+error
If you go on to develop an Android app or something that needs to work with your existing app, you would probably be okay working with WS right away, while socket.io might give you some trouble there...
Plus the WS module for Node.JS is amazingly simple to use.
Using Socket.IO is basically like using jQuery - you want to support older browsers, you need to write less code, and the library will provide fallbacks. Socket.io uses the websocket technology if available, and if not, checks for the best communication type available and uses it.
https://socket.io/docs/#What-Socket-IO-is-not (with my emphasis)
What Socket.IO is not
Socket.IO is NOT a WebSocket implementation. Although Socket.IO indeed uses WebSocket as a transport when possible, it adds some metadata to each packet: the packet type, the namespace and the packet id when a message acknowledgement is needed. That is why a WebSocket client will not be able to successfully connect to a Socket.IO server, and a Socket.IO client will not be able to connect to a WebSocket server either. Please see the protocol specification here.
// WARNING: the client will NOT be able to connect!
const client = io('ws://echo.websocket.org');
I would like to provide one more answer, in 2021. socket.io has been actively maintained again since September 2020. From 2019 to August 2020 (almost two years) there was basically no activity at all, and I had thought the project might be dead.
Socket.io also published an article called Why Socket.IO in 2020?; apart from the fallback to HTTP long-polling, I think these two features are what socket.io provides and websocket lacks:
auto-reconnection
a way to broadcast data to a given set of clients (rooms/namespaces)
One more way I find socket.io convenient is for ws server development, especially since I use docker for my server deployment. Because I always start more than one server instance, cross-server communication between ws servers is a must, and socket.io provides https://socket.io/docs/v4/redis-adapter/ for it.
With the redis-adapter, scaling the server process to multiple nodes is easy, while load balancing a ws server is hard. Check https://socket.io/docs/v4/using-multiple-nodes/ for further information.
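A minimal sketch of that setup, following the redis-adapter documentation linked above (it assumes a local Redis instance and socket.io v4):
const { Server } = require('socket.io');
const { createClient } = require('redis');
const { createAdapter } = require('@socket.io/redis-adapter');

const pubClient = createClient({ url: 'redis://localhost:6379' });
const subClient = pubClient.duplicate();

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  const io = new Server(3000, { adapter: createAdapter(pubClient, subClient) });
  io.on('connection', socket => { /* ... */ });
  // io.emit() / io.to(room).emit() now reach clients on every node
});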
Even though modern browsers support WebSockets now, I think there is no need to throw SocketIO away, and it still has its place in today's projects. It's easy to understand, and personally, I learned how WebSockets work thanks to SocketIO.
As said in this topic, there are plenty of integration libraries for Angular, React, etc. and type definitions for TypeScript and other programming languages.
The other point I would add to the differences between Socket.io and WebSockets is that clustering with Socket.io is not a big deal. Socket.io offers Adapters that can be used to link it with Redis to enhance scalability. You have ioredis and socket.io-redis, for example.
Yes I know, SocketCluster exists, but that's off-topic.
Socket.IO uses WebSocket and, when WebSocket is not available, falls back to other transports to make real-time connections.
TLDR:
'Socket.io' is an application layer specification that can be implemented on top of/using the application layer specification 'websockets'.
websocket spec
socket.io spec
I think the simple answer here is in basic web technology definitions:
Specification: A documented standard detailing the requirements a program has to achieve in order to be labeled as "an implementation of some spec." It is important to achieve this rubber stamp when building programs, because any program is only as good as the machine executing its code. Programming is fundamentally built upon specifications, and if they are not followed, code will not execute correctly. However, a specification does nothing by itself. It is just a text document.
Implementation: This is actual, executable code that accomplishes what the specification says to do.
Application Layer: The system that defines the messages and handshakes sent over the transport. This is the stuff you have to know when working with HTTP/WebSockets/Socket.io. It defines how connections will be made and authenticated, how data will be sent, and how it will arrive.
