How to scrape socket.io updates to a third-party site? - node.js

I basically want to know if its possible to use Socket.io using the server-side only with no client side? BUT I want to know if my server-side can instead connect with a different site that I cannot use Socket.io to connect to.

Use PhantomJS to load the third-party site and then inject your own javascript into the page to catch events and send those events back to your own server.

socket.io is a two-way connection. Client <--> Server. You must have a socket.io endpoint at both ends to even establish a connection in the first place. And, then once you establish the connection, you must have agreed upon messages that can be exchanged between the two ends for it to do anything useful.
It is not useful to have a server-side socket.io that doesn't actually connect to anything and nothing connects to it. It wouldn't be doing anything, just sitting there waiting for someone to connect to it.
It is possible to have two cooperating servers connect to one another with socket.io (one server just acts like a client in that case by initiating the connection to the other server). But, again both endpoints must participate in the connection for the connection to even be established and certainly for it to do anything useful.
If you just want to download the contents of a site for scraping purposes, then you would not use socket.io for that. You would just use the nodejs http module (or any of several other modules built on top of it). Your server would essentially pretend to be a browser. It would request a web page from any random web server using HTTP (not socket.io). That web server would return the web page via the normal HTTP request. Your receiving server can then do whatever it wants with that web page (scrape it, whatever).

Related

why we use socket.io client we can make app with only using via socket.io server?

I having some doubts that:-
what is need to use the socket.io client we can use only the socket.io server to stop refreshing the app.
what is different between the socket.io client and socket.io server.
check this link
socket-io.client is the code for the client-side implementation of socket.io. That code may be used either by a browser client or by a server process that is initiating a socket.io connection to some other server (thus playing the client-side role in a socket.io connection).
A server that is not initiating socket.io connections to other servers would not use this code. This has been made a little more confusing that it probably should be because when using socket.io, it appears that both client and server are using the same socket.io.js file (because they both refer to a file with the same name), but is not actually the case. The server is using a different file than the client.
From the Github page for socket-io.client:
A standalone build of socket.io-client is exposed automatically by the socket.io server as /socket.io/socket.io.js. Alternatively you can serve the file socket.io.js found at the root of this repository.
Keep in mind that there are unique features that belong to client and server so it should not be a surprise that they use some different code. Though they share code for parsing the protocol and things like that, the server has the ability to run a server or hook into an existing web server and it has methods like .join() and .leave() and data structures that keep track of all the connected sockets and is expected to live in the node.js environment. The client has the ability to initiate a connection (send the right http request), do polling if webSockets are not supported, build on a native webSocket implementation if present, etc....

Is socket.io or websocket safe to use with an express server?

I was wondering to have a realtime system made with express and i came to know about socket.io and websockets. But the way they are used i.e.
const io = socket.io("https://example.com") ;
Is it safe to use. Since the url for socket connection is available at client side any third party service can enjoy and exploit the services by connecting from their service. I don't have much idea about socket.io so correct me if I am wrong.
Kindly don't mark this question as duplicate since I found a similar question but the answer to it was related to game development, here I am specific about updating clients whenever any updates are there on the server side. Clients may be website made with angular or apps made with Android studio.
Any help is highly appreciable.
socket.io is widely used. It is perfectly fine for use in production.
Regarding the authentication part, A websocket connection b/w client and browser is established via http upgrade request(in http/1.1).
If you have an authentication mechanism in place for your application using cookie and session then you should be safe. No one can establish websocket connection directly without first logging in. On top of this you can limit connection per user to ensure that a registered user cant further exploit the connection using the cookie data.

How to access TCP Socket via web client

I have a program in an embedded device that outputs an xml string to a socket. The embedded device has lighthttpd has a web server. I want to use a web based client (no flash/silverlight) to connect to the socket and pull the xml data every second.
I looked at Node.js with Socket.io to get what I want to do, but I am not clear about how to proceed. Searching through the Node.js and Socket.io documentation and examples I see standard client-server behavior, nothing regarding what I am trying to do.
Basically, the web server is just there to accept a connection from a client on the socket that the embedded application is outputting data to. Basically the web server's purpose is to just let the client retrieve data from the raw tcp socket that the embedded application is writing to. Please advice.
I solved the problem using Websockify, which acts as bridge between a TCP Socket and a browser.
The html client will connect to a websocket, and Websockify will listen on the websocket port and transmit data between the websocket and the tcp socket.
Web browsers have the ability to do HTTP requests (which can be web page requests or Ajax requests for data) and webSocket connections. You will need to pick one of these two mechanisms if you're sticking with stock browser access.
If the lighthttpd web server in the embedded device does not support webSockets, then your choice will like be an Ajax call from the browser to your server. This is basically just an HTTP request that make return something different than a web page (often JSON data) and is designed to fetch data from the server into a web client.
If the lighthttpd web server does support webSockets, then you could use a webSocket connection to fetch the data too. This has an advantage of being a persistent connection and allows for the server to directly send data to the client (without the client even requesting more data) whenever it wants to (more efficient for constant updates).
An Ajax connection is generally not persistent. A client sends an Ajax request, the server returns the answer and the connection is closed. The next request starts a new Ajax request.
Either Ajax requests or webSocket connections should work just fine for your use. All browsers still in use support Ajax. WebSockets are supported in modern browsers (IE10 and higher).
Once you decide upon a client connection strategy, then you'd build your web app on the embedded device that served as the middleman between the browser and the data on the embedded device. It would collect the appropriate data from the embedded device and then be able to send that to browser clients that connected and requested the data.
I'm not sure exactly why you mentioned node.js. In this circumstance, it would be used as the web server and the environment for building your app and the logic that collects the data from your device and feeds it to the requesting web browser, but it sounds like you already have lighthttpd for this purpose. Personally, I recommend node.js if it works in your environment. Combined with socket.io (for webSocket support), it's a very nice way to connect browsers directly to an embedded device. I have an attic fan controller written in node.js and running on a Raspberry Pi. The node.js app monitors temperature probes and controls relays that switch attic fans and node.js also serves as a web server for me to administer and monitor the node.js. All-in-all, it's a pretty slick environment if you already know and like programming in Javascript and there's a rich set of add-in modules to extend its capabilities available through NPM. If, however, your embedded device isn't a common device that there is already support for node.js on or it doesn't already have node.js on it, then you'd be facing a porting tasks to make node.js run on it which might be more work than using some other development environment that already runs on the device like lighthttpd.

socket.io without running a node server

I have a web application that requires PUSH notifications. I looked into node.js and socket.io and have an example that's working. The question I have is, Is it possible to use socket.io only in my client side JS without running a node.js server?
Can a third party server just send requests to a proxy server and may be socket.io just listens to a port on the proxy server and sends back events to it?
Thanks,
You need a server side technology to send data back and forth via web sockets. Socket.io is a communication layer. Which means, you need to have a server side method to send data.
However,
You can use various third party services to use web sockets and notifications. They are relatively easy to use, and they have support for many other languages.
Check some of these out:
http://pusher.com/
https://www.firebase.com/
http://www.pubnub.com/
https://www.tambur.io/
https://fanout.io/
You don't need to run Node.js to have a real time push notifications. You can use a third party service that does it for you. Most of them are cheap, sometimes free for low traffic instances.

Is it a good practice to use Socket.IO's emit() instead of all HTTP requests?

I set up a Node.js HTTP server. It listens to path '/' and returns an empty HTML template on a get request.
This template includes Require.js client script, which creates Socket.IO connection with a server.
Then all communication between client and server is provided by Web Sockets.
On connection, server requires authentication; if there are authentication cookies then client sends them to server for validation, if no cookies then client renders login view and waits for user input, etc.
So far everything works, after validating credentials I create a SID for user and use it to manage his access rights. Then I render main view and application starts.
Questions:
Is there a need to use HTTPS instead of HTTP since I'm only using HTTP for sending script to the client? (Note: I'm planning to use Local Storage instead of cookies)
Are the any downfalls in using pure Web Sockets without HTTP?
If it works, why nobody's using that?
Is there a need to use HTTPS instead of HTTP since I'm only using HTTP
for sending script to the client? (Note: I'm planning to use Local
Storage instead of cookies)
No, HTTP/HTTPS is required for handshake for websockets. Choice of HTTP or HTTPS is from security point of view. If you want to use it for simply sending script then there is no harm. If you want to implement user login / authentication in your pages then HTTPS should be used.
Are the any downfalls in using pure Web Sockets without HTTP?
Web sockets and HTTP are very different. If you use pure Web Sockets you will miss out on HTTP. HTTP is the preferred choice for cross-platform web services. It is good for document traversal/retrieval, but it is one way. Web socket provides full-duplex communications channels over a single TCP connection and allows us to get rid of the workarounds and hacks like Ajax, Reverse Ajax, Comet etc. Important thing to note is that both can coexist. So aim for web sockets without leaving out HTTP.
If it works, why nobody's using that?
We live in the age of HTTP, web sockets are relatively new. In the long term, web sockets will gain popularity and take up larger share of web services. Many browsers until recently did not support web sockets properly. See here, IE 10 is the latest and only version in IE to support web sockets. nginx, a wildly popular server did not support web sockets until Feb-March 2013. It will take time for web sockets to become mainstream but it will.
Your question is pretty similar to this one
Why use AJAX when WebSockets is available?
At the end of the day they were both created for different things although you can use web sockets for most, if not everything which can be done in normal HTTP requests.
I'd recommend using HTTPS as you do seem to be sending authentication data over websockets (which will also use the SSL, no?) but then it depends on your definition of 'need'.
Downfalls - Lack of support for older browsers
It's not used this this in many other situations because it's not necessary and it's still 'relatively new'.

Resources