What's the best way to share objects between webservers on different hosts? - node.js

TLDR;
I am wondering what the best way is to reliably share an object or other data between any number of webservers on any number of machines.
I have looked at the likes of Redis, but it doesn't seem to be what I am actually looking for here. I am now thinking that something like IPC over the network / RPC might be more appropriate. Is there a better way to do this, given that it will be called at least 10 times every 30 seconds, a rate which will keep growing as the number of users running servers grows?
Example & current use case:
I run a multiplayer mod for a game which receives a decent level of traffic, and we are starting to notice cases where requests sometimes get dropped. The backend webserver is written in Node.js and uses Express in a couple of places too. We are in the process of restructuring the system and have now come to the part that handles a heartbeat from each server that members of the public host. This information is then shared with players so they can decide which server to join.
Based on my own research I am looking to host the service on several different machines for redundancy. These machines are linked over a VLAN / vSwitch so that they have a secure way to communicate with each other. The database is already set up to replicate this way; however, I cannot see a performant way to share the objects containing information about the servers that have reported in to each webhost.
If it helps the system works something like this:
Users server -> my load balancer -> webhost (backend).
Player -> my load balancer -> webhost (backend) returns info on all currently online servers.
The example above describes what is currently in use: a single-instance webserver which handles all the requests and processing needed.

Just an idea while the community proposes answers: consider reading about Apache Thrift. It is not so much IPC-like as RPC-like. If the architecture of your servers, the different components, or the "backend network" forms a "star" with one "master", I would consider that possibility.
If the architecture of your backend is not like that, but rather a group of "independent" entities, I would solve it with some "data bus" such as a private MQTT broker and a group of members subscribing to or publishing data for the rest of the network. In my opinion the best serialization strategy for the object would be Google Protobuf.
Integrating MQTT with Node.js is very simple, and if the packets are not too big and you can accept some latency, I would really recommend running some tests using MQTT with publish/subscribe at QoS 2. It should not take much effort to replace the underlying communications library you are using.
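As a rough illustration (not a drop-in implementation), here is what the publish/subscribe side could look like with the mqtt npm package; the broker URL, topic name and payload shape are assumptions, and JSON is used instead of Protobuf to keep the sketch short:

```js
// Minimal sketch: each web host publishes its heartbeat data and subscribes
// to everyone else's, over a private broker. Broker address, topic and the
// "web-01" host name are placeholders.
const mqtt = require('mqtt');

const client = mqtt.connect('mqtt://10.0.0.5');

client.on('connect', () => {
  // Every web host listens for heartbeat updates from the others.
  client.subscribe('servers/heartbeat', { qos: 2 });

  // ...and publishes its own view of the servers it has heard from.
  client.publish(
    'servers/heartbeat',
    JSON.stringify({ host: 'web-01', servers: [/* heartbeat payloads */] }),
    { qos: 2 }
  );
});

client.on('message', (topic, payload) => {
  const update = JSON.parse(payload.toString());
  // Merge update.servers into the local in-memory object here.
});
```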
That said, there is another solution that seems very interesting: Kafka (though I don't really know it).
Your choice will depend mostly on the nature of your data: the size of the packets, the frequency per user, and the latency you are willing to accept in the worst-case scenario.

Related

Options for getting a CPU intensive job off my web server?

I have been working on a Web App for visualizing live data. It is crucial that this data is kept up to date on the client side without such updates being invoked directly by the client (e.g. no button presses or refreshing the page). Currently, on page load, I grab the current data set from a database (DynamoDB) via Ajax, and subsequent updates are pushed to any listening clients every 5 minutes via a Websockets connection (using Socket.io).
I had overlooked the computational load of this update job. It has to mine some data, process it, update the database, and send the update out to all clients. As a result, the web server is left unresponsive for about 30 seconds with each update. Furthermore, my current architecture prevents me from putting my server behind a load balancer, which is something I anticipate needing in the future. For both of these reasons, I really need to get this update job off my web server.
I am relatively inexperienced in web development, and I don't feel I am knowledgeable enough about these technologies to know the drawbacks of the solutions I have come up with. Currently, I am considering:
1. Break the update off into a separate process so it does not block the Node event loop. This would solve my issue in the short term, but if I ever want to load balance my application, I can't have the update running on multiple machines.
2. Drop Websockets entirely and just have the client query the database every 5 minutes, while a separate process (or separate server if I want load balancing) keeps the database up to date without interacting directly with the client. Will this kind of access pattern put too much load on my db?
3. Have a separate server run the update and send the result via Websockets (or maybe some other protocol) to my load-balanced application servers, which then push that update to all listening clients as usual. Is this even possible?
Perhaps there are other solutions. It seems like this would be a relatively common problem, so I was hoping I could find some guidance here. What are the potential issues with the solutions I have proposed, and are there other possible solutions that may suit my use case better?
It sounds like you want one process sitting somewhere which crunches the data and publishes it to a stream, so that clients can subscribe to the stream as and when they like. Redis handles streams nicely: you could process your data and push it into a Redis stream, then create a small Node service which subscribes to the Redis stream and pushes the formatted data out over a websocket or via polling.
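As a loose sketch of that split (assuming the ioredis client and a stream name of my own choosing, "dashboard-updates"), the two halves might look like this:

```js
// Minimal sketch: one process appends results to a Redis stream, another
// consumes them and forwards to websocket clients. Names are placeholders.
const Redis = require('ioredis');

// Publisher side: the number-crunching process appends each result.
const pub = new Redis();
async function publishUpdate(result) {
  await pub.xadd('dashboard-updates', '*', 'payload', JSON.stringify(result));
}

// Subscriber side: the small Node service that feeds websockets.
const sub = new Redis();
async function consume(onUpdate) {
  let lastId = '$'; // only entries added from now on
  for (;;) {
    const res = await sub.xread('BLOCK', 0, 'STREAMS', 'dashboard-updates', lastId);
    if (!res) continue;
    const [, entries] = res[0];          // [streamName, entries]
    for (const [id, fields] of entries) {
      lastId = id;
      onUpdate(JSON.parse(fields[1]));   // fields = ['payload', '<json>']
    }
  }
}
```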
In this scenario you can then scale up either the publishing process (the one crunching the numbers) if your data load goes up, or the subscribing process (which serves the data over a websocket to browsers) if you get an influx of clients watching the data.
You can also easily distribute the hosting of these services across other machines, and even write them in different languages if you decide the number crunching needs something like threading.
You're then left with the issue of clients (web browsers) consuming this data with a load balancer in between. This can be a hard problem if you use websockets and comes with its own pros and cons. But importantly, you'll have separated your data crunching from your result publishing, and that will isolate your issue down to just the load balancing.
I have done pretty much the same thing to check resources on some of our servers.
I have a C# service getting the information on each server that we manage and sending it to a queue (AMQ).
From there, I have a STOMP client fetching data from AMQ and emitting it to a websocket.
My main microservice fetches the data to save it into a DB.
My visualisation webapp is connected to the same websocket and fetches the data as it is sent, in order to display it.
The AMQ step isn't mandatory at all; it's just something I had to work with (historical reasons).
I don't know what type of data you are working with, so I don't know if my solution applies to you.
Don't hesitate to ask if I'm not clear or if you have any questions.
This is a big question and I'm not going to try and give you a definitive answer.
For option 2
It really depends on how expensive your queries are. You can make DynamoDB fast if you pay for enough throughput. That said, on the face of it, re-loading your whole dataset, when it sounds like it's probably large, probably isn't good engineering.
For option 3
This option seems best to me if it's achievable, although admittedly it's hard to say with such a complex system - obviously you can't share your whole project.
Given you are already using AWS, you might want to look into AWS Lambda. If you can move the update process into a standalone job, you can host it on Lambda and move the load off the web server. Lambda is essentially infinitely scalable and you only pay for the compute you use.
This really depends on you being able to split the update task off into a separate service. It's likely you would need a fair bit of refactoring to isolate it as a service. If you can break little bits off at a time and make the move gradually, even better.
If you consider trying this and you've not used Lambda before, I would definitely start small with some hello-world examples. Then try a very simple service in your application, and build up to taking on the update service.
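To give a feel for the shape of such a job, here is only a sketch: the "LiveData" table name and the mineAndProcess stub are placeholders for your existing logic, and the function would be triggered by something like an EventBridge schedule every 5 minutes.

```js
// Hypothetical Lambda entry point for the 5-minute update job.
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

exports.handler = async () => {
  // 1. Mine and process the data (the CPU-heavy part that used to block the web server).
  const processed = await mineAndProcess();

  // 2. Persist the result; "LiveData" is a placeholder table name.
  await docClient.put({ TableName: 'LiveData', Item: processed }).promise();

  // 3. Optionally notify the app servers here so they can push over websockets.
  return { statusCode: 200 };
};

// Placeholder standing in for the existing mining/processing logic.
async function mineAndProcess() {
  return { id: 'latest', updatedAt: Date.now() /* , ...computed fields */ };
}
```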
You might also consider looking into Amazon Simple Queue Service (SQS) to handle the comms between clients and server.
Database tuning
If a lot of your update time is spent waiting for database actions to complete, rather than on server processing, you can consider tuning that side of things. Things to consider are:
Buying more throughput
Using batch operations, as these move load from your server onto DynamoDB (see the sketch after this list)
Tuning keys, indexes and database access
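For illustration, a batched write with the AWS SDK v2 DocumentClient might look like the following; "Measurements" is a placeholder table name, and DynamoDB caps each batch at 25 put/delete requests:

```js
// Minimal sketch of batching writes to DynamoDB instead of one put per item.
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function batchPut(items) {
  for (let i = 0; i < items.length; i += 25) {
    const chunk = items.slice(i, i + 25);
    await docClient.batchWrite({
      RequestItems: {
        Measurements: chunk.map((item) => ({ PutRequest: { Item: item } })),
      },
    }).promise();
    // Production code should also retry anything returned in UnprocessedItems.
  }
}
```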

How to scale a NodeJS stateful application

I am currently working on a web-based MMORPG game and would like to setup an auto-scaling strategy based on Docker and DigitalOcean droplets.
However, I am wondering how I could manage to do so:
My game server would have to be splittable across different Docker containers, BUT every game server instance should act as if it were one gigantic game server. That means that every modification happening in one (a character moving) should also be mirrored in every other game server.
I am trying to get this to work (at least conceptually) but can't find a way to synchronize all my instances properly. Should I use a master that only broadcasts events, or is there an alternative?
I was wondering the same thing about my MySQL database: since every game server would have to read/write from/to the db, how would I make it scale properly as the game gets bigger and bigger? The best solution I could think of was to keep the database on a single, very powerful server.
I understand that this would be easy if the game servers didn't have to "share" their state, but the sharing is primarily intended so that I can scale quickly in case of a sudden spike in activity.
(There will be different "global" game servers like A, B, C..., but each of those global game servers should be, behind the scenes, composed of 1-X Docker containers running the "real" game server, so that the "global" game server is only a concept.)
The problem you state is too generic and it's difficult to give a concrete response. However, let me be reckless and give you some general-purpose scaling advice:
Remove counters from databases. Instead of auto-incremented IDs as primary keys, try to assign random UUIDs.
Replace data that must be validated against a central point with data that is self-contained. For example, for authentication, instead of keeping the user credentials in a DB, use JSON Web Tokens that can be verified by any host (see the sketch below).
Use techniques such as consistent hashing to balance the load without the need for load balancers. Of course, use hash functions that distribute well, to avoid/minimize collisions.
The above advice is basically about changing the design to migrate from stateful to stateless in as many aspects as you can. If you do need to provide stateful parts, try to guess which entities are most likely to share stateful data and allocate them to the same (or a nearby) server. For example, if there are cities in your game, try to allocate to the same server the users that are in the same city, since they are more likely to interact with each other (and share stateful data) than users in different cities.
Of course, if the city is too big and very crowded, you will probably need to partition it across more servers to avoid overloading any one of them.
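A minimal sketch of the JWT idea, assuming the jsonwebtoken package and a shared secret distributed to every game server (the secret handling and payload fields here are placeholders):

```js
// Minimal sketch: stateless auth with JSON Web Tokens.
const jwt = require('jsonwebtoken');
const SECRET = process.env.JWT_SECRET; // shared by every game server

// Issued once at login by whichever host handles it...
function issueToken(playerId) {
  return jwt.sign({ sub: playerId }, SECRET, { expiresIn: '12h' });
}

// ...and verified locally by any host, with no round trip to a central DB.
function verifyToken(token) {
  try {
    return jwt.verify(token, SECRET); // returns the decoded payload
  } catch (err) {
    return null; // expired or tampered token
  }
}
```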
Your question is too broad and a general scaling problem as others have mentioned. It'd have been helpful if you'd stated more clearly what your system requirements are.
If it has to be real-time, then you can choose Redis as your main DB, but then you'd need slaves (for replication) and you would not be able to scale automatically as you go*, since Redis doesn't support that. I assume that's not a good option when you're working with games (sudden spikes are probable).
*there seem to be some managed solutions; you'd need to check them out
If it can be near real-time, using Apache Kafka can prove to be useful.
There's also a highly scalable DB which has everything you need, called CockroachDB (I'm a contributor, yay!), but you need to run tests to see if it meets your latency requirements.
Overall, going with a single very powerful server is a bad choice, since there's a ceiling and it'd cost you more to scale vertically.
There's a great benefit in scaling such an application horizontally. I'll try to write down some ideas.
Option 1 (stateful):
When planning stateful applications you need to take care of synchronising the state (via pub/sub, network broadcasting or something else) and be aware that every synchronisation takes time to propagate (unless you block on each operation). If this is OK for you, let's go ahead.
Let's say you have 80k operations per second across your whole cluster. That means every process needs to synchronise 80k state changes per second. This will be your bottleneck. Handling 80k changes per second is quite a big challenge for a Node.js application (because it's single-threaded and therefore blocking).
In the end you'll need to provision for exactly the maximum number of changes you want to be able to sync and run some tests with different programming languages. The overhead of synchronising needs to be added to the general workload of the application. It could be beneficial to use a multithreaded language like C, Java/Scala or Go.
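To make the synchronisation idea concrete, here is a very rough sketch using Redis pub/sub as the bus (ioredis client; the channel name, instance id and message shape are my own assumptions, not part of the question):

```js
// Minimal sketch of option 1: every instance broadcasts its local state
// changes and applies everyone else's.
const Redis = require('ioredis');
const pub = new Redis();
const sub = new Redis(); // a subscribed connection can't issue other commands

const INSTANCE_ID = process.env.INSTANCE_ID || 'gs-1';

sub.subscribe('game-state');
sub.on('message', (channel, raw) => {
  const change = JSON.parse(raw);
  if (change.origin === INSTANCE_ID) return; // ignore our own broadcasts
  applyChange(change);
});

// Called whenever this instance mutates its local state (e.g. a character moves).
function broadcastChange(change) {
  applyChange(change);
  pub.publish('game-state', JSON.stringify({ ...change, origin: INSTANCE_ID }));
}

function applyChange(change) {
  /* merge the change into this instance's in-memory world state */
}
```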
Option 2 (stateful with routing):
In some cases it's feasible to implement a different kind of scaling.
When, for example, your application can be broken down into areas of a map, you could start with one app replica which holds the full map and, as it scales up, split the map across replicas proportionally.
You'll need to implement some routing between the application servers, for example: to change state in city A of world B => call server xyz. This could be done automatically, but downscaling will be a challenge.
This solution requires more care and knowledge of the application and is not as fault-tolerant as option 1, but it could scale endlessly.
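A bare-bones illustration of that routing table in Node (the area names, hostnames and endpoint are invented; it assumes Node 18+ for the global fetch):

```js
// Minimal sketch of option 2's routing idea: a lookup table mapping a map
// area to the instance responsible for it.
const areaToServer = {
  'world-b/city-a': 'http://gs-xyz.internal:3000',
  'world-b/city-c': 'http://gs-abc.internal:3000',
};

// Any instance (or a thin router in front of them) forwards a state change
// to the owner of that area instead of synchronising the whole world.
function routeStateChange(area, change) {
  const target = areaToServer[area];
  if (!target) throw new Error(`no server owns area ${area}`);
  return fetch(`${target}/state`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(change),
  });
}
```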
Option 3 (stateless):
Move the state to some other application and solve the problem elsewhere (like Redis, Etcd, ...)

Latency when calling from one API to another API

I have a scenario where I have an external .NET Core API gateway that my mobile devices talk to, and then many individual internal "micro" services that the gateway calls.
My question applies to both IIS/Kestrel on local servers and Azure (since my dev setup is in Azure but production is on our own servers).
The question is whether, in both cases, the calls to the internal APIs work out of the box with as little latency as possible, or whether I can do something to "connect" the external and internal APIs to get lower latency.
I really can't find any good data on this, so I suspect I don't have the right terminology for this question.
I hope you can help me shed some light on this.
Short answer: Use the async/await pattern in your .NET code when calling remote services. Put services and servers as physically close together as possible to reduce network overhead.
If your code calls some other service over HTTPS, there will always be some latency associated with things like:
Network stack overhead
HTTPS handshaking
Packet routing
Actually waiting for the remote server to process and return a response
The first two are minimal, and you can't do much about them. The time it takes for a packet to travel between two physical locations can vary between 5 ms and 200 ms (or more), which is why it's common to place services in the same geographic region or datacenter.
As far as your code goes, as long as you're using the async/await pattern, your code is already running at the best network speed. If the latency isn't up to your requirements, you'll need to either make your servers faster, or move them closer together.
Fair warning: I'm not a network engineer, and this is a pretty high-level overview. I'd recommend asking technical networking questions over at Server Fault.

Strategy to implement a scalable chat server

I am looking to implement some sort of chat server. And I want it to scale. This seems like a big question, so I guess I expect the answers to be direction pointers, sort of exploratory.
The end-user clients are web or phone clients. I think some sort of websocket implementation, such as Socket.IO, is nice.
On the server side I wish to use Node.js. I want the architecture to be scalable so that the number of users is not limited (well, within reason; a big hit is not expected, and if it happens, it's reasonable to assume smarter, more experienced people will be available to work on it instead of just me). The number of users per chatroom is hopefully not limited, or maybe capped at some large fixed number. That means I need to scale horizontally using several servers written in Node.
Suppose some load balancer (hopefully, in the future, not a single point of failure, but I don't know how I would achieve that - or maybe I'd just move to AWS) is dispatching Socket.IO connections from the end clients to the chat servers. Users connected to different servers may be in the same room, so messages need to be sent on to the other servers.
How would I feasibly implement something like this? Hopefully not too complex.
Questions:
(1) If all servers need to handle all messages, since users can be logged on via any of the servers, does this scale?
(2) Do I need some sort of message queue for the servers to talk among themselves? Is pub/sub from RabbitMQ usable for this? Or, if ZeroMQ, how would I scale with pub/sub? The ZeroMQ guide has explanations for scaling to more than one server with REQ/REP type applications, but not pub/sub.
(3) Or should I start with XMPP?
I am hoping to make it work as easy as possible.
There's a rather good explanation on the Socket.IO site. Have a look at
http://socket.io/docs/using-multiple-nodes/
It suggests using Nginx as the HTTP load balancer, Node.js clustering (with sticky sessions) and Redis as the message backend.
I think your goals should be achievable with little to no coding involved, only using the given modules and configuration mechanisms.
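If it helps to see the shape of that setup, here is a minimal sketch of one chat server process using the current @socket.io/redis-adapter package (older guides use socket.io-redis instead); the Redis URL, port and event names are placeholders:

```js
// Minimal sketch: Socket.IO with the Redis adapter so rooms and broadcasts
// span every Node instance behind the load balancer.
const { createServer } = require('http');
const { Server } = require('socket.io');
const { createClient } = require('redis');
const { createAdapter } = require('@socket.io/redis-adapter');

async function main() {
  const pubClient = createClient({ url: 'redis://localhost:6379' });
  const subClient = pubClient.duplicate();
  await Promise.all([pubClient.connect(), subClient.connect()]);

  const httpServer = createServer();
  const io = new Server(httpServer, { adapter: createAdapter(pubClient, subClient) });

  io.on('connection', (socket) => {
    socket.on('chat message', (room, msg) => {
      // Reaches sockets in this room on every instance, via Redis.
      io.to(room).emit('chat message', msg);
    });
  });

  httpServer.listen(3000); // one instance per process/port behind nginx with sticky sessions
}

main();
```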

Using Linux Virtual Server for load balancing of zones in MMO game

I'm a developer of an MMO game, and at my company we're currently facing some scalability issues which, I think, can be resolved with proper clustering of the game world.
I don't really want to reinvent the wheel that's why I think Linux Virtual Server could be a good choice especially with some Level 7 load balancing technique.
I'm currently looking at ktcpvs as a load balancing solution and wonder if it's a proper choice.
The main idea is to have a number of zones ("locations" in terms of my game) running on dedicated servers. When a player decides to go to some specific location, the load balancer decides which zone server will actually serve the player (that's why I need a Level 7 load balancer).
What do you folks think about all said above?
Update: I posted the same question to LVS users mailing list http://marc.info/?l=linux-virtual-server&m=124976265209769&w=2
Update: I also started the similar topic on the gamedev.net forum http://www.gamedev.net/community/forums/topic.asp?topic_id=544386
In order to address your question we need to understand whether you need volume or response time, but it is difficult to get both at the same time.
Layer 7 load balancing is application-level balancing based on data, so the data content of the network packet needs to be routed to an endpoint. You can achieve volume (more users) by implementing routing at the application level, the service level or the kernel level.
Scalability - I assume you are running out of memory, CPU resources and network bandwidth.
Application level - your application logic receives an application packet and routes accordingly (a sketch follows this list).
Service level - your system framework (a front-end service of some kind) receives the packet and performs the routing through a module (think of a custom Apache module, or even network driver modules - like writing a network filter).
Kernel level - performs routing at the network-packet level.
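To ground the "application level" option in the Node.js context of this thread, here is a rough sketch of a front-door TCP router that reads a one-byte zone id from the client's first packet and proxies the connection to that zone's server; the zone map, addresses and framing are invented for illustration, not a real protocol:

```js
// Minimal sketch of application-level routing: inspect the first byte of the
// first packet, pick the zone server, then pipe the connection both ways.
const net = require('net');

const zoneServers = { // assumed zone id -> backend mapping
  1: { host: '10.0.0.11', port: 7001 },
  2: { host: '10.0.0.12', port: 7001 },
};

net.createServer((client) => {
  client.on('error', () => client.destroy());
  client.once('data', (firstChunk) => {
    if (firstChunk.length === 0) return client.destroy();
    const zone = firstChunk.readUInt8(0);           // zone id sent by the game client
    const target = zoneServers[zone];
    if (!target) return client.destroy();
    const upstream = net.connect(target.port, target.host, () => {
      upstream.write(firstChunk.subarray(1));       // forward the rest of the first packet
      client.pipe(upstream).pipe(client);           // then proxy both directions
    });
    upstream.on('error', () => client.destroy());
  });
}).listen(4000);
```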
The closer you move to the metal, the better your response time will be. I suggest using a dedicated Linux server up front to perform the routing - go native, not virtual. Use multiple or teamed network adapters for the WAN and a dedicated adapter for each endpoint (one or more for the WAN, one for each connected app server).
If response time is important then you need a kernel/supervisor-state solution; it will save you a few context switches, but be aware that you need to limit hops at all costs, you may be better served by fewer, larger machines, and your scalability will always be limited. There is a risk in using KTCPVS: it is quite old and not actively updated. If you judge that it works for you, great; otherwise consider writing something akin to a network filter, as long as it runs in system state.
If volume is important but response time is secondary, implement a custom-built high-speed socket switch in C++ running in problem/user state. It is the easiest to maintain and will offer the best scalability.
You will need to build some prototypes to figure out what suits your needs best.
Final thoughts -
Before doing any of the above, first ensure that you have optimized your game design. You may know most of this; I list it here for the benefit of all.
(a) Messages should fit comfortably within one network packet, less than 1500 bytes for most home routers.
(b) Try to fit the routing logic into your game client instead of your servers. A simple download of a small table of zones and IP addresses to the client would allow you to forego all of the above.
(c) Try to limit zone visibility for the clients; they should know about their own zone and adjacent zones only (if you implement point (b) above).
Hope this helps, sorry I cannot be more specific regarding KTCPVS.
You haven't specified where the bottleneck is. Network Traffic? Disk IO? CPU Cycles?
Assuming you mean a Layer 7 load balancer and don't have enough CPU power, I think LVS is not the optimal choice. I have done web server load balancing with LVS, which works straightforwardly and isn't exactly complicated.
But I think load balancing an MMORPG this way needs a considerable amount of additional code in LVS; it might be easier to do the load balancing with a multithreaded application distributed over a multicore server. That isn't fully scalable, though - it only gets you to about 16 cores without a prohibitive cost increase.
The biggest issue in something like this is what happens when players are near a boundary. Obviously they need to be able to see and interact with each other, but they're on separate servers. So you need some pretty fancy inter-server communication, sometimes just duplicating messages to both servers. It can get even more complicated when someone is near a "corner", and then you have to deal with 4 servers!
The book Massively Multiplayer Game Development has a chapter on "The Pitfalls of Shared Server Boundaries" which covers this issue in detail.
I haven't heard of Linux Virtual Server before now, so I don't understand how it fits. I think your actual server application needs to support this game-specific load balancing, rather than trying to run a cluster and assuming that it will automatically know how to split up your application (which it won't). If I were you, I would write the server program to handle its own piece of land, and it should connect to the pieces of land around it, and then design a server-to-server protocol for the passing of these messages ("here comes a player, I'm going to start telling you about him!" "make sure to tell me about messages near our boundary", "okay the player is out of my territory and into yours, here's his detailed data", etc). I think it's a bit more complicated than just running a different flavor of Linux and assuming you'll get automatic load balancing.
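For a flavour of what that server-to-server protocol could look like, here is a rough sketch using JSON messages over an inter-server socket; the message names, fields and the world/neighbourSocket objects are all invented for illustration (the socket is assumed to have a WebSocket-style send method):

```js
// Rough sketch of the boundary-handoff messages described above.
const MESSAGES = {
  PLAYER_APPROACHING: 'player_approaching', // "here comes a player, start mirroring him"
  BOUNDARY_EVENT:     'boundary_event',     // "tell me about events near our shared edge"
  PLAYER_HANDOFF:     'player_handoff',     // "he's in your territory now, here's his full state"
};

function handOffPlayer(neighbourSocket, player) {
  neighbourSocket.send(JSON.stringify({
    type: MESSAGES.PLAYER_HANDOFF,
    player: { id: player.id, position: player.position, state: player.state },
  }));
}

function onNeighbourMessage(raw, world) {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case MESSAGES.PLAYER_APPROACHING: world.mirrorPlayer(msg.player); break;  // hypothetical world API
    case MESSAGES.BOUNDARY_EVENT:     world.applyRemoteEvent(msg.event); break;
    case MESSAGES.PLAYER_HANDOFF:     world.adoptPlayer(msg.player); break;
  }
}
```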
Why are you moving the distribution logic to the load balancer? It's a component that isn't free and can break. It seems your clients are quite aware of which zone they're in, so they could very well connect to zone<n>.example.com. You'd then handle load balancing at the DNS level.
