Multi-node connection management - Linux

I'm designing an application involving multi-node communications using InfiniBand (ibv_*). What is the standard way to maintain connections between nodes? I'm thinking of O(N^2) connections for all pairs of nodes as the easiest option, but it's kind of silly and not scalable.

The question is kinda simple and short, but the real answer is VERY long...
First of all, be sure that you really need to use ibv_... stuff.
Are you using InfiniBand or RoCE?
Next, analyze the expected communication pattern of your application.
You're talking about scalability, which probably means that you have a massively parallel application in mind.
Do you really need to invent your own communication layer?
Can't you use existing solutions?
There's a whole CS field that deals with this kind of problem - HPC (High Performance Computing).
Perhaps MPI/UPC/some other library will solve your problem?
If you still need to write your own ibv_... application with lots and lots of machines, then you need to consider:
do you need RC or UD connections?
if you have the newest Mellanox HCA (Connect-IB) then there's also an option of DC
what are the scalability requirements?
how sensitive is the application to latency/BW?
To summarize:
if you need to have a massively parallel IB verbs application, and you need RC, you'd better open RC connections on-demand (a minimal sketch of that pattern follows this list)
if you have to have all the RC connections opened in advance, then there's no other way - the O(n^2) connections case is inevitable
if it fits your needs, consider using UD
check again that existing solutions don't already cover what you need
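The "open RC connections on demand" advice boils down to a connection cache keyed by peer: connect lazily on first use and optionally evict idle connections so you never hold a full O(N^2) mesh. The actual verbs/rdma_cm calls are C, but the pattern itself is language-agnostic; here is a minimal sketch of it (written in TypeScript purely for brevity, with a hypothetical connectTo/Connection interface standing in for the real QP setup):

```typescript
// On-demand connection cache: connect lazily on first use and evict the
// least-recently-used connection so the number of open connections stays bounded
// instead of growing into a full O(N^2) mesh. The connect callback is a hypothetical
// stand-in for the real C-level setup (rdma_cm address/route resolution, QP creation).
interface Connection {
  send(msg: Buffer): Promise<void>;
  close(): void;
}

class ConnectionPool {
  private conns = new Map<string, Connection>();          // peer id -> open connection

  constructor(
    private connectTo: (peer: string) => Promise<Connection>,
    private maxOpen = 1024,                                // cap instead of a full mesh
  ) {}

  async get(peer: string): Promise<Connection> {
    const existing = this.conns.get(peer);
    if (existing) {
      this.conns.delete(peer);                             // re-insert to mark as
      this.conns.set(peer, existing);                      // most recently used
      return existing;
    }
    if (this.conns.size >= this.maxOpen) {
      // Map iterates in insertion order, so the first entry is the least recently used.
      const [oldPeer, oldConn] = this.conns.entries().next().value as [string, Connection];
      oldConn.close();
      this.conns.delete(oldPeer);
    }
    const conn = await this.connectTo(peer);               // connect only when first needed
    this.conns.set(peer, conn);
    return conn;
  }
}

// Usage with a dummy transport, just to show the on-demand behaviour:
const pool = new ConnectionPool(async (peer) => ({
  send: async (msg: Buffer) => console.log(`send to ${peer}: ${msg.toString()}`),
  close: () => console.log(`closed ${peer}`),
}));

pool.get("node-42").then((c) => c.send(Buffer.from("hello")));
```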

Related

How to scale a NodeJS stateful application

I am currently working on a web-based MMORPG game and would like to setup an auto-scaling strategy based on Docker and DigitalOcean droplets.
However, I am wondering how I could manage to do so:
My game server would have to be splittable across different Docker containers BUT every game server instance should act as if it was only one gigantic game server. That means that every modification happening in one (character moving) should also be mirrored in every other game server.
I am trying to get this to work (at least conceptually) but can't find a way to synchronize all my instances properly. Should I use a master only broadcasting events or is there an alternative?
I was wondering the same thing about my MySQL database: since every game server would have to read/write from/to the db, how would I make it scale properly as the game gets bigger and bigger? The best solution I could think of was to keep the database on a single server which would be very powerful.
I understand that this could be easy if all game servers didn't have to "share" their state, but this is primarily intended so that I can scale quickly in case of a sudden spike of activity.
(There will be different "global" game servers like A, B, C... but each of those global game servers should be, behind the scenes, composed of 1-X docker containers running the "real" game server so that the "global" game server is only a concept)
The problem you state is too generic and it's difficult to give a concrete response. However, let me be reckless and give you some general-purpose scaling advice:
Remove counters from databases. Instead of auto-incremented IDs as primary keys, try to assign random UUIDs.
Replace data that must be validated against a central point with data that is self-contained. For example, for authentication, instead of keeping the user credentials in a DB, use JSON Web Tokens that can be verified by any host.
Use techniques such as consistent hashing to balance the load without the need for load balancers. Of course, use hash functions that distribute well, to avoid/minimize collisions.
The above advice is basically about changing the design to migrate from stateful to stateless in as many aspects as you can. If you still need to provide stateful parts, try to guess which entities have the best chance of sharing stateful data and allocate them on the same (or a nearby) server. For example, if there are cities in your game, try to allocate users that are in the same city on the same server, since they are more likely to interact with each other (and share stateful data) than users that are in different cities. The consistent-hashing sketch below illustrates one way to do this.
Of course, if the city is too big and very crowded, you will probably need to partition it across more servers to avoid overloading a single one.
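As a rough illustration of the consistent-hashing advice, here is a minimal sketch of a hash ring that maps a key such as a city id onto a set of game-server instances (server names are made up; it uses Node's built-in crypto module):

```typescript
import { createHash } from "crypto";

// Minimal consistent-hash ring: keys (e.g. a city id) map to servers, and adding or
// removing a server only moves the keys adjacent to it on the ring.
class HashRing {
  private ring: { point: number; server: string }[] = [];

  constructor(servers: string[], private vnodes = 100) {
    servers.forEach((s) => this.add(s));
  }

  private hash(key: string): number {
    // First 8 hex chars of md5 -> 32-bit point on the ring.
    return parseInt(createHash("md5").update(key).digest("hex").slice(0, 8), 16);
  }

  add(server: string): void {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ point: this.hash(`${server}#${i}`), server });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  lookup(key: string): string {
    const p = this.hash(key);
    // First ring point clockwise from the key's hash (wrap around to the start).
    const entry = this.ring.find((e) => e.point >= p) ?? this.ring[0];
    return entry.server;
  }
}

// Example: route players by the city they are in, so players in the same city
// tend to land on the same game-server container.
const ring = new HashRing(["game-a:3000", "game-b:3000", "game-c:3000"]);
console.log(ring.lookup("city:42"));
```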
Your question is too broad and a general scaling problem as others have mentioned. It'd have been helpful if you'd stated more clearly what your system requirements are.
If it has to be real-time, then you can choose Redis as your main DB, but then you'd need slaves (for replication) and you would not be able to scale automatically as you go*, since Redis doesn't support that. I assume that's not a good option when you're working with games (sudden spikes are probable).
*there seem to be some managed solutions; you need to check them out
If it can be near real-time, using Apache Kafka can prove to be useful.
There's also a highly scalable DB which has everything you need called CockroachDB (I'm a contributor, yay!) but you need to run tests to see if it meets your latency requirements.
Overall, going with a very powerful server is a bad choice, since there's a ceiling and it'd cost you more to scale vertically.
There's a great benefit in scaling such an application horizontally. I'll try to write down some ideas.
Option 1 (stateful):
When planning stateful applications you need to take care of synchronising the state (via PubSub, network broadcasting or something else) and be aware that every synchronisation takes time to occur (when not blocking each operation). If this is OK for you, let's go ahead.
Let's say you have 80k operations per second on your whole cluster. That means that every process needs to synchronise 80k state changes per second. This will be your bottleneck. Handling 80k changes per second is quite a big challenge for a Node.js application (because it's single-threaded and therefore blocking).
In the end you'll need to provision for the maximum number of changes you want to be able to sync and perform some tests with different programming languages. The synchronisation overhead needs to be added to the general workload of the application. It could be beneficial to use a multithreaded language like C, Java/Scala or Go. A minimal pub/sub sketch follows below.
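As a rough illustration of the PubSub approach, here is a minimal sketch assuming the ioredis client; the channel name and message shape are made up:

```typescript
import Redis from "ioredis";

// Minimal state-sync sketch: each game-server instance publishes its local changes
// and applies the changes published by the others. In practice you would also tag
// events with an instance id so an instance can skip the events it published itself.
const pub = new Redis();                       // defaults to localhost:6379
const sub = new Redis();                       // a subscriber connection is dedicated

const localState = new Map<string, { x: number; y: number }>();

sub.subscribe("world:events");
sub.on("message", (_channel, raw) => {
  const event = JSON.parse(raw) as { playerId: string; x: number; y: number };
  localState.set(event.playerId, { x: event.x, y: event.y });   // apply remote change
});

// Called whenever a player moves on *this* instance.
function onPlayerMove(playerId: string, x: number, y: number): void {
  localState.set(playerId, { x, y });                           // apply locally
  pub.publish("world:events", JSON.stringify({ playerId, x, y }));
}

onPlayerMove("player-1", 10, 20);
```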
Option 2 (stateful with routing):
In some cases it's feasible to implement a different kind of scaling.
When, for example, your application can be broken down into areas of a map, you could start with one app replica that holds the full map and, as it scales up, shards the map proportionally.
You'll need to implement some routing between the application servers, for example: to change the state in city A of world B => call server xyz (a sketch of such a routing table follows below). This could be done automatically, but downscaling will be a challenge.
This solution requires more care and knowledge about the application and is not as fault tolerant as option 1 but it could scale endlessly.
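The routing table itself can be very small; here is a hypothetical sketch with a hard-coded map (in practice it would live in something like etcd or Consul and be updated as instances come and go):

```typescript
// Option 2 in a nutshell: a routing table from map area to the instance that owns it.
// The hostnames and area names below are made up for the example.
const routes: Record<string, string> = {
  "world-b/city-a": "http://xyz.internal:4000",
  "world-b/city-c": "http://abc.internal:4000",
};

function serverFor(world: string, area: string): string {
  const target = routes[`${world}/${area}`];
  if (!target) throw new Error(`no server owns ${world}/${area}`);
  return target;
}

// "Change the state in city A of world B" => call server xyz.
console.log(serverFor("world-b", "city-a"));
```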
Option 3 (stateless):
Move the state to some other application and solve the problem elsewhere (like Redis, Etcd, ...)
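A minimal sketch of that, assuming the ioredis client and made-up key names, so that any Node instance can serve any request:

```typescript
import Redis from "ioredis";

// Option 3: the Node processes hold no state of their own; the authoritative
// position of each player lives in Redis, so any instance can serve any request.
const redis = new Redis();

async function savePosition(playerId: string, x: number, y: number): Promise<void> {
  await redis.hset(`player:${playerId}`, "x", String(x), "y", String(y));
}

async function loadPosition(playerId: string): Promise<{ x: number; y: number } | null> {
  const data = await redis.hgetall(`player:${playerId}`);
  if (!data.x) return null;
  return { x: Number(data.x), y: Number(data.y) };
}

savePosition("player-1", 10, 20).then(() => loadPosition("player-1")).then(console.log);
```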

Can I use Node to build a payment gateway software?

I know it is a subjective question, but the reason I ask is because:
Node.js is not good with heavy computational tasks
Node.js has some issues with memory leaks.
Given the problems above, would Node be a good choice for building payment gateway software?
I'm very comfortable with Node, but many people have said that it's better to use another language like Golang or Scala for this type of system.
Let me know what you think: should I use Node or another language?
Yes, node.js would be perfectly fine for payment gateway software. An appropriate design using clustering or off-loading computation tasks to child processes could easily handle the heavier computational work (see the clustering sketch at the end of this answer).
And, node.js is being used by many heavy traffic commercial sites without memory leak issues. Memory leaks are an issue with faulty software design, not with the platform.
Further, the very nature of payment gateway software (being the middleman in a transaction between two other networking endpoints) is very well set up for the node.js async design that handles lots of in-flight transactions very efficiently.
As with pretty much any major back-end system these days, you just have to design your app to work the way the platform performs best and you could probably use any of the systems you mention just fine.
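As a rough illustration of the clustering suggestion, here is a minimal sketch using Node's built-in cluster module; the /charge endpoint is a hypothetical stand-in for the real gateway logic:

```typescript
import cluster from "cluster";
import { cpus } from "os";
import http from "http";

// The primary process forks one worker per CPU core; the OS distributes incoming
// connections among them. Requires Node 16+ for cluster.isPrimary.
if (cluster.isPrimary) {
  for (let i = 0; i < cpus().length; i++) cluster.fork();
  cluster.on("exit", (worker) => {
    console.log(`worker ${worker.process.pid} died, restarting`);
    cluster.fork();                                    // keep the pool full
  });
} else {
  http.createServer((req, res) => {
    if (req.method === "POST" && req.url === "/charge") {
      // ... validate, talk to the upstream processor, respond ...
      res.writeHead(202, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ status: "accepted" }));
    } else {
      res.writeHead(404);
      res.end();
    }
  }).listen(3000);
}
```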

Integration of bounded contexts locally

In "Implementing Domain-Driven Design", Vernon give detailed examples for integrating bounded context with a messaging or REST based solution, it also mention database integration, but I understand it is not a very clean solution to share database or at least db tables between BC.
But what if the 2 BCs I want to integrate are hosted locally on the same server, is it really a good idea to use a messaging/rest/rpc solution ? (which seems more suitable for a remotely hosted BC to me)
Otherwise, except with DB integration, what are the other alternatives ? Hosting both BC in the same process and calling it directly (still using adapters and translators for clean seperation) ?
Thanks
You could look into using something like 0MQ for inter-process communication on the same server. I've also in the past just hosted things in the same process as you suggest and just used interfaces / in-memory messaging to separate out contexts.
Everything is about trade-offs in the end, so you just need to decide what level of isolation you are willing to accept. The simplest solution would be to separate inside a solution via folders and interfaces, the other end of the spectrum being completely separate servers.
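For the in-process option, a minimal sketch might look like this (names are made up; the bus interface is the part you would later swap for 0MQ or a broker without touching the contexts themselves):

```typescript
import { EventEmitter } from "events";

// Each bounded context depends only on this small interface, so the in-memory
// implementation below could later be replaced by 0MQ, a message broker, etc.
interface MessageBus {
  publish(topic: string, payload: unknown): void;
  subscribe(topic: string, handler: (payload: unknown) => void): void;
}

class InMemoryBus implements MessageBus {
  private emitter = new EventEmitter();
  publish(topic: string, payload: unknown): void {
    this.emitter.emit(topic, payload);
  }
  subscribe(topic: string, handler: (payload: unknown) => void): void {
    this.emitter.on(topic, handler);
  }
}

// Sales BC publishes; Billing BC translates the message at its own boundary.
const bus: MessageBus = new InMemoryBus();

bus.subscribe("sales.order-placed", (payload) => {
  const { orderId, total } = payload as { orderId: string; total: number };
  console.log(`billing: creating invoice for ${orderId}, amount ${total}`);
});

bus.publish("sales.order-placed", { orderId: "o-1", total: 42 });
```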
I don't think that location should come into play w.r.t. integration between BCs.
There really are other factors to consider such as guaranteed delivery to the recipient in order to ensure that the processing takes place. This should be required whether or not the two BCs are hosted on the same server.
Another reason to ignore location is that when you need to scale, your architecture should be able to handle it from the get-go.
As tomliversidge mentioned, it is possible to use some deployment mechanisms, such as non-durable messaging, to speed things up, but there will definitely be a trade-off and that has to be a conscious decision.

synthetic multi-node crossbar system implementation

I am implementing a system composed of a collection of small systems, i.e. Raspberry Pi, Yun, BeagleBone, the occasional PC. Crossbar.io has real promise ... but, as I understand it, doesn't currently support multiple nodes. Am I correct? Does anyone know when that might happen?
In the meantime it occurred to me that each individual node can offer an HTTP interface that I might be able to use for my purposes. My initial thought is to create workers that wrap access to the web interface on subsidiary nodes. This fits the overall architecture of the system I want to create - does it have any merit? Is it tractable? I'm new to websockets - any insight would be a great help.
Thanks for your time,
Al
In general that does sound like a fit for Crossbar.io.
There is no timeline on multi-node (i.e. multiple routers), but we hope to have at least hot-standby nodes for high availability ready in Q1. Other than for high availability, I think that a single instance should provide sufficient performance for most applications out there - on a single current (non-high-end) Xeon we're talking tens of thousands of events a second, and concurrent connections are mostly limited by RAM (and 100s of thousands on a single box are definitely not a problem). (If you need more than that then I'd be very interested in your specific use case - we want to learn more about our users.)
I don't completely understand the second part of your question: What precisely is the architecture you're planning here? If you're talking about the integrated Web server, then with recent optimizations (it can now use multiple cores) this should be enough for even moderately big sites, and with SPAs you're not likely to ever run into performance issues.
Hope this helps, and I'll be glad to answer in more detail once you've clarified the second part.
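If the idea is to have one worker per subsidiary node that wraps that node's HTTP interface as a WAMP procedure registered with the single Crossbar.io router, a rough sketch could look like the following (this assumes the autobahn-js client, a made-up router URL, realm and procedure names, and Node 18+ for the global fetch):

```typescript
import autobahn from "autobahn";

// One worker per subsidiary node: it connects to the Crossbar.io router and exposes
// that node's HTTP interface as a callable WAMP procedure.
const NODE_NAME = "beaglebone-1";                 // hypothetical node name
const NODE_HTTP = "http://192.168.1.50";          // hypothetical node address

const connection = new autobahn.Connection({
  url: "ws://crossbar.local:8080/ws",             // made-up router URL
  realm: "realm1",
});

connection.onopen = (session) => {
  session.register(`com.mysystem.${NODE_NAME}.read_sensor`, async (args?: any[]) => {
    const sensor = String(args?.[0] ?? "temperature");
    const res = await fetch(`${NODE_HTTP}/sensors/${sensor}`);  // wrap the node's HTTP API
    return res.json();                                          // returned to the WAMP caller
  });
};

connection.open();
```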

High Scalability in Domain-Driven Design

I'm using DDD for a service-oriented application intended to transmit a high volume of messages between a high volume of web clients (i.e., browsers).
Because in the context of required functionality, the need for transmission outweighs the need for storage, I love the idea of relying on RAM primarily and minimizing use of the database.
However I'm unclear on how to architect this from a scalability point of view. A web farm creates high availability of service endpoints and domain logic processing. But no matter how many servers I have, it seems they must all share a common repository so that their data is consistent.
How do I build this repository so that it's as scalable as possible? How can it be spread across an array of physical machines in a manner such that all machines are consistent and each couldn't care less if another goes down?
Also since touching the database will be required occasionally (e.g., when a client goes missing and messages intended for it must be stored until it returns), how should I organize my memory-based code and data access layer? Are they both considered "the repository"?
There are several ways to solve this issue. No single answer can really cover it all...
One method to ensure your scalability is to simply scale the hardware. Write your web services to be stateless so that you can run a web farm (all running the same identical services, pointing to the same DB) and turn your DB into a cluster. Clustered databases run over multiple servers and work on the same storage. However, this scenario can get complicated and expensive quite quickly.
Some interesting links:
http://scale-out-blog.blogspot.com/2009/09/future-of-database-clustering.html
http://en.wikipedia.org/wiki/Server_farm
Another method is to look at architecture. CQRS is a common architectural model that ensures scalability. This architecture model -- its name stands for Command/Query Responsibility Segregation -- builds different databases for reading and writing. This seems contradictory, but if you study it, it becomes natural and you wonder why you never thought of it before. Simply put, most apps do a lot more reading than writing, and writing tends to be a lot more complicated than reading (requiring business rule validation etc.), so why not separate the two? You can use your expensive transactional database for writing and a cheap, maybe NoSQL-based or open-source, database replicated over multiple read servers. Your read model is then optimized for the screens of your application(s), whereas the write model is optimized solely for writing and is in fact a DDD-based set of repositories. A minimal sketch of the split follows the links below.
There's just not enough room here to cover this option in detail, but CQRS is a good way of achieving scalability and even ease of development, once you have a CQRS framework in place. There are many other advantages to CQRS, such as ease of auditing (if you combine it with the "event sourcing" technique, which is common in CQRS-based environments).
Some interesting links:
http://cqrsinfo.com
http://abdullin.com/cqrs
http://blog.fossmo.net/post/Command-and-Query-Responsibility-Segregation-(CQRS).aspx
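To make the split concrete, here is a minimal, purely in-memory sketch of the two sides (all names are made up; a real system would persist the events and update the read model asynchronously):

```typescript
// Write side: commands are validated and turned into events (the "expensive" path).
type MessageSent = { type: "MessageSent"; to: string; body: string; at: number };

const eventLog: MessageSent[] = [];                       // stand-in for the write store

function handleSendMessage(to: string, body: string): void {
  if (body.length === 0) throw new Error("empty message"); // business rule validation
  const event: MessageSent = { type: "MessageSent", to, body, at: Date.now() };
  eventLog.push(event);
  project(event);                                         // in real CQRS this is async
}

// Read side: a denormalized view optimized for the screen that shows an inbox.
const inboxView = new Map<string, { body: string; at: number }[]>();

function project(event: MessageSent): void {
  const inbox = inboxView.get(event.to) ?? [];
  inbox.push({ body: event.body, at: event.at });
  inboxView.set(event.to, inbox);
}

function queryInbox(client: string): { body: string; at: number }[] {
  return inboxView.get(client) ?? [];                     // cheap read, no business rules
}

handleSendMessage("client-7", "hello");
console.log(queryInbox("client-7"));
```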
Are you ready for some reading? There are a lot of options, but I believe you should start by learning about the advantages of modern distributed NoSQL DBs, and learn from the experience gained at Facebook, LinkedIn and other big players. Start here:
http://highscalability.com/
http://nosql-database.org/
