How does PNRP find other peers on the internet?

I'd like to know how PNRP manages to detect other peers in the Global cloud (which I assume is over the internet). I've read it uses "an architecture similar to distributed hash table systems", but that doesn't really tell me much.
It seems to me that at some point, my node has to send some sort of message to somewhere that eventually returns any other peer addresses the algorithm finds, but where does it send this initial message to if there's no central server to ask and it's not aware of any other nodes at this point?

If you are asking only about the Global PNRP cloud, then perhaps you will find useful the following passage from section 1.3.3.1, "Discovering a Cloud", of the PNRP protocol specification:
... To discover some nodes in the Global PNRP cloud, the discovering node
contacts one of two well-known seed servers whose addresses are
resolved via a Domain Name System (DNS) lookup ...
But I think you would probably like to read the full text and follow the links:
To discover nodes on the same link, a node uses the Simple Service
Discovery Protocol (SSDP) (as specified in [UPNPARCH1], section 1) to
discover other nearby nodes that are already in the cloud. If there
are no other nodes in the cloud of interest that exist on the node's
link, then the discovering node uses a seed server to find some. To
discover some nodes in the Global PNRP cloud, the discovering node
contacts one of two well-known seed servers whose addresses are
resolved via a Domain Name System (DNS) lookup. To discover some nodes
in a Site cloud, the discovering node must know the name or address of
a seed server via some other method (for example, manual
configuration, or supplied by an application).
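To make that bootstrap step concrete, here is a minimal Java sketch of the DNS part of the process. The seed server hostname below is a placeholder, not one of the real well-known names (those are defined by the specification):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of the bootstrap step the spec describes: the joining node
// knows no peers yet, so it falls back to a DNS lookup of a well-known
// seed server. The hostname here is hypothetical.
public class SeedServerLookup {
    public static void main(String[] args) throws UnknownHostException {
        String seedServerHost = "pnrp-seed.example.com"; // placeholder name
        InetAddress[] seeds = InetAddress.getAllByName(seedServerHost);
        for (InetAddress seed : seeds) {
            // Each resolved address is a candidate entry point; the node
            // would then ask it for addresses of other cloud members.
            System.out.println("Candidate seed server: " + seed.getHostAddress());
        }
    }
}
```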

Related

What's the best way to share objects between webservers on different hosts

TL;DR:
I am wondering what is the best way to reliably share an object or other data between n number of webservers on n number of machines?
I have looked at the likes of Redis, but it seems that this is not what I am actually looking for here. I am now thinking that something like IPC over the network / RPC might be more appropriate. Is there a better way to do this, given that it will be called at minimum 10 times over a 30-second interval, a rate which can grow rapidly as the number of users running servers grows too?
Example & current use case:
I run a multiplayer mod for a game which receives a decent level of traffic and we are starting to notice cases where requests get dropped sometimes. The backend webserver is written in NodeJS and uses express in a couple of places too. We are in the process of restructuring the system and we have now come to restructuring the part of the system that handles a heartbeat from each server that members of the public host. This information is then shared out to the players so they can decide which server to join.
Based on my own research I am looking to host the service on several different machines for redundancy. These machines are then linked over VLAN / vSwitch so that they have a secure method of communicating with each other. The database system is already set up to replicate this way; however, I cannot see a performance-minded way to handle the sharing of objects containing information about the servers that have communicated with each webhost.
If it helps the system works something like this:
Users server -> my load balancer -> webhost (backend).
Player -> my load balancer -> webhost (backend) returns info on all currently online servers.
What is currently in use, as in the example above, is a single-instance webserver which handles the requests and processing needed.
Just an idea while the community proposes answers: consider reading about Apache Thrift. It is not so much IPC-like as RPC-like. If the architecture of your servers, or of the different components of the "backend network", is a "star" with one "master", I would consider that possibility.
If the architecture of your backend is not like that, but rather a group of "independent" entities, it comes to my mind to solve this with some "data bus", such as a private MQTT broker and a group of members subscribing to or publishing data for the rest of the network. The most optimal serialization strategy for the object would, in my opinion, be Google Protobuf.
The integration of MQTT with NodeJS is very simple, and if the weight of the packets is not too big and you can accept some latency, I would really recommend you run some tests using MQTT with publish/subscribe at QoS=2. It would not take great effort to substitute the underlying communications library that you are using.
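To illustrate the data-bus idea, here is a minimal sketch using the Eclipse Paho Java client (the same pattern translates directly to NodeJS). The broker URL, client ID, and topic names are assumptions for illustration:

```java
import org.eclipse.paho.client.mqttv3.*;

// Sketch of the "data bus": every webhost publishes its view of the
// game servers it has heard from, and subscribes to everyone else's.
// Broker URL and topics are made up for this example.
public class HeartbeatBus {
    public static void main(String[] args) throws MqttException {
        MqttClient client = new MqttClient("tcp://broker.internal:1883", "webhost-1");
        client.setCallback(new MqttCallback() {
            public void connectionLost(Throwable cause) { /* reconnect logic */ }
            public void messageArrived(String topic, MqttMessage msg) {
                // Merge the serialized server list (e.g. a Protobuf blob)
                // into local state here.
                System.out.println(topic + ": " + msg);
            }
            public void deliveryComplete(IMqttDeliveryToken token) { }
        });
        client.connect();
        client.subscribe("servers/heartbeats/#", 2); // QoS 2, as suggested above
        byte[] payload = "server-42:online".getBytes(); // stand-in for Protobuf
        client.publish("servers/heartbeats/webhost-1", payload, 2, false);
    }
}
```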
That said, it seems there is another solution: Kafka, which looks very interesting (though I don't really know it).
Your choice will depend mostly on the nature of your data: the weight of the packets, the frequency per user, and the latency you are willing to accept in the worst of scenarios.

Can I run a microservice which keeps a port open in the cloud?

I'm new to microservices. I envision them as a set of processes running on two or more machines (I suppose that, for a given process, two instances must run on separate machines for reliability). In that setup, depending on the kind of clients I have, there may be one process working as a TCP server serving on a specific high port and speaking a non-HTTP protocol.
However, for my low-bandwidth, testing purposes, I haven't found a free cloud service which provides that kind of environment (machines to run processes on – say, Java on Linux – while keeping a high port open).
Maybe the facilities I'm expecting are only available to paying customers, or maybe implementing a microservice architecture in the cloud goes beyond simply running processes in machines and sharing a database? Could someone clarify? (and if possible direct me to one such free service)
Yes, you are right when you say microservices are more about independent services (processes) that can be deployed on one or more cloud machines. Each service can communicate with the others using non-HTTP protocols such as message brokers, Thrift, Remote Procedure Calls (RPC), etc.
From an architectural point of view, services should mostly be decoupled enough to handle the complexity of distributed computing; see the image at the Microservices Architecture link.
There is also the concept of an API Gateway, which can be used for authentication and for service registration and discovery.
Coming back to your question: you can test microservices on a single cloud machine (by running each service on a different port) and use an API Gateway to discover the service paths. For reference, here are links which are worth a look:
For concepts, see Microservices.io and this Stack Overflow question.
For implementation: ZooKeeper and Auth0 (this is what I'm using).
If you are a Java lover, the InfoQ article is great to look at.
Some of the free services that might help in building and testing microservices are Google App Engine and hook.io.
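For what it's worth, the kind of non-HTTP service the question describes can be as small as the Java sketch below; the port (9090) and the line-based echo protocol are arbitrary stand-ins:

```java
import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal sketch of a process that keeps a high TCP port open and
// speaks its own (non-HTTP) line-based protocol.
public class EchoService {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9090)) {
            while (true) {
                Socket conn = server.accept(); // blocks until a client connects
                new Thread(() -> handle(conn)).start();
            }
        }
    }

    private static void handle(Socket conn) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
             PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("echo: " + line); // stand-in for a real protocol
            }
        } catch (IOException ignored) {
        }
    }
}
```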

Using Hazelcast as a service directory?

I am exploring the notion of using Hazelcast (or another caching framework) to advertise services within a cluster. Ideally, when a cluster member departs, its services (or the objects advertising them) should be removed from the cache.
Is this at all possible?
It is possible for sure.
The question is: which solution do you prefer?
If the services can be stored in a map, you could create a map with a TTL of e.g. a few minutes, and each member would need to refresh its entries to prevent its services from expiring.
An alternative solution is to listen for member changes using a MembershipListener and, once a member leaves, remove from the map the services that belong to that member.
If you don't like either of these, you could create your own SPI-based implementation. The SPI is the lower-level infrastructure used by Hazelcast to create its distributed data structures. A lot more work, but also a lot more flexibility.
So there are many solutions.
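As a rough illustration of the first (TTL-based) option, here is a minimal Java sketch; the map name, service key, address, and TTL are all made up, and the imports assume the Hazelcast 3.x package layout:

```java
import java.util.concurrent.TimeUnit;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

// Each member advertises its service in a shared map with a short TTL
// and re-puts it periodically; if the member dies, its entry expires
// on its own and the service disappears from the directory.
public class ServiceDirectory {
    public static void main(String[] args) throws InterruptedException {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> services = hz.getMap("service-directory");
        while (true) {
            // Re-advertise; the entry expires 2 minutes after the last put.
            services.put("member-1/my-service", "tcp://10.0.0.5:9090",
                         2, TimeUnit.MINUTES);
            Thread.sleep(30_000); // refresh well inside the TTL window
        }
    }
}
```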

What are the best papers for learning about algorithms for communicating updates in a distributed system?

I have a distributed system in mind (multiple nodes in a single datacenter) that I want to have the following properties:
Nodes can enter and leave the system at any time.
There is no data replication between nodes.
Which node a client makes use of is up to the client (i.e., it could be consistent hashing, it could be something else).
No master (i.e., no central point of failure).
Each node may receive a piece of information that needs to be forwarded to the rest of the nodes.
What algorithms (links to papers are best) are suitable for this?
(I assume some of the answers will include P2P algorithms, but most of them that I've encountered in the past have acted more like distributed hash tables, where nodes enter and take over some part of the keyspace, etc. I also recognize that multicast with simple UDP messages might be appropriate here, but what existing work would help make the messaging reliable?)
How about trying to implement ad hoc nodes with JXTA? See the Practical JXTA II book, available online at Scribd.
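Beyond JXTA, the requirements above (no master, nodes coming and going, every node forwarding updates) are the classic fit for epidemic/gossip dissemination, which many of the papers in this space describe. A minimal, hypothetical sketch of one push-gossip round in Java:

```java
import java.util.List;
import java.util.Random;

// Illustrative push-gossip round: a node holding a new piece of
// information forwards it to a few randomly chosen peers, and every
// recipient does the same until the update has spread everywhere.
// All names here are invented for illustration.
public class GossipRound {
    static final int FANOUT = 3; // peers contacted per round

    interface Peer { void send(String update); }

    static void gossip(String update, List<Peer> peers, Random rnd) {
        for (int i = 0; i < FANOUT && !peers.isEmpty(); i++) {
            Peer target = peers.get(rnd.nextInt(peers.size()));
            target.send(update); // recipient re-gossips unless already seen
        }
    }
}
```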

Using Linux Virtual Server for load balancing of zones in MMO game

I'm a developer of an MMO game, and at my company we're currently facing some scalability issues which, I think, can be resolved with proper clustering of the game world.
I don't really want to reinvent the wheel; that's why I think Linux Virtual Server could be a good choice, especially with some Layer 7 load balancing technique.
I'm currently looking at ktcpvs as a load balancing solution and wonder if it's a proper choice.
The main idea is to have a number of zones ("locations" in terms of my game) running on dedicated servers. When a player decides to go to some specific location, the load balancer decides which zone server will actually serve the player (that's actually why I need a Layer 7 load balancer).
What do you folks think about all said above?
Update: I posted the same question to LVS users mailing list http://marc.info/?l=linux-virtual-server&m=124976265209769&w=2
Update: I also started the similar topic on the gamedev.net forum http://www.gamedev.net/community/forums/topic.asp?topic_id=544386
In order to address your question, we need to understand whether you need volume or response time; it is difficult to get both at the same time.
Layer 7 load balancing is data-based, application-level balancing: the data content of the network packet is used to route it to an end-point. You can achieve volume (more users) by implementing routing at the application level, the service level, or the kernel level.
Scalability - I assume you are running out of memory, CPU resources and network bandwidth.
Application level - your application logic receives an application packet and routes accordingly.
Service level - your system framework (a front-end service of some kind) receives the packet and performs the routing through a module (think of a custom Apache module, or even network driver modules - like writing a network filter).
Kernel level - Performs routing at network packet level.
The closer you move to the metal, the better your response will be. I suggest using a dedicated Linux server up front to perform the routing - go native, not virtual. Use multiple or teamed network adapters for the WAN and a dedicated adapter for each end-point (one or more for the WAN, one for each connected app server).
If response time is important, then you need a kernel/supervisor-state solution; it will save you a few context switches. But be aware that you need to limit hops at all costs, that you could be better served by fewer, larger machines, and that your scalability will always be limited. There is a risk in using KTCPVS: it is quite old and not actively updated. If you judge that it works for you, great; otherwise consider writing something akin to a network filter, as long as it runs in system state.
If volume is important but response time is secondary, implement a custom-built high-speed socket switch in C++ running in problem/user state. It is the easiest to maintain and will offer the best scalability.
You will need to build some prototypes to figure out what suits your needs best.
Final thoughts -
Before doing any of the above, first ensure that you have optimized your game design. You may know most of this; I list it here for the benefit of all.
(a) Messages should fit comfortably within one network packet: less than 1500 bytes for most home routers.
(b) Try to fit the routing logic in your game client instead of your servers. A simple download of a small table with zones and IP addresses to the client would allow you to forego all of the above (see the sketch after this list).
(c) Try to limit zone visibility to the clients: they should know about their own zone and adjacent zones only (if you implement point (b) above).
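As a rough sketch of point (b), the client-side table can be as simple as this; zone names and addresses are invented for illustration:

```java
import java.net.InetSocketAddress;
import java.util.Map;

// The client downloads a small zone -> address table and picks the
// right zone server itself, so no Layer 7 balancer is needed.
public class ZoneTable {
    static final Map<String, InetSocketAddress> ZONES = Map.of(
        "harbor",  new InetSocketAddress("10.0.1.10", 7000),
        "forest",  new InetSocketAddress("10.0.1.11", 7000),
        "capital", new InetSocketAddress("10.0.1.12", 7000));

    static InetSocketAddress serverFor(String zone) {
        return ZONES.get(zone); // client connects directly to this address
    }
}
```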
Hope this helps, sorry I cannot be more specific regarding KTCPVS.
You haven't specified where the bottleneck is. Network Traffic? Disk IO? CPU Cycles?
Assuming you mean a Layer 7 load balancer and don't have enough CPU power, I think LVS is not the optimal choice. I have done web server load balancing with LVS, which works in a straightforward way and isn't exactly complicated.
But I think load balancing an MMORPG this way would need considerable amounts of additional code in LVS; it might be easier to do the load balancing with a multithreaded application distributed over a multicore server. But that isn't fully scalable: it only gets you to about 16 cores without prohibitive cost increases.
The biggest issue in something like this is what happens when players are near a boundary. Obviously they need to be able to see and interact with each other, but they're on separate servers. So you need some pretty fancy inter-server communication, sometimes just duplicating messages to both servers. It can get even more complicated when someone is near a "corner", and then you have to deal with 4 servers!
The book Massively Multiplayer Game Development has a chapter on "The Pitfalls of Shared Server Boundaries" which covers this issue in detail.
I haven't heard of Linux Virtual Server before now, so I don't understand how it fits. I think your actual server application needs to support this game-specific load balancing, rather than trying to run a cluster and assuming that it will automatically know how to split up your application (which it won't). If I were you, I would write the server program to handle its own piece of land, and it should connect to the pieces of land around it, and then design a server-to-server protocol for the passing of these messages ("here comes a player, I'm going to start telling you about him!" "make sure to tell me about messages near our boundary", "okay the player is out of my territory and into yours, here's his detailed data", etc). I think it's a bit more complicated than just running a different flavor of Linux and assuming you'll get automatic load balancing.
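To make that concrete, a hypothetical sketch of the handoff message types such a server-to-server protocol might define (all names invented):

```java
// Sketch of the server-to-server handoff messages described above;
// field names and types are invented for illustration.
public class HandoffMessages {
    enum Type {
        PLAYER_APPROACHING,   // "here comes a player, I'm going to start telling you about him!"
        BOUNDARY_SUBSCRIBE,   // "make sure to tell me about messages near our boundary"
        PLAYER_TRANSFER       // "the player is out of my territory and into yours, here's his detailed data"
    }

    static class Message {
        Type type;
        long playerId;
        byte[] playerState; // serialized detailed data, sent with PLAYER_TRANSFER
    }
}
```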
Why are you moving the distribution logic to the load balancer? It's a component that isn't free and can break. Your clients seem quite aware of which zone they're in, so they could very well connect to zone<n>.example.com; you'd then handle load balancing at the DNS level.
