Which DHT algorithm to use (if I want to join two separate DHTs)? - p2p

I've been looking into some DHT systems, especially Pastry and Chord. I've read some concerns about Chord's reaction to churn, though I believe that won't be a problem for the task I have at hand. I'm implementing a sort of social network service that doesn't rely on any central servers, for a course project. I need the DHT for the lookups.
Now, I don't know all of the servers in the network at the beginning. As I've stated, there's no main tracker server. It works this way: each client has three dedicated servers. The three servers hold the client's profile, its wall and its personal info, replicated. I only get to know about another group of servers when the user adds a friend (by inputting that client's address). So I would create two separate DHTs on the two groups of three servers, and when the clients friend each other I would like to join the DHTs. I would like to do this consistently. I haven't had a lot of time to get familiar with the protocols, so I would like to know: which one is better if I want to join two separate DHTs?

Distributed hash tables are designed to automatically handle the problem of finding a node that stores a given piece of data. So, in the DHT design philosophy, you wouldn't have a dedicated server for the profile, wall, etc... you'd have a dedicated data-identifier for each of those and the DHT would handle placing the data amongst active servers and finding the correct server for a given piece of data.
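To make that concrete, here is a minimal Chord-style sketch of the placement idea: node addresses and data names are hashed into the same identifier space, and each key is stored on the first node clockwise from its identifier. The addresses and key names below are invented for illustration, and a real Chord or Pastry node would keep finger/routing tables rather than a full node list.

    import hashlib

    def ident(value: str) -> int:
        """Hash a string into a 160-bit identifier space shared by nodes and data."""
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def successor(key_id: int, node_ids: list[int]) -> int:
        """Return the first node id clockwise from key_id (the node responsible for it)."""
        ring = sorted(node_ids)
        for n in ring:
            if n >= key_id:
                return n
        return ring[0]  # wrap around the ring

    # Each piece of a user's data gets its own identifier; the DHT, not a
    # dedicated server, decides which live node is responsible for it.
    nodes = [ident(addr) for addr in ("10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000")]
    for key in ("alice/profile", "alice/wall", "alice/info"):
        print(key, "->", successor(ident(key), nodes))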
Pastry and Chord are pretty similar in terms of functionality and differ mostly in how they handle neighbor sets and routing. It's not clear to me that one would be better than the other for this sort of application.
A good technical comparison paper is A performance vs. cost framework for evaluating DHT design tradeoffs under churn (PDF), from Infocom 2005, if you really want details.

Related

How to slice down a monolith where my domain depends heavily from other domains?

I'm currently learning about Microservices because in the company where I work, we will split down our giant monolith into microservices.
We have a lot of business logic in our application, but these rules basically validate data from many different domains and act according to the status of that data.
Our domain has its own data, of course, but I would even say that something like 60-70% of the data we depend on comes from other domains, which makes our domain kind of an aggregator.
I created a little diagram to illustrate it:
So like I said before, my domain (Domain A) has a lot of business logic to validate data and status from all those different domains. And then after this validation, take the appropriate actions and save the result of it in the DB.
I feel like I've hit a dead end: I have read a few articles on how to break down a monolith, but I haven't found a good example that explains this situation.
So I ask you guys, do you have any suggestions to tell me? :)
Thanks!
In DDD speak a bounded context approximates a microservice. The different domains in your diagram are probably going to be bounded contexts.
You most certainly do not want concepts from various BCs polluting each other as you are going to end up with quite a mess.
There is one place, however, where you may run into this and that is on integration/orchestration. Here you should approach this concern almost as a separate BC that relates to the orchestration or integration.
For instance, let's assume that you have an Assets domain and an Accounting domain. The two should know nothing about each other. However, when you decommission an asset (say some huge machine that grinds down rocks into stones) you perhaps need to have the accounting domain register some write-off value if the asset has not reached the end of its useful life. In this layer you would integrate the various bits of your process and manage the state using a process manager. Although the process manager, and related state, may belong to the Assets BC the AssetsOrchestration BC may make use of objects from both the Assets as well as the Accounting BCs. Typically you would attempt to limit that interaction to, say, messages using some messaging infrastructure but YMMV.
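As a rough illustration of that orchestration layer (all names here are hypothetical, not taken from any particular framework), a process manager can subscribe to an Assets event and translate it into an Accounting command, so that neither BC references the other's model directly:

    from dataclasses import dataclass

    @dataclass
    class AssetDecommissioned:      # event published by the Assets BC
        asset_id: str
        remaining_book_value: float

    @dataclass
    class RegisterWriteOff:         # command handled by the Accounting BC
        asset_id: str
        amount: float

    class AssetDecommissionProcessManager:
        """Lives in the AssetsOrchestration BC and only passes messages along."""

        def __init__(self, send_command):
            self.send_command = send_command   # e.g. a message-bus publish function

        def handle(self, event: AssetDecommissioned):
            # Orchestration logic: only react when there is value left to write off.
            if event.remaining_book_value > 0:
                self.send_command(RegisterWriteOff(event.asset_id, event.remaining_book_value))

    # Usage: subscribe the manager to Assets events; it emits Accounting commands.
    pm = AssetDecommissionProcessManager(send_command=print)
    pm.handle(AssetDecommissioned("crusher-42", remaining_book_value=12_500.0))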
A good starting point may be Sam Newman's Microservice Decomposition Patterns.
About 18 minutes into the talk, Newman offers this pattern:
Within your monolith, identify the coarse modules of functionality.
Draw the dependency graph.
OK, your dependency graph has cycles in it. That's no good; you want an acyclic graph. So iterate on the graph until you are able to eliminate the cycles. DO NOT PASS GO until you have this done.
Identify candidate modules that have no inbound dependencies - you want the bits that nothing else depends on.
Pick one, and see if you can redesign your system to allow you to deploy that module independently of the monolith.
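A small sketch of the dependency-graph steps above, using Python's standard graphlib to check the graph for cycles and to list modules that nothing else depends on; the module names are made up for illustration.

    from graphlib import TopologicalSorter, CycleError

    # edges: module -> set of modules it depends on
    deps = {
        "billing":   {"customers", "catalog"},
        "customers": set(),
        "catalog":   set(),
        "reporting": {"billing", "customers"},
    }

    try:
        tuple(TopologicalSorter(deps).static_order())   # raises CycleError if cyclic
    except CycleError as e:
        print("cycle found, keep iterating on the design:", e.args)

    # Candidate modules for extraction: no inbound dependencies.
    inbound = {m for targets in deps.values() for m in targets}
    candidates = [m for m in deps if m not in inbound]
    print("candidates to extract first:", candidates)   # e.g. ['reporting']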

Best way to manage 1000+ users registering for conference events at once

We are currently in the process of organising a student conference.
The issue is that we offer several different events at the same time over the course of a week. The conference runs the whole day.
Registration is currently operating on a first come, first served basis; however, this has led to dramatic problems in the past, namely the server crashing almost immediately as 1000+ students all try to get into the best events as quickly as they can.
Is anyone aware of a good way to handle this so that each user has the best chance of enrolling in the events they wish to attend: firstly without the server crashing, and secondly with people registering for events that have a maximum capacity, all within a few minutes? Perhaps by somehow staggering the registration process, or something similar?
I'm aware this is a very broad question, however I'm not sure where to look to when trying to solve this problem...
Broad questions have equally broad answers. There are broadly two ways to handle it:
Write more performant code so that a single server can handle the load.
Optimize backend code; cache data; minimize DB queries; optimize DB queries; optimize third party calls; consider storing intermediate things in memory; make judicious use of transactions trading off consistency with performance if possible; partition DB.
Horizontally scale - deploy multiple servers. Put a load balancer in front of your multiple front end servers. Horizontally scale DB by introducing multiple read slaves.
There are no quick fixes. It all starts with analysis first - which parts of your code are taking most time and most resources and then systematically attacking them.
Some quick fixes are possible; e.g. search results may be cached. The cache might be stale, so there would be situations where the search page shows that seats are available when in reality the event is full. You handle such cases at registration time. For caching web pages, use a caching proxy.
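As a sketch of that "decide at registration time" point, assuming a relational database with an events table holding a seats_left counter (an assumption about your schema): a conditional UPDATE makes the capacity check atomic, so a stale cached search page can at worst show an optimistic seat count, but the event is never oversold.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, seats_left INTEGER)")
    db.execute("INSERT INTO events VALUES (1, 2)")   # event 1 has 2 seats

    def register(event_id: int, student_id: str) -> bool:
        # Atomic decrement only while seats remain; no read-then-write race.
        # (student_id would go into a registrations table, omitted here.)
        cur = db.execute(
            "UPDATE events SET seats_left = seats_left - 1 "
            "WHERE id = ? AND seats_left > 0", (event_id,))
        db.commit()
        return cur.rowcount == 1   # False -> full, even if the cached page said otherwise

    print([register(1, s) for s in ("alice", "bob", "carol")])   # [True, True, False]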

Distributed Domain Driven Design Resources

I am quite confident with developing DDD applications, but one area which is continuing to cause me problems is when two applications integrate with each other. I am struggling to find any useful books or resources on the subject. Books such as Patterns of EAI go into depth about messaging patterns and message construction, but don't really explain how to architect systems that make use of these patterns.
I've searched high and low and I'm quite sure there are no sample applications that demonstrate how to integrate two systems. I understand the concept of asynchronous messaging, but again I can't find good examples of how to apply it.
Resources on SOA seem to keep repeating the same concepts without demonstrating how to implement them, and more often than not seem more concerned with selling me products.
Here are the sort of questions I am struggling to answer:
Should each application have its own copy of the data? For example, should every application within an organisation have its own list of clients, which it updates upon receipt of a message?
At what point in the DDD stack are messages passed? Are they the result of domain events?
Can I combine asynchronous messaging and WCF, or do I have to choose? Do I use WCF for request/response and messaging for publish/subscribe?
How does one DDD application consume the services of another? Should one DDD application query another system for its data via its application services, or should it already have its own local copy of the data, as mentioned in point 1?
Apparently I can't have a transaction across two systems. How do I avoid this?
If I sound confused it's because I am. I'm not looking for answers to the above questions, just a pointer in the direction of resources that will answer these and similar questions.
I've been making a similar transition. My advice:
Start at http://cqrsinfo.com/.
Listen to the Distributed Podcast.
Catch any of Greg Young's talks. For example, here is Eric Evans interviewing Greg. He's got some all-day sessions that are recorded as well.
Read/listen to anything from Udi Dahan (podcasts, lectures, articles, etc.). He's got some good stuff on InfoQ.
Wait for Greg's book.
Read whatever you can find on EDA (Event Driven Architecture).
In addition to what Eric Farr said, it may be worth looking closely at Part 4 of the DDD book (Strategic Design). It does not approach the problem from a 'Distributed' angle, but it has a lot of information on how to integrate applications (Bounded Contexts). Patterns like the Anticorruption Layer can be very helpful when designing at the boundary of the application.
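To give a flavour of what an Anticorruption Layer at such a boundary can look like (the upstream payload shape and all field names here are assumptions, purely for illustration), the local context translates the other system's representation into its own model instead of consuming it directly:

    from dataclasses import dataclass

    @dataclass
    class Customer:                 # local domain model
        customer_id: str
        display_name: str

    class ClientDirectoryAcl:
        """Translates messages/DTOs coming from the upstream CRM context."""

        def to_local(self, upstream: dict) -> Customer:
            # Upstream speaks in its own terms ("clientRef", "tradingName");
            # the ACL keeps that vocabulary out of the local domain.
            return Customer(
                customer_id=str(upstream["clientRef"]),
                display_name=upstream.get("tradingName") or upstream["legalName"],
            )

    acl = ClientDirectoryAcl()
    print(acl.to_local({"clientRef": 981, "legalName": "Acme Ltd", "tradingName": "Acme"}))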

Considerations regarding a p2p social network

While there are many social networks in the wild, most rely on data stored on a central site owned by a third party.
I'd like to build a solution where data remains local on members' systems. Think of the project as an address book which automagically updates a contact's data as soon as that contact changes its coordinates. This base idea might get extended later on...
Updates will be transferred using public/private-key cryptography via a central host. The sole role of the host is to be a store-and-forward intermediary. Private keys remain private on each member's system.
If two clients are both online and a p2p connection can be established, the clients can transfer data telegrams without the central host.
Thus, sender and receiver will be the only parties able to create authentic messages.
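A minimal sketch of that telegram idea, assuming the PyNaCl library (an assumption; any NaCl/libsodium binding would do): the sender signs with its private signing key and encrypts to the receiver's public key, so the store-and-forward host can neither read nor forge updates.

    from nacl.public import PrivateKey, Box
    from nacl.signing import SigningKey

    sender_sign = SigningKey.generate()            # sender's long-term signing key
    sender_box, receiver_box = PrivateKey.generate(), PrivateKey.generate()

    def make_telegram(payload: bytes) -> bytes:
        signed = sender_sign.sign(payload)                                # authenticity
        return Box(sender_box, receiver_box.public_key).encrypt(signed)  # confidentiality

    def open_telegram(blob: bytes) -> bytes:
        signed = Box(receiver_box, sender_box.public_key).decrypt(blob)
        return sender_sign.verify_key.verify(signed)   # raises if tampered with

    print(open_telegram(make_telegram(b"new address: 10.1.2.3")))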
Questions:
Are there certain protocols I should adopt?
Are there any security concerns I should keep in mind?
Are there certain services that should be integrated or used somehow?
More technically:
Use e.g. Amazon or Google provided services?
Or is it better to use a raw web server? If yes, why?
Which algorithm and key length should be used?
UPDATE-1
I googled my own question title and found this academic project, developed in 2008/09: http://www.lifesocial.org/.
The solution you are describing sounds remarkably like email, with encrypted messages as the payload, and an application rather than a human being creating the messages.
It doesn't really sound like "p2p" - in most P2P protocols, the only requirement for central servers is discovery - you're using store & forward.
As a quick proof of concept, I'd set up an email server and build an application that sends emails to addresses registered on that server, encrypted using PGP - the tooling and libraries are available, so you should be able to get that up and running in days rather than weeks. In my experience, building a throw-away PoC for this kind of question is a great way of sifting out the nugget of the idea.
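For what that PoC could look like, here is a sketch only, assuming GnuPG plus the python-gnupg package, a local SMTP server, and a recipient key already imported into the keyring; the addresses are placeholders.

    import smtplib
    from email.mime.text import MIMEText
    import gnupg

    gpg = gnupg.GPG()
    update = "alice moved to 10.1.2.3"
    encrypted = gpg.encrypt(update, "bob@example.org")   # ASCII-armored PGP payload

    msg = MIMEText(str(encrypted))
    msg["From"], msg["To"], msg["Subject"] = "alice@example.org", "bob@example.org", "contact-update"

    with smtplib.SMTP("localhost") as smtp:              # store-and-forward via plain email
        smtp.send_message(msg)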
The second issue is that the nature of a social network is that it's a network. Your design may require you to store more than the data of the two direct contacts - you may also have to store their friends, or at least the public interactions those friends have had.
This may not be part of your plan, but if it is, you need to think it through early on - you may end up having to transmit the entire social graph to each participant for local storage, which creates a scalability problem....
The paper about Safebook might be interesting for you.
You could also take a look at other distributed OSNs and see what they are doing.
None of the federated networks mentioned on http://en.wikipedia.org/wiki/Distributed_social_network is actually distributed. What Stefan intends to do is indeed new and was only explored by some proprietary folks.
I've been thinking about the same concept for the last two years. I've finally decided to give it a try using Python.
I've spent the better part of last night and this morning writing a sockets communication script and server. I also plan to remove the central server from the equation, as it's just plain cumbersome and there's no point to it when all the members could keep copies of their friends' keys.
Each profile could be accessed via a hashed string of someone's public key. My social network relies on nodes and pods. Pods are computers which have their ports open to the network; they help with relaying traffic, as most firewalls block incoming socket requests. Nodes store information and share it with other nodes. Each node will get a directory of active pods which may be used to relay its traffic.
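A small sketch of the "profile addressed by a hash of the public key" part of that design; the key material and pod list are made up for illustration.

    import hashlib

    def profile_id(public_key_pem: bytes) -> str:
        """Stable identifier derived from a member's public key."""
        return hashlib.sha256(public_key_pem).hexdigest()

    active_pods = ["pod1.example.net:7000", "pod2.example.net:7000"]  # relays with open ports

    def pick_relay(pid: str) -> str:
        # Deterministically spread profiles across the known pods.
        return active_pods[int(pid, 16) % len(active_pods)]

    pid = profile_id(b"-----BEGIN PUBLIC KEY----- ...example... -----END PUBLIC KEY-----")
    print(pid[:16], "->", pick_relay(pid))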
The PeerSoN project looks like something you might be interested in: http://www.peerson.net/index.shtml
They have done a lot of research and the papers are available on their site.
Some thoughts about it:
protocols to use: you could look at existing P2P programs and how they are designed
security concerns: privacy. Take great care not to open doors: a whole system can get compromised because you have left some door open.
services: you could integrate with the regular social networks through their APIs
People will have to install a program on their computers and remember to open it every time, like any P2P client. Leaving everything on a web server means a smaller footprint and less need for user action.
Somehow you'll need a centralized server to manage searches. You can't just broadcast to the whole internet to find friends. Or you'll have to rely upon email requests to add someone, and to do that you'll need to know the email address in advance.
The fewer friends/contacts that use your program, the fewer people will want to use it, since it won't have contact information available.
I see that your server will be a store and forward, so the update problem is solved.

What came before web services and SOA?

I'm interested in the history of distributed, collaborative, cross-organisational programming paradigms - web services and SOA are de facto now, but what came before? What models have been superseded by SOA?
Thanks
Well, I suppose there was RPC - which is really what SOAP is, only they didn't piggy-back the data payload on top of a standard protocol (http in SOAP's case). So CORBA and DCE-RPC and ONC RPC all did the same thing, but only over internal networks, not over the internet.
There was also EDI as a 'standard' for exchanging data between disparate entities. This was effectively a way of defining what the data payload would look like (similar to the XML part of SOAP).
But these are still not SOAs really, they provide the same functionality but the big difference was how people thought of using them. Once you could write a machine-to-machine 'website' and have different machines talk to each other through them, it took off. You could do it before using CORBA, say, but it wasn't as easy or as widely known about. You can tell this has happened by the fact we have several terms used for effectively the same thing - SOA, SaaS, Web Services... all the same thing (but lots of money to be made 'consulting' on the difference ;) )
Maybe Silos?
...where services are just not shared across an enterprise, at least in a standard way. This is why products like BizTalk are used: to get silos to talk to each other via standard interfaces.
I don't really think you'll find anything that's been superseded by SOA. You will find that there's been progress in organizing computer programs to take advantage of SOA-type principles. As for programming models that have been in reasonably common use, well, let's see... CORBA, RPC, more generic client-server applications. Of course, computer-to-computer communications were preceded by process-to-process communication using a wide variety of conventions.
SOA as a philosophy of breaking large problems into smaller ones and then composing the results has been known and applied since humans started making bricks instead of building complete walls. Of course, that was mostly implicit. Explicit statements for SOA really started to come about with CORBA and, while SOA is independent of Web Services, the advent of HTTP and XML, and then SOAP, really started to make development of non-specialized "services" easier, more worthwhile and thus common.
This PDF, A Note on Distributed Computing, should be an interesting read. It is pre-SOA and gives an idea of the history up to that point (1994).
I would say distributed object technology. And before it remote procedure calls.
RPC is one of the earlier approaches and gained popularity from the Sun implementation. One of the famous uses is NFS (network file system).
As object oriented programming became more popular, distributed objects followed. Most important was Microsoft DCOM (and later COM+) and, more industry wide, CORBA.
SOA is a divide-and-conquer approach that is critically dependent on the concept of services, which are different from objects as used by CORBA et al., as well as from resources as in REST.
Objects are created and their lifetime is typically controlled by the client. On the other hand, services are assumed to be always there provided by the server. This is one reason why SOA is not equivalent to distributed objects.
Services are also stateless, which means that the server when considering the response to a service request need not look at the history of interaction with the client. This was not a consideration when originally devising the RPC concept as scalability wasn't such an important issue then. Interestingly, large scale users of RPC did notice the relationship between scalability and statelessness. The NFS RFC explicitly mentions stateless servers, though with reliability as the main concern. Anyway, statelessness is one of the main difference between services and plain old RPC.
In short, no. I don't believe in the revisionist history of SOA having been around since the dawn of time, any more than I believe the universe is written in Lisp (or Perl, for that matter). Nor is SOA equivalent to divide and conquer or the division of labour.
SOA started as a concept at some point in the nineties, overlapping with the development of CORBA. It is much harder to pinpoint an actual date or event, and there are more than a few claims to its conceptualisation.
