If you create a new application which uses a distributed hash table (DHT), you need to bootstrap the p2p network. I had the idea that you could join an existing DHT (e.g. the Bittorrent DHT).
Is this feasable? Of course, we assume the same technology. Combining Chord with Kademlia is obviously not feasable.
If yes, would this be considered parasitic or symbiotic? Parasitic meaning that it conflicts with the original use somehow. Symbiotic, if it is good for both applications as they support each other.
In general: Kademlia and Chord are just abstract designs, while implementations provide varying functionality.
If its feature-set is too narrow you won't be able to map your application logic onto it. If it's overly broad for your needs it might be a pain to re-implement if no open source library is available.
For bittorrent: The bittorrent DHT provides 20byte key -> List[IP,Port] lookups as its primary feature, where the IP is determined by the sender IP and thus cannot be used to store arbitrary data. There are some secondary features like bloom filter statistics over those lists but they're probably even less useful for other applications.
It does not provide general key-value storage, at least not as part of the core specification. There is an extension proposal for that
Although implementations provide some basic forward-compatibility for unknown message types by treating them like node lookup requests instead of just ignoring them that is only of limited usefulness if your application supplies a small fraction of the nodes, since you're unlikely to encounter other nodes implementing that functionality during a lookup.
If yes, would this be considered parasitic or symbiotic?
That largely depends on whether you are a "good citizen" in the network.
Does your implementation follow the spec, including commonly used extensions?
Does your general use-case stay within an order of magnitude compared to other nodes when it comes to the traffic it causes?
Is the application lifecycle long enough to not lie outside the expected churn rates of the target DHT?
Related
I know DTO is returned by the server-side and received by the client-side, but I am confused by the representation object in DDD. I think they are almost the same. Can someone tell me their differences?
Can someone tell me their differences?
They solve different problems in different contexts
Data transfer is a boundary concern - how do we move information from here to there (across a remote interface)? Among the issues that you may run into: the transfer of information is slow, or expensive. One way of keeping this under control is to move information in a larger grain.
the main reason for using a Data Transfer Object is to batch up what would be multiple remote calls into a single call -- Martin Fowler, Patterns of Enterprise Application Architecture
In other words, a DTO is your programs representation of a fat message.
In DDD, the value object pattern is a modeling concern; it is used to couple immutable representations of information and related computations.
A DTO tends to look like a data structure, with methods that can be used to transform that data structure into a representation (for example: an array of bytes) that can be sent across a boundary.
A value object tends to look like a data structure, with methods that can be used to compute other information that is likely to be interesting in your domain.
DTO tend to be more stable (or at least backwards compatible) out of necessity -- because producer and consumer are remote from one another, coordinating a change to both requires more effort than a single local change.
Value objects, in contrast, are easier to change because they are a domain model concern. IF you want to change the model, that's just one thing, and correspondingly easier to coordinate.
(There's kind of a hedge - for system that need persistence, we need some way to get the information out of the object into a representation that can be stored and retrieved. That's not necessarily a value object concern, especially if you are willing to use general purpose data structures to move information in and out of "the model".)
In the kingdom of nouns, the lines can get blurry - partly because any information that isn't a general purpose data structure/primitive is "an object", and partly because you can often get away with using the same objects for your internal concerns and boundary cnocerns.
I'm learning about microservices.
On one hand, the literature recommends using asynchronous event-publishing for microservices that need to collaborate on sagas or take action on events published by other services.
On the other hand, the same literature recommends not using a shared library to define common events because that couples the microservices through that event library.
Am I taking crazy pills? Aren't those microservices coupled by those events anyway if they rely on them? If so, what is the advantage of coding the exact same events with the same definition in two (or even more) different places? Isn't that a total violation of the DRY principle?
I'm starting to smell a code smell that starts with the initials BS. Will someone help me drink the rest of this koolaid? Or did I just see the emperor with his clothes off for a second?
If so, what is the advantage of coding the exact same events with the same definition in two (or even more) different places?
There could be a number of advantages -- the microservices might be implemented using different languages. Or using the same language, but different in memory representations of the data to suit there specific needs. Or even the "same" in memory representations, but different versions, because they are on different deployment schedules.
There's nothing inherently wrong with sharing the labor of preparing a messaging library among the implementations of your services. But that should be an opt-in, rather than being a requirement. In particular, a team always has the option of replacing the library if the shared implementation is getting in the way.
Two services that agree that the messages are going to use UTF-8 encoded JSON documents should not be required to use the same parser -- the choice of parser is an implementation detail. The coupling is to the schema (the agreement about the semantics of the bytes in the message), not to the implementation.
If you treat events as plain data objects, you don't need a library to deal with them - other than generic messagning and serialization/deserialization code.
The whole point of microservices is to have independent development cycles, so as soon as you introduce the common library, you are starting to make a "distributed monolyth". Any change in this library will cause a redeployment of all microservices.
Without event-specific library the only dependency you introduce it a knowledge of particular event structure from another microservice. Well, this is a necessary evil.
I've read a few posts relating to this, but i still can't quite grasp how it all works.
Let's say for example i was building a site like Stack Overflow, with two pages => one listing all the questions, another where you ask/edit a question. A simple, CRUD-based web application.
If i used CQRS, i would have a seperate system for the read/writes, seperate DB's, etc..great.
Now, my issue comes to how to update the read state (which is, after all in a DB of it's own).
Flow i assume is something like this:
WebApp => User submits question
WebApp => System raises 'Write' event
WriteSystem => 'Write' event is picked up and saves to 'WriteDb'
WriteSystem => 'UpdateState' event raised
ReadSystem => 'UpdateState' event is picked up
ReadSystem => System updates it's own state ('ReadDb')
WebApp => Index page reads data from 'Read' system
Assuming this is correct, how is this significantly different to a CRUD system read/writing from same DB? Putting aside CQRS advantages like seperate read/write system scaling, rebuilding state, seperation of domain boundaries etc, what problems are solved from a persistence standpoint? Lock contention avoided?
I could achieve a similar advantage by either using queues to achieve single-threaded saves in a multi-threaded web app, or simply replicate data between a read/write DB, could i not?
Basically, I'm just trying to understand if i was building a CRUD-based web application, why i would care about CQRS, from a pragmatic standpoint.
Thanks!
Assuming this is correct, how is this significantly different to a CRUD system read/writing from same DB? Putting aside CQRS advantages like seperate read/write system scaling, rebuilding state, seperation of domain boundaries etc, what problems are solved from a persistence standpoint? Lock contention avoided?
The problem here is:
"Putting aside CQRS advantages …"
If you take away its advantages, it's a little bit difficult to argue what problems it solves ;-)
The key in understanding CQRS is that you separate reading data from writing data. This way you can optimize the databases as needed: Your write database is highly normalized, and hence you can easily ensure consistency. Your read database in contrast is denormalized, which makes your reads extremely simple and fast: They all become SELECT * FROM … effectively.
Under the assumption that a website as StackOverflow is way more read from than written to, this makes a lot of sense, as it allows you to optimize the system for fast responses and a great user experience, without sacrificing consistency at the same time.
Additionally, if combined with event-sourcing, this approach has other benefits, but for CQRS alone, that's it.
Shameless plug: My team and I have created a comprehensive introduction to CQRS, DDD and event-sourcing, maybe this helps to improve understanding as well. See this website for details.
A good starting point would be to review Greg Young's 2010 essay, where he tries to clarify the limited scope of the CQRS pattern.
CQRS is simply the creation of two objects where there was previously only one.... This separation however enables us to do many interesting things architecturally, the largest is that it forces a break of the mental retardation that because the two use the same data they should also use the same data model.
The idea of multiple data models is key, because you can now begin to consider using data models that are fit for purpose, rather than trying to tune a single data model to every case that you need to support.
Once we have the idea that these two objects are logically separate, we can start to consider whether they are physically separate. And that opens up a world of interesting trade offs.
what problems are solved from a persistence standpoint?
The opportunity to choose fit for purpose storage. Instead of supporting all of your use cases in your single read/write persistence store, you pull documents out of the key value store, and run graph queries out of the graph database, and full text search out of the document store, events out of the event stream....
Or not! if the cost benefit analysis tells you the work won't pay off, you have the option of serving all of your cases from a single store.
It depends on your applications needs.
A good overview and links to more resources here: https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs
When to use this pattern:
Use this pattern in the following situations:
Collaborative domains where multiple operations are performed in parallel on the same data. CQRS allows you to define commands with
enough granularity to minimize merge conflicts at the domain level
(any conflicts that do arise can be merged by the command), even when
updating what appears to be the same type of data.
Task-based user interfaces where users are guided through a complex process as a series of steps or with complex domain models.
Also, useful for teams already familiar with domain-driven design
(DDD) techniques. The write model has a full command-processing stack
with business logic, input validation, and business validation to
ensure that everything is always consistent for each of the aggregates
(each cluster of associated objects treated as a unit for data
changes) in the write model. The read model has no business logic or
validation stack and just returns a DTO for use in a view model. The
read model is eventually consistent with the write model.
Scenarios where performance of data reads must be fine tuned separately from performance of data writes, especially when the
read/write ratio is very high, and when horizontal scaling is
required. For example, in many systems the number of read operations
is many times greater that the number of write operations. To
accommodate this, consider scaling out the read model, but running the
write model on only one or a few instances. A small number of write
model instances also helps to minimize the occurrence of merge
conflicts.
Scenarios where one team of developers can focus on the complex domain model that is part of the write model, and another team can
focus on the read model and the user interfaces.
Scenarios where the system is expected to evolve over time and might contain multiple versions of the model, or where business rules
change regularly.
Integration with other systems, especially in combination with event sourcing, where the temporal failure of one subsystem shouldn't
affect the availability of the others.
This pattern isn't recommended in the following situations:
Where the domain or the business rules are simple.
Where a simple CRUD-style user interface and the related data access operations are sufficient.
For implementation across the whole system. There are specific components of an overall data management scenario where CQRS can be
useful, but it can add considerable and unnecessary complexity when it
isn't required.
I'm new to UML and I have crossed path with sequence diagram, and realized that there's 2 types: distributed and centralized. Can anyone explain me the differences?
centralized control, with one participant doing most of the processing and the other participants there to supply data.
Example:
Distributed control, in which the processing is split among many participants, each one doing a little bit of the algorithm
Example:
Both styles have their strengths and weaknesses. Most people, particularly those new to objects, are more used to centralized control. In many ways, it’s simpler, as all the processing is in one place; with distributed control, in contrast, you have the sensation of chasing around the objects, trying to find the program.
Despite this, object bigots like strongly prefer distributed control. One of the main goals of good design is to localize the effects of change. Data and behavior that accesses that data often change together. So putting the data and the behavior that uses it together in one place is the first rule of object-oriented design.
Furthermore, by distributing control, you create more opportunities for using polymorphism rather than using conditional logic. If the algorithms for product pricing are different for different types of product, the distributed control mechanism allows us to use subclasses of product to handle these variations.
Can someone please provide links to any paper/reference that talks about disconnected components in P2P networks?
I have found this paper. It deals with various P2P networks including kademlia which is the basis of bittorent DHT. It defines a probabilistic metric called routability rather than talking about connectivity but I guess the two things are related. (With high routability the graph is probably connected.) From the paper:
... we consider the measure of
routability, which is defined as the
expected number of routable node pairs
divided by the number of possible node
pairs among the surviving nodes. ...
(source: imagehost.org)
One paper calls it the islanding problem, another calls it isolated overlays.