Why do we have a redundant Repository in BLoC Architecture?

In BLoC Architecture we have a Data Provider and a Repository. In many examples I see that
the Repository just calls the Data Provider, and it is really cumbersome to create the Repository. Why does the Repository exist? What is its purpose?

This actually comes from adopting Clean Architecture, where a repository is an interface that provides data from a source to the UI.
The sources are usually Remote and Local, where Remote refers to fetching data from a remote source (this could be other apps, a REST API, a WebSocket connection) and Local refers to data from a local source (something akin to a database). The idea behind having two separate classes for this is to provide adequate separation of concerns.
Imagine an app like Instagram, where the app manages both offline and online data. It makes sense to handle the logic for each separately, and then use the repository, which is what your viewmodel/bloc takes in, to access the data. The bloc doesn't need to know where the data came from; it only needs the data. The repository implementation doesn't need to know what is used to make an API call; it just needs to consume the fetched data. Similarly, the repository implementation doesn't need to know where the local data is fetched from; it just needs to consume it. This way every bit is adequately abstracted, and changes in one class don't affect other classes, because everything is an interface.
All of this helps in testing the code better, since mocking and stubbing become easier.
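To make the layering concrete, here is a minimal sketch, in TypeScript for brevity (in Flutter this would be Dart); all class and method names are illustrative, not an established API:

```typescript
// Data providers: each knows exactly one source and nothing else.
interface RemotePostProvider {
  fetchPosts(): Promise<string>; // raw JSON from a REST API / WebSocket
}

interface LocalPostProvider {
  readPosts(): Promise<string>; // raw JSON from a local database/cache
  writePosts(rawJson: string): Promise<void>;
}

interface Post {
  id: number;
  body: string;
}

// The repository decides which source to use and returns domain objects.
class PostRepository {
  constructor(
    private remote: RemotePostProvider,
    private local: LocalPostProvider,
  ) {}

  async getPosts(online: boolean): Promise<Post[]> {
    const raw = online ? await this.remote.fetchPosts() : await this.local.readPosts();
    if (online) await this.local.writePosts(raw); // keep the offline copy fresh
    return JSON.parse(raw) as Post[];
  }
}

// The bloc/viewmodel takes only a PostRepository, so it never learns
// whether the posts came from the network or from the cache.
```

In a test, either provider can be replaced with a stub, and the bloc can be tested against a fake repository without touching a network or database.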

Related

Different persistence repositories for an aggregate in DDD

I have an aggregate with a root entity (Documentation) and a VO (Document). Documents are associated with files (PDFs, images, office documents, etc.), so I have to persist the aggregate in a database and the files on an FTP server (the files cannot be saved in the database because they are too large).
My db repository class implements an interface with methods like FindXXX, AddDocument, RemoveDocument and others. How could I implement FTP persistence? Should my db repository connect to the FTP server in AddDocument and RemoveDocument? Or should I create an FTP repository class that implements the interface? If so, methods like FindXXX make no sense.
As far as I know about DDD, each aggregate has only one repository interface that represents how it can be persisted. It can have multiple "persistence modes" (in a db, FTP, file, etc.) but the interface should be the same.
As far as I know about DDD, each aggregate has only one repository interface that represents how it can be persisted.
That's mostly true; people generally assume that an entire aggregate is going to be stored in a single place. When you distribute the state of the aggregate across multiple storage units, your failure modes need very careful attention.
So one thing to consider is whether the separately stored documents are part of the aggregate, or are merely referenced by the aggregate.
If they are referenced by the aggregate, then you treat them like any other reference to another aggregate. The documentation aggregate stores an identifier/reference/hint for the document, and takes advantage of a domain service to access the document if it needs it.
If they are part of the aggregate, then the usual answer is that "the repository" will be a facade in front of a complicated infrastructure thing that masks the fact that the documentation and the document(s) are stored separately.
In other words, the infrastructure layer will be trying to orchestrate the load and store operations, and the rest of the system doesn't need to know the details.
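A rough sketch of that facade, in TypeScript, assuming the documents are part of the aggregate; DbStore, FileStore, and all method names are hypothetical infrastructure interfaces, not an established API:

```typescript
// Domain types (simplified).
interface DocumentRef {
  name: string;
  content: Uint8Array;
}

interface Documentation {
  id: string;
  documents: DocumentRef[];
}

// Infrastructure interfaces hidden behind the repository.
interface DbStore {
  saveMetadata(doc: Documentation): Promise<void>;
  loadMetadata(id: string): Promise<Documentation>;
}

interface FileStore {
  // FTP today; could be S3/GCS tomorrow without the domain noticing.
  put(path: string, bytes: Uint8Array): Promise<void>;
  get(path: string): Promise<Uint8Array>;
}

class DocumentationRepository {
  constructor(private db: DbStore, private files: FileStore) {}

  // One save() from the domain's point of view; two stores underneath.
  async save(doc: Documentation): Promise<void> {
    for (const d of doc.documents) {
      await this.files.put(`${doc.id}/${d.name}`, d.content);
    }
    // Metadata last: a crash leaves orphaned files rather than metadata
    // pointing at missing files (these are the failure modes that need
    // careful attention).
    await this.db.saveMetadata(doc);
  }
}
```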
Late response. But, simply put, you should have two services. In my reading of DDD, repositories are often considered infrastructure services. In this case, you have two:
A repository / interface for storage, and basic retrieval of document IDs, metadata, and references
A repository / interface for storage, and basic retrieval of blobs of data
Sometimes it makes sense to have multiple aggregates and repositories. In fact, some of Vaughn Vernon's examples on bounded contexts (https://github.com/VaughnVernon/IDDD_Samples) do include aggregates holding references to other aggregates. I would argue that you should do what makes sense and feels appropriate.
Indeed, if you were running a post office collection centre, chances are you would have a way of 1. storing the small-to-large parcels, and 2. curating an index of where every parcel is located in the centre so that you can retrieve it.
My db repository class implements an interface with methods like FindXXX, AddDocument, RemoveDocument and others. How could I implement FTP persistence? Should my db repository connect to the FTP server in AddDocument and RemoveDocument? Or should I create an FTP repository class that implements the interface?
If your database repository connects to FTP in addition to some other data store, you may arguably be putting too much logic and responsibility in one place. That said, there is nothing wrong with doing this either.
If so, methods like FindXXX make no sense. As far as I know about DDD, each aggregate has only one repository interface that represents how it can be persisted.
For this specific problem, most DDD practitioners will recommend you have a separate view service / model. It can produce a materialised / view DTO across repositories or services.
Fundamentally, it should be easy to test individual parts, and to replace underlying implementations. If you decided to switch (or even include support) from FTP to Google Cloud Storage / AWS S3 one day, then there might be more work involved, and changes to test cases.
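As a concrete sketch of the two-repository split described above (TypeScript; all names are illustrative, not an established API):

```typescript
// 1. Storage and basic retrieval of document IDs, metadata, and references.
interface DocumentMeta {
  docId: string;
  title: string;
  blobKey: string; // reference into the blob store
}

interface DocumentIndexRepository {
  add(meta: DocumentMeta): Promise<void>;
  findById(docId: string): Promise<DocumentMeta | null>;
  remove(docId: string): Promise<void>;
}

// 2. Storage and basic retrieval of blobs of data (FTP now, S3/GCS later).
interface BlobRepository {
  put(key: string, bytes: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array>;
  delete(key: string): Promise<void>;
}

// A separate view service can join the two to produce the materialised DTO
// mentioned above, keeping FindXXX-style queries out of the blob side.
```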

gRPC microservice architecture implementation

In a microservice architecture, is it advisable to have a centralized collection of proto files and have them as a dependency for clients and servers? Or have only one proto file per client and server?
If your organization uses a monolithic code base (i.e., all code is stored in one repository), I would strongly recommend using the same file. The alternative is to copy the file, but then you have to keep all the copies in sync.
If you share the protocol buffer file between the sender and the receiver, you can statically check that both use the same schema, especially if some new microservices will be written in a statically typed language (e.g., Java).
On the other hand, if you do not have a monolithic code base but instead have multiple repositories (e.g., one per microservice), then sharing the protocol buffer files is more cumbersome. What you can do is put them in separate repositories that can be added as a dependency to the microservices that need them. That is what I have seen at my previous company: we had multiple small API repositories for the schemas.
So, if it is easy to use the same file, I would recommend doing so instead of creating copies. There may be situations, however, where it is more practical to copy them. The disadvantage is that you always have to apply a change to all copies. In the best case, you know which files to update; then it is merely tedious. In the worst case, you do not know which files to update, your schemas drift out of sync, and you only find out when the code is released.
Note that monolithic code base does not mean monolithic architecture. You can have microservices and still keep all the source code together in one repository. The famous example is, of course, Google. Google also heavily uses protocol buffers for their internal communication. I have not seen their source code, but I would be surprised if they do not share their protocol buffer files between services.
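For illustration, a shared schema might look like the following; the package name and file layout are assumptions, not a convention from the answer:

```proto
// shared-api repo: orders/v1/order.proto (layout is an assumption)
syntax = "proto3";

package orders.v1;

message GetOrderRequest {
  string id = 1;
}

message Order {
  string id = 1;
  int64 amount_cents = 2;
}

// Both the client and the server generate their stubs from this one file,
// so a schema change is checked on both sides at build time.
service OrderService {
  rpc GetOrder (GetOrderRequest) returns (Order);
}
```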

REST API with versioned data and differential endpoint: optimizing bandwidth and performance

My NodeJS project is based on SailsJS, itself using ExpressJS.
Its API will be used by mobile apps to fetch their data from it.
The tricky part is I don't want the client apps to fetch the whole data tree every time there is a change in the database.
The client only needs to download a differential between the data it's already got and the data on the server.
To achieve that I thought of using git on the server: create a repository, save each endpoint's data as a JSON file in the repo, and have each save trigger an automatic commit.
Then I could create a specific API endpoint that will accept a commit sha as a parameter and return a diff between that and git HEAD.
This post by William Benton reinforced this idea for me.
I'm now looking for any tips that could help me get this working with the language and frameworks cited above:
I'd like to see a proof of concept of this in action, but couldn't find one.
I couldn't find an easy way to use git from NodeJS yet.
I'm not sure how to parse the returned diff in client apps developed with the Ionic framework, so AngularJS.
Note: the API will be read-only. All DB changes will be triggered by a custom web back-end used by a few users.
I used the ideas in that post for an experimental configuration-management service. That code is in Erlang and I can't offer Node-specific suggestions, but I have some general advice.
Calling out to git itself wasn't a great option at the time from any of the languages I was interested in using. Using git as a generic versioned-object store actually works surprisingly well, but using git plumbing commands is a pain (and slow, due to all of the forking) and there were (again, at the time) limitations to all of the available native git libraries.
I wound up implementing my own persistent trie data structure and put a git-like library interface atop it. The nice thing about doing this is that your diffs can be sensitive to the data format you're storing; if you call out to git, you're stuck with finding a serialization format for your data that is amenable to standard diffs. (Without a diffable format, though, you can still send back a sequence of operations to the client to replay on whatever stale objects they have.)
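Staying on Node, a minimal sketch of such a differential endpoint could look like the following (TypeScript with plain Express; the in-memory snapshot store and the naive key-level computeDiff are stand-ins for the git-backed versioning, and every name is illustrative):

```typescript
import express from "express";

type Json = Record<string, unknown>;

// version id -> full JSON snapshot of the data tree (stand-in for git commits)
const snapshots = new Map<string, Json>();
let headVersion = "v0";

// Naive key-level diff: keys to set (added/changed) and keys to unset (removed).
function computeDiff(oldDoc: Json, newDoc: Json) {
  const patch: { set: Json; unset: string[] } = { set: {}, unset: [] };
  for (const key of Object.keys(newDoc)) {
    if (JSON.stringify(oldDoc[key]) !== JSON.stringify(newDoc[key])) {
      patch.set[key] = newDoc[key];
    }
  }
  for (const key of Object.keys(oldDoc)) {
    if (!(key in newDoc)) patch.unset.push(key);
  }
  return patch;
}

const app = express();

// GET /data/diff?since=<versionId> -> patch from that version up to HEAD
app.get("/data/diff", (req, res) => {
  const since = String(req.query.since ?? "");
  const oldDoc = snapshots.get(since);
  const newDoc = snapshots.get(headVersion);
  if (!oldDoc || !newDoc) {
    // Unknown version: tell the client to fetch the full tree instead.
    res.status(410).json({ full: true, version: headVersion });
    return;
  }
  res.json({ version: headVersion, patch: computeDiff(oldDoc, newDoc) });
});

app.listen(3000);
```

The "sequence of operations to replay" idea from the answer maps to the set/unset patch here: the client applies it to its stale copy and records the new version id.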

Domain Driven Design: Should Repository methods be passed a configuration string?

I have seen this done both ways. When writing a Repository, should the methods be passed a connection string, or should the repository be "self-contained", in other words, know internally how to get to the database? In case it helps, my Repository is not true DDD; it is the Repository pattern surrounding methods that call Oracle SPs (that's the way it is at work here).
Repositories should normally not work in their own independent transactional unit, so they most often use the 'existing' database connection. This way you can do multiple repository (database!) operations in a single transaction.
How to implement this depends on your development platform. Java EE, for example, has ways to inject the current Entity Manager into objects, or to obtain it in code. You can also implement this manually by storing a reference in thread-local storage.
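A minimal sketch of the "self-contained" variant, in TypeScript, where the repository borrows an existing connection instead of receiving a connection string per method (OracleConnection and the procedure names are hypothetical):

```typescript
// Stand-in for whatever your data-access layer hands out.
interface OracleConnection {
  callProcedure(name: string, args: unknown[]): Promise<unknown[]>;
}

class CustomerRepository {
  // The ambient unit of work owns the connection; the repository borrows it,
  // so several repositories can share one transaction.
  constructor(private conn: OracleConnection) {}

  findById(id: number): Promise<unknown[]> {
    return this.conn.callProcedure("CUSTOMER_PKG.FIND_BY_ID", [id]);
  }

  save(customer: { id: number; name: string }): Promise<unknown[]> {
    return this.conn.callProcedure("CUSTOMER_PKG.SAVE", [customer.id, customer.name]);
  }
}

// Usage: acquire one connection per unit of work, hand it to every
// repository involved, then commit or roll back once at the boundary.
```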

Should repositories be both loading and saving entities?

In my design, I have Repository classes that get Entities from the database (how they do that does not matter). But to save Entities back to the database, does it make sense to make the repository do this too? Or does it make more sense to create another class (such as a UnitOfWork) and give it the responsibility of saving, by having it accept Entities and calling save() on it to tell it to go ahead and do its magic?
In DDD, Repositories are definitely where ALL persistence-related stuff is expected to reside.
If you had saving to and loading from the database encapsulated in more than one class, database-related code would be spread over too many places in your codebase, making maintenance significantly harder. Moreover, there is a high chance that later readers of this code would not understand it at first sight, because such a design does not adhere to the quasi-standards that most developers expect to find.
Of course, you can have separate Reader/Writer-helper classes, if that's appropriate in your project. But seen from the Business Layer, the only gateway to persistence should be the repository...
I would give the repository the overall responsibility for encapsulating all aspects of load and save. This ensures that tricky issues, such as managing contention between readers and writers, have a single place to be handled.
The repository might well use your UnitOfWork class, and might need to expose BeginUow and Commit methods.
Fowler says that the Repository API should mimic a collection:
A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection.
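A small sketch of such a collection-like API with an explicit unit of work (TypeScript; all names are illustrative):

```typescript
interface Order {
  id: string;
  total: number;
}

// Collection-like facade: callers add/remove/find as if it were in memory.
interface OrderRepository {
  add(order: Order): void;
  remove(order: Order): void;
  findById(id: string): Order | undefined;
}

// The unit of work tracks changes and flushes them in one transaction.
interface UnitOfWork {
  begin(): void;
  commit(): Promise<void>;
  rollback(): void;
}

// Usage sketch:
//   uow.begin();
//   orders.add(newOrder);
//   await uow.commit(); // all tracked changes written at once
```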
