Event-Sourcing: Aggregate Roots and Performance

Event-Sourcing: Aggregate Roots and Performance - domain-driven-design

I'm building a StackOverflow clone using event-sourcing. The MVP is simple:
Users can post a question
Users can answer a question
Users can upvote and downvote answers to non-closed questions
I've modeled the question as the aggregate root. A question can have zero or more answers and an answer can have zero or more upvotes and downvotes.
This leads to a massive performance problem, though. To upvote an answer, the question (being the aggregate root) must be loaded which requires loading all of its answers. In non-event-sourced DDD, I would use lazy loading to solve this problem. But lazy loading in event-sourcing is non-trivial (http://docs.geteventstore.com/introduction/event-sourcing-basics/)
Is it correct to model the question as the aggregate root?

Firstly don't use lazy loading (while using ORM). You may find yourself in even worse situation, because of that, than waiting a little bit longer. If you need to use it, most of the times it means, that your model is just simply wrong.
You probably want to think about things like below:
How many answers to the question you expect.
What happens, if someone posted an answer while you were submiting yours. The same about upvotes.
Does upvote is just simply +1 and you don't care about it anymore or you can find all upvotes for user and for example change them to downvote (upvotes are identified).
You probably want to go for separate aggregates, not because of performance problems, but because of concurrency problems (question 2).
According to performance and the way your upvote behave you may think about modeling it as value object. (question 3)
Go ahead and read it http://dddcommunity.org/library/vernon_2011/
True performance hit you may achieve by using cqrs read/write separation
http://udidahan.com/2009/12/09/clarified-cqrs/

With a simple read model it should not be a problem. What is the maximum number of answer you would expect for a question? Maybe a few hundred which is not a big deal with a denormalized data model.
The upvote event would be triggered by a very simple command with a handful of properties.
The event handler would most likely have to load the whole question. But it is very quick to load those records by the aggregate root ID and replay the events. If the number of events per question gets very high (due to answer edits, etc) you can implement snapshots instead of replaying every single event. And that process is done asynchronously which will make the read model "eventually consistent" with the event store.

Related

How to name an event describing the acknowledgment of the existence of an entity in an event sourced system?

I am new to Event Sourcing and I am considering using it for an industrial application to track events happening in a production facility.
Since the book of record is the production facility itself and not the system, and also because not everything is automated, workers will need to report at a given point in time (the recorded time) what they did at another point in time (the effective time). Therefore, I will be using events such as: TankFilledRecorded, TankOutputConnectedToPipeInputRecorded, ContainerMovedToFacilityAreaRecorded, etc. where these events refer to entities such as a tank, a pipe, or a facility area for example. These events will have both a recorded time and an effective time. Note that there is no submission or approval process for a record to be considered legit.
Domain-driven design (DDD) encourages to design events that are representative of what happens in the domain (like the ones above).
However, in my domain, I don’t care so much about how a tank, a pipe or a facility area came to existence. I just need to know that something exists from a particular point in time, and I also need to know if it is not there after a particular point in time. The main objective of the software is to track liquids and powders flowing in a circuit made of these pipes, tanks and other components. It is not an asset management system and should not become one.
Therefore, what would be the correct DDD way to design an event that represents the fact that there is a tank, a pipe or an area in the production facility?
It is a subtle question but language is important, particularly in DDD.
Here is what I came up with:
1 EntityExistenceAcknowledgmentRecorded
TankExistenceAcknowledgmentRecorded
PipeExistenceAcknowledgmentRecorded
FacilityAreaExistenceAcknowledgmentRecorded
TankDisappearanceAcknowledgmentRecorded
PipeDisappearanceAcknowledgmentRecorded
FacilityAreaDisappearanceAcknowledgmentRecorded
It seems awful to use this in the ubiquitous language. I don’t see myself talking in these terms or providing a UI with such vocabulary. But it does represent exactly what happens though.
2 EntityRegistered
TankRegistered
PipeRegistered
FacilityAreaRegistered
TankUnregistered
PipeUnregistered
FacilityAreaUnregistered
It seems much simpler and it also seems to be meaningful except for one thing. “Registered” conveys the existence of the representation of an entity in the system with immediate effect, without the possibility of saying now that the entity existed 2 days ago. Think about a UserRegistered event in a website that would indicate that the user “existed” from 10 days ago. What would that even mean?
Events are facts and you cannot change the past. However, I do need a way for my users to invalidate a record in which they made a mistake such as a typo. They can record now that they acknowledged the existence of a facility area a week ago and might realize later than there was something wrong, such as a typo in the name of the entity. They would invalidate the record and create a new one. But, invalidate something that has been “registered” does not sound right.
3 Keep looking
Try to dig more in the domain (event storming) and find the real events that brought the entities into existence even if these events are of no use in the problem that needs to be solved.
TankBuiltRecorded
PipeBuiltRecorded, PipeDeliveredRecorded
FacilityArea<something_meaningful>Recorded
TankDestroyedRecorded, TankDecommissionedRecorded
PipeDecommissionedRecorded
FacilityArea<something_meaningful>Recorded

A caution
TankFilled
TankFilledReported
TankFilledReportSubmitted
TankFilledReportSubmissionReceived
Think carefully about whether the increased precision is motivated by business value.
Therefore, what would be the correct DDD way to design an event that represents the fact that there is a tank, a pipe or an area in the production facility?
What is the business doing today? Is there already a process in place for tracking the lifetime of the hardware in the plant (a maintenance log, perhaps?) There's likely to be vocabulary in that place that gives you ideas as to what spellings would make sense in the code.
Events are facts and you cannot change the past.
That's true - but you can back date events. The effective date of the information is often distinct from the reported date of information.
I do need a way for my users to invalidate a record in which they made a mistake such as a typo.
Yes - error correction is an important part of the process that you are modeling.
You should probably review Greg Young's talk Answering a Question, which was based on this thread. It's a discussion of capturing and modeling of temporality.
Here's the good news: you are running into the right problem. Because you are capturing information about an external system, there are going to be opportunities for errors and conflicts, and you need to (a) figure out the protocols for addressing them, and then (b) model that process correctly. That might include exception reports generated by the system when it observes conflicting information, or compensating events, or even automated conflict resolution (for the easy cases -- see also Stop Over Engineering).

design a search function for high scale [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
(not sure if this is the right forum for this question)
I am very curious about how search in major site, say youtube/quora/stackexcahnge, works?
And I'm NOT looking for an answer like 'They Use Lucene Search engine'. I want to understand exactly how the indexing works there.
Is there a different Index for text search than the autocomplete feature?
Is it done in the background like map reduce.
How exactly does map reduce help deliver results? (I know that it counts words in each document but what happens after that when I search for a keyword?)
I also heard that google stopped using map reduce and now using cloud dataFlow here - how does that work?
Help Please :-)

I voted to close, because I think your question is too broad. Each bullet could form the basis of an SO question. That stated, I'll take a crack at answer how SolrCloud attempts to solve each of the problems you are asking about:
Is there a different Index for text search than the autocomplete feature?
The short answer is "yes". Solr has several options for implementing an autocomplete feature and all of them rely on either building a separate index or being supplied a separate dictionary. You can also roll your own in an even more sophisticated fashion as the blog post "Super flexible AutoComplete with Solr" demonstrates.
Is it done in the background like map reduce?
Generally speaking no. SolrCloud is based on the idea of shards with leaders and replicas. A shard being a subset of your overall index. With a shard being comprised of a leader and possibly one or more replicas.
Queries are executed against all shard leaders. With assigning a particular shard to serve as the aggregator of each shard's response, but unlike map reduce where the individual node responses have all the data the reducing node needs, the aggregating Solr shard may make multiple requests back to the other shards to figure out sort order - for example.
How exactly does map reduce help deliver results? (I know that it counts words in each document but what happens after that when I search for a keyword?)
See my response to your previous question. In short the query is executed against each shard, aggregated by one of those shards, and returned to the requestor. What Solr does - Lucene really - that's the useful magic part that people most often associate with it is Term Frequency Inverse Document Frequency indexing usually with stemming on text searches. While this is not exactly what happens under the hood, and you can vary what's actually done via configuration, it provides a fairly good idea of what's being done.
Other searching, on dates and numbers, or simple textual values is done in a fashion similar to database indexing. That is a simplification, if you want to understand it more fully read the JavaDoc on NumericRangeQuery for an in-depth explanation.
I also heard that google stopped using map reduce and now using cloud dataFlow here - how does that work?
If I knew the answer to that I would probably be working for Google and not answering StackOverflow questions :). Seriously whatever they've built is new PhD level work that as far as I know they haven't even release a research paper on, which is what they did with map reduce that led to Yahoo building Hadoop.

Should I let non-members add comment to a post? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
As long as it's SQL injection proof, would it be alright for me to let non-members add comments to a post and give the Author the ability to delete them?

Before you do it, consider the following questions
(and any other questions specific to your project that may spring to mind)
Do you have a good rate-limiting scheme set up so a user can't just fill your hard drive with randomly-generated comments?
Do you have a system in place to automatically ban users / IP addresses who seem to be abusive? Do you have a limit on the number / number of kilobytes of comments loaded per page (so someone can't fill a page with comments, making the page take forever to load / making it easy to DoS you by making a lot of requests for that page)?
Is it possible to fold comments out of sight on the webpage so users can easily hide spammy comments they'd rather not see?
Is it possible for legitimate users to report spammy comments?
These are all issues that apply to full members, of course. But it also matters for anonymous users, and since anonymous posting is low-hanging fruit, a botmaster would be more likely to target that. The main thing is simply to consider "If I were a skilled programmer who hated this website, or wanted to make money from advertising on it, and I have a small botnet, what is the worst thing I could do to this website using anonymous comments given the resources I have?" And that's a tough question, which depends a great deal on what other stuff you have in place.
If you do it, here are a few pointers:
HTML-escape the comments when you fetch them from the database before you display them, otherwise you're open to XSS.
Make sure you never run any eval-like function on the input the user gives you (this includes printf; to do something like that you'd want to stick with printf("%s", userStr);, so printf doesn't directly try to interpret userStr. If you care about why that's an issue, google for Aleph One's seminal paper on stack smashing),
Never rely on the size of the input to fall within a specific range (even if you check this in Javascript; in fact, especially if you try to ensure this in Javascript) and
Never trust anything about the content will be true (make no assumptions about character encoding, for example. Remember, a malicious user won't need to use a browser; they can craft their calls however they want).
Default to paranoia If someone posts 20 comments in a minute, ban them from commenting for a while. If they keep doing that, ban their IP. If they're a real person, and they care, they'll ask you to undo it. Plus, if they're a real person, and they have a history of posting 20 comments a minute, chances are pretty good those comments would be improved by some time under the banhammer; no one's that witty.

Typically this kind of question depends on the type of community, as well as the control you give your authors. Definitely implement safety and a verification system (eg CAPTCHA), but this is something you'll have to gauge over time more often than not. If users are being well-behaved, then it's fine. If they start spamming every post they get their hands on, then it's probably time a feature like that should just go away.

Building a code asset library [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I have been thinking about setting up some sort of library for all our internally developed software at my organisation. I would like collect any ideas the good SO folk may have on this topic.
I figure, what is the point in instilling into developers the benefits of writing reusable code, if on the next project the first thing developers do is file -> new due to a lack of knowledge of what code is already out there to be reused.
As an added benefit, I think that just by having a library like this would encourage developers to think more in terms of reusability when writing code
I would like to keep this library as simple as possible, perhaps my only two requirements being:
Search facility
Usable for many types of components: assemblies, web services, etc
I see the basic information required on each asset/component to be:
Name & version
Description / purpose
Dependencies
Would you record any more information?
What would be the best platform for this i.e., wiki, forum, etc?
What would make a software library like this successful vs unsuccessful?
All ideas are greatly appreciated.
Thanks
Edit:
Found these similar questions after posting:
How do you ensure code is reused correctly?
How do you foster the use of shared components in your organization?

Sounds like there is no central repository of code available at your organization. Depending on what you do this could be because of compatmentalization of the knowledge due to security restrictions, the fact that external vendor code is included in some/all of the solutions, or your company has not yet seen the benefits of getting people to reuse, refactor, and evangelize the benefits of such a repository.
The common attributes of solutions I have seen work at mutiple corporations are a multi pronged approach.
Buy in at some level from the management. Usually it's a CTO/CIO that the idea resonates with and they claim it's a good thing and don't give any money to fund it but they won't sand in your way if they are aware that someone is going to champion the idea before they start soliciting code and consolidating it somewhere.
Some list of projects and the collateral available in english. Seen this on wikis, on sharepoint lists, in text files within a source repository. All of them share the common attribute of some sort of front end search server that allows full text over the description of a solution.
Some common share or repository for the binaries and / or code. Oftentimes a large org has different authentication/authorization methods for many different environments and it might not be practical (or possible logistically) to share a single soure repository - don't get hung up on that aspect - just try to get it to the point that there is a well known share/directory/repository that works for your org.
Always make sure there is someone listed as a contact - no one ever takes code and runs it in production without at lest talking to the previous owner of it - and if you don't have a person they can start asking questions of right away then they might just go ahead and hit file->new.
Unsuccessful attributes I've seen?
N submissions per engineer per time period = lots of crap starts making it's way in
No method of rating / feedback. If there is no means to favorite/rate/give some indicator that allows the cream to rise to the top you don't go back to search it often because you weren't able to benefit from everyone else's slogging through the code that wasn't really very good.
Lack of feedback/email link that contacts the author with questions directly into their email.
lack of ability to categorize organically. Every time there is some super rigid hierarchy or category list that was predetermined everything ends up in "other". If you use tags or similar you can avoid it.
Requirement of some design document to accompany it that is of a rigid format the code isn't accepted - no one can ever agree on the "centralized" format of a design doc and no one ever submits when this is required.
Just my thinking.

Agile - User Story Definitions [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I'm writing a small app for my friend's business, and thought I'd take the opportunity to brush up on some Agile Project Management training I did at the start of the year.
I (and I think, my current organisation!) have always struggled with gathering requirements in the form of User Stories, which take the form:
As a [User Type] I want [feature] so that [some benefit]
I'm always tempted to miss out the beginning and end, and just leave the feature - but this then just becomes requirements gathering the old way!
But I don't want to just make it fit, so that I can say 'I'm doing Agile'.... for example, if I know that the user is to be presented with a list of items, then the reason is self-evident, is it not?
e.g.
As a [Store Manager] I want [to see a list of Stock Items] so that ... ?
Is it normal practice to leave out the [so that] clause?

We used to miss it out as well. And by leaving it out we missed a lot.
To understand the feature properly and not just do the thing right but DO THE RIGHT THING it is key to know WHY the feature, and for that the next key is WHO (the role)
In DDD terms, stakeholder. Stakeholders can be different, everyone who cares. From programmers and db admins to all the types of users.
So, first understand, who is the stakeholder, then you know 50% of WHY he cares, then the benefit, and then it is already almost obviously WHAT to implement.
Try to not just write "as a user". Specify. "as store manager", or even "as the lead of the shift responsible for closing the day", i need....so that....
Maybe you can implement something different which will give the same stakeholder even better benefit!!!

Try, To Achieve [Business Value] As [User] I need [Feature].
The goal is to focus on the value the feature delivers. It helps you think in vertical slices, which reduces pure "technical tasks" that aren't visible. It's not an easy transition, but when you start thinking vertically you start really being able to reduce the waste in your process.
Another way is to thinking of the acceptance tests that your customer could write to ensure the feature would work. It's a short jump to then using something like FitNesse to automated those tests.

No, it's actually not obvious - there are a lot of reasons to want to see a list, a lot of things you might want to with it - scan it for some info, get an overview, print it, copy and paste it into a word document etc. And what exactly it is will give you valuable hints on reasonable implementation details - formatting of the list, exact content; or even a hint that a different feature might be a better idea to satisfy that need. Don't be surprised to find out that the reason actually is "so that I can count the number of entries"...
Of course, this might in fact not apply to you. My actual point in fact is that there are reasons that people came up with this template - and there are also reasons that a lot of experienced people don't actually use it. And when you are new to the practice, you are not in a good position to assess all the pros and cons of following a practice, so I'd highly recommend to simply try to follow it closely for some time. You might be surprised by the usefulness of it - or not, in which case you still learned something and can drop it with a clear concise... :)

User Stories is another way of saying you need to interview your users to find out what they want and what problems they are trying to solve. That the heart of having this in agile development. If the form is not working for your then take a step back and try a different approach that feels more natural to you or better suited to your capabilities as a writer.
In short don't feel like you have to be in a straight jacket. The important thing is that you follow the spirit of the methodology.
In this specific case you want to get a list of what problems the user has, why they are problems, and what they think will help them.

I think you should really try to get a reason defined, even if it may seem obvious. If you can't come up with a reason then why build the feature in the first place? Also the reason may point out other deficiencies in the design that could trigger improvements in other areas.

I often categorize my stories by the user/persona that it primarily relates to, thus I don't put the user's identity in the story title. My stories also are bigger than some agile methodologies suggest. Usually, I start with a title. I use it for planning purposes. Once I get close to actually working on that story, I flesh it out with some details -- basic idea, constraints, assumptions, related stories -- so that I capture more of the information that I know about it. I also keep my stories in a wiki, not on note cards. I understand the trade-off -- i.e., I may spend too much time on details before I need them, but I am able to capture and share it with, typically, off-site customers easily.
The bottom line for me is that Agile is a philosophy, rather than a specification. There are particular implementations that may (strongly) suggest that you do things a certain way and may be non-negotiable on some items. For example, it's hard to say you're doing XP if you don't pair program. In general, though, I would say that most agilists would say that you ought to do those things that work for you, in the way that they work for you -- as long as they are consistent with the general principles, you can still call yourself agile. The general principles would include things like release early/release often, unit testing, short iterations, acknowledge that change will happen, delay detailed planning until you are ready to implement, ...
Bottom line for me: if the stories work for you without the user and rationale -- as long as you understand who the user is and why they want something -- do it however you want. Just don't require a complete specification before you start implementing.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string