Akka/ZeroMQ Messaging Patterns by Example - multithreading

I'm interested in trying to see how I might leverage the Akka/ZeroMQ module in my project.
In the module's documentation, four so-called "messaging patterns" are identified, but only one (Pub-Sub) is explained in detail. They are:
Pub-Sub
Router-Dealer
Push-Pull
Rep-Req
As a messaging greenhorn, I don't understand how there could be anything more than Pub-Sub: you have a message, you publish it to a broker, and another process (the subscriber) consumes it from the broker.
So my specific question is: what are some concrete use cases for each ZeroMQ messaging pattern, and why would I ever want to use each pattern if Akka already has a mechanism for communicating between threads?
I ask this because the documentation linked above simply states "More documentation and examples will follow soon." for all patterns except Pub-Sub.

Before going into more detail on your question, kindly check another answer to an almost identical question: https://stackoverflow.com/a/25742744/3666197
Q: What are some concrete use cases for each message ZeroMQ pattern
A: Best to proceed with the book (see the links below); you will find many indispensable comments and remarks there.
Q: .. don't understand how there could be anything more than Pub-Sub
A: Oh yes, there is a whole new universe behind that. To start with, ZeroMQ needs no broker in the middle: it is broker-less, zero-copy and incredibly fast, to name just a few of its properties (read below).
Q: why would I ever want to utilize each pattern if Akka already has a mechanism for communicating between threads?
A: Well, it depends. If you are happy with the message-passing performance of just a few localhost threads (not much above a few tens), there is no need to invest your time in ZeroMQ. If you are going for high performance, distribution, (almost) linear scalability and heterogeneous portability, well, then it might be the right time to start reading up on ZMQ.
Several must-reads from the ZeroMQ evangelists Pieter Hintjens and Martin Sústrik, worth shaping one's mind with before moving into the details:
An initial view on PUB/SUB: http://250bpm.com/blog:39 (check it, and do not miss Martin's cool notes on unit testing and the other gems in his collection)
A very in-depth must-have and must-read is the book "Code Connected, Volume 1" (available as a PDF). If you are going in seriously for messaging, this is the basis to work with.
A collection of good whitepapers is on http://zeromq.org/area:whitepapers
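To make the "more than Pub-Sub" point concrete, here is a minimal in-memory model of how two of the patterns route messages. It is plain Python, not the ZeroMQ or Akka API, and the function names are hypothetical: PUB/SUB fans every message out to every subscriber whose topic prefix matches, while PUSH/PULL hands each message to exactly one connected worker, round-robin, which is what makes it suitable for distributing tasks.

```python
from itertools import cycle

def pub_sub(messages, subscriptions):
    """Fan-out model: every subscriber receives every message whose text
    starts with one of its topic prefixes (ZeroMQ SUB filters by prefix;
    the empty prefix subscribes to everything)."""
    inbox = {name: [] for name in subscriptions}
    for msg in messages:
        for name, prefixes in subscriptions.items():
            if any(msg.startswith(p) for p in prefixes):
                inbox[name].append(msg)
    return inbox

def push_pull(messages, workers):
    """Load-balancing model: each message goes to exactly one worker, round-robin."""
    inbox = {name: [] for name in workers}
    for msg, name in zip(messages, cycle(workers)):
        inbox[name].append(msg)
    return inbox

msgs = ["price.EURUSD 1.10", "price.GBPUSD 1.30", "news.ECB rate", "price.USDJPY 150"]
print(pub_sub(msgs, {"chart": ["price."], "logger": [""]}))
print(push_pull(msgs, ["w1", "w2"]))
```

REQ/REP, not modelled here, adds a strict send/receive lockstep between a client and a server, and ROUTER/DEALER are the asynchronous counterparts used to route requests without that lockstep.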

Related

Approach to find duplicate - Kafka and queue

Question asked in an interview: suppose there are two Kafka topics, or let's say queues, Q1 and Q2.
Each holds some messages, say 10 messages each.
The condition is: if both queues contain exactly the same messages, that's fine, but if there is even one odd or non-matching message, we need to error out or notify.
The approaches I suggested for this problem:
1. Using a HashSet: add the first queue's messages to the set; then, while adding the second queue's messages, add() tells us (by returning true) that a message was not already there.
2. Using a HashMap: store the messages in key-value form and, while adding, check whether the key (the message) is already there.
But he was not satisfied with the above approaches, and he did not share the right answer or what the problem was.
Let me know if a better solution exists, and what the problem with my approach is.
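One concrete weakness of the set-based idea is that a set forgets multiplicity: if Q1 holds "A, A, B" and Q2 holds "A, B, B", both sets are {A, B} and the mismatch goes unnoticed. A minimal sketch, assuming each queue has already been drained into a finite list, compares the two batches as multisets and reports the surplus on each side:

```python
from collections import Counter

def diff_queues(q1, q2):
    """Compare two finite batches as multisets (order ignored, counts kept).
    Returns (surplus in q1, surplus in q2); both empty means the queues match."""
    c1, c2 = Counter(q1), Counter(q2)
    only_in_q1 = c1 - c2   # Counter subtraction drops zero and negative counts
    only_in_q2 = c2 - c1
    return dict(only_in_q1), dict(only_in_q2)

q1 = ["A", "A", "B"]
q2 = ["A", "B", "B"]
print(set(q1) == set(q2))   # True: a set-based check sees no problem
print(diff_queues(q1, q2))  # ({'A': 1}, {'B': 1})
```

This still assumes both batches are complete and fit in memory, which a live stream does not give you.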
He may have been aiming to get into a discussion about the difficulties of balancing in a real-time streaming situation. Say there is a continuous stream of messages going through the two topics: how do you know whether things are balanced?
There is no single answer, it depends on the situation but generally has to consider some kind of time window.
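One common shape of a time-window answer is: buffer messages from each side keyed by a message id, cancel a pair out when its counterpart arrives, and alert only on messages that stay unmatched for longer than a grace window. A minimal single-threaded sketch, assuming every message carries a unique id and a timestamp in seconds (both hypothetical fields here; how you obtain them from your consumers is up to your setup):

```python
class WindowedReconciler:
    """Matches messages from two streams by id; anything still unmatched
    after `window` seconds is reported as a mismatch."""

    def __init__(self, window):
        self.window = window
        self.pending = {"Q1": {}, "Q2": {}}   # side -> {msg_id: timestamp}

    def on_message(self, side, msg_id, ts):
        other = "Q2" if side == "Q1" else "Q1"
        if msg_id in self.pending[other]:
            del self.pending[other][msg_id]    # counterpart already seen: balanced
        else:
            self.pending[side][msg_id] = ts    # wait for the other side

    def expired(self, now):
        """Return (side, msg_id) pairs unmatched for longer than the window, and drop them."""
        late = [(side, mid) for side, ids in self.pending.items()
                for mid, ts in ids.items() if now - ts > self.window]
        for side, mid in late:
            del self.pending[side][mid]
        return late

r = WindowedReconciler(window=5)
r.on_message("Q1", "m1", ts=0)
r.on_message("Q2", "m1", ts=2)   # matched, nothing pending
r.on_message("Q1", "m2", ts=3)
print(r.expired(now=6))          # m2 is only 3 s old, still within the window
print(r.expired(now=9))          # m2 has now been unmatched for too long
```

The window size is exactly the trade-off worth discussing: too short and normal lag between the topics raises false alarms, too long and real mismatches surface late.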
My guess is that the interviewer's dissatisfaction (if any) would be because he was looking to talk about options rather than going to one specific solution to a specific situation.
We can't know what he was thinking without asking (which I would always recommend) but when I'm interviewing I always look for candidates who can consider and discuss problems and trade-offs, not necessarily ones who have the 'right' solution.

How to send multiple timeframe data from MQL4 to a Node.js?

I am trying to get multiple-timeframe data for different trading instruments (_Symbol) from a MetaTrader4 Terminal to a Node.js process.
How can I do it?
Can we do it from the same EA inside a MetaTrader4 Terminal?
A.1: Yes, we can.
A.2: No, that initial idea is not a good one.
While the intention is clear, the idea of using a single EA to send live data for multiple trading instruments does not serve that purpose well.
The MQL4 code-execution environment has some fixed, hard-wired internal logic; because of that, and because of how capital markets and broker-type market-access intermediaries actually work, a solo EA will never fit these requirements.
A simple call to
iOpen( aTradingInstrumentSymbolNAME, // iHigh, iLow, iClose, iVolume, iTime
aSelectedTimeFrameDefinedCODE,
aRelativeBarPTR
)
is far from enough.
A professional solution will require a lot of care for real-time handling, for unmasking the actual flow of mutually hiding events and for achieving minimal processing latencies, so quite high engineering expertise will be needed.
Start by learning the basics of Scripts, benchmark all your critical code sections by recording their actual durations in [us], and make sure your code remains non-blocking under all circumstances. This will decide whether more than one code-execution thread will be necessary at prime time / peak hour.
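The benchmarking advice boils down to wrapping each critical section and logging its duration in microseconds. A minimal sketch of the idea in Python (in MQL4 itself the analogous building block would be GetMicrosecondCount()); handle_tick is a hypothetical stand-in for a critical code section:

```python
import time
from functools import wraps

def timed_us(log):
    """Decorator: append (function name, duration in microseconds) to `log`
    for every call, even if the call raises."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter_ns()
            try:
                return fn(*args, **kwargs)
            finally:
                log.append((fn.__name__, (time.perf_counter_ns() - start) / 1000))
        return wrapper
    return decorator

durations = []

@timed_us(durations)
def handle_tick(prices):
    # hypothetical critical section: average of the latest prices
    return sum(prices) / len(prices)

handle_tick([1.1012, 1.1015, 1.1011])
name, us = durations[0]
print(f"{name}: {us:.1f} us")
```

Collected over a busy session, the worst-case values (not the averages) are what tell you whether a single thread keeps up at peak hour.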
Having managed that, you will only just have started heading towards your expected result.
Next, one has to decide on a feasible inter-process / distributed-computing data flow and signalling scheme, as needed for the inter-platform integration.
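As one illustration of such a data-flow decision, the bars for several timeframes can be packed into one self-describing message per update, so the Node.js side does not need to know the sender's internal layout. A minimal sketch of such an envelope, written in Python for brevity: newline-delimited JSON with hypothetical field names, leaving the transport (plain socket, ZeroMQ or anything else) open:

```python
import json

def encode_bars(symbol, bars_by_timeframe, ts):
    """Pack OHLC bars for one symbol and several timeframes into a single
    newline-terminated JSON message (hypothetical schema)."""
    msg = {"symbol": symbol, "ts": ts,
           "bars": {tf: dict(zip(("open", "high", "low", "close"), ohlc))
                    for tf, ohlc in bars_by_timeframe.items()}}
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")

def decode_stream(buffer):
    """Split received bytes into complete messages plus any partial tail,
    which the receiver keeps until the rest of it arrives."""
    *lines, tail = buffer.split(b"\n")
    return [json.loads(line) for line in lines if line], tail

wire = encode_bars("EURUSD", {"M1": (1.1010, 1.1015, 1.1008, 1.1012),
                              "H1": (1.0990, 1.1020, 1.0985, 1.1012)}, ts=1700000000)
msgs, rest = decode_stream(wire + b'{"symbol":"GBP')   # second message only half received
print(msgs[0]["bars"]["H1"]["high"], rest)
```

The newline framing matters because a stream transport may deliver half a message; the receiver must buffer the tail rather than parse it.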
Last, but not least, an important point is the legal side of such an undertaking. It depends both on your local jurisdiction and on your broker's Terms & Conditions, as no one would enjoy celebrating a technically well-mastered project from inside a jail.
All in all, quite an interesting project.
iOpen(Symbol(), PERIOD_M1, 1) is the way to get data from M1 (the last closed bar); if you need another timeframe, replace PERIOD_M1 with another ENUM_TIMEFRAMES value. So what is the problem? StackOverflow usually requires an MCVE-based example before it can help you.

Giving up Agile, Switching to waterfall - Is this right? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I am working in an Agile environment, and things have reached the point where the client feels they would prefer Waterfall because of the failures (as they see them) of the current Agile setup. What made them think this was the immense number of design-level changes that happened in the end stages of the sprints, which we (developers) could not complete within the time they specified.
As usual, we were blaming each other. From our perspective, the changes requested at the end were too many and the design/code alterations too large. From the client's perspective, we (developers) do not fully understand the requirements and come up with solutions that are 'not' what they intended (as if they had asked us to draw a tiger and we drew a cat).
So the client (not us) feels that the Agile process is not right, and they want to switch to a Waterfall mode, which IMHO would be disastrous. The simple reason: if their satisfaction with the Agile mode was not high enough, how are they going to tolerate the output after spending so much time in the design phase of a Waterfall project?
Please give your suggestions.
First off, ask yourself: are you really doing Agile? If you are, then you should already have delivered a large portion of usable functionality to the client that satisfied their requirements in the earlier sprints. In theory, the "damage" should be limited to the final sprint, where you discovered you needed large design changes. That being the case, you should have proven your ability to deliver, and now you need a dialogue with the client to plan the changes now required.
However, given your description, I suspect you have fallen into the trap of just developing on a two-week cycle without actually delivering into production each time, with a fixed end date in mind for the first proper release. If this is the case, then you're really doing iterative waterfall without the requirements analysis/design up front, which is usually a bad place to be.
Full waterfall is not necessarily the answer (there's enough evidence to show what the problems are with it), but some amount of upfront planning and design is generally far preferable in practice to the "pure" Agile ethos of emergent architecture (which fits with a Lean approach actually). Big projects simply cannot hope to achieve a sensible stable architectural foundation if they just start hacking at code and hope it'll all come good some number of sprints down the line.
In addition to the above another common problem with "pure" Agile is client expectation management. Agile is sold as this wonderful thing that means the client can defer decisions, change their mind and add new requirements as they see fit. HOWEVER that doesn't mean the end date / budget / effort required remains fixed, but people always seem to miss that part.
The agile development methodologies are particularly appropriate when you have unclear requirements and when you may need to make design changes at later stages in your project. Waterfall is a less appropriate approach in this case. The waterfall approach is appropriate for projects which are well understood and when the requirements are unlikely to change during the project's lifetime. It doesn't sound like that is the case here.
How long are your sprints? An alternative approach might be to decrease the sprint length - at least at the start of the project. Deliver new versions to the customer more often and discuss the changes with the customer. If you aren't doing what they want this will become apparent more quickly so less time will be wasted on implementing solutions that don't meet the customer's requirements.
I'm not sure what kind of shop you run, so it's hard for me to come up with good recommendations. I can offer two guiding principles though:
If you have bad communication with the customer, no development methodology will save you.
It's none of the diner's business how a chef organizes the kitchen, as long as the meal is tasty.
It sounds like you have serious project management and architecture/design issues, and it sounds like your communications have also broken down. Fundamentally I don't think changing your dev methodology is going to fix any of that, and is therefore the wrong thing to be doing (though it may restore some client confidence).
I would be especially concerned about moving towards waterfall since you are now choosing to essentially capture the requirements just once (which we know you have a problem with) with no capacity for input. That rigidity is good for inflexible delivery targets, but it's completely inappropriate here where you have changes all the time - that's agile!
Short term I'd step back and double check your requirements at this stage with them. Renegotiate and confirm your current state in relation to those.
Medium term, I'd open up more communications with the client - try and get them involved in a daily scrum for a while (until you restore confidence, then you can be more flexible).
Long term, you have to be worried about how your PMs and senior devs managed to get you into this position. If the client is being unreasonable, that's one thing (but it's still up to the PM to manage that, so you're not absolved). It's not reasonable to complain about having too many changes: that just means you screwed up in determining requirements (which is a dialogue, not a monologue) or that you need more numerous, but probably shorter, sprints.
Above all, I can't see how moving towards waterfall could possibly be correct. It doesn't fix anything directly, and I can only see it exacerbating the problems you've already highlighted.
Caveat: I'm not really capable of a balanced view on waterfall since I've never seen it work effectively and imho it's just completely outdated for enterprise projects.
Agile development does not save you from the burden of actually coming up with a design that both you and the customer understand similarly. Agile just makes it possible to come up with the design in smaller increments rather than all at once. And, in the case of a difficult customer, coming up with a proper design takes time.
So I would spend more effort sitting down with the customer, with a whiteboard, going over what it is that they actually want. I don't think it really matters in this case whether the development process is agile or waterfall.
Agile or waterfall are just words. There are only things that work, and things that don't.
Software development seems virtual to many people and they don't understand why it's hard to change a small thing they request.
Your customers should understand that building software is just like building a house: once you have built all the foundations and walls, it's hard to change the final plan of the house and the room design.
Some practices help avoid this kind of problem: data modeling, data dictionaries, data flow diagrams... the goal being to know every requirement in complete detail. Cutting your product into many independent blocks helps you start coding while continuing to design or specify other parts of your final product.
See Steve McConnell's book "Rapid Development: Taming Wild Software Schedules" for all the practices that work.
The reason that made them think like this would be the immense amount of design level changes that happened during the end stages of the sprints which we (developers) could not complete within the time they specified.
Scrum is in a way a "short waterfall", and you should be isolated from changing requirements for the sprint duration. It seems that this is not happening! Therefore, I don't see that you will gain anything from switching to traditional waterfall, but you should stick to freezing requirements for the sprint duration.
Maybe your iterations are too long?
(I assume you follow Scrum, since you mention sprints).
Talk to your clients and agree the following:
- Shorter iterations, up to 3 weeks max.
- No changes in requirements during the iteration.
- Features are planned at the beginning of the iteration
- Every iteration ends with deliverable: fully functional software with all features that are fully operational
- Iteration length does not change. Unfinished features are left for the next iteration (or maybe discarded if client changes his mind).
- Number of "feature points" you can deliver in a single iteration should be based on the team metric, not client insistence. This is your "capacity".
- Client decides what features (but not how many of them) are planned for the iteration
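The split of responsibilities in the last points (the client orders the features, the team's measured capacity decides how many of them fit) can be sketched as follows. This is a minimal illustration: deriving capacity from the average of the last three iterations' completed points is an assumption, and the feature names and estimates are hypothetical.

```python
def plan_iteration(client_priorities, estimates, past_velocities, lookback=3):
    """Return (capacity, planned features).
    Capacity comes from the team's recent velocity; the order comes from the client."""
    recent = past_velocities[-lookback:]
    if not recent:
        raise ValueError("need at least one completed iteration to estimate capacity")
    capacity = sum(recent) / len(recent)     # average completed points per iteration
    planned, used = [], 0
    for feature in client_priorities:        # walk strictly in the client's order
        cost = estimates[feature]
        if used + cost > capacity:
            break                            # the rest waits for the next iteration
        planned.append(feature)
        used += cost
    return capacity, planned

capacity, planned = plan_iteration(
    ["login", "search", "export", "reports"],
    {"login": 5, "search": 8, "export": 3, "reports": 13},
    past_velocities=[18, 22, 17, 21])
print(capacity, planned)
```

Stopping at the first feature that does not fit keeps the client's priority order intact; skipping it and pulling in smaller, lower-priority items instead would be an equally valid policy, but one to agree with the client explicitly.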
Another thing you should ask yourself is why there are so many "design level changes" in your application. By now, you should have basic architecture and design in place. Maybe you should review the actual design and try to impose some design guidelines and implement some patterns. For example, in a typical enterprise web app, you will probably end up using something like DAO. When you add new features, you create new DAO, but basic architecture and design will not change.
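A minimal sketch of the DAO idea, with a hypothetical Customer entity and an in-memory implementation standing in for a real database-backed one: business logic depends only on the DAO interface, so a new feature adds a new DAO (say, an OrderDao) next to this one instead of reshaping the architecture.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Customer:
    id: int
    name: str

class CustomerDao(ABC):
    """The contract the rest of the application depends on; storage stays behind it."""
    @abstractmethod
    def find(self, customer_id: int) -> Optional[Customer]: ...

    @abstractmethod
    def save(self, customer: Customer) -> None: ...

class InMemoryCustomerDao(CustomerDao):
    """Hypothetical stand-in; a production version would talk to the database."""
    def __init__(self) -> None:
        self._rows: Dict[int, Customer] = {}

    def find(self, customer_id: int) -> Optional[Customer]:
        return self._rows.get(customer_id)

    def save(self, customer: Customer) -> None:
        self._rows[customer.id] = customer

def rename_customer(dao: CustomerDao, customer_id: int, new_name: str) -> None:
    # Business logic sees only the DAO interface, never the storage details.
    customer = dao.find(customer_id)
    if customer is None:
        raise KeyError(customer_id)
    customer.name = new_name
    dao.save(customer)

dao = InMemoryCustomerDao()
dao.save(Customer(1, "Ann"))
rename_customer(dao, 1, "Anna")
print(dao.find(1))
```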
It seems, however, that you are not delivering what the client wants. In that case, it is of utmost importance to deliver a working product to the client, so they can provide sensible feedback for the next iteration.
Regarding "we (developers) could not complete within the time they specified":
The client should not be the one to specify the iteration time frame. Iteration length should always be the same. The requirements that enter the iteration should be the result of client prioritization, but the amount of requirements planned for the iteration should be based on the team's estimates and the number of "points" you are able to deliver during an iteration.
To me it sounds as if there was no "Big Plan[TM]" in the agile project. Using an agile process does not mean that there is no long-term plan; it is more about dealing with the increasing uncertainty of the more distant future. For example, there should be a release plan with the planned features for all releases in the next 2 months (and a less detailed plan with features for the releases after that), so it is clear to the customer when to expect a feature, and when there is an opportunity to change requirements.
Also, to me it seems that there was not (enough) customer involvement in the process. I know that this is a very problematic point, but it helps a lot if the current progress can be discussed with the customer at the end of each iteration. As Mark Byers already wrote, the more feedback you can get from your customer, the better off you are.
Also try not to assign blame, as this makes people defensive and blocks progress. Try to use the inspect-and-adapt approach to get a better process instead.
It's not clear what sort of design changes you mean. Graphical design? User experience design? Code design?
In any event, the best solution is more, and earlier, discussions with the client. Jointly develop explicit, concrete examples that satisfy the client's requirements. You can turn these examples into regression tests to ensure that you continue to satisfy them.
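As an illustration of turning jointly developed examples into regression tests, here is a minimal sketch around a hypothetical requirement ("orders of 100.00 or more ship free; otherwise shipping is 4.95"), with amounts in integer cents. The rule and the numbers are invented for the example; the point is that each row is a concrete case the client signed off on.

```python
def shipping_cost_cents(order_total_cents):
    """Hypothetical rule agreed with the client: orders of 100.00 or more ship free."""
    return 0 if order_total_cents >= 10_000 else 495

# Concrete examples worked out together with the client, kept as regression tests.
# Each row: (description, order total in cents, expected shipping in cents)
AGREED_EXAMPLES = [
    ("small order pays shipping", 2_500, 495),
    ("just below the threshold still pays", 9_999, 495),
    ("exactly at the threshold ships free", 10_000, 0),
    ("large order ships free", 25_000, 0),
]

def test_agreed_examples():
    for description, total, expected in AGREED_EXAMPLES:
        assert shipping_cost_cents(total) == expected, description

test_agreed_examples()
```

Boundary rows such as "exactly at the threshold" are usually where a tiger-versus-cat misunderstanding shows up first, so they are worth settling with the client explicitly.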
Also, continue the discussions as you progress. Show your output as it is available--don't wait until near the end of the sprint. And work on the part most likely to generate problems first. Also look at ways to make it easier to change the things you're finding often change.
The point is to get the client more involved, even to the iteration of a design. Perhaps you'll want to have some discussions focused only on the design.
Your client does not know about how to develop software, or how to manage the software development process. Don't expect the client to provide meaningful instruction on these matters. As a special case, the client does not really know what terms such as 'waterfall' and 'agile' mean; don't expect them to provide meaningful input on your development methodology. Moreover, the client will not really care about these details, as long as the requirements are met within the agreed budget and timeframe. Don't expect them to care, and don't confuse them with lots of inadequate builds and irrelevant information on your internal process.
Here is what the client does care about, and is trying to talk to you about (partly using your own technical jargon): their requirements, their disappointed expectations, and the way you communicate with them. On these matters, the client is the absolute authority. Interpret what they are saying as being about your relationship and the product, not as usable commentary on internal process. Don't cloud the water with your internal deadlines and processes, discuss progress and expectations and the relationship. (If they insist on talking about internals you can remap the terms: e.g. what they understand as being 'the next release' may be internally known as 'the next major release', or whatever).
It sounds to me like the client may want a higher threshold before they get asked for feedback or play with a bad build. It's worth verifying if this is true. If so, you should honor that - and still use agile methods internally if that is what your team feels is best. If they say "waterfall," you may be able to interpret that internally as meaning "we set a deadline for requirements, and then we don't allow more features to be added for a while." Discuss with the client whether it will suit them to have a requirements deadline followed by this sort of freeze.
Someone on your team needs to be the client advocate, and sit on top of the client's issues and fight for them. This advocate must not be sidelined, nor can they take the team's side against the client; they should be the proxy-boss. Then you can separate the internal process communication (team to advocate) from the external communication (advocate to client). The advocate can in some measure insulate the client from the chatter and the builds they don't appreciate, without artificially imposing a certain sort of management or scheduling on your internal process.
To clarify, I do not at all think that you should be secretive or distant with the client, but you should (A) listen to what the client is saying about the relationship and how you are communicating and honor that, (B) keep that separate from internal development process, which should be managed in whatever way will ultimately meet client's expectations.
Fire the client. Even if it is your fault for not understanding what they mean, waterfall would give them 1 chance to give you feedback instead of a chance at the end of each sprint. Some people/clients are literally so stupid that they are not worth working for. Fire them, or tell them that you're using Waterfall without actually switching.
The obvious problem here is communication with the customer. If you really want to do agile, you have to communicate with the customer on a daily basis. Only the customer should be able to make decisions. If you communicate with the customer only mid-sprint and at the end of the sprint, it is natural that later on you will find problems in your application. Also, features implemented in a sprint have to be accepted and tested by the customer; until then, the features are not complete.
I'm writing this because I have similar problem on my current project but I know where we failed.
If the communication issue between the Team and the Customer is not fixed, the situation could be worse with waterfall, if the customer only sees the product once it is complete (tunnel effect).
You commented that changes from sprints 6-7 started to cause rework of tasks completed in earlier sprints. Those changes should have been detected earlier, during the Sprint Review.
If there is a misunderstanding in a feature description, and the Team does not implement what the customer is expecting, this should be detected no later than the Sprint where the feature is implemented, and ideally fixed in the current Sprint.
If the customer changes its mind, the new ideas should be added to the Product Backlog, prioritized and selected for a Sprint like any other backlog item. This should not be deemed rework.
Do you deliver the software to the customer after each sprint, or are you just demoing it?
The origin of the miscommunication could be the Sprint Planning: the Team should only commit to backlog items that are clearly defined, and the definition of each item should include its acceptance criteria. Is the customer the Product Owner, and is the customer actually acting as the Product Owner?
Remote debugging of a development process is sufficiently difficult that I would hesitate to offer any opinion about what you should do. It seems to me that no one outside your team can plausibly have enough information to make a very useful judgement about that.
A lesser jump to a conclusion would be to make a guess as to what went wrong. From your description, it sounds like early deliverables, which you thought were progress in the bank, ended up being majorly reworked.
One common cause of that is the late discovery/creation of 'all' requirements, things that are supposed to be true about everything in the scope of the project. These can be pretty fatal if taken seriously: something as simple as 'all dialog boxes must be resizable' is, for example, apparently beyond the capability of Microsoft to retrofit to Windows.
A classic account of this kind of failure (albeit in a non-agile project) is the following one, about the FBI's Virtual Case File (VCF):
"Once they saw the product of the code we wrote, then they would say, 'Oh, we've got to change this. That isn't what I meant,'" said SAIC's Reynolds. "And that's when we started logging change request after change request after change request."
For example, according to SAIC engineers, after the eight teams had completed about 25 percent of the VCF, the FBI wanted a "page crumb" capability added to all the screens. Also known as "bread crumbs," a name inspired by the Hansel and Gretel fairy tale, this navigation device gives users a list of URLs identifying the path taken through the VCF to arrive at the current screen. This new capability not only added more complexity, the SAIC engineers said, but delayed development because completed threads had to be retrofitted with the new feature.
The key phrase there is 'all the screens'. In the face of changes of that nature, then, unless you have some pre-existing tool support you can just switch on (changing all background colours really should be trivial), you are in trouble. The progress you think you had made up to that point will have retroactively turned out to be illusory.
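One form of the "pre-existing tool support" mentioned above is a single shared layout that every screen renders through, so that a late "on all screens" request, like the VCF breadcrumbs, touches one place instead of every completed screen. A minimal sketch, with hypothetical screen names and plain strings standing in for real UI templates:

```python
def layout(title, body, path):
    """Shared wrapper that every screen renders through.
    Cross-cutting features (here: breadcrumbs) are added once, in this function."""
    breadcrumbs = " > ".join(path + [title])
    return f"[{breadcrumbs}]\n== {title} ==\n{body}"

def case_list_screen(path):
    return layout("Cases", "case #1\ncase #2", path)

def case_detail_screen(path, case_id):
    return layout(f"Case {case_id}", "details...", path)

page = case_detail_screen(["Home", "Cases"], 1)
print(page)
```

Note that this only helps if the seam exists before the request arrives; retrofitting it into screens that each build their own chrome is precisely the rework described above.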
The only known approach to such issues is to get them right first time. If that fails, live with having them wrong.
A lot of shops add Agile trimmings to make themselves "look Agile" to customers who expect it. Maybe you just need to add some Waterfall trimmings, and show them the product once every 2 sprints.
I believe your client is wrong to move to waterfall. It's curing the symptom, not the disease.
The problem you describe is one of communication - the client wants a tiger, you're giving them a cat.
The waterfall model includes many steps to verify that the requirements as written are being delivered - but it doesn't ensure that the written requirements are what the business meant.
I would look at techniques like impact mapping, behaviour-driven development (BDD) and story mapping to improve communication.

How to deal with clients and iterations in Agile team? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
This thread is a follow-up to my previous one. It's in fact two questions, so I hope no one minds, as they depend on each other.
We are starting a new project at work, and we consider it a great opportunity to try Agile techniques in action. We brainstormed the ideas we had read about in several books and articles and came up with the concept that would suit us best: two-week iterations, each followed by a call with the clients, who would choose what they want in the next iteration. I just have a few more questions that we couldn't figure out ourselves.
What to do in the first iteration?
What should we, generally, do in the first few iterations if we start from scratch? Just give it a month of development to code the core of the application, or start with simple wireframes with limited pre-coded functionality? What do clients usually want to see? Shiny stuff that doesn't work, or ugly stuff that does work?
How to communicate with clients?
Our initial thought is to set up the process something like this:
(Diagram: http://img690.imageshack.us/img690/2553/communication.png)
Is it a good idea to have a Focal Point on client side or is it better to communicate straight with all the clients to prevent miscommunication?
Any thoughts are welcome! Thanks in advance.
In my opinion, a key success factor for agile development is to focus on delivering value to the customer in each iteration. I would definitely pick "ugly stuff that does work" over "shiny stuff that doesn't work". Building shiny UIs while trying to get the client to understand that the business logic behind them still takes a lot of time to implement is always risky; Joel Spolsky has written a good article about this.
If the client wants enhancements to the UI, they can always put that as a requirement for the next iteration.
Regarding communication with clients, I think your sketch should be slightly adjusted. In Scrum terms, your "focal point" is called the "product owner". Having one person coordinating with the clients is good, as it can take quite a lot of time to get the different stakeholders to agree on their needs. However, the product owner (or focal point) should be in direct contact with the developers, without going through the project manager. In fact, the product owner and the project manager have quite distinct roles that gain a lot from being split between two people.
The product owner is the stakeholders' voice to the development team. The project manager, on the other hand, is responsible for the wellbeing of the project team and often keeps track of the budget etc. These roles sometimes have opposing agendas, and splitting them between two people gives a healthy opportunity for negotiation between conflicting interests. If one person has both roles, that person often tends to favour one of them, automatically diminishing the other. You don't want to work on a team where the project manager always puts the client before the team's needs. On the other hand, no customer wants a product owner who always puts the team's needs first, neglecting the customer. Splitting the responsibilities between two people helps to remedy that situation.
I'd agree with Anders' answer. My one extra observation is that many clients find it impossible to ignore the Ugly. They get concerned about presentation rather than function. Hence you may need to bite the bullet and do at least one "Nice" screen to show that you will pay attention to presentation details.
What to, generally, do in the first few iterations if we start from scratch?
Many teams use an Iteration Zero to:
setup the development infrastructure (source control, development machines, the automated build, a continuous integration process, a testing environment, etc),
educate the customer and agree with them on the methodology,
create an initial list of features, identify the most important ones and do an initial estimation,
define the times of meetings (planning meeting, demo, retrospective) and choose the iteration length.
Iteration Zero is very special because it doesn't deliver any functionality to the customer but focuses on what is necessary to run the next iterations in an agile way. Subsequent iterations, however, should start to deliver value to the customer.
Just give it a month of development to code core of the application or start with simple wire-frames with limited pre-coded functionality?
No, don't spend one month developing the core of your application. Instead, start delivering vertical slices of the application (from the UI to the database) immediately, not horizontal slices. This doesn't mean that a screen has to be complete (e.g. implement only one search field in a search screen), but it should ideally be representative of the final look and feel (unless you have agreed with the customer on an intermediate step). The important part is to incrementally build things that provide immediate value to the customer.
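To make "vertical slice" concrete, here is a minimal sketch of the one-search-field example: a thin data layer, a service function and a text-only "UI", each implementing just enough for that one field. The customer table and its contents are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# Data layer: only the table this slice needs.
def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO customer (name) VALUES (?)",
                   [("Alice Smith",), ("Bob Jones",), ("Alicia Keys",)])
    return db

# Service layer: a single search by name prefix, nothing more yet.
# (Wildcard characters in the prefix are not escaped in this sketch.)
def search_customers(db, prefix):
    rows = db.execute("SELECT name FROM customer WHERE name LIKE ? ORDER BY name",
                      (prefix + "%",))
    return [name for (name,) in rows]

# "UI" layer: one search field, rendered as text.
def render_search_screen(db, query):
    results = search_customers(db, query)
    lines = [f"Search: [{query}]"]
    lines += [f"  - {name}" for name in results] or ["  (no matches)"]
    return "\n".join(lines)

db = make_db()
print(render_search_screen(db, "Ali"))
```

The next slice (say, a second search field or a detail screen) again cuts through all three layers, so every iteration ends with something the customer can actually try.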
What usually clients want to see? Shiny stuff that doesn't work or ugly stuff that does work?
In my experience, they want to see demonstrable progress, and you want to get feedback as soon as possible.
Is it a good idea to have a Focal Point on client side or is it better to communicate straight with all the clients to prevent miscommunication?
You need one person to represent the clients (who is called the Product Owner in Scrum):
he provides a single authoritative voice
he has a perfect knowledge of the business (i.e. he can answer questions)
he knows how to maximize the ROI (i.e. how to prioritize functionalities)
Agile generally wants to provide the client something valuable, quickly.
So I certainly would not spend "month of development to code core of the application". To me, that smells of the "big up front design" anti-pattern. Also, see YAGNI.
Get as much information as you can from the clients about what they need soonest, and implement that in your first iteration. "Valuable" is in the eye of the client. They will know whether they want to see a slick UI (maybe they want to give a slide show about the product at a trade show, so the functionality can be fake) or simple working features (maybe you're developing something that they need to start using ASAP). Business value is whatever they say will help them do their job.
I'd make my iterations as short as I can (your 2 weeks could work; I suggest considering 1 week). If you absolutely can't have your dev team and your clients co-located, then instead of having a call with the clients, I suggest a meeting. Demo what you've done over the previous iteration and solicit feedback about what should stay, what should change and what should be added.
As others have said, your "Focal point" sounds like a Product Owner. What worries me about your drawing is that it may imply that devs don't interact with the PO or the clients. One thing that makes Agile work is lots of communication. Having all communication to and from the dev team filtered through the Project Manager is almost certainly bound to result in miscommunication, unnecessary work, and missed details.
I agree with the two answers given, but I would add one thing from personal experience: are your customers bought in to the change towards quick iterations? Besides providing feedback after each iteration, this is going to require the customer to perform usability tests on each feature.
Now, I don't know what your group's relationship with your customer is, but it's not unusual for customers to take a "put request in, get working system out" attitude: they are enthusiastic when giving requirements but not so forthcoming with time when it comes to testing the features.
This may be totally inappropriate to your situation, but it's always worth considering how your customer's workflow will have to change as well as your group's.
Cheers

Eventual Consistency

I am in the early stages of design of an application that has to be highly available and scalable. I want to use an eventual consistency data model for this for a number of reasons. I know and understand why this is an unpopular architectural choice for many solutions, but it's important in my case.
I am looking for real-world advice, best practices and gotchas to look out for when dealing with distributed / document-style databases, particularly in areas around e-commerce (shopping-cart-style) apps, which traditionally are easier to put together with a relational DB.
I understand using these types of DB is challenging, but hey, Google and E-bay use them so they can't be that hard ;-) Any advice would be appreciated.
If you want to have a Distributed System (that "Eventual Consistency" thing), you need people to build, maintain and operate it.
I found that there are three classes of people who have very few problems with "Eventual Consistency":
People with a solid background in distributed systems. They have learned about Eventual Consistency, Byzantine Failures and stuff like that. If you understand that Paxos is not about holidays, you are probably one of them.
People experienced in network programming. They might miss the theoretical background but have an intuitive understanding of asynchrony and the "no global clocks & counters" paradigm. If you own at least 8 books by Richard Stevens, you are probably one of them.
Very experienced coders who have had little exposure to RDBMSs. Kernel guys and people from scientific computing and the gaming industry come to mind.
All in all, these people are very sought after in the job market. For example, 75% or so of the academics in distributed systems leave for institutions that run big, self-designed distributed systems, e.g. the stock exchanges.
The whole thing got somewhat simpler with offerings like Hadoop, SimpleDB and CouchDB, but it is still a big challenge to build something on distributed systems technology.
On the other hand, RDBMSs are a very fine piece of engineering. They are well understood and expertise on them is available on the job market. There are a lot of decent tools and education opportunities, and lots of highly skilled experts are available to be rented by the hour. So think twice about whether you really can't get by with an RDBMS approach, perhaps coupled with some clever cheating. I usually point students to the LiveJournal architecture.
For Distributed Databases there is much less experience. That's exactly the reason you have found so little advice so far.
If you are determined to use "Eventual Consistency", I think that besides immature tools the main challenge is the mindset of everyone involved. Are your API users (coders) and application users (your employees and your customers) willing and able to accept the inconsistency? Can you hide it from certain classes of users? We are not used to the idea that computers can be inconsistent. Something is in stock or it isn't; "maybe" isn't an answer users expect.
Also keep in mind that "eventual" can mean a very long time to algorithm designers. For how long can you accept inconsistency?
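One way to make "how long can you accept inconsistency?" concrete is a bounded-staleness read: serve a value from a replica only if that replica's replication lag is below a tolerance you choose, and otherwise fall back to the primary. This is a minimal sketch, not any particular database's API; `Replica`, `read_stock`, `max_lag_s` and the `last_synced` bookkeeping are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica: a key-value copy plus the time it was last synced."""
    data: dict = field(default_factory=dict)
    last_synced: float = 0.0  # timestamp (seconds) of the last applied update


def read_stock(key, primary, replica, now, max_lag_s):
    """Return (value, source), preferring the replica when it is fresh enough."""
    lag = now - replica.last_synced
    if lag <= max_lag_s:
        # Cheap, possibly stale read that is still within the accepted window.
        return replica.data.get(key, 0), "replica"
    # Replica is too stale for this caller's tolerance: pay the cost of the primary.
    return primary.get(key, 0), "primary"
```

Each caller picks its own tolerance: a catalog page might accept minutes of lag, while the final stock check at checkout might use a tolerance of zero and always hit the primary.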
For a shopping cart application you might want to go truly distributed: use the client's browser as the data store. On checkout, you can submit the cart to a server-side batch processing system. This means that for the catalog you need only read-only high availability (easier), and the cart submission is a very narrow interface with no need for transactions. Later on, the processing of the order has no (soft) real-time requirements and thus is easier.
BTW: Last time I checked on the eBay architecture they were big on RDBMSs, but it may have changed since then. (Edit: it did change - see comments)
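The server side of that design can be sketched as a single idempotent operation: the browser sends its whole cart together with a client-generated order id, and the server only appends it to a queue for later batch processing. Resubmitting the same order id (e.g. after a network timeout) is a no-op, so no distributed transaction is needed. This is a minimal in-memory sketch under those assumptions; `CheckoutService`, `submit_cart` and `process_batch` are hypothetical names, and a real system would use a durable queue and a persistent record of seen order ids.

```python
from collections import deque


class CheckoutService:
    """Hypothetical server side of a browser-held shopping cart."""

    def __init__(self):
        self.seen_orders = set()  # order ids already accepted (idempotency check)
        self.pending = deque()    # cart snapshots waiting for batch processing

    def submit_cart(self, order_id, items):
        """Accept a full cart snapshot; duplicate submissions are ignored."""
        if order_id in self.seen_orders:
            return False
        self.seen_orders.add(order_id)
        # Copy the items so later changes on the caller's side don't leak in.
        self.pending.append((order_id, dict(items)))
        return True

    def process_batch(self, max_orders=100):
        """Drain up to max_orders carts; there is no real-time requirement here."""
        processed = []
        while self.pending and len(processed) < max_orders:
            processed.append(self.pending.popleft())
        return processed
```

Because the browser always sends a complete snapshot rather than incremental edits, the server never has to reconcile partial cart updates.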
The only solution to your problem is to decide which tradeoffs in the CAP theorem are right for you, then begin implementing it.
mdorseif has a great point. There are many possible configurations for how far you trade off consistency, availability, and partition tolerance. You have two main options:
Go the route of an in-house distributed system (takes lots of expertise and research)
Vet and experiment with a number of distributed databases to decide which can handle your requirements at scale.
This is probably an over-simplification; a real production-ready pipeline is an ecosystem. It'll at least get you on the right track.
Appnexus is an ad platform that uses HBase for very high availability and eventual consistency. They talk a lot about this here.
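One common way such configurations are expressed in Dynamo-style stores (Riak and Cassandra, for example, let you tune this) is quorum arithmetic: with N replicas, a write waits for W acknowledgements and a read queries R replicas. If R + W > N, every read set overlaps every write set, so a read contacts at least one replica that acknowledged the latest successful write (assuming strict rather than sloppy quorums); smaller R and W favour availability and latency instead. The helper below is a hypothetical sketch of that arithmetic only, not any database's API.

```python
def quorum_properties(n, r, w):
    """Describe the trade-off of an (N, R, W) replication setting."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return {
        # Read and write sets must share at least one replica.
        "read_write_sets_overlap": r + w > n,
        # Replicas that may be down while writes still succeed.
        "write_failures_tolerated": n - w,
        # Replicas that may be down while reads still succeed.
        "read_failures_tolerated": n - r,
    }
```

For example, N=3, R=2, W=2 gives overlapping quorums while tolerating one unavailable replica on each path, whereas N=3, R=1, W=1 tolerates two failures but lets reads return stale data.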
An article on http://highscaleability.com outlines how the New York Times implemented RabbitMQ alongside Cassandra across a WAN for fault tolerance and high availability.
MongoDB provides a great deal of flexibility in balancing consistency with availability with their implementation of write concerns. They've got excellent documentation that highlights exactly how to implement it with all the gotchas (including partitioning). They implement the two-phase commit to maintain state across the network (on their config servers).
Google has a great paper on this subject: their Photon project implements a highly scalable, highly reliable system with the Paxos algorithm at its heart, alongside a few other techniques. It also happens to be very consistent (with end-to-end latency of about 10s) and fault tolerant, standing up to regional failures.
All systems built on distributed computing models are built on CAP and BASE. The main concern here is that if our system provides Availability and Partition Tolerance, we cannot have true consistency, but we can have eventual consistency.
The idea behind eventual consistency is that each node is always available to serve requests. As a trade-off, data modifications are propagated in the background to other nodes. This means that at any time the system may be inconsistent, but the data is still largely accurate.
Source: http://www.techspritz.com/eventual-consistency-and-base-model/
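The behaviour described above (every node accepts writes locally, changes propagate in the background, replicas converge) can be sketched with last-writer-wins replicas and a pairwise anti-entropy exchange. Last-writer-wins is only one possible conflict rule, chosen here for brevity: it silently drops one of two concurrent updates, which is why the Dynamo paper describes merging shopping cart versions by union instead. `LWWReplica` and its methods are hypothetical names, and the timestamps are assumed to come from some clock the application trusts.

```python
class LWWReplica:
    """Hypothetical replica storing key -> (timestamp, node_id, value)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}

    def write(self, key, value, timestamp):
        """Accept a write locally, without contacting any other node."""
        self._apply(key, (timestamp, self.node_id, value))

    def read(self, key):
        """Serve a read from local state, which may be stale."""
        entry = self.store.get(key)
        return entry[2] if entry else None

    def _apply(self, key, entry):
        # Keep the entry with the highest (timestamp, node_id); node_id breaks ties.
        current = self.store.get(key)
        if current is None or entry[:2] > current[:2]:
            self.store[key] = entry

    def anti_entropy(self, other):
        """Background sync: exchange all entries with another replica both ways."""
        for key, entry in list(other.store.items()):
            self._apply(key, entry)
        for key, entry in list(self.store.items()):
            other._apply(key, entry)
```

If replica `a` writes `["book"]` at time 1 and replica `b` writes `["lamp"]` at time 2, the two disagree until one `a.anti_entropy(b)` call, after which both return `["lamp"]` and the earlier write is gone.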
How to achieve high availability and scalability using relational databases is well known and there is a vast body of knowledge out there on how to do this!
Google is a special case which does not apply to most sites: very, very high volumes of queries, very, very large amounts of data and, most importantly, no Service Level Agreements with most of its users. There is no correct answer to a Web search, only better answers; for the average user Google is good enough, and if Google misses a vital page from a search list, you as a user cannot complain.
E-Bay is a rather different case: somehow they have persuaded their users and customers to accept poor service in exchange for theoretically lower prices. Good on them, but this is not an option for every business.

Resources