How to create an event sourcing system with Node.js and MongoDB

I am quite new to the event sourcing concept, but I gather that I need to divide each task or operation into events and save those. My basic doubt is this: if you save each step of every request as an event, won't the database size grow large in a short time? And how do you read the events back properly?
I would also like to know if there is any blog, article, or example you could provide as a reference that addresses the concerns above.

I'm not sure if you are doing this for learning or for a real project, but in both cases I would suggest you have a look at https://www.eventstore.com.
It is a database modeled for event sourcing. Even if you don't use it, studying how it works will show you what you need in order to build event sourcing yourself.
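To make the concerns concrete, here is a minimal sketch of the append-and-replay core using the official mongodb Node.js driver (the database, collection, event names, and the apply() reducer are all invented for the example). On the size worry: the event log does grow by design; event-sourced systems deal with that through snapshots and archiving rather than by deleting or updating events, and replaying from the latest snapshot keeps reads fast.
const { MongoClient } = require('mongodb');

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const events = client.db('shop').collection('events'); // hypothetical names

  // Events are only ever inserted, never updated. A unique
  // (aggregateId, version) index guards against concurrent writers.
  await events.createIndex({ aggregateId: 1, version: 1 }, { unique: true });
  await events.insertOne({
    aggregateId: 'order-42',
    version: 1,
    type: 'OrderCreated',
    data: { total: 99 },
    at: new Date(),
  });

  // Read the current state by replaying the history in order.
  let state = {};
  const history = events.find({ aggregateId: 'order-42' }).sort({ version: 1 });
  for await (const ev of history) {
    state = apply(state, ev); // your domain-specific reducer
  }
  console.log(state);
  await client.close();
}

// Hypothetical reducer: folds one event into the aggregate state.
function apply(state, ev) {
  switch (ev.type) {
    case 'OrderCreated': return { ...ev.data, createdAt: ev.at };
    default: return state;
  }
}

main().catch(console.error);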

Related

MongoDB change streams: getting previous values?

Recently I learned about change streams in MongoDB and how powerful they are, but I need to get the previous values before an update. From some research it seems that change streams cannot give you the previous values. So this got me thinking: what alternatives exist to retrieve those previous values?
What I want to achieve is a logging system such as: "Record A field has changed from {old_value} to {new_value}." I'm using socket.io to push these updates to a React front-end client. The updates to the records happen from a completely different system, not on the same backend server where the change streams are listening, so I won't be able to query the document before updating.
So I started to think of a different solution... maybe I could have two databases? One contains the old records and the other the updated records, but this sounds like duplication of data, and I can't imagine doing that with thousands of records.
I need some guidance, as I really don't know what the best option is. Is there really no way to use change streams and get the previous values? Is it possible to somehow query the document before a change stream event? Thank you.
Not sure how I missed this, but the solution is versioning the data.
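A minimal sketch of what versioning can look like with a change stream, assuming a hypothetical records_history collection that always holds the last seen version of each document (note that change streams require a replica set):
const { MongoClient } = require('mongodb');

async function watchWithHistory() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('app');
  const records = db.collection('records');
  const history = db.collection('records_history'); // hypothetical

  const stream = records.watch([], { fullDocument: 'updateLookup' });
  stream.on('change', async (change) => {
    if (change.operationType !== 'update') return;
    const current = change.fullDocument;

    // Look up the version we stored the last time we saw this document.
    const previous = await history.findOne({ _id: current._id });
    if (previous) {
      // Naive field-level diff (handles flat documents only).
      for (const key of Object.keys(current)) {
        if (previous.doc[key] !== current[key]) {
          console.log('Record ' + current._id + ': ' + key +
            ' changed from ' + previous.doc[key] + ' to ' + current[key]);
        }
      }
    }

    // Save the current document as the "previous" version for next time.
    await history.replaceOne(
      { _id: current._id },
      { _id: current._id, doc: current },
      { upsert: true }
    );
  });
}

watchWithHistory().catch(console.error);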

Which (in_memory) graph DB to choose when data modeling is the focus?

I am out of ideas and hope to get some useful input. I am using this question to condense my experiences and share them, hoping to inspire some vendors to take the next step and treat data modeling for graph databases as a first-class concern.
I've been evaluating graph database solutions usable from Node.js for a few weeks. My use case is to store interactions between different social network user accounts. The goal is to use CPU and memory as efficiently as possible.
My most important requirements are:
in_memory (at least for indexing)
open source (and free to use)
JavaScript/Node.js as a first-class citizen (native-level performance)
comfortable query and modeling language
Neo4J
I really like Cypher, so my best choice would be Neo4j.
But the major issue with Neo4j is that JavaScript access is non-native. It uses the REST API, which is about ten times (10x) slower than direct Java access. So I took a look at node-neo4j-embedded, but it has been inactive for more than two years. It looks like its author isn't active at all (a bad sign).
ArangoDB
The really nice core developers of ArangoDB answered my question about internals. In short, JavaScript is a first-class citizen because native queries can be issued from JS. Looking at the open source benchmarks, I think they are fair. But I'm afraid they didn't use node-neo4j-embedded for their benchmark; the benchmarks compare the REST APIs (edited because of #weinberger's comment). I wish they had compared the native APIs (maybe someone is curious enough to give it a try! - let us know!). Update: As I noticed just now, OrientDB has answered the benchmark with a new node.js driver (enabling the command cache by starting the server with -Dcommand.cache.enabled=true -Dcommand.cache.minExecutionTime=3, which isn't fair, because it wasn't a query-cache benchmark!)
Because I would like to use ArangoDB as a graph database, I would have 3 choices (source: FAQ):
traversing JS objects
using AQL's graph functions
using the REST API
In general, none of these are as comfortable as Cypher. And I am not sure how to compare them, or what the right way to model data is (something Neo4j explains very well). I'd love to have something like that for ArangoDB graphs. It feels like ArangoDB is focused on graph operations, while Neo4j better fits the case of using graphs when you have more relations than rows (the reason to use graphs instead of relational joins).
MongoDB
The document-based MongoDB isn't optimized for graph operations, but lately it has gotten an experimental in_memory storage engine. There are also some projects that are either in_memory or graph related, but nothing really compelling. Judging by this discussion, it looks like MongoDB isn't what I want to use.
OrientDB
Because there is a comparison of OrientDB vs. MongoDB available (from OrientDB), I thought about using this one. "OrientDB has a hybrid Document-Graph engine" using SQL. I am a former PHP/MySQL expert. But where is the modeling part? Their chapter on working with graphs is not Cypher-like; it is like using SQL for graphs. There is nothing wrong with that, but having used Cypher before, I miss the modeling-like feeling.
If someone has gone through a modeling process with OrientDB and graphs, maybe you could write a tutorial like the one Neo4j has done.
Update: About JavaScript access as a first-class citizen, there is news:
"In the next release the speed of this driver will be comparable to the native Java one." The forked node.js driver has been fixed in the last few days.
Update: Before choosing OrientDB, one might want to read this article about some issues, and the discussions linked from there. The article touches a sensitive issue and should be approached with a critical mind. Note from the author of this update: I'm new to editing SO and don't have enough reputation to put this in the comments. I believe this information is a valid point for the discussion; I'm not sure how to place it here according to SO rules.
LokiJS
Before I looked at Neo4j, ArangoDB and MongoDB, I played around with a JavaScript-based in_memory database called LokiJS, which seems to follow the strategy of ignoring everything that slows down performance and efficiency. LokiJS is trying to be Mongo-style complete (see the roadmap). The major issue is its poor ability to scale. Of course it isn't a graph database, but it was an interesting solution at the beginning of my project. It also wasn't a great experience hunting down all the scattered documentation (maybe they should reboot with GitBook).
All in all, LokiJS is a very interesting project and I hope they keep moving forward!
LevelDB
Back when I wrote my degree paper, I looked at LevelDB. Remembering this while writing this post, I searched for "LevelDB in_memory" and got a promising result called MemDown (see also). I haven't tested this find, but maybe someone has experience working with and modeling for this solution. It might be the most efficient way if none of the others fit, because I would simply write a lightweight Cypher clone with the goal of staying as lightweight as I can.
Edit: Prompted by a comment, here is a link to LevelGraph. As an idea for implementing a Cypher parser on top of LevelGraph/LevelDB, your starting point would be to compare the two:
Cypher:
CREATE (subject {name: "a"})-[predicate:b]->(object {name: "c"})
RETURN subject, predicate, object
LevelGraph:
var triple = { subject: "a", predicate: "b", object: "c" };
db.put(triple, function (err) {
  // ..
});
Conclusion
As you likely noticed, I am no superhero when it comes to graphs. But this is my initial dive into the topic and I'm trying to get an overview. I assume there are a lot of people out there who want to ask the same questions as me but haven't had the time. I hope this post helps a lot of people and, through comments and answers, grows into a well-done overview of how to model data for graphs.
#editors: You are welcome.
#commenters: This is the result of my personal research. If you have been on a journey like mine, please answer with a short summary like the ones I have written for each DB I've evaluated (don't forget to address my four requirements).
The idea of combining node-style performance through the native features (e.g. streams) with a high-level query language like Cypher is actually quite neat.
What you likely won't get is any kind of low-level API, since that is rather rare among DB authors and, supposedly, not wanted in their designs. So long-running TCP connections should serve just fine.
cypher-stream seems to incorporate all of this while (superficially judged) maintaining a good style.
Since you likely won't get any further with searching, I'd suggest sending the author a pull request if you need any other features :)
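For orientation, basic usage looks roughly like this, based on the project's README (the connection URL and query are placeholders, and the exact signature may differ between versions):
var cypher = require('cypher-stream')('http://localhost:7474');

cypher('MATCH (user:User) RETURN user LIMIT 10')
  .on('data', function (result) {
    console.log(result.user); // one streamed row at a time
  })
  .on('error', console.error)
  .on('end', function () {
    console.log('all done');
  });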
You should take a look at Gun: https://github.com/amark/gun
It's open source and has a very active and helpful lead developer.
Join us at https://gitter.im/amark/gun
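For a first impression, a tiny sketch of its chained API (the keys are made up, and the API may have changed; check the repo's README):
var Gun = require('gun');
var gun = Gun();

// Write two nodes and link them, forming a tiny graph.
gun.get('alice').put({ name: 'Alice' });
gun.get('bob').put({ name: 'Bob' });
gun.get('alice').get('friend').put(gun.get('bob'));

// Follow the link and read the node once.
gun.get('alice').get('friend').once(function (friend) {
  console.log(friend.name); // "Bob"
});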

CQRS design: nosql data view

This is a "language agnostic" question.
I started to study the CQRS pattern.
I have a simple question. Am I supposed to have 2 different storage layers: a relational one (MySQL etc.) for the commands, and a NoSQL one (Mongo, Cassandra, etc.) for the queries?
Let me give a little example:
1) As a user I want to insert a "Todo task"
Command: "Create Task" and will insert a new task into a database which have the User and the Todo tables.
2) As a user I am able to see a list of created tasks
Query: "GetTasks" that will return a "view" with a collection of task taken from a non sql table named "UserTasks" which have a user and a list of created task.
Is this the right approach? I'm sorry if the language is poor; it's just a small example.
If it seems like a good approach (again, don't consider the details), what is the best way to keep the data stores updated?
I'm thinking of raising an event like "TaskCreated", then taking the new task and inserting that information into the NoSQL storage.
Thanks!
I can't really understand what you're looking for, but... typically, a command is something that results in side effects, while queries don't cause side effects. GetTasks wouldn't really be a command, but a query.
Your "CreateTask" would be a command, which would result in the task added to the relevant data store(s). Your GetTasks query would retrieve that information from a datastore. It doesn't really matter if you're using a SQL or NoSQL store for this.
The "CommandStore" is typically the store that has just enough data to enforce invariants. In your case, what data is required for that? Is some information required to decide whether or not a task can be registered? For example, say, you have a requirement that a user can have at most 3 "todo"s. In this case, a table in the "Command Store" storing (UserId, Todo Count) is enough. You could also use (UserId, [TodoId]) - ie. store a list of todo ids so that you can gain idempotence. All other information about the user and tasks would be query data, and would be in the query store.
Hope that makes sense.
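As a sketch of the invariant-enforcement idea above, using MongoDB only because it fits this thread (collection and field names are made up; any store with atomic conditional updates works the same way):
// `db` is an open mongodb Db handle. Assumes a row like
// { _id: userId, todoCount: 0, todoIds: [] } is created at user registration.
// The filter matches only while the user is under the limit and the todo
// id is not yet registered (idempotence); the update itself is atomic.
async function registerTodo(db, userId, todoId) {
  const result = await db.collection('user_todos').updateOne(
    { _id: userId, todoCount: { $lt: 3 }, todoIds: { $ne: todoId } },
    { $push: { todoIds: todoId }, $inc: { todoCount: 1 } }
  );
  if (result.modifiedCount === 0) {
    throw new Error('limit reached, todo already registered, or unknown user');
  }
}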
While there are times when you may wish to store commands, you generally don't. Rather, a popular approach is to store the domain events that occur as a result of the commands. This is referred to as Event Sourcing. This would make 'STOREA' a store of events, or to put it another way, an event stream. 'STOREB' is typically referred to as the read model. It has a de-normalised structure optimised for read speed. It is kept up to date via de-normalisers which respond to specific events. A key point to note here is that there is often a lag between the event being raised and the read model being updated. This, in my opinion, is a good thing, but it needs to be thought about when designing the UI.
For more info take a look at CQRS – A Step-by-Step Guide to the Flow of a typical Application
I hope that helps
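To make the de-normaliser idea concrete with the "TaskCreated" event from the question, a minimal sketch (the event shape and collection name are invented):
// De-normaliser: reacts to "TaskCreated" events and keeps the read-side
// "UserTasks" collection in sync. `readDb` is an open mongodb Db handle;
// how events are delivered (bus, queue, polling) is an infrastructure choice.
async function onTaskCreated(readDb, event) {
  await readDb.collection('UserTasks').updateOne(
    { _id: event.userId },
    { $push: { tasks: { id: event.taskId, title: event.title } } },
    { upsert: true }
  );
}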

DDD/CQRS Querying Events

I was looking at posts on querying in applications designed with the Event Sourcing/DDD/CQRS approach.
As I understand it, events are changes to the state of a domain object. Those state changes are maintained as a history of events in the DB (whether SQL or NoSQL).
If a user wants to query the current state of a particular aggregate root, it will involve fetching the history of events.
But when users run queries, especially business-specific ones, they are interested in the current state, not the history of events.
How does the querying, or 'Q', part of CQRS work with event sourcing?
Consider a domain object "Account" as the aggregate root. The Account AR will go through lots of changes, i.e. credits and debits, so the event store will hold credit and debit events.
Suppose a user needs the current balance of an account: how does a stream of historical events fit here? How will the user fetch the current balance for a given account?
I am unable to understand how a history of events is useful for business-specific querying.
-Prakhyat M M
I would recommend you read more articles from Greg Young (he is something like the father of CQRS and Event Sourcing), such as this one: CQRS, Task Based UIs, Event Sourcing... agh.
Sorry for my bad English, I am from Paraguay. But I really like DDD - CQRS - ES and I would like to try to make a point.
The use of "Projections" (also known as Materialized Views) and the concept of "Eventual Consistency" are the fundamentals that every practitioner of CQRS should understand very well. The Event Store is for query. Is in the Command side of CQRS, not the in the Query side. You may use a bus to send the events stored in the Event Store to the query side in order to process and generate a read model, or view models, from which you can query. In any case a eventstore per se is a query model.
Looks like you are a Java guy, but, still, you may want to check the CQRS Journey from Microsoft.
I hope this helps a little bit and motivates you to do more research on DDD / CQRS / ES, the new trio of line-of-business applications.
You'll use a projection of the event stream into the read model that contains exactly the information the query side (Q) needs. For example, you could have an "account balance" projection that follows all events that change the account balance, but ignores other events in the account's stream (such as owner changes). The projection then saves that info in a way that can be queried very quickly, e.g., in memory or in a small read-model database table (accountId, balance) with accountId as the key (the database can be a key-value store, for example).
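A compact sketch of such a projection (event types and field names invented for the illustration):
// "Account balance" projection: folds balance-affecting events into
// a simple accountId -> balance map and ignores everything else.
const balances = new Map(); // the in-memory read model

function project(event) {
  const current = balances.get(event.accountId) || 0;
  switch (event.type) {
    case 'Credited':
      balances.set(event.accountId, current + event.amount);
      break;
    case 'Debited':
      balances.set(event.accountId, current - event.amount);
      break;
    // 'OwnerChanged' etc. are deliberately ignored by this projection.
  }
}

// The query side is now a single lookup.
function getBalance(accountId) {
  return balances.get(accountId) || 0;
}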
I suggest further reading on the CQRS concept such as this one or this one.
Interestingly enough, recently more people have discovered using the event store as the read model, leaving projections and "proper" read models until absolutely necessary.
We all know that dealing with projections increases complexity. At a minimum you have to create new models, establish the DAL for the read model, create projections to translate events into read-model changes, and bind those projections to the stream of events from your store. It requires more code and more moving parts, and some of them are not easy to test. Schema changes on the read side also require migrations.
It appears that for many scenarios, reading all events (properly partitioned) might be enough to serve as your "read model". Reading them doesn't take much time until the system grows really large and you need to read tens of thousands of events to render one UI screen. Before you reach that point, you can just read the events. Maybe use the file system to store events, although tools like EventStore are free and quite easy to use. Maybe add some indexing.
This approach lets you stabilise the domain significantly: you gain more knowledge about how the system works, you can tune the events, and you are well prepared to bring a "proper" read model into the system later, but you might never have to.
Adam Dymitruk has written a blog post about it; you might find it worth reading even if you don't want to take this approach. Greg Young also gave a talk, "EventStore as a read model", back in 2012.

MongoDB for collecting production data

I am facing a type of problem that I haven't tried tackling before, so I would like some pointers in the right direction from someone more knowledgeable than I am :-)
I have been asked by a friend to help him design a control system for a production line. The project sounds really interesting, and I can't stop thinking about it.
I have already found that I can control the system using a node.js server. So far so good (HTML5 interface, here we come)! But where I really want this system to stand out is in the collection of system metrics. The system reports all kinds of things such as temperature, flow, etc., and these metrics are reported up to several hundred times per second per metric... and this runs 24/7.
My thought is to persist this in a MongoDB database and do some real-time statistics on it. The "competition", if you will, seems to save this in a SQL Server database and allow the operators to export aggregated data to Excel and do their statistics there.
What are the strategies for doing real-time statistics with MongoDB?
I would really like to provide instant feedback and monitoring based on these metrics, such as the average temperature over the last 24 hours, spikes, etc., and also enable alerts. There will not be much advanced statistics done on the server; if that is needed, I would enable exporting the data to a program such as SPSS.
Is MongoDB a good fit for that? I would love to use a Linux machine instead of a Windows machine with SQL Server and a WinForms control interface. The license fees alone are enough to put me off, although I know they probably aren't an issue for the people buying the machinery.
This will not be placed in the cloud, but rather on a single server on the network. Next to the machine being operated I will place a touch interface that, through a browser, contacts the node.js server to invoke PLC commands. There can be multiple machines that need controlling, and they would all be controlled by the same central node.js server.
The machinery is controlled by PLC controllers from http://beckhoff.com/.
I am not a complete novice when it comes to MongoDB, but I have never put anything I have made into production, and I wouldn't put MongoDB on my CV... yet!
EDIT: It seems that the $inc operator is the way to go. But what if I want both the daily and hourly averages, as well as a continuous feed that updates a chart on screen every second using socket.io? Is it a good idea to update a document for each of the aggregates I need? I really also want to save every measurement, but maybe I could aggregate on a per-second basis so I don't store up to 1000 records per second per metric.
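For reference, the $inc-based pre-aggregation pattern usually looks something like the sketch below (collection and field names are made up): each measurement upserts one bucket document per resolution, and the average is computed on read as sum / count.
// `db` is an open mongodb Db handle. One pre-aggregated document
// per metric per hour and per day; $min/$max track spikes.
async function recordMeasurement(db, metric, value, ts) {
  const hour = new Date(ts); hour.setMinutes(0, 0, 0);
  const day = new Date(ts); day.setHours(0, 0, 0, 0);

  for (const { coll, bucket } of [
    { coll: 'stats_hourly', bucket: hour },
    { coll: 'stats_daily', bucket: day },
  ]) {
    await db.collection(coll).updateOne(
      { metric: metric, bucket: bucket },
      { $inc: { sum: value, count: 1 }, $min: { min: value }, $max: { max: value } },
      { upsert: true }
    );
  }
}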
MongoDB can definitely be used for your scenario. Have a look at http://www.slideshare.net/pstokes2/social-analytics-with-mongodb, http://docs.mongodb.org/manual/use-cases/pre-aggregated-reports/, or Real-time statistics: MySQL(/Drizzle) or MongoDB? for more on this topic.
What I was really looking for is the aggregation framework: http://docs.mongodb.org/manual/tutorial/aggregation-examples/
That gives me exactly the kind of stats I would like to see. I can use it to calculate sums and averages as I write, and it also allows for ad-hoc queries should they be needed.
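For instance, the "average temperature over the last 24 hours" mentioned above becomes a single pipeline. A sketch with assumed collection and field names:
// Average, min and max temperature over the last 24 hours.
// `db` is an open mongodb Db handle.
async function last24hTemperature(db) {
  const since = new Date(Date.now() - 24 * 60 * 60 * 1000);
  return db.collection('measurements').aggregate([
    { $match: { metric: 'temperature', ts: { $gte: since } } },
    { $group: {
        _id: '$metric',
        avg: { $avg: '$value' },
        min: { $min: '$value' },
        max: { $max: '$value' },
    } },
  ]).toArray();
}
// Resolves to something like [ { _id: 'temperature', avg: 72.4, min: 65.2, max: 98.1 } ]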
For a little insight on performance, read this awesome blog post: http://devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework
Also, anyone else looking to do something like this should take a look at this post on how to save the individual events. I don't need to keep data longer than a week, for example, so a rolling log should be more than enough for me (see the sketch below): http://blog.mongodb.org/post/172254834/mongodb-is-fantastic-for-logging
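The rolling log described in that post is a capped collection: a fixed-size collection that discards the oldest documents automatically. A minimal sketch (the size is a placeholder):
// Capped collection: fixed size on disk; once full, the oldest
// documents are overwritten automatically, giving a rolling log.
async function createRollingLog(db) {
  await db.createCollection('measurements', {
    capped: true,
    size: 1024 * 1024 * 1024, // placeholder: roughly 1 GB of raw data
  });
}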
With this I am very close to having a really sweet setup, and I am beginning to feel confident that this is a good choice over MySQL or MSSQL.
