Node MVC app with DynamoDB ORM including associations - node.js

I want to build a Node.js MVC app.
My data is stored in DynamoDB, and I'm looking for a suitable framework. I'm mainly debating between:
Express.js (for the controllers) with Vogels as the ORM (for the models),
Sails.js with the DynamoDB adapter.
I prefer to have association support between models so that I won't need to implement it myself in my code.
Can anybody advise on the pros and cons of both options? Can I do everything in the second option that I can do in the first, but with less code? Any other recommendations?

Firstly, this is a very opinion-based question, so I will just give my opinion; that does not mean one option is far better than the other.
I have used Vogels for some use cases and found it very useful. Some of its advantages are (a short sketch illustrating several of them follows the list):
1) Parallel scans - help improve performance, which developers will most likely need at some point in a project, especially if you are going to maintain millions of records in DynamoDB
2) Support for both global and local secondary indexes - based on the query pattern, the application will most likely require indexes on tables, so this feature is very helpful
3) Data type and validation support using Joi
4) Automatic addition of audit timestamp fields such as updatedAt and createdAt
5) Automatic key value generation in UUID format
6) Chainable API for query and scan operations - you can chain multiple filter conditions with a limit option for pagination, and sort the results as well
7) Loading multiple models within a single request (the batch get items feature)
8) A basic streaming API
9) Good sample code for many features, which is very important for developers
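For illustration, here is a minimal sketch of what a Vogels model definition and chained query can look like, exercising several of the features above (the table, field, and index names are hypothetical, and the exact API may vary between Vogels versions):

```js
const vogels = require('vogels');
const Joi = require('joi');

vogels.AWS.config.update({ region: 'us-east-1' });

// Hypothetical model showing timestamps (4), UUID keys (5),
// Joi validation (3) and a global secondary index (2)
const Tweet = vogels.define('Tweet', {
  hashKey: 'TweetID',
  timestamps: true,                    // adds createdAt/updatedAt automatically
  schema: {
    TweetID: vogels.types.uuid(),      // auto-generated UUID key
    username: Joi.string().required(), // Joi type/validation support
    content: Joi.string(),
    likes: Joi.number().default(0)
  },
  indexes: [
    { hashKey: 'username', rangeKey: 'createdAt', name: 'UserTimeline', type: 'global' }
  ]
});

// Chainable query (6): filter, sort and paginate in one expression
Tweet.query('someuser')
  .usingIndex('UserTimeline')
  .filter('likes').gt(10)
  .descending()
  .limit(20)
  .exec((err, result) => { /* handle one page of results */ });
```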

Related

Best way to convert query results to domain entities

I am using Knex.js to build SQL queries. It works well, but I need to convert my query results into domain entities (a type representing an object from the domain) for my GraphQL resolvers. I used Knex to avoid an ORM, because a number of people online made it seem like an ORM would make queries more difficult. My current best idea is to follow the Repository pattern and keep the ugly result-to-class conversion code in the repository class. Better ideas are welcome :)
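A minimal sketch of the repository approach the question describes, with a hypothetical User entity and table (names made up for illustration):

```js
const knex = require('knex')({ client: 'pg', connection: process.env.DATABASE_URL });

// Domain entity: a plain class that knows nothing about the database
class User {
  constructor({ id, name, email }) {
    this.id = id;
    this.name = name;
    this.email = email;
  }
}

// Repository: the one place that maps rows to entities,
// so the GraphQL resolvers only ever see domain objects
class UserRepository {
  async findById(id) {
    const row = await knex('users').where({ id }).first();
    return row ? new User(row) : null;
  }

  async findByEmail(email) {
    const row = await knex('users').where({ email }).first();
    return row ? new User(row) : null;
  }
}
```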
As I understand it, you want to make a database call based on a GraphQL query (meaning you already have a database and want to use a simple ORM instead of, for example, EF).
I don't know which platform you are on, but if you are on .NET you can take a look at NReco.GraphQL. It lets you set the DB connection and define the GraphQL schema in a JSON file (mapping the GraphQL schema to DB tables, including relations between schemas); it's definitely worth a look.

Big data analysis in NoSQL

I'm trying to migrate our Postgres database containing millions of clicks (a few years of click history) to a more performant system. Our current analytic queries running on Postgres take forever to complete and degrade the performance of the whole database. I've been investigating possible solutions and have decided to closely investigate two options:
HBase with Hadoop (MapReduce)
Cassandra with Spark
I have worked with NoSQL before, but never for analytical purposes. At first I was a bit disappointed by how few analytical query options those databases provide (no groupBy, count, ...). After reading many articles and presentations I found out that I need to design my schema according to how I intend to read my data, and that the storage layer is separated from the query layer. This adds more redundant data, but in the NoSQL world that is not an issue.
Eventually I found a nice Grails plugin, cassandra-orm, which internally encapsulates an orderBy feature using Cassandra counters. However, I'm still worried about how to make this design extendable. What about queries that will come in the future, which I have no clue about today? How can I design my schema to be prepared for that?
One option would be to use Spark, but Spark doesn't provide data in real time.
Could you give me some insight or advice on the best possible options for big data analysis? Should I use a combination of real-time queries and pre-aggregated ones?
Thanks,
If you are looking at near real-time data analysis, the Spark + HBase combination is one solution.
If you are willing to compromise on throughput, the Solr + Cassandra combination from DataStax can be used.
I am using Solr + Cassandra from DataStax for my use case, which does not require real-time processing. Search performance is not that great with this combo, but I am OK with the throughput.
The Spark + HBase combination seems promising. Depending on your business requirements and expertise, you can choose the right combination.
If you want the ability to analyse data in near real time with complete flexibility in query structure, I think your best bet would be to throw a scalable indexing engine such as Elasticsearch or Solr into your polyglot persistence mix. You could still use Cassandra as the primary data store and then index the fields you're interested in querying and/or aggregating.
Have a look at DataStax Enterprise, which bundles together Cassandra and Solr. Also have a look at Solr's Stats component and its faceting capabilities. These, combined with the indexing engine's rich query language, are handy for implementing many analytics use cases.
If your data set consists of 'only' a few million records, I think you'll be able to get good response times from Solr or ES on a reasonably specced cluster.
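As a sketch of what the indexing engine buys you, here is the kind of aggregation request you could send to Elasticsearch over hypothetical click documents (index and field names are made up; exact parameter names vary by Elasticsearch version):

```js
// Clicks per day for one campaign: the groupBy/count that plain
// Cassandra lacks comes for free from the indexing engine.
const body = {
  query: { term: { campaign_id: 'summer-sale' } }, // hypothetical field
  aggs: {
    clicks_per_day: {
      date_histogram: { field: 'clicked_at', calendar_interval: 'day' }
    }
  },
  size: 0 // only the aggregation buckets, not the raw hits
};
// POST this body to /clicks/_search with any HTTP or Elasticsearch client.
```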

How do I run geospatial queries at scale with NoSQL?

I am preparing to build an Android/iOS app that will require me to make complex polygon and containment geospatial queries. I like Apache Cassandra's lack of a single point of failure, its fault tolerance, and its data center awareness. Cassandra does not have direct support for geospatial queries (that I am aware of), but MongoDB and Couchbase Server do. MongoDB has scaling issues, and I'm not sure whether Couchbase would be a better alternative than Cassandra with Solr or Elasticsearch.
Would I be making a mistake by going with Datastax Enterprise (DSE), Cassandra and Elasticsearch over Couchbase Server? Will there be a noticeable difference in load times for web pages with the Cassandra/ES back end vs. Couchbase?
Aerospike just released Server Community Edition 3.7.0, which includes Geospatial Indexes as a feature.
Aerospike can now store GeoJSON objects and execute various queries, allowing an application to track rapidly changing Geospatial objects or simply ask the question of “what’s near me”. Internally, we use Google’s S2 library and Geo Hashing to encode and index these points and regions. The following types of queries are supported:
Points within a Region
Points within a Radius
Regions a Point is in
This can be combined with a User-Defined Function (UDF) to filter the results – i.e., to further refine the results to only include Bars, Restaurants or Places of Worship near you – even ones that are currently open or have availability. Additionally, finding the Region a point is in allows, for example, an advertiser to figure out campaign regions that the mobile user is in – and therefore place a geospatially targeted advertisement. Internally, the same storage mechanisms are used, which enables highly concurrent reads and writes to the Geospatial data or other data held on the record. Geospatial data is a lot of fun to play around with, so we have included a set of examples based on Open Street Map and Yelp Dataset Challenge data.
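As an illustration, a "what's near me" query might look roughly like this with the Aerospike Node.js client (namespace, set, and bin names are hypothetical, and the set needs a geo2dsphere secondary index on the bin):

```js
const Aerospike = require('aerospike');

async function placesNear(lon, lat) {
  const client = await Aerospike.connect({ hosts: '127.0.0.1:3000' });

  // Points-within-a-radius query against a GeoJSON bin
  const query = client.query('test', 'places');
  query.where(Aerospike.filter.geoWithinRadius('loc', lon, lat, 1000)); // metres

  const stream = query.foreach();
  stream.on('data', record => console.log(record.bins));
  stream.on('error', err => { console.error(err); client.close(); });
  stream.on('end', () => client.close());
}
```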
Geospatial is an Experimental feature in the 3.7.0 release. It’s meant for developers to try out and provide feedback. We think the APIs are good, but in an experimental feature, based on the feedback from the community, Aerospike may choose to modify these APIs by the time this feature is GA. It’s not intended for Production usage right now (though we know some developers will go directly to Production ...)
Aerospike provides a proven highly scalable NoSQL solution. Geospatial query has recently been added, and an Early Adopter release has just been announced. You might want to check that out.
Redis is probably one of the best alternatives. At the current time you would need to use Redis Unstable 3.2. The performance is outstanding. I have been using this with the Lettuce Java client and have seen incredible results, though performance decreases as the radius grows.
http://redis.io/commands/geohash
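For a flavour of the API, the geo commands look like this from Node (key and member names are made up; ioredis is just one possible client):

```js
const Redis = require('ioredis');
const redis = new Redis();

async function demo() {
  // Index places: note Redis takes longitude first, then latitude
  await redis.geoadd('bars', -122.27, 37.80, 'bar:1');
  await redis.geoadd('bars', -122.42, 37.77, 'bar:2');

  // Everything within 5 km of a point (GEORADIUS, Redis 3.2+)
  const nearby = await redis.georadius('bars', -122.27, 37.80, 5, 'km');
  console.log(nearby); // => [ 'bar:1' ]
}

demo().then(() => redis.quit());
```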
You are asking quite a few questions, as has been pointed out. The provided link offers one potential answer to how generic geospatial operations could be implemented using Cassandra. I'll offer one possible answer using straightforward out-of-the-box Cassandra constructs.
Using geohashes (or quad trees), or something similar, create an index of geohashes and their associated polygons. The specific relationship and level(s) of precision are dependent on your data set and use case.
To determine which polygons intersect with a given point or polygon, first compute its geohash(es), then look those geohashes up in the index. For general proximity, this may be sufficient. Either way, this narrows the potential intersection points down to a manageable set.
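A rough sketch of that index lookup with out-of-the-box Cassandra and the ngeohash package (keyspace, table, and precision are hypothetical; the exact point-in-polygon test still happens in the application):

```js
const geohash = require('ngeohash');
const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'geo'
});

// Assumes: CREATE TABLE polygons_by_geohash (
//   geohash text, polygon_id text,
//   PRIMARY KEY (geohash, polygon_id));
async function candidatePolygons(lat, lon) {
  const hash = geohash.encode(lat, lon, 6);         // precision tuned to your data
  const cells = [hash, ...geohash.neighbors(hash)]; // include neighbours for edge cases
  const result = await client.execute(
    'SELECT polygon_id FROM polygons_by_geohash WHERE geohash IN ?',
    [cells],
    { prepare: true }
  );
  return result.rows.map(row => row.polygon_id);    // candidates for exact testing
}
```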

Which Neo4j index method is better

According to the Neo4j documentation, indexing can be done in two ways:
1. The database itself is a natural index consisting of its relationships of different types between nodes. For example, a tree structure can be layered on top of the data and used for index lookups performed by a traverser.
2. Separate index engines can be used, with Apache Lucene being the default backend included with Neo4j.
But there is no comparison of which approach is better at what, and in which cases.
Which one is better, and why?
Is this a data warehouse/mart or a reporting database? If you have both transactions and search going against the database, that might yield interesting pros and cons.
Lucene exists for one reason: search, and it does it really well. If you have a large system with multiple services, then for ultimate scalability it is always better to split the services up and keep each doing its single responsibility. This would give you the flexibility of using that Lucene index with other services if necessary; also, if you ever got rid of Neo4j, you would still have your index/search artifacts around, not coupled to Neo4j.
I would look at it from the overall system architecture, not just specific functionality.

Pick database for ads/analytics service

I have a project with an ad exchange service (something like Google DoubleClick) and I have to pick a highly scalable database. I'm thinking about MongoDB or Cassandra.
Cassandra:
fits our write-intensive system (+)
looks hard to do aggregation (very important for analytics) (is there a good way? I just read slides about Twitter's Rainbird, which seem good) (?)
I don't prefer Java much (-)
MongoDB:
seems easier to do analytics (has built-in aggregation functions) (+)
more RAM-consuming? (because it is document-oriented, vs. key-value Cassandra) (?)
write performance compared to Cassandra? (?)
JavaScript shell and a natural fit with Node.js (an important part of our project) (+)
http://pastebin.com/raw.php?i=FD3xe6Jt - this article makes me cautious (-)
Can you help me pick one, or answer some of my questions above?
Thanks.
I don't know about Cassandra, but MongoDB has some advantages for analytics: high concurrency, sharding, storing everything about an event in a single document, and features like upsert and $inc.
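For instance, the upsert + $inc combination gives you atomic real-time counters in one round trip (collection and field names below are made up):

```js
// Node.js MongoDB driver: one document per page per hour.
// upsert creates the document on the first hit;
// $inc atomically bumps the counter on every later hit.
async function trackPageview(db, page) {
  await db.collection('pageviews').updateOne(
    { page: page, hour: new Date().toISOString().slice(0, 13) },
    { $inc: { views: 1 } },
    { upsert: true }
  );
}
```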
For more detailed explanations check the following resources:
MongoDB Analytics - videos
http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics
http://www.mongodb.org/display/DOCS/Use+Cases
http://www.slideshare.net/jrosoff/scalable-event-analytics-with-mongodb-ruby-on-rails
http://nosql.mypopescu.com/post/3508305955/fast-asynchronous-analytics-with-mongodb
http://blog.opengovernment.org/2011/02/24/fast-asynchronous-analytics-with-mongodb/
http://blog.10gen.com/post/4416876632/london-startup-ubervu-on-storing-5tb-of-data-in-mongodb
It depends a lot on your domain; in most cases one would probably choose Mongo.
For example http://square.github.com/cube/ is built on Mongo.
Cube is an open-source system for visualizing time series data, built on MongoDB, Node and D3. If you send Cube timestamped events (with optional structured data), you can easily build realtime visualizations of aggregate metrics for internal dashboards. For example, you might use Cube to monitor traffic to your website, counting the number of requests in 5-minute intervals:
Most Cassandra use cases draw on its high availability, which is its main feature as far as I know. Your needs seem centered on having a cheap way to shove queryable data into a scale-out DB, and Mongo almost matches an RDBMS in regards to querying. Mongo is also probably easier to deal with.
I think Cassandra is a good fit for this problem.
You don't need to know much Java to get it running (beyond installing Java), as long as there is a client library in your chosen language.
Cassandra 0.8+ has atomic counter support - perfect for impression/click tracking.
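For example, with a modern CQL counter table and the DataStax Node.js driver, the increment is a single statement (the 0.8-era Thrift API differed, but the idea is the same; table and column names are made up):

```js
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'ads'
});

// Assumes: CREATE TABLE impressions (
//   ad_id text, day text, hits counter,
//   PRIMARY KEY (ad_id, day));
async function recordImpression(adId, day) {
  await client.execute(
    'UPDATE impressions SET hits = hits + 1 WHERE ad_id = ? AND day = ?',
    [adId, day],
    { prepare: true }
  );
}
```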
You could also run Hadoop on top of Cassandra, giving you a proven platform for writing MapReduce jobs to do analytics/aggregations and store the results back in Cassandra.
Check out this slideshow about cassandra and hadoop: http://www.slideshare.net/jeromatron/cassandrahadoop-4399672
I hope that helps.
