I'm currently involved in a app project, and I'm incharge of setting up the backend.
What i'm use to using is a MYSQL database + php for cleaning and managing the data sent to and fro the front end, which I have much more experience in. However, because of certain preferences of my bosses, on this project I've found myself looking at IBMs Bluemix and Cloudant software. Cloudant is a NoSQL database(like CouchDB) and my experience regarding noSQL is severely lacking. All I've mananged to do so far is to create a few JSON documents, and some basic views
What I need to figure out is how to perform the CRUD(create,read,update,delete) actions on a NoSQL database, or at least what it would look like.
In addition to this, I need to know if there are ways to implement security measures(implement security and anti-hacking functions) on a NoSQL database without an external source, or will I need to learn how to reroute the data through some sort of php function first, if i want it cleaned, before sending it to the Cloudant server where my database sits.
Let me know if my attempt to explain my problem is lacking in clarity. I'll try my best to state a different way, if need be.
Generally speaking, there is nothing equivalent to an ANSI to NoSQL databases. In other words, NoSQL databases are not as standardized as SQL databases. All standards are starting to appear. You can think of it as a technology still in the making.
What you have in general is an API with methods such as put_record or delete_record, or a REST interface that is logically equivalent. Also, in general you CRUD the whole record, not parts of the record.
Take a look at the reference: Cloudant - Reading and Writing
Having that said, in your case I would recommend abstracting away from the specific implementation of the NoSQL you want to use if you care about avoiding vendor lock-in. So I would suggest you to wrap CRUD functions using PHP functions that later can be replaced if you want to change the NoSQL database flavor.
This approach has the additional advantage to provide an abstraction for you to implement your own security. Some important NoSQL databases have no concept of multi-tenancy or just implemented that. Again, it is a technology in the making.
When your mindset is the relational one, you tend to think of the database as something that will help you guarantee data consistency as much as possible. But NoSQL databases are not like that. Think of them as a simple repository of documents (in a JSON or XML structure, for instance), without cross references.
Then the obvious question is perhaps: why would anyone want such a thing? One of the possible answers is because NoSQL databases may hold an aggregate of consolidated data. You can then retrieve aggregates to save time reprocessing or re-retrieving data unnecessarily.
As for security, most (if no all) NoSQL databases have some pretty good authentication mechanisms.
Is there anybody out there using Node-Neo4j-embedded in production mode ?
What kind of limits are expectable ?
Because I think this module is pushing the Cypher queries directly to the node-java module, what uses them directly with Neo4j java libs, I belief there shouldn't be any limits.
I feel it is dangerous to decide to use a lib what isn't maintained for about 2 years (see: github) - and it shouldn't be on Neo4j docs if it isn't maintained (see: README.md dead link about API-Docs).
It looks like there could be a new trend to power up node.js support like first citizen languages by other distributor(s) for (in_memory) graph databases. Maybe Neo4j also should review this and the unmaintained node module (like OrentDB did). The trend had bin initiated by a benchmark-battle between ArangoDB and OrientDB.
I would love to see an Node-Neo4j-embedded benchmark answer to the open source benchmark of ArangoDB - done by professional Neo4j people like OrientDB people had done. But note: They hadn't been fair enough (read the last lines about enabling query caches...).
Or it has to be a new benchmark focused on most possible first citizens-like access by NodeJS. There are three possible solutions to test. I am not experienced enough to do such a test what would be really acceptable. But I would like to help by verifying this.
Please support this call for action with comments and (several types of) answers. A better (native like access) and wider range of supporting in_memory and graph solutions would help the node community very much. A new benchmark would force innovation
Short note about ArangoDBs benchmark: They've tested the REST-APIs. But if you think about performance, you don't like to use a REST-API - you like to use direct library access.
#editors: you are welcome
We (ArangoDB) think that the scalability of embedded databases is to limited. It also limits the number of databases which you may want to compare. Users prefer to implement their solutions in their Application stack of choice, so you would limit the number of people potentially interested in your comparison.
The better way of doing this is to compare the officialy supported interface of the database vendor into the client stack that is commonly supported amongst all players in the field. This is why we have chosen nodejs.
There is enough chatter about benchmarks and how to compare them on stackoverflow, so in doubt, start out to create a usecase and implement code for it, present your results in a reproducable way and request that for comment, instead of demanding others to do this for you.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I'm building a NodeJS application and am utterly torn between NoSQL MongoDB vs RMDS PostregresSql. My project is to create a open source example project for logging visitors and displaying visitor statistics in real time on a webpage using NodeJS. I was planning on using MongoDB at first, because lot of NodeJS examples and tutorials, albeit mostly older ones, used it and paas hosters with a free tier are abounding. However, I was seeing a lot of bashing on MongoDB recently and found that people who tried to use MongoDB ended up switching to Postgres:
http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
http://dieswaytoofast.blogspot.com/2012/09/mysql-vs-postgres-vs-mongodb.html
http://www.plotprojects.com/why-we-use-postgresql-and-slick/
I also a fan of Heroku and have heard a lot about Postgres because of that and find that SQL queries can be nice sometimes.
I'm not a database expert, so I can't tell for the life of me which way to go. I would really appreciate it if you could give some advice on which one to consider and why.
I have a few criteria:
Since I want this to be a example, it would be nice to have a way to host a decently sized amount of data. I know that MongoDB definitely offers this, but Postgres paas like Heroku seem to have pretty small databases (since I am logging every visitor to the website)
A database that is simplistic and easy to explain to others.
Performance doesn't really matter, but speed can't hurt
Thanks for all of the help!
Note: Please no flame wars, everyone has their own opinion :)
Choosing between an SQL database and a NoSQL database is certainly being debated heavily right now and there are plenty of good articles on the subject. I'll list a couple at the end. I have no problem recommending SQL over NOSQL for your particular needs.
NoSQL has a niche group of use cases where data is stored in large tightly coupled packages called documents rather than in a relational model. In a nutshell, data that is tightly coupled to a single entity (like all the text documents used by a single user) is better stored in a NoSQL document model. Data that behaves like excel spreadsheets and fits nicely in rows and is subject to aggregate calculations is better stored in a SQL database of which postgresql is only one of several good choices.
A third option that you might consider is redis (http://redis.io/) which is a simple key value data store that is extremely fast when querying like SQL but not as rigidly typed.
The example you cite seems to be a straightforward row/column type problem. You will find the SQL syntax is much less arcane than the query syntax for MongoDB. Node has many things to recommend it and the toolset has matured significantly in the past year. I would recommend using the monogoose npm package as it reduces the amount of boilerplate code that is required with native mongodb and I have not noticed any performance degradation.
http://slashdot.org/topic/bi/sql-vs-nosql-which-is-better/
http://www.infoivy.com/2013/07/nosql-database-comparison-chart-only.html
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
What are the core architectural differences between these technologies?
Also, what use cases are generally more appropriate for each?
Update
Now that the question scope has been corrected, I might add something in this regard as well:
There are many comparisons between Apache Solr and ElasticSearch available, so I'll reference those I found most useful myself, i.e. covering the most important aspects:
Bob Yoplait already linked kimchy's answer to ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?, which summarizes the reasons why he went ahead and created ElasticSearch, which in his opinion provides a much superior distributed model and ease of use in comparison to Solr.
Ryan Sonnek's Realtime Search: Solr vs Elasticsearch provides an insightful analysis/comparison and explains why he switched from Solr to ElasticSeach, despite being a happy Solr user already - he summarizes this as follows:
Solr may be the weapon of choice when building standard search
applications, but Elasticsearch takes it to the next level with an
architecture for creating modern realtime search applications.
Percolation is an exciting and innovative feature that singlehandedly
blows Solr right out of the water. Elasticsearch is scalable, speedy
and a dream to integrate with. Adios Solr, it was nice knowing you. [emphasis mine]
The Wikipedia article on ElasticSearch quotes a comparison from the reputed German iX magazine, listing advantages and disadvantages, which pretty much summarize what has been said above already:
Advantages:
ElasticSearch is distributed. No separate project required. Replicas are near real-time too, which is called "Push replication".
ElasticSearch fully supports the near real-time search of Apache
Lucene.
Handling multitenancy is not a special configuration, where
with Solr a more advanced setup is necessary.
ElasticSearch introduces
the concept of the Gateway, which makes full backups easier.
Disadvantages:
Only one main developer [not applicable anymore according to the current elasticsearch GitHub organization, besides having a pretty active committer base in the first place]
No autowarming feature [not applicable anymore according to the new Index Warmup API]
Initial Answer
They are completely different technologies addressing completely different use cases, thus cannot be compared at all in any meaningful way:
Apache Solr - Apache Solr offers Lucene's capabilities in an easy to use, fast search server with additional features like faceting, scalability and much more
Amazon ElastiCache - Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud.
Please note that Amazon ElastiCache is protocol-compliant with Memcached, a widely adopted memory object caching system, so code, applications, and popular tools that you use today with existing Memcached environments will work seamlessly with the service (see Memcached for details).
[emphasis mine]
Maybe this has been confused with the following two related technologies one way or another:
ElasticSearch - It is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.
Amazon CloudSearch - Amazon CloudSearch is a fully-managed search service in the cloud that allows customers to easily integrate fast and highly scalable search functionality into their applications.
The Solr and ElasticSearch offerings sound strikingly similar at first sight, and both use the same backend search engine, namely Apache Lucene.
While Solr is older, quite versatile and mature and widely used accordingly, ElasticSearch has been developed specifically to address Solr shortcomings with scalability requirements in modern cloud environments, which are hard(er) to address with Solr.
As such it would probably be most useful to compare ElasticSearch with the recently introduced Amazon CloudSearch (see the introductory post Start Searching in One Hour for Less Than $100 / Month), because both claim to cover the same use cases in principle.
I see some of the above answers are now a bit out of date. From my perspective, and I work with both Solr(Cloud and non-Cloud) and ElasticSearch on a daily basis, here are some interesting differences:
Community: Solr has a bigger, more mature user, dev, and contributor community. ES has a smaller, but active community of users and a growing community of contributors
Maturity: Solr is more mature, but ES has grown rapidly and I consider it stable
Performance: hard to judge. I/we have not done direct performance benchmarks. A person at LinkedIn did compare Solr vs. ES vs. Sensei once, but the initial results should be ignored because they used non-expert setup for both Solr and ES.
Design: People love Solr. The Java API is somewhat verbose, but people like how it's put together. Solr code is unfortunately not always very pretty. Also, ES has sharding, real-time replication, document and routing built-in. While some of this exists in Solr, too, it feels a bit like an after-thought.
Support: there are companies providing tech and consulting support for both Solr and ElasticSearch. I think the only company that provides support for both is Sematext (disclosure: I'm Sematext founder)
Scalability: both can be scaled to very large clusters. ES is easier to scale than pre-Solr 4.0 version of Solr, but with Solr 4.0 that's no longer the case.
For more thorough coverage of Solr vs. ElasticSearch topic have a look at https://sematext.com/blog/solr-vs-elasticsearch-part-1-overview/ . This is the first post in the series of posts from Sematext doing direct and neutral Solr vs. ElasticSearch comparison. Disclosure: I work at Sematext.
I see that a lot of folks here have answered this ElasticSearch vs Solr question in terms of features and functionality but I don't see much discussion here (or elsewhere) regarding how they compare in terms of performance.
That is why I decided to conduct my own investigation. I took an already coded heterogenous data source micro-service that already used Solr for term search. I switched out Solr for ElasticSearch then I ran both versions on AWS with an already coded load test application and captured the performance metrics for subsequent analysis.
Here is what I found. ElasticSearch had 13% higher throughput when it came to indexing documents but Solr was ten times faster. When it came to querying for documents, Solr had five times more throughput and was five times faster than ElasticSearch.
Since the long history of Apache Solr, I think one strength of the Solr is its ecosystem. There are many Solr plugins for different types of data and purposes.
Search platform in the following layers from bottom to top:
Data
Purpose: Represent various data types and sources
Document building
Purpose: Build document information for indexing
Indexing and searching
Purpose: Build and query a document index
Logic enhancement
Purpose: Additional logic for processing search queries and results
Search platform service
Purpose: Add additional functionalities of search engine core to provide a service platform.
UI application
Purpose: End-user search interface or applications
Reference article : Enterprise search
I have been working on both solr and elastic search for .Net applications.
The major difference what i have faced is
Elastic search :
More code and less configuration, however there are api's to change
but still is a code change
for complex types, type within types i.e nested types(wasn't able to achieve in solr)
Solr :
less code and more configuration and hence less maintenance
for grouping results during querying(lots of work to achieve in
elastic search in short no straight way)
I have created a table of major differences between elasticsearch and Solr and splunk, you can use it as 2016 update:
While all of the above links have merit, and have benefited me greatly in the past, as a linguist "exposed" to various Lucene search engines for the last 15 years, I have to say that elastic-search development is very fast in Python. That being said, some of the code felt non-intuitive to me. So, I reached out to one component of the ELK stack, Kibana, from an open source perspective, and found that I could generate the somewhat cryptic code of elasticsearch very easily in Kibana. Also, I could pull Chrome Sense es queries into Kibana as well. If you use Kibana to evaluate es, it will further speed up your evaluation. What took hours to run on other platforms was up and running in JSON in Sense on top of elasticsearch (RESTful interface) in a few minutes at worst (largest data sets); in seconds at best. The documentation for elasticsearch, while 700+ pages, didn't answer questions I had that normally would be resolved in SOLR or other Lucene documentation, which obviously took more time to analyze. Also, you may want to take a look at Aggregates in elastic-search, which have taken Faceting to a new level.
Bigger picture: if you're doing data science, text analytics, or computational linguistics, elasticsearch has some ranking algorithms that seem to innovate well in the information retrieval area. If you're using any TF/IDF algorithms, Text Frequency/Inverse Document Frequency, elasticsearch extends this 1960's algorithm to a new level, even using BM25, Best Match 25, and other Relevancy Ranking algorithms. So, if you are scoring or ranking words, phrases or sentences, elasticsearch does this scoring on the fly, without the large overhead of other data analytics approaches that take hours--another elasticsearch time savings.
With es, combining some of the strengths of bucketing from aggregations with the real-time JSON data relevancy scoring and ranking, you could find a winning combination, depending on either your agile (stories) or architectural(use cases) approach.
Note: did see a similar discussion on aggregations above, but not on aggregations and relevancy scoring--my apology for any overlap.
Disclosure: I don't work for elastic and won't be able to benefit in the near future from their excellent work due to a different architecural path, unless I do some charity work with elasticsearch, which wouldn't be a bad idea
If you are already using SOLR, remain stick to it. If you are starting up, go for Elastic search.
Maximum major issues have been fixed in SOLR and it is quite mature.
Imagine the use case:
A lot(100+) of small(10Mb-100Mb, 1000-100000 documents) search indexes.
They are using by a lot of applications (microservices)
Each application can use more than one index
Small by size index, yes. But huge load(hundreds search-requests per second) and requests are complex (multiple aggregations, conditions and so on)
Downtimes are not allowed
All of that is working years long, and constantly growing.
Idea to have individual ES instance per each index - is huge overhead in this case.
Based on my experience, this kind of use case is very complex to support with Elasticsearch.
Why?
FIRST.
The major problem is fundamental back compatibility disregard.
Breaking changes are so cool!
(Note: imagine SQL-server which require you to do small change in all your SQL-statements, when upgraded... can't imagine it. But for ES it's normal)
Deprecations which will dropped in next major release are so sexy!
(Note: you know, Java contain some deprecations, which 20+ years old, but still working in actual Java version...)
And not only that, sometimes you even have something which nowhere documented (personally came across only once but... )
So. If you want to upgrade ES (because you need new features for some app or you want to get bug fixes) - you are in hell. Especially if it is about major version upgrade.
Client API will not back compatible. Index settings will not back compatible.
And upgrade all app/services same moment with ES upgrade is not realistic.
But you must do it time to time. No other way.
Existing indexes is automatically upgraded? - Yes. But it not help you when you will need to change some old-index settings.
To live with that, you need constantly invest a lot of power in ... forward compatibility of you apps/services with future releases of ES.
Or you need to build(and anyway constantly support) some kind of middleware between you app/services and ES, which provide you back compatible client API.
(And, you can't use Transport Client (because it required jar upgrade for every minor version ES upgrade), and this fact do not make your life easier)
Is it looks simple & cheap? No, it's not. Far from it.
Continuous maintenance of complex infrastructure which based on ES, is way to expensive in all possible senses.
SECOND.
Simple API ? Well... no really.
When you is really using complex conditions and aggregations.... JSON-request with 5 nested levels is whatever, but not simple.
Unfortunately, I have no experience with SOLR, can't say anything about it.
But Sphinxsearch is much better it this scenario, becasue of totally back compatible SphinxQL.
Note:
Sphinxsearch/Manticore are indeed interesting. It's not Lucine based, and as result seriously different. Contain several unique features from the box which ES do not have and crazy fast with small/middle size indexes.
I have use Elasticsearch for 3 years and Solr for about a month, I feel elasticsearch cluster is quite easy to install as compared to Solr installation. Elasticsearch has a pool of help documents with great explanation. One of the use case I was stuck up with Histogram Aggregation which was available in ES however not found in Solr.
Add an nested document in solr very complex and nested data search also very complex. but Elastic Search easy to add nested document and search
I only use Elastic-search. Since I found solr is very hard to start.
Elastic-search's features:
Easy to start, very few setting. Even a newbie can setup a cluster step by step.
Simple Restful API which using NoSQL query. And many language libraries for easy accessing.
Good document, you can read the book: . There is a web version on official website.
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 12 years ago.
what is the best way is to use a ORM like Nhibertate or Entity Framework or to do a customer ORM .
I will use this ORM for a C# 4.0 project
UPDATE 2016
Six years later things are VERY different. NHibernate is all but abandoned, other alternatives were abandoned (eg Subsonic), Entity Framework is perhaps the most common full-featured ORM, and people have been moving to micro ORMs like Dapper for years, to map from queries to objects with a minimum of overhead.
The application scenarios have also changed. Instead of loading and caching one big object graph at the expense of memory and performance, Web Services and REST APIs need to service a high number of smaller requests. This means that a full ORM's overhead is no longer acceptable.
This means that patterns and techniques like Active Record, transaction per request etc have become throughput and scalability killing anti-patterns
One of the most important features nowadays is asynchronous execution , to reduce thread and CPU waste due to waits. NHibernate never made this transition.
Original Answer
Define "best": Is it the most mature, the one with more documentation, bigger community, more mainstream?
NHibernate is more mature, feature rich, with a more advanced community and not likely to be discontinued when MS decides to break compatibility again. Entity Framework is more mainstream and is supported out-of-the-box. You will find more beginner books for EF, more advanced books for NH.
A good option would be to try one of the simpler ORMs like Subsonic and move to more advanced ORMs once you understand how ORMs work, what are the various pitfalls, what SELECT N+1 means [:P]
Just don't try to create your own ORM, there are several dozens out there already! Subsonic, Castle ActiveRecord, NH, EF (of course), LLBLGenPro...
If you can spend some money, have definetely a look at LLBLGEn Pro 3.0
full .NET 4.0 support and it's a mature product. Good support it's also useful.
wide database support (Oracle,MS Sql,Firebird,MySql,PostgreSQL,Sysbase)
nice designer, Model first support and also Database first support
If your budget is thin, then try NHibernate. It's also a mature product,but It has a bigger learning curve. And if you need some support,you can always call Ayende :-)
For smaller projects it's EF 4.0 a good choice.
Most ORMs have their own strengths and weaknesses.
Entity Framework, for example, has the (huge?) advantage of being in the framework itself, but it also is fairly heavy-weight, and a bit more difficult to get up and running (steeper learning curve).
There are some very nice, very easy to use commercial ORMs. I'm currently using Lightspeed in a C# 4 project, and extremely happy with it for this specific scenario.
It really comes down to what you need from the ORM. If you want very quick and easy to setup and use, Lightspeed, subsonic, and others are very nice. If you need full featured, then Entity Framework and NHibernate are good options.
Calling an ORM the best amongst all in General perspective is completely impossible. Each of them are best from different perspectives. You chose the one that best fits to your needs. Linq2Sql was written with performance in mind but it lacks support for other providers,Linq2Sql is very fast. Yet there are others that may not be as fast as Linq2Sql when it comes to dealing with SQL server but they support wide variety of providers. The best idea would be to list down the features you want an ORM to have for your project and select the one that is serves all your needs.` You may ask these questions to chose the right ORM for your project.
What database providers you want the ORM to support ? SQL Server,MySQL, Oracle, etc.
Do you need model-first or db-first support ?
What is my performance criteria [memory, processing] ?
Are you going to use it in web-app or a desktop-app ?
Do you have distributed clients in your application ?
And the list goes on..
I use Linq-to-SQL as my main ORM when creating C# applications. I'll eventually move on to Entity Framework, but for now this one is really easy to use and fast.
I would agree with #this. __curious_geek that choosing the right ORM would depend on your requirements.Having worked on both Hibernate and Entity Framework, I feel the latter is more user friendly as it is a GUI based editor. On the feature richness front, NHibernate has an advantage of supporting a lot of database providers. Also customizing NHibernate turned out to be much easier than tweaking the Entity Framework.
Assuming the most tools satisfy your core requirement, I would prefer NHibernate as a vibrant and engaging user community is a big plus for any tool.