It seems there has been some data loss with Node.js + Redis:
https://hallard.me/damaged-community-forum-lost-data/
https://community.nodebb.org/topic/6904/how-to-export-from-redis-to-mongodb-my-database-got-wiped/58
Has anyone experienced the same disaster and knows how to prevent it, apart from backing up everything?
At the company I work at, we've been using Redis for quite a long time now and it has never failed us.
In my opinion, you should never use a database you are not very familiar with; that is when you run into problems such as saving corrupted data or "losing" data.
Redis will lose all of its data in the event of a crash (if the server's memory maxes out, for example), so you will need to enable Redis persistence.
There are two types of Redis persistence, RDB and AOF. You should consciously choose which one (or both) to use based on the nature of the data you're going to store.
The RDB persistence performs point-in-time snapshots of your dataset at specified intervals.
The AOF persistence logs every write operation received by the server; these operations are replayed at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log in the background when it gets too big.
Read more about it here: http://redis.io/topics/persistence
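As a rough illustration (not a recommendation for your workload), the relevant redis.conf settings look something like this; the thresholds below are the historical defaults and should be tuned to your data:

```
# RDB: write a snapshot to disk if at least 1 key changed within 900 s,
# 10 keys within 300 s, or 10000 keys within 60 s
save 900 1
save 300 10
save 60 10000

# AOF: append every write to the append-only file and fsync it once per
# second, so at most about one second of writes can be lost on a crash
appendonly yes
appendfsync everysec
```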
Here is a quote from a good blog post about using Redis as a primary database:
Redis persistence is not less reliable compared to other databases, it
is actually more reliable in most of the cases because Redis writes in
an append-only mode, so there are no crashed tables, no strange
corruptions possible.
Source: https://blog.andyet.com/2012/02/09/redis-reliability-for-realtime-apps/
Node should not affect how Redis works; it is only used to move data to and from Redis, so you should not worry about using Node in particular.
I need to download data from several APIs, some using plain REST and some using GraphQL, parse the data, and map some of the fields to various Azure SQL tables, discarding the unneeded data (it will later be visualised in Power BI).
I started using Azure Data Factory but got frustrated with the lack of simple functions, such as converting a JSON field containing HTML into text.
I then looked at Azure Functions and was thinking of Python (although I'm open to Node.js). However, I have a lot of data to download and upsert into the database, and there are mentions on the internet that ADF is the most efficient way to bulk upsert data.
Then I thought of a hybrid: an Azure Function to get the data and ADF to do the bulk copy.
So my question is: what should I be using for my use case? I'm open to any suggestions, but it needs to be on Azure and cost-sensitive. The ingestion needs to run daily, upserting around 300,000 records.
I think this pretty much comes down to taste, as you can probably solve this entirely with only ADF or only an Azure Function, depending on the specifics of your case. In my personal experience I've often ended up with the hybrid variant, because it can be easier thanks to the extra flexibility compared to ADF's standard API components: do the extraction from the API in an Azure Function, store the data in blob storage / a data lake, and then load it into the database using ADF. This setup can be pretty cost-effective in my experience, depending on whether you can use an Azure Functions consumption plan (cheaper than the alternatives) and/or can avoid using data flows in ADF (a significant cost driver in ADF).
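To make the extraction half of that hybrid concrete, here is a minimal sketch of a timer-triggered Azure Function in Node.js that pulls one API and drops the raw response into blob storage for ADF to pick up. It assumes the @azure/storage-blob and node-fetch packages; the environment variable names and the raw-api-data container are made up for the example:

```javascript
const fetch = require("node-fetch");
const { BlobServiceClient } = require("@azure/storage-blob");

module.exports = async function (context, myTimer) {
    // 1. Pull the data from the source API (plain REST here; a GraphQL
    //    source would be a POST with a query in the body instead).
    const response = await fetch(process.env.SOURCE_API_URL);
    const payload = await response.text();

    // 2. Drop the raw JSON into blob storage for ADF to load into Azure SQL.
    const blobService = BlobServiceClient.fromConnectionString(
        process.env.STORAGE_CONNECTION_STRING
    );
    const container = blobService.getContainerClient("raw-api-data");
    const blobName = `extract-${new Date().toISOString()}.json`;

    await container
        .getBlockBlobClient(blobName)
        .upload(payload, Buffer.byteLength(payload));

    context.log(`Wrote ${blobName} for ADF to load`);
};
```

ADF can then use a plain copy activity from that container into Azure SQL, which keeps the more expensive data-flow features out of the picture.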
My client asked me to build a realtime application that supports chat and sending images and videos, all in real time. He asked me to come up with my own technology stack, so I did a lot of research and found that the easiest one to build would use the tech stack below:
1) Node.js and cluster to max out the CPU cores for one server instance - language/runtime
2) Socket.io - realtime framework
3) Redis - pub/sub for multiple instances of server
4) Nginx - to reverse proxy and load balance multiple servers
5) Amazon EC2 - to run the server
6) Amazon S3 and CloudFront - to store the images/videos and deliver them
Correct me if I'm wrong about the above stack. My real question is: can the above tech stack scale to 1,000,000 messages per second (text, images, videos)?
If anyone has experience with Node.js and socket.io, could you give me insights on, or alternatives to, the above stack?
Regards,
SinusGob
My real question is, can the above tech stack scale to 1,000,000 messages
per second (text, images, videos)?
Sure it can, with the right design and enough hardware. The question your client should really be asking is not whether it can be made to go that big, but at what cost and with what practicality it can be done, and whether those are the best choices.
Let's look at each piece you've mentioned:
node.js - For an I/O centric app, it's an excellent choice for high scale and it can scale by deploying many CPUs in a cluster (both multi-process per server and multi-server). How practical this type of scale is depends a lot on what kind of shared data all these server processes need access to. Usually, the data store ultimately ends up being the harder bottleneck in scaling because it's easy to throw more servers at the request processing. It's not so easy to throw more hardware at a centralized data store. There are ways to do that, but it depends a lot on the demands of the app for how you do it and how hard it is.
socket.io - If you need efficient server push of smallish messages, then socket.io is probably the best way to go because it's the most efficient at pushing to the client. It is not great at all types of transport, though. For example, I wouldn't be moving large images or video around through socket.io as there are more purpose-built ways to do that. So, the use of socket.io depends a lot on what exactly the app wants to use it for. If you wanted to push a video to a client, you could also push just a URL and have the client turn around and request the video via a regular http URL using well-known high-scale technology.
Redis - Again, great for some things, not great at everything. So, it really depends upon what you're trying to do. What I explained earlier is that the design of your data store and the number of transactions through it is probably where your real scale problems lie. If I were starting this job, I'd start with an understanding of the data storage needs for a server, transactions per second of various types, caching strategy, redundancy, fail-over, data persistence, etc., and design the high-scale access to data first. I wouldn't be entirely sure Redis was the preferred choice. I'd probably suggest you need a high-scale database guy as a consultant early in the project.
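For the pub/sub piece specifically (stack items 2 and 3), the usual pattern is to attach a Redis-backed adapter to each socket.io instance so that broadcasts reach clients connected to any server. A minimal sketch, assuming the socket.io 2.x-era socket.io-redis package and a local Redis on the default port:

```javascript
const io = require("socket.io")(3000);
const redisAdapter = require("socket.io-redis");

// Every Node process (on any machine) that attaches this adapter relays its
// broadcasts through Redis pub/sub, so clients connected to different
// instances still receive the same events.
io.adapter(redisAdapter({ host: "127.0.0.1", port: 6379 }));

io.on("connection", (socket) => {
    socket.on("chat message", (msg) => {
        // Reaches clients on *all* instances, not just this one.
        io.emit("chat message", msg);
    });
});
```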
Nginx - Lots of high scale sites using nginx so it's certainly a good tool. Whether it's exactly the right tool for you depends upon your design. I'd probably work on this part last because it seems less central to the design and once the rest of the system is laid out, you can then consider what you need here.
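If you do end up with nginx in front of several Node/socket.io instances, the reverse-proxy part is mostly boilerplate. A minimal sketch, with made-up ports and ip_hash used for sticky sessions so a client keeps hitting the same instance during the socket.io handshake:

```nginx
upstream chat_backend {
    ip_hash;                 # sticky sessions: same client -> same Node process
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}

server {
    listen 80;

    location / {
        proxy_pass http://chat_backend;
        # WebSocket upgrade headers needed by socket.io
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```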
Amazon EC2 - One of several possible choices. These choices are hard to compare directly in an apples to apples comparison. Large scale systems have been built out of EC2 so there is proof of concept there and the general architecture seems an appropriate match. If you wanted to know where the real gremlins are there, you'd need a consultant that had done high scale stuff on EC2.
Amazon S3 - I personally know some very high storage and bandwidth sites using S3 for both video and images. It works for that.
So ... these are generally likely good tools to use if they are used in the right way. Redis would be a question mark depending upon the storage needs of the actual application (you've provided zero requirements, and a database can't be selected with zero requirements). A more reasoned answer would be based on putting together a high-level set of requirements that analyze what the system needs to be able to do to serve 1,000,000 of whatever. Those requirements could be compared with the known capabilities of some of these pieces to start a ballpark on scaling the system. Then, you'd have to put together some benchmarking tests to run against certain pieces of the system. As much of the success or failure will depend upon how the app is built and how the tools are used as on which tools are selected. You can likely scale successfully with many different types of tools. Heck, Facebook runs on PHP (well, a highly modified, customized PHP that is not really typical PHP at all at runtime).
We have a POC using Spring Core whose work is essentially determined by two application properties read from a file. We can scale this out by spinning up additional JVMs (running the same code base) and assigning different property values to each JVM so that they don't interfere with each other. This works to an extent, but I would like to make it more dynamic. I can kind of see how Spring Integration (SI) might be a fit here. I think I could create one application that queries the DB, figures out the work parameters, and sends those out to the available instances of our application in a round-robin fashion. But I am having trouble seeing how to implement it technically. All the applications are running on the same machine, so they have the same IP address. Also, they are not web apps. Would I need to use JMS (which I am not familiar with), or can SI handle this?
You could use JMS, RabbitMQ, Redis, or any number of outbound endpoints to distribute the work.
Let's say you choose to use simple RMI or TCP/UDP; you can simply have a number of outbound endpoints subscribed to the routing channel, and SI will round-robin the requests (by default).
This would be statically configured though. You would need a little glue if you want to dynamically change the number of servers without using a broker such as JMS or RabbitMQ.
The dynamic FTP sample illustrates a technique for adding new destinations (in that case, FTP servers) dynamically.
I'm building a Node.js application and am utterly torn between the NoSQL database MongoDB and the RDBMS PostgreSQL. My project is to create an open-source example project for logging visitors and displaying visitor statistics in real time on a webpage using Node.js. I was planning on using MongoDB at first, because a lot of Node.js examples and tutorials, albeit mostly older ones, used it, and PaaS hosters with a free tier are abundant. However, I have seen a lot of bashing of MongoDB recently and found that people who tried to use MongoDB ended up switching to Postgres:
http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
http://dieswaytoofast.blogspot.com/2012/09/mysql-vs-postgres-vs-mongodb.html
http://www.plotprojects.com/why-we-use-postgresql-and-slick/
I'm also a fan of Heroku and have heard a lot about Postgres because of that, and I find that SQL queries can be nice sometimes.
I'm not a database expert, so I can't tell for the life of me which way to go. I would really appreciate some advice on which one to consider and why.
I have a few criteria:
Since I want this to be an example, it would be nice to have a way to host a decently sized amount of data. I know that MongoDB definitely offers this, but Postgres PaaS options like Heroku seem to have pretty small databases (since I am logging every visitor to the website).
A database that is simplistic and easy to explain to others.
Performance doesn't really matter, but speed can't hurt
Thanks for all of the help!
Note: Please no flame wars, everyone has their own opinion :)
Choosing between an SQL database and a NoSQL database is certainly being debated heavily right now, and there are plenty of good articles on the subject; I'll list a couple at the end. I have no problem recommending SQL over NoSQL for your particular needs.
NoSQL has a niche group of use cases where data is stored in large, tightly coupled packages called documents rather than in a relational model. In a nutshell, data that is tightly coupled to a single entity (like all the text documents used by a single user) is better stored in a NoSQL document model. Data that behaves like Excel spreadsheets, fits nicely in rows, and is subject to aggregate calculations is better stored in an SQL database, of which PostgreSQL is only one of several good choices.
A third option that you might consider is Redis (http://redis.io/), which is a simple key-value data store that is extremely fast when querying, like SQL, but not as rigidly typed.
The example you cite seems to be a straightforward row/column type problem. You will find the SQL syntax is much less arcane than the query syntax for MongoDB. Node has many things to recommend it, and the toolset has matured significantly in the past year. If you do go with MongoDB, I would recommend using the mongoose npm package, as it reduces the amount of boilerplate code that is required with the native mongodb driver, and I have not noticed any performance degradation.
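To show why the visitor log fits the row/column model so naturally, here is a minimal sketch using the node-postgres (pg) package; the visits table and its columns are made up for the example:

```javascript
const { Pool } = require("pg");
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Record one visit as a plain row.
async function logVisit(ip, path) {
    await pool.query(
        "INSERT INTO visits (ip, path, visited_at) VALUES ($1, $2, now())",
        [ip, path]
    );
}

// Aggregate for the stats page: visits per day.
async function visitsPerDay() {
    const { rows } = await pool.query(
        "SELECT date_trunc('day', visited_at) AS day, count(*) AS visits " +
        "FROM visits GROUP BY day ORDER BY day"
    );
    return rows;
}
```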
http://slashdot.org/topic/bi/sql-vs-nosql-which-is-better/
http://www.infoivy.com/2013/07/nosql-database-comparison-chart-only.html
I am trying to build a highly available, very high volume shopping cart application. The volume will be so high that I am considering using Cassandra instead of MySQL for the database.
Now, in a shopping cart system, most database actions have to be 100% consistent, while others do not.
Examples of 100% consistent actions:
Saving the payment confirmation.
Saving the purchased items list.
Examples of things which do not require 100% consistency:
Saving the address of the customer (If at the time of payment, no address is saved in the database, assume that it was lost and ask the customer again).
Other similar things.
Now, if I am running a server cluster in the same region (Amazon EC2), are there any major roadblocks to performing all transactions as maximally consistent transactions? Would that provide reliability identical to a MySQL relational database? Remember, we are dealing with financial transactions here.
Is my data generally "safe" in Cassandra? By that I mean in the face of complete unexpected power failure, random disk failure, and so on.
Specific to your questions about availability and EC2: as Theodore wrote, the consistency level in Cassandra will dictate how "safe" the data is. The problem you'll face is how to ensure the data is getting to Cassandra, fulfilling your transaction goals, and being saved appropriately.
There are some good threads about transactions and solving this problem on the Apache Cassandra User's mailing list.
Cassandra on its own is not suitable for transactions:
Cassandra - transaction support
To get around this, you need "something" that can leverage Cassandra as a data store that manages the transactions above the data tier.
how to integrate cassandra with zookeeper to support transactions
Cages: http://code.google.com/p/cages/
'Locking and transactions over Cassandra using Cages': http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/
There is a section there called "Transactions for Cassandra" with more information.
Summary: you cannot guarantee financial transactions with Cassandra alone.
There are lots of different ways to define consistency. If by "maximal consistent transaction" you mean reading and writing at consistency level ALL, then that will provide consistency in the sense that your reads will never return an out-of-date value, and durability in the sense that your writes will be stored on all nodes before returning.
That's not the same as transactions, however. Cassandra does not support transactions. It doesn't provide consistency between different rows, as MySQL does. For example, suppose you add an item to the shopping basket, and update the total cost in the cart. Individually, each operation will be stored consistently and durably. However, there may be a window of time in which you can see one change but not the other. In a relational database, you can group them into a transaction so that you can only see both, or neither.
As far as safety goes, Cassandra stores all your writes to disk in a commit log before it does anything else, in the same way that relational databases use transaction logs. So it is just as safe with regard to system crashes. With regards to node failures, if you write at CL.ALL, then you will never lose data as long as one node in each replica set survives. With regard to disk failure, that is a matter for your underlying hardware setup, e.g. RAID.
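As a rough illustration of reading and writing at consistency level ALL from Node.js, here is a minimal sketch using the cassandra-driver package; the contact points, keyspace and table are made up for the example:

```javascript
const cassandra = require("cassandra-driver");

const client = new cassandra.Client({
    contactPoints: ["10.0.0.1", "10.0.0.2"],
    localDataCenter: "datacenter1",
    keyspace: "shop",
});

// Written to every replica before the call returns (CL.ALL), at the cost of
// availability: the write fails if any replica for the row is down.
async function savePaymentConfirmation(orderId, confirmation) {
    await client.execute(
        "INSERT INTO payment_confirmations (order_id, confirmation) VALUES (?, ?)",
        [orderId, confirmation],
        { prepare: true, consistency: cassandra.types.consistencies.all }
    );
}
```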
As of 2022 Cassandra supports transactions.
Find out how Best Buy is using it:
https://www.slideshare.net/joelcrabb/cassandra-and-riak-at-bestbuycom