I am learning how to use socket.io and nodejs. In this answer they explain how to store users who are online in an array in nodejs. This is done without storing them in the database. How reliable is this?
Is data stored in the server reliable? Does the data always stay the way it is intended?
Is it advisable to even store data in the server? I am thinking of a scenario where there are millions of users.
Is it that there is always one instance of the server running even when the app is served from different locations? If not, will storing data in the server bring up inconsistencies between the different server instances?
Congrats on your learning so far! I hope you're having fun with it.
Is data stored in the server reliable? Does the data always stay the way it is intended?
No. Data held only on the server (in process memory or on local disk) is generally not reliable enough unless you manage the server in its entirety. On managed services you should never rely on it, because it could easily be wiped by the party managing your server, for example when an instance is restarted or replaced.
Is it advisable to even store data in the server? I am thinking of a scenario where there are millions of users.
It is not advisable at all; you need a DB of some sort.
Is it that there is always one instance of the server running even when the app is served from different locations? If not, will storing data in the server bring up inconsistencies between the different server instances?
Typically the server is always running and keeps some basic configuration stored locally. When scaling, hosted services can increase processing capacity automatically and handle load balancing in the background. Whenever the server retrieves data for you, it requests it from the database and loads it into RAM (memory). In the example of the user, you would store the user data in a table or document (relational vs. document-oriented database) and then load it into memory to manipulate it using 'functions'.
Additionally, to learn more about your 'data inconsistency' concern, look up concurrency as it pertains to databases, and data race conditions.
Hope that helps!
Related
I am building a Node.js application which uses a few global variables to track data such as online users and statuses, information about other servers, and ongoing events, but having this information be lost in the event of server restart/crash is not ideal.
As these things are frequently read & modified, I figure it would not be a good idea to put that extra strain on my existing MySQL database. I have looked into Redis but unfortunately my application is hosted on a Windows server so I would have to use an old unsupported version of it which isn't ideal.
I'm currently considering setting up a NoSQL database such as MongoDB, but I'm not sure if this is an efficient solution and if it would be too much on my relatively weak server to have an application and 2 different databases running.
What would be the best solution for persistent storage of data that needs to be frequently accessed and updated by an application?
Making my comments into an answer...
If it's a reasonable amount of data, you can just write JSON to a single data file. No database required. Just overwrite the file with a new block of JSON to save the new state. This is very fast, efficient and simple. I've used this before as a quick and easy way to regularly save snapshots of state that you want to be able to reload if your server restarts. Read the state into memory upon server start, then use it from memory, then regularly save a new snapshot to disk however often your application desires.
If some data changes a lot and some data doesn't change very much, you can break the data into multiple files so you're writing less data on the more frequent interval. Obviously, there is a threshold of amount of data or frequency of writes or complexity of data access where a database would be warranted, but you should at least consider the simpler option first and only add a new database when you think you really need it.
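As a rough illustration of the snapshot idea above, here is a minimal sketch. The file location and the shape of the state object are assumptions made up for the example, and writing to a temporary file before renaming is my own addition to make a half-written snapshot less likely, not something the approach strictly requires.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical snapshot location; any writable path works.
const STATE_FILE = path.join(os.tmpdir(), 'app-state.json');

// Write to a temp file first, then rename it over the old snapshot, so a
// crash mid-write is less likely to leave a truncated state file behind.
function saveState(state, file = STATE_FILE) {
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(state));
  fs.renameSync(tmp, file);
}

// Read the last snapshot on startup; use the fallback if none exists yet.
function loadState(file = STATE_FILE, fallback = {}) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch (err) {
    if (err.code === 'ENOENT') return fallback;
    throw err;
  }
}

// Usage: load once at startup, work from memory, snapshot when desired.
const state = loadState(STATE_FILE, { onlineUsers: [] });
state.lastStarted = new Date().toISOString();
saveState(state);
// In a real server you might call saveState(state) from a setInterval.
```

If some data changes far more often than the rest, the same two functions can be pointed at separate files with different save intervals.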
If you cluster your servers in the future, that would call for a multi-user database (one with appropriate concurrency management features) as your master keeper of state. You would also have other design issues to work through if you want to share multi-user state (like online status) across all clustered servers: you can no longer keep that in memory on any one server unless all state changes are broadcast to every server so each can update its in-memory copy, or unless you make users sticky to a particular server (which complicates load balancing in clustering). That does somewhat call for a Redis-like central store that all clustered servers can access.
I am making a website with nodejs and mongodb which records the username of the currently online users. I wonder whether it would be better practice to store this in an array created during the website's runtime or should I store it in a database?
I agree with explorer. Generally, when an app is in production, you store information in some sort of database. This ensures that your application uses the least possible RAM, assuming that you write decent code. Also, if your application crashes for some unforeseen reason, you can recover quickly and your data isn't lost.
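For comparison, this is roughly what the runtime-array option from the question looks like. It is only a sketch: the class and method names are made up, and everything in it disappears on a restart and is not shared between server instances, which is the main argument for a database or central store.

```javascript
// Tracks which usernames are online, counting connections so that a user
// with two open tabs is not marked offline when one tab closes.
class OnlineUsers {
  constructor() {
    this.connections = new Map(); // username -> number of open connections
  }

  connect(username) {
    this.connections.set(username, (this.connections.get(username) || 0) + 1);
  }

  disconnect(username) {
    const remaining = (this.connections.get(username) || 0) - 1;
    if (remaining > 0) this.connections.set(username, remaining);
    else this.connections.delete(username);
  }

  list() {
    return [...this.connections.keys()];
  }
}

const online = new OnlineUsers();
online.connect('alice');
online.connect('alice'); // second tab
online.connect('bob');
online.disconnect('alice'); // one tab closed, alice is still online
console.log(online.list()); // [ 'alice', 'bob' ]
```

In a Socket.io server, connect and disconnect would typically be called from the connection and disconnect event handlers.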
I have been working on a Web App for visualizing live data. It is crucial that this data is kept up to date on the client side without such updates being invoked directly by the client (e.g. no button presses or refreshing the page). Currently, on page load, I grab the current data set from a database (DynamoDB) via Ajax, and subsequent updates are pushed to any listening clients every 5 minutes via a Websockets connection (using Socket.io).
I have overlooked the computational load of this update job. It has to mine some data, process it, update the database, and send the update out to all clients. As a result, the web server is left unresponsive for about 30 seconds with each update. Furthermore, my current architecture limits me from putting my server behind a load balancer, which is something I anticipate coming up in the future. For both these reasons, I really need to get this update job off my web server.
I am relatively inexperienced in web development, and I don't feel I am knowledgeable enough about these technologies to know the drawbacks of the solutions I have come up with. Currently, I am considering:
Break the update off into a separate process so it does not block the Node event loop. This would solve my issue in the short term, but if I ever want to load balance my application, I can't have the update running on multiple machines.
Drop Websockets entirely and just have the client query the database every 5 minutes, while a separate process (or separate server if I want load balancing) keeps the database up to date without interacting directly with the client. Will this kind of access pattern put too much load on my db?
Have a separate server run the update, and send the result via Websockets (or maybe some other protocol) to my load balanced application servers, which then push that update to all listening clients as usual. Is this even possible?
Perhaps there are other solutions. It seems like this would be a relatively common problem, so I was hoping I could find some guidance here. What are the potential issues with the solutions I have proposed, and are there other possible solutions that may suit my use case better?
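For reference, option 1 (getting the update off the event loop) might look something like the sketch below. Note that it uses a worker thread from Node's built-in worker_threads module rather than a fully separate process; child_process.fork would be the process-based equivalent. The summing "job" is a placeholder for the real data mining, and the function name is made up.

```javascript
const { Worker } = require('worker_threads');

// Worker source is inlined to keep the sketch in one file; normally it
// would live in its own module. The "heavy" work is a placeholder sum.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const total = workerData.reduce((sum, x) => sum + x, 0);
  parentPort.postMessage(total);
`;

// Runs the job on another thread and resolves with its result, so the
// main thread's event loop stays free to serve requests in the meantime.
function runUpdateInWorker(data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: data });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

runUpdateInWorker([1, 2, 3]).then((total) => {
  console.log('update finished:', total);
});
```

This only addresses the blocking problem on a single machine; it does not by itself solve the concern about running the update once across several load-balanced servers.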
It sounds like you want one process sitting somewhere which crunches the data and publishes it to a stream. Clients can then subscribe to the stream as and when they like. Redis handles streams nicely: you could process your data and push it into a Redis stream. You could then create a small Node service which subscribes to the Redis stream and pushes the formatted data out over a websocket or via polling.
In this scenario you can then scale up either the publishing process (the one crunching the numbers) if your data load goes up, or scale up your subscribed process (which serves the data over a websocket to browsers) if you get an influx of clients watching the data.
You can also easily distribute the hosting of these services across other machines, and even write them in different languages if you decide the number crunching needs something like threading.
You're then left with the issue of clients (web browsers) consuming this data with a load balancer in between. This can be a hard problem if you use websockets, and each approach has its pros and cons. But importantly you'll have separated your data crunching from your result publishing, and that isolates your issue to the load balancing alone.
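The shape of that publish/subscribe split can be sketched in a few lines. To keep it runnable without a Redis server, Node's built-in EventEmitter stands in for the Redis stream here; in a real deployment the publisher and subscriber would be separate processes talking to Redis over the network. The function and event names are made up for the example.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the Redis stream: in production this would be a network
// service shared by both processes, not an in-process object.
const stream = new EventEmitter();

// Publisher: crunches the data and publishes the result.
function runUpdateJob(rawReadings) {
  const total = rawReadings.reduce((sum, x) => sum + x, 0);
  const average = total / rawReadings.length;
  stream.emit('update', { average, at: Date.now() });
}

// Subscriber: the small service that would push each update to browsers.
const sentToClients = [];
stream.on('update', (update) => {
  // With Socket.io this might be io.emit('update', update).
  sentToClients.push(update);
});

runUpdateJob([10, 20, 30]);
console.log(sentToClients[0].average); // 20
```

Because the publisher only knows about the stream and the subscriber only knows about the stream and its clients, either side can be scaled or rewritten without touching the other.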
I have done pretty much the same to check resources on some of our servers.
I have a C# service getting the information on each server that we manage, sending them to a queue (Amq).
From there, I have a STOMP client fetching data from Amq and emitting them to a websocket.
My main micro service is fetching the data to save them into a db.
My visualisation webapp is connected to the same ws and is fetching the data as they are sent to display them.
The Amq step isn't mandatory at all, it's just something I had to work with (historical).
I don't know what type of data you are working with, so I don't know if my solution can apply to you.
Don't hesitate if I'm not clear or you have any question.
This is a big question and I'm not going to try and give you a definitive answer.
For option 2
It really depends on how expensive your queries are. You can make DynamoDB fast if you pay for enough throughput. That said, on the face of it, re-loading your whole dataset, when that sounds like it's probably large, probably isn't good engineering.
For option 3
This option seems best to me if it's achievable, although admittedly it's hard to say with such a complex system - obviously you can't share your whole project.
Given you are already using AWS you might want to look into AWS Lambda. If you can move the update process into a standalone job, you can host it on Lambda and move the load off the web server. Lambda scales automatically with demand and you only pay for the compute you use.
This really depends on you being able to split the update task off into a separate service. It's likely you would need a fair bit of refactoring to isolate it as a service. If you can break little bits off at a time, and make the move gradually, even better.
If you consider trying this, and you've not used Lambda before, I would definitely start small with some hello world examples. Then try a very simple service in your application, and build up to taking on the update service.
You might also consider looking into Amazon Simple Queue Service (SQS) to handle the comms between clients and server.
Database tuning
If a lot of your update time is spent waiting for database actions to complete, rather than server processing, you can consider tuning that side of things up. Things to consider are:
Buying more throughput
Using batch operations (as these move load to DynamoDB from your server)
Tuning keys, indexes and database access
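On the batch operations point: DynamoDB's BatchWriteItem accepts at most 25 put or delete requests per call, so the update job has to split its rows into groups before sending them. A minimal sketch of that splitting step, with made-up row data and no actual SDK call, could look like this:

```javascript
// BatchWriteItem accepts at most 25 put/delete requests per call.
const MAX_BATCH_SIZE = 25;

// Split an array of items into consecutive batches of at most `size`.
function toBatches(items, size = MAX_BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical rows produced by the update job.
const rows = Array.from({ length: 60 }, (_, i) => ({ id: String(i), value: i * 2 }));
const batches = toBatches(rows);
console.log(batches.map((b) => b.length)); // [ 25, 25, 10 ]
```

Each batch would then go out in one BatchWriteItem request. The response can include unprocessed items that need to be retried, which this sketch leaves out.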
Trying to decide between DynamoDB and CouchDB for my website. It's a static site (built with a static site generator) and I'm planning on using a JavaScript module to build a comment system.
I'm toying with using PouchDB and CouchDB so that synchronizing is easy. I'm also considering DynamoDB.
I have a performance question. From these databases, do any of them push data out to edge locations so that latency is reduced? Or is my Database essentially sitting on one virtual server somewhere?
From what I know, neither of these solutions utilises edge locations out of the box.
Since you're mentioning PouchDB, I assume you want to use a client-side database in your app?
If that's the case you should keep in mind that, in order to sync, a client-side DB needs to have access to your cloud DB. So it's not really suitable for a comment system, since any client could just drop other people's comments, edit them, etc.
I am working on an inventory application (C# .NET 4.0) that will simultaneously inventory dozens of workstations and write the results to a central database. To save me having to write a DAL I am thinking of using Fluent NHibernate, which I have never used before.
Is it safe and good practice to allow the inventory application, which runs as a standalone application, to talk directly to the database using NHibernate? Or should I be using a client-server model where all access to the database is via a server which then reads/writes to the database? In other words, if 50 workstations were currently being inventoried there would be 50 active DB sessions. I am thinking of using GUID-Comb for the PK IDs.
Depending on the environment in which your application will be deployed, you should also consider that direct database connections to a central server might not always be allowed for security reasons.
Creating a simple REST Service with WCF (using WebServiceHost) and simply POST'ing or PUT'ing your inventory data (using HttpClient) might provide a good alternative.
As a result, clients can be very simple and can easily be written for other systems (Linux? Android?), and the server has full control over how and where data is stored.
it depends ;)
NHibernate has optimistic concurrency control out of the box, which is good enough for many situations. So if you just create data on 50 different stations there should be no problem. If creating data on one station depends on data from all stations it gets tricky and a central server would help.
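In case the term is unfamiliar, optimistic concurrency boils down to "only write if the row still has the version I read". Below is a language-neutral sketch of the idea, written in JavaScript with an in-memory Map standing in for the table. The names and data are made up, and NHibernate handles this for you through its version mapping rather than through code like this.

```javascript
// In-memory stand-in for a table with a version column.
const table = new Map([['pc-01', { version: 1, ram: '8GB' }]]);

// Apply the update only if nobody changed the row since it was read.
function updateIfUnchanged(id, expectedVersion, changes) {
  const row = table.get(id);
  if (!row || row.version !== expectedVersion) return false; // stale write
  table.set(id, { ...row, ...changes, version: row.version + 1 });
  return true;
}

// Two sessions read the same row at version 1.
const readByA = table.get('pc-01');
const readByB = table.get('pc-01');

console.log(updateIfUnchanged('pc-01', readByA.version, { ram: '16GB' })); // true
console.log(updateIfUnchanged('pc-01', readByB.version, { ram: '32GB' })); // false
```

The second write is rejected because the row moved to version 2 in between. Stations that only insert their own rows never hit this case, which is why plain inserts from 50 workstations are unproblematic.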