I have a MinIO object storage cluster. Day to day it runs fine, but on days when many users are uploading and downloading data, performance becomes very slow and data may even get corrupted, especially with small objects. How can I improve performance? Please help, thanks!
What is your traffic pattern?
You can try SeaweedFS S3, which should scale linearly as you add more file servers, and it excels at handling lots of small files.
I'm currently trying to find a solution that lets people upload to an S3 bucket from a web form at a relatively high speed.
I've already tried a couple of Node.js libraries (evaporatejs and resumablejs), but upload speed is fairly low (~3.5 MB/s), and the signed-URL PUT request approach is also very slow (~2 MB/s).
The point is, people should be able to upload to this bucket from all over the world, and we're talking about files ranging from ~50 MB to a couple of GB, so the speed needs to be considerably high; I don't want a 30-minute upload.
I was wondering if anyone knows a better way to achieve this, because I've looked around for a week and these are the only options I've found, and they're not enough.
All answers will be appreciated, thank you in advance.
Along with multipart upload, try out S3 Transfer Acceleration. With Transfer Acceleration, client data is uploaded to the nearest edge location and travels from there to your S3 bucket. It can improve your performance anywhere from 10 to 50%.
S3 transfer acceleration - https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration.html
There is one drawback to this approach: for clients that are already near the S3 bucket's region, it adds an extra hop between the client and S3. But that is a small penalty compared to the overall advantage of using S3 Transfer Acceleration.
S3 transfer acceleration speed comparison - http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
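For what it's worth, here is a minimal sketch of combining the two ideas with the AWS SDK for JavaScript v3. The bucket name, region, and file paths are placeholders, and the bucket must have Transfer Acceleration enabled before the accelerated endpoint will resolve.

```js
// Multipart upload through the S3 Transfer Acceleration endpoint.
// Assumes: npm install @aws-sdk/client-s3 @aws-sdk/lib-storage
const { S3Client } = require("@aws-sdk/client-s3");
const { Upload } = require("@aws-sdk/lib-storage");
const fs = require("fs");

const client = new S3Client({
  region: "us-east-1",          // placeholder region
  useAccelerateEndpoint: true,  // route uploads via the nearest edge location
});

async function uploadFile(path, key) {
  const upload = new Upload({
    client,
    params: {
      Bucket: "my-bucket",      // placeholder bucket name
      Key: key,
      Body: fs.createReadStream(path),
    },
    partSize: 10 * 1024 * 1024, // 10 MB parts
    queueSize: 4,               // upload up to 4 parts in parallel
  });
  upload.on("httpUploadProgress", (p) => console.log(p.loaded, "/", p.total));
  await upload.done();
}

uploadFile("./big-file.bin", "uploads/big-file.bin").catch(console.error);
```

Raising queueSize trades memory for parallelism, which is usually where multipart uploads gain their speed over a single PUT.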
I am loading a few large JSON datasets from a third-party API on server startup and writing them to .json files (~150 MB each), then loading a file into an object whenever I need to use it.
The thing is, I am not sure this is the right or most efficient way to do it. Should I use a database instead? If so, could you suggest which one?
Thanks.
Glad to answer your question.
Modern databases are already able to keep up with large file sizes, so in this case size would not be an issue.
However, when it comes to performance, it still depends on the usage and purpose of the application.
For example, sometimes an application requires content caching; most databases already have this function built in, but there are also applications where it won't apply.
This question also discusses the comparison of disk storage and database storage; there are lots of good answers in there, and I hope it helps.
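To illustrate the caching point, here is a minimal sketch (the file name is a placeholder) that parses a large JSON file once and serves it from memory on later calls, instead of re-reading it every time:

```js
// Parse the big JSON file once, then answer from the in-memory copy.
const fs = require("fs/promises");

let cache = null;

async function getData() {
  if (!cache) {
    const raw = await fs.readFile("./data/report.json", "utf8"); // placeholder path
    cache = JSON.parse(raw); // parse once; subsequent calls reuse the object
  }
  return cache;
}

module.exports = { getData };
```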
I have Cassandra running in two different DCs, and now it's time to scale up and add more storage. Unfortunately, I'm not able to add storage to the existing partitions due to restrictions/limitations. I'd like to know whether it would be a good idea to use one common NFS mount to store the data. I know Cassandra is distributed across many nodes, but can they share a common mount to access the data?
Thank you,
No, it is not a good idea to do that. Essentially, you're trading disk I/O for network I/O; so it'll perform terribly. Also, you're introducing a single point of failure into your cluster.
DataStax published a blog post on this a couple of years ago. The important thing to remember, is that blog posts don't usually happen about isolated incidents. They happen because someone sees the same thing causing problems over and over again, and they're trying to stop others from rationalizing that same mistake.
https://www.datastax.com/dev/blog/impact-of-shared-storage-on-apache-cassandra
I need to add images to my mongoDB using Node and Express.
I am able to insert normal data into it by running the mongo shell, but I cannot find any method to add images.
Can anybody help?
Please don't do this. Databases are not particularly well suited to storing large bits of data like images, files, etc.
Instead, you should store your images in a dedicated static file store like Amazon S3, then store a LINK to that image in your MongoDB record.
This is a lot better in terms of general performance and function because:
It will reduce your database hosting costs (it is cheaper to store large files in S3 or other file services than in a database).
It will improve database query performance: DBs are fast at querying small pieces of data, but bad at returning large volumes of data (like files).
It will make your site or application much faster: instead of needing to query the DB for your image when you need it, you can simply output the image link and it will be rendered immediately.
Overall: it is a much better / safer / faster strategy.
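To make the pattern concrete, here is a minimal sketch, assuming the AWS SDK for JavaScript v3 and the official MongoDB Node.js driver; the bucket, database, and collection names are all placeholders:

```js
// Upload the image bytes to S3, then store only the URL in MongoDB.
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");
const { MongoClient } = require("mongodb");
const fs = require("fs/promises");

const s3 = new S3Client({ region: "us-east-1" });
const mongo = new MongoClient("mongodb://localhost:27017");

async function saveImage(path, key) {
  // 1. Put the file itself in the static file store.
  const body = await fs.readFile(path);
  await s3.send(new PutObjectCommand({
    Bucket: "my-images",        // placeholder bucket
    Key: key,
    Body: body,
    ContentType: "image/jpeg",
  }));

  // 2. Store only a LINK to it in the database record.
  const url = `https://my-images.s3.amazonaws.com/${key}`;
  await mongo.connect();
  await mongo.db("app").collection("photos")
    .insertOne({ key, url, createdAt: new Date() });
  return url;
}

saveImage("./avatar.jpg", "avatars/user-123.jpg")
  .then(console.log)
  .catch(console.error);
```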
My website is written in Node.js, has no database or external dependencies, but does have a lot of large media files (images and some video) totalling some 2 GB. The structure of the website is drawn from a couple of simple JSON files.
My problem is drastic and sudden scaling. Traffic to my site is usually easily handled by any small VPS instance, but occasionally it can jump to hundreds of times its normal level for short periods. My problem is how to scale quickly, without downtime, and automatically. I know there are issues with autoscaling, but perhaps lacking a database will negate some of them.
What sort of scaling issues and options should I be looking at?
(For context, I am currently using a Digital Ocean VPS, but I can't find a clean way to scale it with no downtime. I am not wedded to my provider.)
Scalability is important, but scaling only when you need to is also important. Not all of us have the scaling needs of Facebook or Twitter :) This might just be a case of resource management.
Test the problem
Without a database, and using Node.js, you are playing to one of Node's strengths: handling a large number of concurrent connections. For a simple I/O load, it seems you have picked a good framework. And since your problem is a particular resource being bombarded, run some load testing on your server (a sample command follows the list below). Popular free tools include:
Apache Bench
httperf
OpenLoad
And there are paid services like NeoLoad, LoadImpact (which is free at small scale), forecastweb, E-Load, etc.
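For example, a quick first test with Apache Bench might look like this (the URL and numbers are placeholders: 2000 requests at a concurrency of 100):

```
ab -n 2000 -c 100 https://example.com/media/large-image.jpg
```

Watch the requests-per-second and longest-request figures as you raise -c; the point where they degrade shows roughly where your current setup saturates.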
With those results, determine the cause.
Is it the size of the files being served? Is it the number of concurrent requests? What resources are being used, or maxed out, during a slowdown (RAM, ports, file system, some other I/O, CPU, bandwidth, etc.)?
Have a look at this question, which defines a few concepts for server load. To implement a solution, you will need to determine the cause of the slowdown. Is it: 1) some queues filling up? 2) a problem with TCP connections and ports? 3) too-slow allocation of resources? The answer will help shape your solution.
Plan for scaling.
The kind of scaling your project needs may be only a portion of what another project would need. If you know the root cause in this case, it will increase your options.
Is the problem bandwidth? Perhaps using your web server as a router in front of multiple cloud instances serving the files would effectively increase the bandwidth your users see. Or simply store your files on a larger cloud service that can guarantee the bandwidth you need.
Is the problem CPU, RAM, etc.? You may need multiple instances of the same web app (or an increased allotment for your VPS). This is the "Elastic" part of Amazon's Elastic Compute Cloud (EC2) and other models like it: create a "golden image" and duplicate it when you see traffic start to spike, using built-in monitoring tools, then turn the extra instances off when the rush is done. This can be programmatic or simply manual.
Is the problem concurrent requests? The bottleneck should not be Node.js, at least not up to thousands of concurrent requests. Perhaps just check your implementation to ensure there is no slowdown in the single node thread. Maybe node clustering or some worker threads would alleviate the bottleneck enough for your purposes.
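Here is a minimal sketch of that clustering idea, using Node's built-in cluster module (cluster.isPrimary requires Node 16+; older versions call it isMaster):

```js
// Run one worker per CPU core; the primary process only manages workers.
const cluster = require("cluster");
const http = require("http");
const os = require("os");

if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
  cluster.on("exit", (worker) => {
    console.log(`worker ${worker.process.pid} died, restarting`);
    cluster.fork(); // keep the pool at full strength
  });
} else {
  // Workers share the same port; incoming connections are distributed.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(3000);
}
```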
Last note: for serving static files, I've heard nginx or even Apache Tomcat is a little better suited than Node.js. Depending on your web app's complexity, you might be able to switch, or at least benchmark, fairly easily.
In case anyone is reading this rather specific question years later: I have gained some perspective on it. As Clay says, the ultimate answer is to spin up more servers, either manually or programmatically based on load.
However, in my case that would be massive overkill; I'm not running Twitter. The problem was a relatively simple mistake in architecture: my app was reading the JSON data files from disk on every page request, and the disk I/O was getting saturated. I changed to loading the data files into memory on startup, and reloading them when they change using fs.watch().
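For anyone hitting the same wall, a minimal sketch of that fix looks something like this (the file name is a placeholder):

```js
// Read the JSON data once at startup; re-read only when the file changes.
const fs = require("fs");

const DATA_FILE = "./data/site.json"; // placeholder path
let siteData = JSON.parse(fs.readFileSync(DATA_FILE, "utf8"));

fs.watch(DATA_FILE, (eventType) => {
  if (eventType === "change") {
    try {
      siteData = JSON.parse(fs.readFileSync(DATA_FILE, "utf8"));
    } catch (err) {
      console.error("reload failed, keeping old data:", err.message);
    }
  }
});

// Request handlers read from memory instead of hitting the disk.
module.exports = { getSiteData: () => siteData };
```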
My modest VPS can now easily handle the sorts of traffic that would previously crash it. I've never seen traffic that would make me want to up-size it.