Images in a web application

I am working on an application in which users will upload a huge number of images, and I have to show those images on a webpage.
What is the best way to store and retrieve the images?
1) Database
2) FileSystem
3) CDN
4) JCR
or something else?
What I know is:
Database: saving and retrieving images from the database will lead to a lot of database queries, and the blob has to be converted to a file every time. I think it will degrade the website's performance.
FileSystem: if I keep the image information in the database and the image files in the filesystem, there will be sync issues. For example, if I take a backup of the database, I also have to take a backup of the images folder. And if there are millions of files, it will consume a lot of server resources.
I read about this here:
http://akashkava.com/blog/127/huge-file-storage-in-database-instead-of-file-system/
The other options are CDNs and JCR.
Please suggest the best option.
Regards

Using the File System is only really an option if you only plan to deploy to one server (i.e. not several behind a load balancer), OR if all of your servers will have access to a shared File System. It may also be inefficient, unless you cache frequently-accessed files in the application server.
You're right that storing the actual binary data in a Database is perhaps overkill, and not what databases do best.
I'd suggest a combination:
A CDN (such as AWS CloudFront), backed by publicly readable (but crucially not publicly writable) storage such as Amazon S3, means that your images are served efficiently wherever the browsing user is located, and cached appropriately in their browser (thus minimising bandwidth). S3 (or similar) gives you an API to upload and manage the images from your application servers, without worrying about how all servers (and the outside world) will access them.
However, I'd suggest holding metadata about each image in a database. This means that you could assign each image a unique key (generated by your database), add extra info (file format, size, tags, author, etc.), and also store the S3 (or similar) path, served via the CDN, as the publicly accessible path to the image.
This combination of database and shared, publicly accessible storage is probably a good mix, giving you the best of both worlds. The database also means that if you need to move, change or bulk-delete images in future (perhaps deleting all images uploaded by an author who is deleting their account), you can perform an efficient database query to gather the metadata, followed by updating or changing the stored images at the S3 locations where the database says they exist.
You say you want to display the images on a web page. This combination means that the application server can query the database efficiently for the image selection you want to show (including restricting by author, pagination, etc.), then generate a view containing images that refer to the correct CDN paths. Viewing the images is also quite efficient, as you combine dynamic content (the page on which the images are shown) with static content (the images themselves, via the CDN).
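To make that split concrete, here is a minimal Node.js sketch of what such a metadata record and its public CDN path might look like. The field names, bucket layout and CloudFront domain below are illustrative assumptions, not anything prescribed by the question or the answer:

    // Hypothetical sketch: metadata lives in the database, the binary lives in S3,
    // and the page only ever renders the CDN URL. All names are illustrative.
    const crypto = require('crypto');

    const CDN_BASE_URL = 'https://dxxxxxxxx.cloudfront.net'; // CloudFront in front of the S3 bucket

    // Build the metadata document/row you would store per image.
    // The shape of `file` (mimetype, size) is assumed to come from the upload middleware.
    function buildImageRecord(file, authorId) {
      const key = crypto.randomUUID();             // unique key generated by the application
      return {
        id: key,
        author: authorId,
        format: file.mimetype,                     // e.g. image/jpeg
        sizeBytes: file.size,
        s3Key: `images/${authorId}/${key}.jpg`,    // where the binary is stored in S3
        tags: [],
        uploadedAt: new Date(),
      };
    }

    // When rendering a page, the view only needs the public CDN path:
    function publicUrl(record) {
      return `${CDN_BASE_URL}/${record.s3Key}`;
    }

Bulk operations (such as deleting all images for an author) then become a metadata query followed by deletes against the returned s3Key values.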

CDNs may be a good option for you.
You can store the link to the images along with the image information in your database.

Related

Retrieve PDF links from Google Drive and upload them to MongoDB?

I want to create a website to provide PDF notes to students.
I am planning to upload the PDFs to my Google Drive and serve them from my website.
I am using MongoDB as the database, and my plan is to read the link of each PDF from my Google Drive and save it in a MongoDB collection so that it renders on my website dynamically.
Is there any convenient way to do this automatically rather than copy-pasting each link? (That would be very time-consuming, because we have to upload thousands of PDFs.)
If this method is not possible, please suggest a better way to do this using Node.js and MongoDB.
If using Google Drive is not a must, I recommend you use MongoDB's GridFS. GridFS allows you to store documents that are larger than 16 MB. This way, you do not need to manage or update document links, because MongoDB is already in charge of both the metadata and the document itself.
If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
When you want to access information from portions of large files without having to load whole files into memory, you can use GridFS to recall sections of files without reading the entire file into memory.
When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities, you can use GridFS. When using geographically distributed replica sets, MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.
This tutorial might be the starting point.
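If you do go the GridFS route from Node.js, the upload/download flow might look roughly like the sketch below. It is written against the official mongodb driver's GridFSBucket API; the connection string, database, bucket and file names are placeholders:

    // Minimal GridFS sketch (placeholders throughout, not production code).
    const fs = require('fs');
    const { MongoClient, GridFSBucket } = require('mongodb');

    async function main() {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const db = client.db('notes');
      const bucket = new GridFSBucket(db, { bucketName: 'pdfs' });

      // Upload: stream a local PDF into GridFS; custom metadata is stored alongside it.
      await new Promise((resolve, reject) => {
        fs.createReadStream('./lecture1.pdf')
          .pipe(bucket.openUploadStream('lecture1.pdf', { metadata: { course: 'CS101' } }))
          .on('error', reject)
          .on('finish', resolve);
      });

      // Download: stream it back out by filename (in a real app, pipe into the HTTP response).
      bucket.openDownloadStreamByName('lecture1.pdf')
        .pipe(fs.createWriteStream('./copy.pdf'))
        .on('finish', () => client.close());
    }

    main().catch(console.error);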

Best way to save images on Amazon S3 and distribute them using CloudFront

The application I'm working on (nodejs) has user profiles and each profile can have multiple images. I'm using S3 as my main storage and CloudFront to distribute them.
The thing is, sometimes users upload large images, and what I want to do is scale the image down when it is downloaded (viewed in an HTML img tag, or on a mobile phone), mainly for performance.
I don't know if I should scale the image BEFORE uploading it to S3 (maybe using lwip, https://github.com/EyalAr/lwip), or whether there is a way of scaling the image or getting a lower-quality version when downloading it through CloudFront. I've read that CloudFront can compress files using gzip, but also that this is not recommended for images.
I also don't want to upload both a scaled and an original image to S3, because of the storage.
Should this be done on the client, on the server, or in S3? What is the best way of doing it?
is there a way of scaling the image or getting a low quality image when downloading it through CloudFront?
There is no feature like this. If you want the image resized, resampled, scaled, compressed, etc., you need to do it before it is saved to its final location in S3.
Note that I say its final location in S3.
One solution is to upload the image to an intermediate location in S3, perhaps in a different bucket, and then resize it with code that modifies the image and stores it in the final S3 location, whence CloudFront will fetch it on behalf of the downloading user.
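A rough Node.js sketch of that intermediate-bucket flow, using the AWS SDK (v2) and sharp rather than lwip; the bucket names, region and target size are assumptions for illustration only:

    // Sketch: read the original from an "incoming" bucket, resize it, and write the
    // result to the final bucket that CloudFront serves from.
    const AWS = require('aws-sdk');
    const sharp = require('sharp');

    const s3 = new AWS.S3({ region: 'us-east-1' });

    async function processImage(key) {
      const original = await s3.getObject({ Bucket: 'uploads-incoming', Key: key }).promise();

      const resized = await sharp(original.Body)
        .rotate()                                          // honour EXIF orientation before it is stripped
        .resize({ width: 1024, withoutEnlargement: true })
        .jpeg({ quality: 80 })
        .toBuffer();

      await s3.putObject({
        Bucket: 'images-public',                           // the bucket CloudFront uses as its origin
        Key: key,
        Body: resized,
        ContentType: 'image/jpeg',
      }).promise();
    }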
I've read that CloudFront can compress the files using Gzip but also not recommended for images.
Images benefit very little from gzip compression, but the CloudFront documentation also indicates that CloudFront doesn't compress anything that isn't in some way formatted as text, which tends to benefit much more from gzip compression.
I also don't want to upload a scaled + original image to S3 because of the storage.
I believe this is a mistake on your part.
"Compressing" images is not like compressing a zip file. Compressing images is lossy. You cannot reconstruct the original image from the compressed version because image compression as discussed here -- by definition -- is the deliberate discarding information from the image to the point that the size is within the desired range and while the quality is in an acceptable range. Image compression is both a science and an art. If you don't retain the original image, and you later decide that you want to modify your image compression algorithm (either because you later decide the sizes are still too large or because you decide the original algorithm was too aggressive and resulted in unacceptably low quality), you can't run your already-compressed images through the compression algorithm a second time without further loss of quality.
Use S3's STANDARD_IA ("infrequent access") storage class to cut the storage cost of the original images in half, in exchange for more expensive downloads -- because these images will rarely ever be downloaded again, since only you will know their URLs in the bucket where they are stored.
Should be done in client, server or S3?
It can't be done "in" S3 because S3 only stores objects. It doesn't manipulate them.
That leaves two options, but doing it on the server has multiple choices.
When you say "server," you're probably thinking of your web server. That's one option, but this process can be potentially resource-intensive, so you need to account for it in your plans for scalability.
There are projects on GitHub, like this one, designed to do this using AWS Lambda, which provides "serverless" code execution on demand. The code runs on a server, but it's not a server you have to configure or maintain, or pay for when it's not active -- Lambda is billed in 100 millisecond increments. That's the second option.
Doing it on the client is of course an option, but seems potentially more problematic and error-prone, not to mention that some solutions would be platform-specific.
There isn't a "best" way to accomplish this task.
If you aren't familiar with EXIF metadata, you need to familiarize yourself with that as well. In addition to resampling/resizing, you probably also need to strip some of the metadata from user-contributed images, to avoid revealing sensitive data that your users may not realize is attached to their images -- such as the GPS coordinates where the photo was taken. Some sites also watermark their user-submitted images; that would also be something you'd probably do at the same time.
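For what it's worth, a library such as sharp (mentioned in the next answer) drops EXIF data, including GPS coordinates, by default when it writes output, so the stripping can simply fall out of the resize step. A small sketch, with placeholder names:

    // Re-encoding with sharp keeps the pixels but not the EXIF/GPS block,
    // unless you explicitly call .withMetadata().
    const sharp = require('sharp');

    async function stripSensitiveMetadata(inputBuffer) {
      const meta = await sharp(inputBuffer).metadata();          // inspect the upload (diagnostics only)
      console.log('format:', meta.format, 'has EXIF:', Boolean(meta.exif));

      return sharp(inputBuffer)
        .rotate()                                                // apply EXIF orientation before it is dropped
        .jpeg({ quality: 85 })
        .toBuffer();
    }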
I would store the images in S3 in STANDARD_IA and then resize them on the fly with Lambda running Node.js and sharp to do the heavy lifting. I believe Google does something similar, since you can request your profile image in any dimensions.
The AWS Networking & Content Delivery blog has a post that may give you a lot of what you need. Check it out here.
The basic idea is this:
Upload the image to S3 like normal (you can do STANDARD_IA to save on costs if you want)
Send requests to CloudFront with query parameters that include the size of image you want (e.g. https://static.mydomain.com/images/image.jpg?d=100x100)
Using Lambda@Edge functions, you can build the resized images and store them in S3 as needed before they're served up via the CDN. Once a resized version is created, it's always available in S3
Cloudfront returns the newly resized image, which was just created.
It's a bit more work, but it gets you great resizing to whatever size you want or need, as you need it. It also gives you the flexibility to change the image size you serve to the client from the UI at any time. Here are a couple of similar posts; some don't even use CloudFront, but just serve through API Gateway as the intermediary. A rough sketch of the pattern follows the links.
https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/
https://github.com/awslabs/serverless-image-resizing
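As promised above, here is a rough sketch of that resize-on-demand pattern in Node.js. It is not the actual code from the blog post or the repository; the bucket name, the key format ("images/100x100/photo.jpg") and the redirect behaviour are assumptions made for illustration:

    // Lambda (behind API Gateway or similar) that creates a missing size on demand,
    // stores it back in S3, and redirects so future requests hit S3/CloudFront directly.
    const AWS = require('aws-sdk');
    const sharp = require('sharp');

    const s3 = new AWS.S3();
    const BUCKET = 'my-image-bucket';
    const CDN = 'https://static.mydomain.com';

    exports.handler = async (event) => {
      const key = event.queryStringParameters.key;               // e.g. "images/100x100/photo.jpg"
      const match = key.match(/^(.*)\/(\d+)x(\d+)\/(.*)$/);
      if (!match) return { statusCode: 400, body: 'unexpected key format' };
      const [, prefix, width, height, file] = match;

      const original = await s3.getObject({ Bucket: BUCKET, Key: `${prefix}/${file}` }).promise();

      const resized = await sharp(original.Body)
        .resize(parseInt(width, 10), parseInt(height, 10), { fit: 'inside' })
        .toBuffer();

      await s3.putObject({ Bucket: BUCKET, Key: key, Body: resized, ContentType: 'image/jpeg' }).promise();

      // Redirect: from now on this size exists in the bucket and is cacheable by the CDN.
      return { statusCode: 301, headers: { Location: `${CDN}/${key}` }, body: '' };
    };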

How can I add images to MongoDB?

I need to add images to my MongoDB using Node and Express.
I am able to add normal data to it by running the mongo shell, but I cannot find any method for adding images.
Can anybody help?
Please don't do this. Databases are not particularly well suited to storing large bits of data like images, files, etc.
Instead: you should store your images in a dedicated static file store like Amazon S3, then store a LINK to that image in your MongoDB record.
This is a lot better in terms of general performance and function because:
It will reduce your database hosting costs (it is cheaper to store large files in S3 or other file services than in a database).
It will improve database query performance: DBs are fast at querying small pieces of data, but bad at returning large volumes of data (like files).
It will make your site or application much faster: instead of needing to query the DB for your image when you need it, you can simply output the image link and it will be rendered immediately.
Overall: it is a much better / safer / faster strategy.
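A minimal sketch of what "store a LINK, not the image" can look like with Mongoose; the schema fields and example URL are illustrative assumptions:

    // Only the URL (e.g. pointing at S3 or a CDN) is stored in MongoDB.
    const mongoose = require('mongoose');

    const imageSchema = new mongoose.Schema({
      title: String,
      url: { type: String, required: true },   // e.g. https://my-bucket.s3.amazonaws.com/photos/abc.jpg
      contentType: String,
      uploadedBy: { type: mongoose.Schema.Types.ObjectId, ref: 'User' },
      createdAt: { type: Date, default: Date.now },
    });

    module.exports = mongoose.model('Image', imageSchema);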

What is a good architecture for storing user files when using a Mongo schema?

Simply, I need to build an app to store images for users. So each user can upload images and view them on the app.
I am using NodeJS and Mongo/Mongoose.
Is this a good approach to handle this case:
When the user uploads the image file, I will store it locally.
I will use Multer to store the file.
Each user will have a separate folder created by his username.
In the user schema, I will define a string array that records the file path.
When the user needs to retrieve a file, I will look up the file path and retrieve the file from the local disk.
Now my questions are:
Is this a good approach (storing files in the local file system and storing the path in the schema)?
Is there any reason to use GridFS, if the file sizes are small (<1MB)?
If I am planning to use S3 to store files later, is this a good strategy?
This is my first time building a DB application like this, so I would very much appreciate some guidance.
Thank you.
1) Yes, storing the location within your database for use by your application, and the physical file elsewhere, is an appropriate solution (a sketch follows at the end of this answer). Depending on the data store and the number of files, it can be detrimental to store files within a database, as it can impede processes like backup and replication when there are many large files.
2) I admit that I don't know GridFS, but the documentation says it is for files larger than 16 MB, so it sounds like you don't need it yet.
3) S3 is a fantastic product and enables edge caching, backup through other services, and much more. I think your choice needs to look at what AWS provides and whether you need it, e.g. global caching or replication to different countries and data centres. Different features come at different price points, but personally I find the S3 platform excellent and have around 500 GB stored there for different purposes.
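Here is a minimal sketch of the approach described in the question (Multer writing to a per-user folder, with only the relative path pushed into the user's document). The folder layout, field names, and the assumption that auth middleware has already set req.user to a Mongoose user document are all illustrative:

    const path = require('path');
    const fs = require('fs');
    const express = require('express');
    const multer = require('multer');

    const storage = multer.diskStorage({
      destination: (req, file, cb) => {
        const dir = path.join('uploads', req.user.username);   // one folder per user (req.user assumed set by auth)
        fs.mkdirSync(dir, { recursive: true });
        cb(null, dir);
      },
      filename: (req, file, cb) => cb(null, `${Date.now()}-${file.originalname}`),
    });

    const upload = multer({ storage });
    const app = express();

    app.post('/images', upload.single('image'), async (req, res) => {
      // req.file.path is something like "uploads/alice/1700000000-cat.jpg";
      // push that string into the images array defined in the user schema.
      await req.user.updateOne({ $push: { images: req.file.path } });
      res.json({ path: req.file.path });
    });

Moving to S3 later would mean swapping the diskStorage engine (e.g. for multer-s3) and storing the object URL instead of a local path, while the schema itself stays the same.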

Use SQL Server FILESTREAM or a traditional file server?

I am designing a system that's going to have 10 million+ users, each with a photo of about 1-2 MB.
We are going to deploy both the database and the web app on Microsoft Azure.
I am wondering how I should store the photos; there are currently two options:
1) Store all photos using SQL Server FILESTREAM
2) Use a file server
I haven't worked with such large-scale BLOB data using FILESTREAM before.
Can anybody give me any suggestions? The cons and pros?
And anyone with Microsoft Azure experience concerning large photo storage is really appreciated!
Thx
Ryan.
I vote for neither. Use Windows Azure Blob storage. Simple REST API, $0.15/GB/month. You can even serve the images directly from there, if you make them public (like <img src="http://myaccount.blob.core.windows.net/container/image.jpg" />), meaning you don't have to funnel them through your web app.
A database is almost always a horrible choice for any large-scale binary storage need. Databases are best for purely relational data; instead, provide references in your database to the actual storage location. There are a few factors you should consider:
Cost - SQL Azure costs quite a lot per GB of storage, and has small storage limitations (50GB per database), both of which make it a poor choice for binary data. Windows Azure Blob storage is vastly cheaper for serving up binary objects (though has a bit more complicated pricing system, still vastly cheaper per GB).
Throughput - SQL Azure has pretty good throughput, as it can scale well; however, Windows Azure Blob storage has even greater throughput, as it can scale to any number of nodes.
Content Delivery Network - A feature not available for SQL Azure (though a complex, custom wrapper could be created), but one can easily be set up within minutes to piggy-back off your Windows Azure Blob storage and provide limitless bandwidth to your end users, so you never have to worry about your binary objects being a bottleneck in your system. CDN costs are similar to those of Blob storage, but you can find all that here: http://www.microsoft.com/windowsazure/pricing/#windows
In other words, no reason not to go with Blob storage. It is simple to use, cost effective, and will scale to any needs.
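To make the Blob storage suggestion concrete, here is a small sketch using the current @azure/storage-blob SDK (which post-dates this answer; the original Windows Azure SDK looked different). The connection string, container and blob names are placeholders, and it is written in Node.js for consistency with the rest of this page:

    const { BlobServiceClient } = require('@azure/storage-blob');

    async function uploadPhoto(buffer, userId) {
      const service = BlobServiceClient.fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING);
      const container = service.getContainerClient('photos');     // container configured for public blob read access
      const blob = container.getBlockBlobClient(`${userId}.jpg`);

      await blob.uploadData(buffer, { blobHTTPHeaders: { blobContentType: 'image/jpeg' } });

      // This is the URL you would put straight into <img src="..."> or hand to a CDN.
      return blob.url;   // e.g. https://myaccount.blob.core.windows.net/photos/123.jpg
    }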
I can't speak to anything Azure-related, but for my money the biggest advantage of using FILESTREAM is that the data gets backed up as part of the normal SQL Server backup process. The size of the data you are talking about also suggests that FILESTREAM may be a good choice.
I've worked on a SCM system with a RDBMS back end and one of our big decisions was whether to store the file deltas on the file system or inside the DB itself. Because it was cross-RDBMS we had to cook up a generic non-FILESTREAM way of doing it but the ability to do a single shot backup sold us.
FILESTREAM is a horrible option for storing images. I'm surprised MS ever promoted it.
We're currently using it for the images on our website, mainly user-generated images and any CMS-related stuff that admins create. The decision to use FILESTREAM was made before I started. The biggest issue is related to serving the images up. You'd better have a CDN sitting in front; if not, plan on your system coming to a screeching halt. Of course, most sites have a CDN, but you don't want to be at the mercy of that service going down, meaning your system will get overloaded. The amount of stress put on your SQL Server is the main problem here.
In terms of ease of backup, your tradeoff is that your DB is MUCH, MUCH larger and, therefore, the backup takes longer. Potentially much longer, and the system runs slower during the backup. Not to mention, moving backups around takes longer (e.g., restoring prod data in a dev environment or on local machines for dev purposes). Don't use this as a deciding factor.
Most cloud services have automatic redundancy for any files you store on their system (e.g., AWS's S3 and Azure's Blob storage). If you're on premise, just make sure you use a shared location for the images and that the location is backed up.
I think the best option is to set it up so each image (and other UGC file types too) has an entry in your DB with a path to that file. Going one step further, separate the root path into a config setting and only store the remaining path with the entry. For example, the root path in config might be a base URL, a shared drive or virtual dir, or a blank entry. Then your entry might have "/files/images/image.jpg". This way, if you move your file store, you can just update the root config.
I would also suggest creating a FileStoreProvider interface (singleton) that can be used for managing (saving, deleting, updating) these files. This way, if you switch between AWS, Azure, or on premise, you can just create a new provider.
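A hedged sketch of that FileStoreProvider idea (written in Node.js for consistency with the rest of this page; all names are illustrative):

    // The database stores only a relative path like "/files/images/image.jpg";
    // the root comes from config, and the provider hides where the bytes live.
    class FileStoreProvider {
      constructor(rootPath) {
        this.rootPath = rootPath;                 // base URL, share path, virtual dir, or ''
      }
      publicUrl(relativePath) {
        return `${this.rootPath}${relativePath}`;
      }
      async save(relativePath, buffer) { throw new Error('not implemented'); }
      async delete(relativePath) { throw new Error('not implemented'); }
    }

    // Example concrete provider; `s3` is an injected aws-sdk v2 S3 client.
    class S3FileStoreProvider extends FileStoreProvider {
      constructor(rootPath, s3, bucket) { super(rootPath); this.s3 = s3; this.bucket = bucket; }
      async save(relativePath, buffer) {
        await this.s3.putObject({ Bucket: this.bucket, Key: relativePath.replace(/^\//, ''), Body: buffer }).promise();
      }
      async delete(relativePath) {
        await this.s3.deleteObject({ Bucket: this.bucket, Key: relativePath.replace(/^\//, '') }).promise();
      }
    }

    // Switching between AWS, Azure and on-premise storage is then just constructing
    // a different provider, e.g.:
    // const store = new S3FileStoreProvider('https://cdn.example.com', new AWS.S3(), 'my-bucket');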
I have a client-server DB; I manage many files (doc, txt, pdf, ...) and all of them go in a FILESTREAM BLOB. Customers have 50+ MB DBs. If you can do the same in Azure, go for it. Having everything in the DB is a wonderful thing. It is considered good policy for Postgres and MySQL as well.
