I am designing a system that will have 10 million+ users, each with a photo of about 1~2 MB.
We are going to deploy both the database and the web app on Microsoft Azure.
I am wondering how I should store the photos. There are currently two options:
1, Store all photos using SQL Server FILESTREAM
2, Use a file server
I have no experience with BLOB data at this scale using FILESTREAM.
Can anybody give me any suggestions? What are the pros and cons?
Advice from anyone with Microsoft Azure experience storing large numbers of photos would be really appreciated!
Thx
Ryan.
I vote for neither. Use Windows Azure Blob storage. Simple REST API, $0.15/GB/month. You can even serve the images directly from there, if you make them public (like <img src="http://myaccount.blob.core.windows.net/container/image.jpg" />), meaning you don't have to funnel them through your web app.
A database is almost always a horrible choice for large-scale binary storage. Databases are best kept for relational data; store the binary objects elsewhere and keep references in your database to the actual storage location. There are a few factors you should consider:
Cost - SQL Azure costs quite a lot per GB of storage, and has small storage limitations (50GB per database), both of which make it a poor choice for binary data. Windows Azure Blob storage is vastly cheaper for serving up binary objects (though has a bit more complicated pricing system, still vastly cheaper per GB).
Throughput - SQL Azure has pretty good throughput, as it can scale well; however, Windows Azure Blob storage has even greater throughput, as it can scale to any number of nodes.
Content Delivery Network - A feature not available to SQL Azure (though a complex, custom wrapper could be created), but one that can easily be set up within minutes to piggy-back off your Windows Azure Blob storage and provide limitless bandwidth to your end users, so you never have to worry about your binary objects being a bottleneck in your system. CDN costs are similar to those of Blob storage, and you can find all the pricing details here: http://www.microsoft.com/windowsazure/pricing/#windows
In other words, no reason not to go with Blob storage. It is simple to use, cost effective, and will scale to any needs.
I can't speak to anything Azure-related, but for my money the biggest advantage of using FILESTREAM is that the data gets backed up as part of the normal SQL Server backup process. The size of the data that you are talking about also suggests that FILESTREAM may be a good choice.
I've worked on a SCM system with a RDBMS back end and one of our big decisions was whether to store the file deltas on the file system or inside the DB itself. Because it was cross-RDBMS we had to cook up a generic non-FILESTREAM way of doing it but the ability to do a single shot backup sold us.
FILESTREAM is a horrible option for storing images. I'm surprised MS ever promoted it.
We're currently using it for the images on our website, mainly the user-generated images and any CMS-related content that admins create. The decision to use FILESTREAM was made before I started. The biggest issue is serving the images. You had better have a CDN sitting in front; if not, plan on your system coming to a screeching halt. Of course, most sites have a CDN, but you don't want to be at the mercy of that service: if it goes down, your system will get overloaded. The amount of stress put on your SQL Server is the main problem here.
In terms of ease of backup, the tradeoff is that your db is MUCH MUCH LARGER and, therefore, the backup takes longer, potentially much longer, and the system runs slower during the backup. Not to mention that moving backups around takes longer (e.g., restoring prod data in a dev environment or on local machines for dev purposes). Don't use this as a deciding factor.
Most cloud services have automatic redundancy for any files that you store on their system (e.g., AWS's S3 and Azure's Blob storage). If you're on premises, just make sure you use a shared location for the images and make sure that location is backed up. I think the best option is to set it up so each image (and other UGC file types too) has an entry in your db with a path to that file. Going one step further, separate the root path into a config setting and store only the remaining path with the entry. For example, the root path in config might be a base URL, a shared drive or virtual dir, or a blank entry. Then your entry might have "/files/images/image.jpg". This way, if you move your filestore, you can just update the root config. I would also suggest creating a FileStoreProvider interface (Singleton) that can be used for managing (saving, deleting, updating) these files. This way, if you switch between AWS, Azure, or on premises, you can just create a new Provider.
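To make the root-path-plus-provider idea concrete, here is a minimal Python sketch. Only the name FileStoreProvider and the example path come from the answer; the method names, the LocalFileStoreProvider class and the join_root helper are made up for illustration, and a real AWS or Azure provider would implement the same interface with that service's SDK.

```python
from abc import ABC, abstractmethod
from pathlib import Path


def join_root(root: str, relative_path: str) -> str:
    """Combine the configured root with the path stored in the db entry."""
    return root.rstrip("/") + "/" + relative_path.lstrip("/")


class FileStoreProvider(ABC):
    """Hypothetical interface; one implementation per storage backend."""

    @abstractmethod
    def save(self, relative_path: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, relative_path: str) -> None: ...

    @abstractmethod
    def location_of(self, relative_path: str) -> str: ...


class LocalFileStoreProvider(FileStoreProvider):
    """Assumed on-premises backend: files live under a shared root folder."""

    def __init__(self, root: str):
        self.root = Path(root)  # the root comes from config, not from the db

    def _full_path(self, relative_path: str) -> Path:
        return self.root / relative_path.lstrip("/")

    def save(self, relative_path: str, data: bytes) -> None:
        target = self._full_path(relative_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def delete(self, relative_path: str) -> None:
        self._full_path(relative_path).unlink()

    def location_of(self, relative_path: str) -> str:
        return str(self._full_path(relative_path))
```

With this shape, moving the filestore only means changing the root passed to the provider (or swapping the provider); the "/files/images/image.jpg" values stored in the db stay untouched.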
I have a client-server DB in which I manage many files (doc, txt, pdf, ...), and all of them go into a FILESTREAM BLOB. Customers have 50+ MB dbs. If you can do the same in Azure, go for it. Having everything in the db is a wonderful thing. It is also considered good policy for Postgres and MySQL.
Related
I'm trying to create a distributed system that contains a mobile app, a web user panel and an API that communicates with the DB. I want the user to be able to upload a profile image from both the mobile app and the web user panel, but what is the best and "right" way to store images across a distributed system? I can't really find anything describing best practices on this topic.
I know that the filepath should be in database, and the image in a file system. But should that file system be on the API server or where?
Here is a diagram of what I think the distributed system should look like.
The "right" way to do something complex like image hosting depends on factors like expected traffic and performance expectations. Designing large systems involves a lot of tradeoffs, so it's best to nail down what the requirements for your system are in order to make decisions that serve those requirements.
As for your question, this diagram is roughly correct - you want to store the location of the uploaded image separate from the image itself. If you wanted your solution to be more scalable, an approach would be turning your file system into its own service with its own API. You would store a hash of the file in your database to reference it rather than its path, then request that image (or a URL to that image) from the new storage service by asking the storage service's API for the file that has the stored hash.
The reason this is more scalable is that the storage service is free to become its own distributed system when we don't require that every file has an associated file system path within a single namespace. A hash is a good candidate for a replacement of the filesystem path, but you could come up with your own storage ID scheme depending on your needs.
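As a rough illustration of the hash-as-ID idea, here is a small Python sketch. It is an in-memory stand-in, not a real storage service: the class name, the choice of SHA-256 and the user_row layout are all assumptions made for the example.

```python
import hashlib


class HashAddressedStore:
    """Toy storage service: files are looked up by content hash, not by path."""

    def __init__(self):
        self._blobs = {}  # a real service could spread this over many nodes

    def put(self, data: bytes) -> str:
        # The SHA-256 hex digest of the content becomes the storage ID.
        file_id = hashlib.sha256(data).hexdigest()
        self._blobs[file_id] = data
        return file_id

    def get(self, file_id: str) -> bytes:
        return self._blobs[file_id]


store = HashAddressedStore()

# The database row keeps only the ID returned by the storage service.
user_row = {"user_id": 42, "profile_image_id": store.put(b"fake image bytes")}

# Later, the API asks the storage service for the file behind that ID.
image = store.get(user_row["profile_image_id"])
```

One side effect of hashing the content is that identical uploads map to the same ID; if that is not wanted, a random ID scheme fits the same interface.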
However, this may be wildly out of scope for what you are trying to design. If you only expect to have a few thousand users, storing your images and database on your API server in the file system isn't necessarily wrong, but you might experience growing pains if the requirements of your system grow.
Google's site reliability engineer classroom has a lesson on building a distributed image server, which is an adjacent problem to what you're looking to do: https://sre.google/classroom/imageserver/
I've got a website that will need to access a file on the file system (or somewhere) containing some template text used to send an email. I'd like a suggestion for where to store the file and how to access / find the file at runtime for both azure web roles and azure web sites.
So far, I've read about Azure Local Storage, but that seems to only be an option for web roles, and not available for azure websites (I think?). Plus, I'm not sure how the file would make its way into the storage.
The other option I was thinking about was adding the file to the VS solution and marking it as content, in which case I believe it would be deployed with the other files. But in this case, I don't know how to get the path to access the file from the .NET code. Also, with this approach, I believe that I would need to redeploy the entire solution in order to update the file.
I would appreciate any thoughts on this. Thanks...
Using a non-local storage system is your best approach; it is highly unlikely that your speed requirements are so demanding that further performance tuning will be needed.
I would recommend blob storage in the same region as your website/cloud service.
If you have extreme loads and need that file loaded rapidly, then use an in-memory cache with an expiry of 5 minutes, or something similarly low, to hold the template. Each time the code checks the cache, if the template is not there, it loads it into the cache from storage and then provides the resource.
You may look at using cache if you are getting a constant 1 request per second or higher. Anything lower than that then just stick to reading on demand directly from the blob storage.
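The caching approach described above might look roughly like this in Python. The TemplateCache class and its parameters are invented for the sketch, and the loader argument stands in for whatever code actually reads the template from blob storage.

```python
import time


class TemplateCache:
    """Keeps one loaded value in memory and reloads it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=300, clock=time.monotonic):
        self._loader = loader      # callable that fetches the template from storage
        self._ttl = ttl_seconds    # 300 s matches the 5 minutes suggested above
        self._clock = clock        # injectable so expiry can be tested without waiting
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache miss or expired entry: load from storage and restart the timer.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

An updated template is picked up at most one TTL after it changes, without redeploying anything.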
If you really want to get something locally off the disk then do
Server.MapPath("~/YourFolder/YourFile.ext")
I have been developing Windows Phone Apps for a while now, since WP7 first came out. I have written countless apps, but never actually released any that use an external service.
I am finally getting ready to release one of my first apps that requires a service, and have decided to go with Azure as my host.
Now for the question:
For this specific App, I need an offering that will allow me to host a very small amount of images and text, not even in the hundreds at this time. From what I have looked up, it seems like a database would be the preferred method of storing such a small amount of data, however, thinking into the future, would it be better for me to get the smallest table or blob storage (200gb) and use that? I will most likely be writing other apps that will most likely also require services, however, it is hard to tell what kind of services I would need. I could require a database rather than a blob if I am not storing images... or I may require a blob if I am, again, storing images...
If anyone has been in this situation before, which would you recommend, and why?
I would store images in blobs and other information in Table Services or Sql Database.
Which one to choose? It will vary according to your requirement.
See
http://blogs.msdn.com/b/writingdata_services/archive/2012/07/26/windows-azure-storage-sql-database-versus-table-storage.aspx
http://msdn.microsoft.com/library/azure/jj553018.aspx
SQL Azure storage is a lot more expensive than Windows Azure Storage. Would implementing a no-sql solution like RavenDB allow me to store data on the cheaper Azure Storage?
Are there other things to consider, like backup, speed or security?
Thank you.
You have to consider that with SQL Azure you get not only the storage, but the database server too. If you implement RavenDB, you will need a worker role to host it in and, in order to allow for failure of that worker role, another worker role (replica), which also doubles the storage.
Bear in mind that with SQL Azure you get a highly available (3x replicated with failover) SQL solution that surfaces a familiar (ADO.NET) API. Make your choices based on aspects other than storage cost, such as operational effort and development effort. If you choose RavenDB, it should be because of the potential savings in development effort (because of the closeness of the document API to the object graph) and in operational cost, because RavenDB is 'administered' as part of the application. The cost of storing the actual bytes, particularly at scale, is a marginal consideration.
Adding a bit to @Simon's answer: When considering Table Storage and its low cost, also consider whether you can use it directly, instead of going with an installed-and-managed-by-you NoSQL database engine. As it stands, Table Storage offers a schemaless solution that lets you store essentially a property bag within a row, indexed by partitionkey+rowkey. Does that work for you? Could you work with a few extra tables to give you additional indexing? If so, your storage cost is going to be really low (and still durable, triple-replicated).
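To show what "a property bag indexed by partitionkey+rowkey, plus a few extra tables for indexing" can mean in practice, here is an in-memory Python model. It does not use the Azure SDK; the PropertyBagTable class, the users/users_by_email tables and the choice of country as partition key are all assumptions made for the example.

```python
class PropertyBagTable:
    """In-memory stand-in for a schemaless table keyed by (partition key, row key)."""

    def __init__(self):
        self._rows = {}

    def upsert(self, partition_key, row_key, **properties):
        # Each row is just a bag of properties; rows need not share a schema.
        self._rows[(partition_key, row_key)] = dict(properties)

    def get(self, partition_key, row_key):
        return self._rows.get((partition_key, row_key))


users = PropertyBagTable()           # main table: partition = country, row = user id
users_by_email = PropertyBagTable()  # extra table acting as a secondary index


def add_user(user_id, country, email, **extra):
    users.upsert(country, user_id, email=email, **extra)
    # The index row stores only the keys needed to find the main row.
    users_by_email.upsert("email", email, country=country, user_id=user_id)


def find_by_email(email):
    pointer = users_by_email.get("email", email)
    if pointer is None:
        return None
    return users.get(pointer["country"], pointer["user_id"])
```

The application has to keep the index table in step with the main table itself, which is the kind of "significant code" to watch for when weighing this against a full database engine.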
If you find yourself writing significant code to manage Table Storage, then it may be more efficient to invest in the Compute instances needed to run RavenDB. When considering this, also consider that you'll likely want larger VM sizes if you're moving significant data (as you get approx. 100Mbps per core). A database like MongoDB, working with memory-mapped files, really ramps up speed-wise with more RAM. Not sure if this is the same with RavenDB.
I have a RESTful service running on azure. Currently, it has zero persistence. (It is just a REST gateway to another api.) I run it in a single, minimal Azure instance, and expect this will handle all the load this will ever get.
I now need to add some very lightweight persistence to it. A simple table, of 40-200 rows, eight data columns. The data is very static.
Doing the whole SQL Azure thing seems big overkill for my needs.
My thoughts have been to use:
An XML file, loaded into memory as the db. The XML file is deployed with the code.
Some better way to deploy the XML, so it can be rolled out/updated more easily
SQL Compact (can I do this on Azure?)
___ ?
What is the right path here?
Thank you!
SQL Server Compact would need to store its data somewhere in a persistent manner, so you would need to sync it regularly to persistent storage. That's a lot of extra work, and I have no idea how to do it reliably, so it's likely not a very good idea.
For your simple table the Azure Table Storage might be just enough. If that's not enough then SQL Azure is the next choice.
You can use the XML file as your store; there is no harm in it, and it is a very easy and cost-efficient solution, but there is a catch. As you mentioned, you are currently using only one Azure instance, in which case you can store the XML file in your App_Data folder. But if in future you want to move to 2 Azure instances, you will have to replicate the App_Data folder. In other words, you will need to keep the App_Data folders in sync.
Suggestion
Instead of storing the file in App_Data, store it in BLOB storage; you can retrieve it using WebClient and then store it in memory.
Pros: The advantage of BLOB is, you don't have to sync it.
Cons: There is a cost associated on the number of transactions you can make. This will depend upon how many times you update the file.
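Whichever place the file lives in, the "load it into memory" step is small. A Python sketch, assuming a made-up XML layout with one row element per record; fetching the text (from App_Data or from a blob URL) is left out, and SAMPLE_XML stands in for the downloaded content.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the real file's element and attribute names will differ.
SAMPLE_XML = """<rows>
  <row id="1" name="alpha" limit="10"/>
  <row id="2" name="beta" limit="25"/>
</rows>"""


def load_table(xml_text):
    """Parse the XML once and return a dict of rows keyed by their id attribute."""
    root = ET.fromstring(xml_text)
    return {row.get("id"): dict(row.attrib) for row in root.findall("row")}


# Loaded once at startup; with 40-200 static rows this easily fits in memory.
TABLE = load_table(SAMPLE_XML)
```

Note that all attribute values come back as strings, so numeric columns need converting when they are read.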
Summary
If you are going to work with only one Azure Instance, use App_Data
More than one Azure Instance, use BLOB with no syncing or use App_Data with sync.
Do not use Azure Table for this; BLOB storage is the store designed for this kind of file.
EDIT
From MSDN post
As far as I know, Windows Azure does not support SQL Compact Edition. SQL Compact Edition stores data in file system which will not be synchronized in multiple instances (a web role may be deployed to more than one instance. An instance is similar to a virtual machine). And files stored in file system will lost when the instance is restarted or reimaged.
Hope this helps you.