Getting started with Azure storage: Blobs vs Tables vs SQL Azure

It's quite a topic, blobs vs tables vs SQL, and despite all I read so far I still can't find some proper reasoning on what to use when.
We have a multi-tenant SaaS web-application which we are about to move to Azure. We use an SQL Server 2008 database. We store documents and log information that belongs to the documents. Kinda like dropbox does.
The forums state that you had better use Azure Tables when you are dealing with "large" objects. We typically store hundreds of documents per user; document sizes vary from 5 KB to 30 MB, with the vast majority around 1 MB.
Are there some ground rules for when to go for Blobs, Tables, or SQL? I already learned that I shouldn't store my documents in SQL since it is too expensive. But when does it get "beneficial" to store the documents in Blobs, and when would I be better off with Tables? Is there some kind of formula like:
if (objects * MB/object * objectrequested > y) then blobs, else tables

I think Igorek has addressed your SQL Azure concerns. You seem to still have questions about Tables vs Blobs, though.
In your case using Table storage would be annoying. Each property/column in ATS can be at most 64KB, so you would have to split the documents across multiple properties and then reassemble them. There is also a limit of 1MB per entity (the 4MB figure that is sometimes quoted is the payload cap for a batch transaction), which would be a problem. Blob storage has neither of these limitations.
I would tend to use Azure Table Storage when you have smallish entities with many properties that need to be stored and queried separately. So it works well for stored objects, or small documents with lots of metadata.
Blob storage works better for things without a ton of metadata. It's good for things that might work well as files on a filesystem.
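To make the split-and-reassemble burden concrete, here is a minimal sketch in plain Python (no Azure SDK involved). The helper names `split_into_properties` and `reassemble` and the `ChunkNNNN` property naming are hypothetical; the 64 KB property cap is the one quoted above, and the per-entity cap is left as a caller-supplied argument. It ignores the overhead of keys and property names.

```python
PROPERTY_LIMIT = 64 * 1024  # per-property cap quoted in the answer above


def split_into_properties(data: bytes, entity_limit: int) -> dict[str, bytes]:
    """Split a document into chunk properties, or raise if it cannot fit in one entity."""
    if len(data) > entity_limit:
        raise ValueError(
            f"{len(data)} bytes exceeds the per-entity limit of {entity_limit} bytes"
        )
    # Hypothetical naming scheme: zero-padded so that sorting restores the order.
    return {
        f"Chunk{i:04d}": data[offset:offset + PROPERTY_LIMIT]
        for i, offset in enumerate(range(0, len(data), PROPERTY_LIMIT))
    }


def reassemble(properties: dict[str, bytes]) -> bytes:
    """Concatenate chunk properties back into the original document."""
    return b"".join(properties[name] for name in sorted(properties))
```

Even a 512 KB document needs 8 such properties, and anything above the entity cap cannot be stored at all; a blob holds the same bytes as a single object.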

I would store documents themselves in the Azure Blob storage (not table storage). Outside of the fact that it is pretty expensive to store documents in a SQL Azure database that charges a penny per meg (or less depending on volume), SQL database is generally not a good place for documents. SQL is a relational database that provides benefits of ability to do queries, joins, etc. There is usually no benefit to storing large documents or images in a SQL database, especially when there is a highly scalable central storage system that is pretty cheap to store/access.
Now, if you need to search through the documents themselves, I'd use something like Lucene.NET to provide a search capability for a document-based repository.
HTH

Related

Storing IOT Data in Azure: SQL vs Cosmos vs Other Methods

The project I am working on as an architect has an IOT setup where lots of sensors send data such as water pressure, temperature, etc. to an FTP server (I can't change this, as I have no control over it for security reasons). From there, a few Windows services on Azure pull the data and store it in an Azure SQL Database.
Here is my observation with respect to this architecture:
Problem 1: Azure SQL has a 1 TB limit. A higher tier goes up to 4 TB, but that's the max. So it does not appear to be infinitely scalable, and query performance could become a problem as the size grows. Columnstore indexes and partitioning seem to be options, but the size limitation and DTUs are a deal breaker.
Problem 2: The IOT data and the SQL Database (downstream storage) seem to be tightly coupled. If a customer wants to extract a few months of data or more, with millions of rows, the DB will get busy and possibly throttle other customers due to DTU exhaustion.
I would like some ideas on scaling this further. SQL DB is great, and with JSON support it is awesome, but it is not a horizontally scalable solution.
Here is what I am thinking:
All the messages should be consumed from FTP by Azure IOT hub by some means.
From the central hub, I want to push all messages to Azure Blob Storage in 128 MB files for later analysis at cheap cost.
At the same time, I would like all messages to go to IOT hub and from there to Azure CosmosDB (for long-term storage) or Azure SQL DB (long-term, but I'm not sure due to the size restriction).
I am keeping data in blob storage because if client wants or hires a Machine learning team to create some models, I would prefer them to pull data from Blob storage rather than hitting my DB.
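The "128 MB files" idea above amounts to a rolling batcher. Here is a minimal sketch in plain Python, assuming messages are JSON-serializable dicts written as JSON lines; the class name `RollingBatcher` and the `sink` callable (which would upload one finished batch to blob storage) are hypothetical, and no Azure SDK is used.

```python
import json


class RollingBatcher:
    """Accumulate JSON-lines records and emit a batch before it exceeds max_bytes."""

    def __init__(self, max_bytes: int, sink):
        self.max_bytes = max_bytes
        self.sink = sink  # hypothetical callable that uploads one finished batch
        self._lines: list[bytes] = []
        self._size = 0

    def add(self, message: dict) -> None:
        line = json.dumps(message, separators=(",", ":")).encode("utf-8") + b"\n"
        # Flush first if this line would push the current batch over the limit.
        if self._lines and self._size + len(line) > self.max_bytes:
            self.flush()
        self._lines.append(line)
        self._size += len(line)

    def flush(self) -> None:
        """Emit whatever is buffered; call this on shutdown or on a timer as well."""
        if self._lines:
            self.sink(b"".join(self._lines))
            self._lines, self._size = [], 0
```

For the target in the question, the batcher would be created with `max_bytes=128 * 1024 * 1024`. A time-based flush is worth adding so that slow sensors do not leave data sitting in memory for hours.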
Kindly suggest few ideas on this. Thanks in advance!!
Chandan Jha
First, Azure SQL DB does have Hyperscale which is much larger than 4TB. That said, there is a tipping point where it makes sense to consider alternative architectures when you get to be bigger than what one machine can handle for your solution. While CosmosDB does give you a horizontal sharding solution, you can do the same with N SQL Databases (there are libraries to help there). Stepping back, it is actually pretty important to understand what you want to do with the data if it were in a database. Both CosmosDB and SQL DB are set up for OLTP-style operations (with some limited forms of broader queries - SQL DB supports columnstore and batch mode, for example, which means you could do a reasonably-sized data mart just fine there too). If you are just storing things in the database in the hope of needing to support future data scientists, then you may or may not really need either of these two OLTP stores.
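As a rough illustration of what sharding across N SQL databases means at the routing level, here is a hand-rolled sketch in Python. It is not how the libraries mentioned above work; the function names and the `iot-db-N` database names are hypothetical, and the device id is assumed to be the sharding key.

```python
import hashlib


def shard_for(device_id: str, shard_count: int) -> int:
    """Map a device id to a shard index using a stable hash."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical database names; a real deployment would read these from a shard map.
SHARDS = [f"iot-db-{n}" for n in range(4)]


def database_for(device_id: str) -> str:
    """Pick the database that holds all rows for this device."""
    return SHARDS[shard_for(device_id, len(SHARDS))]
```

Plain modulo hashing keeps every device's rows in one database, but changing the shard count moves most keys, so a lookup-based shard map is usually easier to grow.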
Synapse SQL is set up for analytics and generally has support to read from data in formats in Azure Storage. So, this may be a better strategy if you want to support arbitrarily-large IoT data and do analytics/ML processing over it.
If you know your solution will never be above , you may not need to consider something like Synapse, but it is set up for those scenarios if you are of sufficient size.
Option - 1:
Why don't you extract and serialize the data based on the partition id (device id) and send it over to the IoT hub, where you can have Azure Functions or Logic Apps that deserialize the data into files stored in the blob containers?
Option - 2:
You can also attempt to create a module that extracts the data into an Excel file, which is then sent to the IoT hub to be stored in the storage containers.
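The extract-and-serialize step of Option 1 could look roughly like the sketch below. It is plain Python with JSON lines as the wire format; the `device_id` field name and both function names are assumptions, and sending the payloads to the hub is left out.

```python
import json
from collections import defaultdict


def partition_by_device(readings):
    """Group readings by device id and serialize each group as JSON lines."""
    groups = defaultdict(list)
    for reading in readings:
        groups[reading["device_id"]].append(reading)  # assumed field name
    return {
        device_id: "\n".join(json.dumps(r, sort_keys=True) for r in rows).encode("utf-8")
        for device_id, rows in groups.items()
    }


def deserialize(payload: bytes):
    """Inverse step, as the consumer (for example a function app) would run it."""
    return [json.loads(line) for line in payload.decode("utf-8").splitlines()]
```

Each payload then maps naturally to one file per device in the blob container.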

Sharepoint - external storage for documents

I am now extending some features of our corporate SharePoint, and one of my customer's wishes is the following:
The SharePoint farm has a small space allocation, about 2 GB per department. The department I am working with has hundreds of mid-sized documents (4-10 MB on average; pptx, doc, pdf...) per month to be stored permanently. Currently they keep them on a NAS file share. However, they want to store them in SharePoint for accessibility while avoiding the 2 GB limit. Is there a way to integrate external file storage with SharePoint?
Alternatively, is it possible to store the file data in a database (i.e. Access or MS SQL) while avoiding farm-wide installations of frameworks like RBS?
You can use Remote BLOB Storage (RBS) as part of your data storage solution.
Check the article: Manage RBS in SharePoint Server
Before you migrate your BLOBs out of the database, you’ll need to choose an option. Here are several typical options:
1. File System: You can use a normal file system (perhaps a large disk partition on a file server) to store your BLOBs.
2. SAN/NAS Storage: Storage area network (SAN) and network attached storage (NAS) are usually high-end storage options. They’re expensive, but well-suited if the business value of your documents and their size can justify the cost. Both SAN and NAS provide data replication and mirroring and seamless growth into terabytes of data.
3. Cloud Storage: This is useful in at least two situations. First, when you’re running SharePoint in the cloud but still want to externalize the BLOBs, your natural choice is to store them in nearby, vendor-provided cloud storage. The second is when you’re running SharePoint within your own datacenter but want to store or archive all BLOBs in the cloud due to space limitations or reliability issues in your datacenter. Archiving is the most common reason in this situation. If users create or modify documents that go to cloud storage, make sure your EBS or RBS provider uploads them in the background, as doing it inline could degrade performance. You also want to make sure your third-party EBS or RBS provider supports storing BLOBs in the cloud storage.
More information is here: Optimize SharePoint with External Storage
RBS is the way to leverage non-SQL Server storage for your BLOBs. Any other approach you take will be a hack/workaround. Only your customer's requirements can tell us if the limitations of these workarounds are acceptable. For example, you could store the files elsewhere (network share or cloud share) and have SharePoint Search index them. In this case you lose out on a consistent UI for managing content, and you'd still need the SP hosting team's help to set up the search crawling.
The real answer is to work with your customer to document their business needs and why IT's offering doesn't meet that need so that they can give you more space.

Best solution for dynamic spatial data

I'm trying to find the best solution for storing dynamic spatial data. I wonder if any of Microsoft's Azure solutions could work. Azure Table Storage would let me create a lot of custom and dynamic structures stored on fast SSD disks.
Because of data's dynamic nature, common indexing seems useless. I would also like to create a lot of table-like structures so the whole architecture cannot be static. Using Azure Table Storage I would dynamically create a table based on country, city, etc sorted by latitude or longitude.
I would appreciate any clue.
Azure Table Storage has mostly been replaced by Azure Cosmos DB.
At the time of writing the Table Storage page even says:
The content in this article applies to the original basic Azure Table storage. However, there is now a premium offering for Azure Table storage in public preview that offers throughput-optimized tables, global distribution, and automatic secondary indexes. To learn more and try out the new premium experience, please check out Azure Cosmos DB: Table API.
You can use Cosmos DB via the Table API, but you'll probably find the Document DB API to be more powerful.
Documents are "schema-free". You can just throw your documents in to a collection, and then you can query against them.
You can create documents which have geo-spatial properties which are indexed automatically.
Then you can perform geo-spatial queries against those properties.
For example you might give each of your documents a point, and then create a query to select all documents that are inside of a polygon.
Or maybe you want to find out how far away each document is from a given point.
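Cosmos DB's SQL query language has built-in functions for both cases (ST_WITHIN and ST_DISTANCE). To show what those two questions compute, here is a plain-Python sketch that does not call Cosmos DB at all: a haversine distance and a ray-casting point-in-polygon test. The function names are hypothetical, the Earth radius is the usual mean value, and the polygon test treats coordinates as planar, which is only reasonable for small areas.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices, treated as planar."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Toggle when a horizontal ray from the point crosses edge (j, i).
        if (yi > lat) != (yj > lat) and lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside
```

In the database these checks run against the spatial index rather than over every document, which is the main reason to let the store do them.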

Azure Table Storage Vs On-premises NoSql

I need to consider a database to store large volumes of data. Though my initial requirement is simply to retrieve chunks of data and save them in an Excel file, I am expecting more complex use cases for this data in the future, where the data will be consumed by different applications, especially for analytics, hence the need for aggregate queries.
I am open to using either cloud-based or on-premises storage. I am considering Azure Table storage (when there is a need for aggregated data, I can have a wrapper service + cache around Azure Table storage, but eventually I will end up with NoSQL-type storage) and on-premises MongoDB. Can someone suggest the pros and cons of saving large data in Azure Table storage vs on-premises MongoDB/Couchbase/RavenDB? The cost factor can be ignored.
I suspect this question may end up getting closed due to its broad nature and potential for gathering more opinions than fact. That said:
This is really going to be an app-specific architecture issue, dealing with latency and bandwidth, as well as the need to maintain on-premises servers and other resources. On-prem, you'll have full control of your hardware resources, but if you're doing high-volume querying against your database, from the cloud, your performance will be hampered by latency and bandwidth. Cloud-based storage (whether in MongoDB or any other database) will have the advantage of being neighbors with your app if set up in the same data center.
Keep in mind: Any persistent database store will need to back its data in Azure Storage, meaning a mounted disk backed by Blob storage. You'll need to deal with the 1TB-per-disk size limit (expanding to 16TB on an 8-core box via stripe), and you'll need to compare this to your storage needs. If you need to go beyond 16TB, you'll need to either shard, go with 200TB Table storage, or go with on-prem MongoDB. But... MongoDB and Table Storage are two different beasts, one being document-based with a focus on query strength, the other a key/value store with very high speed discrete lookups. Comparing the two on the notion of on-prem vs cloud is secondary (in my opinion) to comparing functionality as it relates to your app.

Azure Tables or SQL Azure?

I am at the planning stage of a web application that will be hosted in Azure with ASP.NET for the web site and Silverlight within the site for a rich user experience. Should I use Azure Tables or SQL Azure for storing my application data?
Azure Table Storage appears to be less expensive than SQL Azure. It is also more highly scalable than SQL Azure.
SQL Azure is easier to work with if you've been doing a lot of relational database work. If you were porting an application that was already using a SQL database, then moving it to SQL Azure would be the obvious choice, but that's the only situation where I would recommend it.
The main limitation on Azure Tables is the lack of secondary indexes. This was announced at PDC '09 and is currently listed as coming soon, but there hasn't been any time-frame announcement. (See http://windowsazure.uservoice.com/forums/34192-windows-azure-feature-voting/suggestions/396314-support-secondary-indexes?ref=title)
I've seen the proposed use of a hybrid system where you use table and blob storage for the bulk of your data, but use SQL Azure for indexes, searching and filtering. However, I haven't had a chance to try that solution yet myself.
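A minimal sketch of that hybrid layout, using stand-ins so it runs anywhere: a Python dict plays the role of table/blob storage and an in-memory SQLite database plays the role of SQL Azure. The table name `doc_index`, the `author` and `created` fields, and the function names are all hypothetical.

```python
import sqlite3

# Stand-ins: a dict for table/blob storage, SQLite for SQL Azure.
bulk_store: dict[tuple[str, str], dict] = {}
index_db = sqlite3.connect(":memory:")
index_db.execute(
    "CREATE TABLE doc_index (partition_key TEXT, row_key TEXT, author TEXT, created TEXT)"
)
index_db.execute("CREATE INDEX ix_author ON doc_index (author)")


def save(partition_key, row_key, entity):
    """Write the full entity to bulk storage and only the searchable fields to SQL."""
    bulk_store[(partition_key, row_key)] = entity
    index_db.execute(
        "INSERT INTO doc_index VALUES (?, ?, ?, ?)",
        (partition_key, row_key, entity["author"], entity["created"]),
    )


def find_by_author(author):
    """Filter and sort in SQL, then fetch each full entity by key from bulk storage."""
    rows = index_db.execute(
        "SELECT partition_key, row_key FROM doc_index WHERE author = ? ORDER BY created",
        (author,),
    )
    return [bulk_store[(pk, rk)] for pk, rk in rows]
```

The cost of this design is keeping the two stores consistent: a write that reaches one store but not the other needs a repair path.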
Once the secondary indexes are added to table storage, it will essentially be a cloud based NoSQL system and will be much more useful than it is now.
Despite similar names SQL Azure Tables and Table Storage have very little in common.
Here are two links that might help you:
Table Storage, a 100x cost factor
Fat Entities on Table Storage
Basically, the first question you should ask is: does my app really need to scale? If not, then go for SQL Azure.
For those trying to decide between the two options, be sure to factor reporting requirements into the equation. SQL Azure Reporting and other reporting products support SQL Azure out of the box. If you need to generate complex or flexible reports, you'll probably want to avoid Table Storage.
Azure tables are cheaper, simpler and scale better than SQL Azure. SQL Azure is a managed SQL environment, multi-tenant in nature, so you should analyze if your performance requirements are fit for SQL Azure. A premium version of SQL Azure has been announced and is in preview as of this writing (see HERE).
I think the decisive factors to decide between SQL Azure and Azure tables are the following:
Do you need to do complex joins and use secondary indexes? If yes, SQL Azure is the best option.
Do you need stored procedures? If yes, SQL Azure.
Do you need auto-scaling capabilities? Azure tables is the best option.
Rows within an Azure table cannot exceed 1MB in size. If you need to store large data within a row, it is better to store it in blob storage and reference the blob's URI in the table row.
Do you need to store massive amounts of semi-structured data? If yes, Azure tables are advantageous.
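The checklist above can be read as a small decision function. The sketch below is one possible encoding in Python, not a rule from any official guidance: the field names are hypothetical, and giving the SQL-only features priority when requirements conflict is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    needs_joins_or_secondary_indexes: bool = False
    needs_stored_procedures: bool = False
    needs_auto_scaling: bool = False
    massive_semi_structured_data: bool = False


def suggest_store(req: Requirements) -> str:
    """Return a first-pass suggestion following the checklist above."""
    # Assumption: features that Tables cannot provide win over scaling wishes.
    if req.needs_joins_or_secondary_indexes or req.needs_stored_procedures:
        return "SQL Azure"
    if req.needs_auto_scaling or req.massive_semi_structured_data:
        return "Azure Tables"
    return "either"
```

A result of "either" simply means the checklist does not separate the two options, and cost, latency, and reporting needs have to decide.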
Although Azure tables are tremendously beneficial in terms of simplicity and cost, there are some limitations that need to be taken into account. Please see HERE for some initial guidance.
One other consideration is latency. There used to be a site that Microsoft ran with microbenchmarks on throughput and latency of various object sizes with table store and SQL Azure. Since that site's no longer available, I'll just give you a rough approximation from what I recall. Table store tends to have much higher throughput than SQL Azure. SQL Azure tends to have lower latency (by as much as 1/5th).
It's already been mentioned that table store is easy to scale. However, SQL Azure can scale as well with Federations. Note that Federations (effectively sharding) adds a lot of complexity to your application. I'm also not sure how much Federations affects performance, but I imagine there's some overhead.
If business continuity is a priority, consider that with Azure Storage you get cheap geo-replication by default. With SQL Azure, you can accomplish something similar but with more effort with SQL Data Sync. Note that SQL Data Sync also incurs performance overhead since it requires triggers on all of your tables to watch for data changes.
I realize this is an old question, but still a very valid one, so I'm adding my reply to it.
CoderDennis and others have pointed out some of the facts - Azure Tables is cheaper, and Azure Tables can be much larger, more efficient etc. If you are 100% sure you will stick with Azure, go with Tables.
However, this assumes you have already decided on Azure. By using Azure Tables, you are locking yourself into the Azure platform. It means writing code very specific to Azure Tables that is not going to port over to Amazon; you will have to rewrite those areas of your code. On the other hand, code written for a SQL database with LINQ will port over much more easily to another cloud service.
This may not be an issue if you've already decided on your cloud platform.
I suggest looking at Azure Cache in combination with Azure Table. Table alone has 200-300ms latencies, with occasional spikes higher, which can significantly slow down response times / UI interactivity. Cache + Table seems to be a winning combination, for me.
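The Cache + Table combination is essentially the cache-aside pattern. Here is a minimal in-process sketch in Python; a real deployment would use a shared cache service rather than a local dict. The class name, the `fetch` callable (which stands for the slow table lookup), and the TTL policy are all assumptions.

```python
import time


class CacheAside:
    """Read-through cache in front of a slow key/value lookup."""

    def __init__(self, fetch, ttl_seconds: float, clock=time.monotonic):
        self.fetch = fetch        # hypothetical callable that queries table storage
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the expiry logic can be tested
        self._entries = {}        # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]       # cache hit: no round trip to the table
        value = self.fetch(key)   # cache miss: pay the table latency once
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Call after a write so readers do not keep seeing the old value."""
        self._entries.pop(key, None)
```

Only the first read of a key within the TTL window pays the table latency; later reads are served from memory.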
For your question, I want to talk about how to decide when to choose a SQL table and when to use an Azure Table.
As we know, a SQL table lives in a relational database engine. But if you have a very large amount of data in one table, a SQL table is not a good fit, because SQL queries over that much data are slow.
In that case you can choose Azure Table: for big data, Azure Table queries are much faster than SQL table queries. For example, on our website a user can subscribe to many articles, and we deliver the articles to the user as a feed. Every user has a copy of each article's title and description, so the article table holds a lot of data. With a SQL table, each query might take more than 30 seconds, but with Azure Table, getting a user's article feed by PartitionKey and RowKey is very fast.
From this example you can see how to choose between a SQL table and an Azure Table.
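The feed example can be modelled in a few lines to show why key-based access stays fast. This is an in-memory stand-in in Python, not the Azure SDK; using the user id as PartitionKey and the article id as RowKey is an assumption about how such a feed might be keyed, and the function names are hypothetical.

```python
# In-memory stand-in for a table: partition key -> {row key -> entity}.
feed_table: dict[str, dict[str, dict]] = {}


def add_feed_item(user_id: str, article_id: str, title: str, description: str) -> None:
    """Store one denormalized copy of the article summary per subscriber."""
    feed_table.setdefault(user_id, {})[article_id] = {
        "Title": title,
        "Description": description,
    }


def get_feed_item(user_id: str, article_id: str):
    """Point lookup: both keys are known, so only one entity is touched."""
    return feed_table.get(user_id, {}).get(article_id)


def get_feed(user_id: str):
    """Partition query: touches one user's rows only, returned in RowKey order."""
    partition = feed_table.get(user_id, {})
    return [partition[row_key] for row_key in sorted(partition)]
```

Neither read depends on how many other users exist, which is the property the answer above relies on; the price is one duplicated copy of each article summary per subscriber.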
I wonder whether we are going to end up with some "vendor independent" cloud api libraries in due course?
I think you first have to define what your application usage funnels are. Will your data model be subject to frequent changes, or is it stable? Do you need ultra-fast inserts, while reads are not so complicated? Do you need advanced Google-like search? Will you be storing BLOBs?
Those are the questions (and not only) that you have to ask and answer yourself in order to decide if you are more likely going to use NoSql or SQL approach in storing your data.
Please consider that both approaches can easily coexist and can be extended with BLOB storage as well.
Azure Tables and SQL Azure are two different beasts, meant for different scenarios. One con of Azure Tables is that you cannot move from Azure to any other platform unless you write providers in your code that can handle such shifts.
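Writing "providers" usually means hiding the store behind a small interface so that only one class knows about the vendor. A minimal sketch in Python, with hypothetical names throughout; the in-memory class is a test double, and an Azure Tables provider (or one for another vendor) would expose the same two methods.

```python
from typing import Optional, Protocol


class EntityStore(Protocol):
    """The only storage surface the application code is allowed to see."""

    def put(self, partition: str, key: str, entity: dict) -> None: ...

    def get(self, partition: str, key: str) -> Optional[dict]: ...


class InMemoryStore:
    """Test double that satisfies EntityStore without any vendor SDK."""

    def __init__(self):
        self._data = {}

    def put(self, partition, key, entity):
        self._data[(partition, key)] = dict(entity)

    def get(self, partition, key):
        return self._data.get((partition, key))


def rename_customer(store: EntityStore, customer_id: str, new_name: str) -> None:
    """Application logic written against the interface, not a vendor SDK."""
    entity = store.get("customers", customer_id) or {}
    entity["Name"] = new_name
    store.put("customers", customer_id, entity)
```

Moving platforms then means writing one new provider class, at the price of giving up vendor-specific features that do not fit the shared interface.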

Resources