Windows Azure Virtual Machines: Are They Worth It? / Full Text Indexing

We have a fairly large application running on a small web farm with Sql Server replication. Lately the sales team have become obsessed with wanting to sell the solution as a cloud based product as they think it sounds better.
We have begun migrating our dynamic media content (images, videos, etc.) to Azure blob storage, but we cannot move to SQL Azure as it stands, because we use features such as FILESTREAM and Full Text Indexing for searching documents. Also, from what I can understand, you cannot bring backups down locally and restore them in SQL Server, which is a fundamental part of our development and bug-fixing process.
One suggestion is to move to the Virtual Machines that are currently in preview once they are released, but for all my research I am struggling to see whether that would be of benefit to us over our current setup. I can see the advantages of blob storage for geo-replication, as we have users in China, Mexico and India.
My question is: is it worth going to a virtual machine in Windows Azure over our current setup? If it's not, does anyone know any dates for when SQL Azure will support full text indexing?
Many thanks for any thoughts/your own experiences with this.

I think they are working on FTS in SQL Azure, but the release date isn't known. You can use Lucene.NET with SQL Azure to create a full text index from your database; you can simply update the index with the content of your database once every x minutes (or hours/days).
More info: http://social.technet.microsoft.com/wiki/contents/articles/2367.how-to-use-lucene-net-with-windows-azure-sql-database-en-us.aspx
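A minimal sketch of that periodic rebuild job, assuming Lucene.Net 3.x, a hypothetical Articles table and a local index directory (see the linked article for the Azure-specific details):

    using System.Data.SqlClient;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    class IndexRefresher
    {
        // Rebuilds the full text index from the database; schedule this from a
        // worker role/timer every x minutes (or hours/days).
        static void RebuildIndex(string connectionString, string indexPath)
        {
            var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
            using (var dir = FSDirectory.Open(new System.IO.DirectoryInfo(indexPath)))
            using (var writer = new IndexWriter(dir, analyzer,
                       true /* recreate */, IndexWriter.MaxFieldLength.UNLIMITED))
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                var cmd = new SqlCommand("SELECT Id, Title, Body FROM dbo.Articles", conn);
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        var doc = new Document();
                        doc.Add(new Field("id", reader.GetInt32(0).ToString(),
                                          Field.Store.YES, Field.Index.NOT_ANALYZED));
                        doc.Add(new Field("title", reader.GetString(1),
                                          Field.Store.YES, Field.Index.ANALYZED));
                        doc.Add(new Field("body", reader.GetString(2),
                                          Field.Store.NO, Field.Index.ANALYZED));
                        writer.AddDocument(doc);
                    }
                }
                writer.Optimize();
            }
        }
    }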

I do agree with the Lucene recommendation; however, to answer your question on SQL Server and Azure :)....
SQL Server on VMs works pretty well, right now it is a BYO-License model...so keep that in mind. You can tweak everything, use the SKU you need, set up AlwaysOn (SQL 2012) or clustering etc. Format your drives the way you want them, set processor affinity etc. etc. etc. :)
Check out some of the IaaS BUILD 2012 videos from this past week: http://channel9.msdn.com. Microsoft is certifying SQL Server as part of the software to run on the IaaS platform. If you know a bit about IaaS, you now get some gallery options to pick starting images from...it would be an educated guess to expect a SQL Server 2012 image on there with the license priced in for you
(Microsoft can do that because they own SQL Server, Amazon/RackSpace have to rely on open source)
I/O can be a performance bottleneck right now...IOPS on a striped drive are about 2x those of a 15k SCSI drive...they are going to improve that in the future (once again, there are hacks to get this to work better).
Azure has a bit of catching up to do with Amazon on dedicated specialized VMs (high I/O, high memory, high CPU). I am guessing from your solution (search/FILESTREAM) that you would want high I/O...Amazon AWS has this right now, plus specialized storage on SSDs etc.
I am not sure if Full Text Search is coming to SQL Azure (the PaaS version of SQL Server). If you have Full Text Search, you are probably going to be tweaking trace flags and needing constant, high-I/O communication with Blob Storage...formatting the drives with the recommended 64 KB allocation unit is probably OK for data drives but not for FILESTREAM etc.

Related

What type of Azure resource should I use for hosting many databases?

I have a project where I need to host many databases (500 and up), and I am trying to find the best option for managing everything, considering all the options these days and the price.
In the past I would have a virtual server with SQL Server on it, and I would create the databases on my own, and that was all.
But today I host my current project on Azure: a simple web server, with SQL Server, with one database. And I do not know which Azure resource to choose. Is it SQL Data Warehouse? Or do I need to get a Virtual Machine? Or some other option?
I read all the information I found online, but it mostly confused me.
I hope someone can help me; I would like to know from your experience.
Thank you in advance.
It all very much depends on the size and load of the databases. You have 3 options. You can get a VM yourself and run SQL Server there: you are pretty much in control of what is happening and you can host as many DBs as you want, but you'll be in charge of backups, updates and maintenance. The price, however, is pretty much fixed.
Another option is to get SQL Database from Azure - you don't need to think much about backups, encryption, updates and other boring stuff. You can have up to 5,000 DBs per server, and you can choose the size and performance tier of each database. However, that can be expensive, as you are charged per DB.
The third option is an Elastic Pool - basically a pile of DBs that share the same resources. It can be useful if you have a lot of small DBs with small load, and will work out cheaper than just paying per DB at your scale. However, it might not work if you have a very uneven load on some DBs - they can consume all the DTUs and starve the rest of your DBs of processing power.
So it is up to you what you want to do based on your conditions. Personally I would not go with a VM - too much hassle. I would recommend considering (based on DB load) a combination of an Elastic Pool and stand-alone DBs.

What Azure services to use for larger data storage?

Looking for the best Azure services for holding and manipulating data for an e-commerce application (an online book store) with millions of books.
As of now the e-commerce application runs on ASP.NET and an on-premises SQL Server. Stock availability and prices change very frequently (every hour), so we are updating millions of rows within a specific timeline: millions of records are updated within 30 minutes using SSIS packages.
Now that we intend to move our application to Azure, can someone help me select the data storage service on Azure that best meets our expectations?
Expectations:
1- Can store relational data
2- Data can be updated within a strict timeline - minimum time to complete the full transaction
3- Highly scalable and highly available
As an experiment I am managing this data with Azure SQL Database (P1 tier) but am not fully satisfied, because a process that on-premises SQL Server completes in 30 minutes takes Azure SQL more than 7 hours. I also tried batches but am still struggling.
Can someone suggest a solution, please?
I'd be happy to help.
Unfortunately a lot is different between a P1 in a remote data center, with a 99.99% SLA, automatic HA and very specific CPU/IOPS/memory resources, and your on-premises server where the app logic and SQL Server are all running in the same OS context. Putting network latency to the side, I would guess that the HW resources of that server (CPU/IOPS/memory) are many times larger than what a P1 has.
Using the data you already have: upgrading to a P2 will approximately double the resources available in this test, a P3 will quadruple them, and so on.
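For what it's worth, one pattern that tends to help on the lower tiers is replacing per-row updates with a bulk load into a staging table plus a single set-based MERGE, so you pay the network round-trip once instead of millions of times. A minimal sketch (table and column names hypothetical):

    using System.Data;
    using System.Data.SqlClient;

    class BulkPriceUpdate
    {
        static void Run(string connStr, DataTable changedRows)
        {
            // changedRows has columns BookId (int), Price (decimal), Stock (int).
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();

                // 1. Stream all changed rows into a staging table in one shot.
                using (var bulk = new SqlBulkCopy(conn))
                {
                    bulk.DestinationTableName = "dbo.BooksStaging";
                    bulk.BatchSize = 10000;
                    bulk.WriteToServer(changedRows);
                }

                // 2. Apply everything with one set-based statement instead of
                //    millions of individual UPDATE round-trips.
                var merge = new SqlCommand(@"
                    MERGE dbo.Books AS t
                    USING dbo.BooksStaging AS s ON t.BookId = s.BookId
                    WHEN MATCHED THEN UPDATE SET t.Price = s.Price, t.Stock = s.Stock;
                    TRUNCATE TABLE dbo.BooksStaging;", conn);
                merge.CommandTimeout = 0; // long-running on big tables
                merge.ExecuteNonQuery();
            }
        }
    }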
Happy to talk offline to help you build a more apples to apples comparison. guyhay at microsoft.com

Choosing a long-term storage/analytics system?

A brief summary of the project I'm working on:
I was hired as a web dev intern at a small company (part of a larger corporation) close to the state college I attend. For the past couple of months, two other interns and I have been working on both the front-end and the back-end. The company is prototyping adding sensors to its products (oil/gas industry); we were tasked with building the portal that customers can log in to and see data from their machines even when they're not near them.
Basically, we're collecting sensor data (~ten sensors/machine) and it's sent back to us. Where we're stuck is determining the best way to store and analyze long-term data. We have a Redis cache set up for fast access by the front-end, where only the latest set of data for each machine is stored. But for historical data, my coworkers and I are having a tough time deciding the best route to go. Our whole project is based in VS (C#/Razor) with Azure integration (which is amazing, by the way), so I'd like to keep the long-term storage there as well. As far as I can tell, HDInsight + data in a BLOB seems to be the best option, but I'm fairly green when it comes to back-end solutions. I would just like input from some older developers who may have more experience in this area, as we are the only developers here besides a couple of older members who are more involved in the engineering side of things vs. development.
So, professionals of stack overflow, what would be your recommendation for long-term data storage and analytics?
PS: I apologize if I have HDInsight confused. From what I understand, it maps data in BLOB storage into HBase for easier analytics? Hadoop/HBase confuses me.
My first recommendation would be Azure Table storage. It provides a highly scalable and low-cost data archival solution. If designed properly, you can also get very decent query performance. Refer to the Azure Storage Table Design Guide for more details.
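As a concrete (hypothetical) example with the classic storage SDK: partition by machine, and use a reversed timestamp as the row key so the newest reading always sorts first within its partition.

    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Table;

    // One row per sensor reading; all of a machine's history lives in one partition.
    public class SensorReading : TableEntity
    {
        public SensorReading() { }

        public SensorReading(string machineId, DateTime readingUtc)
        {
            PartitionKey = machineId;
            // Reversed ticks: newest reading sorts first in the partition.
            RowKey = (DateTime.MaxValue.Ticks - readingUtc.Ticks).ToString("D19");
        }

        public double Temperature { get; set; }
        public double Pressure { get; set; }
    }

    class Archiver
    {
        static void Main()
        {
            var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
            var table = account.CreateCloudTableClient().GetTableReference("sensorhistory");
            table.CreateIfNotExists();

            var reading = new SensorReading("machine-042", DateTime.UtcNow)
            {
                Temperature = 71.3,
                Pressure = 101.4
            };
            table.Execute(TableOperation.Insert(reading));
        }
    }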
My second choice would be Azure DocumentDB service which is a NoSQL document database. It costs a bit more but querying data is much more flexible.
You should only go with HDInsight when you have a specific need as it's a resource-intensive and expensive service. Once you identify a specific requirement for a big-data analysis that's when you import your data and process it with HDInsight.

Is storing data in Windows Azure cheaper when using RavenDB rather than SQL Azure?

SQL Azure storage is a lot more expensive than Windows Azure Storage. Would implementing a no-sql solution like RavenDB allow me to store data on the cheaper Azure Storage?
Are there other things to consider, like backup, speed or security?
Thank you.
You have to consider that with SQL Azure you not only get the storage, but the database server too. If you implement RavenDB, you will need a worker role to host it in and, in order to allow for failure of that worker role, another worker role (replica), which also doubles up the storage.
Bear in mind that with SQL Azure you get a highly available (3x replicated with failover) SQL solution that surfaces a familiar (ADO.NET) API. Make your choices based on aspects other than storage cost, such as operational effort and development effort. If you choose RavenDB, it should be because of the potential savings in development effort (because of the closeness of the document API to the object graph) and operational cost, because RavenDB is 'administered' as part of the application. The cost of storing the actual bytes, particularly at scale, is a marginal consideration.
Adding a bit to @Simon's answer: when considering Table Storage and its low cost, also consider whether you can use it directly, instead of going with an installed-and-managed-by-you NoSQL database engine. As it stands, Table Storage offers a schemaless solution that lets you store essentially a property bag within a row, indexed by PartitionKey+RowKey. Does that work for you? Could you work with a few extra tables to give you additional indexing? If so, your storage cost is going to be really low (and still durable, triple-replicated).
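To make the property-bag point concrete, here is a minimal sketch using DynamicTableEntity from the classic storage SDK (all names hypothetical); the point lookup on PartitionKey+RowKey at the end is the kind of read Table Storage makes very cheap:

    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Table;

    class PropertyBagDemo
    {
        static void Main()
        {
            var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
            var table = account.CreateCloudTableClient().GetTableReference("items");
            table.CreateIfNotExists();

            // Schemaless: each row is just a property bag keyed by PartitionKey + RowKey.
            var row = new DynamicTableEntity("customer-7", "order-0001");
            row.Properties["Total"] = new EntityProperty(42.50);
            row.Properties["Status"] = new EntityProperty("shipped");
            table.Execute(TableOperation.InsertOrReplace(row));

            // Point lookup on the two keys is a single indexed read.
            var fetched = (DynamicTableEntity)table.Execute(
                TableOperation.Retrieve<DynamicTableEntity>("customer-7", "order-0001")).Result;
            Console.WriteLine(fetched.Properties["Status"].StringValue);
        }
    }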
If you find yourself writing significant code to manage Table Storage, then it may be more efficient to invest in the Compute instances needed to run RavenDB. When considering this, also consider that you'll likely want larger VM sizes if you're moving significant data (as you get approx. 100Mbps per core). A database like MongoDB, working with memory-mapped files, really ramps up speed-wise with more RAM. Not sure if this is the same with RavenDB.

Use Sql Server FileStream or traditional File Server?

I am designing a system that's going to have 10 million+ users, each with a photo of about 1-2 MB.
We are going to deploy both the database and the web app using Microsoft Azure.
I am wondering how I should store the photos; there are currently two options:
1. Store all photos using SQL Server FILESTREAM
2. Use a file server
I haven't experienced such large-scale BLOB data using FILESTREAM.
Can anybody give me any suggestions? The cons and pros?
And anyone with Microsoft Azure experience concerning large photo stores is really appreciated!
Thx
Ryan.
I vote for neither. Use Windows Azure Blob storage. Simple REST API, $0.15/GB/month. You can even serve the images directly from there, if you make them public (like <img src="http://myaccount.blob.core.windows.net/container/image.jpg" />), meaning you don't have to funnel them through your web app.
A database is almost always a horrible choice for any large-scale binary storage need. Databases are best for relational-only systems; instead, provide references in your database to the actual storage location. There are a few factors you should consider:
Cost - SQL Azure costs quite a lot per GB of storage, and has small storage limitations (50GB per database), both of which make it a poor choice for binary data. Windows Azure Blob storage is vastly cheaper for serving up binary objects (it has a somewhat more complicated pricing system, but is still vastly cheaper per GB).
Throughput - SQL Azure has pretty good throughput, as it can scale well; however, Windows Azure Blob storage has even greater throughput, as it can scale to any number of nodes.
Content Delivery Network - A feature not available for SQL Azure (though a complex, custom wrapper could be created), but one that can easily be set up within minutes to piggy-back off your Windows Azure Blob storage and provide limitless bandwidth to your end users, so you never have to worry about your binary objects being a bottleneck in your system. CDN costs are similar to those of Blob storage; you can find all that here: http://www.microsoft.com/windowsazure/pricing/#windows
In other words, no reason not to go with Blob storage. It is simple to use, cost effective, and will scale to any needs.
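For illustration, a minimal sketch of that upload-and-serve-publicly flow with the classic storage SDK (account, container and file names hypothetical):

    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    class PhotoUpload
    {
        static void Main()
        {
            var account = CloudStorageAccount.Parse(
                "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=...");
            var container = account.CreateCloudBlobClient().GetContainerReference("images");
            container.CreateIfNotExists();

            // Public read on blobs lets <img src="..."> hit storage directly,
            // so the images never funnel through your web app.
            container.SetPermissions(new BlobContainerPermissions
            {
                PublicAccess = BlobContainerPublicAccessType.Blob
            });

            var blob = container.GetBlockBlobReference("user-42/profile.jpg");
            blob.Properties.ContentType = "image/jpeg";
            using (var file = System.IO.File.OpenRead(@"C:\photos\profile.jpg"))
            {
                blob.UploadFromStream(file);
            }
            // Now served from http://myaccount.blob.core.windows.net/images/user-42/profile.jpg
        }
    }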
I can't speak to anything Azure related, but for my money the biggest advantage of using FILESTREAM is that the data gets backed up inside the normal SQL Server backup process. The size of the data you are talking about also suggests that FILESTREAM may be a good choice.
I've worked on a SCM system with a RDBMS back end and one of our big decisions was whether to store the file deltas on the file system or inside the DB itself. Because it was cross-RDBMS we had to cook up a generic non-FILESTREAM way of doing it but the ability to do a single shot backup sold us.
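For context, this is roughly how FILESTREAM data is read from .NET: the bytes live on the NTFS volume, but access is scoped to a SQL transaction, which is exactly why the data rides along in normal backups. A minimal sketch, assuming a hypothetical Photos table with a FILESTREAM column named Photo:

    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using System.IO;

    class FileStreamReadSketch
    {
        static byte[] ReadPhoto(SqlConnection conn, int photoId)
        {
            // FILESTREAM access must happen inside a SQL transaction.
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                var cmd = new SqlCommand(
                    "SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                    "FROM dbo.Photos WHERE Id = @id", conn, tx);
                cmd.Parameters.AddWithValue("@id", photoId);

                string path;
                byte[] txContext;
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    reader.Read();
                    path = reader.GetString(0);
                    txContext = reader.GetSqlBytes(1).Value;
                }

                // SqlFileStream streams the blob straight off the NTFS volume,
                // but under the transaction opened above.
                using (var fs = new SqlFileStream(path, txContext, FileAccess.Read))
                using (var ms = new MemoryStream())
                {
                    fs.CopyTo(ms);
                    tx.Commit();
                    return ms.ToArray();
                }
            }
        }
    }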
FILESTREAM is a horrible option for storing images. I'm surprised MS ever promoted it.
We're currently using it for the images on our website, mainly user-generated images and any CMS-related stuff that admins create. The decision to use FILESTREAM was made before I started. The biggest issue is serving the images up: you had better have a CDN sitting in front. If not, plan on your system coming to a screeching halt. Of course, most sites have a CDN, but you don't want to be at the mercy of that service going down and your system getting overloaded. The amount of stress put on your SQL Server is the main problem here.
In terms of ease of backup, the tradeoff is that your DB is MUCH, MUCH larger, and therefore the backup takes longer. Potentially much longer, and the system runs slower during the backup. Not to mention that moving backups around takes longer (i.e., restoring prod data in a dev environment or on local machines for dev purposes). Don't use this as a deciding factor.
Most cloud services have automatic redundancy for any files you store on their system (i.e., AWS's S3 and Azure's blob storage). If you're on premises, just make sure you use a shared location for the images and make sure that location is backed up. I think the best option is to set it up so each image (and other UGC file types) has an entry in your DB with a path to that file. Going one step further, separate the root path into a config setting and only store the remaining path with the entry. For example, the root path in config might be a base URL, a shared drive or virtual dir, or a blank entry; then your entry might have "/files/images/image.jpg". This way, if you move your file store, you can just update the root config. I would also suggest creating a FileStoreProvider interface (singleton) that can be used for managing (saving, deleting, updating) these files, as sketched below. This way, if you switch between AWS, Azure, or on premises, you can just create a new provider.
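A minimal sketch of that provider idea (all names hypothetical):

    using System.IO;

    // Abstraction over wherever the binaries actually live (blob storage, S3,
    // a shared drive...); swapping stores then means swapping providers.
    public interface IFileStoreProvider
    {
        // Persist the content and return the relative path to store in the DB,
        // e.g. "/files/images/image.jpg".
        string Save(Stream content, string relativePath);
        void Delete(string relativePath);
        // The root (base URL, UNC share, ...) comes from config, not from the DB.
        string GetFullUrl(string relativePath);
    }

    public class BlobFileStoreProvider : IFileStoreProvider
    {
        private readonly string _root; // e.g. "http://myaccount.blob.core.windows.net"

        public BlobFileStoreProvider(string root)
        {
            _root = root;
        }

        public string Save(Stream content, string relativePath)
        {
            // Upload the stream to blob storage here; only the relative
            // path gets persisted with the DB entry.
            return relativePath;
        }

        public void Delete(string relativePath)
        {
            // Delete the corresponding blob here.
        }

        public string GetFullUrl(string relativePath)
        {
            return _root + relativePath;
        }
    }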
I have a client-server DB; I manage many files (doc, txt, pdf, ...) and all of them go in a FILESTREAM BLOB. Customers have 50+ MB DBs. If you can do the same in Azure, go for it. Having everything in the DB is a wonderful thing. It is considered good policy for Postgres and MySQL too.
