What Azure services to use for large data storage - Azure

Looking for the best Azure services for holding and manipulating data for an e-commerce application (an online book store) with millions of books.
The application currently runs on ASP.NET with an on-premises SQL Server. Stock availability and prices change very frequently (every hour), so we update millions of records within a specific timeline; today, millions of records are updated within 30 minutes using SSIS packages.
Now that we intend to move the application to Azure, can someone help me select the Azure data storage service that best meets our expectations?
Expectations:
1- Can store relational data
2- Data can be updated within a strict timeline - the full update completes in minimal time
3- Highly scalable and highly available
As an experiment I am managing this data with an Azure SQL Database (P1 tier), but I am not fully satisfied: a process that the on-premises SQL Server completes in 30 minutes takes Azure SQL more than 7 hours. I also tried batches, but I am still struggling.
Can someone suggest a solution, please?

I'd be happy to help.
Unfortunately, a lot is different between a P1 in a remote data center - with a 99.99% SLA, automatic HA, and very specific CPU/IOPS/memory resources - and your on-premises server, where the app logic and SQL Server all run in the same OS context. Putting network latency to the side, I would guess that the hardware resources of that server (CPU/IOPS/memory) are many times larger than what a P1 provides.
Using the data you already have: upgrading to a P2 will approximately double the resources available in this test, a P3 will quadruple them, and so on.
Happy to talk offline to help you build a more apples-to-apples comparison: guyhay at microsoft.com
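One pattern that often closes much of this gap, independent of the tier you choose, is to avoid paying network latency per row: bulk-load the changed rows into a staging table, then apply them with one set-based statement. A minimal sketch, assuming hypothetical table and column names (dbo.Books, dbo.BooksStaging, BookId, Price, Stock):

```csharp
using System.Data;
using System.Data.SqlClient;

// Applies the hourly price/stock feed in two server-side steps instead
// of millions of individual UPDATE round trips.
static void ApplyPriceAndStockChanges(string connectionString, DataTable changes)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // 1. Stream the changed rows into a staging table in bulk.
        using (var bulk = new SqlBulkCopy(conn)
        {
            DestinationTableName = "dbo.BooksStaging",
            BatchSize = 10000
        })
        {
            bulk.WriteToServer(changes);
        }

        // 2. Apply all changes with a single set-based MERGE.
        const string merge = @"
            MERGE dbo.Books AS target
            USING dbo.BooksStaging AS source
              ON target.BookId = source.BookId
            WHEN MATCHED THEN
              UPDATE SET target.Price = source.Price,
                         target.Stock = source.Stock;
            TRUNCATE TABLE dbo.BooksStaging;";

        using (var cmd = new SqlCommand(merge, conn))
        {
            cmd.CommandTimeout = 0; // large merges can exceed the 30 s default
            cmd.ExecuteNonQuery();
        }
    }
}
```

This keeps the chatty part of the workload inside the data center, so the cross-network latency is paid once per batch rather than once per row.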

Related

What type of Azure resource should I use for hosting many databases?

I have a project where I need to host many databases (500 and up), and I am trying to find the best option for managing everything, considering all the options available these days and the price.
In the past I would have a virtual server with SQL Server on it, and I would create the databases on my own, and that was all.
But today I host my current project on Azure: a simple web server with SQL Server and one database, and I do not know which Azure resource to choose.
Is it SQL Data Warehouse? Do I need to get a Virtual Machine? Or is there another option?
I have read all the information I found online, but it mostly confused me.
I hope someone can help me; I would like to learn from your experience.
Thank you in advance.
It all depends very much on the size and load of the databases. You have three options. First, you can get a VM yourself and run SQL Server on it. You are pretty much in control of what is happening and you can host as many DBs as you want; however, you'll be in charge of backups, updates, and maintenance. The price is pretty much fixed, though.
Another option is Azure SQL Database - you don't need to think much about backups, encryption, updates, and other boring stuff. You can have up to 5000 DBs per server, and you can choose the size and performance tier of each database. However, that can be expensive, as you are charged per DB.
The third option is an Elastic Pool - basically a pile of DBs sharing the same resources. It can be useful if you have a lot of small DBs with small load, and at your scale it will work out cheaper than paying per DB. However, it might not work well if the load across DBs is very uneven - a few busy DBs can consume all the DTUs and starve the rest of your DBs of processing power.
So it is up to you, based on your conditions. Personally I would not go with a VM - too much hassle. I would recommend considering (based on DB load) a combination of an Elastic Pool and stand-alone DBs.
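Whichever tier you pick, with 500+ databases the application usually needs a small routing layer that maps a tenant to its database. A minimal sketch, assuming a hypothetical one-database-per-tenant naming scheme on a single logical server (names and credentials are placeholders):

```csharp
using System.Data.SqlClient;

// Builds a connection string for a tenant, assuming databases are named
// "tenant_<id>" on one Azure SQL logical server. With an Elastic Pool,
// all of these databases can share the pool's resources.
static string GetTenantConnectionString(string serverName, int tenantId)
{
    var builder = new SqlConnectionStringBuilder
    {
        DataSource = serverName + ".database.windows.net",
        InitialCatalog = "tenant_" + tenantId, // hypothetical naming scheme
        UserID = "appUser",                    // shared service account
        Password = "...",                      // load from secure configuration
        Encrypt = true
    };
    return builder.ConnectionString;
}

// Usage: resolve the tenant per request, then open a pooled connection.
// using (var conn = new SqlConnection(GetTenantConnectionString("myserver", 42)))
// {
//     conn.Open();
//     // ... tenant-scoped queries ...
// }
```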

Azure SQL Server data center speed

I have set up an Azure web site using an Azure SQL Server database. These were placed in different locations (by accident): the web site in Northern Europe and the SQL Server database in South Central US.
Assume I instead had the SQL Server database in Northern Europe, in the same location as the web site - would retrieving data be any faster? If so, by how much? Assume I have a very inefficient query loading too much data that currently takes 15 seconds.
Please ignore the possibility of improving the query. I am just interested in whether anyone has statistics on the speed improvement from moving a SQL Server data center relative to the web site.
Assume I have a very inefficient query loading too much data that currently takes 15 seconds.
Now it will take about 15.3 seconds (15 seconds + roughly 300 ms for the TCP handshake across the ocean).
Consider having to do 10,000 queries over, let's say, one hour - you pay the latency penalty FOR EACH OF THOSE QUERIES. At, say, 150 ms of round-trip latency per query, that is roughly 25 minutes of pure network wait.
In essence, move your database into the same Azure region as your application, or vice versa.
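A quick way to see the penalty for yourself is to time a trivial query from the web site's region. A minimal sketch (the connection string is a placeholder):

```csharp
using System;
using System.Data.SqlClient;
using System.Diagnostics;

// Times one trivial round trip to the database. Run it from the web
// app's region: the elapsed time is almost entirely network latency,
// and every query in a request pays it again.
static void MeasureRoundTrip(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open(); // connection setup is paid once, then pooled

        using (var cmd = new SqlCommand("SELECT 1", conn))
        {
            var sw = Stopwatch.StartNew();
            cmd.ExecuteScalar();
            sw.Stop();
            Console.WriteLine("Round trip: " + sw.ElapsedMilliseconds + " ms");
        }
    }
}
```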

Choosing a long-term storage/analytic system?

A brief summary of the project I'm working on:
I was hired as a web dev intern at a small company (part of a larger corporation) close to the state college I attend. For the past couple of months, two other interns and I have been working on both the front end and the back end. The company is prototyping adding sensors to its products (oil/gas industry); we were tasked with building the portal customers can log in to so they can see data from their machines even when they're not near them.
Basically, we're collecting sensor data (~ten sensors per machine) which is sent back to us. Where we're stuck is determining the best way to store and analyze the long-term data. We have a Redis cache set up for fast access by the front end, holding only the latest set of data for each machine. But for the historical data, my coworkers and I are having a tough time deciding the best route to go. Our whole project is built in Visual Studio (C#/Razor) with Azure integration (which is amazing, by the way), so I'd like to keep the long-term storage there as well. As far as I can tell, HDInsight plus data in blob storage seems to be the best option, but I'm fairly green when it comes to back-end solutions. I would just like input from more experienced developers, as we are the only developers here besides a couple of senior colleagues who are more involved in the engineering side of things than in development.
So, professionals of Stack Overflow, what would be your recommendation for long-term data storage and analytics?
PS: I apologize if I have HDInsight confused. From what I understand, it maps data in blob storage into HBase for easier analytics? Hadoop/HBase confuses me.
My first recommendation would be Azure Table storage. It provides a highly scalable and low-cost data archival solution, and if designed properly you can also get very decent query performance. Refer to the Azure Storage Table Design Guide for more details.
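To make "designed properly" concrete, here is a minimal sketch using the .NET storage SDK, with a hypothetical SensorReading entity: the PartitionKey groups each machine's readings, and an inverted-timestamp RowKey makes newest-first range scans cheap:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// One row per reading. PartitionKey keeps a machine's data together;
// the inverted-timestamp RowKey sorts the newest readings first.
public class SensorReading : TableEntity
{
    public SensorReading() { }

    public SensorReading(string machineId, DateTime readingUtc)
    {
        PartitionKey = machineId;
        RowKey = (DateTime.MaxValue.Ticks - readingUtc.Ticks).ToString("d19");
    }

    public string SensorId { get; set; }
    public double Value { get; set; }
}

// Writing a reading (connection string is a placeholder):
// var account = CloudStorageAccount.Parse(connectionString);
// var table = account.CreateCloudTableClient().GetTableReference("readings");
// table.CreateIfNotExists();
// table.Execute(TableOperation.Insert(
//     new SensorReading("machine-42", DateTime.UtcNow) { SensorId = "temp", Value = 97.5 }));
```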
My second choice would be the Azure DocumentDB service, which is a NoSQL document database. It costs a bit more, but querying the data is much more flexible.
You should only go with HDInsight when you have a specific need, as it's a resource-intensive and expensive service. Once you identify a specific requirement for big-data analysis, that's the time to import your data and process it with HDInsight.

Windows Azure Virtual Machines - are they worth it? / Full-text indexing

We have a fairly large application running on a small web farm with SQL Server replication. Lately the sales team has become obsessed with selling the solution as a cloud-based product, because they think it sounds better.
We have begun migrating our dynamic media content (images, videos, etc.) to Azure blob storage, but we cannot move to SQL Azure as it stands, because we use features such as FILESTREAM and full-text indexing for searching documents. Also, from what I understand, you cannot bring backups down locally and restore them in SQL Server, which is a fundamental part of our development and bug-fixing process.
One suggestion is to move to the Virtual Machines that are currently in preview once they are released. With all my research, though, I am struggling to see whether that would benefit us over our current setup. I can see the advantages of blob storage for geo-replication, as we have users in China, Mexico, and India.
My question is: is it worth moving to a virtual machine in Windows Azure over our current setup? And if not, does anyone know when SQL Azure will support full-text indexing?
Many thanks for any thoughts/your own experiences with this.
I think they are working on FTS for SQL Azure, but the release date isn't known. In the meantime you can use Lucene.NET with SQL Azure to build a full-text index from your database; you can simply refresh the index with the content of your database once every x minutes (or hours/days).
More info: http://social.technet.microsoft.com/wiki/contents/articles/2367.how-to-use-lucene-net-with-windows-azure-sql-database-en-us.aspx
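As a rough illustration of that approach (a sketch, not the exact setup from the linked article), here is a minimal Lucene.NET 3.x routine that rebuilds an index from hypothetical (id, body) document rows pulled from the database:

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Rebuilds the full-text index from scratch, e.g. on a timer every x minutes.
static void RebuildIndex(string indexPath, IEnumerable<KeyValuePair<string, string>> docs)
{
    var dir = FSDirectory.Open(new System.IO.DirectoryInfo(indexPath));
    var analyzer = new StandardAnalyzer(Version.LUCENE_30);

    // true = create a fresh index, replacing any previous one.
    using (var writer = new IndexWriter(dir, analyzer, true,
                                        IndexWriter.MaxFieldLength.UNLIMITED))
    {
        foreach (var row in docs) // Key = document id, Value = document body
        {
            var doc = new Document();
            doc.Add(new Field("id", row.Key, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("body", row.Value, Field.Store.NO, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
        writer.Commit();
    }
}
```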
I agree with the Lucene recommendation; however, to answer your question on SQL Server and Azure...
SQL Server on VMs works pretty well. Right now it is a bring-your-own-license model, so keep that in mind. You can tweak everything, use the SKU you need, set up AlwaysOn (SQL 2012) or clustering, format your drives the way you want, set processor affinity, and so on.
Check out some of the IaaS BUILD 2012 videos from this past week: http://channel9.msdn.com. Microsoft is certifying SQL Server as part of the software that runs on the IaaS platform. If you know a bit about IaaS, you now get gallery options to pick starting images from; it would be an educated guess to expect a SQL Server 2012 image there with the license priced in for you
(Microsoft can do that because they own SQL Server; Amazon/Rackspace have to rely on open source).
I/O can be a performance bottleneck right now: IOPS on a striped drive are about 2x those of a 15k SCSI drive. They are going to improve that in the future (and there are workarounds to make it perform better today).
Azure has some catching up to do with Amazon on dedicated specialized VMs (high I/O, high memory, high CPU). Guessing from your solution (search/FILESTREAM), you would want high I/O; Amazon AWS has that today, with specialized storage on SSDs, etc.
I am not sure whether full-text search is coming to SQL Azure (the PaaS version of SQL Server). If you use full-text search you will probably be tweaking trace flags and need constant, high-I/O communication with blob storage; formatting a drive with the default 64 KB allocation unit is probably OK for data drives, but not for FILESTREAM, etc.

How to scale SQL Azure?

I want to host my WCF services in the Azure cloud for scalability reasons. For example, there will be a read-data action under high load (1000+ users/sec).
(Like in my previous question)
I also have a limit of a 1-second timeout for any request.
My service will be connected to SQL Azure. I chose it because of its small latency (no more than 7 ms according to Microsoft's benchmark).
How many concurrent connections can SQL Azure hold per instance/database?
Is there any way to scale SQL Azure when I reach the limit of connections per instance?
Are there other solutions or options for my scenario?
Thanks.
One thing to keep in mind is that you will need to make sure you are leveraging connection pooling to its maximum. Using a single service account instead of different logins is an important step to ensure proper connection pooling, because ADO.NET pools connections per unique connection string.
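A minimal sketch of what that looks like in practice (server name, credentials, and the query are placeholders) - every request uses the same connection string, so connections come from the pool instead of being re-established:

```csharp
using System.Data.SqlClient;

// One shared connection string for the whole service. ADO.NET pools
// connections per unique connection string, so a single service
// account lets every request share the same pool.
static readonly string ConnString =
    "Server=tcp:myserver.database.windows.net;Database=mydb;" +
    "User ID=serviceAccount;Password=...;Encrypt=True;";

static object RunScalarQuery(string sql) // trusted, parameter-free SQL only
{
    using (var conn = new SqlConnection(ConnString))
    {
        conn.Open(); // typically returns a pooled connection, not a new one
        using (var cmd = new SqlCommand(sql, conn))
        {
            return cmd.ExecuteScalar();
        }
    }
}
```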
Another consideration is the use of MARS (Multiple Active Result Sets). If you have many requests coming through, you may want to pool them together into a single request, hence a single connection, and return multiple result sets. In this post I discuss how to implement one-way queuing of SQL statements; it may not work for you as-is because you may be expecting a response, but it may give you some ideas on how to batch requests to minimize the number of connections and the wait time.
You can also take a look at a tool I wrote last year to test connections/statements against SQL Azure. The tool automatically turns off connection pooling to measure the effects of concurrency. You can download it here.
Finally, I also wrote the Enzo Shard Library on CodePlex. Let me know if you have any questions should you decide to investigate the library for your project. Note that the library will evolve to support the future capabilities of SQL Azure Data Federations as well.
There appears to be no hard limit on the number of connections per SQL Azure instance, but Microsoft states that it reserves the right to throttle connections where resource use is regarded as "excessive".
There's some information on this here, and details of what may happen in that situation here.
A good workaround is "sharding", where you partition your data on some easily definable criterion across multiple databases. This does, of course, incur additional cost. A neat implementation is here: http://enzosqlshard.codeplex.com/
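The core idea is simple even without a library - route each key to one of several databases. A minimal hand-rolled sketch, assuming hypothetical shard connection strings (a real library such as Enzo adds fan-out queries, rebalancing, and so on):

```csharp
using System.Data.SqlClient;

// Maps a non-negative customer ID to one of N shard databases using
// simple modulo routing. Adding shards later means re-partitioning,
// which is the hard part that sharding libraries help manage.
static readonly string[] ShardConnStrings =
{
    "Server=tcp:shard0.database.windows.net;Database=app;...",
    "Server=tcp:shard1.database.windows.net;Database=app;...",
    "Server=tcp:shard2.database.windows.net;Database=app;...",
};

static SqlConnection GetShardConnection(int customerId)
{
    int shard = customerId % ShardConnStrings.Length;
    return new SqlConnection(ShardConnStrings[shard]);
}
```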
Also, Azurescope had some interesting benchmarks here: http://azurescope.cloudapp.net/BestPractices/#ed6a21ed-ad51-4b47-b69c-72de21776f6a (unfortunately removed in early 2012).
Is there any ability to scale SQL Azure when i will reach the limit of connections per instance?
In addition to the Enzo SQL sharding suggestion, there are a couple of Microsoft products/features under construction to assist with scaling SQL Azure. These are CTPs (at best), but they may provide some scalability options by allowing you to spread the load across multiple SQL Azure databases:
SQL Azure Federations - http://convective.wordpress.com/2011/05/02/sql-azure-federations/
SQL Azure Data Sync - http://www.microsoft.com/windowsazure/sqlazure/datasync/
