Sizing an Azure Geo-Replicated Database

If I have an S2 SQL database and I create a secondary geo-replicated database, should it be the same size (S2)? I see that you get charged for the secondary DB, but the DTU usage reported against that secondary is 0%, which seems to indicate that S2 is too large.
Obviously, we'd like to save the cost if at all possible and move the secondary to a smaller size.
Considerations
I understand that if we need to fail over to the secondary, it would at that point need to be bumped up to S2 to meet the production workload, but am I right in assuming we could do this at the time of failover?
I also get that if we were actively using the replicated DB for reporting, etc., then we'd have to size it accordingly to meet that demand. But currently, we are not using the secondary for anything other than as a failover point if it is ever needed.

At this point both primary and secondary must be in the same edition but can have different performance objectives (DTU size). We are working on lifting that limitation so that geo-replicated databases can scale to a different edition when needed without breaking the replication links (e.g. Standard to Premium).
Regarding sizing the secondary, you *can* make it smaller in DTU than the primary if you believe that the updates take less capacity than reads (high read/write ratio). But as noted earlier, you will have to upsize it right after the failover, and that may take time, during which your app's performance will be impacted. In general, we do not recommend making the secondary more than one level smaller. E.g. S3->S1 is not a good idea, as it will likely cause replication lag and may result in excessive data loss after failover.

You can safely change the performance level of the secondary database, but bear in mind that in the case of a failover you will face performance issues. Also, you can't scale past your current service tier (so both databases ought to be in the same tier).
And yes, you can change the size after failover, but the process is manual.
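For completeness, here is a minimal sketch of that manual post-failover upsize, assuming a pyodbc connection to the server that now hosts the (former) secondary; the server, database, and credential values are placeholders:

```python
# Sketch: upsize the former secondary after failover by changing its
# performance level with ALTER DATABASE ... MODIFY (SERVICE_OBJECTIVE = ...).
# Server, database, and credential values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:secondary-server.database.windows.net,1433;"
    "DATABASE=master;UID=admin_user;PWD=...",
    autocommit=True,  # ALTER DATABASE cannot run inside a user transaction
)
# Bump the database from the smaller secondary size back up to S2.
conn.execute("ALTER DATABASE [mydb] MODIFY (SERVICE_OBJECTIVE = 'S2');")
```

The same statement, pointed at the secondary with a smaller objective, is how you would downsize it in the first place.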

Related

Does Cloud Spanner allow me to scale storage separately from compute?

I see a lot of claims that Spanner decouples compute from storage, and sure, the diagrams make it look like it does. However, when scaling Spanner the only dial I can turn is the number of nodes in the cluster. Each node is provisioned with some compute and 2 TB of storage.
What's nice is that even if I over-provision nodes past my storage needs, I still only pay for the storage I'm using. So in that sense, the costs for compute and storage are also decoupled.
But what if my storage scales faster than compute? If I have 10 TB of data I need 5 (really 6) nodes. But what if there just aren't enough queries to use even 10% of the available compute on those nodes? Unlike storage, I don't pay only for the compute I actually use: I pay for each node as long as it's provisioned, and I can't deprovision it because I need the storage space.
This means Spanner does not actually separate compute from storage in a strict sense. Since my compute costs scale with storage (as well as with queries per second), this claim seems almost blatantly false.
It's possible that Spanner is simply not intended for a use case where compute scales slower than storage, but I feel like I must be misunderstanding something. Please help me see the error of my ways.
I pay for the node as long as it's provisioned and I can't deprovision it because I need the storage space.
Unfortunately that is true.
Maintaining the data itself incurs not only CPU but also memory cost, so there is a limit to how much data a node can handle efficiently. The claim of compute/storage separation holds only until that limit is reached.
It's possible that Spanner is simply not intended for a use case where compute scales slower than storage.
I am afraid there is no workaround for this limit at the moment. But I do agree that this is a valid use case, and Cloud Spanner probably should have a solution to handle it.
Although it does not directly address your concern, you can open a feature request to provide a data point and help the team better prioritize.
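For what it's worth, the node-count arithmetic the question describes is easy to sketch; this assumes the 2 TB-per-node storage figure quoted above and a hypothetical growth-headroom factor:

```python
# Sketch of the node-count arithmetic: storage alone dictates a floor on the
# number of nodes, regardless of how little compute those nodes actually use.
import math

STORAGE_PER_NODE_TB = 2  # per-node storage limit cited in the question

def nodes_required(storage_tb: float, headroom: float = 0.2) -> int:
    """Minimum node count for a given data size plus some growth headroom."""
    return math.ceil(storage_tb * (1 + headroom) / STORAGE_PER_NODE_TB)

print(nodes_required(10))  # -> 6, even if those nodes sit mostly idle on CPU
```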

RethinkDB with more throughput?

I am looking to build a realtime pub/sub database backend. RethinkDB is actually a perfect package for what I need, mainly because of its very low-latency changefeeds. But RethinkDB seems to be a DB where you can expect about 10k-20k inserts per second on two machines, whereas I have seen postings claiming people get around 1 million inserts per second on databases like Cassandra with comparable hardware; Cassandra, however, doesn't have the realtime changefeed feature.
So my question is: is there another DB, or combination of open source systems, that can provide the low-latency changefeed functionality of RethinkDB but at a much larger scale? Both the number of inserts per second and the number of users subscribed to changefeeds are important requirements that need to be as high as possible.
RethinkDB might still fit your needs if you can scale out to a robust cluster (lots of nodes). Below is a link to a report they generated with performance metrics scaling up to a 16-node cluster.
https://rethinkdb.com/docs/2-1-5-performance-report/
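For reference, a changefeed subscription with the official Python driver looks roughly like this; the host, database, and table names are placeholders:

```python
# Minimal changefeed subscription using the official RethinkDB Python driver.
# Host, database, and table names are placeholders.
from rethinkdb import RethinkDB

r = RethinkDB()
conn = r.connect(host="localhost", port=28015)

# Blocks and yields a change document every time the table is modified.
for change in r.db("test").table("messages").changes().run(conn):
    print(change["old_val"], "->", change["new_val"])
```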

How to optimize deployment to regions for minimum perceived latency and maximum cost savings?

I will be using Azure Cosmos DB with Azure Functions deployed in the same regions, with a gateway (Cloudflare or an Azure option) that will route to the Azure Function in the closest region, each deployed alongside a Cosmos DB replica.
The benefits in perceived latency should be roughly logarithmic, right?
Like, having 2 regions is ~3x better,
3 regions ~5x better perceived latency, etc.
According to Microsoft, Cosmos DB is available in all Azure regions.
Considering our customers aren't clustered around a specific region and are spread all over the world,
which regions are optimal to deploy to
for replication in:
1 region
2 regions
3 regions
4 regions
You can use http://www.azurespeed.com/ to see the closest data center from the client and pick the optimal location.
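If you want to measure this yourself rather than rely on the site, a rough sketch is to time a small HTTPS request against an endpoint you host in each candidate region; the URLs below are placeholders, not real Azure endpoints:

```python
# Rough per-region latency probe (what azurespeed.com automates for you):
# time a small HTTPS request against an endpoint hosted in each candidate
# region. The URLs are placeholders.
import time
import urllib.request

CANDIDATE_REGIONS = {
    "westeurope": "https://probe-westeurope.example.com/ping",
    "eastus": "https://probe-eastus.example.com/ping",
    "southeastasia": "https://probe-southeastasia.example.com/ping",
}

def best_rtt(url: str, samples: int = 5) -> float:
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        urllib.request.urlopen(url, timeout=5).read()
        timings.append(time.perf_counter() - start)
    return min(timings)  # best-case round trip is the most stable signal

for region, url in CANDIDATE_REGIONS.items():
    print(region, f"{best_rtt(url) * 1000:.1f} ms")
```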
As an extreme (and unrealistic) case, imagine each customer/client having a copy of the DB running next to them. This should give the lowest latency for the customer, right?
The answer is that it depends. If you are talking about local read/write latency, then that would be true. However, the more you replicate your database, the more time write operations will take to synchronise across all nodes (which in turn affects what is available when you read); see Cosmos DB's consistency models. Although you have customers spread across the globe, it would be better to start from the regions with the most load/requests and then spread out from there.
Deciding this is also where the proverbial rubber meets the road: you will soon realise that the business might be willing to relax some latency requirements at the edges, given the cost increase needed to achieve 100% coverage.

New Azure SQL Database Services, how scalable and what are DTUs

The new Azure SQL Database service tiers look good. However, I am trying to work out how scalable they really are.
So, for example, assume a 200 concurrent user system.
For Standard
Workgroup and cloud applications with "multiple" concurrent transactions
For Premium
Mission-critical, high transactional volume with "many" concurrent users
What does "Multiple" and "Many" mean?
Also Standard/S1 offers 15 DTUs while Standard/S2 offers 50 DTUs. What does this mean?
Going back to my 200 user example, what option should I be going for?
Azure SQL Database Link
Thanks
EDIT
Useful page on definitions
However, what is "max sessions"? Is this the number of concurrent connections?
There are some great MSDN articles on Azure SQL Database; this one in particular is a great starting point for DTUs: http://msdn.microsoft.com/en-us/library/azure/dn741336.aspx, along with http://channel9.msdn.com/Series/Windows-Azure-Storage-SQL-Database-Tutorials/Scott-Klein-Video-02.
In short, DTUs are a way to understand the resources powering each performance level. One of the things we know from talking with Azure SQL Database customers is that they are a varied group. Some are most comfortable with the absolute details (cores, memory, IOPS), and others are after a much more summarized level of information. There is no one-size-fits-all; the DTU is meant for the latter group.
Regardless, one of the benefits of the cloud is that it's easy to start with one service tier and performance level and iterate. In Azure SQL Database specifically, you can change the performance level while your application is up. During the change there is typically less than a second of elapsed time during which DB connections are dropped. The internal workflow in our service for moving a DB from one service tier/performance level to another follows the same pattern as the workflow for failing over nodes in our data centers, and node failover happens all the time, independent of service tier changes. In other words, you shouldn't notice any difference in this regard relative to your past experience.
If DTUs aren't your thing, we also have a more detailed benchmark workload that may appeal: http://msdn.microsoft.com/en-us/library/azure/dn741327.aspx
Thanks Guy
It is really hard to tell without doing a test. By 200 users I assume you mean 200 people sitting at their computers at the same time doing stuff, not 200 users who log on twice a day. S2 allows 49 transactions per second, which sounds about right, but you need to test. Also, doing a lot of caching can't hurt.
Check out the new Elastic DB offering (Preview) announced at Build today. The pricing page has been updated with Elastic DB price information.
DTUs are based on a blended measure of CPU, memory, reads, and writes. As DTUs increase, the power offered by the performance level increases. Azure has different limits on the concurrent connections, memory, IO and CPU usage. Which tier one has to pick really depends upon
Number of concurrent users
Log rate
IO rate
CPU usage
Database size
For example, if you are designing a system where multiple users are reading and there are only a few writers, and if your application's middle tier can cache the data as much as possible so that only selective queries or application restarts hit the database, then you may not need to worry too much about IO and CPU usage.
If many users are hitting the database at the same time, you may hit the concurrent connection limit and requests will be throttled. If you can control the user requests coming to the database in your application, then this shouldn't be a problem.
Log rate: depends upon the volume of data changes (including additional data being pumped into the system). I have seen applications pumping data in steadily versus pumping it all at once. Selecting the right DTU again depends upon whether you can throttle at the application end and maintain a steady rate.
Database size: Basic, Standard, and Premium have different maximum allowed sizes, and this is another deciding factor. Features such as table compression help reduce the total size, and hence total IO.
Memory: tuning the expensive queries (joins, sorts, etc.) and enabling lock escalation / NOLOCK scans help control memory usage.
A very common mistake people make with database systems is scaling up the database instead of tuning the queries and application logic. So testing and monitoring resources and queries under different DTU limits is the best way of dealing with this; see the sketch below.
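A minimal way to do that monitoring, assuming a pyodbc connection (server, database, and credentials below are placeholders), is to poll sys.dm_db_resource_stats, which reports recent CPU, IO, log, and memory usage as percentages of the current DTU limit:

```python
# Sketch: read sys.dm_db_resource_stats (one row roughly every 15 seconds,
# values are percentages of the current DTU limit) to see which resource
# dimension you actually exhaust. Connection string values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:myserver.database.windows.net,1433;"
    "DATABASE=mydb;UID=app_user;PWD=..."
)
rows = conn.execute("""
    SELECT TOP 20 end_time, avg_cpu_percent, avg_data_io_percent,
           avg_log_write_percent, avg_memory_usage_percent
    FROM sys.dm_db_resource_stats
    ORDER BY end_time DESC;
""").fetchall()
for row in rows:
    print(row.end_time, row.avg_cpu_percent, row.avg_data_io_percent,
          row.avg_log_write_percent, row.avg_memory_usage_percent)
```

If one dimension sits near 100% while the others stay low, that dimension, rather than the headline DTU number, is what you need to tune or size for.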
If you choose the wrong DTU, don't worry: you can always scale up or down in SQL DB, and it is a completely online operation.
Also, unless you have a strong reason not to, migrate to V12 to get even better performance and features.

Azure table storage and caching

Is it worth caching data from Azure Table storage with the Azure Caching Preview?
Or is the table storage fast enough in large scale applications?
Thanks
The short answer is: it depends. In the application I am currently working on, we cache some information both to handle the latency of retrieving data from Table Storage and to accommodate the desired number of transactions per second.
We started out serving the information from Table Storage and moved to caching only when our performance requirements dictated it. I'd recommend a similar approach: make it work, then make it fast.
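If and when caching does become necessary, the usual shape is a cache-aside lookup. Here is a minimal sketch with an in-process TTL cache; fetch_from_table_storage is a stand-in for your real Table Storage query:

```python
# Minimal cache-aside sketch: consult a small in-process TTL cache before
# going to Table Storage. `fetch_from_table_storage` is a placeholder for
# your real lookup (e.g. a point query by PartitionKey/RowKey).
import time

_cache: dict[str, tuple[float, object]] = {}
TTL_SECONDS = 60

def fetch_from_table_storage(key: str) -> object:
    raise NotImplementedError("replace with your Table Storage point query")

def get(key: str) -> object:
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                      # cache hit, skip the storage call
    value = fetch_from_table_storage(key)  # cache miss: go to storage
    _cache[key] = (time.time(), value)
    return value
```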
In addition to what Robert said, you should also consider following points:
Windows Azure Table Storage allows you to store up to 100 TB of data (in chunks). At first glance, that amount of data may seem overwhelming. However, Table Storage can be partitioned, and each partition can be moved to a separate server by the Azure controller, thereby reducing the load on any single server and improving performance.
If your application is under very high load, a cache with frequent inserts will approach the maximum cache size very quickly, and the cache item eviction process will start. In most cases, frequent cache inserts combined with frequent evictions end up degrading performance instead of improving it. You would then need to increase the maximum cache size, which in turn will affect your application cost (and sometimes this might be a blocker).
Last but not least, you can access Windows Azure Table Storage data using the OData protocol and LINQ queries with WCF Data Service .NET Libraries; you do not have that ability with Azure Cache.
Please bear in mind that those points may or may not apply in your case; it all depends on your system architecture, expected load, etc.
I hope my answer will help you in making good system architecture decisions.

Resources