Vitess sequence performance for multi-region setup

I am planning to use a geo-partitioned, multi-region Vitess setup, and I have performance concerns about the tables that would use the sequence feature for auto-incremented IDs.
My app will have 4 regions (Europe, North America, South Asia and East Asia), and I am planning to shard the users table by country, which means I need a user_sequence table in an unsharded keyspace.
Does this mean that each insert will have to make a round trip to the unsharded keyspace in order to fetch the cached IDs? The docs are a bit vague IMO:
In between those writes only the in-memory structures within the primary vttablet serving the unsharded keyspace where the backing table lives are updated, making the QPS only limited by the RPC latency between the vtgates and the serving vttablet for the sequence table.
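
One way to read the quoted sentence is with a toy model. The sketch below is not Vitess code: `SequenceTablet`, `cache_size` and the two counters are hypothetical names. It assumes that the primary vttablet reserves a block of `cache_size` IDs with a single write to the backing table and then serves IDs from memory, while every request for IDs is still one RPC to that tablet.

```python
class SequenceTablet:
    """Toy stand-in for the primary vttablet serving the sequence table."""

    def __init__(self, cache_size, start=1):
        self.cache_size = cache_size
        self.next_id = start          # next ID to hand out from memory
        self.reserved_until = start   # first ID not yet reserved in the table
        self.table_writes = 0         # writes to the backing table
        self.rpcs = 0                 # requests received from vtgates

    def next_values(self, n=1):
        """Return n consecutive IDs; models one RPC from a vtgate."""
        self.rpcs += 1
        # Reserve further blocks only when the in-memory range runs out.
        while self.next_id + n > self.reserved_until:
            self.reserved_until += self.cache_size
            self.table_writes += 1
        ids = list(range(self.next_id, self.next_id + n))
        self.next_id += n
        return ids


tablet = SequenceTablet(cache_size=1000)
ids = [tablet.next_values()[0] for _ in range(2500)]
print(tablet.rpcs, tablet.table_writes)  # 2500 RPCs, 3 table writes
```

Under these assumptions the cache removes most writes to the backing table but not the round trips: each single-row insert still pays the RPC latency to wherever the unsharded keyspace's primary lives. In this model, asking for `n` values in one call costs one RPC, so batching rows per insert is one way to amortize a cross-region hop.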

Related

Azure SQL - GEO-REPLICATION : Data loss?

I have an Azure SQL database in WEST US with geo-replication enabled to sync with EAST US, and I want to know:
How often does the geo-replication sync run to keep EAST US up to date?
If WEST US suffers a regional failure and I fail over to EAST US, would there be any data loss?

Update:
Automated backups, according to this documentation: Both SQL Database and SQL Managed Instance use SQL Server technology to create full backups every week, differential backups every 12-24 hours, and transaction log backups every 5 to 10 minutes. The frequency of transaction log backups is based on the compute size and the amount of database activity.
According to this documentation, if an outage is detected, Azure waits for the period you specified by GracePeriodWithDataLossHours. The default value is 1 hour. If you cannot afford data loss, make sure to set GracePeriodWithDataLossHours to a sufficiently large number, such as 24 hours. Use manual group failover to fail back from the secondary to the primary.
According to this answer, the grace period is there to allow time for the database to fail over within the primary region.

Azure SQL - would Geo-replication cause any performance impact?

I have an Azure SQL database in WEST US and I want to set up a failover group with EAST US.
Would Azure SQL geo-replication or a failover group cause any performance impact? If so, what would the impact be?

Regarding the impact in case of failure, there are two scenarios: planned failover and unplanned failover.
For a planned failover, your primary database (WEST US) first synchronizes with the secondary database (EAST US), and then the EAST US database becomes the primary. This prevents data loss.
For an unplanned failover, the secondary database (EAST US) immediately takes over as the primary. Data loss might happen, depending on when the last synchronization took place.
There will be a performance impact in both cases: latency will increase. Microsoft has defined some best practices to minimize this impact.
Refer: https://learn.microsoft.com/en-us/azure/azure-sql/database/auto-failover-group-overview?tabs=azure-powershell#failover-groups-and-network-security

How do I achieve data span across multiple regions (not replication) with Azure SQL

I have a single Azure SQL Server with a single database in it. I want a solution that stores specific records of selected tables in this database in different regions.
As an example, I have a users table with all the PII data in it. These users can be from anywhere in the world, but I want the records of users from the EU region to be stored only in the EU region.
In addition, I want all the other table records related to a specific user to be stored in that user's region as well.
From the application's perspective, I should still be able to query across all users and all related tables to build dashboard data for global users.
Any pointers to solve this scenario would be helpful.

Another approach could be sharding the database. Use horizontal sharding to store the rows for each country/region in a separate database located in that country/region. The Elastic Database Client library uses a shard map to do most of the sharding work for you (assuming you are using .NET). You can use the country code in your shard map to split regional data.
Reference Architecture: https://learn.microsoft.com/en-us/azure/architecture/patterns/sharding
Elastic Database Client: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-elastic-database-client-library
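
To make the shard map idea concrete, here is a minimal sketch in Python. It is for illustration only and is not the Elastic Database Client API (which is .NET); `SHARD_MAP`, `SHARD_DSN`, the shard names and the connection strings are all made-up placeholders.

```python
# Hypothetical list-style shard map: country code -> regional shard.
SHARD_MAP = {
    "DE": "eu", "FR": "eu", "NL": "eu",
    "US": "us", "CA": "us",
    "IN": "asia",
}

# Hypothetical connection strings, one database per shard.
SHARD_DSN = {
    "eu": "Server=eu-sql.example.invalid;Database=users_eu",
    "us": "Server=us-sql.example.invalid;Database=users_us",
    "asia": "Server=asia-sql.example.invalid;Database=users_asia",
}


def shard_for_country(country_code):
    """Return the shard that holds rows for the given country code."""
    code = country_code.upper()
    if code not in SHARD_MAP:
        raise KeyError(f"no shard mapped for country {code!r}")
    return SHARD_MAP[code]


def dsn_for_country(country_code):
    """Return the connection string for a user's country."""
    return SHARD_DSN[shard_for_country(country_code)]


def all_shards():
    """Every shard, for fan-out queries such as a global dashboard."""
    return sorted(set(SHARD_MAP.values()))
```

A single-user read or write goes to `dsn_for_country(...)`, while a global dashboard query would run against every entry of `all_shards()` and merge the results in the application.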

Here is one approach... When your user/tenant registers for your service they will need to pick where their data should reside. This is referred to as data residency. Then on subsequent requests to read or write data your application's repository layer needs to be aware of who the request is executing as so it can lookup the appropriate connection string and connect to that database to retrieve/write the data.
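
A minimal sketch of that lookup, assuming a routing table keyed by user ID. `Request`, `RESIDENCY` and `CONNECTION_STRINGS` are hypothetical names, and the connection strings are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_id: str
    client_region: str  # where the request originates; not used for routing


# Routing data: no PII, only user ID -> residency chosen at registration.
RESIDENCY = {"user-42": "france", "user-77": "us"}

# Hypothetical per-residency connection strings.
CONNECTION_STRINGS = {
    "france": "Server=fr-sql.example.invalid;Database=app",
    "us": "Server=us-sql.example.invalid;Database=app",
}


def connection_for(request):
    """Pick the database by the user's chosen residency, not by location."""
    residency = RESIDENCY[request.user_id]
    return CONNECTION_STRINGS[residency]


# A user whose data resides in France sends a request from the US.
travelling = Request(user_id="user-42", client_region="us")
print(connection_for(travelling))
```

Because the lookup ignores `client_region`, the user's data stays in the region they picked no matter where the request comes from.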
The routing data can be replicated to multiple regions and/or housed in a single location as it would not contain PII. The Azure Web App can be single region hosted (as depicted in the image below) or it can be replicated to multiple regions and traffic routed to it via a global traffic manager.
This approach supports the case where a European user chooses to have their data reside in France but happens to be visiting the United States.
This picture shows how this might look. A guy named Barry Luijbregts has a nice Pluralsight video that delves into this approach. https://www.pluralsight.com/courses/azure-paas-building-global-app
Good luck!

MarkLogic Cluster - Configure Forest with all documents

We are working on MarkLogic 9.0.8.2.
We are setting up a MarkLogic cluster (3 VMs) on Azure and, as per our failover design, want to have 3 forests (one per node) in Azure Blob storage.
I am done with the setup, and when I started ingestion I found that documents are distributed across the 3 forests rather than all being stored in each forest.
For example, I ingested 30000 records and each forest contains 10000 records.
What I need is for every forest to contain all 30000 records.
Is there any configuration (at the database or forest level) I need to achieve this?

MarkLogic failover does not work the same way as in some other NoSQL document databases, which may keep a copy of every document on each host.
The clustered nature of MarkLogic distributes the documents across the hosts to provide a balance of availability and resource consumption. For failover protection, you must create additional forests on each host and attach them to your existing forests as replicas. This ensures availability should any 1 of the 3 hosts fail.
Here is a sample forest layout:
Host 1: primary_forest_01 replica_forest_03
Host 2: primary_forest_02 replica_forest_01
Host 3: primary_forest_03 replica_forest_02
The replica forest must be on a different host than the primary forest, and if there are multiple forests per host, they should be striped across hosts to best balance out resource consumption when failed over.
It's also important to note that for HA, you need replicas configured for the system databases as well.
So there is no database setting to put all the documents on every host, because that is not the way MarkLogic is designed to work. The Scalability, Availability and Failover Guide is very informative, and in this case, the High Availability of Data Nodes with Failover section is particularly relevant. I also highly recommend checking out the free training that MarkLogic offers.
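
The striping rule from the sample forest layout above can be sketched as a small planning helper. This is Python that only computes names; it does not call any MarkLogic API, and `striped_layout` is a hypothetical function. It assumes one primary forest per host, with each replica placed on the next host in the list.

```python
def striped_layout(hosts):
    """Plan one primary forest per host, with its replica on the next host."""
    n = len(hosts)
    if n < 2:
        raise ValueError("a replica needs a different host than its primary")
    layout = {host: [] for host in hosts}
    # First pass: one primary forest per host.
    for i, host in enumerate(hosts):
        layout[host].append(f"primary_forest_{i + 1:02d}")
    # Second pass: the replica of forest i lives on the following host.
    for i in range(n):
        replica_host = hosts[(i + 1) % n]
        layout[replica_host].append(f"replica_forest_{i + 1:02d}")
    return layout


for host, forests in striped_layout(["Host 1", "Host 2", "Host 3"]).items():
    print(f"{host}: {' '.join(forests)}")
```

For three hosts this reproduces the layout shown above, and no replica ever shares a host with its own primary.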

Improve CPU Utilization by Restructuring Nodes

We have a database located in the North Europe region, with 2 App Service nodes on Azure (West Europe and North Europe). We use Traffic Manager to route traffic.
Our SQL database and storage are located in North Europe.
When we started the website, the European locations were the closest to our customers.
However, we have seen a shift, and most of our customers are now from the USA.
We have high CPU utilization on our processors even though we have a lot of instances on each node.
The question is:
Since most of our customers are from the USA and it is hard to relocate the database, is it better to keep the app structure as it is (N. Europe and W. Europe), or to create a new node in the USA, even though that node will still need to communicate with the database in North Europe?
Thank you

Having your app in a US region and your database in Europe is not recommended.
These are a few of the things you will run into:
1) High latency, since queries for data will have to round-trip to Europe.
2) Higher resource utilization: in general, each request that accesses the DB will take longer, which increases memory usage while requests wait on data and makes the impact of load on the app a lot more severe.
3) Cross-region data egress: you will need to pay for all the data moving from Europe to the US every time there is a query.
A better solution would be to do the following:
1) Set up a new DB in a US region and hook up active geo-replication.
At this point you will have a hot/cold configuration where any instance can be used to read data from the DB, but only the primary instance can be used for write operations.
2) Create a new version of the App/App Service plan in the US region.
3) Adapt your code to understand your geo-distributed topology.
Your app should be able to send all reads to the "closest" region and all writes to the primary database.
4) Deploy the code to all regions.
5) Add the new region to the TM profile.
While this is not ideal, since write operations might still have to jump the pond, most apps have a read/write pattern that is heavily skewed towards read operations (roughly 85% reads / 15% writes), so this solution works out, with the added benefit of giving you HA in case one of the regions goes down.
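
A rough sketch of the routing rule from step 3, plus the arithmetic behind the read/write argument. `GeoRouter`, the region names and every latency figure are hypothetical and only there to make the numbers concrete.

```python
# Hypothetical round-trip latencies in ms: (app region, database region).
LATENCY_MS = {
    ("us", "us"): 5, ("us", "eu"): 90,
    ("eu", "eu"): 5, ("eu", "us"): 90,
}


class GeoRouter:
    """Send writes to the primary and reads to the closest database."""

    def __init__(self, primary, replicas, latency_ms):
        self.primary = primary
        self.replicas = list(replicas)
        self.latency_ms = latency_ms

    def target(self, app_region, is_write):
        if is_write:
            return self.primary  # only the primary accepts writes
        candidates = [self.primary, *self.replicas]
        return min(candidates, key=lambda db: self.latency_ms[(app_region, db)])


def blended_latency_ms(read_pct, read_ms, write_ms):
    """Average DB latency for a workload with read_pct percent reads."""
    return (read_pct * read_ms + (100 - read_pct) * write_ms) / 100


router = GeoRouter(primary="eu", replicas=["us"], latency_ms=LATENCY_MS)
print(router.target("us", is_write=False))  # us
print(router.target("us", is_write=True))   # eu
print(blended_latency_ms(85, 5, 90))        # 17.75
```

With these made-up numbers, a US app instance averages 17.75 ms per database call instead of 90 ms when every call goes to Europe, even though each write still crosses the Atlantic.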
You might want to look at this talk, where I go over how to set up a geo-distributed app using App Service, SQL Azure and the technique outlined above.

Have you considered sharding your data based on the location of your users? It will be better in terms of performance, and you can perform maintenance during the off-peak hours of each region. Allow me to recommend this article.
