Azure Search scalability

We are developing a mobile app that should scale to thousands of users, and we are using Azure Search as our main storage. According to the Azure pricing model, the query limits are set to 15 queries per second per unit on the standard plan. With these limits, a system that must scale to thousands of concurrent users would hit them pretty quickly.
Is Azure Search the wrong option in our situation, when scaling to thousands of concurrent users?
Would DocumentDB be a better option?
Thanks!

Interesting that you're using Azure Search as your primary storage, as it's not built to be a database engine. Its storage is specifically for search content (the typical pattern is to use Azure Search in conjunction with a database engine, such as SQL Database or DocumentDB), using search results to point back to the "system of record" content in your database.
The scale for Search is specifically for full-text-search queries your users will generate. And Azure Search scales per unit, with each unit offering 15 searches / second. So, you can scale far beyond 15/sec if you buy more search units.
However: Don't confuse this with database engine queries. You asked about DocumentDB, so using that as an example: You can query far beyond 15/second with that database engine, and that scales independently. Same goes for any VM-based database solution, SQL Database, etc - they all can scale.
This really comes down to whether you need full-text search at high volume. If so, great: just scale Azure Search to the number of units you need to handle your request traffic. If you can serve more of your queries directly from the database, without driving your requests through Azure Search, then you don't need to scale out as much, and you can take advantage of the native database query capabilities.
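To make the pattern concrete, here's a minimal sketch of "search in Azure Search, fetch from the system of record," assuming the azure-search-documents and azure-cosmos Python SDKs; the endpoints, keys, index/container names, and the "id" field are all placeholders, not anything prescribed by the services:

    # Sketch: full-text search via Azure Search, record retrieval via DocumentDB.
    from azure.core.credentials import AzureKeyCredential
    from azure.cosmos import CosmosClient
    from azure.search.documents import SearchClient

    search_client = SearchClient(
        endpoint="https://<your-service>.search.windows.net",
        index_name="products",
        credential=AzureKeyCredential("<search-api-key>"),
    )
    cosmos = CosmosClient("https://<your-account>.documents.azure.com", "<cosmos-key>")
    container = cosmos.get_database_client("catalog").get_container_client("products")

    # The full-text query consumes Azure Search QPS (the 15/sec/unit figure)...
    results = search_client.search(search_text="wireless headphones", top=10)

    # ...but the hits only point back at the system of record, whose query
    # capacity scales independently of your search units.
    for hit in results:
        print(container.read_item(item=hit["id"], partition_key=hit["id"]))

Only the full-text queries count against search units here; simple lookups and filters can go straight to the database.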

One thing to add to David's excellent answer: if your scenario is primarily search-driven, you don't need to store data for purposes other than search, and you are OK with eventual consistency, then using Azure Search as the primary store may be fine.
Also, the 15 requests per second query throughput of Azure Search is just a ballpark; it's neither a hard limit nor a promise. Depending on your data and query complexity, the actual throughput can be significantly (many times) higher or lower.

Related

Where are indices stored behind the Azure Cognitive Search service?

Please check this tweet chain. I am working on a PoC using Azure Cognitive Search and I am comparing it with AWS. AWS seems to be using MongoDB Atlas to store the indices, and the search function is basically built on Mongo's default search capability, which in turn is built on Apache Lucene, the search engine that serves queries against the index. I am trying to find out how the inverted indices are stored behind the scenes in Azure Cognitive Search.
Disclaimer
This answer should be considered accurate only as of July 2020, because implementation details do change. This information isn't material to which service is "better" for any particular purpose; just interesting for the sake of curiosity.
Also, do not take my answer to be any kind of API contract or promise of future functionality or performance. We encapsulate the storage details so that you don't have to worry about them, and also so that we have the freedom to change them if needed.
Answer
Azure Cognitive Search uses Apache Lucene under the hood, which manages the inverted indexes. As of the time of this writing, those indexes are stored on Azure virtual machine disks, which are backed by page blobs. The exact SKU of disks used depends on pricing tier and other factors; I won't get into the details here (because they do change). Those disks are attached to Azure virtual machines, which for pricing tiers other than Free map to the "search units" you pay for.

Maintenance required for Azure SQL DB in the long term

What is the maintenance required from an organization when deploying an Azure SQL Database in the long term?
My current organization is hoping to do as little database management as possible and has looked for products that fully manage our databases without much intervention needed on our end. Some products being considered include Snowflake (for its automated partitioning of tables) and Domo (for its data warehousing, connectors, and BI tool offerings).
I'm leaning towards using Azure SQL DB for multiple reasons (products offered, transparent pricing, integration ease, available documentation, SSO, etc.), but want to first understand the skills needed and ease in maintaining it in the long run.
Will we have to manually rebuild indexes and partition out tables as we scale up? Or is Azure intelligent enough that it'll do most of the heavy lifting of performance optimization itself?
Does Azure or other vendors provide services to optimize a DB?
Sorry for the vague prompts, but any additional considerations in choosing DB vendors would be great. Thanks!
To answer your questions, you should first know what Azure SQL Database is and what its capabilities are.
I'm leaning towards using Azure SQL DB for multiple reasons (products offered, transparent pricing, integration ease, available documentation, SSO, etc.), but want to first understand the skills needed and ease in maintaining it in the long run.
The document What is Azure SQL Database service covers almost everything you want to know. SQL Database is a general-purpose relational database managed service in Microsoft Azure that supports structures such as relational data, JSON, spatial, and XML. SQL Database delivers dynamically scalable performance within two different purchasing models: a vCore-based purchasing model and a DTU-based purchasing model. SQL Database also provides options such as columnstore indexes for extreme analytic analysis and reporting, and in-memory OLTP for extreme transactional processing. Microsoft handles all patching and updating of the SQL code base seamlessly and abstracts away all management of the underlying infrastructure.
Will we have to manually rebuild indexes and partition out tables as we scale up? Or is Azure intelligent enough that it'll do most of the heavy lifting of performance optimization itself?
No, you won't. Scalability is one of the most important characteristics of PaaS: it enables you to dynamically add more resources to your service when needed. Azure SQL Database lets you easily change the resources (CPU power, memory, IO throughput, and storage) allocated to your databases.
You can mitigate performance issues due to increased usage of your application that cannot be fixed using indexing or query rewrite methods. Adding more resources enables you to quickly react when your database hits the current resource limits and needs more power to handle the incoming workload. Azure SQL Database also enables you to scale-down the resources when they are not needed to lower the cost.
For more details, please reference: Scale Up/Down.
Does Azure or other vendors provide services to optimize a DB?
As Woblli said, Azure SQL Database provides monitoring and tuning for you.
As a complement, you can also use Azure SQL Database automatic tuning to help you optimize the database automatically.
Hope this helps.
Azure SQL DB offers the services you're asking about.
You can enable automatic tuning, which will create and drop indexes based on measured performance gains, and force known-good query plans, again based on performance. It will roll back a change if that specific change has decreased the overall database performance level.
It will not partition or shard your database for you however.
Official documentation:
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning
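As a rough illustration of flipping these knobs yourself, here is a sketch that enables the documented automatic tuning options over a plain connection; the server, database, and credentials are placeholders, and it assumes the pyodbc driver:

    # Enable Azure SQL Database automatic tuning options via T-SQL.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<your-server>.database.windows.net;"
        "DATABASE=<your-db>;UID=<user>;PWD=<password>"
    )
    conn.autocommit = True  # ALTER DATABASE can't run inside a transaction

    # FORCE_LAST_GOOD_PLAN pins regressed query plans; CREATE_INDEX and
    # DROP_INDEX let the service manage indexes for you, as described above.
    conn.execute(
        "ALTER DATABASE CURRENT SET AUTOMATIC_TUNING "
        "(FORCE_LAST_GOOD_PLAN = ON, CREATE_INDEX = ON, DROP_INDEX = ON)"
    )
    conn.close()

The same options can also be toggled in the Azure portal without any code.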

Azure Search performance while uploading content

Lately I'm facing some performance issues while querying my Azure Search service index, and I'm trying to figure out what is happening. I came across the following article:
Azure Search performance and optimization considerations
It says:
Uploading of content to Azure Search will impact the overall performance and latency of the Azure Search service. If you expect to send data while users are performing searches, it is important to take this workload into account in your tests.
I want to clarify something. If, for example, I have two indexes in my search service account, let's say index-a and index-b:
If I upload content to index-a, will it impact the overall performance and latency of index-b?
If both indexes are within the same service, then yes, one index will have its performance affected by the other. How much it is affected will depend on the service tier and on the amount of information you are indexing.

Sync mechanism to Azure Search - how reliable is Azure Search insertion?

How reliable is the insertion mechanism to Azure Search?
Say, on average, a call to upload a document to Azure Search: are there any SLAs on this? What is the average insertion time for one document, and the average failure rate for one document?
I'm trying to send data from my database to Azure Search, and I was wondering whether it is more reliable to send data directly to Azure Search, or to do a dual write, for example to a highly available queue like Kafka, and read from there.
From the SLA for Azure Search:
We guarantee at least 99.9% availability for index query requests when an Azure Search Service Instance is configured with two or more replicas, and index update requests when an Azure Search Service Instance is configured with three or more replicas. No SLA is provided for the Free tier.
Your client code needs to follow the best practices: batch indexing requests, retry on transient failures with an exponential back-off policy, and scale the service appropriately based on the size of the documents and the indexing load.
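Here's a minimal sketch of those first two practices (batching plus exponential back-off), assuming the azure-search-documents Python SDK; the endpoint, index name, and key are placeholders:

    import time

    from azure.core.credentials import AzureKeyCredential
    from azure.core.exceptions import HttpResponseError
    from azure.search.documents import SearchClient

    client = SearchClient(
        endpoint="https://<your-service>.search.windows.net",
        index_name="<your-index>",
        credential=AzureKeyCredential("<api-key>"),
    )

    def upload_with_backoff(docs, batch_size=1000, max_retries=5):
        """Upload documents in batches; retry each batch with exponential back-off."""
        for start in range(0, len(docs), batch_size):
            batch = docs[start:start + batch_size]
            for attempt in range(max_retries):
                try:
                    client.upload_documents(documents=batch)
                    break  # batch succeeded, move on
                except HttpResponseError:
                    if attempt == max_retries - 1:
                        raise  # give up after max_retries attempts
                    time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...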
Whether or not to use an intermediate buffer depends not so much on the SLA, but on how spiky your indexing load will be, and how decoupled you want your search indexing component to be.
You may also find Capacity planning for Azure Search useful.

Microsoft Azure DocumentDB vs Azure Table Storage

For several years now, Microsoft has offered a "NoSQL" key/value store called Table Storage (http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-tables/).
Table Storage offers high performance, scalability (via partitioning), and relatively low cost. A primary drawback of Tables is that only the Partition and Row keys are indexed, so queries on values are very inefficient.
Recently Microsoft announced a new "NoSQL" service, called "DocumentDB" (http://azure.microsoft.com/en-us/documentation/services/documentdb/)
Instead of storing a list of properties (as Tables do), DocumentDB stores JSON objects. The whole object is indexed, so efficient queries can be created on any property or nested property of the stored objects.
Microsoft says that DocumentDB provides high performance and scalability as well.
If that's so, why would anyone use Table Storage over DocumentDB? It sounds like DocumentDB provides the same functionality as Tables, but with additional capabilities such as the ability to index everything.
I would be glad if someone could compare DocumentDB and Table Storage, highlighting the cons and pros of each.
Both are NoSQL technologies, but they are massively different. Azure Tables is a simple key/value store and does not support complex functionality: no complex queries (most of them will require a full partition/table scan anyway, which will kill your performance and your cost savings), no custom indexing (indexing is based on PartitionKey and RowKey only; you currently can't index any other entity property, and searching for anything other than a PartitionKey/RowKey combination will require a partition/table scan), and no stored procedures. You also can't batch read requests for multiple entities (though batch write requests are supported if all the entities belong to the same partition). For a real-life application of Azure Tables, see HERE.
If your data needs (particularly around querying) are simple, as in the example above, then Azure Tables provides what you need, and you might end up using it in favor of DocDB due to pricing, performance, and storage capacity. For example, the Azure Tables performance target is 20,000 operations per second; trying to get that same level of performance on DocDB will come at a significantly higher service cost. Also, Azure Tables is limited by the capacity of your Azure storage account (500 TB), whereas DocDB storage is limited by the capacity units you buy.
Table service is mainly a key-value type of NoSQL store, and DocumentDB is (as the name suggests) a document-type NoSQL store. What you are asking is essentially the difference between these two NoSQL approaches; if you shape your research accordingly, you should be able to get a better understanding.
Just to keep things simple, I suggest you consider the differences in how DocumentDB and Table service are priced. Not only do the costs of these services vary a lot from each other, but the fact that DocumentDB works on a "provision first" model while Table service is offered on pure consumption-based pricing might give you some clues for your compare/contrast.
Let me ask you this: why would I use DocumentDB if the features in Table service serve my needs well? ;) I suggest you take a look at how the current Azure Diagnostics tooling uses Azure Storage services, and how Storage Metrics uses Azure Storage itself, to get a sense of how useful Table service would be and how overkill DocumentDB might be in some situations.
Hope this helps.
I think the comparison is all about trading price for performance. Table service is just part of Storage, which seems to cap out at 20,000 ops/second; paying for that kind of throughput all the time on DocumentDB (Storage gives it to us all the time) is $1,200/month. Crazy money.
Table services have simple indexes, so queries are very limited. Good for anything that is written and read via IDs. DocumentDB indexes the entire document, so a query can be done on any property.
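For instance, a rough sketch of such a query with the azure-cosmos Python SDK (the account, database, container, and nested address.city property are all hypothetical):

    from azure.cosmos import CosmosClient

    container = (
        CosmosClient("https://<your-account>.documents.azure.com", "<key>")
        .get_database_client("appdb")
        .get_container_client("users")
    )

    # Filter on a nested property -- possible because the whole document
    # is indexed, with no hand-managed secondary index.
    for user in container.query_items(
        query="SELECT * FROM c WHERE c.address.city = @city",
        parameters=[{"name": "@city", "value": "Seattle"}],
        enable_cross_partition_query=True,
    ):
        print(user["id"])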
And lastly, Table service is bound by the storage constraint of the Storage account it's on (which could get crazy high given direct negotiation with Microsoft), whereas DocumentDB storage seems unlimited.
So it's a balance. Do you have a LOT of data (hundreds of gigs, or terabytes) that you need in one place? DocumentDB. Do you need to support complex queries? DocumentDB. Do you have data that needs to come and go fast, but based on a 1-to-2 property lookup? Table services. Would you trade having to code around a simple index in order to avoid paying through the nose for throughput? Table services.
And Redis, someone mentioned that... man, I dunno. Even the existence of persistence in a caching framework (which Redis offers) doesn't turn it into a tech of choice... There is a huge difference between a persistent store that holds data that is "often used, but may be missing or time-retired", like a cache would, and a persistent store that guarantees your data to be there.
A real life example:
I have to store some tokens, retrieve them, and delete them. The only query ever done is based on the user ID.
So I use Table Storage, as it fulfills my requirements perfectly. I save the token against the user ID.
Document DB seemed to be overkill for this.
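For what it's worth, that token scenario is only a few lines with the azure-data-tables Python SDK; the connection string, table name, and entity fields below are just placeholders:

    from azure.data.tables import TableClient

    table = TableClient.from_connection_string("<connection-string>", table_name="tokens")

    # Save the token against the user ID; PartitionKey + RowKey is the
    # one combination Table Storage indexes natively.
    table.upsert_entity({
        "PartitionKey": "user-123",   # user ID
        "RowKey": "refresh-token",    # token kind
        "value": "<opaque-token>",
    })

    # Retrieve: a direct point lookup, no scan.
    entity = table.get_entity(partition_key="user-123", row_key="refresh-token")

    # Delete when done.
    table.delete_entity(partition_key="user-123", row_key="refresh-token")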
Here is the answer from Microsoft's official docs.
Common attributes of Cosmos DB, Azure Table Storage, and Azure SQL Database:
99.99% availability SLA
Fully managed database services
ISO 27001, HIPAA and EU Model Clauses Compliant
The following table shows the attributes that differ between Azure Cosmos DB and Azure Table Storage:
