Where are indices stored behind the Azure Cognitive Search service?

Please check this tweet chain. I am working on a PoC using Azure Cognitive Search and comparing it with AWS. AWS seems to use MongoDB Atlas to store the indices, and its search function is basically Mongo's default search capability, which is built on Apache Lucene. I am trying to find out how the inverted indices are stored behind the scenes in Azure Cognitive Search. I know it also uses Apache Lucene as the engine that searches the index.

Disclaimer
This answer should be considered accurate only as of July 2020, because implementation details do change. This information isn't material to which service is "better" for any particular purpose; just interesting for the sake of curiosity.
Also, do not take my answer to be any kind of API contract or promise of future functionality or performance. We encapsulate the storage details so that you don't have to worry about them, and also so that we have the freedom to change them if needed.
Answer
Azure Cognitive Search uses Apache Lucene under the hood, which manages the inverted indexes. As of the time of this writing, those indexes are stored on Azure virtual machine disks, which are backed by page blobs. The exact SKU of disks used depends on pricing tier and other factors; I won't get into the details here (because they do change). Those disks are attached to Azure virtual machines, which for pricing tiers other than Free map to the "search units" you pay for.
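To make "inverted index" concrete: it maps each term to the set of documents that contain it, so a query only reads the postings for its own terms instead of scanning every document. The Python below is a toy, in-memory sketch of that idea only. It says nothing about how Lucene actually lays out segments on disk, and the helper names (`tokenize`, `build_inverted_index`, `search_all`) are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Very rough analyzer: lowercase, then split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of document IDs containing it (its postings).
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    # AND semantics: intersect the postings of every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

For example, with documents "Azure Cognitive Search uses Lucene" (id "1") and "Lucene stores inverted indexes on disk" (id "2"), the query "lucene disk" touches only two postings lists and returns {"2"}.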

Related

Azure Split/Merge Service, is it still relevant?

I have managed to get the C# code and the database set up using ListMappings. However, when I try to deploy the Split/Merge tool to Azure as a classic cloud service, it states: 'The requested VM tier is currently not available in East US for this subscription. Please try another tier or deploy to a different location.' We tried a few other regions with the same result. Do you know of a workaround or an updated version? Is the Split/Merge service even still relevant? Has anyone gotten this service to run on Azure lately?
https://learn.microsoft.com/en-us/azure/azure-sql/database/elastic-scale-overview-split-and-merge
In my opinion, the answer to whether it is still relevant is... no. Split/Merge is no longer relevant now that elastic pools have matured. Elastic pools with one database per tenant seem to be the sustainable way to implement multi-tenancy with legacy code. Our initial plan was to add tenant keys to each of our tables so that one database could hold multiple tenants. Elastic pools give us the same flexibility without having to make breaking changes to our existing code.
Late post here, but we are implementing ElasticScale for a client to split ~50 clients into a database-per-tenant model. I don't think the SplitMerge tool will be used over the long term, just for the initial data migration from one db to many shards, but it has been handy for that purpose. We are using the ElasticScale SDK to allow a single API to route queries to the appropriate shard(s) based on sharding key. Happy to compare notes with you if you are still working on this.
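The routing described above (one API picking the right shard from the sharding key) can be pictured as a list mapping from tenant key to shard. This is a hypothetical, in-memory Python stand-in: the real Elastic Database client library used from C# in the question has its own types (for example `ListShardMap` in .NET), and the class name `TenantShardMap` and the shard names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class TenantShardMap:
    """Hypothetical in-memory list mapping: each tenant key points at one shard."""
    mappings: dict[int, str] = field(default_factory=dict)

    def add_mapping(self, tenant_id: int, shard: str) -> None:
        # A key maps to exactly one shard; moving a tenant would also mean moving its data.
        if tenant_id in self.mappings:
            raise ValueError(f"tenant {tenant_id} is already mapped")
        self.mappings[tenant_id] = shard

    def shard_for(self, tenant_id: int) -> str:
        # Data-dependent routing: the sharding key alone selects the database.
        try:
            return self.mappings[tenant_id]
        except KeyError:
            raise LookupError(f"no shard mapped for tenant {tenant_id}") from None

    def shards_for(self, tenant_ids: Iterable[int]) -> set[str]:
        # A query spanning several tenants fans out to each distinct shard once.
        return {self.shard_for(t) for t in tenant_ids}
```

Each request would then open its connection against `shard_for(tenant_id)`; in the .NET library, a call such as `OpenConnectionForKey` on the shard map plays that role.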

Which other Azure services do I need to take into account when using Azure Spatial Anchors?

I would like to use Azure Spatial Anchors and understand that it is free during the preview period.
Are there other Azure services like storage, bandwidth, etc. involved which I would need to take into consideration regarding pricing when working with Azure Spatial Anchors?
Does anyone have a rough estimate of how much a "typical" Azure Spatial Anchors project costs? I know this is a broad question, but if someone could give me a rough estimate from their experience I'd be more than happy!
While in preview, Azure Spatial Anchors is free (pricing page).
Depending on your project, you may need to use other cloud services, which may or may not also be free. For example, you may use Cosmos DB to store Azure Spatial Anchor IDs, as outlined in the ASA sharing sample. But Cosmos DB has a free tier, so, depending on the scale of your project, that may be free as well.

Azure Search performance while uploading content

Lately I'm facing some performance issues when querying my Azure Search service index, and I'm trying to figure out what is happening. I came across the following article:
Azure Search performance and optimization considerations
It says:
Uploading of content to Azure Search will impact the overall performance and latency of the Azure Search service. If you expect to send data while users are performing searches, it is important to take this workload into account in your tests.
I want to clarify something. Suppose I have two indexes on my search service, say index-a and index-b.
If I upload content to index-a, will it impact the overall performance and latency of index-b?
If both indexes are within the same service, then yes, one index will have its performance affected by the other one. How much it's affected will depend on the service tier and the amount of information you are indexing.

Azure Search scalability

We are developing a mobile app that should scale to thousands of users, and we are using Azure Search as our main storage. According to the Azure pricing model, the query limit is 15 queries per second per unit for the Standard plan. With these limits, a system that should scale to thousands of concurrent users would hit them pretty quickly.
In our situation is Azure Search not the right option when scaling for thousands of concurrent users?
Would DocumentDB be a better option?
Thanks!
Interesting that you're using Azure Search as your primary storage, as it's not built to be a database engine. Its storage is specifically for search content. The typical pattern is to use Azure Search in conjunction with a database engine, such as SQL Database or DocumentDB, and to use search results to point back to the "system of record" content in your database.
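A minimal sketch of that pattern, under stated assumptions: two Python dicts stand in for the search index (keys plus searchable text only) and for the system-of-record database. Neither is an Azure API, and the names `DATABASE`, `SEARCH_INDEX`, and `search_then_fetch` are hypothetical. The point is the flow: full-text search returns keys, and authoritative fields such as stock come from the database.

```python
from dataclasses import dataclass


@dataclass
class Product:
    id: str
    name: str
    price_cents: int
    stock: int


# Hypothetical system of record (stands in for SQL Database or DocumentDB).
DATABASE: dict[str, Product] = {
    "p1": Product("p1", "Blue running shoes", 7999, 12),
    "p2": Product("p2", "Red running jacket", 12999, 0),
    "p3": Product("p3", "Blue rain jacket", 9999, 5),
}

# Hypothetical search index: only the document key and its searchable text.
SEARCH_INDEX: dict[str, str] = {key: p.name.lower() for key, p in DATABASE.items()}


def search_ids(query: str) -> list[str]:
    """Return keys of indexed documents containing every query term."""
    terms = query.lower().split()
    return [
        key for key, text in SEARCH_INDEX.items()
        if all(term in text.split() for term in terms)
    ]


def search_then_fetch(query: str) -> list[Product]:
    """Search for keys, then load the authoritative records from the database."""
    # Skip keys the database no longer has (the index may lag behind deletes).
    return [DATABASE[key] for key in search_ids(query) if key in DATABASE]
```

Here `search_then_fetch("blue")` returns p1 and p3, and their stock values are read from `DATABASE` at query time rather than from the index.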
The scale for Search is specifically for full-text-search queries your users will generate. And Azure Search scales per unit, with each unit offering 15 searches / second. So, you can scale far beyond 15/sec if you buy more search units.
However: Don't confuse this with database engine queries. You asked about DocumentDB, so using that as an example: You can query far beyond 15/second with that database engine, and that scales independently. Same goes for any VM-based database solution, SQL Database, etc - they all can scale.
This really comes down to whether you need full-text-search at high volume. If so, great - just scale Azure Search to the number of units you need, to handle your request traffic. If you can do more database-specific searches, without driving your request through Azure Search, then you don't need to scale out as much, and can take advantage of the native database query capabilities.
One thing to add to David's excellent answer - if your scenario is primarily search driven and you don't need to store data for purposes other than search and are OK with eventual consistency, then using Azure Search as the primary store may be fine.
Also, 15 requests per second query throughput of Azure Search is just a ballpark - it's neither a hard limit nor a promise. Depending on your data and query complexity, the actual throughput can be significantly (many times) higher or lower.
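A back-of-the-envelope sizing helper, assuming the ~15 queries/second per unit ballpark from the answers above (a tunable estimate, not a limit) and assuming load can be expressed as an average query rate rather than a count of concurrent users. The function names are hypothetical.

```python
import math


def queries_per_second(concurrent_users: int, seconds_between_searches: float) -> float:
    """Convert a concurrent-user estimate into an average query rate (assumes steady load)."""
    if seconds_between_searches <= 0:
        raise ValueError("seconds_between_searches must be positive")
    return concurrent_users / seconds_between_searches


def search_units_needed(target_qps: float, qps_per_unit: float = 15.0) -> int:
    """Round up target_qps / qps_per_unit; a service always has at least one unit.

    qps_per_unit defaults to the ~15/second ballpark, which real workloads
    can beat or miss by a wide margin, so replace it with measured numbers.
    """
    if target_qps < 0 or qps_per_unit <= 0:
        raise ValueError("target_qps must be >= 0 and qps_per_unit > 0")
    return max(1, math.ceil(target_qps / qps_per_unit))
```

For instance, 5,000 concurrent users who each search once every 30 seconds generate about 167 queries/second, which at 15 per unit suggests 12 units; if load testing showed 40 per unit for your data, the same traffic would need 5.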

Microsoft Azure DocumentDB vs Azure Table Storage

For several years now, Microsoft has offered a "NoSQL" key/value store called "Table Storage" (http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-tables/)
Table Storage offers high performance, scalability (via partitioning), and relatively low cost. A primary drawback of Tables is that only the Partition and Row keys can be indexed, so queries on other values are very inefficient.
Recently Microsoft announced a new "NoSQL" service, called "DocumentDB" (http://azure.microsoft.com/en-us/documentation/services/documentdb/)
Instead of storing a list of properties (as Tables do), DocumentDB stores JSON objects. The whole object is indexed, so efficient queries can be written against any property of the stored objects, including nested properties.
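To illustrate what "the whole object is indexed" means, here is a toy Python sketch that flattens each JSON document into (path, value) pairs and indexes every pair, so lookups work on nested properties and array elements without declaring anything up front. The path syntax and function names are hypothetical; this is not DocumentDB's actual index format.

```python
from typing import Any


def flatten(doc: Any, prefix: str = "") -> list[tuple[str, Any]]:
    """Return (path, value) pairs for every leaf of a JSON-like document."""
    if isinstance(doc, dict):
        pairs: list[tuple[str, Any]] = []
        for key, value in doc.items():
            pairs.extend(flatten(value, f"{prefix}/{key}"))
        return pairs
    if isinstance(doc, list):
        pairs = []
        for item in doc:
            # Array elements share one path segment, so any element can match.
            pairs.extend(flatten(item, f"{prefix}/[]"))
        return pairs
    return [(prefix, doc)]  # JSON scalars are hashable, so they can be index keys


def build_path_index(docs: dict[str, Any]) -> dict[tuple[str, Any], set[str]]:
    """Index every (path, value) pair of every document to the IDs containing it."""
    index: dict[tuple[str, Any], set[str]] = {}
    for doc_id, doc in docs.items():
        for pair in flatten(doc):
            index.setdefault(pair, set()).add(doc_id)
    return index
```

A query on a nested property is then a single lookup, e.g. `index.get(("/address/city", "Oslo"), set())`.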
Microsoft says that DocumentDB provides high performance and scalability as well.
If that's so, why would anyone use Table Storage over DocumentDB? DocumentDB sounds like it provides the same functionality as Tables, plus additional capabilities such as the ability to index anything.
I would be glad if someone could compare DocumentDB and Table Storage, highlighting the pros and cons of each.
Both are NoSQL technologies, but they are massively different. Azure Tables is a simple key/value store and does not support complex functionality such as complex queries (most of them will require a full partition/table scan anyway, which will kill your performance and your cost savings), custom indexing (indexing is based on PartitionKey and RowKey only; you currently can't index any other entity property, and searching for anything other than a PartitionKey/RowKey combination requires a partition/table scan), or stored procedures. You also can't batch read requests for multiple entities (though batch write requests are supported if all the entities belong to the same partition). For a real-life application of Azure Tables, see HERE.
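The access pattern described above can be modeled with a toy in-memory table keyed by (PartitionKey, RowKey). This is a hypothetical Python sketch, not the Azure Tables SDK: it shows that a PartitionKey + RowKey lookup is direct, that filtering on any other property must read every entity in the partition (or the whole table), and that a batch write is rejected when it spans partitions. The `entities_scanned` counter exists only to make the scan cost visible.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

Entity = dict[str, Any]


@dataclass
class InMemoryTable:
    """Hypothetical toy model of a table: partitions of entities keyed by RowKey."""
    partitions: dict[str, dict[str, Entity]] = field(default_factory=dict)
    entities_scanned: int = 0  # entities read by the most recent query

    def upsert(self, entity: Entity) -> None:
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        self.partitions.setdefault(pk, {})[rk] = dict(entity)

    def batch_upsert(self, entities: list[Entity]) -> None:
        # Batched writes are accepted only when all entities share one PartitionKey.
        if len({e["PartitionKey"] for e in entities}) > 1:
            raise ValueError("a batch must target a single partition")
        for entity in entities:
            self.upsert(entity)

    def point_lookup(self, pk: str, rk: str) -> Optional[Entity]:
        # Both keys are indexed, so the entity is found directly with no scan.
        self.entities_scanned = 0
        return self.partitions.get(pk, {}).get(rk)

    def filter_by(self, prop: str, value: Any, pk: Optional[str] = None) -> list[Entity]:
        # Non-key properties are not indexed: read every entity in the
        # given partition, or in the whole table when no pk is supplied.
        if pk is not None:
            scope = [self.partitions.get(pk, {})]
        else:
            scope = list(self.partitions.values())
        self.entities_scanned = 0
        matches = []
        for partition in scope:
            for entity in partition.values():
                self.entities_scanned += 1
                if entity.get(prop) == value:
                    matches.append(entity)
        return matches
```

The counter makes the trade-off visible: a point lookup reads nothing extra, while a filter on a non-key property grows with the size of whatever it has to scan.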
If your data needs (particularly around querying) are simple, as in the example above, Azure Tables may provide what you need, and you might end up choosing them over DocDB for pricing, performance, and storage capacity. For example, the Azure Tables performance target is 20,000 operations per second; getting the same level of performance from DocDB will cost you significantly more. Also, Azure Tables are limited by the capacity of your Azure storage account (500 TB), whereas DocDB storage is limited by the capacity units you buy.
Table Services is mainly a key-value NoSQL store, and DocumentDB is (as the name suggests) a document-oriented NoSQL store. What you are asking is essentially the difference between these two types of NoSQL approaches. If you shape your research around this, you should get a better understanding.
To keep things simple, I suggest you consider the differences in how DocumentDB and Table Services are priced. Not only do the costs of these services vary a lot, but DocumentDB works on a "provision first" model while Table Services use pure consumption-based pricing, which might give you some clues for your comparison.
Let me ask you this: why would I use DocumentDB if the features in Table Services serve my needs well? ;) I suggest you take a look at how the current Azure Diagnostics tooling uses Azure Storage Services, and how Storage Metrics uses Azure Storage itself, to get a sense of how useful Table Services can be and how DocumentDB might be overkill in some situations.
Hope this helps.
I think the comparison is all about trading price for performance. Table Services are just Storage Services, which seem to cap out at 20,000 ops/second, but paying for that kind of throughput all the time on DocumentDB (whereas Storage gives it to us all the time) is $1,200/month. Crazy money.
Table services have simple indexes, so queries are very limited. Good for anything that is written and read via IDs. DocumentDB indexes the entire document, so a query can be done on any property.
And lastly, Table services are bound by the storage limit of the Storage account they're on (which could get crazy high given negotiation with Microsoft directly), whereas DocumentDB storage seems unlimited.
So it's a balance. Do you have a LOT of data (hundreds of gigs, or terabytes) that you need in one place? DocumentDB. Do you need to support complex queries? DocumentDB. Do you have data that needs to come and go fast, but based on a 1-to-2 property lookup? Table services. Would you trade having to code around a simple index in order to avoid paying through the nose for throughput? Table services.
And Redis, which someone mentioned... man, I dunno. Even the persistence that a caching framework like Redis offers doesn't make it the tech of choice here. There is a huge difference between a persistent store that holds data that is "often used, but may be missing or time-retired", as a cache does, and a persistent store that guarantees your data will be there.
A real life example:
I have to store some tokens, retrieve them, delete them. Only query ever done will be based on User ID.
So I use Table Storage, as it fulfills my requirement perfectly. I save each token against the User ID.
DocumentDB seemed to be overkill for this.
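The layout in this example can be sketched as follows, assuming PartitionKey = user ID and RowKey = token (the answer only says tokens are saved against the User ID, so the exact key choice is an assumption). `TokenStore` is a hypothetical in-memory stand-in; with the `azure-data-tables` Python SDK, the same operations would go through `TableClient` methods such as `upsert_entity`, `get_entity`, `query_entities`, and `delete_entity`.

```python
import secrets
from typing import Optional


class TokenStore:
    """Hypothetical in-memory stand-in: PartitionKey = user ID, RowKey = token."""

    def __init__(self) -> None:
        # user ID -> {token -> entity}; each user's tokens live in one partition.
        self._partitions: dict[str, dict[str, dict[str, str]]] = {}

    def save(self, user_id: str, token: Optional[str] = None) -> str:
        token = token or secrets.token_urlsafe(16)
        entity = {"PartitionKey": user_id, "RowKey": token}
        self._partitions.setdefault(user_id, {})[token] = entity
        return token

    def is_valid(self, user_id: str, token: str) -> bool:
        # Point lookup on both keys (the equivalent of fetching one entity).
        return token in self._partitions.get(user_id, {})

    def tokens_for(self, user_id: str) -> list[str]:
        # Single-partition query: only this user's entities are read.
        return list(self._partitions.get(user_id, {}))

    def delete(self, user_id: str, token: str) -> bool:
        partition = self._partitions.get(user_id)
        if partition is None or token not in partition:
            return False
        del partition[token]
        if not partition:
            del self._partitions[user_id]
        return True
```

Every operation here is keyed by the user ID, which is why a plain key/value table is enough for this workload.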
Here is the answer from Microsoft's official docs:
Common attributes of Cosmos DB, Azure Table Storage, and Azure SQL Database:
99.99% availability SLA
Fully managed database services
ISO 27001, HIPAA and EU Model Clauses Compliant
The following table shows the attributes that differ between Azure Cosmos DB and Azure Table Storage: