Documents fetched from primary shard or replica - search

Why are documents fetched from the primary shard on some runs and from the replica shard on others when I run the same query again and again? Because of this I am getting different search results.
Example response 1 - replica:
"_shard": 0,
"_node": "node_1",
"_index": "sample_ind",
"_type": "my_type",
"_id": "E1",
"_score": 2.9560382,
Example response 2 - primary shard:
"_shard": 0,
"_node": "node_2",
"_index": "sample_ind",
"_type": "my_type",
"_id": "E2",
"_score": 2.956294,
node_1 holds the replica shard and node_2 holds the primary shard. How does the query/fetch work, and why does the response come from the primary shard on some runs and from the replica shard on others when I run the same query multiple times?

Difficult to say; could you give more detail about your results?
Elastic's site has a good article explaining how the query phase fetches results from primary/replica shards: https://www.elastic.co/guide/en/elasticsearch/guide/current/_query_phase.html
HTH,

This is basic Elasticsearch information, and I strongly suggest going over the documentation to grasp at least the elementary knowledge about Elasticsearch.
In short, when a query comes to the cluster, the shards that need to be queried can be either primaries or replicas. It does not matter: they hold the same data and can serve the query equally well. I don't recommend running your queries against only primaries or only replicas, as that will create hot spots in your cluster and can destabilize it.
Also, the scoring on primaries and replicas should be almost the same. Part of the algorithm that calculates the score involves how many documents exist in the shard and the frequency of terms in those documents. The tricky part is that when you update or delete a document, it is not immediately removed from disk; it is only marked as deleted. In the background, Elasticsearch merges segment files: it takes smaller segments of similar size, creates a bigger segment out of them, and deletes the smaller ones. Only at merge time are the marked-as-deleted documents actually removed from the index.
Until then, these documents are not returned in searches, but they are still counted when calculating scores, as mentioned above. Since merging happens independently on the primary and the replica, the two copies can hold different numbers of deleted documents at any given moment, which is why the same query returns slightly different scores depending on which copy serves it.
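If repeatable scores matter, one common mitigation is the search "preference" parameter: sending a stable preference string makes repeated searches hit the same shard copies, so primary/replica differences stop showing up. Here is a minimal sketch using the Java high-level REST client; the field, query text, and preference value are illustrative, not taken from the question.

    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.builder.SearchSourceBuilder;

    public class PreferenceSearch {
        // Route every search for the same user/session to the same shard copies.
        public static SearchResponse search(RestHighLevelClient client) throws Exception {
            SearchRequest request = new SearchRequest("sample_ind");
            request.preference("session-42"); // any stable string, e.g. a user or session id
            request.source(new SearchSourceBuilder()
                    .query(QueryBuilders.matchQuery("title", "example")));
            return client.search(request, RequestOptions.DEFAULT);
        }
    }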

Related

Azure Cosmos DB partition key design selection

Selecting a partition key is a simple but important design choice in Azure Cosmos DB in terms of performance and cost (RUs). Azure Cosmos DB does not allow us to change the partition key later, so it is very important to select the right one up front.
I have gone through the Microsoft documentation (Link), but I still have some confusion about choosing a partition key.
Below is the item structure I am planning to create:
{
"id": "unique id like UUID", # just to keep some unique ID for item
"file_location": "/videos/news/finance/category/sharemarket/it-sectors/semiconductors/nvidia.mp4", # This value some times contains special symbols like spaces, dollars, caps and many more
"createatedby": "andrew",
"ts": "2022-01-10 16:07:25.773000",
"directory_location": "/videos/news/finance/category/sharemarket/it-sectors/semiconductors/",
"metadata": [
{
"codec": "apple",
"date_created": "2020-07-23 05:42:37",
"date_modified": "2020-07-23 05:42:37",
"format": "mp4",
"internet_media_type": "video/mp4",
"size": "1286011"
}
],
"version_id": "48ad8200-7231-11ec-abda-34519746721"
}
I am using the Azure Cosmos DB SQL API. By default, Azure Cosmos DB takes care of indexing all data, so in the above case all properties are indexed.
For reading items I use the file_location property. Can I make file_location the partition key? Is there anything else to consider?
A few notes:
file_location values contain special characters like spaces, commas, dollar signs, and many more.
Some containers hold 150 million entries and some hold just 20 million.
My operations are mostly reads, with frequent writes as new videos are added and fewer updates in case a video changes.
A few things to keep in mind while selecting partition keys (a short sketch follows this list):
Observe the query parameters you use while reading data; they give you good hints about partition key candidates.
You mentioned that some containers hold 150 million documents and some hold 20 million. What matters is not the number of documents stored in a container but which containers receive the higher number of requests. If a few containers are getting too many requests, that is a good indicator of a poorly designed partition key.
Try to distribute the request load as evenly as possible across partition key values so that it spreads evenly across the physical partitions. Otherwise you will get hot-partition issues and will work around them by increasing throughput, which will cost you more money.
Try to limit cross-partition queries as much as possible.
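Since you read by file_location, making it the partition key would turn those reads into single-partition queries. A hedged sketch with the Azure Cosmos DB Java SDK v4 follows; the account, database, and container names are made up, and the path value is the sample from the question:

    import com.azure.cosmos.CosmosClient;
    import com.azure.cosmos.CosmosClientBuilder;
    import com.azure.cosmos.CosmosContainer;
    import com.azure.cosmos.models.CosmosQueryRequestOptions;
    import com.azure.cosmos.models.PartitionKey;
    import com.azure.cosmos.util.CosmosPagedIterable;
    import com.fasterxml.jackson.databind.JsonNode;

    public class ReadByFileLocation {
        public static void main(String[] args) {
            CosmosClient client = new CosmosClientBuilder()
                    .endpoint("https://<account>.documents.azure.com:443/")
                    .key("<key>")
                    .buildClient();
            CosmosContainer container = client.getDatabase("media").getContainer("videos");

            // Scoping the query to one partition key value keeps it single-partition.
            String fileLocation = "/videos/news/finance/category/sharemarket/it-sectors/semiconductors/nvidia.mp4";
            CosmosQueryRequestOptions options = new CosmosQueryRequestOptions();
            options.setPartitionKey(new PartitionKey(fileLocation));
            CosmosPagedIterable<JsonNode> items =
                    container.queryItems("SELECT * FROM c", options, JsonNode.class);
            items.forEach(item -> System.out.println(item.get("id")));
        }
    }

One caveat to weigh: if file_location is unique per item, every logical partition holds a single document, which is fine for point-style reads but makes any broader query (e.g. by directory_location) cross-partition.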

Are secondary indices always a bad idea in Cassandra even if I specify them in conjunction with the partitioning key in all my queries?

I know that secondary indices in Cassandra are generally a bad idea because the index is stored locally on each node, i.e. not distributed across the cluster, which may result in a query scanning a huge number of nodes. However, I don't understand why they are still a bad idea if I always specify the partition key in my queries and only use the secondary index as a final filter. I've read that they don't scale with large amounts of data even if I specify the partition key. Is this true? And if so, why?
In general, secondary indexes are a bad idea, not only because of the distribution aspect but also because of the index size and the number of distinct values: if you have a field with very high or very low cardinality, you will spend time scanning many rows or many columns.
You can also run into other issues when dealing with tombstones...
To answer your question: a secondary index in Cassandra doesn't scale that well on its own, but if you also supply the partition key, thereby telling Cassandra which node has the data, it performs much better!
You can find more details in section F here:
https://www.datastax.com/blog/2016/04/cassandra-native-secondary-index-deep-dive
I hope this helps!
These guys have a nice write-up on the performance impacts of secondary indexes: 
https://pantheon.io/blog/cassandra-scale-problem-secondary-indexes
The main impact (from the post) is that secondary indexes are local to each node, so to satisfy a query by indexed value, each node has to query its own records to build the final result set (as opposed to a primary key query, where it is known exactly which node needs to be queried). So there's an impact not just on writes but on read performance as well.
Consider Cassandra on a ring of five machines, with a primary index of user IDs and a secondary index of user emails. If you query for a user by their ID, the primary indexed key, any machine in the ring knows which machine holds that user's record: one query, one read from disk. However, to query a user by their email, the secondary indexed value, each machine has to check its own records of users: one query, five reads from disk. By scaling either the number of users system-wide or the number of machines in the ring, the signal-to-noise ratio worsens and the overall efficiency of reads drops, in some cases to the point of timing out.
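As a hedged illustration of the partition-key-plus-index pattern (DataStax Java driver 4.x; the keyspace, table, and column names are invented for the example):

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.ResultSet;
    import com.datastax.oss.driver.api.core.cql.SimpleStatement;

    public class IndexedLookup {
        public static void main(String[] args) {
            try (CqlSession session = CqlSession.builder().build()) {
                // Secondary index on a non-key column (illustrative schema:
                // shop.orders with partition key customer_id).
                session.execute(
                    "CREATE INDEX IF NOT EXISTS orders_status_idx ON shop.orders (status)");

                // Anti-pattern: WHERE status = 'SHIPPED' alone would make every
                // node scan its local index.

                // Better: the partition key pins the query to one replica set,
                // and the index is only consulted locally as a final filter.
                ResultSet rs = session.execute(SimpleStatement.newInstance(
                    "SELECT * FROM shop.orders WHERE customer_id = ? AND status = ?",
                    java.util.UUID.fromString("11111111-2222-3333-4444-555555555555"),
                    "SHIPPED"));
                rs.forEach(row -> System.out.println(row.getFormattedContents()));
            }
        }
    }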
Please refer to the link below for a good explanation of secondary indexes:
https://dzone.com/articles/cassandra-scale-problem

Write-heavy partition key strategy for Azure Cosmos DB

We’re using CosmosDB in production to store HTTP request/response audit data. The structure of this data generally looks as follows:
{
"id": "5ff4c51d3a7a47c0b5697520ae024769",
"Timestamp": "2019-06-27T10:08:03.2123924+00:00",
"Source": "Microservice",
"Origin": "Client",
"User": "SOME-USER",
"Uri": "GET /some/url",
"NormalizedUri": "GET /SOME/URL",
"UserAgent": "okhttp/3.10.0",
"Client": "0.XX.0-ssffgg;8.1.0;samsung;SM-G390F",
"ClientAppVersion": "XX-ssffgg",
"ClientAndroidVersion": "8.1.0",
"ClientManufacturer": "samsung",
"ClientModel": "SM-G390F",
"ResponseCode": "OK",
"TrackingId": "739f22d01987470591556468213651e9",
"Response": "[ REDACTED ], <— Usually quite long (thousands of chars)
"PartitionKey": 45,
"InstanceVersion": 1,
"_rid": "TIFzALOuulIEAAAAAACACA==",
"_self": "dbs/TIFzAA==/colls/TIFzALOuulI=/docs/TIFzALOuulIEAAAAAACACA==/",
"_etag": "\"0d00c779-0000-0d00-0000-5d1495830000\"",
"_attachments": "attachments/",
"_ts": 1561630083
}
We're currently writing around 150,000 - 200,000 documents similar to the above per day, with /PartitionKey as the partition key path configured on the container. The value of PartitionKey is a randomly generated number in C#/.NET between 0 and 999.
However, we are seeing daily hot spots where a single physical partition can hit a maximum of 2.5K - 4.5K RU/s while others stay very low (around 200 RU/s). This has knock-on cost implications, as we need to provision throughput for our most heavily utilised partition.
The second factor is that we're storing a fair bit of data, close to 1 TB of documents, and we add a few GB each day. As a result, we currently have around 40 physical partitions.
Combining these two factors means we end up having to provision, at minimum, somewhere between 120,000 and 184,000 RU/s.
I should mention that we barely ever need to query this data, apart from very occasional ad-hoc, manually constructed queries in the Cosmos Data Explorer.
My question is... would we be a lot better off in terms of RU/s required and distribution of data by simply using the “id” column as our partition key (or a randomly generated GUID) - and then setting a sensible TTL so we don't have a continually growing dataset?
I understand this would require us to re-create the collection.
Thanks very much.
Keep in mind that there is a maximum throughput per physical partition.
While using the id or a GUID would give you better cardinality than the random number you have today, any query you run would be very expensive as it would always be cross-partition and over a huge amount of data.
I think a better choice would be to use a synthetic key that combines multiple properties that both have high cardinality and are also used to query the data (see the sketch below). You can learn more about these here: https://learn.microsoft.com/en-us/azure/cosmos-db/synthetic-partition-keys
As far as TTL goes, I would definitely set it to whatever retention you need for this data. Cosmos expires TTL'd data using unused throughput, so it will never get in the way.
Lastly, you should also consider (if you haven't already) using a custom indexing policy and excluding any paths that are never queried, especially the "Response" property, since you say it is thousands of characters long. This can save considerable RU/s in write-heavy scenarios like yours.
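A hedged sketch of the synthetic-key idea, using the Cosmos DB Java SDK v4 rather than the C#/.NET mentioned in the question; the user-plus-day composition and all names are illustrative, and the per-item "ttl" override assumes a default TTL is enabled on the container:

    import java.util.UUID;
    import com.azure.cosmos.CosmosContainer;
    import com.azure.cosmos.models.CosmosItemRequestOptions;
    import com.azure.cosmos.models.PartitionKey;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ObjectNode;

    public class AuditWriter {
        private static final ObjectMapper MAPPER = new ObjectMapper();

        public static void write(CosmosContainer container, String user, String day) {
            // e.g. "SOME-USER-2019-06-27": high cardinality, and still targetable
            // when you query one user's traffic for one day.
            String syntheticPk = user + "-" + day;

            ObjectNode doc = MAPPER.createObjectNode();
            doc.put("id", UUID.randomUUID().toString());
            doc.put("User", user);
            doc.put("PartitionKey", syntheticPk);
            doc.put("ttl", 30 * 24 * 3600); // expire after 30 days (illustrative)
            container.createItem(doc, new PartitionKey(syntheticPk),
                    new CosmosItemRequestOptions());
        }
    }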
From my experience, Cosmos tends to degrade as new data accumulates: more data means more physical partitions, so more throughput needs to be allocated to each of them. We are starting to archive old data into Blob Storage to avoid this kind of problem and keep the number of physical partitions stable. We use Cosmos as hot storage and move old data to Blob Storage as cold storage. That way we reduce the RU allocated to each physical partition and save money.

CosmosDB - Querying Across All Partitions

I'm creating a logging system to monitor our (roughly 200) main application installations, and Cosmos DB seems like a good fit due to the amount of data we'll collect, and because it allows a varying schema for the log data (particularly Tags; see the document schema below).
But, never having used Cosmos DB before, I'm slightly unsure what to use for my partition key.
If I partitioned by CustomerId, there would likely be several GB of data in each of the 200 partitions, and the data will usually be queried by CustomerId, so this was my first choice for the partition key.
However, I was planning to have a 'log stream' view in the logging system, showing logs coming in for all customers.
Would this lead to a horribly slow and expensive cross-partition query?
If so, is there an obvious way to avoid or limit the cost and speed implications of this cross-partition querying? (Other than just dropping the all-customers log stream view!)
{
"CustomerId": "be806507-7cc4-4db4-881b",
"CustomerName": "Our Customer",
"SystemArea": 1,
"SystemAreaName": "ExchangeSync",
"Message": "Updated OK",
"Details": "",
"LogLevel": 2,
"Timestamp": "2018-11-23T10:59:29.7548888+00:00",
"Tags": {
"appointmentId": "109654",
"appointmentGroupId": "86675",
"exchangeId": "AAMkA",
"exchangeAlias": "customer.name#customer.com"
}
}
(Note: there isn't a defined list of SystemArea types we'll use yet, but it will be far fewer than the 200 customers.)
Cross-partition queries should be avoided as much as possible. If your querying is likely to happen by customer id, then CustomerId is a good logical partition key. However, you have to keep in mind that there is a limit of 10 GB of data per logical partition.
A cross-partition query across the whole database will be a very slow and very expensive operation, but if it isn't functionally critical and is only used for infrequent reporting, it's not too much of a problem.
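To make the contrast concrete, here is a hedged sketch with the Cosmos DB Java SDK v4 (the SDK choice and the TOP-based limiting are assumptions; property names follow the document above; in v4, queries without a partition key fan out across partitions by default):

    import com.azure.cosmos.CosmosContainer;
    import com.azure.cosmos.models.CosmosQueryRequestOptions;
    import com.azure.cosmos.models.PartitionKey;
    import com.azure.cosmos.util.CosmosPagedIterable;
    import com.fasterxml.jackson.databind.JsonNode;

    public class LogQueries {
        // Cheap: scoped to a single customer's logical partition.
        public static CosmosPagedIterable<JsonNode> forCustomer(
                CosmosContainer container, String customerId) {
            CosmosQueryRequestOptions options = new CosmosQueryRequestOptions();
            options.setPartitionKey(new PartitionKey(customerId));
            return container.queryItems(
                    "SELECT * FROM c ORDER BY c.Timestamp DESC", options, JsonNode.class);
        }

        // Expensive: the all-customers "log stream" view fans out to every partition.
        public static CosmosPagedIterable<JsonNode> logStream(CosmosContainer container) {
            return container.queryItems(
                    "SELECT TOP 100 * FROM c ORDER BY c.Timestamp DESC",
                    new CosmosQueryRequestOptions(), JsonNode.class);
        }
    }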

Search records by value or conditional value in NoSQL with 100 million records

We are looking for a NoSQL database where we can store more than 100 million records with many fields in the value, like sets in Redis.
The database should also be searchable by value. We checked Redis, but it does not support any option to search by value. We have millions of records, we update some fields of a record, and then we need to fetch the batch of records that have not been updated since a specific time.
Running a query over all records and then checking which ones have not been updated since that time takes too long, because we are updating 100-200 records per minute and then fetching a batch of records based on a value.
So Redis will not work here. We have the option of storing the data in MongoDB, but we are looking for a key-value database that supports search-by-value features.
{
"_id" : ObjectId("5ac72e522188c962d024d0cd"),
"itemId" : 11.0,
"url" : "http://www.testurl.com",
"failed" : 0.0,
"proxyProvider" : "Test",
"isLocked" : false,
"syncDurationInMinute" : 60.0,
"lastUpdatedTimeUTC" : "",
"nextUpdateTimeUTC" : "",
"targetCountry" : "US",
"requestContentType" : "JSON",
"group" : "US"
}
In Aerospike, you can use predicate filtering to find records that have not been updated since a point in time and return only the metadata of each record, which includes the record digest (its unique identifier). You can then process the matched digests and do whatever update you need to do. This type of predicate filter is very fast because it only has to look at the primary index entry, which is kept in memory. See the examples in the Java client's repo.
You would not need a secondary index here, because you want to scan all the records in a namespace (or a set of that namespace) and just check the last-update-time metadata of each record. Since you'll be returning just the record's digest (unique ID) and not any of its actual data, this scan never needs to read anything from SSD. It'll be very fast and lightweight on the results (again, only metadata is sent back). In the client you'll iterate over the result set, build a list of IDs, and then act on those with a subsequent write.
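A hedged sketch of that flow with the Aerospike Java client's PredExp API (the namespace, set name, and cutoff are illustrative):

    import java.util.ArrayList;
    import java.util.List;
    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Key;
    import com.aerospike.client.policy.QueryPolicy;
    import com.aerospike.client.query.PredExp;
    import com.aerospike.client.query.RecordSet;
    import com.aerospike.client.query.Statement;

    public class StaleRecordScan {
        public static List<Key> notUpdatedSince(AerospikeClient client, long cutoffNanos) {
            Statement stmt = new Statement();
            stmt.setNamespace("test");  // illustrative namespace
            stmt.setSetName("items");   // illustrative set
            // Match records whose last-update-time (epoch nanoseconds) is
            // before the cutoff; this reads only primary-index metadata.
            stmt.setPredExp(
                PredExp.recLastUpdate(),
                PredExp.integerValue(cutoffNanos),
                PredExp.integerLess());

            QueryPolicy policy = new QueryPolicy();
            policy.includeBinData = false; // metadata only, nothing read from SSD

            List<Key> stale = new ArrayList<>();
            RecordSet rs = client.query(policy, stmt);
            try {
                while (rs.next()) {
                    stale.add(rs.getKey()); // key.digest is the record's unique ID
                }
            } finally {
                rs.close();
            }
            return stale;
        }
    }

Newer client versions replace PredExp with filter expressions, so treat this as era-appropriate rather than current best practice.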
