How to limit the number of items per section in an NSFetchedResultsController? - core-data

For example, your FRC fetches a news feed and groups the articles into sections by date of publication.
And then you want to limit the size of each section to be up to 10 articles each.
One option I’ve considered is having separate NSFetchedResultsControllers for each day and setting a fetch limit. But that seems unnecessary as the UI only really needs a single FRC (not to mention that the number of days is unbounded).
Edit:
I’m using a diffable data source snapshot.

If it were me, I'd leave the NSFetchedResultsController alone for this and handle it in the table view. Implement tableView(_:numberOfRowsInSection:) so that it never returns a value greater than 10. Then the table will never ask for more than 10 rows in a section, and your UI will be as you want.
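A minimal sketch of that approach, assuming a plain UITableViewDataSource backed by a fetchedResultsController property (names are illustrative):

```swift
// In your UITableViewDataSource: cap what the table reports,
// without touching the FRC's fetch.
func tableView(_ tableView: UITableView, numberOfRowsInSection section: Int) -> Int {
    guard let sectionInfo = fetchedResultsController.sections?[section] else { return 0 }
    // Never report more than 10 rows, no matter how many objects the FRC holds.
    return min(sectionInfo.numberOfObjects, 10)
}
```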

Since I’m using a diffable data source snapshot, I can take the snapshot I receive in the FRC delegate callback and use it to create a new snapshot, keeping only the first K items in each section.
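For reference, a sketch of that trimming step, assuming a UITableViewDiffableDataSource<String, NSManagedObjectID> stored in a dataSource property (that name is mine):

```swift
// In your NSFetchedResultsControllerDelegate.
func controller(_ controller: NSFetchedResultsController<NSFetchRequestResult>,
                didChangeContentWith snapshot: NSDiffableDataSourceSnapshotReference) {
    let full = snapshot as NSDiffableDataSourceSnapshot<String, NSManagedObjectID>
    var trimmed = NSDiffableDataSourceSnapshot<String, NSManagedObjectID>()
    let maxPerSection = 10  // K

    for section in full.sectionIdentifiers {
        trimmed.appendSections([section])
        // Keep only the first K items of each section.
        let items = full.itemIdentifiers(inSection: section).prefix(maxPerSection)
        trimmed.appendItems(Array(items), toSection: section)
    }
    dataSource.apply(trimmed, animatingDifferences: true)
}
```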

Related

AWS Keyspaces effective pagination

I have to jump to specific rows in an AWS Keyspaces table. Specifically, I am doing pagination, so I want to be able to jump to a specific page. Do I have any options for that?
For example, I have to fetch 100 rows after the 1,000,000th row. And of course I want to do it as quickly as possible.
My ideas/solutions:
Set the page size to the requested one (100 in this case) and iterate over all rows, getting next_page until I reach the specific set
Find the maximum possible page size and use max_page to iterate over the biggest possible sets of rows
But maybe there are more clever solutions?
I don't have the opportunity to somehow change the table by adding additional columns!
Pagination isn't a best practice in Cassandra because you don't know how many results you will have until you query for them. Amazon Keyspaces paginates results based on the number of rows that it reads to process a request, not the number of rows returned in the result set. As a result, some pages might contain fewer rows than you specify in PAGE SIZE for filtered queries. In addition, Amazon Keyspaces paginates results automatically after reading 1 MB of data to provide customers with consistent, single-digit millisecond read performance.
For paginating through pages, there is a tool called the Export tool. This tool allows you to asynchronously read page by page. It will read partition keys even if you skip them: Keyspaces still has to read every partition key, which means using more RCUs, but this will accomplish your goal. When you are using the tool to read by page, you may see the tool stop after a certain number of pages; just restart the tool at the page it left off at when you run it again.

Pagination after reindex in Azure Search

I am new to Azure Search Service and I am not sure I understand one important thing about it.
Let's imagine the situation where I, as a client, am scrolling down through the results of my search query:
"New Y". I have 1000 elements, and every page contains 10 of them. But during my scroll, a reindex operation starts and some elements change their position because of new updates in the data source (an Azure Table).
Will the next pages I see during my scrolling after the reindex contain some duplicated data, or will it still be the old "snapshot" of the data I was scrolling before?
You'll see the changes as you execute subsequent requests. To Azure Search each request is independent and it represents a new search (caching aside), which for paging scenarios just happens to have a different "skip" number.
This means that if your data is changing you might see an item more than once (if it moves across pages due to changes) or even skip one (if it moves from a page you didn't see yet to a page you already saw).
There's no way to get a strictly consistent view of search matches outside of a single result. If you need to approximate this behavior, you can request a larger page (using "top"), cache the results, and present them in chunks. We find that in practice this is rarely needed for most search scenarios, but if search is backing a part of an app that needs consistency, you might need to do something along those lines.
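If you go the cache-and-chunk route, the client-side part can be as simple as this sketch (pure Swift; the types and sizes are illustrative): fetch one large page with "top", hold it, and page through it locally.

```swift
// One big page fetched with "top", presented locally in chunks of `pageSize`,
// so the user scrolls a stable snapshot even if the index changes underneath.
struct CachedResults<Item> {
    let items: [Item]        // e.g. 100 results fetched in a single request
    let pageSize: Int        // e.g. 10 per on-screen page

    func page(_ index: Int) -> ArraySlice<Item> {
        let start = index * pageSize
        guard start < items.count else { return [] }
        return items[start..<min(start + pageSize, items.count)]
    }
}
```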

Dynamodb infrequently scheduled scan

I am implementing a session table with Node.js which will grow to a huge number of items. Each hash key is a UUID representing a user.
In order to delete the expired sessions, I must scan the table for the expired attribute and delete old sessions. I am planning to do this scan once every few days, and other than that, I don't really need high read capacity.
I came up with two solutions, and I would like to hear some feedback about them.
1) UpdateTable to higher capacities for only that scheduled routine, and after the scan is done, simply reduce the table capacities to their original values.
2) Perform the scan, and when retrieving the 'LastEvaluatedKey' after an x*MB read, introduce a delay (so as not to consume all read/sec units), and then continue the scan with 'ExclusiveStartKey' (see the sketch below).
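A hedged sketch of option 2's pacing loop, written here in Swift for illustration. scanPage is a hypothetical wrapper around whatever DynamoDB client you use; the point is the pause between pages, keyed off LastEvaluatedKey:

```swift
// `scanPage` is a hypothetical wrapper around the DynamoDB Scan call: it takes
// an ExclusiveStartKey and returns one page of items plus the LastEvaluatedKey
// (nil when the scan is done). Attribute values are simplified to strings.
func throttledScan(
    scanPage: ([String: String]?) async throws -> (items: [[String: String]],
                                                   lastEvaluatedKey: [String: String]?)
) async throws -> [[String: String]] {
    var items: [[String: String]] = []
    var startKey: [String: String]? = nil
    repeat {
        let page = try await scanPage(startKey)
        items.append(contentsOf: page.items)
        startKey = page.lastEvaluatedKey
        if startKey != nil {
            // Pause between pages so the scan doesn't burn all read units at once.
            try await Task.sleep(nanoseconds: 1_000_000_000)  // 1s; tune to your capacity
        }
    } while startKey != nil
    return items
}
```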
If you're doing a scan, option 1 is your best bet. This is the only real way to guarantee that you won't affect your application performance while the scan is ongoing.
The only thing you need to be sure of is that you only run this operation once a day -- I believe you can only downgrade throughput on a DynamoDB table twice per day (at most).
This is an old question, but I saw it through a related question.
There is now a much better native solution: DynamoDB Time to Live
It allows you to specify one attribute per table that serves as the time-to-live value for each item. You can then set that attribute per item to a Unix timestamp that specifies when the item should be deleted.
Within about 24 hours of that timestamp, the item will be deleted at no additional charge.
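As an illustration, a minimal sketch of what writing such an item might look like; the attribute name "ttl" and the putItem call are assumptions (you choose the attribute name when enabling TTL on the table):

```swift
import Foundation

// Expire sessions one week after creation.
let sessionLifetime: TimeInterval = 7 * 24 * 60 * 60
let expiresAt = Int(Date().timeIntervalSince1970 + sessionLifetime)

let sessionItem: [String: Any] = [
    "userId": UUID().uuidString,       // hash key, a UUID per user as in the question
    "token": "opaque-session-token",
    "ttl": expiresAt                   // the attribute configured for DynamoDB TTL
]
// putItem(sessionItem)  // hypothetical write with your DynamoDB client of choice
```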

Using a pList to Update Core Data

I could use a little guidance here. I get a daily update of a 15,000+ record database in XML format. This is the source data for my app, in which I am using Core Data. The contents of the XML dump change on a daily basis in the following ways:
1) Some records will be deleted.
2) New records will be added.
3) Existing records may be modified.
What is the best way to update Core Data with the daily changes from this XML file? My thinking is that I am going to have to iterate through the pList and somehow compare that to what is already in Core Data. Not sure how to do this.
I did a search on the site and found this article but not sure if this is what I need to do: Initialize Core Data With Default Data
Thank you in advance.
Darin
You didn't say specifically, but I'm guessing that your total database size is 15,000+ records, and that your XML update contains values for all of them. Here are some ideas to consider.
Do the XML records contain a date of last modification? If not, can you add that? Then note the last time your Core Data version was updated, and ignore all XML records older than that.
For the records that are deleted, you'll have to find them in Core Data and then delete them. You'll probably see better performance if you set your fetch request's result type to NSManagedObjectIDResultType. The NSManagedObject instances don't need to be fully realized in order to delete them.
If you're stuck with undated XML, try adding an entity just for change detection. Store the 6 digit pin number, and the -hash of the entire original XML string for the relevant record. Upon update, fetch the pin/hash pair and compare. If the hash values are the same, it's unlikely that the data has changed.
This is going to turn into an optimization problem. The best way to proceed will depend on the characteristics of your data: number of attributes, size of records, size of the delta in each daily update. Structure your fetch request predicates to minimize the number of fetch requests you perform (for instance, by using the "IN" operator to pass multiple 6-digit pin numbers). Consider using NSDictionaryResultType if there's just one attribute you need. Measure first, optimize second.
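As an illustration of the IN-predicate idea, a hedged sketch; the entity and attribute names ("RecordDigest", "pin", "xmlHash") are made up for the example:

```swift
import CoreData

// Fetch stored pin/hash pairs for many pins in a single request,
// using a dictionary result type so no managed objects are realized.
func storedHashes(forPins pins: [String],
                  in context: NSManagedObjectContext) throws -> [String: Int] {
    let request = NSFetchRequest<NSDictionary>(entityName: "RecordDigest")
    request.predicate = NSPredicate(format: "pin IN %@", pins)
    request.resultType = .dictionaryResultType
    request.propertiesToFetch = ["pin", "xmlHash"]

    var hashes: [String: Int] = [:]
    for row in try context.fetch(request) {
        if let pin = row["pin"] as? String, let hash = row["xmlHash"] as? Int {
            hashes[pin] = hash
        }
    }
    return hashes
}
```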

Rankings in Azure Table

I am stuck on a design problem. I want to assign ranks to user records in a table. Users perform actions on the site and are given a rank based on a leaderboard. The selections I want on them could be Top 10, a user's position, Top 10 logged in today, etc.
I just cannot find a way to store this in an Azure table. Then I thought about storing a custom collection object (a sorted list) in a blob.
Any suggestions?
Table entities are sorted by PartitionKey, RowKey. While you could continually delete and recreate users (thus allowing you to change the PK, RK) to give the correct order, it seems like a bad idea or at least overkill. Instead, I would probably store the data that you use to compute the rankings and periodically compute and store the rankings (as you say). We do this a lot in our work - pre-compute what the data should look like in JSON view, store it in a blob, and let the UI query it directly. The trick is to decide when to re-compute the view. After a user does something that would cause the rankings to be re-computed, I would probably queue a message and let a worker process go and re-compute the view. This prevents too many workers from trying to update the data at once.
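A small sketch of the pre-computed-view idea (types and names are mine): the worker sorts, takes the top 10, and serializes JSON that would be uploaded to the blob the UI reads.

```swift
import Foundation

struct UserScore: Codable {
    let userId: String
    let score: Int
}

// Compute the "Top 10" view and serialize it; a worker process would upload
// this JSON to a blob that the UI queries directly.
func topTenJSON(from users: [UserScore]) throws -> Data {
    let ranked = users.sorted { $0.score > $1.score }.prefix(10)
    let encoder = JSONEncoder()
    encoder.outputFormatting = .prettyPrinted
    return try encoder.encode(Array(ranked))
}
```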
