Cursor-based pagination in Cassandra? - cassandra

I am planning to build cursor-based pagination APIs for the data in Cassandra.
Does the Cassandra Java driver support cursor-based pagination? Or what would be a good approach to achieve this?

Please review this DataStax document as it talks about large result sets and how to save "state" to continue where you left off:
https://docs.datastax.com/en/developer/java-driver-dse/1.2/manual/paging/
In particular, I believe this is the part that covers what you're trying to do:
"Saving and reusing the paging state
Sometimes it is convenient to save the paging state in order to restore it later. For example, consider a stateless web service that displays a list of results with a link to the next page. When the user clicks that link, we want to run the exact same query, except that the iteration should start where we stopped on the previous page."
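
The stateless flow described in that quote can be sketched in a few lines. This is only an illustration in plain Python, not driver code: the in-memory ROWS list stands in for a Cassandra result set, and the offset packed into the token stands in for the driver's paging state, which in reality is an opaque value that you store and pass back unchanged. All names here (ROWS, encode_cursor, decode_cursor, fetch_page) are hypothetical.

```python
import base64
import json

# Stand-in for a Cassandra table; in a real service the rows come from the driver.
ROWS = [f"row-{i}" for i in range(1, 8)]


def encode_cursor(offset):
    """Pack the resume position into an opaque, URL-safe token."""
    raw = json.dumps({"offset": offset}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token):
    """Recover the resume position from a token issued by encode_cursor."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return json.loads(raw)["offset"]


def fetch_page(page_size, cursor=None):
    """Return one page plus the cursor for the next page (None when exhausted)."""
    start = decode_cursor(cursor) if cursor else 0
    page = ROWS[start:start + page_size]
    next_offset = start + len(page)
    next_cursor = encode_cursor(next_offset) if next_offset < len(ROWS) else None
    return page, next_cursor
```

The web service returns next_cursor with each response and the client sends it back to get the following page, so the server keeps no per-client state. With the real driver, the token would be the serialized paging state taken from the previous result set, and the same query would be re-run with that state set on the statement, as the linked manual describes.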

Related

Robot's Tracker Threads and Display

Application: The proposed application has a TCP server able to handle several connections with the robots.
I chose to work with a database rather than files, so I'm using an SQLite db to store information about the robots and their full history, robot models, tasks, etc.
The robots send us several kinds of data, such as odometry, task information, and so on.
I create a thread for every new robot connection to handle the messages and update the robot's information in the database. Now let's talk about my problems:
The application has to show information about the robots in real time. I was thinking about using QSqlQueryModel, setting the right query and then showing it in a QTableView, but I ran into some problems to think about:
Problem number 1: Some of the information to show in the QTableView is not in the database. I have the current consumption and the actual charge (as capacity) in the database, but I also want to show the remaining battery time in my table. How can I add that column with the right behaviour (the math implemented) to my TableView?
Problem number 2: I will be receiving messages every second for each robot, so updating the db and the GUI (reloading the query) each time may not be the best solution when a large number of robots are connected. Is it better to update the table directly and only update the db every minute or so? If I use this method I can't use QSqlQueryModel to update the table, so what approach do you recommend?
Thanks
SancheZ
I have run into a similar problem before; my conclusion was that QSqlQueryModel is not the best option for display purposes. You may want to do some processing on the query results, or you may want to create, remove, or change display data based on the result for a fancier GUI. I think it is best to implement your own delegates and override the view-related methods such as setData and setEditorData.
This way you have control over all your columns and a direct mapping between the raw data and its display equivalent (i.e. EditRole, UserRole).
Yes, it is better to update your view in real time and run a batch execute at a lower frequency to update the bulk data. In general the app is the middle layer and the db is the bottom layer for data monitoring, unless you use an in-memory db with a shared cache.
EDIT: One important point: you should not run updates from multiple threads (you can, but SQLite blocks the thread until it gets the lock), so it is best to run updates from a single thread.
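
As a rough sketch of that approach (latest state kept in memory for the view, a derived remaining-time column, and a low-frequency batch write from one thread), here is a small Python example using the standard sqlite3 module. It is not Qt code. The table layout, the class and function names, and the units (charge in Ah, consumption in A, so remaining time = charge / consumption in hours) are all assumptions made for illustration.

```python
import sqlite3


def remaining_hours(charge_ah, current_a):
    """Estimate runtime as charge / current; None when the robot draws no current."""
    if current_a <= 0:
        return None
    return charge_ah / current_a


class RobotStore:
    """Keeps the latest state per robot in memory and flushes it to SQLite in batches."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS robots ("
            "id TEXT PRIMARY KEY, charge_ah REAL, current_a REAL)"
        )
        self.latest = {}    # robot id -> (charge_ah, current_a); what the view reads
        self.dirty = set()  # robot ids changed since the last flush

    def on_message(self, robot_id, charge_ah, current_a):
        """Called for every incoming message; touches memory only."""
        self.latest[robot_id] = (charge_ah, current_a)
        self.dirty.add(robot_id)

    def table_rows(self):
        """Rows for the view, including the derived remaining-time column."""
        return [
            (rid, charge, current, remaining_hours(charge, current))
            for rid, (charge, current) in sorted(self.latest.items())
        ]

    def flush(self):
        """Write all changed robots in one transaction; call from a single thread."""
        rows = [(rid, *self.latest[rid]) for rid in sorted(self.dirty)]
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO robots (id, charge_ah, current_a) "
                "VALUES (?, ?, ?)",
                rows,
            )
        self.dirty.clear()
        return len(rows)
```

The per-robot threads would only call on_message (behind a lock in a real multi-threaded program, which this sketch omits), while a single writer calls flush on a timer, for example once a minute.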

Loading of view takes too long in XPages

I have 4 databases and they contain more than 200,000 documents. A viewPanel that shows all the documents of a database does not load correctly. It fails with an error after a short wait. If the view does not contain many documents, no error is raised.
I could not find a solution for this situation :(
I added this line to the Application Properties but it did not solve my problem.
xsp.domino.view.navigator=ByNoteId
Regards
Cumhur Ata
There are a number of performance "sins" you can commit on Domino. Unfortunately Domino is too forgiving and somehow still works even if you commit them. The typical sins:
Using @Yesterday, @Today, @Now, @Tomorrow in a view selection formula or a sorted column in a view. I wrote an article about your options to mitigate that
Having code that does a view.refresh before opening a page
Using reader fields and accessing a view that is not categorized by that reader field. This hits only users who can see only a few documents. Check this article for possible remedies
Not having a fast temp location for view rebuilds. Typical errors are: not enough disk I/O or having your transaction log on the same channel as your databases. Make sure you have a high performance server
For Windows servers: not taking care of disk fragmentation (includes links to performance troubleshooting)
Not using ODS 51/52 and not having compression for data and design active. It takes a simple command to fix it
That's what you can check, off the top of my head. Loading 200k documents into a panel in one go doesn't look like a good UX approach. Consider paginating it.

Limiting documents in temporary views

I'm playing around with map and reduce through temporary views, but at 1,000,000+ documents it is a bit slow. Rather than creating a separate dataset for testing, is it possible to use only a subset of the data in the temporary view?
A map-reduce view is more like "CREATE INDEX" than it is like "SELECT * FROM".
In other words, when you do a map-reduce view, CouchDB will crunch through every document.
However, for testing, one thing you can do is make a normal view (not temporary). Just develop your work in a temporary design document, _design/my_experiments.
Save your map-reduce view code and then query the view with the ?stale=update_after option. You will probably get no results, however stale=update_after will tell CouchDB to begin processing the view. Now try your query again. You will see the results that have been processed so far. Now try a third time. You will see even more data reflected.
Roughly speaking, views process documents in the same order that a _changes query returns them to you: the earliest update is processed first, the rest follow in order, and the most recent change is processed last.
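
For reference, the query URL for such a view can be put together as below. This is a minimal sketch: the host, the database name mydb and the view name by_type are made up for illustration, and only _design/my_experiments and stale=update_after come from the answer above.

```python
from urllib.parse import urlencode


def view_url(base, db, design_doc, view, **params):
    """Build the query URL for a view stored in a design document."""
    path = f"{base}/{db}/_design/{design_doc}/_view/{view}"
    return f"{path}?{urlencode(params)}" if params else path


# Hypothetical server, database and view names.
url = view_url(
    "http://localhost:5984", "mydb", "my_experiments", "by_type",
    stale="update_after", limit=10,
)
```

Requesting this URL repeatedly (with any HTTP client) returns whatever has been indexed so far while CouchDB keeps processing in the background.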

Most efficient way to create couchdb views

My CouchDB view indexes are being created more slowly than I would like. Writing the documents is not such a problem, but the users can edit them offline and then bulk update, which seems to slow things right down.
This answer helped, but I was wondering whether it is better to separate the various views into different design documents (eg1) or to store them all in one (eg2).
Eg. 1
_design/posts/_view/id
_design/comments/_view/id
_design/tags/_view/id
Eg.2
_design/webresources/_view/_id?key="posts"
_design/webresources/_view/_id?key="comments"
_design/webresources/_view/_id?key="tags"
*This example is just for illustration purposes. I am only concerned with the time it takes to build the indexes.
You will gain better performance if you read often. CouchDB views are updated and built at read time, so you can read the view every time a document is updated to keep it hot*.
Or maybe listen to the changes feed and keep track of the documents updated. Once they reach a certain threshold, read a view.
Another option is to use the stale parameter.
If stale=ok is set, CouchDB will not refresh the view even if it is stale; the benefit is improved query latency. If stale=update_after is set, CouchDB will update the view after the stale result is returned.
Every design document is a separate Erlang process, so separating your views across different design documents will cause them to be built concurrently. However, each view will still be built in a blocking manner. That is, two views in different design documents can start updating at the same time, but the time it takes to update each individual view will be the same as if they were in the same design document.
*You don't necessarily have to care about the result. Our goal here is to trick CouchDB into updating the view. So you can fire off a request in a separate async process and be done with it.
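
The "count changes, then read the view once a threshold is reached" idea could look roughly like this. It is a sketch only: ViewWarmer is a hypothetical helper, the change entries would come from whatever client you use to follow the changes feed, and trigger is any callable that queries the view (its result can be discarded).

```python
class ViewWarmer:
    """Counts document updates and triggers a view read once a threshold is reached."""

    def __init__(self, threshold, trigger):
        self.threshold = threshold
        self.trigger = trigger  # callable that queries the view; result may be ignored
        self.pending = 0

    def on_change(self, change):
        """Feed one entry from the changes feed; returns True if a refresh was fired."""
        self.pending += 1
        if self.pending >= self.threshold:
            self.pending = 0
            self.trigger()
            return True
        return False
```

A threshold keeps the index reasonably fresh without issuing a view read for every single document update during a bulk sync.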

How does ManifoldCF job scheduling behave?

I am working on integrating ManifoldCF (MCF) with the Alfresco CMS as a repository connector using CMIS queries, with Solr as the output channel where all indexes are stored. This works fine and I can search documents in the Solr index.
Now, as part of the implementation, I am planning to introduce multiple repositories such as SharePoint, file systems, etc., so I now have three document repositories: Alfresco, SharePoint and the file system. I am planning to have scheduled jobs that run through each of the repositories and crawl them at particular intervals. But I have the following concerns.
Although I am scheduling jobs at frequent intervals, I want to make sure that MCF jobs pick up only content that is newly added or updated. Say I have 100 docs during the current job run and 110 at the next job run; I only want the job to process the 10 new docs, not all 110.
As there are relatively few MCF tutorials available, I have no means to verify that MCF jobs behave this way. I assume MCF is intelligent enough to do so, but I have no proof to substantiate it.
I want to know more about the MCF job schedule types: scan every document once / rescan documents directly. Similarly, I want to know more about job invocation: complete/minimal. Sorry for being a newbie.
I am also considering some custom coding to ensure that only new or updated docs are eligible for processing, but again that means going through the code, as little documentation is available.
Is it wise to do custom coding in this case, or does MCF provide all these features OOTB?
Many thanks in advance.
ManifoldCF schedules the job based on what you have configured for the job.
It depends on how your repository connector is written. Usually, when a job runs, it calls getDocumentVersions() of the repository connector; if the version of a document is different from the earlier version, ManifoldCF indexes that document, otherwise not. Usually your document version string is the last modified date of the document.
Unfortunately, ManifoldCF does not have much documentation from the developer's perspective; your best bet is to go through the code. It is quite self-explanatory.
This is how minimal is described in the MCF documentation:
Using the "minimal" variant of the listed actions will perform the minimum possible amount of work, given the model that the connection type for the job uses. In some cases, this will mean that additions and modifications are indexed, but deletions are not detected. (mcf doc jobs)
You should implement your logic in public String[] getDocumentVersions(..)
The OOTB feature is quite enough. One additional thing to consider is the permissions of the documents: if the permissions of a document change, you can choose to change the version of the document.
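
The version-string comparison described above can be illustrated with a small standalone sketch. This is not ManifoldCF code and does not use its API; the document layout, version_string and plan_crawl are hypothetical names chosen to show why only new or changed documents get re-indexed, and how folding permissions into the version string makes a permission change count as a change.

```python
def version_string(doc):
    """Combine last-modified time and permissions so a change in either forces re-indexing."""
    acl = ",".join(sorted(doc["permissions"]))
    return f"{doc['modified']}|{acl}"


def plan_crawl(repository_docs, indexed_versions):
    """Return (ids to index, ids unchanged) by comparing against stored version strings."""
    to_index, unchanged = [], []
    for doc_id, doc in sorted(repository_docs.items()):
        if indexed_versions.get(doc_id) == version_string(doc):
            unchanged.append(doc_id)
        else:
            to_index.append(doc_id)
    return to_index, unchanged
```

A document that was never indexed has no stored version string, so it always lands in the first list, which is how the 10 new docs out of 110 would be picked up on the next run.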
