How does the fetchNext(int) method work in jOOQ?

From the documentation of fetchNext(int number) -
"This will conveniently close the Cursor, after the last Record was fetched."
Assuming number=100 and there are 1000 records in total.
Will it close the cursor once the 100th record has been fetched, or once the 1000th has been fetched?
In other words, what is the "last record" referred to in the documentation?
Cursor<Record> records = dsl.select...fetchLazy();
while (records.hasNext()) {
    records.fetchNext(100).formatCSV(out);
}
out.close();

This convenience is a historic feature in jOOQ, which will be removed eventually: https://github.com/jOOQ/jOOQ/issues/8884. As with every Closeable resource in Java, you should never rely on this sort of auto closing. It is always better to eagerly close the resource when you know you're done using it. In your case, ideally, wrap the code in a try-with-resources statement.
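For example, a minimal sketch of that approach (the plain SQL query here is just a placeholder, and dsl and out are assumed to be the DSLContext and output target from your snippet):
try (Cursor<Record> records = dsl.fetchLazy("select * from my_table")) {
    while (records.hasNext()) {
        // each call fetches at most 100 records from the open cursor
        records.fetchNext(100).formatCSV(out);
    }
} // the Cursor (and its underlying JDBC ResultSet) is closed here, even if an exception is thrown
out.close();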
What the Javadoc means is that the underlying JDBC ResultSet will be closed as soon as jOOQ's call to ResultSet.next() yields false, i.e. the database returns no more records. So, no. If there are 1000 records in total from your select, and you're only fetching 100, then the cursor will not be closed. If it were, this wouldn't be a "convenience feature", but would break all sorts of other API, including the one you've called. It's totally possible to call fetchNext(100) twice, or in a loop, as you did.

Related

Consecutive calls to MongoDB updateOne: the 3rd one does not work

I receive 3 POST calls from the client, let's say within a second, and with the Node.js MongoDB driver I immediately (without any pause, sleep, etc.) try to insert the posted data into the database using updateOne. All the data is new, so an insert would happen on every call.
Here is the code (js):
const myCollection = mydb.collection("mydata")
myCollection.updateOne(
    {name: req.data.name},
    {$set: {name: req.data.name, data: req.data.data}},
    {upsert: true},
    function(err, result) { console.log("UPDATEONE err: " + err) })
When I call this updateOne just once, it works; twice in succession, it also works. But if I call it more than twice in succession, only the first two are correctly inserted into the database, and the rest are not.
The error that I get after updateOne is MongoWriteConcernError: No write concern mode named 'majority;' found in replica set configuration. However, I always get this error, even when the insertion is done correctly, so I don't think it is related to my problem.
You will probably suggest that I use updateMany, bulkWrite, etc., and you will be right, but I want to know why the insertion is not done after the first two calls.
Keep in mind that .updateOne() returns a Promise, so it should be handled properly in order to avoid concurrency issues. More info about it here.
The error MongoWriteConcernError might be related to the connection string you are using. Check if there is any &w=majority and remove it as recommended here.

How to clean up the JdbcMetadataStore?

Initially, our flow of communicating with Google Pub/Sub was as follows:
1. Application accepts a message
2. Checks that it doesn't exist in the idempotency store
3.1 If it doesn't exist - put it into the idempotency store (key is the value of a unique header, value is the current timestamp)
3.2 If it exists - just ignore this message
4. When processing is finished - send acknowledge
5. In the acknowledge success callback - remove this message from the metadata store
Point 5 is wrong because theoretically we can get a duplicated message even after the message has been processed. Moreover, we found out that sometimes the message might not be removed even though the successful callback was invoked (Message is received from Google Pub/Sub subscription again and again after acknowledge [Heisenbug]). So we decided to update the value after the message is processed and replace the timestamp with a "FINISHED" string.
But sooner or later this table will become overcrowded, so we have to clean up messages in the MetadataStore. We can remove messages which have been processed and were processed more than 1 day ago.
As was mentioned in the comments of https://stackoverflow.com/a/51845202/2674303, I can add an additional column to the metadata store table where I could mark whether a message is processed. That is not a problem at all. But how can I use this flag in my cleaner? MetadataStore has only a key and a value.
In the acknowledge success callback - remove this message from the metadata store
I don't see a reason for this step at all.
Since you say that you store a timestamp in the value, that means you can analyze this table from time to time and remove entries that are definitely old.
In one of my projects we have a daily job in the DB to archive a table for better main-process performance, simply because we don't need the old data any more. For this reason we check a timestamp in the row to determine whether it should go into the archive or not. I wouldn't remove data immediately after processing, precisely because there is a chance of redelivery from the external system.
On the other hand, for better performance, I would add an extra indexed column of timestamp type to that metadata table and populate its value via a trigger on each update or insert. Well, the MetadataStore just inserts an entry from the MetadataStoreSelector:
return this.metadataStore.putIfAbsent(key, value) == null;
So, you need an on-insert trigger to populate that date column. This way, at the end of the day, you will know whether an entry needs to be removed or not.
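Purely as an illustration, a rough sketch of such a daily cleaner; it assumes the default INT_METADATA_STORE table plus the extra indexed CREATED_AT column described above (the column name, cron expression, and one-day cutoff are placeholders), and reuses the 'FINISHED' marker from the question:
import java.sql.Timestamp;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class MetadataStoreCleaner {

    private final JdbcTemplate jdbcTemplate;

    public MetadataStoreCleaner(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Once a day, remove entries that were marked as processed ('FINISHED')
    // and whose CREATED_AT (the assumed extra indexed column) is older than one day.
    @Scheduled(cron = "0 0 1 * * *")
    public void cleanUp() {
        jdbcTemplate.update(
                "DELETE FROM INT_METADATA_STORE"
              + " WHERE METADATA_VALUE = 'FINISHED'"
              + " AND CREATED_AT < ?",
                Timestamp.from(Instant.now().minus(1, ChronoUnit.DAYS)));
    }
}
Note that @EnableScheduling must be active in the application for the @Scheduled job to fire.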

Guava cache not taking values from cache and refreshing for every call

I have created a cache in my Spark application to refresh a few values every 6 hours.
The code looks like this:
val cachedList: Cache[String, String] = CacheBuilder.newBuilder()
    .maximumSize(10)
    .expireAfterWrite(refreshTimeInSeconds.toLong, TimeUnit.SECONDS)
    .build()
This method gets the items and caches them for me:
def prepareUnfilteredAccountList(): Array[String] = {
    logger.info("Refreshing the UNFILTERED Accounts List from API")
    val unfilteredAccounts = apiService.getElementsToCache().get.map(_.accountNumber).toArray
    logger.trace("cached list has been refreshed. New List is " + unfilteredAccounts.mkString(","))
    unfilteredAccounts
}
This method is used to get the cached values as a list:
def getUnfilteredList(): Array[String] = {
    unfilteredAccounts.get("cached_list", new Callable[String]() {
        override def call(): String = {
            prepareUnfilteredAccountList().mkString(",")
        }
    }).split(",")
}
However, I observed that the cache is getting refreshed on every call, instead of after the specified time period.
First off, if you want to store a list or an array in a Cache you can do so, there's no need to convert it into a string and then split it back into an array.
Second, maximumSize() configures how many entries are in your cache - by the looks of it your cache only ever has one entry (your list), so specifying a maximum size is meaningless.
Third, if you're only trying to cache one value you might prefer the Suppliers.memoizeWithExpiration() API, which is simpler and less expensive than a Cache (see the sketch after this answer).
Fourth, in prepareUnfilteredAccountList() unfilteredAccounts appears to be an array, but in getUnfilteredList() the same variable appears to be a cache. At best this is going to be confusing for you to work with. Use distinct variable names for distinct purposes. It's possible this is the cause of your problems.
All that said, calling Cache.get(K, Callable) should work as you expect - it will invoke the Callable only if the given key is not already present in the cache or its entry has expired. If that isn't the behavior you're seeing, the bug is likely elsewhere in your code. Perhaps your refreshTimeInSeconds is not what you expect, or you're not actually reading the value you're caching, or the cache is actually working as intended and you're misdiagnosing its behavior as erroneous.
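To make the Suppliers.memoizeWithExpiration() suggestion concrete, here is a rough sketch in Java (the Scala equivalent is analogous); AccountCache and fetchAccountsFromApi() are made-up names standing in for your apiService.getElementsToCache() call:
import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class AccountCache {

    // Recomputes the list at most once every 6 hours; calls in between return the memoized value.
    private final Supplier<List<String>> unfilteredAccounts =
            Suppliers.memoizeWithExpiration(this::fetchAccountsFromApi, 6, TimeUnit.HOURS);

    public List<String> getUnfilteredList() {
        return unfilteredAccounts.get();
    }

    // Stand-in for the real API call that loads the account numbers.
    private List<String> fetchAccountsFromApi() {
        return List.of("account-1", "account-2");
    }
}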

Is old data accessible in CouchDB?

I've read a bit about CouchDB and I'm really intrigued by the fact that it's "append-only". I may be misunderstanding that, but as I understand it, it works a bit like this:
data is added at time t0 to the DB telling that a user with ID 1's name is "Cedrik Martin"
a query asking "what is the name of the user with ID 1?" returns "Cedrik Martin"
at time t1 an update is made to the DB telling: "User with ID 1's name is Cedric Martin" (changing the 'k' to a 'c').
a query asking again "what is the name of the user with ID 1" now returns "Cedric Martin"
It's a silly example, but it's because I'd like to understand something fundamental about CouchDB.
Given that the update has been made by appending to the end of the DB, is it possible to query the DB "as it was at time t0", without doing anything special?
Can I ask CouchDB "What was the name of the user with ID 1 at time t0?" ?
EDIT: the first answer is very interesting, so I've got a more precise question: as long as I'm not "compacting" the CouchDB database, can I write queries that are somehow "referentially transparent" (i.e. they'll always produce the same result)? For example, if I query for "document d at revision r", am I guaranteed to always get the same answer back as long as I'm not compacting the DB?
Perhaps the most common mistake made with CouchDB is to believe it provides a versioning system for your data. It does not.
Compaction removes all non-latest revisions of all documents and replication only replicates the latest revisions of any document. If you need historical versions, you must preserve them in your latest revision using any scheme that seems good to you.
"_rev" is, as noted, an unfortunate name, but no other word has been suggested that is any clearer. "_mvcc" and "_mcvv_token" have been suggested before. The issue with both is that any description of what's going on there will inevitably include the "old versions remain on disk until compaction" which will still imply that it's a user versioning system.
To answer the question "Can I ask CouchDB "What was the name of the user with ID 1 at time t0?" ?", the short answer is "NO". The long answer is "YES, but then later it won't work", which is just another way of saying "NO". :)
As already said, it is technically possible, but you shouldn't count on it. It isn't only about compaction; it's also about replication, one of CouchDB's biggest strengths. But yes, if you never compact and you don't replicate, then you will always be able to fetch all previous versions of all documents. I think it will not work with queries, though; they can't work with older versions.
Basically, calling it "rev" was the biggest mistake in CouchDB's design; it should have been called "mvcc_token" or something like that -- it really only implements MVCC, it isn't meant to be used for versioning.
Answer to the second question:
YES.
Changed data is always added to the tree with a higher revision number; the same rev is never changed.
For your info:
The revision (1-abcdef) is built this way: 1 = the version number (here: the first version), and the second part is a hash over the document content (not sure if there is some more "salt" in there)...
so the same doc content will always produce the same revision number (with the same CouchDB setup), even on other machines, when at the same change level (1-, 2-, 3-).
Another way is: if you need to keep old versions, you can store documents inside a bigger doc:
{
  id: "docHistoryContainer_5374",
  "doc_id": "5374",
  "versions": [
    { "v": 1,
      "date": [2012,03,15],
      "doc": { .... doc_content v1 .... }
    },
    { "v": 2,
      "date": [2012,03,16],
      "doc": { .... doc_content v2 .... }
    }
  ]
}
then you can ask for revisions:
View "byRev":
function(doc) {
    for (var curRev in doc.versions) {
        emit([doc.doc_id, doc.versions[curRev].v], doc.versions[curRev]);
    }
}
call:
/byRev?startkey=["5374"]&endkey=["5374",{}]
result:
{ id:"docHistoryContainer_5374",key=[5374,1]value={...doc_content v1 ....} }
{ id:"docHistoryContainer_5374",key=[5374,2]value={...doc_content v2 ....} }
Additionally, you can now also write a map function that emits the date in the key, so you can ask for revisions in a date range.
t0 (t1, ...) is called a "revision" in CouchDB. Each time you change a document, the revision number increases.
The doc's old revisions are stored until you decide you don't want them anymore and tell the database to "compact".
Look at "Accessing Previous Revisions" in http://wiki.apache.org/couchdb/HTTP_Document_API

Creating a pagination index in CouchDB?

I'm trying to create a pagination index view in CouchDB that lists the doc._id for every Nth document found.
I wrote the following map function, but the pageIndex variable doesn't reliably start at 1 - in fact it seems to change arbitrarily depending on the emitted value or the index length (e.g. 50, 55, 10, 25 - all start with a different file, though I seem to get the correct number of files emitted).
function(doc) {
    if (doc.type == 'log') {
        if (!pageIndex || pageIndex > 50) {
            pageIndex = 1;
            emit(doc.timestamp, null);
        }
        pageIndex++;
    }
}
What am I doing wrong here? How would a CouchDB expert build this view?
Note that I don't want to use the "startkey + count + 1" method that's been mentioned elsewhere, since I'd like to be able to jump to a particular page or the last page (user expectations and all), I'd like to have a friendly "?page=5" URI instead of "?startkey=348ca1829328edefe3c5b38b3a1f36d1e988084b", and I'd rather CouchDB did this work instead of bulking up my application, if I can help it.
Thanks!
View functions (map and reduce) are purely functional. Side-effects such as setting a global variable are not supported. (When you move your application to BigCouch, how could multiple independent servers with arbitrary subsets of the data know what pageIndex is?)
Therefore the answer will have to involve a traditional map function, perhaps keyed by timestamp.
function(doc) {
    if (doc.type == 'log') {
        emit(doc.timestamp, null);
    }
}
How can you get every 50th document? The simplest way is to add a skip=0, skip=50, or skip=100 parameter. However, that is not ideal (see below).
A way to pre-fetch the exact IDs of every 50th document is a _list function which only outputs every 50th row. (In practice you could use Mustache.JS or another template library to build HTML.)
function() {
    var ddoc = this,
        pageIndex = 0,
        row;
    send("[");
    while(row = getRow()) {
        if(pageIndex % 50 == 0) {
            if(pageIndex > 0) {
                send(","); // separate the JSON array elements
            }
            send(JSON.stringify(row));
        }
        pageIndex += 1;
    }
    send("]");
}
This will work for many situations; however, it is not perfect. Here are some considerations I am thinking of--not necessarily showstoppers, but it depends on your specific situation.
There is a reason the pretty URLs are discouraged. What does it mean if I load page 1, then a bunch of documents are inserted within the first 50, and then I click to page 2? If the data is changing a lot, there is no perfect user experience, the user must somehow feel the data changing.
The skip parameter and example _list function have the same problem: they do not scale. With skip you are still touching every row in the view starting from the beginning: finding it in the database file, reading it from disk, and then ignoring it, over and over, row by row, until you hit the skip value. For small values that's quite convenient but since you are grouping pages into sets of 50, I have to imagine that you will have thousands or more rows. That could make page views slow as the database is spinning its wheels most of the time.
The _list example has a similar problem; however, you front-load all the work, running through the entire view from start to finish, and (presumably) sending the relevant document IDs to the client so it can quickly jump around the pages. But with hundreds of thousands of documents (you call them "log" so I assume you will have a ton of them) that will be an extremely slow query, and it is not cached.
In summary, for small data sets, you can get away with the page=1, page=2 form however you will bump into problems as your data set gets big. With the release of BigCouch, CouchDB is even better for log storage and analysis so (if that is what you are doing) you will definitely want to consider how high to scale.
