Using a script to conditionally update a document in Elasticsearch - groovy

I have a use case in which concurrent update requests hit my Elasticsearch cluster. To make sure that a stale event (one made irrelevant by a newer request) does not update a document after a newer event has already reached the cluster, I would like to pass a script with my update requests that compares a field to determine whether the incoming request is relevant. The request would look like this:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '
{
  "script": "IF ctx._source.user_update_time > my_new_time THEN do not update ELSE proceed with update",
  "params": {
    "my_new_time": "2014-09-01T17:36:17.517"
  },
  "doc": {
    "name": "new_name"
  },
  "doc_as_upsert": true
}'
Is the pseudocode I wrote in the "script" field possible in Elasticsearch? If so, I would love some help with the syntax (Groovy, Python, or JavaScript).
Any alternative approach suggestions would be greatly appreciated too.

Elasticsearch has built-in optimistic concurrency control.
The way it works is that the Update API allows you to use the version parameter to control whether the update should proceed.
So taking your example above, the first index/update operation would create a document with version: 1. Now take the case where you have two concurrent requests. Components A and B will each send an updated document; both initially retrieved the document with version: 1 and will specify that version in their request (see version=1 in the query string below). Elasticsearch will update the document if and only if the provided version is the same as the current one.
Components A and B both send this, but A's request is the first to arrive:
curl -XPOST 'localhost:9200/test/type1/1/_update?version=1' -d '{
  "doc": {
    "name": "new_name"
  },
  "doc_as_upsert": true
}'
At this point the version of the document will be 2 and B's request will end up with HTTP 409 Conflict, because B assumed the document was still at version 1, even though the version increased in the meantime due to A's request.
B can definitely retrieve the document with the new version (i.e. 2) and try its update again, but this time with ?version=2 in the URL. If it's the first one to reach ES, the update will succeed.
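For completeness, B's retry would look like this (same body, bumped version in the query string):
curl -XPOST 'localhost:9200/test/type1/1/_update?version=2' -d '{
  "doc": {
    "name": "new_name"
  },
  "doc_as_upsert": true
}'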

I think the script should be like this (note that the update must be skipped by setting ctx.op to "none" when the stored time is newer):
"script": "if (ctx._source.user_update_time > my_new_time) { ctx.op = \"none\" } else { ctx._source.user_update_time = my_new_time }"
or
"script": "ctx._source.user_update_time > my_new_time ? ctx.op=\"none\" : ctx._source.user_update_time=my_new_time"

How can I use the REST API to Remove Hold on a Bill?

I've created a bill using the REST API - but now I want to release it (which exists as an action) - but I'm assuming I have to remove the HOLD before I do that. I don't see that as an action. How would I go about doing that? And if this topic exists somewhere in the documentation, please let me know - I haven't been able to find anything.
Thanks...
You can extend the Bills endpoint and add an action to it.
You need to add a new action (ReleaseFromHold in this example) to the Actions node of the Bills endpoint in the extended Web Service Endpoint.
Then you can invoke that action like this:
curl --request POST 'http://localhost/Acumatica/entity/DefaultExt/22.200.001/Bill/ReleaseFromHold' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "Entity": {
      "Type": {"value": "Bill"},
      "ReferenceNbr": {"value": "003238"}
    }
  }'
Since the hold status is still driven by the Hold field, you can also achieve that result by passing false in the Hold field of a PUT request:
PUT {{BaseUrl}}/entity/default/22.200.001/Bill
{
  "id": "589535fc-4a60-ed11-84a8-00e04cc71f57",
  "Hold": {"value": false}
}
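For consistency with the earlier example, the same request as a curl call (a sketch; substitute your own base URL and authentication):
curl --request PUT '{{BaseUrl}}/entity/default/22.200.001/Bill' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "id": "589535fc-4a60-ed11-84a8-00e04cc71f57",
    "Hold": {"value": false}
  }'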

ExpressJS: How to cache on demand

I'm trying to build a REST API with express, sequelize (PostgreSQL dialect) and node.
Essentially I have two endpoints:
Method  Endpoint      Description
GET     /api/players  Get players info, including assets
POST    /api/assets   Create an asset
And there is a mechanism which updates a property (say price) of assets, over a cycle of 30 seconds.
Goal
I want to cache the results of GET /api/players, but with some control over it: whenever a user creates an asset (via POST /api/assets), the next request to GET /api/players should return the updated data (including the price property that updates every 30 seconds) and cache it until the next update cycle.
Expected
The following should demonstrate it:
GET /api/players
JSON Response:
[
  {
    "name": "John Doe",
    "assets": [
      {
        "id": 1,
        "price": 10
      }
    ]
  }
]
POST /api/assets
JSON Request:
{
  "id": 2
}
GET /api/players
JSON Response:
[
  {
    "name": "John Doe",
    "assets": [
      {
        "id": 1,
        "price": 10
      },
      {
        "id": 2,
        "price": 7.99
      }
    ]
  }
]
What I have managed to do so far
I have made the routes, but GET /api/players has no cache mechanism and basically queries the database every time it is requested.
Some solutions I have found, though none seem to fit my scenario:
apicache (https://www.youtube.com/watch?v=ZGymN8aFsv4&t=1360s): But I don't have a specific duration, because a user can create an asset anytime.
Example implementation
I have seen a (kind of) similar implementation to what I want in GitHub Actions workflows for caching, where you define a key, and unless the key has changed the same packages are reused instead of being installed every time (example: https://github.com/python-discord/quackstack/blob/6792fd5868f28573bb8f9565977df84e7ba50f42/.github/workflows/quackstack.yml#L39-L52)
Is there any package to do that? Then, while processing POST /api/assets, I could change the key in its handler so that GET /api/players gives me the updated result (and I could also change the key in the 30-second cycle), and after that it would give me the cached result until the next update cycle.
Note: If you have a solution, please try to stick with npm packages rather than something like redis, unless it's the only/best solution.
Thanks in advance!
(P.S. I'm a beginner and this is my first question on SO)
Typically caching is done with the help of Redis. Redis is an in-memory key-value store. You could handle the cache in the following manner.
In your handler for the POST operation, update/reset the cached entry for players.
In your handler for the GET operation, if Redis has the entry in cache, return it; otherwise query the data, add the entry to the cache, and return the data, as in the sketch below.
Alternatively, you could use Memcached.
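A minimal sketch of that pattern, assuming the node-redis v4 client; getPlayersFromDb and saveAsset are hypothetical database helpers:
const express = require('express');
const { createClient } = require('redis');

const app = express();
app.use(express.json());
const redis = createClient();

const PLAYERS_KEY = 'players'; // cache key for GET /api/players

app.get('/api/players', async (req, res) => {
  const cached = await redis.get(PLAYERS_KEY);
  if (cached) return res.json(JSON.parse(cached)); // cache hit
  const players = await getPlayersFromDb(); // hypothetical query
  await redis.set(PLAYERS_KEY, JSON.stringify(players));
  res.json(players);
});

app.post('/api/assets', async (req, res) => {
  const asset = await saveAsset(req.body); // hypothetical insert
  await redis.del(PLAYERS_KEY); // invalidate so the next GET is fresh
  res.status(201).json(asset);
});

// The 30-second price cycle can invalidate the same key when it runs:
// setInterval(() => redis.del(PLAYERS_KEY), 30000);

redis.connect().then(() => app.listen(3000));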
A bit late to this answer but I was looking for a similar solution. I found that the apicache library not only allows for caching for specified durations, but the cache can also be manually cleared.
apicache.clear([target]) - clears cache target (key or group), or entire cache if no value passed, returns new index.
Here is an example for your implementation:
// POST /api/assets
app.post('/api/assets', function(req, res, next) {
  // update assets then clear cache
  apicache.clear()
  // or only clear the specific players cache by using a parameter
  // apicache.clear('players')
  res.send(response)
})
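For apicache.clear('players') to work, the GET route must be assigned to that cache group. A sketch based on apicache's grouping feature ('players' and getPlayers are example names):
const apicache = require('apicache')
const cache = apicache.middleware

// GET /api/players, cached until the 'players' group is cleared
app.get('/api/players', cache('1 hour'), async (req, res) => {
  req.apicacheGroup = 'players' // assign this response to the group
  res.json(await getPlayers()) // hypothetical database query
})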

I am using river-couchdb with ElasticSearch and would like to reset its last_seq to 0. Anybody know if that's possible?

I am using a plugin for Elasticsearch called river-couchdb to create a full-text index of my CouchDB. It uses the CouchDB _changes API to listen for documents. I assume it keeps track of the last seq from the _changes API.
Sometimes we rebuild our CouchDB and set our last_seq back to 0. The only way I've found to reset the river-couchdb seq is to delete both its index and the river itself and recreate them. Is there a better way?
As far as I remember, you have a _seq document in your _river index for your river.
This document has a _last_seq entry.
If you want to restart from scratch, I think you can simply delete this document:
curl -XDELETE localhost:9200/_river/yourrivername/_seq
Does it help?
From the couchdb-river manual ("Starting at a specific sequence"):
curl -XDELETE localhost:9200/_river/yourrivername/_seq
curl -XPUT 'localhost:9200/_river/yourrivername/_seq' -d '
{
  "couchdb": {
    "last_seq": "100"
  }
}'
Elasticsearch can't update last_seq without deleting the old document.

Can I retrieve all revisions of a deleted document?

I know I can retrieve all revisions of an "available" document, but can I retrieve the last "available" version of a deleted document? I do not know the revision id prior to the delete. This is the command I am currently running...it returns {"error":"not_found","reason":"deleted"}.
curl -X GET http://localhost:5984/test_database/a213ccad?revs_info=true
I've hit this problem trying to recover a deleted document; here is my solution:
0) Until you run a compaction, you can get the deletion history, e.g.:
curl http://example.iriscouch.com/test/_changes
1) You'll see deleted documents with $id and $rev; put an empty document as a new version, e.g.:
curl -X PUT http://example.iriscouch.com/test/$id?rev=$rev -H "Content-Type: application/json" -d '{}'
2) Now you can get all revisions info, e.g.:
curl http://example.iriscouch.com/test/$id?revs_info=true
See also Retrieve just deleted document
Besides _changes, another good way to do this is to use keys with _all_docs:
GET $MYDB/_all_docs?keys=["foo"] ->
{
  "offset": 0,
  "rows": [
    {
      "id": "foo",
      "key": "foo",
      "value": {
        "deleted": true,
        "rev": "2-eec205a9d413992850a6e32678485900"
      }
    }
  ],
  "total_rows": 0
}
Note that it has to be keys; key will not work, because only keys returns info for deleted docs.
You can get the last revision of a deleted document, however you must first determine its revision id. To do that, you can query the _changes feed and scan for the document's deletion record — this will contain the last revision and you can then fetch it using docid?rev=N-XXXXX.
I remember some mailinglist discussion of making this easier (as doing a full scan of the changes feed is obviously not ideal for routine usage), but I'm not sure anything came of it.
I've hit this several times recently, so for anyone else wandering by ...
This question typically results from a programming model that needs to know which document was deleted. Since user keys such as 'type' don't survive deletion and _id is best assigned by Couch, it would often be nice to peek under the covers and see something about the doc that was deleted. An alternative is to have a process that sets deleted: true (no underscore) on documents, and to adjust any listener filters, etc., to look for deleted: true, as in the sketch below. One of the processes can then actually delete the document. This means that any process triggering on the document doesn't need to track an _id for eventual deletion.
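A minimal sketch of such a filter, assuming the soft-delete flag is named deleted and the filter is saved as filters.soft_deleted in a design doc called myapp:
// Surfaces soft-deleted documents on the _changes feed.
function (doc, req) {
  return doc.deleted === true;
}
// Poll with: curl 'http://localhost:5984/db/_changes?filter=myapp/soft_deleted'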

Change notification in CouchDB when a field is set

I'm trying to get notifications in a CouchDB changes poll as soon as a pre-defined field is set or changed. I've already had a look at filters that can be used for filtering change events (db/_changes?filter=myfilter). However, I've not yet found a way to include this temporal information, because you can only see the current version of the document in these filter functions.
Is there any possibility to create such a filter?
If it does not work, I could export my field to a separate database and then only poll for changes in that db, but I'd prefer to keep my data together for obvious reasons.
Thanks in advance!
You are correct: filters and _changes feeds can only see snapshots of a document. What you need is a function which can see the old document and the new document and act correctly. But that is unavailable in _filters and _changes.
Obviously your client code knows when it updates that field, so you could change your client code; however, there is a better solution.
Update functions can access both documents. I suggest you make an _update
function which notices the field change and flags that in the document. Next you
have a simple filter checking for that flag. The best part is, you can use a
rewrite function to make the HTTP API exactly the same as before.
1. Create an update function to flag interesting updates
Your _design/myapp would contain {"updates": {"smart_updater": "(see below)"}}.
Update functions are very flexible (see my recent update handlers
walkthrough). However we only want to mimic the normal HTTP/JSON API.
Your updates.smart_updater field would look like this:
function (doc, req) {
  var INTERESTING = 'dollars'; // Set me to the interesting field.
  var newDoc = JSON.parse(req.body);
  if (newDoc.hasOwnProperty(INTERESTING)) {
    // dollars was set (which includes 0, false, null, and undefined
    // values). You might test newDoc[INTERESTING] instead if those
    // values should not trigger this code.
    if ((doc === null) || (doc[INTERESTING] !== newDoc[INTERESTING])) {
      // The field was changed or created!
      newDoc.i_was_changed = true;
    }
  }
  if (!newDoc._id) {
    // A UUID generator would be better here.
    newDoc._id = req.id || Math.random().toString();
  }
  // Return the same JSON the vanilla Couch API does.
  return [newDoc, {json: {'id': newDoc._id}}];
}
Now you can PUT or POST to /db/_design/myapp/_update/smart_updater/[doc_id] and it will feel
just like the normal API, except if you update the dollars field, it will add
an additional flag, i_was_changed. That is how you will find this change
later.
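For example (hypothetical database name, document id, and value):
curl -X PUT http://localhost:5984/db/_design/myapp/_update/smart_updater/mydoc \
  -H "Content-Type: application/json" \
  -d '{"dollars": 42}'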
2. Filter for documents with the changed field
This is very straightforward:
function(doc, req) {
  return doc.i_was_changed;
}
Now you can query the _changes feed with a ?filter= parameter. (Replication
also supports this filter, so you could pull to your local system all documents
which most recently changed/created the field.)
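Assuming the filter above is saved as filters.flagged in _design/myapp, polling looks like:
curl 'http://localhost:5984/db/_changes?filter=myapp/flagged'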
That is the basic idea. The remaining steps will make your life easier if you
already have lots of client code and do not want to change the URLs.
3. Use rewriting to keep the HTTP API the same
This is available in CouchDB 0.11, and the best resource is Jan's blog post,
nice URLs in CouchDB.
Briefly, you want a vhost which sends all traffic to your rewriter (which itself
is a flexible "bouncer" to all design doc functionality based on the URL).
curl -X PUT http://example.com:5984/_config/vhosts/example.com \
-d '"/db/_design/myapp/_rewrite"'
Then you want a rewrites field in your design doc, something like this (not
tested):
[
  {
    "comment": "Updates should go through the update function",
    "method": "PUT",
    "from": "db/*",
    "to": "db/_design/myapp/_update/*"
  },
  {
    "comment": "Creates should go through the update function",
    "method": "POST",
    "from": "db/*",
    "to": "db/_design/myapp/_update/*"
  },
  {
    "comment": "Everything else is just like normal",
    "from": "*",
    "to": "../../../*"
  }
]
(Once again, I got this code from examples and existing code I have lying
around, but it's not 100% debugged. However, I think it makes the idea very clear.
Also remember this step is optional; the advantage is that you never have to
change your client code.)
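If the rewrites work as intended, clients keep using plain document URLs against the vhost; a hypothetical example:
curl -X PUT http://example.com/db/mydoc \
  -H "Content-Type: application/json" \
  -d '{"dollars": 42}'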
