The GitHub API allows us to get a check run by ID: https://docs.github.com/en/rest/reference/checks#get-a-check-run / https://octokit.github.io/rest.js/v18#checks-get
This can be used to read the output of a check run, and new output can then be appended to it using the update method: https://docs.github.com/en/rest/reference/checks#update-a-check-run / https://octokit.github.io/rest.js/v18#checks-update
However, if some other task has modified the check run between the read and the write, that external modification will be lost.
Given that the GitHub API for checks (https://docs.github.com/en/rest/reference/checks) does not have an upsert method, is there a general approach that can be used to perform upserts for such APIs?
I keep thinking of some kind of locking scheme, but I haven't found a satisfying answer or been able to come up with a solution myself.
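For concreteness, the read-modify-write I have in mind looks roughly like this with Octokit (a sketch, not production code; it assumes the check run already has a title and summary in its output, and owner/repo/checkRunId are placeholders):

const { Octokit } = require("@octokit/rest");

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function appendToCheckRunOutput(owner, repo, checkRunId, extraText) {
  // Read the current output of the check run.
  const { data: run } = await octokit.rest.checks.get({
    owner,
    repo,
    check_run_id: checkRunId,
  });

  // Write it back with the new text appended. If another task updated the
  // check run between the get and the update, that change is overwritten.
  await octokit.rest.checks.update({
    owner,
    repo,
    check_run_id: checkRunId,
    output: {
      title: run.output.title,
      summary: run.output.summary,
      text: (run.output.text || "") + "\n" + extraText,
    },
  });
}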
Related
I was looking at the sample code here: https://docs.aws.amazon.com/qldb/latest/developerguide/getting-started.python.step-5.html, and noticed that the three functions get_document_id_by_gov_id, is_secondary_owner_for_vehicle and add_secondary_owner_for_vin are executed in three separate driver.execute_lambda calls. So if two concurrent requests both try to add a secondary owner, would this trigger a serialization conflict for one of them?
The reason I'm asking is that I initially thought we would have to run all three functions within the same execute_lambda call for a serialization conflict to happen, since each execute_lambda call uses one session, which in turn uses one transaction. If they are run in three execute_lambda calls, they are spread across three transactions, and QLDB wouldn't be able to detect a conflict. But it seems my assumption is not true, and the only benefit of batching the calls would be better performance?
I got an answer from a QLDB specialist, so I'm going to answer my own question: the operations should have been wrapped in a single transaction, so my original assumption was actually true. They are going to update the code sample to reflect this.
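For illustration, wrapping all three steps in one transaction would look roughly like this in Node.js (the tutorial's code is Python; this assumes the amazon-qldb-driver-nodejs executeLambda API, the PartiQL is paraphrased from the tutorial, and the ledger name is a placeholder):

const { QldbDriver } = require("amazon-qldb-driver-nodejs");

const driver = new QldbDriver("vehicle-registration"); // assumed ledger name

async function addSecondaryOwner(vin, govId) {
  // All three steps share one transaction, so a concurrent attempt to modify
  // the same registration is detected by OCC and one of the requests retries.
  await driver.executeLambda(async (txn) => {
    // 1. Get the person's document id by government id.
    const result = await txn.execute(
      "SELECT id FROM Person AS p BY id WHERE p.GovId = ?", govId);
    const personId = result.getResultList()[0].get("id").stringValue();

    // 2. Check whether this person is already a secondary owner for the VIN
    //    (omitted here for brevity).

    // 3. Add them as a secondary owner.
    await txn.execute(
      "FROM VehicleRegistration AS r WHERE r.VIN = ? " +
        "INSERT INTO r.Owners.SecondaryOwners VALUE ?",
      vin, { PersonId: personId });
  });
}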
I'm trying to understand whether the node-cache package uses locks for the cache object, but I can't find anything about it.
I looked at the source code and it doesn't look like it, but this answer suggests otherwise with the quote:
So there is Redis and node-cache for memory locks.
This cache is used in a CRUD server and I want to make sure that GET/UPDATE requests will not create a race condition on the data.
I don't see any evidence of locking in the code.
If two requests are made one after the other for the same key that is not yet in the cache, it will launch two separate fetch() operations, and whichever request comes back last is the one whose value remains in the cache. This is probably not normally a problem, but an improved implementation could make only one request for that key and have the second caller wait for the value that is already in flight.
Since the cache itself is all in-memory, all access to the cache is synchronous and thus regulated by JavaScript's single-threaded nature. So, the only place concurrency issues could affect things in the cache code itself is when it launches an asynchronous fetch() operation.
There are, of course, race conditions waiting to happen in how one uses the code that accesses the data, just as there are with a database interface, so the calling code has to be smart about how it uses the interface to avoid creating race conditions.
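The "only one fetch per key" improvement mentioned above doesn't require locks, just a map of in-flight promises. A minimal sketch (getOrFetch and fetchValue are hypothetical names, not part of node-cache):

const NodeCache = require("node-cache");

const cache = new NodeCache({ stdTTL: 60 });
const inFlight = new Map(); // key -> Promise for a fetch that is still running

async function getOrFetch(key, fetchValue) {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;         // cache hit

  if (inFlight.has(key)) return inFlight.get(key); // someone is already fetching

  const promise = fetchValue(key)
    .then((value) => {
      cache.set(key, value);                       // populate the cache once
      return value;
    })
    .finally(() => inFlight.delete(key));          // allow future refetches

  inFlight.set(key, promise);
  return promise;
}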
Unfortunately no; you can write a unit test to confirm it.
I have written a library to fix that, and I also added a read-through method to ease usage:
https://github.com/KhanhPham2411/node-cache-async-lock
I have inherited a website built on ExpressionEngine which is having a lot of trouble under load. Looking at the database server console, I am seeing a lot of database writes (300-800/second).
I'm trying to track down why we are getting so much write activity compared to read activity, and I'm seeing things like:
UPDATE `exp_snippets` SET `snippet_contents` = 'some content in here' WHERE `snippet_name` = 'member_login_form'
Why would EE be writing these to the database when no administrative changes are happening, and how can I turn this behavior off?
Are there any other bottlenecks that could be avoided? The site is using an EE ad module, so I cannot easily run it through Varnish since the ads need to change on each page load. I'm looking to integrate DFP instead so the ads can be loaded asynchronously.
There are a lot of front-end operations that trigger INSERT and UPDATE operations (having to do with tracking users, hits, and sessions, as well as generating hashes for forms, etc.).
The snippets one, though, seems very strange indeed; I wouldn't think that snippets would trigger an UPDATE under normal circumstances. Perhaps the previous developer did something where the member_login_form (which has a dynamic hash in it) is written to a snippet each time it is called? I'm not sure why you would do that, but there's a guess.
For general speed optimization see:
Optimizing ExpressionEngine
There are a number of configs in the "Extreme Traffic" section that will reduce the number of writes (though not the snippet one, which doesn't seem to be normal behavior).
I have a use case where I need a sequence to wait for a period of time before it continues. Basically it is a Thread.sleep(x), but this would mean the thread is not available to the thread pool, which could have consequences for high-load systems. I therefore have two questions:
1) What would be the best way to implement this use case?
2) How much of a burden would using Thread.sleep be for WSO2?
Alternative solutions, for example using topics, are also welcome :)
Hope you guys can help!
Answering the questions in the responses:
We are sending requests to an external system and an offline data store (ODS; the DSS component of WSO2). The external system takes precedence, but when it doesn't return within one second we want the ODS to answer the request.
Alternative paths:
- The ODS is offline; in this case the system has to wait longer for the external system.
- The external system returns after some time; although the ODS result has already been sent to the requester, we still want the response of the external system to update our ODS.
We are currently investigating the Clone and Aggregate mediators.
When you say Thread.sleep(), the first thing that comes to my mind is using a Class Mediator. This would be an easy way to write custom logic and add a sleep.
The sample for "Writing your own Custom Mediation in Java" will help you learn the steps for writing a Class Mediator.
You need to copy the JAR containing the custom mediator class to repository/components/lib/.
When you use Thread.sleep() inside your mediation logic, the request will hang for the specified time period.
This may impact your performance. But you should be able to tune the parameters for your needs.
It all depends on your requirements.
I have a requirement to transform the images attached to every document (the images actually need to be shrunk to 400px width). What is the best way to achieve that? I was thinking of having Node.js code listen on _changes and perform the necessary manipulations on document save. However, this has a bunch of drawbacks:
a) a document change does not always mean that a new attachment was added
b) we would have to keep processing already-shrunk images (or at least checking the image width)
I think you basically have some data in a database and most of your problem is simply application logic and implementation. I could imagine a very similar requirements list for an application using Drizzle. Anyway, how can your application "cut with the grain" and use CouchDB's strengths?
A Node.js _changes listener sounds like a very good starting point. Node.js has plenty of hype and silly debates. But for receiving a "to-do list" from CouchDB and executing that list concurrently, Node.js is ideal.
Memoizing
I immediately think that image metadata in the document will help you. Fetching an image and checking if it is 400px could get expensive. If you could indicate "shrunk":true or "width":400 or something like that in the document, you would immediately know to skip the document. (This is an optimization; you could possibly skip it during the early phase of your project.)
But how do you keep the metadata in sync with the images? Maybe somebody will attach a large image later, and the metadata still says "shrunk":true. One answer is the validation function. validate_doc_update() has the privilege of examining both the old and the new (candidate) document version. If it is not satisfied, it can throw() an exception to prevent the change. So it could enforce your policy in a few ways:
- Any time new images are attached, the "shrunk" key must also be deleted.
- Or, your external Node.js tool has a dedicated username to access CouchDB, and documents must never set "shrunk":true unless the user is your tool (sketched below).
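As a minimal sketch of the second policy (the tool's username, here "shrinker", is an assumption):

function (newDoc, oldDoc, userCtx) {
  // Only the image-shrinking tool's account may set or change "shrunk".
  var before = oldDoc ? oldDoc.shrunk : undefined;
  var after  = newDoc.shrunk;

  if (after !== before && userCtx.name !== 'shrinker')
    throw({forbidden: 'Only the shrinker tool may change the "shrunk" field'});
}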
Another idea worth investigating is, instead of setting "shrunk":true, you set it to the MD5 checksum of the image. (That is already in the document, in the ._attachments object.) So if your Node.js tool sees this document, it knows that it has work to do.
{ "_id": "a_doc"
, "shrunk": "md5-D2yx50i1wwF37YAtZYhy4Q=="
, "_attachments":
{ "an_image.png":
{ "content_type":"image/png"
, "revpos": 1
, "digest": "md5-55LMUZwLfzmiKDySOGNiBg=="
}
}
}
In other words:
if(doc.shrunk == doc._attachments["an_image.png"].digest)
  console.log("This doc is fine")
else
  console.log("Uh oh, I need to check %s and maybe shrink the image", doc._id)
Execution
I am biased because I wrote the following tools. However, I have had success, and others have reported success, using the Node.js package Follow to watch the _changes feed: https://github.com/iriscouch/follow
And then use Txn for ACID transactions in the CouchDB documents: https://github.com/iriscouch/txn
The pattern is:
- Run follow() on the _changes URL, perhaps with "include_docs":true in the options.
- For each change, decide if it needs work. If it does, execute a function to make the necessary changes, and let txn() take care of fetching and updating, and possible retries if there is a temporary error.
For example, Txn helps you atomically resize the image and also update the metadata, pretty easily.
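A rough sketch of that pattern, assuming Txn's ({uri}, operation, callback) interface; needs_work() and shrink_image() are hypothetical helpers, and the attachment name comes from the example document above:

var follow = require('follow')
var txn = require('txn')

var db = "http://localhost:5984/my_db"

follow({"db":db, "include_docs":true}, function(er, change) {
  if(er) throw er

  var doc = change.doc
  if(!needs_work(doc))          // e.g. the doc.shrunk vs. digest check above
    return

  txn({"uri": db + "/" + encodeURIComponent(doc._id)}, shrink_image, function(er, newDoc) {
    if(er) console.log("Failed to shrink %s: %s", doc._id, er.message)
    else   console.log("Shrunk %s", newDoc._id)
  })
})

function shrink_image(doc, to_txn) {
  // Resize the attachment to 400px here, then record the digest so this
  // document will be skipped next time (the "shrunk" metadata idea above).
  doc.shrunk = doc._attachments["an_image.png"].digest
  return to_txn()
}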
Finally, if your program crashes, you might fetch a lot of documents that you already processed. That might be okay (if you have your metadata working); however you might want to record a checkpoint occasionally. Remember which changes you saw.
var db = "http://localhost:5984/my_db"
var checkpoint = get_the_checkpoint_somehow() // Synchronous, for simplicity
follow({"db":db, "since":checkpoint}, function(er, change) {
if(change.seq % 100 == 0)
store_the_checkpoint_somehow(change.seq) // Another synchronous call
})
Work queue
Again, I am embarrassed to point to all my own tools. But image processing is a classic example of a work-queue situation. Every document that needs work is placed in the queue. An unlimited, elastic army of workers receives a job, fixes the document, and marks the job done (deleted).
I use this a lot myself, and that is why I made CQS, the CouchDB Queue System: https://github.com/iriscouch/cqs
It is for Node.js, and it is identical to Amazon SQS, except it uses your own CouchDB server. If you are already using CouchDB, then CQS might simplify your project.