DocuSign - How to handle system downtime

When we perform maintenance on our servers, or redeploy our external-facing REST services for DocuSign, is there a way we can lock all envelopes that are currently sitting with signers? We use Connect to process signer/document updates from DocuSign, and we don't want these requests coming through while we're under maintenance.
I've seen in the documentation that we can lock individual envelopes. Is the best route to run through each envelope that's still pending signature and temporarily lock it? This method seems very resource intensive considering the number of consecutive API calls needed.

Connect supports exponential retries when events fail to be delivered to your endpoint. How long does your downtime usually last?
When your system is back up, new events should arrive at your endpoint and you can react to them accordingly. Please let us know if you see otherwise.
https://developers.docusign.com/platform/webhooks/connect/architecture
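One way to lean on that retry behaviour is to refuse events while you are under maintenance and let Connect redeliver them afterwards. A minimal sketch of such a listener follows; the flag-file path and endpoint structure are assumptions, not part of any DocuSign SDK:
<?php
// Hypothetical Connect listener. While the maintenance flag file exists, we
// return a non-2xx status so Connect queues the notification and retries it
// later; once maintenance ends, events flow through again.
const MAINTENANCE_FLAG = '/var/run/myapp/maintenance.flag'; // assumed path

if (file_exists(MAINTENANCE_FLAG)) {
    http_response_code(503);   // "try again later"; Connect will retry
    exit;
}

$payload = file_get_contents('php://input');
$event   = json_decode($payload, true);

// ... apply the signer/document update to your system as usual ...

http_response_code(200);       // acknowledge so Connect marks it delivered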

Related

Air traffic controller for threads when calling a REST API

On Ubuntu 18.04, in bash, I am writing a network-based, threaded application that requires multiple servers. It receives files through the network and processes them, ultimately making an API call that finishes the processing and logs the results to a database for later retrieval and reporting.
So far I have written the application using non-threaded programming models and concepts. That means the files are processed one at a time in real time. This works great if there is no sudden burst of files and/or a backlog of files to process. The main bottleneck has been the way I sequentially send files to the API one after another, waiting until the entire operation has taken place for one file and the API returns the results. The API has a rate limit of 8 calls per second, but since each call takes 0.75 to 1 second, my program waits until the operation is done and only pushes about 1 file per second through the API. In short, I did not have to worry about scheduling API calls because I could barely do one call per second.
Since the capacity is there to process 8 files per second, and I need more speed, I have been converting my single-threaded, sequential application into a parallel, scalable, multi-threaded application. This new version can spawn enough threads to send 8 files per second to the REST API and much more. So now I have the opposite problem. I am sending too many requests per second to the REST API and am in danger of triggering penalties, etc. Ultimately, when my traffic is higher, I will upgrade my subscription to the API and get more calls per second, but this current dilemma has got me thinking about how to schedule the API calls with different threads.
The purpose of this post is to discuss an idea about how to schedule these REST API calls across various threads. Specifically, I want to discuss how to coordinate timing and usage of the API while maintaining efficiency and yet not overloading the API. In short, I want to coordinate a group of threads so that the API is properly used. Not too fast and not too slow.
Independent of my application, this idea could be useful in a number of generically similar scenarios.
My idea is to create an "air traffic controller" ("ATC") so that the threads of the application have a centralized timing authority to check when they are ready to submit files to the REST API. The ATC would know how many time slots/calls per time period (in this case, calls per second) the API can schedule. The ATC would be listening for the threads to request a time slot ("launch code") which would give them a time slot in the future to perform their API call. The ATC would decide based on the schedule of other launch codes that it has already handed out.
In my case, from the start of the upload of the file to the API, it could take 0.75 to 1 second to complete the processing and receive a response from the API. This does not affect the count of new API calls that can be performed. It is just a consideration of how long the threads will be waiting once they call the API. It may not be relevant to this overall discussion.
Each thread would obviously have to do some error handling. If the API timed out or threw an error, then the thread would have to handle it, get back in line with the ATC (if appropriate) and ask for a new launch code. Maybe it should report the error to the ATC for centralized logging?
In situations where the file processing needs burst above 8 files per second, there would be a scheduling backlog where the threads should wait their turn as assigned by the ATC.
Here are some other considerations:
Function
The ATC would be a lightweight daemon that does the following:
- listens on some TCP port
- receives a request (security token (?), thread id, priority)
- authenticates the request (?)
- examines the schedule
- reserves the next available time slot
- returns the launch code (security token (?), current time, launch timing offset from the current time, URL and auth token for the API)
- expunges expired launch codes
The ATC would need the following:
- to know what port it is supposed to run on
- to know how many slots per time period it was set to schedule (e.g. 8 per second)
- to have super-fast read/write access to the schedule (associative array?)
- to know the URL and corresponding auth token for the thread to use
- maybe to know multiple URLs and auth tokens for load balancing
Here are more things to consider:
Security
How could we keep the ATC secure while ensuring high performance?
Network-level security (e.g. firewalls allowing only the IP addresses of the file-processing servers?)
Auth tokens or logins and passwords?
Performance
What would the requirements be for this ATC server? Would this be taxing to a CPU and memory?
Timing
How often would an NTP call be needed? By the ATC server? By the servers which call the API?
Scalability
Being able to provide different URLs and auth tokens would allow the ATC to load balance with different API providers.
Threading of the ATC itself
Would the ATC need to spawn threads to be able to handle each new request?
How does a web server handle requests?
How would the various threads share a common schedule?
In a non-threaded environment, the ATC would possibly keep an associative array in memory to keep performance as high as possible. How would the various threads of the ATC have access to the same schedule?
So here is my question. Does this exist? If not, what are some best practices in trying to build the above?
It seems like a beanstalkd kind of network service, except it only provides permission/scheduling and is extremely dependent on timing.
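To make the idea concrete, here is a rough single-process sketch of the scheduling logic I have in mind. The class and method names are purely illustrative, and in the real thing this state would sit behind the TCP daemon (or a shared store such as Redis) rather than in one process:
<?php
// Illustrative "ATC" slot scheduler: hands out the next free launch time so
// callers never exceed the configured rate (e.g. 8 calls per second).
class LaunchScheduler
{
    private float $interval;       // seconds between slots (0.125 for 8/sec)
    private float $nextSlot = 0.0; // next free slot as a unix timestamp

    public function __construct(int $ratePerSecond)
    {
        $this->interval = 1.0 / $ratePerSecond;
    }

    // Reserve the next free slot and return it as a unix timestamp.
    public function reserveSlot(): float
    {
        $now  = microtime(true);
        $slot = max($now, $this->nextSlot);
        $this->nextSlot = $slot + $this->interval;
        return $slot;              // the caller's "launch code"
    }
}

// A worker would sleep until its slot, then make the API call.
$atc  = new LaunchScheduler(8);
$slot = $atc->reserveSlot();
usleep((int) max(0, ($slot - microtime(true)) * 1000000));
// ... perform the REST API call here ...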

How to send a message to ReactPHP/Amp/Swoole/etc. from PHP-FPM?

I'm thinking about making a worker script to handle async tasks on my server, using a framework such as ReactPHP, Amp or Swoole that would be running permanently as a service (I haven't made my choice between these frameworks yet, so solutions involving any of these are helpful).
My web endpoints would still be managed by Apache + PHP-FPM as normal, and I want them to be able to send messages to the permanently running script to make it aware that an async job is ready to be processed ASAP.
Pseudo-code from a web endpoint:
$pdo->exec('INSERT INTO Jobs VALUES (...)');
$jobId = $pdo->lastInsertId();
notify_new_job_to_worker($jobId); // how?
How do you typically handle communication from PHP-FPM to the permanently running script in any of these frameworks? Do you set up a TCP / Unix Socket server and implement your own messaging protocol, or are there ready-made solutions to tackle this problem?
Note: In case you're wondering, I'm not planning to use a third-party message queue software, as I want async jobs to be stored as part of the database transaction (either the whole transaction is successful, including committing the pending job, or the whole transaction is discarded). This is my guarantee that no jobs will be lost. If, worst case scenario, the message cannot be sent to the running service, missed jobs may still be retrieved from the database at a later time.
If your worker "runs permanently" as a service, it should provide some API to interact through. I use AmPHP in my project for async services, and my services implement HTTP/Websockets servers (using Amp libraries) as an API transport.
Hey, ReactPHP core team member here. It totally depends on what your ReactPHP/Amp/Swoole process does. Looking at your example, my suggestion would be to use a message broker/queue like RabbitMQ. That way the process can pick a job up when it's ready for it and ack it when it's done. If anything happens to your process in the meantime and it dies, the message will be redelivered as long as it hasn't been acked. You could also build a small HTTP API, but that doesn't guarantee reprocessing of messages on fatal failures. Ultimately it all depends on your design; all 3 projects are toolsets for building your own architectures and systems, so it's all up to you.
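For completeness, if you do go the socket route, the notify_new_job_to_worker() call from the question could be something like the sketch below. The socket path and the one-line JSON payload are assumptions; the other end would be whatever socket server your ReactPHP/Amp/Swoole process exposes:
<?php
// Hypothetical notify_new_job_to_worker(): best-effort ping over a Unix
// socket to the long-running worker. Because the job row is already committed
// in the database, a failed notification is not fatal; the worker can still
// pick the job up later from a periodic table scan.
function notify_new_job_to_worker(int $jobId): void
{
    $socket = @stream_socket_client('unix:///var/run/myapp/worker.sock', $errno, $errstr, 0.1);
    if ($socket === false) {
        return; // worker unreachable, fall back to the periodic DB scan
    }
    fwrite($socket, json_encode(['job_id' => $jobId]) . "\n");
    fclose($socket);
}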

Backoff Strategy after hitting rate limits

When you hit the rate limits on getstream, the APIs start responding with errors.
What is the recommended backoff strategy to handle those failures and recover afterwards? I thought about logging them all and sending them all again after a minute or an hour.
But what if a user created a post (which failed to be created on getstream and is waiting for a backoff retry) and meanwhile the user deletes it? The backoff script would still send the post to getstream even though the user deleted it.
What is recommended by getstream, or has anyone handled a situation like that?
As you point out, API rate-limit errors are typically handled with (exponential) backoff solutions.
This often involves additional application logic (flow control and queues) and special purpose data services / storage (message queues, async workers etc). This can add quite some complexity to an application.
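As a minimal sketch of the backoff itself: delays grow exponentially per attempt, capped, with jitter so many clients don't retry in lockstep. The try_create_post() helper below is a placeholder for whatever Stream call failed, not part of their SDK:
<?php
// Exponential backoff with full jitter (illustrative values).
function backoff_delay_ms(int $attempt, int $baseMs = 250, int $capMs = 30000): int
{
    $exp = min($capMs, $baseMs * (2 ** $attempt));
    return random_int(0, $exp);
}

for ($attempt = 0; $attempt < 6; $attempt++) {
    if (try_create_post()) {                    // placeholder API call
        break;                                  // succeeded, stop retrying
    }
    usleep(backoff_delay_ms($attempt) * 1000);  // wait before retrying
}
Before each retry you can also re-check your own database (e.g. whether the post still exists), so items the user has since deleted are simply dropped from the retry queue instead of being resent.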
When it comes to the Stream service, being rate-limited is usually an indication of either a flaw/deficiency in the implementation (much like a performance bug) or that the application has reached a scale beyond what the current plan is intended to support.
It'd be wise to contact Stream support directly about this.

Asana API Sync Error

I currently have an application running that passes data between Asana and Zendesk.
I have webhooks created for all my Projects in Asana, and all project events are sent to my webhook endpoint, which verifies the request and tries to identify the event and update Zendesk with relevant data depending on the event type (some events aren't required).
However, I have recently started receiving the following payload from the webhooks:
"events": [
{
"action": "sync_error",
"message": "There was an error with the event queue, which may have resulted in missed events. If you are keeping resources in sync, you may need to manually re-fetch them.",
"created_at": "2017-05-23T16:29:13.994Z"
}
]
Because I don't poll the API for event updates (I react when the events arrive), I haven't considered using a sync key; the docs suggest it is only required when polling for events. Do I need to use one when using webhooks as well?
What am I missing?
Thanks in advance for any suggestions.
You're correct, you don't need to track a sync key for webhooks - we proactively try to reach out with them when something changes in Asana, and we track the events that haven't yet been delivered across webhooks (essentially, akin to us updating the sync key server-side whenever webhooks have been successfully delivered).
Basically what's happening here is that for some reason, our event queues detect that there's a problem with their internal state. This means that events didn't get recorded, or webhooks didn't get delivered after a long time. Our events and webhooks try to track changes in a best-effort sense, and there are some things that can happen with our production machines that can cause these sorts of issues, like a machine dying at an inopportune time.
Unfortunately, then, the only way to get back to a good state is to do a full scan of the projects you're tracking, which is what is meant by you may need to manually re-fetch them. Basically, a robust implementation of syncing Asana to external resources looks like:
- A diff function that, given a particular task and external resource, detects what state is out of date or different between the two and chooses a merge/patch resolution (i.e. "make Zendesk look like Asana").
- Receiving a webhook runs that diff/patch process for that one task in a "live" fashion.
- Periodically (on script startup, say, or when webhooks/events are missed and you get an error message like this), update all resources that might have been missed by scanning the entire project and running the diff/patch for every task. This is more expensive, but should be significantly rarer.
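A rough sketch of a handler built around that pattern is below. The diff_and_patch_task, full_resync and verify_asana_signature helpers stand in for your own application code, and the exact event fields may differ from what's shown here:
<?php
// Hypothetical webhook endpoint: live diff/patch for normal events, full
// re-scan when a sync_error indicates that events may have been missed.
$body = file_get_contents('php://input');
if (!verify_asana_signature($body, $_SERVER['HTTP_X_HOOK_SIGNATURE'] ?? '')) {
    http_response_code(401);
    exit;
}

$events = json_decode($body, true)['events'] ?? [];
foreach ($events as $event) {
    if (($event['action'] ?? '') === 'sync_error') {
        full_resync();                           // scan every task in the project
        continue;
    }
    if (isset($event['resource'])) {
        diff_and_patch_task($event['resource']); // single-task "live" sync
    }
}
http_response_code(200);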

Commit protocol

I'm building a REST web service that receives a request and must return "Ok" if the operation was done correctly. How could I deal with the possibility of losing the connection while returning this "Ok" message?
For example, a system like Amazon SimpleDB.
1) It receives a request.
2) It processes the request (stores and replicates the content).
3) It returns a confirmation message.
If the connection is lost between phases 2 and 3, the client thinks the operation was not successful and submits it again.
Thanks!
A system I reviewed earlier this year had a process similar to this. The solution they implemented was to have the client reply to the commit message, and clear a flag on the record at that point. There was a periodic process that checked every N minutes, and if an entry existed that was completed, but that the client hadn't acknowledged, that transaction was rolled back. This allowed a client to repost the transaction, but not have 2 'real' records committed on the server side.
In the event of the timeout scenario, you could do the following:
- Send a client-generated unique id with the initial request in a header.
- If the client doesn't get a response, it can resend the request with the same id.
- The server can keep a list of ids it has successfully processed and return an OK rather than repeating the action.
The only issue with this is that the server will eventually need to remove the client ids, so there would need to be a time window for the server to keep the ids before purging them.
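A small sketch of the server side of that scheme, using APCu purely as a stand-in for whatever shared store you have; the header name, the TTL and the perform_operation() helper are assumptions:
<?php
// Hypothetical idempotent endpoint: if the client's unique id was already
// processed, repeat the stored "Ok" instead of redoing the work.
$key = $_SERVER['HTTP_IDEMPOTENCY_KEY'] ?? null;
if ($key === null) {
    http_response_code(400);
    exit;
}

$cacheKey = 'idem:' . $key;
$previous = apcu_fetch($cacheKey, $found);
if ($found) {
    echo $previous;                            // already done: return the old result
    exit;
}

$result = perform_operation();                 // placeholder: store/replicate the data
apcu_store($cacheKey, $result, 24 * 3600);     // keep ids for a window, then purge
echo $result;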
Depends on the type of web service. The whole nature of HTTP and REST is that it's basically stateless.
E.g. in the SimpleDB case, if you're simply requesting a value for a given key and the client connection is dropped while the value is being returned, the client can simply re-request the data at a later time. That data is likely to have been cached by the db engine or the operating system disk cache anyway.
If you're storing or updating a value and the data is identical then quite often the database engines know the data hasn't changed and so the update won't take very long at all.
Even complex queries can run quicker the second time on some database engines.
In short, I wouldn't worry about it unless you can prove there is a performance problem. In which case, start caching the results of some recent queries yourself. Some REST based frameworks will do this for you. I suspect you won't even find it to be an issue in practice though.

Resources