Is there a systematic/programmatic way to monitor for real-time push notification failures? I've found that the real time API notifications can fail intermittently and I would like to be notified when this happens.
Usually pushes "fail" because of errors on the application's end, essentially whenever your push URL doesn't return a 200. You will still receive the push from Foursquare though, so assiduous monitoring/logging of all pushes to your app should help you identify when they fail.
If you're doing a lot of processing after receiving a push from us, you may be returning a 200 too slowly. Foursquare will assume the request timed out and report it as a failed push. To mitigate this, you should always return a 200 immediately, then do any processing.
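That "ack first, process later" pattern can be sketched in Python (handler and worker names are hypothetical; a real app would wire this into its web framework's request handler): enqueue the push payload, return 200 immediately, and let a background worker do the heavy lifting.

```python
import json
import queue
import threading

push_jobs = queue.Queue()
processed = []

def handle_push(raw_body):
    """Called by your web framework for each incoming push."""
    push_jobs.put(raw_body)        # enqueue first...
    return 200                     # ...then return 200 right away

def worker():
    while True:
        raw = push_jobs.get()
        if raw is None:            # optional sentinel to stop the worker
            break
        checkin = json.loads(raw)  # heavy processing happens here,
        processed.append(checkin)  # well after the 200 was sent
        push_jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The framework and queue are interchangeable; the only invariant is that nothing slow runs before the 200 goes back.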
It rarely happens, but sometimes the error could be coming from Foursquare's end. For the latest on platform status, you should follow #foursquareAPI on Twitter, where we communicate downtime and other updates.
When we perform maintenance on our server, or redeploy our external-facing REST services for DocuSign, is there a way to lock all envelopes that are currently sitting with signers? We use Connect to process signer/document updates from DocuSign, and we don't want these requests coming through while we're under maintenance.
I've seen in the documentation that we can lock individual envelopes. Is the best route to run through each envelope that's still pending signature and temporarily lock it? That approach seems very resource-intensive given the number of consecutive API calls required.
Connect supports exponential retries when events fail to be delivered to your endpoint. How long does your downtime typically last?
When your system is back up, new events should arrive at your endpoint and you can react to them accordingly. Please let us know if you see otherwise.
https://developers.docusign.com/platform/webhooks/connect/architecture
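One way to lean on those retries instead of locking every envelope, sketched in Python with hypothetical names: have your Connect listener return a non-2xx status while you are in maintenance, so Connect treats the delivery as failed, queues the event, and redelivers it once you are back up.

```python
# Sketch only: assumes Connect retries any delivery that does not get
# a 2xx response back, per the architecture doc linked above.

state = {"maintenance": False}
handled_events = []

def connect_listener(event):
    if state["maintenance"]:
        # Non-2xx tells Connect the delivery failed; it will retry
        # later with exponential backoff.
        return 503
    handled_events.append(event)   # normal processing
    return 200
```

Flipping one flag before redeploying is far cheaper than locking each pending envelope via the API.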
We use the REST service API to pull messages from a PubSub subscription. Messages ready to be serviced are acknowledged, leaving other messages unacknowledged to be serviced during a later execution cycle.
During an execution cycle, we send a single request to the pull REST API with returnImmediately=true and maxMessages=100.
While testing we encountered a situation where only 3 "old" messages would be returned during each execution cycle. Newly published messages were never included in the pull response. We verified that new messages were successfully arriving at the subscription by monitoring the undelivered-messages count in Stackdriver Monitoring.
Does the pull REST API not include all undelivered messages?
Does it ignore the maxMessages parameter?
How should all messages, up to the maximum specified, be read with the REST API?
Notes:
We worked around the problem by sending 2 parallel requests to the pull API and merging the results. We found the workaround (requiring parallel requests) discussed here.
Update Feb. 22, 2018
I wrote an article on our blog that explains why we were forced to use the Pub/Sub service REST API.
A single pull call will not necessarily return all undelivered messages, especially when returnImmediately is set to true. Pull promises to return at most maxMessages; it does not guarantee that it will return maxMessages even when that many messages are available.
The pull API tries to balance returning more messages with keeping end-to-end latency low. It would rather return a few messages quickly than wait a long time to return more messages. Messages need to be retrieved from storage or from other servers and so sometimes all of those messages aren't immediately available to be delivered. A subsequent pull request would then receive other messages that were retrieved later.
If you want to maximize the chance of receiving more messages per pull request, set returnImmediately to false. Even this will not guarantee that all messages are delivered in a single pull request, even if maxMessages is greater than the number of messages yet to be delivered. You should still send subsequent pull requests (or, better yet, several pull requests concurrently) to retrieve all of the messages.
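The concurrent-pull idea can be sketched in Python with the actual HTTP call stubbed out (a real pull would POST to the subscription's :pull endpoint with returnImmediately=false and a maxMessages value): issue several pulls in parallel, merge the results, and de-duplicate on ackId in case a message appears in more than one response.

```python
import concurrent.futures

def pull(max_messages):
    """Stand-in for the REST call; a real implementation would POST
    {"returnImmediately": false, "maxMessages": max_messages} and
    return the receivedMessages array from the response."""
    # Simulated partial backlog, mirroring the behavior described
    # above where one pull does not return everything.
    return [{"ackId": "a1", "data": "m1"}, {"ackId": "a2", "data": "m2"}]

def pull_concurrently(workers=2, max_messages=100):
    merged = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(pull, max_messages) for _ in range(workers)]
        for fut in concurrent.futures.as_completed(futures):
            for msg in fut.result():
                merged[msg["ackId"]] = msg   # de-duplicate per ackId
    return list(merged.values())
```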
Alternatively, you should consider switching to the Google Cloud Pub/Sub client libraries, which handle all of this under the hood and deliver messages to a callback you specify as soon as they are available.
I currently have an application running that passes data between Asana and Zendesk.
I have webhooks created for all my projects in Asana, and all project events are sent to my webhook endpoint, which verifies the request, tries to identify the event, and updates Zendesk with the relevant data depending on the event type (some events aren't required).
However, I have recently been receiving the following payload from the webhooks:
"events": [
{
"action": "sync_error",
"message": "There was an error with the event queue, which may have resulted in missed events. If you are keeping resources in sync, you may need to manually re-fetch them.",
"created_at": "2017-05-23T16:29:13.994Z"
}
]
Because I don't poll the API for event updates (I react when the events arrive), I haven't considered using a sync key; the docs suggest one is only required when polling for events. Do I need to use one with webhooks as well?
What am I missing?
Thanks in advance for any suggestions.
You're correct: you don't need to track a sync key for webhooks. We proactively deliver them when something changes in Asana, and we track the events that haven't yet been delivered across webhooks (essentially the equivalent of updating the sync key server-side whenever webhooks have been successfully delivered).
Basically, what's happening here is that for some reason our event queues detected a problem with their internal state. This means that events may not have been recorded, or webhooks may not have been delivered for a long time. Our events and webhooks track changes on a best-effort basis, and certain things that can happen to our production machines, like a machine dying at an inopportune time, can cause these sorts of issues.
Unfortunately, then, the only way to get back to a good state is to do a full scan of the projects you're tracking, which is what is meant by "you may need to manually re-fetch them." Basically, a robust implementation of syncing Asana to external resources looks like:
A diff function that, given a particular task and external resource, detects which state is out of date or different between the two and chooses a merge/patch resolution (i.e. "make Zendesk look like Asana").
Receiving a webhook runs that diff/patch process for that one task in a "live" fashion.
Periodically (on script startup, say, or when webhooks/events are missed and you get an error message like this), update all resources that might have been missed by scanning the entire project and doing the diff/patch for every task. This is more expensive, but should be needed significantly more rarely.
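The diff/patch steps above can be sketched in Python (field names and the synced-field list are hypothetical; a real patch step would call the external service's update API):

```python
# Fields kept in sync between Asana and the external resource
# (hypothetical selection).
SYNCED_FIELDS = ("name", "notes", "completed")

def diff(asana_task, external_resource):
    """Return a patch dict of field -> desired (Asana-side) value,
    i.e. the changes needed to make the external side match Asana."""
    return {
        field: asana_task.get(field)
        for field in SYNCED_FIELDS
        if asana_task.get(field) != external_resource.get(field)
    }

def patch(external_resource, changes):
    """Apply the patch locally; a real implementation would issue an
    update call to the external service instead."""
    external_resource.update(changes)
    return external_resource
```

The webhook path runs diff/patch for one task; the recovery path after a sync_error runs the same two functions over every task in the project.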
I just did a little stress test on Azure Notification Hub.
I sent 200 identical messages to an iPhone:
62 returned "The Push Notification System returned an Internal Server Error"
138 returned "The Notification was successfully sent to the Push Notification System"
So the failure rate is 31%!!!
I turned on 'enableTestSend' mode; the message above comes from NotificationOutcome->RegistrationResult->Outcome.
Has anyone else run tests like this?
This failure rate is definitely not acceptable.
Depending on how your load test was designed, the failures might or might not have been Notification Hubs-related. As @efimovandr noted, they could have been throttled by APNS, or, as @simon-w suggested, it could be some other PNS-specific issue. One way to verify that is to run exactly the same code you ran against NH, but call the PNS directly instead. Chances are you'll get the same success rate. If so, you need to design the test differently so that it better reflects real-world service usage.
Microsoft offers an SLA on the Notification Hubs service, which means they invest in making sure that failure rates stay manageable for customers.
If you are still experiencing the problem, contact customer support with your namespace name and approximate time (with time zone) when it happened and they will help you understand what was going on.
This is an open question asking for advice on which scenario is the smarter approach when there is heavy lifting on the server end but the UI must stay responsive for the user.
The setup:
My system consists of two services (written in Node): a frontend service that listens for requests from the user, and a background worker that does the heavy lifting and won't finish within 1-2 seconds (e.g. video conversion, image resizing, gzipping, spidering, etc.). The user is connected to the frontend service via WebSockets (and normal POST requests).
Scenario 1:
When a user uploads a video, the frontend service only does some simple checks, creates a job on behalf of the user for the background worker to process, and immediately responds with status 200. Later, the worker sees it has work, does the work, and finishes the job. It then finds the socket the user is connected to (if any) and sends a "hey, job finished" message with the data related to the video conversion job (URL, length, bitrate, etc.).
Pros I see: quick user feedback of a successful upload (e.g. the progress bar can be hidden)
Cons I see: the user gets a premature "success" response with no data to handle/display, and needs to wait until the job finishes anyway.
Scenario 2:
Like Scenario 1, except the frontend service doesn't respond with status 200 but instead subscribes to the created job's "onComplete" event and lets the request dangle until the callback fires and the data can be sent down the pipe to the user.
Pros I see: "onSuccess", all the data is at the user
Cons I see: depending on the job's weight and the active job count, the user's request could time out
While writing this question, things are getting clearer to me by the minute (Scenario 1, but with smart success and update events sent). Regardless, I'd like to hear about other scenarios you use, or further pros/cons of mine!
Thanks for helping me out!
Some additional info: for WebSockets I'm using socket.io, for job creation Kue, and for pub/sub Redis.
I just wrote something like this, and I use both approaches for different things. Scenario 1 makes the most sense IMO because it matches reality best, which can then be conveyed most accurately to the user. By first responding with a 200 ("yes, I got the request and created the job like you asked"), you can accurately update the UI to reflect that the request is being dealt with. You can then use the push channel to notify the user of updates such as progress percentage, errors, and success as needed, without the UI 'hanging'. (Obviously you wouldn't hang the UI in Scenario 2 either, but it's an awkward situation in that things are happening and the UI just has to 'guess' that the job is being processed.)
Scenario 1 -- but instead of responding with 200 OK, you should respond with 202 Accepted. From Wikipedia:
https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
202 Accepted: The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.
This leaves the door open for the possibility of worker errors. You are just saying that you accepted the request and are trying to do something with it.
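A sketch of that 202 flow in Python (all names hypothetical): accept the upload, enqueue the job, and hand back a status URL the client can poll, or at which it can later be notified over the push channel.

```python
import itertools

job_ids = itertools.count(1)   # monotonically increasing job ids
jobs = {}                      # stand-in for a real job queue/store

def submit_video(upload):
    """Accept the upload and enqueue it; 202 means "accepted for
    processing, not completed" (and the worker may still fail it)."""
    job_id = next(job_ids)
    jobs[job_id] = {"status": "queued", "upload": upload}
    return 202, {"Location": f"/jobs/{job_id}"}

def job_status(job_id):
    """What the client sees when it polls the Location URL."""
    return jobs.get(job_id, {"status": "unknown"})
```

The worker later flips the job's status to "done" or "error" and pushes the same information down the WebSocket, so the 202 never has to pretend the work succeeded.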