Azure IOT hub transmission fails - azure

I am using the azure IOT sdk in an ESP32-based device to connect to the IOT hub using MQTT, sending messages with QOS 1. When the connection is good, all works exactly as intended. However, when we deploy to areas where the connectivity seems somewhat more spotty, the messages often time out (i.e. callback is called with the timeout error). The MQTT still thinks it has a connection (i.e. the disconnect callback has not been called), but all sends end up timing out. Interestingly, I see that when I send c2d messages, they do get picked up.
I have configured the firmware to tear down and rebuild the MQTT connection in these scenarios and that sometimes helps but not always.
Two questions:
Why does this seem to happen, and are there parameters that I can tweak to prevent it? I have reduced the size of the packets, but that did not seem to make a difference.
What is the appropriate way of handling this condition? I have seen scenarios where once the communication gets "stuck" like this, it can stay stuck for tens of minutes.
Hope there's someone from MSFT IOT group listening... :)

Related

How to measure Websocket backpressure or network buffer from client

I am using the ws Node.js package to create a simple WebSocket client connection to a server that is sending hundreds of messages per second. Even with a simple onMessage handler that just console.logs incoming messages, the client cannot keep up. My understanding is that this is referred to as backpressure, and incoming messages may start piling up in a network buffer on the client side, or the server may throttle the connection or disconnect altogether.
How can I monitor backpressure, or the network buffer from the client side? I've found several articles speaking about this issue from the perspective of the server, but I have no control over the server and need to know just how slow is my client?
So you don't have control over the server and want to know how slow your client is (it seems you have already read about backpressure). In that case, I can only think of using a stress-testing tool like Artillery.
Check this blog post; it might help you set up a benchmarking scenario:
https://ma.ttias.be/benchmarking-websocket-server-performance-with-artillery/
Add timing metrics to your onMessage function to track how long it takes to process each message. You can also use RUM instrumentation from the APM providers: New Relic or AppDynamics for paid options, or the free tier of Google Analytics timing.
If you can, include a unique identifier for correlation between the client and server for each message sent.
Then you can correlate for a given window how long a message took to send from the server and how long it spent being processed by the client.
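The timing and correlation idea above can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than code for the ws package itself. It assumes each message carries a unique `id` and a server-side `sent_at` timestamp, and that client and server clocks are roughly synchronized; `MessageTimer` and both field names are hypothetical.

```python
import time
from statistics import mean


class MessageTimer:
    """Records transit and processing time per message (hypothetical helper)."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.samples = []  # (message id, transit seconds, processing seconds)

    def handle(self, message, handler):
        # "id" and "sent_at" are assumed fields; the server must supply them.
        received_at = self.clock()
        handler(message)
        finished_at = self.clock()
        self.samples.append(
            (message["id"],
             received_at - message["sent_at"],
             finished_at - received_at)
        )

    def summary(self):
        if not self.samples:
            return {"count": 0}
        transit = [s[1] for s in self.samples]
        processing = [s[2] for s in self.samples]
        return {
            "count": len(self.samples),
            "mean_transit": mean(transit),
            "mean_processing": mean(processing),
            "max_processing": max(processing),
        }
```

If the mean transit time keeps growing over a window while the processing time stays flat, messages are probably queuing somewhere before the handler runs, which is the symptom the question describes.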
You can't get directly to the network socket buffer associated with your WebSocket traffic since you're inside the browser sandbox. I checked the WebSocket APIs and there are no properties that expose receive buffer information.
If you don't have control over the server, you are limited. But you could try some client tricks to simulate throttling.
This heavily assumes you don't mind skipping messages.
One approach would be to enable the socket, start receiving events, and set your own maximum count on an in-memory queue/array. Once the queue is full, turn off the socket. Process enough of the queue, then enable the socket again.
This has high cost to disable/enable the socket, as well as the loss of events, but at least your client will not crash.
Once your client is not crashing, you can put some additional counts on timestamp and the queue size to determine the threshold before the client starts crashing.
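The queue-and-pause approach described above might look roughly like this. It is only a sketch in Python, not code for any particular WebSocket library: `pause` and `resume` are hypothetical callbacks standing in for whatever turns the socket off and on, and the thresholds are arbitrary.

```python
from collections import deque


class ThrottledReceiver:
    """Bounded in-memory queue that pauses the socket when full (sketch)."""

    def __init__(self, max_queue, resume_below, pause, resume):
        self.queue = deque()
        self.max_queue = max_queue        # pause once this many are queued
        self.resume_below = resume_below  # resume once drained to this size
        self.pause = pause                # hypothetical: turns the socket off
        self.resume = resume              # hypothetical: turns it back on
        self.paused = False
        self.dropped = 0                  # messages skipped while paused

    def on_message(self, message):
        if self.paused:
            self.dropped += 1
            return
        self.queue.append(message)
        if len(self.queue) >= self.max_queue:
            self.paused = True
            self.pause()

    def process(self, handler, batch):
        # Handle up to `batch` queued messages, then resume if drained enough.
        for _ in range(min(batch, len(self.queue))):
            handler(self.queue.popleft())
        if self.paused and len(self.queue) <= self.resume_below:
            self.paused = False
            self.resume()
```

Logging `len(queue)` and `dropped` alongside a timestamp on each `process` call gives the counts mentioned above for finding the threshold at which the client falls behind.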

Developing a web app to log messages from GPS device

This is my first question here and I realize it might be open-ended, but I'm looking for specific solutions, and any solution would be accepted.
I have GPS devices which send data packets to an IP on a port, both of which I can configure. I wish to use one of Google's, Amazon's or Microsoft's offering of cloud services. I am using python. Here is an implementation I found online :-
https://github.com/rdkls/gps-tracker-server
The data is coming as packets which are not over HTTP protocol. I have considered building a network listener over a socket on Google Compute Engine, but I'm not sure if it will be able to handle simultaneous requests from 1000 devices if such a situation ever arises. The Google Cloud IoT core offering seems to fit my need perfectly, but it is in private beta right now, which means I can't use it. I think I'll need a message queue service. But most of the offerings from these three companies requires messages over HTTP. Keep in mind that I can't change how the messages are sent from the GPS devices.
The messages sent are in this format -
https://drive.google.com/file/d/0B2EklrIn3KugS2NJYWZGWlVWeGdMbjM4WHQ2TUZmYWhIRmt3/view?usp=drive_web
Format:
Data is sent in (byte-sized) packets directly to the IP:Port over GPRS connections: one heartbeat packet every minute and GPS details every minute from each device. It also requires the server to reply to the message for acknowledgement since it's not over TCP/IP.
So basically, which service and which architecture should I use keeping scalability, reliability and cost in mind?
I think that for 1000 devices each sending such a message every minute, the total would be about 43M messages per month. I'm not sure, but I'm looking for something that will cost me about $1000 per month, that is, $1 per device per month.
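The 43M figure can be checked with a few lines of arithmetic. The sketch below assumes a 30-day month; the function name is made up for illustration. Note that if the heartbeat and the GPS packet are counted as two separate messages per minute, as the format description suggests, the volume doubles.

```python
DEVICES = 1000
MINUTES_PER_MONTH = 60 * 24 * 30  # assumes a 30-day month


def messages_per_month(devices, packets_per_minute):
    return devices * packets_per_minute * MINUTES_PER_MONTH


one_per_minute = messages_per_month(DEVICES, 1)  # 43,200,000
two_per_minute = messages_per_month(DEVICES, 2)  # 86,400,000 (heartbeat + GPS)

# Budget of $1 per device per month, expressed per million messages.
budget = 1.0 * DEVICES
for label, total in (("1 packet/min", one_per_minute),
                     ("2 packets/min", two_per_minute)):
    per_million = budget / (total / 1_000_000)
    print(f"{label}: {total:,} messages, ${per_million:.2f} per million")
```

The per-million figure is a convenient way to compare the budget against the pricing of a managed queue or ingestion service.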

SignalR randomly loses connection to the server side

We use SignalR with an Azure web app in an ASE for our real-time web application.
We noticed that SignalR sometimes loses connection to the hub in no particular pattern.
This happens during both high-traffic and low-traffic periods, but I am more interested in why it happens during low-traffic periods.
Note: We have a so called "1-minute auto refresh" which is triggered by the JavaScript on the page. That seems to be working.
Anyone experienced similar issues using SignalR, and if so, how did you resolve this?
Thank you
(a tester, don't be too harsh!lol )
I have definitely experienced this, and it drove me nuts.
By default, a SignalR client will try to reconnect for 20 seconds after losing connection to its Hub. After 20 seconds without a successful reconnect, the disconnected event is raised on JavaScript clients. After disconnected is raised, the client will give up trying to reconnect and the connection is dead. This page describes SignalR lifecycle events and offers some code on trying to reconnect after the disconnected event is raised.
Now as to why this happens. I've noticed that an App Pool recycle can take longer than 20 seconds in some apps, which can lead to a disconnected event. Intermittent drops in network connectivity between your JavaScript clients and Hub that last more than 20 seconds can also cause this. The bottom line is that things can go wrong that are beyond your control, and you cannot code around them. Therefore, put logic in place to attempt to reconnect after your JavaScript client receives the disconnected event.
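The reconnect logic suggested above could follow a retry-with-backoff pattern like the one below. This is a language-neutral sketch in Python, not SignalR client code: `connect` is a hypothetical callable standing in for whatever restarts the connection inside the disconnected handler, and the delay values are arbitrary.

```python
import time


def backoff_delays(base=5.0, cap=60.0, attempts=6):
    """Yield delays that double each time, capped at `cap` seconds."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2


def reconnect(connect, delays, sleep=time.sleep):
    """Call `connect` after each delay until it succeeds.

    Returns the attempt number that succeeded, or None if all failed.
    `connect` is assumed to raise ConnectionError on failure.
    """
    for attempt, delay in enumerate(delays, start=1):
        sleep(delay)
        try:
            connect()
            return attempt
        except ConnectionError:
            continue
    return None
```

Waiting before each attempt, and waiting longer after each failure, avoids hammering a server that is still in the middle of a recycle.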

How to configure MassTransit in an unreliable network environment?

I'm trying to get my head around MassTransit in combination with RabbitMQ.
The basic concepts are working in a test project, but what I need is the following:
My system will have one or more servers that react to real-life events (telephony). These events will, by means of MassTransit and RabbitMQ, translate into messages that will be picked up by one or more receivers via a separate server, set up as the RabbitMQ host. So far so good.
However, I cannot assume that I always have a connection between the publisher and the host machines. Just assume that the publishing server will continue to consume the real-life events, but now cannot publish its messages.
So, the question is: Does MassTransit have some kind of mechanism to store messages locally some way until the connection is re-established?
Or should I install RabbitMQ on every publishing server as well, in order to create a local exchange? Then I have to make the exchanges synchronize themselves after a reconnect.
You probably have to implement a store-and-forward policy. Instead of publishing your message directly through MassTransit and RabbitMQ, you can store the message in a persistent repository (a local database) and delegate to some other process the publishing, through MassTransit, of the messages stored earlier. This approach is often referred to as "Client High Availability". It does not replace standard server-side HA (High Availability) like the one implemented by RabbitMQ, but it is a good approach for a distributed system like the one you described, because it helps a lot in server-failure scenarios (e.g. an issue on the RabbitMQ server causes the loss of some messages that you still have in a client's store, so you can have them processed again).
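A store-and-forward outbox along these lines can be sketched as follows. This is an illustration in Python with SQLite as the local store, not MassTransit code: `publish` is a hypothetical callable standing in for the real publish call and is assumed to raise `ConnectionError` while the broker is unreachable.

```python
import json
import sqlite3


class Outbox:
    """Local store-and-forward buffer (sketch)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.db.commit()

    def store(self, message):
        # Always write locally first, whether or not the broker is reachable.
        self.db.execute("INSERT INTO outbox (body) VALUES (?)",
                        (json.dumps(message),))
        self.db.commit()

    def forward(self, publish):
        # Publish pending messages in order; stop at the first failure.
        sent = 0
        rows = self.db.execute(
            "SELECT id, body FROM outbox ORDER BY id").fetchall()
        for row_id, body in rows:
            try:
                publish(json.loads(body))
            except ConnectionError:
                break
            # Delete only after a successful publish (at-least-once delivery).
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

A separate process or timer would call `forward` periodically. Because a row is deleted only after its publish succeeds, a crash between the two steps can resend a message, so receivers should tolerate duplicates.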

How to not receive the accumulated pushes from Pusher after returning online?

How can one prevent Pusher from automatically pushing all the piled up messages to the client after the client eventually goes online after being offline, i.e. after the client re-establishes the connection?
After exchanging messages with a Pusher support engineer, the issue became clearer.
The connection may remain open even when the laptop goes to sleep (this behaviour varies among computers). Thus, after waking up, it may still be connected. (This is exactly what happened in my case, so it looked as if Pusher had pushed the accumulated messages.)
However, the default activity timeout is 120s, and the time to wait for a pong response before closing the connection is 30s. So, allowing it around three minutes would make the client disconnect completely, and the behaviour I encountered would not take place.
Pusher doesn't presently buffer messages to be delivered upon reconnection, so the functionality described in the question isn't something an application needs to consider right now.
Future releases may contain something called Event Buffer, which will offer this functionality. Documentation will be released around that time to detail how to avoid receiving buffered events.
