Flow control in Push Technology Diffusion server delaying publishing client updates

We have a control client sending 100 updates, each 200-250 bytes, every 2 seconds to clients through Diffusion, across different topics (one update per topic every 2 seconds). The issue is that after sending these for around 20-30 minutes, flow control starts and updates are delayed, beginning at around 5 ms and rising to 100 ms after 1-2 hours. Is there any way to avoid flow control for a publishing control client in Diffusion?
maxqueuesize is set to 10000.
Diffusion API log: pressure=0.04622500000000004 => sleep for 4 ms

Flow control was introduced into the Java client in v5.1 and the .NET client in v5.5. It exists to prevent internal queue overflow, which would otherwise close the client session. It is a symptom that points to deeper underlying issues.
It happens for a few reasons:
Your Diffusion server is not keeping up with its workload. That this happens only after some period of time makes me wonder if your server JVM is spending too much time collecting garbage. Java Mission Control is good at answering that question.
Less frequently, we see this affecting control clients with dual roles, e.g. both creating and updating topics, and also reacting to events such as Missing Topic Notifications. Flow control is a function of a number of things, including queue saturation and the number of unsatisfied requests. If this is your case, consider a discrete session for each role.
Consider and explore the first and simpler possibility before moving to the second. If the issue persists, contact us at support@pushtechnology.com.
Martin

Related

EventHub data bursty with long pauses

I'm seeing multi-second pauses in the event stream, even reading from the retention pool.
Here's the main nugget of EH setup:
BlobContainerClient storageClient = new BlobContainerClient(blobcon, BLOB_NAME);
RTMTest.eventProcessor = new EventProcessorClient(storageClient, consumerGroup, ehubcon, EVENTHUB_NAME);
And then the do-nothing processor:
static async Task processEventHandler(ProcessEventArgs eventArgs)
{
    RTMTest.eventsPerSecond++;
    RTMTest.eventCount++;

    // Checkpoint every 16th event to limit blob writes.
    if ((RTMTest.eventCount % 16) == 0)
    {
        await eventArgs.UpdateCheckpointAsync(eventArgs.CancellationToken);
    }
}
And then a typical execution:
15:02:23: no events
15:02:24: no events
15:02:25: reqs=643
15:02:26: reqs=656
15:02:27: reqs=1280
15:02:28: reqs=2221
15:02:29: no events
15:02:30: no events
15:02:31: no events
15:02:32: no events
15:02:33: no events
15:02:34: no events
15:02:35: no events
15:02:36: no events
15:02:37: no events
15:02:38: no events
15:02:39: no events
15:02:40: no events
15:02:41: no events
15:02:42: no events
15:02:43: no events
15:02:44: reqs=3027
15:02:45: reqs=3440
15:02:47: reqs=4320
15:02:48: reqs=9232
15:02:49: reqs=4064
15:02:50: reqs=395
15:02:51: no events
15:02:52: no events
15:02:53: no events
The event hub, blob storage and RTMTest webjob are all in US West 2. The event hub has 16 partitions. It's correctly calling my handler, as evidenced by the bursts of data. The error handler is not called.
Here are two applications side by side, the left using Redis, the right using Event Hub. The events drive the animations, so you can visually watch the long stalls. Note: these are vaccines being reported around the US, either live or via batch reconciliations from the pharmacies.
vaccine reporting animations
Any idea why I see the multi-second stalls?
Thanks.
Event Hubs consumers make use of a prefetch queue when reading. This is essentially a local cache of events that the consumer tries to keep full by streaming in continually from the service. To prioritize throughput and avoid waiting on the network, consumers read exclusively from prefetch.
The pattern that you're describing falls into the "many smaller events" category, which will often drain the prefetch quickly if event processing is also quick. If your application is reading more quickly than the prefetch can refill, reads will start to take longer and return fewer events while the consumer waits on network operations.
One thing that may help is to test using higher values for PrefetchCount and CacheEventCount in the options when creating your processor. These default to a prefetch of 300 and a cache event count of 100. You may want to try testing with something like 750/250 and see what happens. We recommend keeping at least a 3:1 ratio of prefetch to cache event count.
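For reference, a minimal sketch of setting those options (assuming the Azure.Messaging.EventHubs.Processor package and reusing the variable names from the snippet above):
// Raise the prefetch/cache sizes via EventProcessorClientOptions.
BlobContainerClient storageClient = new BlobContainerClient(blobcon, BLOB_NAME);

var options = new EventProcessorClientOptions
{
    PrefetchCount = 750,    // default is 300
    CacheEventCount = 250   // default is 100; keep roughly a 3:1 prefetch:cache ratio
};

RTMTest.eventProcessor = new EventProcessorClient(
    storageClient, consumerGroup, ehubcon, EVENTHUB_NAME, options);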
It is also possible that your processor is being asked to do more work than is recommended for consistent performance across all partitions it owns. There's good discussion of different behaviors in the Troubleshooting Guide, and ultimately, capturing a +/- 5-minute slice of the SDK logs described here would give us the best view of what's going on. That's more detail and requires more back-and-forth discussion than works well on StackOverflow; I'd invite you to open an issue in the Azure SDK repository if you go down that path.
Something to keep in mind is that Event Hubs is optimized to maximize overall throughput and not for minimizing latency for individual events. The service offers no SLA for the time between when an event is received by the service and when it becomes available to be read from a partition.
When the service receives an event, it acknowledges receipt to the publisher and the send call completes. At this point, the event still needs to be committed to a partition. Until that process is complete, it isn't available to be read. Normally, this takes milliseconds but may occasionally take longer for the Standard tier because it is a shared instance. Transient failures, such as a partition node being rebooted/migrated, can also impact this.
With your near-real-time reading, you may be processing quickly enough that there's nothing client-side that will help. In this case, you'd need to consider adding more TUs (throughput units), moving to a Premium/Dedicated tier, or using more partitions to increase concurrency.
Update:
For those interested without access to the chat: log analysis shows a pattern of errors indicating that either the host owns too many partitions and load balancing is unhealthy, or there is a rogue processor running in the same consumer group but not using the same storage container.
In either case, partition ownership is bouncing frequently, causing partitions to stop, move to a new host, reinitialize, and restart - only to stop and have to move again.
I've suggested reading through the Troubleshooting Guide, as this scenario and some of the other symptoms are discussed in detail.
I've also suggested reading through the samples for the processor - particularly Event Processor Configuration and Event Processor Handlers. Each has guidance around processor use and configuration that should be followed to maximize throughput.
Jesse very patiently examined my logs and led me to the "duh" moment of realizing I just needed a separate consumer group for this 2nd application of the Event Hub data. Now things are rock solid. Thanks Jesse!
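For anyone else hitting this, a minimal sketch of the fix (the consumer group and container names here are illustrative assumptions; the point is that each application gets its own consumer group and its own checkpoint container):
// Each application reads through its own consumer group, so the two processors
// never compete for partition ownership or overwrite each other's checkpoints.
var vaccineMapProcessor = new EventProcessorClient(
    new BlobContainerClient(blobcon, "vaccine-map-checkpoints"),
    "vaccine-map",           // consumer group dedicated to the first application
    ehubcon, EVENTHUB_NAME);

var reportingProcessor = new EventProcessorClient(
    new BlobContainerClient(blobcon, "reporting-checkpoints"),
    "reporting",             // separate consumer group for the second application
    ehubcon, EVENTHUB_NAME);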

Air traffic controller for threads when calling a REST API

DISCLAIMER: If this post is off-topic to this site, please recommend a site where this post would be appropriate.
On Ubuntu 18.04, in bash, I am writing a network-based, threaded application that requires multiple servers. It receives files through the network and processes them, ultimately making an API call that finishes the processing and logs the results to a database for later retrieval and reporting.
So far I have written the application using non-threaded programming models and concepts. That means the files are processed one at a time in real time. This works great if there is no sudden burst of files and/or a backlog of files to process. The main bottleneck has been the way I sequentially send files to the API one after another, waiting until the entire operation has completed for one file and the API has returned the results. The API has a rate limit of 8 calls per second, but since each call takes from 0.75 to 1 second, my program waits until the operation is done and only processes about 1 file per second through the API. In short, I did not have to worry about scheduling API calls because I could barely do one call per second.
Since the capacity is there to process 8 files per second, and I need more speed, I have been converting my single-threaded, sequential application into a parallel, scalable, multi-threaded application. This new version can spawn enough threads to send 8 files per second to the REST API and much more. So now I have the opposite problem. I am sending too many requests per second to the REST API and am in danger of triggering penalties, etc. Ultimately, when my traffic is higher, I will upgrade my subscription to the API and get more calls per second, but this current dilemma has got me thinking about how to schedule the API calls with different threads.
The purpose of this post is to discuss an idea about how to schedule these REST API calls across various threads. Specifically, I want to discuss how to coordinate timing and usage of the API while maintaining efficiency and yet not overloading the API. In short, I want to coordinate a group of threads so that the API is properly used. Not too fast and not too slow.
Independent of my application, this idea could be useful in a number of generically similar scenarios.
My idea is to create an "air traffic controller" ("ATC") so that the threads of the application have a centralized timing authority to check when they are ready to submit files to the REST API. The ATC would know how many time slots/calls per time period (in this case, calls per second) the API can schedule. The ATC would be listening for the threads to request a time slot ("launch code") which would give them a time slot in the future to perform their API call. The ATC would decide based on the schedule of other launch codes that it has already handed out.
In my case, from the start of the upload of the file to the API, it could take 0.75 to 1 second to complete the processing and receive a response from the API. This does not affect the count of new API calls that can be performed. It is just a consideration of how long the threads will be waiting once they call the API. It may not be relevant to this overall discussion.
Each thread would obviously have to do some error handling. If the API timed out or threw an error, then the thread would have to handle it, get back in line with the ATC (if appropriate), and ask for a new launch code. Maybe it should report the error to the ATC for centralized logging?
In situations where the file processing needs burst above 8 files per second, there would be a scheduling backlog where the threads should wait their turn as assigned by the ATC.
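To make the idea concrete, here is a minimal, single-process sketch of the slot-granting logic. It omits the TCP listener, security and persistence, the class and method names are made up, and the 8-calls-per-second figure is taken from the description above:
using System;

// Hands out "launch codes": each caller gets the next free time slot, and slots
// are spaced so the API is never asked to handle more than N calls per second.
public sealed class LaunchScheduler
{
    private readonly TimeSpan _slotLength;
    private readonly object _gate = new object();
    private DateTime _nextFreeSlot = DateTime.UtcNow;

    public LaunchScheduler(int callsPerSecond)
    {
        _slotLength = TimeSpan.FromSeconds(1.0 / callsPerSecond);  // e.g. 8 => 125 ms slots
    }

    // Returns the UTC time at which the caller may make its API call.
    public DateTime ReserveSlot()
    {
        lock (_gate)
        {
            var now = DateTime.UtcNow;
            if (_nextFreeSlot < now)
                _nextFreeSlot = now;          // no backlog: launch immediately

            var granted = _nextFreeSlot;
            _nextFreeSlot += _slotLength;     // push the next caller one slot further out
            return granted;
        }
    }
}

// Usage from a worker thread (sketch):
//   var slot = scheduler.ReserveSlot();
//   var wait = slot - DateTime.UtcNow;
//   if (wait > TimeSpan.Zero) Thread.Sleep(wait);
//   CallTheApi();
Under burst load the reserved slots simply stretch further into the future, which gives the scheduling backlog described above for free.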
Here are some other considerations:
Function
The ATC would be a lightweight daemon that does the following:
- listens on some TCP port
- receives a request
security token (?), thread id, priority
- authenticates the request (?)
- examines schedule
- reserves the next available time slot
- returns the launch code
security token (?), current time, launch timing offset to current time, URL and auth token for the API
- expunges expired launch codes
The ATC would need the following:
- to know what port it is supposed to run on
- to know how many slots per time period it was set to schedule
(e.g. 8 per second)
- to have super-fast read/write access to the schedule (associative array?)
- to know the URL and corresponding auth token for the thread to use
- maybe to know multiple URLs and auth tokens for load balancing
Here are more things to consider:
Security
How could we keep the ATC secure while ensuring high performance?
Network-level security (e.g. firewalls allowing only the IP addresses of the file-processing servers?)
Auth tokens or logins and passwords?
Performance
What would the requirements be for this ATC server? Would this be taxing to a CPU and memory?
Timing
How often would an NTP call be needed? By the ATC server? By the servers which call the API?
Scalability
Being able to provide different URLs and auth tokens would allow the ATC to load balance with different API providers.
Threading of the ATC itself
Would the ATC need to spawn threads to be able to handle each new request?
How does a web server handle requests?
How would the various threads share a common schedule?
In a non-threaded environment, the ATC would possibly keep an associative array in memory to keep performance as high as possible. How would the various threads of the ATC have access to the same schedule?
So here is my question. Does this exist? If not, what are some best practices in trying to build the above?
It seems like a beanstalkd kind of network service, except it only provides permission/scheduling and is extremely dependent on timing.

Is delivery of Azure Application Insights custom events guaranteed once TelemetryClient.TrackEvent() is called?

Microsoft states that the SLA for Application Insights is:
We guarantee that the data latency of the Application Insights Service will not exceed two hours 99.9% of the time.
https://azure.microsoft.com/en-us/support/legal/sla/application-insights/v1_0/
For the 0.1% of time outside the SLA, when TelemetryClient.TrackEvent() executes in my code, is Microsoft guaranteeing that the event will definitely be published at some point (just not within 2 hours)? Or could the event be lost during that 0.1% of time?
No, just calling TrackEvent doesn't guarantee it is published, for lots of reasons:
- sampling at any level of the process. See https://learn.microsoft.com/en-us/azure/application-insights/app-insights-sampling?toc=/azure/azure-monitor/toc.json, but in general, if sampling is on, some percentage of your events might be merged together. There are various ways to find those events, but it is possible that if you call trackMessage 1000 times in a tight loop with the same content, an SDK might sample that and send a single event with itemCount set to 1000.
- the content of the event could be invalid (too large a payload, exceeding thresholds for sizes of fields, too many custom properties, too many custom metrics, etc.)
- the time of the event could be invalid: events too far in the past (>48h old?) or too far into the future (not sure of the exact limit there, but some future time is allowed to account for clock skew/drift)
- caps - you could exceed the amount you're allowed to send per month - see https://learn.microsoft.com/en-us/azure/application-insights/app-insights-pricing, which at the time of this answer states:
"The maximum cap is 1,000 GB/day unless you request a higher maximum for a high-traffic application."
- throttling - you could exceed the allowed number of events per second, etc. - see https://learn.microsoft.com/en-us/azure/application-insights/app-insights-pricing, which at the time of this answer states:
"Throttling limits the data rate to 32,000 events per second, averaged over 1 minute per instrumentation key."
- network issues, etc. Calling track on the various SDKs doesn't guarantee the data is accepted or retried; some of the SDKs attempt to retry, some do not.
- your application could shut down or crash between the call to track and when the connection to Application Insights is actually created/completed.
- other random issues, service issues, downtime of other dependent services, etc. that account for that 0.1% of missing data. I'm not sure there's any APM/telemetry service that guarantees it will accept and process 100% of the events you send.
(100% - 99.9% is not 0.01%, it is 0.1%. there's a 10x difference there.)
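One partial mitigation, especially for the shutdown/crash case above, is to flush the SDK's buffer before the process exits. A hedged sketch using the .NET SDK's TelemetryClient follows; it narrows the window for loss but still does not guarantee delivery:
// Assumes a configured Microsoft.ApplicationInsights.TelemetryClient instance.
telemetryClient.TrackEvent("SomethingHappened");   // hypothetical event name

// Push buffered telemetry toward the service before exiting. With the default
// in-memory channel, Flush() is not synchronous, so a short delay is commonly
// added to give the transmission a chance to complete.
telemetryClient.Flush();
System.Threading.Thread.Sleep(TimeSpan.FromSeconds(5));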
I have escalated this issue to the App Insights team. If I get any feedback, I will update you.
As per my understanding, for the other 0.1% of the time outside the SLA, if there is some downtime, the data would get lost. In any other condition, it would be published beyond 2 hours.
Hope it helps.

Using Fleck Websocket for 10k simultaneous connections

I'm implementing a websocket-secure (wss://) service for an online game where all users will be connected to the service as long as they are playing the game. This will use a high number of simultaneous connections, although the traffic won't be a big problem, as the service is used for chat, storage and notifications... not for real-time data synchronization.
I wanted to use Alchemy-Websockets, but it doesn't support TLS (wss://), so I have to look for another library such as Fleck (or another).
Alchemy has been tested with a high number of simultaneous connections, but I didn't find similar tests for Fleck, so I need to get some real-world info from users of Fleck.
I know that Fleck is non-blocking and uses async calls, but I need some real info, because it might be abusing threads, the garbage collector, or some other aspect that won't be visible at lower numbers of connections.
I will use C# for the client as well, so I don't need hybi-XX compatibility or fallback; I just need scalability and TLS support.
I finally added Mono support to WebSocketListener.
Check here how to run WebSocketListener in Mono.
10K connections is no small thing. WebSocketListener is asynchronous and it scales well. I have done tests with 10K connections and it should be fine.
My tests show that WebSocketListener is almost as fast and scalable as the Microsoft one, and that it performs better than Fleck, Alchemy and others.
I made a test on a Windows machine with a Core2Duo E8400 processor and 4 GB of RAM.
The results were not encouraging, as Fleck started delaying handshakes after it reached ~1000 connections, i.e. it would take about one minute to accept a new connection.
These results improved when I used XSockets, which reached 8000 simultaneous connections before the same thing happened.
I tried to test on a Linux VPS with Mono, but I don't have enough experience with Linux administration, and a few system settings related to TCP, etc. needed to be changed in order to allow a high number of concurrent connections, so I could only reach ~1000 on the default settings; after that the app crashed (in both the Fleck test and the XSockets test).
On the other hand, I tested node.js, and it seemed simpler to manage a very high number of connections, as node didn't crash when it reached the TCP limits.
All the tests were echo tests: the server sends the same message back to the client who sent it and to one random other connected client, and each connected client sends a random ~30-character text message to the server at a random interval between 0 and 30 seconds.
I know my tests are not generic enough and I encourage everyone to run their own tests instead, but I just wanted to share my experience.
When we decided to try Fleck, we implemented a wrapper for the Fleck server and a JavaScript client API so that we could send acknowledgment messages back to the server. We wanted to test the performance of the server - message delivery time, percentage of lost messages, etc. The results were pretty impressive for us and we are currently using Fleck in our production environment.
We have 4000-5000 concurrent connections during peak hours. On average, 40 messages are sent per second. The acknowledged-message ratio (acknowledged messages / total sent messages) never drops below 0.994. The average round-trip time for messages is around 150 milliseconds (the duration between the server sending the message and receiving its ack). Finally, we have not had any memory-related problems with the Fleck server despite heavy usage.

Azure Service Bus - Determine Number of Active Connections (Topic/Queue)

Since Azure Service Bus limits the maximum number of concurrent connections to a Queue or Topic to 100, is there a method that we can use to query our Queues/Topics to determine how many concurrent connections there are?
We are aware that we can capture the throttling events, but would very much prefer an active approach, where we can proactively increase or decrease the number of Queues/Topics when the system is under a heavy load.
The use case here is a process waiting for a reply message, where the reply is coming from a long-running process, and the subscription is using a Correlation Filter to facilitate two-way communication between the Publisher and Subscriber. Thus, we must have a BeginReceive() going in order to await the response, and each such Publisher will be consuming a connection for the duration of their wait time. The system already balances load across multiple Topics, but we need a way to be proactive about how many Topics are created, so that we do not get throttled too often, but at the same time not have an excess of Topics for this purpose.
I don't believe it is currently possible to query the listener counts. I think the subscriber object also figures into that, so in theory, if you have up to 2,000 subscriptions per topic and each allows up to 100 connections, that's a lot of potential connections. We just need to keep in mind that subscriptions are cooperative (each gets a copy of all messages) and receivers on a subscription are competitive (only one gets each message).
I've also seen unconfirmed reports of performance delays when you start running > 1,000 subscribers, so make sure you test this scenario.
But... given your scenario, I'd deduce that latency likely isn't the biggest factor (you have long-running processes already), so introducing a couple of seconds of lag into the workflow likely won't be critical. If that's the case, I'd set the timeout for your BeginReceive to something fairly short (a couple of seconds) and have a sleep/wait delay between attempts. This gives other listeners an opportunity to get messages as well. You might also consider an approach where you attempt to receive multiple messages and then hand them out to other processes for handling (correlation in this case?).
Just some thoughts.
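For illustration, a rough sketch of that short-timeout-plus-pause pattern, written against the newer Azure.Messaging.ServiceBus API as an assumption (the discussion above used the older SDK's BeginReceive); the topic, subscription and connection-string names are made up, and the CorrelationFilter is assumed to already be configured on the subscription as in the question:
// Short receive timeout with a pause between attempts, so other listeners
// also get a chance at the subscription while we wait for our correlated reply.
await using var client = new ServiceBusClient(connectionString);
ServiceBusReceiver receiver = client.CreateReceiver("reply-topic", "reply-subscription");

ServiceBusReceivedMessage reply = null;
while (reply == null)
{
    // Wait up to a couple of seconds for the correlated reply...
    reply = await receiver.ReceiveMessageAsync(TimeSpan.FromSeconds(2));

    // ...and if nothing arrived, back off briefly before trying again.
    if (reply == null)
        await Task.Delay(TimeSpan.FromSeconds(2));
}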
