Based on this answer: Amazon QLDB have any scaling/performance limits?
Short answer: yes.
A QLDB session can have 1 active transaction. And, a ledger (not account!) can have up to 1500 sessions. This limit can be raised by filing for a limit increase.
I gave additional info in my answer to your question on the other thread.
The default value for maxPoolSize on the MongoClient is 100.
How would one know if the default value, 100, is the optimal value? Is there a formula, a way of calculating to determine the ideal maxPoolSize?
Thanks a lot.
The default pool size depends solely on the driver you are using. For Node.js, the default minimum is 5 and the default maximum is 100.
The pool size determines how many concurrent requests the application can make to the DB, so the right value depends on how many concurrent connections (queries running at the same time) your application needs. Increase it based on your usage, for example if you need more than 100 connections to the DB at a time, or if some connections are tied up by long-running tasks.
The number of connections is also limited by the server the DB runs on. MongoDB Atlas offers certain clusters that allow up to 1500 connections at a time.
You should check this post to see how it impacts application performance.
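There is no official formula in the answer above, but a common starting point is Little's Law: the average number of operations in flight equals the arrival rate multiplied by the time each operation takes. The sketch below applies that as a rough estimate. The function name, the `headroom` factor, and the way the server limit is split across application instances are all assumptions made for illustration, not driver behaviour, and the result should be checked against real measurements.

```python
import math


def estimate_pool_size(ops_per_second, avg_latency_seconds,
                       headroom=1.5, server_connection_limit=1500,
                       app_instances=1):
    """Rough pool-size estimate (hypothetical helper, not a driver API)."""
    # Little's Law: average operations in flight = arrival rate * time per operation.
    in_flight = ops_per_second * avg_latency_seconds
    # Headroom for bursts is an assumed safety factor, tune it from measurements.
    wanted = math.ceil(in_flight * headroom)
    # Every application instance opens its own pool, so the server limit is shared.
    per_instance_cap = server_connection_limit // app_instances
    return max(1, min(wanted, per_instance_cap))


if __name__ == "__main__":
    # 400 queries/sec at 250 ms each, one application instance.
    print(estimate_pool_size(400, 0.25))
    # Same load, but 20 instances sharing a 1500-connection server.
    print(estimate_pool_size(400, 0.25, app_instances=20))
```

If the estimate comes out well below 100, the default is probably fine. If it comes out above the per-instance share of the server limit, the bottleneck is the cluster size rather than the pool setting.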
Microsoft states that the SLA for Application Insights is:
We guarantee that the data latency of the Application Insights Service will not exceed two hours 99.9% of the time.
For the 0.1% of time outside the SLA, when TelemetryClient.TrackEvent() executes in my code, is Microsoft guaranteeing that the event will definitely be published at some point (just not within 2 hours)? Or could the event be lost during that 0.1% time?
No, just calling TrackEvent doesn't guarantee it is published, for lots of reasons:
sampling at any level of the process. in general, if sampling is on, some % of your events might be merged together. there are various ways to find those events, but it is possible that if you call trackMessage 1000 times in a tight loop with the same content, an SDK might sample that and send a single event with itemCount set to 1000.
the content of the event could be invalid (too large a payload, exceeding thresholds for sizes of fields, too many custom properties, too many custom metrics, etc)
the time of the event could be invalid. events could be too far in the past (>48h old?) or too far into the future (not sure of the exact time there, but some future time is allowed to account for clock skew/drift)
caps - you could exceed the amount you're allowed to send per month. the documentation, at the time of this answer, states:
The maximum cap is 1,000 GB/day unless you request a higher maximum for a high-traffic application.
throttling - you could exceed the allowed number of events per second/etc. the documentation, at the time of this answer, states:
Throttling limits the data rate to 32,000 events per second, averaged over 1 minute per instrumentation key.
network issues, etc. calling track on the various SDKs doesn't guarantee the data is accepted or retried. some of the SDKs attempt to retry, some do not.
your application could shut down / crash between the call to track and the moment the actual connection to Application Insights is created/completed.
other random issues, service issues, downtime of other dependent services, etc that account for that 0.1% of missing data. I'm not sure there's any APM/telemetry service that guarantees it will accept and process 100% of the events you send.
(100% - 99.9% is not 0.01%, it is 0.1%. there's a 10x difference there.)
I have escalated this issue to the App Insights team. If I get any feedback, I will update you.
As per my understanding, for the other 0.01% time outside SLA, if there is some downtime, the data would get lost. In any other condition, it would be published beyond 2 hours.
Hope it helps.
We are experiencing lots of these exceptions sending events to EventHubs during peak traffic:
"Failed to send event to EventHub. Exception : Microsoft.ServiceBus.Messaging.MessagingException: The server was unable to process the request; please retry the operation. If the problem persists, please contact your Service Bus administrator and provide the tracking id."
"Failed to send event to EventHub. Exception : System.TimeoutException: The operation did not complete within the allocated time "
You can see it clearly here:
As you can see, we got lots of Internal Errors, Server Busy Errors, and Failed Requests when incoming messages exceed 400K events/hour (or ~270 MB/hour). This is not just a transient issue. It's clearly related to throughput.
Our EH has 32 partitions, message retention of 7 days, and 5 throughput units assigned. OperationTimeout is set to 5 mins, and we are using the default RetryPolicy.
Is there anything we still need to tweak here? We are really concerned about the scalability of EH.
Send throughput tuning can be achieved using efficient partition distribution strategies. There isn't any single knob which can do this. Below is the basic information you will need to design for high-throughput scenarios.
1) Let's start with the Namespace: Throughput Units (aka TUs) are configured at the Namespace level. Please bear in mind that the configured TUs apply to the aggregate of all EventHubs under that Namespace. If you have 5 TUs on your Namespace and 5 EventHubs under it, the 5 TUs are shared among all 5 EventHubs.
2) Now let's look at the EventHub level: If the EventHub is allocated 5 TUs and has 32 partitions, no single partition can use all 5 TUs. For example, sending 5 TUs of data to 1 partition and zero to the other 31 partitions is not possible. The maximum you should plan for per partition is 1 TU. In general, you need to ensure that the data is distributed evenly across all partitions. EventHubs supports 3 types of sends, which give users different levels of control over partition distribution:
EventHubClient.Send(EventDataWithoutPartitionKey) -> if you use this API to send, EventHubs takes care of evenly distributing the data across all partitions. The EventHubs service gateway will round-robin the data to all partitions. When a specific partition is down, the gateways auto-detect it and ensure clients don't see any impact. This is the most recommended way to send to EventHubs.
EventHubClient.Send(EventDataWithPartitionKey) -> if you use this API to send to EventHubs, the partitionKey determines the distribution of your data. The partitionKey is used to hash the EventData to the appropriate partition (the hashing algorithm is Microsoft proprietary and not shared). Typically, users who require correlation of a group of messages use this variant of Send.
EventHubSender.Send(EventData) -> in this variant, the Sender is already attached to a partition, so this gives the client complete control of distribution across partitions.
To measure your present distribution of data, use the EventHubClient.GetPartitionRuntimeInfo API to estimate which partition is overloaded. The difference between BeginSequenceNumber and LastEnqueuedSequenceNumber should give an estimate of that partition's load compared to the others.
3) Last but not least, you can tune performance (not throughput) at the send operation level using the SendBatch API.
1 TU buys a maximum of 1000 msgs/sec or 1 MB/sec, and you will be throttled by whichever limit is hit first. This cannot be changed.
If your messages are small, let's say 100 bytes, and you can send only 1000 msgs/sec (as per the TU limit), you will hit the 1000 events/sec limit first. However, using the SendBatch API, you can batch, let's say, 10 of the 100-byte msgs and push at the same rate of 1000 msgs/sec with just 100 API calls, which improves the end-to-end latency of the system (as it also helps the service persist messages efficiently). Remember, the only limitation here is the maximum message size that can be sent, which is 256 KB (this limit applies to your batch size if you use the SendBatch API).
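The TU arithmetic above can be checked with a few lines of code. This is a minimal sketch using only the limits quoted in this answer (1000 msgs/sec or 1 MB/sec per TU). The function names are hypothetical, and treating 1 MB as 10^6 bytes is an assumption.

```python
import math

EVENTS_PER_TU = 1000        # ingress msgs/sec per TU, as quoted in the answer
BYTES_PER_TU = 1_000_000    # 1 MB/sec per TU; 1 MB = 10**6 bytes is an assumption


def required_tus(events_per_second, avg_event_bytes):
    """Return (TUs needed, which limit is hit first) for a given send rate."""
    by_count = events_per_second / EVENTS_PER_TU
    by_bytes = events_per_second * avg_event_bytes / BYTES_PER_TU
    limiting = "event count" if by_count >= by_bytes else "bandwidth"
    return math.ceil(max(by_count, by_bytes)), limiting


def send_calls_per_second(events_per_second, batch_size):
    """Number of SendBatch-style calls needed to push the same event rate."""
    return math.ceil(events_per_second / batch_size)


if __name__ == "__main__":
    # The example from the answer: 100-byte messages at 1000 msgs/sec.
    print(required_tus(1000, 100))
    # Batching 10 messages per call gives the same rate with fewer calls.
    print(send_calls_per_second(1000, 10))
```

For the load reported in the question, 400K events/hour is about 111 events/sec and 270 MB/hour is about 0.075 MB/sec. By this arithmetic the aggregate fits within a single TU, which is why the per-partition distribution is the more likely suspect than the total TU count.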
Given that background, in your case:
- Having 32 partitions and 5 TUs - I would really double-check the Partition distribution strategy.
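As a way to double-check the distribution, the sequence-number difference described above can be turned into a per-partition share. The sketch below works on plain tuples and does not call the Event Hubs SDK. In practice the begin and last-enqueued sequence numbers would come from the partition runtime information. The function names, the `factor` threshold, and the sample numbers are all made up for illustration.

```python
def partition_load_shares(runtime_info):
    """Map partition id -> share of events, from (begin_seq, last_enqueued_seq)."""
    counts = {pid: last - begin for pid, (begin, last) in runtime_info.items()}
    total = sum(counts.values())
    if total == 0:
        return {pid: 0.0 for pid in counts}
    return {pid: n / total for pid, n in counts.items()}


def overloaded_partitions(runtime_info, factor=2.0):
    """Partitions holding more than `factor` times the even share (assumed threshold)."""
    shares = partition_load_shares(runtime_info)
    if not shares:
        return []
    fair_share = 1.0 / len(shares)
    return sorted(pid for pid, share in shares.items() if share > factor * fair_share)


if __name__ == "__main__":
    # Invented sample: partition "2" holds most of the retained events.
    sample = {"0": (0, 100), "1": (50, 150), "2": (0, 700), "3": (10, 110)}
    print(partition_load_shares(sample))
    print(overloaded_partitions(sample))
```

Because the difference counts events still retained in each partition, it is only an estimate of relative load, as the answer notes.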
here's some more general reading on Event Hubs...
After a lot of digging we decided to stop setting the PK (partition key) for posted messages, and the issue simply went away! We were using a GUID as the PK. We started to get very few errors on the Azure Portal, and no more exceptions. Hope this helps someone else.
I am trying to design a cloud-based system (IaaS) that will gather data from sensors (water pollution related activity) and, upon certain events, will decide to process the data for a specific sensor.
Data characteristics are:
1. For each sensor, data is sent once every couple of days (up to 6 times a month).
2. Each sensor reading contains about 5000 events that are encapsulated in 50-100 messages that are sent to the server (such a "session" takes about 20 minutes, with messages sent every 5 seconds).
3. I am building the system to handle a rate of 30,000 messages per second.
4. Processing of the data doesn't need to be real time. I have about 10 minutes once the "session" is finished to do the processing.
5. 90% of the sessions are not interesting and can be thrown away once they are finished. The other 10% have one or more events encapsulated in the messages, and based on those events I need to decide whether to process the entire session data and send an alert to the sensor that there is pollution.
I created a tool that generates 5000 messages per second and I am trying to figure out which database would be the most optimal for my scenario.
These are the databases I am thinking to try:
Cassandra - For each session I will keep an in-memory collection of keys. The keys are for the messages that are stored in Cassandra. Once I detect a message that contains bad readings, I will need to pull all of the other messages in the "session" and process them (that means 50-100 requests to Cassandra). My concern here is about write performance (since I have many read and write operations), and I don't have a good strategy for deleting the 90% of sessions that are not needed.
Couchbase - I will save a document for each "session" according to sensorID and will append each message to the document. Once I detect a message that contains bad readings, I will only need to send one request for the document. My concern here is about the read performance.
Redis - Use it like Cassandra. I assume performance will be the best, but I will need to handle the sharding and replication of data myself in order not to reach the memory limit.
I would love to hear which option would be the most appropriate
Regarding Redis: you may consider using a DaaS (Data as a Service). The service will manage all the instances, clusters, scaling, data persistence, and high availability settings for you.
One example, is Redis Cloud by Redis Labs
This is an interesting one. If we go back to the basics of the CAP theorem, we can try to choose a DB based on the need for consistency, availability, and partition tolerance.
For high consistency and availability - choose MySQL, PostgreSQL, Greenplum, Vertica, Neo4J.
For high availability and partition tolerance - use Cassandra, Voldemort, Dynamo, CouchDB, Riak.
For high consistency and partition tolerance - use HBase, Redis, MongoDB, BerkeleyDB, BigTable.
So my Vote is for Cassandra here.
I want to tailor the application I am making, which communicates with the QuickBooks server and adds things like customers and check expenses, to be as efficient as possible in terms of performance. For example, my intention is to run all customer additions (a batch process) on one thread and all check expenses or bills (another batch process) on another thread. This is logically possible, as the two procedures don't interfere with and are not related to one another.
My question is would such a design approach be permissible by Intuit? I guess my concern is regarding any limitations on communication with their servers.
On the docs site, the following throttling policy is mentioned.
What are the throttling limits based on QB accounts, OAuth client, and RealmId at any given time?
EDIT: The following line is no longer valid. The FAQ page has been updated.
Apart from an upper limit set that ensures no more than 10 requests in progress at any given time;
we have a throttling policy across all IDS APIs to permit 500 requests/minute per AuthId and per RealmId. The policy permits 200 requests/minute per AuthId for reports endpoints.
Ref -
So, if you follow the above throttling limit then parallel processing using multiple threads is not an issue.
PN - You can't create multiple name entities (e.g. Vendor, Employee and Customer) using parallel threads. The service puts a lock across these 3 entities to ensure a unique name is used while creating a new entity.
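One way to stay under the quoted limit when several threads share the same credentials is a client-side limiter that every thread consults before sending. The sketch below is a minimal sliding-window counter. The class name is hypothetical, the 500 requests/minute default is taken from the policy quoted above, and whether the service itself measures the limit as a sliding window is an assumption.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any window_seconds span (illustrative)."""

    def __init__(self, max_requests=500, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so the limiter can be tested
        self._sent = deque()         # timestamps of requests still inside the window
        self._lock = threading.Lock()

    def try_acquire(self):
        """Return True and record the request if it fits in the window, else False."""
        with self._lock:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window_seconds:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return True
            return False


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    allowed = sum(limiter.try_acquire() for _ in range(600))
    print(allowed)  # number of requests admitted in the current window
```

Each worker thread would call `try_acquire()` before a request and back off briefly when it returns False. Using a separate limiter instance with a lower `max_requests` for the reports endpoints would follow the same pattern.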