why dubbo provider retries configuration ineffective? - rpc

dubbo version:2.7.14
spring cloud && nacos 2.0.3 for registration center
I set dubbo.provider.retries=0 in application.yml, and when it registers to nacos, seems that the configuration is effetive.
But when the consumer invoke one of the methods, consumer still retried 3 times(defalut times). So why the configuration is ineffetive confuse me.
If you have some ideas, please tell me, thanks! Sorry for my poor English...

Check configuration of consumer.
https://dubbo.apache.org/zh/docs/v2.7/user/configuration/xml/
Configuration of consumer has a higher priority.
Overrides and Priorities
Take timeout as an example, here is the priorities, from high to low (retries, loadbalance, actives also applies the same rule):
method level,interface level,default/global level
at the same level, consumer has higher priority than provider

Related

Hazelcast readiness probe

I am aware that there is a probe available at /hazelcast/health/ready at port 5701.
However I need to do this progamatically through code as I am using embedded hazelcast deployed on a kubernetes cluster and all communication should pass through the main application (that means hazelcast cannot expose that endpoint, using http requests through localhost would not suffice). I tried looking into the documentation but I found no help in this.
Only thing that I found is to use instance.getServer().getPartitionService().isLocalMemberSafe() but I got no evidence that this is effectively the same as checking the readiness probe.
Any help would be appreciated, thanks!
The exact logic for /ready endpoint is:
node.isRunning() && node.getNodeExtension().isStartCompleted()
I guess you can't use exactly the same from the code, but fairly good approximations are:
instance.getLifecycleService().isRunning() (the only difference is that it won't wait with being ready for joining other members)
instance.getPartitionService().isClusterSafe() (the difference is that it will wait for all the Hazelcast migration to finish)
You can use any of them. If you want to be really sure that Hazelcast member can receive the traffic when its ready, then the second option is totally safe.

Mule: Thread count under load with doThreading="false"

we have a mule app with HTTP inbound endpoint and I'm trying to figure out how to control the thread count under load. As an experiment I have added the following configuration:
<core:configuration>
<core:default-threading-profile doThreading="false" maxThreadsActive="500" poolExhaustedAction="RUN"/>
</core:configuration>
Under load I'm seeing the thread count peak at over 1000 threads. Am not sure why this is the case give the maxThreadsActive setting and the doThreading="false". Reading about poolExhaustedAction="RUN", I would expect the listener thread to block while processing inbound requests rather than spawn new ones, and finally reject the connection if its backlog queue is full. I never see rejected client connections.
Does Mule maintain a separate thread pool for each inbound endpoint in the app (sorry if this is in the documentation)? Even if so, don't think it helps explain what I'm seeing.
Any help appreciated. We are running a number of mule apps in one container and I'd like to control the total number of threads.
Thanks, Alfie.
Clearly the doThreading attribute on default-threading-profile is not enough to control Mule threading as a whole nor limit with a global cap the specific threading behaviour of transports. I reckon you're getting 500 threads for the HTTP message receiver pool and 500 for the VM message dispatcher pool.
I strongly suggest you reading about tuning Mule: http://www.mulesoft.org/documentation/display/current/Tuning+Performance
My gut feel is that you need to
configure threading on each transport (VM, HTTP), strictly specifying the pool size for receivers and dispatchers,
select flow processing strategies that prevent Mule from spawning new threads (i.e. use synchronous to hog the receiver threads),
select exchange patterns that also prevent Mule from spawning new threads (i.e. use request-response to piggyback the current execution thread).

Tomcat - one thread per request - or other alternatives?

My understanding is that in Tomcat, each request will take up one Java/(and thus OS) thread.
Imagine I have an app with lots of long-running requests (eg a poker game with multiple players,) that involves in-game chat, and AJAX long-polling etc.
Is there a way to change the tomcat configuration/architecture for my webapp so that I'm not using a thread for each request but 'intercept' the request and response so they can be processed as part of a queue?
I think you're right about tomcat likes to handle each request in its own thread. This could be problematic for several concurrent threads. I have the following suggestions:
Configure maxThreads and acceptCount attributes of the Connector elements in server.xml. In this way you limit the number of threads that can get spawned to a threshold. Once that limit is reached, requests get queued. The acceptCount attribute is to set this queue size. Simplest to implement but not a good long term solution
Configure multiple Connector elements in server.xml and make them share a threadpool by adding an Executor element in server.xml. You probably want to point tomcat to your own implementation of Executor interface.
If you want finer grain control no how requests are serviced, consider implementing your own connector. The 'protocol' attribute of the Connector element in server.xml should point to your new connector. I have done this to add a custom SSL connector and this works great.
Would you reduce this problem to a general requirement to make tomcat more scalable in terms of the number of requests/connections? The generic solution to that would be configuring a loadbalancer to handle multiple instances of tomcat.

Windows Azure Service Bus Queues: Throttling and TOPAZ

Today at a customer we analysed the logs of the previous weeks and we found the following issue regarding Windows Azure Service Bus Queues:
The request was terminated because the entity is being throttled.
Please wait 10 seconds and try again.
After verifying the code I told them to use the Transient Fault Handing Application Block (TOPAZ) to implement a retry policy like this one:
var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2));
var retryPolicy = new RetryPolicy<ServiceBusTransientErrorDetectionStrategy>(retryStrategy);
The customer answered:
"Ah that's great, so it will also handle the fact that it should wait
for 10 seconds when throttled."
Come to think about it, I never verified if this was the case or not. I always assumed this was the case. In the Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling assembly I looked for code that would wait for 10 seconds in case of throttling but didn't find anything.
Does this mean that TOPAZ isn't sufficient to create resilient applications? Should this be combined with some custom code to handle throttling (ie: wait 10 seconds in case of a specific exception)?
As far as throttling concerned, Topaz provides a set of built-in retry strategies, including:
- Fixed interval
- Incremental intervals
- Random exponential back-off intervals
You can also write your custom retry stragey and plug-it into Topaz.
Also, as Brent indicated, 10 sec wait is not mandatory. In many cases, retrying immediately may succeed without the need to wait. By default, Topaz performs the first retry immediately before using the retry intervals defined by the strategy.
For more info, see Ch.6 of the "Building Elastic and Resilient Cloud Apps" Developer's Guide, also available as epub/mobi/pdf from here.
If you have suggestions/feature requests for Topaz, please submit them via the uservoice.
As I recall, the "10 second" wait isn't a requirement. Additionally, TOPAZ I believe also has backoff capabilities which would help you over come thing.
On a personal note, I'd argue that simply utilzing something like TOPAZ is not sufficient to creating a truely resilient solution. Resiliency goes beyond just throttling on a single connection point, you'll also need to be able to handle failover to a redundant endpoint which TOPAZ won't do.

Azure Table Storage RetryPolicy questions

A couple questions on using RetryPolicy with Table Storage,
Is it best practice to use RetryPolicy whenever you can, hence use ctx.SaveChangeWithRetries() instead of ctx.SaveChanges() accordingly whenever you can?
When you do use RetryPolicy, for example,
ctx.RetryPolicy = RetryPolicies.Retry(5, TimeSpan.FromSeconds(1));
What values do people normally use for the retryCount and the TimeSpan? I see 5 retries and 1 second TimeSpan are a popular choice, but would 5 retries 1 second each be too long?
Thank you,
Ray.
I think this is highly dependent on your application and requirements. The timeout errors to ATS happen so rarely that if a retry policy will not hurt to have in place and would be rarely utilized anyway. But if something fishy is happening, it may save yourself from having to debug weird errors.
Now, I would suggest that in the beginning you do not enable the RetryPolicy at all and have tracing instead so that you can see any issues with persistence to ATS. Once you're stabilized, putting a RetryPolicy maybe good idea to work around some runtime glitches on the ATS side. Just make sure you're not masking your own problems with RetryPolicy.
If your client is user facing like a web page you would probably like to use a linear retry with short waits (milliseconds) in between each retry, if your client is actually a non user facing backend service etc. then you would most likely want to use Exponential retries in order not to overload the table storage service in case it is already giving 5xx errors due to high load for instance.
Using the latest Azure Storage client SDK, if you do not define any retry policy in your table requests via the TableRequestOptions, then the default retry policy is used which is the Exponential retry. The sdk makes 3 retries in total for the errors that it deems retriable and this in total takes more or less 20 seconds if all retries fail.

Resources