Easiest way of getting the instance count of the current Azure Cloud Service

I'm running an Azure Cloud Service.
To make sure I can leverage Service Bus in a safe way,
I have implemented the advice of this blog post: http://blogs.msdn.com/b/clemensv/archive/2012/07/30/transactions-in-windows-azure-with-service-bus-an-email-discussion.aspx
Instead of polling a database table constantly, I send a message to a queue to trigger a background process that looks in the database and sends the message.
Of course this alone would not be safe, so I also have to check the database periodically on a schedule to make sure I didn't miss anything.
But since I'm running multiple instances, I would like to spread this schedule across them to make it more efficient. I thought I could do this by taking the last integer from the instance name; I know how to get that number using Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.CurrentRoleInstance.Id.
But I don't know how to get the total number of instances, which I need in order to spread the schedule equally.
Does anybody know how to get this without having to use Azure's management APIs?

Try the Role.Instances property. Here's sample code to enumerate all instances:
foreach (RoleInstance roleInst in RoleEnvironment.CurrentRoleInstance.Role.Instances)
{
    Trace.WriteLine("Instance ID: " + roleInst.Id);
}
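To actually spread the schedule, the instance count can be combined with the instance's ordinal. A minimal sketch, assuming the common convention that the instance ID ends in "_<ordinal>" (e.g. "...WebRole1_IN_0") — this is how the runtime names instances today, but it is not a contractual guarantee:

```csharp
using System;
using Microsoft.WindowsAzure.ServiceRuntime;

// Sketch: stagger an hourly database check evenly across all instances.
int instanceCount = RoleEnvironment.CurrentRoleInstance.Role.Instances.Count;
string id = RoleEnvironment.CurrentRoleInstance.Id;
int ordinal = int.Parse(id.Substring(id.LastIndexOf('_') + 1));

// Each instance starts its periodic check at a different offset,
// e.g. with 4 instances: 0, 15, 30 and 45 minutes past the hour.
TimeSpan period = TimeSpan.FromHours(1);
TimeSpan offset = TimeSpan.FromTicks(period.Ticks * ordinal / instanceCount);
```

Note that the instance count can change when you scale the role, so the offset should be recomputed on each cycle rather than cached at startup.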

Related

How to find/cure source of function app throughput issues

I have an Azure function app triggered by an HttpRequest. The function app reads the request, tosses one copy of it into a storage table for safekeeping and sends another copy to a queue for further processing by another element of the system. I have a client running an ApacheBench test that reports approximately 148 requests per second processed. That rate of processing will not be enough for our expected load.
My understanding of function apps is that it should spawn as many instances as is needed to handle the load sent to it. But this function app might not be scaling out quickly enough as it’s only handling that 148 requests per second. I need it to handle at least 200 requests per second.
I’m not 100% sure the problem is on my end, though. In analyzing the performance of my function app I found a LOT of 429 errors. What I found online, particularly https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits, suggests that these errors could be due to too many requests being sent from a single IP. Would several ApacheBench 10K and 20K request load tests within a given day cause the 429 error?
However, if that’s not it, if the problem is with my function app, how can I force my function app to spawn more instances more quickly? I assume this is the way to get more throughput per second. But I’m still very new at working with function apps so if there is a different way, I would more than welcome your input.
Maybe the Premium App Service plan that's in public preview would handle more throughput? I've thought about switching over to that and running a quick test, but I'm unsure whether I'd be able to switch back.
Maybe EventHub is something I need to investigate? Is that something that might increase my apparent throughput by catching more requests and holding on to them until the function app could accept and process them?
Thanks in advance for any assistance you can give.
You don't provide much context about your app, but here are a few steps you can take to improve things.
If you want more control, use an App Service plan with Always On to avoid cold starts. You will also need to configure auto-scaling yourself, since in this plan you are responsible for it and it is not enabled by default.
Your Azure function must be fully async: you have external dependencies, so you don't want to block a thread while calling them.
Look at the limits. Using host.json you can tweak them.
A 429 error means the function is too busy to process your request, so when you write to the table you are probably not using async and are blocking a thread.
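As an illustration of the "fully async" point, here is a sketch of an HTTP-triggered function that writes to the table and the queue without blocking a thread. The function name, binding targets ("requests", "work") and the Entry type are made up for the example, not taken from the question:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class Ingest
{
    [FunctionName("Ingest")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
        [Table("requests")] IAsyncCollector<Entry> table,  // safekeeping copy
        [Queue("work")] IAsyncCollector<string> queue)     // copy for processing
    {
        string body = await new StreamReader(req.Body).ReadToEndAsync();

        // Both writes are awaited; the thread is returned to the pool
        // while the storage calls are in flight.
        await table.AddAsync(new Entry
        {
            PartitionKey = "requests",
            RowKey = Guid.NewGuid().ToString(),
            Payload = body
        });
        await queue.AddAsync(body);
        return new OkResult();
    }
}

public class Entry
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Payload { get; set; }
}
```

The IAsyncCollector bindings keep the function code free of explicit storage SDK calls, and nothing in the pipeline blocks synchronously.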
Function apps do work very well and scale as advertised. The errors could be because the requests come from a single IP, and Azure could be treating them as a DDoS attack. You can do the following:
Azure DevOps Load Test
You can load test using one of the Azure services. I am fairly sure they handle source IPs better: Azure DevOps Load Test.
Provision a VM in Azure
The way I normally do it is to provision a VM (Windows 10 Pro) in Azure and use JMeter to load test. I have used this method and it works fine. You can provision a couple of them and subdivide the load.
Use professional load testing services
If possible, you may use services like Loader.io. They use sophisticated algorithms to run the load test and provision a bunch of VMs to run the same test.
Use Application Insights
If you aren't already, you should be using Application Insights to get a better view from the server's perspective. Go to Live Metrics Stream and see how many instances are provisioned to handle the load test. You can easily look into the events and error logs that may be arising, deep-dive into each associated dependency, and investigate the problem.

Azure Web Roles and Instances

I have a web role with multiple instances. I'd like to send an email to my customers with their statistics every morning at 7AM.
My problem is the following: if I use a Cron job to do the work mails will be sent multiple times.
How can I configure my instances in order to send only one email to each of my customers?
In my experience, you can use a unique instance ID to ensure that only a single instance runs the emailing cron job.
Here is a simple code.
import com.microsoft.windowsazure.serviceruntime.RoleEnvironment;

String instanceId = RoleEnvironment.getCurrentRoleInstance().getId();
if ("<instance-id-for-running-cronjob>".equals(instanceId)) {
    // code for the cron job goes here
}
For reference, see the description of the RoleInstance.getId function below.
Returns the ID of this instance.
The returned ID is unique to the application domain of the role's instance. If an instance is terminated and has been configured to restart automatically, the restarted instance will have the same ID as the terminated instance.
For more details on using these classes from the Azure SDK for Java, see:
RoleEnvironment
RoleInstance
Hope it helps.
Typically the way to solve this problem with multi-instance Cloud Services is the Master Election pattern. At 7:00 AM (or whenever your CRON job fires), all your instances try to acquire a lease on a blob. Only one instance will succeed in acquiring the lease, while the other instances will fail with a Conflict (409) error (make sure you catch this exception!). The instance that acquires the lease is essentially a master, and that instance sends out the email.
Obviously the problem with this approach is that what if this Master instance fails to deliver the messages. But I guess that's another question :).
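A sketch of that pattern with the classic storage client library. The container/blob names, connectionString, and SendEmailsToCustomers are placeholders, and the lease blob is assumed to exist already:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

var account = CloudStorageAccount.Parse(connectionString);
var blob = account.CreateCloudBlobClient()
                  .GetContainerReference("leases")
                  .GetBlockBlobReference("email-master");
try
{
    // Only one instance can hold the lease at any moment.
    string leaseId = blob.AcquireLease(TimeSpan.FromSeconds(60), null);
    SendEmailsToCustomers();  // this instance won the election
    blob.ReleaseLease(AccessCondition.GenerateLeaseCondition(leaseId));
}
catch (StorageException ex)
    when (ex.RequestInformation.HttpStatusCode == 409)
{
    // Another instance already holds the lease; skip this run.
}
```

Every instance runs the same code at 7:00 AM; the losers simply fall through the catch block and do nothing.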

Windows Azure - leader instance without single point of failure

I am looking for a way to have a "singleton" module across multiple worker role instances.
I would like to have a parallel execution model with queues and multiple worker roles in Azure.
The idea is that I would like a "master" instance that checks for new data and schedules it by adding it to a queue, processes all messages from a special queue that nobody else processes, and has blob storage mounted as a virtual drive with read/write access.
I will always have only one master instance. When that master instance goes down for some reason, another of the already-running instances should very quickly (within a couple of seconds) be elected master. This should happen before the broken instance is replaced by a new one by the Azure environment (which takes about 15 minutes).
So it will be some kind of self-organizing, dynamic environment.
I was thinking of some locking based on storage or table data, with the option to set lock timeouts and some kind of "watchdog" timer, if we can talk in microprocessor terminology.
There is a general approach to what you seek to achieve.
First, your master instance. You can do the check based on instance ID; it is fairly easy. Use RoleEnvironment.CurrentRoleInstance to get the current instance, then compare its Id property with the Id of the first member of RoleEnvironment.CurrentRoleInstance.Role.Instances ordered by Id. Something like:
var instance = RoleEnvironment.CurrentRoleInstance;
if (instance.Id.Equals(instance.Role.Instances.OrderBy(ins => ins.Id).First().Id))
{
    // you are the single master
}
Now you need to elect a master upon healing/recycling.
Hook the RoleEnvironment's Changed event and check whether the change is a topology change (you only need to know that the topology changed, not the exact change). If it is a topology change, elect the next master using the algorithm above. Check out this great blog post on how to exactly perform event hooking and change detection.
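For illustration, hooking the event could look like the following sketch, where ElectMaster() is a placeholder standing in for the lowest-instance-ID comparison above:

```csharp
using System.Linq;
using Microsoft.WindowsAzure.ServiceRuntime;

RoleEnvironment.Changed += (sender, e) =>
{
    // We only care that the topology changed, not what exactly changed.
    if (e.Changes.OfType<RoleEnvironmentTopologyChange>().Any())
    {
        ElectMaster();  // re-run the lowest-instance-ID election
    }
};
```

Because every surviving instance sees the same topology change and runs the same deterministic comparison, they all agree on the new master without any coordination.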
Forgot to add: if you like locks, a blob lease is the best way to acquire/check them. However, working with just the RoleEnvironment events and the simple master election based on instance ID, I don't think you'll need that complicated a locking mechanism. Besides, everything lives in the queue until it is successfully processed, so if the master dies before it processes something, the "next master" will process it.

Correct code pattern for recurrent events in Azure worker roles with sizable delays between each event

I have an Azure worker role whose job is to periodically run some code against a SQL Azure database. Here's my current code:
const int oneHour = 3600000; // milliseconds
while (true)
{
    var numConversions = SaveSeedsToSQL.ConvertRemainingPotentialQueryURLsToSeeds();
    SaveLogEntryToSQL.Save(new LogEntry { Count = numConversions });
    Thread.Sleep(oneHour);
}
Is Thread.Sleep(3600000) the best way of programming such regular but infrequent events, or is there some kind of wake-up-and-run-again mechanism for Azure worker roles that I should be utilizing?
This code works of course, but there are some problems:
1. You can fail somewhere and this schedule gets all thrown off. That is important if you must actually do it at a specific time.
2. There is no concurrency control here. If you want something only done once, you need a mechanism such that a single instance will perform the work and the other instances won't.
There are a few solutions to this problem:
Run the Windows Scheduler on the role (built in). That solves problem 1, but not 2.
Run Quartz.NET and schedule things. That solves #1 and depending on how you do it, also #2.
Use future scheduled queue messages in either Service Bus or Windows Azure queues. That solves both.
The first two options work with caveats, so I think the last option deserves more attention. You simply create a message that your roles will understand and post it to the queue. Once the time comes, it becomes visible, and your normally-polling roles will see it and can work on it. The benefit here is that it is time-accurate, and a single instance operates on it since it is a queue message. When the work is completed, the instance schedules the next run and posts a new message to the queue. We use this technique all the time. You only have to be careful: if for some reason your role fails before scheduling the next message, the whole system kind of fails. You should have some sanity checks and safeguards there.
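With Windows Azure storage queues, the "future scheduled" part can be done with the initial visibility delay. A sketch, with the connection string, queue name and payload made up:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

var queue = CloudStorageAccount.Parse(connectionString)
                               .CreateCloudQueueClient()
                               .GetQueueReference("schedule");
queue.CreateIfNotExists();

// The message stays invisible for an hour, then appears to polling roles.
queue.AddMessage(new CloudQueueMessage("hourly-job"),
                 timeToLive: null,
                 initialVisibilityDelay: TimeSpan.FromHours(1));
```

With Service Bus the equivalent knob is the message's scheduled enqueue time rather than a visibility delay.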

Controlling Azure worker role concurrency across multiple instances

I have a simple worker role in Azure that does some data processing on a SQL Azure database.
The worker basically adds data from a 3rd-party data source to my database every 2 minutes. When I have two instances of the role, this work is obviously duplicated unnecessarily. I would like to have 2 instances for redundancy and the 99.95% uptime, but I do not want them both processing at the same time, as they will just duplicate the same job. Is there a standard pattern for this that I am missing?
I know I could set flags in the database, but am hoping there is another easier or better way to manage this.
Thanks
As Mark suggested, you can use an Azure queue to post a message. Have the worker role instance post a follow-up message to the queue as the last thing it does when processing the current message. That should deal with the issue Mark brought up regarding the need for a semaphore. In the queue message, you can embed a timestamp marking when the message can be processed; when creating a new message, just add two minutes to the current time.
And, in case it's not obvious: if the worker role instance crashes before completing processing and fails to repost a new queue message, that's fine. In that case, the current queue message will simply reappear on the queue, and another instance is then free to process it.
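A sketch of one iteration of that loop with a storage queue; here `queue` is an already-initialized CloudQueue, and DoDataProcessing and the payloads are placeholders. The follow-up message is posted before the current one is deleted, so a crash between the two operations can at worst produce a duplicate tick, never leave the system with zero:

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Queue;

// Take the current message off the queue invisibly; if we crash before
// deleting it, it reappears after the visibility timeout.
CloudQueueMessage msg = queue.GetMessage(visibilityTimeout: TimeSpan.FromMinutes(5));
if (msg != null)
{
    DoDataProcessing();  // the 3rd-party data import

    // Schedule the next run two minutes out, then retire the current token.
    queue.AddMessage(new CloudQueueMessage("tick"),
                     timeToLive: null,
                     initialVisibilityDelay: TimeSpan.FromMinutes(2));
    queue.DeleteMessage(msg);
}
```

The visibility timeout must comfortably exceed the worst-case processing time, or a second instance will pick up the same message while the first is still working.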
I don't think there is a super easy way to do this.
You can use a semaphore, as Mark has mentioned, to record the start and the stop of processing. Then you can have any number of instances running, each inspecting the semaphore record and only acting if the semaphore allows it.
However, the caveat here is: what happens if one of the instances crashes in the middle of processing and never releases the semaphore? You can implement a "timeout" value after which other instances will attempt to kick-start processing if there hasn't been an unlock for X amount of time.
Alternatively, you can use a third-party monitoring service like AzureWatch to watch for unresponsive instances in Azure and start a new instance if the number of "Ready" instances drops below 1. This can save you some money by not having two instances up and running all the time, but there is a slight lag between when an instance fails and when a new one is started.
A semaphore, as suggested, would be the way to go, although I'd probably go with a simple timestamp heartbeat in blob storage.
The other thought is, how necessary is it? If your loads can sustain being down for a few minutes, maybe just let the role recycle?
A small catch on David's solution: re-posting the message to the queue happens as the last thing in the current execution, so that if the machine crashes along the way, the current message expires and re-surfaces on the queue. That assumes the message was originally peeked, and a de-queue operation is required to remove it from the queue. The de-queue must happen before inserting the new message into the queue; if the role crashes between these two operations, there will be no tokens left in the system and it will come to a halt.
The Service Bus duplicate-detection check sounds like a feasible approach, but it does not sound deterministic either, since the bus can only check for identical messages currently existing in a queue. If one message comes in right after the previous one was de-queued, there is a chance of ending up with two processes running in parallel.
An alternative solution, if you can afford it, would be to never de-queue and just lease the message via peek operations. You would have to ensure that the invisibility timeout never expires before processing completes in your worker role. As far as creating the token in the first place, the same worker-role startup strategy described before, combined with the ASB duplicate-detection check, should work (since messages would never move from the queue).
