Azure Autoscale Restarts Running Instances

I've been using Autoscale to shift between 2 and 1 instances of a cloud service in a bid to reduce costs. This mostly works, except that from time to time (I'm not sure what the pattern is), the act of scaling up (1->2) causes both instances to recycle, generating a service outage for users.
Assuming nothing fancy is going on in RoleEntryPoint in response to topology changes, why would scaling from 1->2 restart the already-running instance?
Additional notes:
It's clear that both instances are recycling from the Instances tab in the Management Portal. The outage can also be confirmed by hitting the public site.
It doesn't happen consistently, and I'm not sure what the pattern is. It feels like when the 1-instance configuration has been running for multiple days, scaling up recycles both instances; but when the 1-instance configuration has only been running for a few hours, you can scale up and down without outages.
The first instance always comes back much faster than the newly introduced second instance.

It has always worked this way: when you have 1 server running and you scale to 2+, the initial server is restarted. To get the full SLA, you need to run 2+ servers at all times.

Nariman, see my comment on Brent's post for some information about what is happening. You should be able to resolve this with the following code:
public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // For information on handling configuration changes
        // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.

        // Find this instance's IPv4 address.
        IPHostEntry ipEntry = Dns.GetHostEntry(Dns.GetHostName());
        string ip = null;
        foreach (IPAddress ipaddress in ipEntry.AddressList)
        {
            if (ipaddress.AddressFamily == AddressFamily.InterNetwork)
            {
                ip = ipaddress.ToString();
            }
        }

        // Warm the site up by requesting it locally, so IIS has started
        // before the instance reports itself as Ready.
        string urlToPing = "http://" + ip;
        HttpWebRequest req = WebRequest.Create(urlToPing) as HttpWebRequest;
        using (WebResponse resp = req.GetResponse())
        {
        }

        return base.OnStart();
    }
}

You should be able to control this behavior. In your RoleEntryPoint, there's an event you can trap: RoleEnvironment.Changing.
A shell of some code to put into your solution will look like...
RoleEnvironment.Changing += RoleEnvironmentChanging;

private void RoleEnvironmentChanging(object sender, RoleEnvironmentChangingEventArgs e)
{
}

RoleEnvironment.Changed += RoleEnvironmentChanged;

private void RoleEnvironmentChanged(object sender, RoleEnvironmentChangedEventArgs e)
{
}
Then, inside the RoleEnvironmentChanging method (the Cancel property lives on RoleEnvironmentChangingEventArgs, not on the Changed args), we can detect what the change is and tell Azure whether we want to restart:

if (e.Changes.Any(change => change is RoleEnvironmentConfigurationSettingChange))
{
    // Setting Cancel to true tells Azure to recycle (restart) the instance;
    // leaving it false applies the change without a restart.
    e.Cancel = false;
}
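Putting those pieces together, a minimal sketch of a complete role (wiring the handlers up in OnStart, which is one common place to do it):

using System;
using System.Linq;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Subscribe before any configuration change can arrive.
        RoleEnvironment.Changing += RoleEnvironmentChanging;
        RoleEnvironment.Changed += RoleEnvironmentChanged;
        return base.OnStart();
    }

    private void RoleEnvironmentChanging(object sender, RoleEnvironmentChangingEventArgs e)
    {
        if (e.Changes.Any(change => change is RoleEnvironmentConfigurationSettingChange))
        {
            // Leave Cancel false so the setting change is applied in place.
            e.Cancel = false;
        }
    }

    private void RoleEnvironmentChanged(object sender, RoleEnvironmentChangedEventArgs e)
    {
        // The change has been applied at this point; re-read settings here if needed.
    }
}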


Handling Acumatica timeout on API Invoke action

I have code in a standalone application that invokes an Acumatica action to generate reports; I am running into timeouts on large documents while the action completes.
What is the best method to handle these timeouts? I need to wait for the action to complete in order to retrieve the files I've generated.
Standalone application code:
public SalesOrder GenerateAcumaticaLabels(string orderNbr, string reportType)
{
    SalesOrder salesOrder = null;
    using (ISoapClientProvider clientProvider = soapClientFactory.Create())
    {
        try
        {
            SalesOrder salesOrderToFind = new SalesOrder
            {
                OrderType = new StringSearch { Value = orderNbr.Split(OrderSeparator.SalesOrder).First() },
                OrderNbr = new StringSearch { Value = orderNbr.Split(OrderSeparator.SalesOrder).Last() },
                ReturnBehavior = ReturnBehavior.OnlySpecified,
            };
            salesOrder = clientProvider.Client.Get(salesOrderToFind) as SalesOrder;

            InvokeResult invokeResult = clientProvider.Client.Invoke(salesOrder, new exportSFPReport());
            ProcessResult processResult = clientProvider.Client.GetProcessStatus(invokeResult);

            // Wait for the update to complete before we attempt to retrieve the files
            while (processResult.Status == ProcessStatus.InProcess)
            {
                Thread.Sleep(1000); // pause for 1 second
                processResult = clientProvider.Client.GetProcessStatus(invokeResult);
            }
        }
        // (catch/finally blocks and the file retrieval are elided in the original post)
And the action in Acumatica:
public PXAction<SOOrder> ExportSFPReport;
[PXButton]
[PXUIField(DisplayName = "Generate Robot SFP PDF")]
protected IEnumerable exportSFPReport(PXAdapter adapter)
{
    // Report parameters
    Dictionary<String, String> parameters = new Dictionary<String, String>();
    parameters["SOOrder.OrderType"] = Base.Document.Current.OrderType;
    parameters["SOOrder.OrderNbr"] = Base.Document.Current.OrderNbr;

    IEnumerable reportFileInfo = ExportReport(adapter, "IN619217", parameters);
    exportTrayLabelReport(adapter, "SFP");
    return reportFileInfo;
}
The problem here is that your action is synchronous, so it tries to complete within the Invoke call, which is not a good thing for long processes. You have to explicitly make the operation long-running by using PXLongOperation.StartOperation inside your handler; your client code should then work as expected, since it already handles the waiting and status checking.
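For illustration, a hedged sketch of what that might look like applied to the action from the question, assuming ExportReport and exportTrayLabelReport can safely run inside the delegate and the captured key fields are all the state they need (the generated files would then be picked up by the client's polling loop rather than returned synchronously):

public PXAction<SOOrder> ExportSFPReport;
[PXButton]
[PXUIField(DisplayName = "Generate Robot SFP PDF")]
protected IEnumerable exportSFPReport(PXAdapter adapter)
{
    // Capture current-record state before going asynchronous; the delegate
    // runs outside the current request context.
    string orderType = Base.Document.Current.OrderType;
    string orderNbr = Base.Document.Current.OrderNbr;

    PXLongOperation.StartOperation(Base, delegate
    {
        Dictionary<String, String> parameters = new Dictionary<String, String>();
        parameters["SOOrder.OrderType"] = orderType;
        parameters["SOOrder.OrderNbr"] = orderNbr;

        // Runs as a tracked long-running operation, so the client's
        // GetProcessStatus loop sees InProcess until it completes.
        ExportReport(adapter, "IN619217", parameters);
        exportTrayLabelReport(adapter, "SFP");
    });

    return adapter.Get();
}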
I believe the reason you encounter the time-out is that there is no TCP communication between the moment you send the request and the moment you receive the response. With the TCP KeepAlive flag set to true, the client periodically pings the server to reset the time-out period.
That would be the best way. However, Acumatica connections are rather high level, so I don't think you'll be able to easily access that flag. What I would try first, in a scenario that doesn't involve an external application, is to wrap the action event-handler code in a PXLongOperation block, which has to do something similar to keep the connection alive under the hood:
PXLongOperation.StartOperation(this or Base, delegate
{
    // your code here
});
When I do encounter time-outs in Acumatica that can't be solved with PXLongOperation, I go for the simplest method: increasing the IIS timeout in the Web.Config file. I'm not sure your use case with an external application will play well with an asynchronous PXLongOperation; the handler would return prematurely, and the client might not be able to retrieve the async payload.
So you might have to increase the time-out instead. As far as I know there's no real practical drawback to doing this, unless your website is under threat of DoS attacks.
You can locate and edit the Web.Config file of your Acumatica instance using the inetmgr program if you are self-hosting Acumatica. Otherwise, talk to your SaaS contact to see if that's an option.
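For completeness, if you do want to experiment with TCP keep-alive from the standalone client, a process-wide setting can be enabled through ServicePointManager before the SOAP client is created. Whether it actually keeps this particular channel alive is untested, so treat this as an assumption rather than a known fix:

using System.Net;

// Applies to all subsequent HTTP connections made by this process;
// call once at client startup, before creating the SOAP client.
ServicePointManager.SetTcpKeepAlive(
    enabled: true,
    keepAliveTime: 30000,      // send the first keep-alive probe after 30 s idle
    keepAliveInterval: 5000);  // retry every 5 s if no response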
I'm pretty sure you are hitting the IIS time-out. A tell-tale sign is the connection being lost after exactly 5 minutes, which is the default executionTimeout of 300 seconds. You can edit the Web.Config file to increase the executionTimeout value. It's not a bad idea to increase maxRequestLength too if you are requesting large amounts of data from the Acumatica API, as this is another common cause of failure that slips through testing and shows up in real-life scenarios. For example, raising the execution timeout to an hour:

<system.web>
  <httpRuntime executionTimeout="3600" requestValidationMode="2.0" maxRequestLength="1048576" />
</system.web>

Subscribing to Service Fabric cluster level events

I am trying to create a service that will update an external list of service endpoints for applications running in my Service Fabric cluster. (Basically, I need to replicate the Azure Load Balancer in my on-premises F5 load balancer.)
During last month's Service Fabric Q&A, the team pointed me at RegisterServiceNotificationFilterAsync.
I made a stateless service using this method, and deployed it to my development cluster. I then made a new service by running the ASP.NET Core Stateless service template.
I expected that when I deployed the second service, the breakpoint in my first service would be hit, indicating that a service had been added. But no breakpoint was hit.
I have found very little in the way of examples for this kind of thing on the internet, so I am asking here hoping that someone else has done this and can tell me where I went wrong.
Here is the code for my service that is trying to catch the application changes:
protected override async Task RunAsync(CancellationToken cancellationToken)
{
    var fabricClient = new FabricClient();
    long? filterId = null;
    try
    {
        var filterDescription = new ServiceNotificationFilterDescription
        {
            Name = new Uri("fabric:")
        };
        fabricClient.ServiceManager.ServiceNotificationFilterMatched += ServiceManager_ServiceNotificationFilterMatched;
        filterId = await fabricClient.ServiceManager.RegisterServiceNotificationFilterAsync(filterDescription);

        long iterations = 0;
        while (true)
        {
            cancellationToken.ThrowIfCancellationRequested();
            ServiceEventSource.Current.ServiceMessage(this.Context, "Working-{0}", ++iterations);
            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
    }
    finally
    {
        if (filterId != null)
            await fabricClient.ServiceManager.UnregisterServiceNotificationFilterAsync(filterId.Value);
    }
}

private void ServiceManager_ServiceNotificationFilterMatched(object sender, EventArgs e)
{
    Debug.WriteLine("Change Occurred");
}
If you have any tips on how to get this going, I would love to see them.
You need to set the MatchNamePrefix to true, like this:
var filterDescription = new ServiceNotificationFilterDescription
{
    Name = new Uri("fabric:"),
    MatchNamePrefix = true
};
otherwise it will only match services with that exact name. In my application, I can catch cluster-wide events when this parameter is set to true.
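To actually act on the notification (for example, to push endpoint changes to the F5), the EventArgs can be cast to get at the matched notification. A sketch, with the caveat that the exact types should be verified against the System.Fabric documentation:

private void ServiceManager_ServiceNotificationFilterMatched(object sender, EventArgs e)
{
    // Assumption: the args carry the matched notification, including the
    // service name and its currently resolved endpoints.
    var notification =
        ((FabricClient.ServiceManagementClient.ServiceNotificationEventArgs)e).Notification;

    Debug.WriteLine("Change occurred for {0}", notification.ServiceName);
    foreach (var endpoint in notification.Endpoints)
    {
        // Each address could be pushed to the external load balancer here.
        Debug.WriteLine("  endpoint: {0}", endpoint.Address);
    }
}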

Same Azure topic is processed multiple times

We have a job hosted in an Azure website; the job reads entries from a topic subscription. Everything works fine when we only have one instance hosting the website. Once we scale out to more than one instance, we observe that each message is processed as many times as there are instances. Each instance points to the same subscription. From what we've read, once an item is read it should not be available to any other process. The duplicated processing happens inside the same instance: if we have two instances, the item is processed twice in one of the instances; it is not split between them.
What could possibly be wrong in the way we are doing things?
This is how we configure the connection to the queue; if the subscription does not exist, it is created:
var serviceBusConfig = new ServiceBusConfiguration
{
    ConnectionString = transactionsBusConnectionString
};
config.UseServiceBus(serviceBusConfig);

var allRule1 = new RuleDescription
{
    Name = "All",
    Filter = new TrueFilter()
};
SetupSubscription(transactionsBusConnectionString, "topic1", "subscription1", allRule1);

private static void SetupSubscription(string busConnectionString, string topicNameKey, string subscriptionNameKey, RuleDescription newRule)
{
    var namespaceManager =
        NamespaceManager.CreateFromConnectionString(busConnectionString);
    var topicName = ConfigurationManager.AppSettings[topicNameKey];
    var subscriptionName = ConfigurationManager.AppSettings[subscriptionNameKey];
    if (!namespaceManager.SubscriptionExists(topicName, subscriptionName))
    {
        namespaceManager.CreateSubscription(topicName, subscriptionName);
    }

    var subscriptionClient = SubscriptionClient.CreateFromConnectionString(busConnectionString, topicName, subscriptionName);
    var rules = namespaceManager.GetRules(topicName, subscriptionName);
    foreach (var rule in rules)
    {
        subscriptionClient.RemoveRule(rule.Name);
    }
    subscriptionClient.AddRule(newRule);
    rules = namespaceManager.GetRules(topicName, subscriptionName);
    rules.ToString();
}
Example of the code that processes the topic item:
public void SendInAppNotification(
    [ServiceBusTrigger("%eventsTopicName%", "%SubsInAppNotifications%"), ServiceBusAccount("OutputServiceBus")] Notification message)
{
    this.valueCalculator.AddInAppNotification(message);
}
This method is inside a static Functions class; I'm using the Azure WebJobs SDK.
Whenever the Azure website is scaled to more than one instance, all the instances share the same configuration.
It sounds like you're creating a new subscription each time a new instance runs, rather than hooking into an existing one. Topics are designed to allow multiple subscribers to attach in that way as well; usually, though, each subscriber has a different purpose, so they each see a copy of the message.
I can't verify this from your code snippet, but that's my guess. Are the config files identical? You should add some trace output to see if your processes are calling CreateSubscription() each time they run.
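For example, a minimal sketch of the kind of trace output meant here, dropped into the SetupSubscription method from the question:

if (!namespaceManager.SubscriptionExists(topicName, subscriptionName))
{
    // If scaled-out instances log different subscription names here, each
    // instance is creating (and consuming from) its own subscription.
    Trace.TraceInformation(
        "Creating subscription '{0}' on topic '{1}'", subscriptionName, topicName);
    namespaceManager.CreateSubscription(topicName, subscriptionName);
}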
I think I can access the message ID. I'm using the Azure WebJobs SDK, but I think I can find a way to get it. Let me check and I'll let you know.

Threading does not work when deploying to another server

I have created a web page where users can upload a file containing data that is inserted into the Lead table in Microsoft Dynamics CRM 2011.
The weird thing is that when I deploy to our test environment, the application seemingly runs fine, but absolutely no rows are imported. In my dev environment it works just fine.
While trying to find the error, I added a setting that runs the application without threading (executing the work synchronously instead), and then it inserts all the leads. If I switch back to using threads, it inserts no leads at all, although I get no application errors.
When using SQL Server Profiler I can see in dev (where threading works) that all of the insert statements run. When profiling the test environment, no insert statements run at all.
I get the feeling that some server issue or setting is causing this behaviour, but my searching hasn't turned up anything helpful.
I'm hoping someone recognizes this problem. I haven't got much experience with threading, so maybe this is a road bump I need to get over.
I cannot show my code completely but this is basically how I start the threads:
for (int i = 0; i < _numberOfThreads; i++)
{
    MultipleRequestObject mpObject = new MultipleRequestObject()
    {
        insertType = insertType,
        listOfEntities = leadsForInsertionOrUpdate[i].ToList<Entity>()
    };
    Thread thread = new Thread(delegate()
    {
        // Note: List<T>.AddRange is not thread-safe; concurrent calls from
        // several threads need a lock around this shared collection.
        insertErrors.AddRange(leadBusinessLogic.SaveToCRMMultipleRequest(mpObject));
    });
    thread.Start();
    activeThreads.Add(thread);
}

// Wait for all threads to complete
foreach (Thread t in activeThreads)
    t.Join();
I initialize my CRM connection like this (it reads the connection string from web.config -> connectionStrings):
public CrmConnection connection { get; set; }
private IOrganizationService service { get; set; }
public CrmContext crmContext { get; set; }

public CrmGateway()
{
    connection = new CrmConnection("Crm");
    service = (IOrganizationService)new OrganizationService(connection);
    crmContext = new CrmContext(service);
}

Custom maintenance mode module does not work on Azure Web Role

I've created and registered a custom HTTP module to show a maintenance message to users after an administrator turns on maintenance mode via a configuration change.
When a request for HTML comes in, it should return custom HTML loaded from a file, but instead it returns the message: "The service is unavailable." I can't find that string anywhere in my solution. The custom log message from the maintenance module is written to the log4net logs:
... INFO DdiPlusWeb.Common.MaintenanceResponder - Maintenance mode is on. Request rejected. RequestUrl=...
It seems something is misconfigured in IIS on Azure, and something is intercepting my 503 response. How do I fix it?
Module code
void context_BeginRequest(object sender, EventArgs e)
{
    HttpApplication application = (HttpApplication)sender;
    HttpContext context = application.Context;
    if (AppConfig.Azure.IsMaintenance)
    {
        MaintenanceResponder responder = new MaintenanceResponder(context, MaintenaceHtmlFileName);
        responder.Respond();
    }
}
The interesting part of the responder code:
private void SetMaintenanceResponse(string message = null)
{
    _context.Response.Clear();
    _context.Response.StatusCode = 503;
    _context.Response.StatusDescription = "Maintenance";
    if (string.IsNullOrEmpty(message))
    {
        _context.Response.Write("503, Site is under maintenance. Please try again a bit later.");
    }
    else
    {
        _context.Response.Write(message);
    }
    _context.Response.Flush();
    _context.Response.End();
}
EDIT: I lied, sorry. The maintenance module returns the same message for requests that expect JSON or HTML.
This answer led me to the solution.
I've added one more line to the SetMaintenanceResponse method:
_context.Response.TrySkipIisCustomErrors = true;
It works now. TrySkipIisCustomErrors stops IIS from replacing a custom error response body with its own error page, which is where the generic "The service is unavailable." text was coming from.
