Azure Service Fabric Activation Error 7148

I have a service fabric cluster which hosts numerous applications. One of the applications has a service type where the service is created, runs for a bit, and then is deleted. Everything works great, but the cluster virtually always has its state set to error because there will be a few of these in the "Unhealthy evaluations" section.
Error event: SourceId='System.Hosting', Property='CodePackageActivation:Code:EntryPoint'.
There was an error during CodePackage activation.The service host terminated with exit code:7148
I've wrapped both the program's main and RunAsync in exception handlers, but never see anything in analytics. Is there any way to look up what exit code 7148 means? Thanks.

7148 is a general error code that indicates that something failed in SF in the process of setting up or activating your service's host process. So that's the reason that you're not seeing any errors or exceptions - your code is never getting a chance to run.
Examples of things I've seen that led to 7148:
The exe was not actually a valid Windows executable, due to corruption.
The service's manifest referenced a cert or some other prerequisite, such as an endpoint, that was incorrectly configured (for example, a port that was already in use, or the wrong thumbprint for a cert).
Something blew up inside Windows that caused process creation to fail, such as a failure to correctly configure host networking for a container.
Most of the time when I see this, I have to look at the Windows error logs to see what's really happening. The SF folks are also trying to capture the more common causes of failure and report them as better health errors rather than relying on 7148.

Related

Azure Function getting 503

I am trying to run an Azure Function App, that we already have running in a different resource group / service plan / storage account. The original app works fine. But when I try to run this one, I get a 503.
The problem is that all I know is that I'm getting the 503. There is no other information. I turned on tracing in the app, but I still get no messages. I have tried to execute the app from both the Azure Portal Function App Code / Test section, and from Postman, with the same results. It spins for a long time, and then I get the 503.
When I try to execute the function, it is showing me the following in the logs:
Request successfully matched the route with name 'IngestRfidScan' and template 'api/v1/rfidScan'
Executing 'Functions.IngestRfidScan' (Reason='This function was programmatically called via the host APIs.', Id=a9c37c44-6a27-41e0-bff8-74fbb4275ecc)
Sending invocation id:a9c37c44-6a27-41e0-bff8-74fbb4275ecc
Posting invocation id:a9c37c44-6a27-41e0-bff8-74fbb4275ecc on workerId:7195f57f-b8ff-4613-84e4-9d4bc5dd7c4a
I don't see any log messages after this. I tried adding logging to the app, but I am not seeing my messages in the log anywhere. So this leads me to believe that it's not executing the function at all. But I can't seem to find any way to determine why. At first I thought it could be a firewall issue, but I don't think I'd see those messages in the log above.
Any ideas how to diagnose this?
See one of my workarounds for the common causes of the Azure Functions 503 Service Unavailable error.
It is definitely timing out, but I don't know why; I don't have enough info in the logs. I checked App Insights, but again, it just tells me the request is timing out, with no explanation.
The workaround referenced above lists the timeout limits; check those, and also the resolution it describes.
For logs and more information, check the Diagnose and solve problems menu of the Function App in the Azure Portal, and also my workaround that shows different ways to view the Function App and host logs.
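One way to confirm from exported host logs that an invocation starts but never completes is to pair each start line with a completion line. The following is a minimal Python sketch, not an Azure tool: the start pattern is taken from the log lines quoted in the question, while the "Executed ..." completion pattern and the function name unfinished_invocations are assumptions made for illustration.

```python
import re

# STARTED follows the "Executing ..." line quoted in the question.
# FINISHED is an assumption about what a completed invocation logs.
STARTED = re.compile(r"Executing 'Functions\.(?P<name>\w+)' \(.*Id=(?P<id>[0-9a-f-]+)\)")
FINISHED = re.compile(r"Executed 'Functions\.(?P<name>\w+)' \(.*Id=(?P<id>[0-9a-f-]+)")


def unfinished_invocations(log_lines):
    """Return {invocation_id: function_name} for invocations with no completion line."""
    pending = {}
    for line in log_lines:
        started = STARTED.search(line)
        if started:
            pending[started.group("id")] = started.group("name")
            continue
        finished = FINISHED.search(line)
        if finished:
            pending.pop(finished.group("id"), None)
    return pending


sample_log = [
    "Executing 'Functions.IngestRfidScan' (Reason='This function was programmatically "
    "called via the host APIs.', Id=a9c37c44-6a27-41e0-bff8-74fbb4275ecc)",
    "Sending invocation id:a9c37c44-6a27-41e0-bff8-74fbb4275ecc",
    "Posting invocation id:a9c37c44-6a27-41e0-bff8-74fbb4275ecc "
    "on workerId:7195f57f-b8ff-4613-84e4-9d4bc5dd7c4a",
]
print(unfinished_invocations(sample_log))
```

An invocation that stays in the result was handed to the worker but never reported back, which is consistent with a hang or timeout inside the worker rather than a routing problem.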

Azure Service Bus InvalidOperationException

I have a .NET Core app running which is listening to an Azure Service Bus topic.
When I run the app I get this error:
A sessionful message receiver cannot be created on an entity that does
not require sessions. Ensure RequiresSession is set to true when
creating a Queue or Subscription to enable sessionful behavior.
I think the code is running on a separate thread and I can't seem to find the point where the Exception is thrown.
I also don't have a D:\ drive which is where the SessionClient.cs file is located.
Is this D:\ drive in Azure, and if so does this mean there is code running in Azure?
How can I debug this?
Delete your queue and create it again without 'Require session' checked. Also check your code, because you may have set the bool RequiresSession to true. Sessions cannot be enabled or disabled after the entity has been created, so an entity that was configured the opposite way has to be recreated.
The stack trace you're getting includes references to the source files as they were compiled and deployed for that version of the package, so the D:\ drive is irrelevant to the issue. The real problem is a mismatch between how you receive messages and how the entity is configured. You appear to have a sessionless entity, but in your code you use a receiver that is configured to work with sessions.
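The mismatch described above can be pictured with a toy model. This is a minimal Python sketch, not the Service Bus SDK: Entity, receiver_problem and the subscription name are hypothetical and exist only to show that the receiver type in code has to agree with the entity's session setting.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for a queue or subscription; not an SDK type."""
    name: str
    requires_session: bool


def receiver_problem(entity: Entity, uses_session_receiver: bool):
    """Return a description of the mismatch, or None if code and entity agree."""
    if uses_session_receiver and not entity.requires_session:
        return (f"{entity.name}: code opens a session receiver, "
                "but the entity does not require sessions")
    if not uses_session_receiver and entity.requires_session:
        return (f"{entity.name}: entity requires sessions, "
                "but the code uses a plain receiver")
    return None


# Placeholder name; the entity in the question is sessionless.
subscription = Entity(name="orders-subscription", requires_session=False)
print(receiver_problem(subscription, uses_session_receiver=True))   # mismatch text
print(receiver_problem(subscription, uses_session_receiver=False))  # None
```

The first call corresponds to the situation in the question. The fix is either to switch the code to a non-session receiver or to recreate the entity with sessions required.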

Ghost (NodeJS blog) on Azure: Periodic 500 error troubleshooting

Background / Issue
Having a strange issue running a Ghost blog on Azure. The site seems to run fine for a while, but every once in a while, I'll receive a 500 error with no further information. The next request always appears to succeed (in tests so far).
The error seems to happen after a period of inactivity. Since I'm currently just getting set up, I'm using an Azure "Free" instance, so I'm wondering if some sort of resource conservation is causing it behind the scenes (which would be alleviated when I upgrade).
Any idea what could be causing this issue? I'm sort of at a loss for where to start since the logs don't necessarily help me in this case. I'm new to NodeJS (and nodeJS on Azure) and since this is my first foray, any tips/tricks on where to look would be helpful as well.
Some specific questions:
When receiving an error like this, is there anywhere I can go to see any output, or is it pretty much guaranteed that Node actually didn't output something?
On Azure free instances, does some sort of resource conservation take place which might cause the app to be shut down (and thus for me to see these errors only after a period of inactivity)?
The Full Error
The full text of the error is below (I've turned debugging on for this reason):
iisnode encountered an error when processing the request.
HRESULT: 0x2
HTTP status: 500
HTTP reason: Internal Server Error
You are receiving this HTTP 200 response because system.webServer/iisnode/@devErrorsEnabled configuration setting is 'true'.
In addition to the log of stdout and stderr of the node.exe process, consider using debugging and ETW traces to further diagnose the problem.
The node.exe process has not written any information to stderr or iisnode was unable to capture this information. Frequent reason is that the iisnode module is unable to create a log file to capture stdout and stderr output from node.exe. Please check that the identity of the IIS application pool running the node.js application has read and write access permissions to the directory on the server where the node.js application is located. Alternatively you can disable logging by setting system.webServer/iisnode/@loggingEnabled element of web.config to 'false'.
I think it might be something in the Azure web config rather than Ghost itself. So look for logs based on that because Ghost is not throwing that error. I found this question that might help you out:
How to debug Azure 500 internal server error
Good luck!

Azure WebRole won't start after 1.7 upgrade

I've recently tried to upgrade my WebRole from Azure SDK v1.6 to v1.7. This appears to have worked OK. I can build and run the role in my devfabric just fine. When I try to deploy the upgraded project to the real cloud, the instances never start. They just sit in the "busy" state. Interestingly, they don't do the typical "recycle loop", they just sit at "busy" forever.
When I log into the instances with RDP, I see the following error in the event logs:
The application '/' belonging to site '1' has an invalid AppPoolId 'DefaultAppPool' set. Therefore, the application will be ignored.
Followed by:
Site 1 was disabled because the root application defined for the site is invalid. See the previous event log message for information about why the root application is invalid.
Looking in IIS manager confirms that there is no AppPool called "DefaultAppPool". There also are none of the typical AppPools with GUIDs for names that Azure creates. Unsurprisingly, none of my sites exist either.
So how do I resolve this?
I had the same issue after upgrading to v1.7, but upon looking at the Windows Azure logs in the Azure VM I noticed the following exception:
An unhandled exception occurred. Type: System.ArgumentException Process ID: 2340
Process Name: DiagnosticsAgent
Thread ID: 1
AppDomain Unhandled Exception for role Backend_IN_0
Exception: Endpoint http://xxxx.blob.core.windows.net/ is not a secure connection.
So I changed the Diagnostics connection string to use https instead of http and voilà, that solved my problem.
Hope that works for you; I'd been pulling my hair out for two days.
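A quick way to spot this kind of setting before deploying is to scan the connection string for anything that selects plain http. The following is a minimal Python sketch, assuming the usual Key=Value;Key=Value storage connection string layout; the function names and the account name and key are placeholders, not values from the post.

```python
def parse_connection_string(conn: str) -> dict:
    """Split 'Key=Value;Key=Value' into a dict; values may themselves contain '='."""
    settings = {}
    for part in conn.split(";"):
        if part.strip():
            key, _, value = part.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def insecure_settings(conn: str) -> list:
    """Return the names of settings that select plain http."""
    problems = []
    for key, value in parse_connection_string(conn).items():
        is_protocol = key == "DefaultEndpointsProtocol" and value.lower() != "https"
        is_endpoint = key.endswith("Endpoint") and value.lower().startswith("http://")
        if is_protocol or is_endpoint:
            problems.append(key)
    return problems


# Placeholder account name and key, for illustration only.
old = "DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=abc123=="
new = "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=abc123=="
print(insecure_settings(old))  # ['DefaultEndpointsProtocol']
print(insecure_settings(new))  # []
```

The check also flags explicit endpoint settings such as BlobEndpoint when they start with http://, since the exception above complains about the endpoint URL itself.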

Azure Worker Role writing unexpected error to Trace log storage

We have a worker role running in the cloud that periodically polls an Azure CloudQueue to retrieve messages that a web role has put there for us. The worker role and web role are currently housed in the same Cloud Service application, and we are running only one instance.
As we are testing we have our logging switched on and so the contents of the messages and other useful information appear in our cloud storage which we view using Cerebrata Azure Diagnostics Manager. (Great product btw)
// Transfer all trace messages, including Verbose ones, to diagnostics storage.
DiagnosticMonitorConfiguration diagConfig = DiagnosticMonitor.GetDefaultInitialConfiguration();
diagConfig.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
It all appears to work remarkably well. However, occasionally we see a Verbose message in the trace log which simply has "Fail" as the message. The code it appears to be generated from is wrapped in a try/catch, so it is odd that we aren't seeing the error through our own handling.
It would appear that something is happening that is out of our code's control, perhaps the worker role is being restarted, or the cloud op system is detecting a major error that only it can deal with by restarting our worker role. It recovers and carries on so it is somewhat of a mystery to us what might be happening.
What we haven't ascertained yet is whether we are losing a message.
Any help would be greatly appreciated.
Cheers
Kindo Malay
Without the stack trace it's hard to say much, but with the logging set to verbose it's quite likely that you're seeing some internal logging from one of the DLLs you're using.
For example, if you run an Azure Table query that causes certain kinds of errors, the error will be logged three times, because the storage client library catches the error, traces it, and then retries.
If the error is not being caught by your try catch block, then it's likely nothing you need to worry about.
If deliverability of queue messages is important to you, you should ensure that you make use of the visibility timeout overload of CloudQueue.GetMessage and only delete the message when you've finished processing it. You may end up processing some messages twice, but at least you will process all of them.
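The delete-after-processing pattern can be sketched with an in-memory model. This is a minimal Python illustration, not the storage client library: ToyQueue, process_one and the message body are hypothetical, and the explicit now argument exists only so that the timing can be shown without sleeping.

```python
import time


class ToyQueue:
    """In-memory stand-in for a cloud queue with visibility timeouts (not an SDK class)."""

    def __init__(self):
        self._messages = {}   # id -> [body, time at which it becomes visible]
        self._next_id = 0

    def add_message(self, body):
        self._messages[self._next_id] = [body, 0.0]
        self._next_id += 1

    def get_message(self, visibility_timeout, now=None):
        """Return (id, body) of a visible message and hide it for the timeout."""
        now = time.monotonic() if now is None else now
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + visibility_timeout
                return msg_id, entry[0]
        return None

    def delete_message(self, msg_id):
        self._messages.pop(msg_id, None)


def process_one(queue, handler, visibility_timeout=30.0, now=None):
    """Delete the message only after the handler returns without raising."""
    received = queue.get_message(visibility_timeout, now=now)
    if received is None:
        return False
    msg_id, body = received
    handler(body)                  # if this raises, the message is NOT deleted
    queue.delete_message(msg_id)   # reached only on success
    return True


def failing_handler(body):
    raise RuntimeError("worker crashed mid-processing")


queue = ToyQueue()
queue.add_message("scan-1")
try:
    process_one(queue, failing_handler, visibility_timeout=30.0, now=0.0)
except RuntimeError:
    pass

# Hidden at t=10, visible again at t=31, so a retry picks the message up.
print(queue.get_message(30.0, now=10.0))
print(process_one(queue, print, visibility_timeout=30.0, now=31.0))
```

Because the message reappears after the timeout instead of being lost, a crash or restart costs a duplicate at worst, which is why the handler should tolerate processing the same message twice.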
If your role instance is getting restarted after running for a while, it's often because your process exited due to an unhandled exception.
