Diagnosing errors in Stream Analytics jobs - Azure

I've got a series of services that generate events which are written to an Azure Event Hub. The hub is connected to a Stream Analytics job that takes the event information and writes it to Azure Table Storage and a Data Lake Store for later analysis by different teams and tools.
One of my services is reporting all events correctly, but the other isn't. After hooking up a listener to the hub I can see its events are being sent without a problem, but they aren't being processed or written to the job's sinks.
In the audit logs I see periodic transformation errors for one of the columns that's written to the storage, but looking at the data there is no problem with the format, and I can't find a way to inspect the offending events that are causing these failures.
The only error I see in the Management Services is:
We are experiencing issues writing output for output TSEventStore right now. We will try again soon.

It sounds like there may be two issues:
1) The writing to the TableStorage TSEventStore is failing.
2) There are some data conversion errors.
I would suggest troubleshooting one at a time. For the first one, are any events being written to the TSEventStore at all? Is there another message in the operations logs that may give more detail on why writing is failing?
For the second one, today we don't have a way to output the events that have data conversion errors. The best workaround is to output the data to only one sink (the Data Lake) and look at it there.
Thanks,
Kati

Related

ASP.NET WebApp in Azure using lots of CPU

We have a long-running ASP.NET WebApp in Azure which has no real endpoints exposed – it serves a single functional purpose, primarily reading and manipulating database data; effectively a batched, scheduled task triggered by a timer every 30 seconds.
The app runs fine most of the time, but we are seeing occasional issues where the CPU load for the app goes close to the maximum for the App Service Plan, instantaneously rather than gradually, and the app stops executing any further timer triggers. We cannot find anything in the executing code to account for it (no signs of deadlocks etc., and all code paths have try/catch, so there should be no unhandled exceptions). More often than not we see errors getting a connection to the database, but it's not clear whether those are a cause or a symptom.
Note that this is the only resource within the App Service Plan. The Azure SQL database is in the same region; while it is utilised by other apps, they use it very lightly and exhibit none of the issues seen by the problem app.
It feels like this is infrastructure-related, but we have been unable to find anything to explain what is happening, so any suggestions for where we should be looking would be gratefully received. We have enabled basic Application Insights (not the SDK), but other than seeing the CPU load spike prior to the loss of app responsiveness there is little information of interest, given our limited knowledge of how best to utilise Insights.
Based on your description, there are two things I would check to troubleshoot the problem. First, track the running status of your program in code: write a log entry at the beginning and at the end of each batch run to record its status. If possible, also record request/response information along with the start and end times, so you have a complete record of when each run happened and how it went.
Second, write a log entry just before the program starts its database operations and record whether the database connection succeeds. Ideally you would also record which business operation is running when the CPU load spikes, so you can analyse exactly what is causing the database connection failures.
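For illustration, here is a minimal sketch of that kind of instrumentation (RunBatchAsync and the connection string are placeholders for your own batch logic; this uses System.Diagnostics.Trace, but the Application Insights TelemetryClient would work the same way):

using System;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Threading.Tasks;

public class BatchJob
{
    private readonly string _connectionString;

    public BatchJob(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Called by the 30-second timer.
    public async Task ExecuteAsync()
    {
        var runId = Guid.NewGuid();
        var stopwatch = Stopwatch.StartNew();
        Trace.TraceInformation($"Batch run {runId} starting at {DateTimeOffset.UtcNow:o}");
        try
        {
            // Log the connection attempt separately so connection failures
            // can be told apart from failures in the batch logic itself.
            using (var connection = new SqlConnection(_connectionString))
            {
                await connection.OpenAsync();
                Trace.TraceInformation($"Batch run {runId}: database connection opened");

                await RunBatchAsync(connection); // your existing batch logic
            }
            Trace.TraceInformation($"Batch run {runId} completed in {stopwatch.Elapsed}");
        }
        catch (Exception ex)
        {
            Trace.TraceError($"Batch run {runId} failed after {stopwatch.Elapsed}: {ex}");
            throw;
        }
    }

    // Placeholder for the real work.
    private Task RunBatchAsync(SqlConnection connection) => Task.CompletedTask;
}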
Because you cannot reproduce the problem, you can only narrow down the cause. If you still can't find where the problem is through the two points above, try adjusting your timer, for example triggering once every 5 minutes instead of every 30 seconds, and see whether the behaviour changes.

AddMessage in batch process for a queue

I want to add a lot of records to an Azure queue.
I don't want to do it one by one; I would like to send them as a batch so that I know whether something went wrong, because in that case I need to run a rollback process.
For example, I want to send a batch of 50 and receive a success if the queue gets all 50 records, or an error with that information if something went wrong.
I know I can insert records into a table in a batch with this command:
cloudTable.ExecuteBatchAsync(tableBatchOperation);
I also saw a post on the internet about a batch-like process for queues, but I think that post is about performance rather than about whether the batch succeeded or not.
Any ideas? Any magic library?
AFAIK, it's not possible to send messages in a batch to a Storage queue.
Azure Service Bus, on the other hand, does support this. You might want to look into it if batching is important for you; a rough sketch follows.
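For example, with the older Microsoft.ServiceBus.Messaging client you could send up to 50 records in one call (the queue name and connection string are placeholders, and the whole batch has to fit under the maximum message size):

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

public static class BatchSender
{
    public static async Task SendBatchOf50Async(IEnumerable<string> records, string connectionString)
    {
        var client = QueueClient.CreateFromConnectionString(connectionString, "MyQueue");
        try
        {
            var batch = records.Take(50)
                               .Select(r => new BrokeredMessage(r))
                               .ToList();

            // One call for the whole batch: if it throws, treat the batch as
            // failed and trigger your rollback process.
            await client.SendBatchAsync(batch);
        }
        finally
        {
            await client.CloseAsync();
        }
    }
}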

Move data between EventHubs in different regions

I have webapps spread out in a number of different regions. Each app puts data into a region-local event hub. I then want to collect all the data in a central event hub so I can do all the processing in one place. What is the best way to move data from one event hub to another? Each region has on the order of 1,000 messages per second that it needs to put into the hubs.
Ideas I have tried:
1) Let the webapp write directly to the central event hub. The downside is that the connection between regions can be bad; every day I would get a lot of timeouts between Southeast Asia and North Europe.
2) Use a Stream Analytics job to move data from one hub to the other. This seems to work ok, except that it is not 100% reliable under high load. My job stopped for no reason and had to be manually restarted (after 15 minutes of downtime) to work again.
While my first answer would have been to try your #2 above, it didn't work for you (for whatever reason; I haven't tried Stream Analytics myself), so you pretty much know what you have to do: copy the data from one event hub to the other yourself.
That is, write an Event Hub consumer that copies messages from one Event Hub to another, potentially wrapping them in an envelope if you need to bring some of the metadata along (enqueued time, for example). If your destination event hub goes down, just keep retrying and don't commit progress until you succeed in sending the message over (since, unless you parse the bodies, you shouldn't have poison messages). No matter which solution you use you're going to get duplicate messages arriving in the central event hub, so plan for that by including unique IDs inside the payload or otherwise designing for idempotency.
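A rough sketch of such a copier, using the older Microsoft.ServiceBus.Messaging SDK (hub names and connection strings are placeholders; adapt the calls to whichever Event Hubs client library you actually use). The key point is that it only checkpoints after the forward succeeds:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

public class CopyingEventProcessor : IEventProcessor
{
    // Sender for the central hub (placeholder connection string and name).
    private static readonly EventHubClient Destination =
        EventHubClient.CreateFromConnectionString("<central-namespace-connection-string>", "central-hub");

    public Task OpenAsync(PartitionContext context) => Task.CompletedTask;

    public Task CloseAsync(PartitionContext context, CloseReason reason) => Task.CompletedTask;

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        // Re-wrap the bodies, carrying the metadata we care about in properties.
        var copies = messages.Select(m =>
        {
            var copy = new EventData(m.GetBytes());
            copy.Properties["SourceEnqueuedTimeUtc"] = m.EnqueuedTimeUtc;
            return copy;
        }).ToList();

        if (copies.Count == 0)
        {
            return;
        }

        // Keep retrying until the send succeeds and only checkpoint afterwards,
        // so nothing is lost if the central hub is unreachable. Retries mean
        // duplicates are possible, hence the unique IDs in the payload.
        while (true)
        {
            try
            {
                await Destination.SendBatchAsync(copies);
                break;
            }
            catch (MessagingException)
            {
                await Task.Delay(TimeSpan.FromSeconds(5));
            }
        }

        await context.CheckpointAsync();
    }
}

You would register this with an EventProcessorHost against each regional hub; the host takes care of the partition leases and stores the checkpoints in blob storage.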
Obviously, ensure that you have enough partitions on the central Event Hub to handle the load from all the others, and you'll certainly want multiple partitions on the local hubs as well, since 1,000 events/second is the per-partition write limit.
You still have a choice of whether to run the copier locally or centrally; my inclination is locally, but you can test it both ways with the same code (though your commit/offset tracker should probably live in the same place the copier runs).
So yes, stuff can go down; just make sure to start it up again, preferably automatically, when it does (and add monitoring for how far behind your copying processes are). It would be great if Stream Analytics did this reliably enough, but alas.
You also have choices as to how partitions are assigned to copier workers. Fixed assignment is not a bad choice if the workers are guaranteed to start up again quickly (i.e. they run on something managed that keeps them alive). Automatic assignment of partitions seems somewhat likely to lead to partitions being forgotten for brief periods before rebalancing, but pick your poison.

Deleting dead topics in Azure Service Bus

I've tried to do my homework on this issue but no searches I can make have gotten me closer to the answer. Closest hit was Detect and Delete Orphaned Queues, Topics, or Subscriptions on Azure Service Bus.
My scenario:
I have multiple services running (standard Windows services). At startup these processes subscribe to a given topic in Azure Service Bus. Let's call the topic "Messages".
When a service is shut down it unsubscribes cleanly.
But sometimes stuff happens and the service crashes, causing the unsubscription to fail, and the subscription is then left hanging.
My questions:
1) From what I'm seeing, each dead subscription still receives a copy of every message sent to the topic, even if no one is ever going to pick it up. Fact or fiction?
2) Is there any way to remove subscriptions that haven't been checked for a while, for example in the last 24 hours? Preferably via a PowerShell script?
I've raised this issue directly with Microsoft but haven't received any answer yet. Surely, I can't be the first to experience this. I'll also update this if I get any third party info.
Thanks
Johan
In the Azure SDK 2.0 release we have addressed this scenario with the AutoDeleteOnIdle feature. It lets you set a timespan on a Queue/Topic/Subscription, and when no activity is detected for the specified duration, the entity is automatically deleted. See details here, and the property to set is here.
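For example, when creating the subscription (the topic and subscription names here are placeholders):

using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

public static class SubscriptionSetup
{
    public static void CreateSelfCleaningSubscription(string connectionString)
    {
        var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);

        // The subscription is deleted automatically after 24 hours of inactivity.
        var description = new SubscriptionDescription("Messages", "MyServiceSubscription")
        {
            AutoDeleteOnIdle = TimeSpan.FromHours(24)
        };

        namespaceManager.CreateSubscription(description);
    }
}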
On your question 1), yes: messages sent to a topic will be delivered to any matching subscription, even if it is idle (by whatever definition of idle you use). A subscription is a permanent artifact that you create and that stays open to receive messages, even when no services are dequeuing them.
To clean out subscriptions, you can probably use the AccessedAt property of the SubscriptionDescription to check when someone last read from the subscription (via a Receive operation).
http://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.subscriptiondescription.accessedat.aspx
If you use that logic, you can build your own 'cleansing' mechanism.
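A rough sketch of such a cleanup job (assuming the topic is called "Messages" and a 24-hour idle cutoff; the question asked for PowerShell, but the same SDK calls can be made from a PowerShell script that loads the assembly):

using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

public static class SubscriptionJanitor
{
    public static void DeleteIdleSubscriptions(string connectionString)
    {
        var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
        var cutoff = DateTime.UtcNow.AddHours(-24);

        foreach (var subscription in namespaceManager.GetSubscriptions("Messages"))
        {
            // AccessedAt reflects the last time the subscription was read from.
            if (subscription.AccessedAt < cutoff)
            {
                namespaceManager.DeleteSubscription("Messages", subscription.Name);
            }
        }
    }
}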
HTH

Azure ServiceBus Queues -- taking a long time to get local messages?

I'm working through a basic tutorial on the ServiceBus. A web role adds objects to a ServiceBus Queue, while a worker role reads those messages off the queue and marks them complete. This is all within the local environment (compute emulator).
It seems like it should be incredibly simple, but I'm seeing the following behavior:
The call QueueClient.Receive() is always timing out.
Some messages are just hanging out in the queue and are not being picked up by the worker.
What could be going on? How can I debug the state of these messages?
You can check the length of the queue (from the portal, or by looking at the MessageCount property).
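For example (queue name and connection string are placeholders):

using System;
using Microsoft.ServiceBus;

public static class QueueDiagnostics
{
    public static void PrintQueueLength(string connectionString)
    {
        var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
        var queue = namespaceManager.GetQueue("MyQueue");
        Console.WriteLine($"Messages in queue: {queue.MessageCount}");
    }
}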
Another possibility is that the messages have been dead-lettered. You can read from the dead-letter subqueue to check.
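Something along these lines will walk the dead-letter subqueue without removing anything (the queue name is a placeholder; DeadLetterReason is the property the service sets when it dead-letters a message):

using System;
using Microsoft.ServiceBus.Messaging;

public static class DeadLetterReader
{
    public static void DumpDeadLetters(string connectionString)
    {
        var deadLetterPath = QueueClient.FormatDeadLetterPath("MyQueue");
        var client = QueueClient.CreateFromConnectionString(connectionString, deadLetterPath);

        // Peek is non-destructive and advances through the subqueue.
        BrokeredMessage message;
        while ((message = client.Peek()) != null)
        {
            object reason;
            message.Properties.TryGetValue("DeadLetterReason", out reason);
            Console.WriteLine($"{message.MessageId}: {reason}");
        }

        client.Close();
    }
}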
First of all, please make sure you do have some messages in the queue. I would suggest running the end solution of this tutorial: http://msdn.microsoft.com/en-us/WAZPlatformTrainingCourse_ServiceBusMessaging. If that works fine, please compare your code with the sample code. If that doesn't work either, it is likely to be a configuration or network issue; in that case I would recommend checking whether you have properly configured the Service Bus account and whether you're able to access the internet from your machine.
Best Regards,
Ming Xu.
