How to ingest blobs created by Azure Diagnostics into Azure Data Explorer by subscribing to Event Grid notifications

I want to send Azure Diagnostics to Kusto tables.
The idea is to get logs and metrics from various Azure resources by sending them to a storage account.
I'm following both "Ingest blobs into Azure Data Explorer by subscribing to Event Grid notifications" and "Tutorial: Ingest and query monitoring data in Azure Data Explorer",
trying to get the best of both worlds: cheap intermediate storage for logs, with Event Hub used only for notifications about new blobs.
The problem is that only part of the data is being ingested.
I think the problem lies in the append blobs that Azure Monitor diagnostics create. When Kusto receives the "Created" notification, only part of the blob has been written, and the events appended to the blob afterwards are never ingested.
My question is: how can I make this scenario work? Is it possible at all, or should I stick with sending logs to Event Hub without using blobs and Event Grid?
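The suspected race can be illustrated with a small in-memory simulation (plain Python, no Azure SDK). `AppendBlob` and `ingest_on_notification` are hypothetical names, and the blob name is only a label; the sketch models the idea that a one-shot reader triggered by the first creation notification sees only what has been appended at that moment. It is not a description of how Azure Data Explorer actually processes the events.

```python
class AppendBlob:
    """Hypothetical in-memory stand-in for an append blob: data can only be added at the end."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def append(self, record):
        self.records.append(record)

    def read(self):
        # A reader sees only what has been appended so far.
        return list(self.records)


def ingest_on_notification(blob):
    # Hypothetical one-shot ingester: takes a snapshot when the notification is handled.
    return blob.read()


blob = AppendBlob("PT1H.json")
blob.append('{"time": "10:00", "category": "AuditEvent"}')
ingested = ingest_on_notification(blob)  # triggered by the first "Created" notification
blob.append('{"time": "10:05", "category": "AuditEvent"}')
blob.append('{"time": "10:10", "category": "AuditEvent"}')

print(f"ingested {len(ingested)} of {len(blob.read())} records")  # ingested 1 of 3 records
```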

Append blobs do not work nicely with Event Grid-based ADX ingestion, because they generate multiple BlobCreated events.
If you can rename the blob once it has been completely written, that would solve the problem.
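One way to read that suggestion: let the data be appended under a staging prefix and, once the file is complete, move it to a "ready" prefix that the Event Grid subscription is filtered on, so only one notification per finished blob reaches ADX. Plain Blob Storage has no in-place rename outside hierarchical-namespace accounts, so the sketch uses copy-then-delete. Everything here is an in-memory simulation under assumptions: the `staging/` and `ready/` prefixes, the container name and `FakeContainer` are hypothetical, and whatever job performs the move is outside the sketch. The subject format follows Blob Storage events, and `delivered` mimics a subscription filter on subject prefix.

```python
CONTAINER = "insights-logs"  # hypothetical container name
STAGING = "staging/"         # hypothetical prefix for blobs still being written
READY = "ready/"             # hypothetical prefix the subscription is filtered on


def subject_for(container, path):
    # Subject format used by Blob Storage events in Event Grid.
    return f"/blobServices/default/containers/{container}/blobs/{path}"


class FakeContainer:
    """Hypothetical in-memory container that records BlobCreated-style event subjects."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}
        self.events = []

    def append(self, path, data):
        # In this model every append emits a creation-style notification.
        self.blobs[path] = self.blobs.get(path, "") + data
        self.events.append(subject_for(self.name, path))

    def finalize(self, staging_path):
        # "Rename" = copy the finished blob to the ready prefix, then delete the staging copy.
        ready_path = READY + staging_path[len(STAGING):]
        self.blobs[ready_path] = self.blobs.pop(staging_path)
        self.events.append(subject_for(self.name, ready_path))
        return ready_path


def delivered(events, subject_begins_with):
    # Mimics an Event Grid subscription that only forwards subjects with this prefix.
    return [s for s in events if s.startswith(subject_begins_with)]


c = FakeContainer(CONTAINER)
for line in ['{"a": 1}\n', '{"a": 2}\n', '{"a": 3}\n']:
    c.append(STAGING + "PT1H.json", line)
final_path = c.finalize(STAGING + "PT1H.json")

to_ingest = delivered(c.events, subject_for(CONTAINER, READY))
print(to_ingest)  # one notification, for the complete blob
```

The three staging notifications are still emitted, but they never match the subscription's prefix, so ingestion only sees the finished blob.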

Related

Can Azure Event Hub ingest JSON events from Azure Blob Storage without writing any code?

Is it possible to use some ready-made construct in the Azure cloud environment to ingest the events (in JSON format) that are currently stored in Azure Blob Storage and have it submit those events directly to Azure Event Hub without writing any (however small) custom code? In other words, I would like to use a configuration-driven approach only.
Sure. You can use Azure Logic Apps to meet your needs without any code, or with just a few function expressions; please refer to the official Azure Logic Apps documentation for more details.
The logic flow is shown in the figure below.
You can refer to my sample below to make it work.
Here is my sample, which receives an event from my Event Hub and transfers it to Azure Blob Storage by creating a new blob that stores the event data. (Note that it goes in the opposite direction to the question: from Event Hub to Blob Storage.)
Create an Azure Logic App instance in the Azure portal.
Open the Logic app designer tab to configure the logic flow.
Click Save and then Run. Then use ServiceBusExplorer (downloaded from https://github.com/paolosalvatori/ServiceBusExplorer/releases) to send an event message, and use AzureStorageExplorer to check whether a new blob was created. It works fine after a few minutes.

How to trigger a pipeline in Azure Data Factory v2 or an Azure Databricks notebook by a new file in Azure Data Lake Store Gen1

I am using Azure Data Lake Store Gen1 to store JSON files. Based on these files, I have notebooks in Azure Databricks that process them. Now I want to trigger such an Azure Databricks notebook when a new file is created in Azure Data Lake Store Gen1. I couldn't find any trigger that could do this. Do you know of any way?
This is not yet implemented/supported by Microsoft, but I believe it is on their roadmap.
There are two ways you could do this:
Azure Functions (through Event Grid)
Logic Apps
Option #1
Microsoft is currently working on #1.
You can track the issue here.
As per this:
This feature is not a high priority for us right now, but I will note that the announcement for Azure Event Grid listed Data Lake as one of the integrations they are building. Once you can subscribe to Data Lake updates through Event Grid, running an Azure Function would be trivial (see here for some info).
You can vote to support an Event Grid provider for Data Lake.
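If Data Lake events do become available through Event Grid, the Azure Function in option #1 mostly has to pick out the relevant events and start the notebook. The sketch below is plain Python under loud assumptions: it assumes events arrive as a JSON array in the Event Grid event schema (`eventType`, `subject`, `data.url`), it borrows the Blob Storage event type `Microsoft.Storage.BlobCreated` and Blob-style subjects because no Data Lake Gen1 event type exists yet, and `start_notebook_run` is a hypothetical callback standing in for however the Databricks job is actually triggered. The sample events and URLs are made up.

```python
import json

BLOB_CREATED = "Microsoft.Storage.BlobCreated"  # Blob Storage event type; assumed for Data Lake


def new_json_files(events):
    """Return the URLs of newly created .json files from a batch of Event Grid events."""
    urls = []
    for event in events:
        if event.get("eventType") != BLOB_CREATED:
            continue  # ignore deletes and other event types
        if not event.get("subject", "").endswith(".json"):
            continue  # only the JSON files the notebooks process
        urls.append(event["data"]["url"])
    return urls


def handle(batch_json, start_notebook_run):
    # start_notebook_run is hypothetical: e.g. a wrapper around a Databricks job trigger.
    for url in new_json_files(json.loads(batch_json)):
        start_notebook_run({"input_path": url})


batch = json.dumps([
    {"eventType": BLOB_CREATED,
     "subject": "/blobServices/default/containers/raw/blobs/2019/01/data.json",
     "data": {"url": "https://example.blob.core.windows.net/raw/2019/01/data.json"}},
    {"eventType": "Microsoft.Storage.BlobDeleted",
     "subject": "/blobServices/default/containers/raw/blobs/old.json",
     "data": {"url": "https://example.blob.core.windows.net/raw/old.json"}},
])
runs = []
handle(batch, runs.append)
print(runs)
```

Keeping the filtering in a pure function (`new_json_files`) makes it testable without Azure; only the callback would need real credentials.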
Option #2
This is also not yet implemented, but you can upvote the request here to support this feature.

Azure activity logs not displaying any write data

I'm trying to set up logging for a storage resource (a table specifically, though it seems the activity log doesn't go down to that level and just logs the entire storage account).
The logging seems to record my ListKeys operations and occasional access from ApplicationInsights, but it isn't logging any of the writes/reads I'm making to the tables themselves, through either my app or Microsoft Azure Storage Explorer. This table has been written to multiple times over the past few weeks, yet none of that activity shows up.
Am I misinterpreting this page, which states that this activity log should track POSTs/DELETEs? Do I need any additional setup to track these operations?
Per my understanding, you could leverage Storage Analytics logging to log the operations on your storage. For the detailed operations that are logged for the corresponding storage service, you could refer to this official document.
Based on your description, I tested operations against table storage using the REST API and the Storage Explorer tool. Here are my test results for your reference:
Table Storage Analytics logging
Table Storage Metrics
As noted in this document:
As requests are logged, Storage Analytics will upload intermediate results as blocks. Periodically, Storage Analytics will commit these blocks and make them available as a blob.
In summary, please follow this tutorial to enable and configure Storage Analytics, then wait a while and check your table storage logs.
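Once logging is on, the entries land as semicolon-delimited lines in the `$logs` container, with quoted fields that may themselves contain semicolons. The sketch below parses the leading fields of a version 1.0 entry and keeps only table write operations. The field order follows the documented 1.0 log format as I understand it, the set of write operation names is a partial, assumed list, and the sample lines are made up for illustration.

```python
import csv
import io

# Leading fields of a Storage Analytics 1.0 log entry (the remaining fields are ignored here).
FIELDS = [
    "version", "request_start_time", "operation_type", "request_status",
    "http_status_code", "end_to_end_latency_ms", "server_latency_ms",
    "authentication_type", "requester_account", "owner_account",
    "service_type", "request_url",
]

# Assumed, partial list of table write operations.
TABLE_WRITES = {
    "InsertEntity", "UpdateEntity", "MergeEntity", "DeleteEntity",
    "InsertOrReplaceEntity", "InsertOrMergeEntity",
}


def parse_entries(text):
    # csv handles quoted fields that contain semicolons; zip drops fields beyond FIELDS.
    for row in csv.reader(io.StringIO(text), delimiter=";", quotechar='"'):
        if row:
            yield dict(zip(FIELDS, row))


def table_writes(text):
    return [e for e in parse_entries(text)
            if e.get("service_type") == "table" and e.get("operation_type") in TABLE_WRITES]


sample = (
    '1.0;2018-05-01T10:00:00.0000000Z;InsertEntity;Success;201;25;20;authenticated;'
    'myaccount;myaccount;table;"https://myaccount.table.core.windows.net/mytable";"/myaccount/mytable"\n'
    '1.0;2018-05-01T10:01:00.0000000Z;QueryEntities;Success;200;12;8;authenticated;'
    'myaccount;myaccount;table;"https://myaccount.table.core.windows.net/mytable()";"/myaccount/mytable"\n'
)
for entry in table_writes(sample):
    print(entry["request_start_time"], entry["operation_type"], entry["http_status_code"])
```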
If you are using the Azure Activity Log, remember that it is meant for control-plane operations, which is why ListKeys shows up there.
If you are looking for data-plane operations (such as entity writes to a table), make sure diagnostics are turned on for the storage account you are writing to.
The Azure Activity Log only records management-plane operations made through Azure Resource Manager (ARM), specifically PUT/DELETE/POST requests. That includes ListKeys, which is an HTTP POST.
For storage analytics logging, you can use this article to see the types of data logged.
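A rough rule of thumb behind the answers above: requests sent to Azure Resource Manager (`management.azure.com`), such as the ListKeys POST, are control-plane and appear in the Activity Log, while requests sent to the storage service endpoints (for example `<account>.table.core.windows.net`) are data-plane and only appear in storage logging/diagnostics. The helper below is a hypothetical classifier that encodes that rule for public-cloud hostnames only; sovereign clouds and custom domains are not handled, and the URLs are illustrative.

```python
from urllib.parse import urlparse

ARM_HOST = "management.azure.com"
# Public-cloud storage service endpoint suffixes.
STORAGE_SUFFIXES = (
    ".blob.core.windows.net", ".table.core.windows.net", ".queue.core.windows.net",
    ".file.core.windows.net", ".dfs.core.windows.net",
)


def plane(url):
    """Classify a request URL as control plane, data plane, or unknown."""
    host = (urlparse(url).hostname or "").lower()
    if host == ARM_HOST:
        return "control plane (Activity Log)"
    if host.endswith(STORAGE_SUFFIXES):
        return "data plane (Storage Analytics / diagnostic logs)"
    return "unknown"


list_keys = ("https://management.azure.com/subscriptions/<sub>/resourceGroups/<rg>/providers/"
             "Microsoft.Storage/storageAccounts/myaccount/listKeys?api-version=2019-06-01")
insert_entity = "https://myaccount.table.core.windows.net/mytable"
print(plane(list_keys))      # control plane (Activity Log)
print(plane(insert_entity))  # data plane (Storage Analytics / diagnostic logs)
```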

Stream Analytics: Dynamic output path based on message payload

I am working on an IoT analytics solution which consumes Avro formatted messages fired at an Azure IoT Hub and (hopefully) uses Stream Analytics to store messages in Data Lake and blob storage. A key requirement is the Avro containers must appear exactly the same in storage as they did when presented to the IoT Hub, for the benefit of downstream consumers.
I am running into a limitation in Stream Analytics with granular control over individual file creation. When setting up a new output stream path, I can only provide date/day and hour in the path prefix, resulting in one file for every hour instead of one file for every message received. The customer requires separate blob containers for each device and separate blobs for each event. Similarly, the Data Lake requirement dictates at least a sane naming convention that is delineated by device, with separate files for each event ingested.
Has anyone successfully configured Stream Analytics to create a new file every time it pops a message off of the input? Is this a hard product limitation?
Stream Analytics is indeed oriented toward efficient processing of large streams.
For your use case, you need an additional component to implement your custom logic.
Stream Analytics can output to Blob Storage, Event Hub, Table Storage or Service Bus. Another option is to use the new IoT Hub routes to route messages directly to an Event Hub or to a Service Bus queue or topic.
From there you can write an Azure Function (or, from Blob or Table Storage, a custom Data Factory activity) and use the Data Lake Store SDK to write files with the logic that you need.
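A minimal sketch of that custom logic: each event body is written byte-for-byte to its own file under a per-device folder, so the Avro container is stored exactly as it was received. It writes to a local directory as a stand-in for Data Lake / Blob Storage, since the SDK calls are outside its scope. The layout `<device>/<yyyy>/<mm>/<dd>/<sequence>.avro` and all function names are hypothetical; `container_name` applies the Blob container naming rules (3 to 63 lowercase letters, digits and single hyphens) so the same name could serve as a per-device container.

```python
import re
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def container_name(device_id):
    """Map a device id to a valid container name: lowercase letters, digits, single hyphens."""
    name = re.sub(r"[^a-z0-9]+", "-", device_id.lower()).strip("-")
    name = name[:63].rstrip("-")
    if len(name) < 3:
        name = f"{name}-dev".strip("-")  # pad short ids (hypothetical convention)
    return name


def event_path(root, device_id, enqueued_utc, sequence_number):
    # Hypothetical layout: one folder per device, one file per event.
    return (Path(root) / container_name(device_id)
            / f"{enqueued_utc:%Y/%m/%d}" / f"{sequence_number:020d}.avro")


def write_event(root, device_id, enqueued_utc, sequence_number, body: bytes):
    path = event_path(root, device_id, enqueued_utc, sequence_number)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "xb") as f:  # exclusive create: never overwrite an existing event file
        f.write(body)            # bytes written unchanged, so the Avro container is preserved
    return path


with tempfile.TemporaryDirectory() as root:
    t = datetime(2019, 3, 1, 12, 0, tzinfo=timezone.utc)
    body = b"Obj\x01placeholder"  # placeholder bytes, not a real Avro container
    p1 = write_event(root, "Sensor_42", t, 1, body)
    p2 = write_event(root, "Sensor_42", t, 2, body)
    print(p1.relative_to(root).as_posix())
    print(p2.read_bytes() == body)  # True
```

Using the event's sequence number in the file name keeps one file per event and makes a retried write fail loudly instead of silently overwriting.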

Where are Azure Event Hub messages stored?

I generated a SAS signature using this RedDog tool and successfully sent a message to Event Hub using the Event Hubs API reference. I know it was successful because I got a 201 Created response from the endpoint.
This tiny success raised a question that I have not been able to find an answer to:
I went to the Azure portal and could not see the messages I created anywhere. Further reading revealed that I needed to create a storage account; I stumbled on some C# examples (EventProcessorHost) which require the storage account credentials etc.
My question is: are there any APIs I can use to persist the data? I do not want to use the C# tool.
Please correct me if my approach is wrong, but my aim is to be able to post telemetry to Event Hub, persist the data and perform some analytics operations on it. The telemetry data should be viewable on Azure.
You don't have direct access to the transient storage used for EventHub messages, but you could write a consumer that reads from the EventHub continuously and persist the messages to Azure Table or to Azure Blob.
The closest thing you will find to a way to automatically persist messages (as with Amazon Kinesis Firehose vs. Amazon Kinesis, to which Event Hubs are basically equivalent) would be to use Azure Stream Analytics configured to write its output to either Azure Blob or Azure Table. This example shows how to set up a Stream Analytics job that passes the data through and stores it in SQL, but the UI shows where you can pick another output, such as Azure Table. You can also get an idea of the options from the output API.
Of course, you should be aware of the serialization requirements that led to this question.
Event Hubs stores data for a maximum of 7 days, and that is only in the Standard pricing tier. If you want to persist the data for longer in a storage account, you can use the Event Hubs Capture feature. You don't have to write a single line of code to achieve this; you can configure it through the portal or an ARM template. This is described in this document - https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-overview
Event Hubs stores its transient data in Azure Storage, but the documentation doesn't give any more detail about that storage. This is evident from this documentation - https://learn.microsoft.com/en-us/azure/event-hubs/configure-customer-managed-key
The storage account you need for EventProcessorHost is only used for checkpointing or maintaining the offset of the last read event in a partition.
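The consumer pattern from these answers (read continuously, persist each event, and record per-partition positions in a checkpoint store, which is all the EventProcessorHost storage account is for) can be sketched without any Azure SDK. Partitions, the sink and the checkpoint store are plain in-memory stand-ins, every name is hypothetical, and positions are list indices, a simplification of real Event Hubs offsets and sequence numbers. In a real deployment the sink would be Table or Blob storage and the checkpoints would live in the storage account.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Hypothetical stand-in for the storage account EventProcessorHost uses for checkpoints."""
    offsets: dict = field(default_factory=dict)

    def get(self, partition_id):
        return self.offsets.get(partition_id, -1)  # -1: nothing processed yet

    def save(self, partition_id, offset):
        self.offsets[partition_id] = offset


def consume(partitions, sink, checkpoints, batch_size=2):
    """Persist every event after the last checkpoint, checkpointing after each batch."""
    for pid, events in partitions.items():
        start = checkpoints.get(pid) + 1
        for offset in range(start, len(events)):
            sink.append((pid, offset, events[offset]))
            # Checkpoint every batch_size events and at the end of the partition.
            if (offset + 1) % batch_size == 0 or offset == len(events) - 1:
                checkpoints.save(pid, offset)


partitions = {"0": ["t=20.1", "t=20.4", "t=20.9"], "1": ["t=18.0"]}
sink, store = [], CheckpointStore()
consume(partitions, sink, store)

partitions["0"].append("t=21.3")  # new telemetry arrives
consume(partitions, sink, store)  # resumes from the saved positions, no re-reads
print(len(sink), store.offsets)   # 5 {'0': 3, '1': 0}
```

Checkpointing after the write gives at-least-once behaviour: if the process crashes between persisting an event and saving the checkpoint, that event is read again on restart, so the sink should tolerate duplicates.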
