Identify modified blobs from storage account change feed - Azure

I'm currently consuming the change feed on an Azure storage account and would like to distinguish between blobs that are created (uploaded) and those that are merely modified.
In the example below I upload a blob (agent-diag.txt) and then edit the file (add some text).
In both cases the event raised is 'BlobCreated'; there seems to be no concept of 'BlobUpdated'.
From the MS docs: The following event types are captured in the change feed records:
BlobCreated
BlobDeleted
BlobPropertiesUpdated
BlobSnapshotCreated
BlobPropertiesUpdated is recorded if the metadata, tags, etc. are changed. But if the file content is modified, I can't see any way to identify this. Any ideas?
Operation Name: PutBlob
Api: Azure.Storage.Blobs.ChangeFeed.BlobChangeFeedEventData
Subject: /blobServices/default/containers/myblobs/blobs/agent-diag.txt
Event Type: BlobCreated
Event Time: 17/11/2021 23:25:42 +00:00
Operation Name: PutBlob
Api: Azure.Storage.Blobs.ChangeFeed.BlobChangeFeedEventData
Subject: /blobServices/default/containers/myblobs/blobs/agent-diag.txt
Event Type: BlobCreated
Event Time: 17/11/2021 23:26:07 +00:00
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.ChangeFeed;

namespace Changefeed
{
    class Program
    {
        const string conString = "DefaultEndpointsProtocol=BlahBlah";

        public static async Task<List<BlobChangeFeedEvent>> ChangeFeedAsync(string connectionString)
        {
            // Get a new blob service client.
            BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);

            // Get a new change feed client.
            BlobChangeFeedClient changeFeedClient = blobServiceClient.GetChangeFeedClient();
            List<BlobChangeFeedEvent> changeFeedEvents = new List<BlobChangeFeedEvent>();

            // Get all the events in the change feed.
            await foreach (BlobChangeFeedEvent changeFeedEvent in changeFeedClient.GetChangesAsync())
            {
                changeFeedEvents.Add(changeFeedEvent);
            }
            return changeFeedEvents;
        }

        public static void showEventData(List<BlobChangeFeedEvent> changeFeedEvents)
        {
            foreach (BlobChangeFeedEvent changeFeedEvent in changeFeedEvents)
            {
                string subject = changeFeedEvent.Subject;
                string eventType = changeFeedEvent.EventType.ToString();
                string eventTime = changeFeedEvent.EventTime.ToString();
                string api = changeFeedEvent.EventData.ToString();
                string operation = changeFeedEvent.EventData.BlobOperationName.ToString();
                Console.WriteLine("Subject: " + subject + "\n" +
                                  "Event Type: " + eventType + "\n" +
                                  "Event Time: " + eventTime + "\n" +
                                  "Operation Name: " + operation + "\n" +
                                  "Api: " + api);
            }
        }

        public static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
            List<BlobChangeFeedEvent> feedlist = ChangeFeedAsync(conString).GetAwaiter().GetResult();
            Console.WriteLine("Feedlist: " + feedlist.Count);
            showEventData(feedlist);
        }
    }
}

Each blob has two system-defined properties, creation time and last modified, which tell you when the blob was created and when it was last modified respectively.
When a blob is created, both of these properties have the same value. However, when the same blob is overwritten (i.e. its content is updated), only the last-modified value changes.
What you could do is use these properties to identify whether a new blob was created or the content of an existing blob was updated.
You would still work with the BlobCreated event. The one additional step is to fetch the properties of the blob and compare these two values to make the distinction.
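For illustration, a minimal sketch of that extra step, assuming the Azure.Storage.Blobs v12 SDK and a BlobServiceClient like the one in the question; the container and blob names are taken from the event subject above:

// Hedged sketch: compare the blob's creation time with its last-modified
// time to tell a fresh upload apart from an overwrite.
BlobClient blobClient = blobServiceClient
    .GetBlobContainerClient("myblobs")        // from the event subject
    .GetBlobClient("agent-diag.txt");
BlobProperties props = await blobClient.GetPropertiesAsync();
bool isNewBlob = props.CreatedOn == props.LastModified;
Console.WriteLine(isNewBlob
    ? "BlobCreated: new blob uploaded"
    : "BlobCreated: existing blob overwritten");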

Related

Download Images stored in Azure Blob Storage as Images using C#

I have stored a bunch of images in Azure Blob Storage. Now I want to retrieve them & resize them.
I have successfully managed to read much information from the account, such as the filename, the date last modified, and the size, but how do I get the actual image? Examples I have seen show how to download it to a file, but that is of no use to me; I want to download it as an image so I can process it.
This is what I have so far:
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(containerName);
Console.WriteLine("Listing blobs...");

// build table to hold the info
DataTable table = new DataTable();
table.Columns.Add("ID", typeof(int));
table.Columns.Add("blobItemName", typeof(string));
table.Columns.Add("blobItemLastModified", typeof(DateTime));
table.Columns.Add("blobItemSizeKB", typeof(double));
table.Columns.Add("blobImage", typeof(Image));

// row counter for table
int intRowNo = 0;
// divider to convert Bytes to KB
double dblBytesToKB = 1024.00;

// List all blobs in the container
await foreach (BlobItem blobItem in containerClient.GetBlobsAsync())
{
    // increment row number
    intRowNo++;
    //Console.WriteLine("\t" + blobItem.Name);

    // length in bytes
    long? longContentLength = blobItem.Properties.ContentLength;
    double dblKb = 0;
    if (longContentLength.HasValue == true)
    {
        long longContentLengthValue = longContentLength.Value;
        // convert to double DataType
        double dblContentLength = Convert.ToDouble(longContentLengthValue);
        // Convert to KB
        dblKb = dblContentLength / dblBytesToKB;
    }

    // get the image
    // **** Image thisImage = what goes here ?? actual data from blobItem ****

    // Last modified date
    string date = blobItem.Properties.LastModified.ToString();
    try
    {
        DateTime dateTime = DateTime.Parse(date);
        //Console.WriteLine("The specified date is valid: " + dateTime);
        table.Rows.Add(intRowNo, blobItem.Name, dateTime, dblKb);
    }
    catch (FormatException)
    {
        Console.WriteLine("Unable to parse the specified date");
    }
}
You need to open a read stream for your image, and construct your .NET Image from this stream:
await foreach (BlobItem item in containerClient.GetBlobsAsync())
{
    var blobClient = containerClient.GetBlobClient(item.Name);
    using Stream stream = await blobClient.OpenReadAsync();
    Image myImage = Image.FromStream(stream);
    //...
}
The BlobClient class also exposes some other helpful methods, such as DownloadToAsync for downloading the blob's content to a stream.
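If you then want to resize the image, as the question mentions, here is a minimal sketch with System.Drawing (assuming the System.Drawing.Common package; the half-size dimensions and output name are just placeholders):

// Hypothetical resize of the Image obtained above to half its dimensions.
using (Bitmap resized = new Bitmap(myImage, new Size(myImage.Width / 2, myImage.Height / 2)))
{
    resized.Save("resized-" + item.Name);   // hypothetical output path
}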

Blob storage id cannot be inserted into Azure SQL database through Azure Function

I have created a Function App whose purpose is to insert blob storage text details into an Azure SQL database.
A "Title" column has been created; the function is called "BlobTriggerCSharp1" and is written in C#.
#r "System.Configuration"
#r "System.Data"
using System.Configuration;
using System.Data.SqlClient;
using System.Threading.Tasks;
using System;
public static void Run(Stream myBlob, string name, TraceWriter log)
{
log.Info($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {myBlob.Length} Bytes");
string detail = ($"{name}");
var str = ConfigurationManager.ConnectionStrings["sqldb_connection"].ConnectionString;
using (SqlConnection conn = new SqlConnection(str))
{
conn.Open();
var text = "INSERT INTO PhotoTable(ID,CreatedAt,UpdatedAt,IsDeleted, Url, Title)VALUES (284,SYSDATETIMEOFFSET(),SYSDATETIMEOFFSET(), 'true', 'yrhrh', {name})";
using (SqlCommand cmd = new SqlCommand(text, conn))
{
// Execute the command and log the # rows affected.
var rows = cmd.ExecuteNonQueryAsync();
log.Info($"{rows} rows were updated");
}
}
}
When the text blob uploads, the function triggers, but it doesn't enter the blob id into the column in the Azure SQL database (the database columns are defined in the query: ID, CreatedAt, UpdatedAt, IsDeleted, Url, Title). As shown above, the parameter I am trying to pass to the database is "{name}" (Name:{name} is the variable holding the blob's unique id in blob storage); in other words, I need to transfer {name} to the Title column in the database. I am confused about what I am missing, and I don't know whether my way of transferring it is wrong or the datatype is wrong. Help will be greatly appreciated, thanks.
You should change your INSERT statement to add parameters to it. Here is an example with Name:
var text = "INSERT INTO PhotoTable(ID,CreatedAt,UpdatedAt,IsDeleted, Url, Title) " +
"VALUES (284,SYSDATETIMEOFFSET(),SYSDATETIMEOFFSET(), 'true', 'yrhrh', #Name)";
using (SqlCommand cmd = new SqlCommand(text, conn))
{
cmd.Parameters.AddWithValue("#Name", name);
// Execute the command and log the # rows affected.
var rows = cmd.ExecuteNonQueryAsync();
log.Info($"{rows} rows were updated");
}
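Parameterizing the value both fixes the original problem ({name} inside a plain string literal is never substituted, so SQL Server sees invalid syntax) and protects against SQL injection. Note also that the synchronous ExecuteNonQuery() is used here so that the actual row count, rather than a Task object, is logged from this non-async function.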

Get ListBlobs() sorted by LastModifiedDate?

I have 30,000 images in blob storage and I want to fetch them in descending order of modified date. Is there any way to fetch them in chunks of 1000 images per call?
Here is my code, but it takes too much time. Basically, can I sort ListBlobs() by last-modified date?
CloudBlobContainer rootContainer = blobClient.GetContainerReference("installations");
CloudBlobDirectory dir1;
var items = rootContainer.ListBlobs(id + "/Cameras/" + camId.ToString() + "/", false);
foreach (var blob in items.OfType<CloudBlob>()
    .OrderByDescending(b => b.Properties.LastModified).Skip(1000).Take(500))
{
}
Basically, can I sort ListBlobs() by last-modified date?
No, you can't do server-side sorting on last-modified date. The Blob Storage service returns the data sorted by blob name. You would need to fetch the complete list on the client and sort it there.
Another alternative would be to store the blob's information (the blob's URL, last-modified date, etc.) in a SQL database and fetch the list from there. There you will have the ability to sort the data any way you like.
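Returning to the fetch-and-sort option, a hedged sketch using the classic SDK's segmented listing (ListBlobsSegmented), which pages through the container up to 1000 blobs at a time before sorting locally; the prefix and variable names are taken from the question:

// Sketch: page through the container 1000 blobs at a time, collect the
// results, then sort client-side by LastModified.
var all = new List<CloudBlob>();
BlobContinuationToken token = null;
do
{
    var segment = rootContainer.ListBlobsSegmented(
        id + "/Cameras/" + camId.ToString() + "/",   // prefix from the question
        false, BlobListingDetails.None, 1000, token, null, null);
    all.AddRange(segment.Results.OfType<CloudBlob>());
    token = segment.ContinuationToken;
} while (token != null);

var sorted = all.OrderByDescending(b => b.Properties.LastModified).ToList();

Note this still transfers the full listing; the paging only bounds each call at 1000 items, it does not avoid the client-side sort.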
I have sorted the blobs in last-modified order as in the example below; it is the only solution I could think of :)
/**
 * list the blob items in the blob container, ordered by the last modified date
 * @return
 */
public List<FileProperties> listFiles() {
    Iterable<ListBlobItem> listBlobItems = rootContainer.listBlobs();
    List<FileProperties> list = new ArrayList<>();
    for (ListBlobItem listBlobItem : listBlobItems) {
        if (listBlobItem instanceof CloudBlob) {
            String substring = ((CloudBlob) listBlobItem).getName();
            FileProperties info = new FileProperties(substring, ((CloudBlob) listBlobItem).getProperties().getLastModified());
            list.add(info);
        }
    }
    // to sort the listed blob items in last modified order
    list.sort(new Comparator<FileProperties>() {
        @Override
        public int compare(FileProperties o1, FileProperties o2) {
            return new Long(o2.getLastModifiedDate().getTime()).compareTo(o1.getLastModifiedDate().getTime());
        }
    });
    return list;
}

Saving & Testing Stored Procedures/Triggers (maybe User Defined Functions) For Partitioned Collections

I'm receiving the following error when attempting to save modifications to a Stored Procedure that has been created within a partitioned collection:
Failed to save the script
Here are the details from within the Azure Portal:
Operation name: Failed to save the script
Time stamp: Fri Feb 17 2017 08:46:32 GMT-0500 (Eastern Standard Time)
Event initiated by: -
Description: Database Account: MyDocDbAccount, Script: bulkImport, Message: {"code":400,"body":"{\"code\":\"BadRequest\",\"message\":\"Replaces and upserts for scripts in collections with multiple partitions are not supported.
The Stored Procedure in question is the example "bulkImport" script that can be found here.
There is a known missing capability (bug, if you prefer) in DocumentDB right now where you cannot update existing stored procedures in a partitioned collection. The workaround is to delete it first and then recreate it under the same name/id.
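A hedged sketch of that delete-then-recreate workaround, assuming the DocumentDB SDK's DocumentClient and databaseId/collectionId variables (scriptBody is a hypothetical placeholder for the script text):

// Delete the stored procedure if it exists, then recreate it under the same id.
try
{
    await client.DeleteStoredProcedureAsync(
        UriFactory.CreateStoredProcedureUri(databaseId, collectionId, "bulkImport"));
}
catch (DocumentClientException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
{
    // Nothing to delete the first time around.
}
await client.CreateStoredProcedureAsync(
    UriFactory.CreateDocumentCollectionUri(databaseId, collectionId),
    new StoredProcedure { Id = "bulkImport", Body = scriptBody });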
Contrary to the error message, it turns out that _client.ReplaceStoredProcedureAsync(...) does work (as of June 2018) on partitioned collections. So you can do something like this:
try
{
    await _client.CreateStoredProcedureAsync(...);
}
catch (DocumentClientException dex) when (dex.StatusCode == HttpStatusCode.Conflict)
{
    await _client.ReplaceStoredProcedureAsync(...);
}
Once your SP has been created the first time, there will never be a window of time during which it is unavailable (as there would be with deletion + recreation).
This extension method can handle add or update of a stored procedure.
public static async Task AddOrUpdateProcedure(this DocumentClient client,
    string databaseId,
    string collectionId,
    string storedProcedureId,
    string storedProcedureBody)
{
    try
    {
        var documentCollectionUri = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
        await client.CreateStoredProcedureAsync(documentCollectionUri, new StoredProcedure
        {
            Id = storedProcedureId,
            Body = storedProcedureBody
        });
    }
    catch (DocumentClientException ex) when (ex.StatusCode == HttpStatusCode.Conflict)
    {
        var storedProcedureUri = UriFactory.CreateStoredProcedureUri(databaseId, collectionId, storedProcedureId);
        var storedProcedure = await client.ReadStoredProcedureAsync(storedProcedureUri);
        storedProcedure.Resource.Body = storedProcedureBody;
        await client.ReplaceStoredProcedureAsync(storedProcedure);
    }
}
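For example, a hypothetical call of the extension method above (the database and collection names are placeholders):

await client.AddOrUpdateProcedure("mydb", "mycoll", "bulkImport", storedProcedureBody);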
As of now, updating a stored procedure still does not work in the Azure Portal / Cosmos DB Data Explorer. There is a Cosmos DB extension for Visual Studio Code where this works. However, I don't see a way of executing the procedure from the extension like I can from Data Explorer.
try
{
    var spResponse = await dbClient.CreateStoredProcedureAsync($"/dbs/{dataRepoDatabaseId}/colls/{collectionName}", new StoredProcedure
    {
        Id = sp.Item1,
        Body = sp.Item2
    }, new RequestOptions { PartitionKey = new PartitionKey(partitionKey) });
}
catch (DocumentClientException dex) when (dex.StatusCode == HttpStatusCode.Conflict)
{
    // Fetch the resource to be updated
    StoredProcedure sproc = dbClient.CreateStoredProcedureQuery($"/dbs/{dataRepoDatabaseId}/colls/{collectionName}")
        .Where(r => r.Id == sp.Item1)
        .AsEnumerable()
        .SingleOrDefault();
    if (!sproc.Body.Equals(sp.Item2))
    {
        sproc.Body = sp.Item2;
        StoredProcedure updatedSPResponse = await dbClient.ReplaceStoredProcedureAsync(sproc);
    }
}

Azure autoscale MetricName values

I need to define a scale rule for my virtual machine. I have read the following:
The MetricName and MetricNamespace are not values I just made up.
These have to be precise. You can get these values from the
MetricsClient API and there is some sample code in this link to show
how to get the values.
http://rickrainey.com/2013/12/15/auto-scaling-cloud-services-on-cpu-percentage-with-the-windows-azure-monitoring-services-management-library/
But it's still not clear how I get a list of possible MetricName values, as I didn't find any sample code for it.
Here is the code I used to get the available MetricNames for the cloud service. It was part of a unit test project, hence the [TestMethod] attribute.
[TestMethod]
public async Task GetMetricDefinitions()
{
    // Build the resource ID string.
    string resourceId = ResourceIdBuilder.BuildCloudServiceResourceId(
        cloudServiceName, deploymentName, roleName);
    Console.WriteLine("Resource Id: {0}", resourceId);

    // Get the metric definitions.
    var retrieveMetricsTask =
        metricsClient.MetricDefinitions.ListAsync(resourceId, null, null, CancellationToken.None);
    var metricListResponse = await retrieveMetricsTask;
    MetricDefinitionCollection metricDefinitions = metricListResponse.MetricDefinitionCollection;

    // Make sure something was returned.
    Assert.IsTrue(metricDefinitions.Value.Count > 0);

    // Display the metric definitions.
    int count = 0;
    foreach (MetricDefinition metricDefinition in metricDefinitions.Value)
    {
        Console.WriteLine("MetricDefinition: " + count++);
        Console.WriteLine("Display Name: " + metricDefinition.DisplayName);
        Console.WriteLine("Metric Name: " + metricDefinition.Name);
        Console.WriteLine("Metric Namespace: " + metricDefinition.Namespace);
        Console.WriteLine("Is Alertable: " + metricDefinition.IsAlertable);
        Console.WriteLine("Min. Alertable Time Window: " + metricDefinition.MinimumAlertableTimeWindow);
        Console.WriteLine();
    }
}
Here is the output of the test for my cloud service:
